% optimtips.m
With no error in the data at all, our regression estimates will
be perfect. Let's suppose there is some noise on the data points
x(i). Look at the highest point in x. The extreme points will
have the highest leverage on your estimate of the slope. If the
noise on this point is positive, moving x higher, then this point
will have MORE leverage. It will also tend to bias the slope
estimate towards zero. High values of x with noise which decreases
their value will see their leverage decreased. A decrease in these
values would tend to bias the slope away from zero. But remember
that it's the points with high leverage that affect the slope
the most. The same effects happen in reverse at the bottom end
of our data.

The net effect is that errors in x will tend to result in slope
estimates that are biased towards zero. Can we back this up with
an experiment?
%}
%%
x = linspace(-1,1,201)';
y = 1 + x;
coef0 = [ones(size(x)) , x]\y
% as you would expect, the estimated parameters are exact (to
% within double precision noise.)
%%
% add some noise to x
u = x + randn(size(x))/10;
coef1 = [ones(size(u)) , u]\y
% as predicted, the second coefficient in this model was less than 1.
% (Note that I am willing to make this prediction on random data.)
%%
% Better would have been to form the model in its inverse form.
% The constant term will be different, but the slope should still
% be nominally 1.0.
coef2 = [ones(size(y)) , y]\u
%%
% We can even try it several more times, just in case you don't
% believe me.
for i = 1:10
  u = x + randn(size(x))/10;
  coef1 = [ones(size(u)) , u]\y;
  coef2 = [ones(size(y)) , y]\u;
  disp(['Errors in x: ',num2str(coef1(2)),', ',num2str(coef2(2))])
end
% Note that on every pass through this loop, the first slope
% estimate tends to be uniformly less than 1, whereas the second
% was fairly randomly above and below 1.0.
%%
% The errors in variables problem is also known as Total Least
% Squares. Here we wish to minimize the squared deviations of
% each point from the regression line. We'll now assume that
% both x and y have variability that we need to deal with.
x = linspace(-1,1,101)';
y = 1 + 2*x;
% add in some noise; the variance is the same for each variable.
x = x + randn(size(x))/10;
y = y + randn(size(x))/10;
% If we use the basic \ estimator, then the same errors-in-x
% problem as before rears its ugly head. The slope is biased
% towards zero.
coef0 = [ones(size(x)) , x]\y
%%
% The trick is to use principal components. In this case we can
% do so with a singular value decomposition.
M = [x-mean(x),y-mean(y)];
[u,s,v] = svd(M,0);
% The model comes from the (right) singular vectors.
v1 = v(:,1);
disp(['(x - ',num2str(mean(x)),')*',num2str(v1(2)), ...
  ' - (y - ',num2str(mean(y)),')*',num2str(v1(1)),' = 0'])
% Only a little algebra will be needed to convince you that
% this model is indeed approximately y = 1 + 2*x.
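%%
% (An added check, as a small sketch not in the original text: solving the
% implicit line displayed above for y recovers an explicit slope and
% intercept, which should be close to the true values of 2 and 1.)
tls_slope = v1(2)/v1(1);
tls_intercept = mean(y) - tls_slope*mean(x);
disp(['TLS fit: y = ',num2str(tls_intercept),' + ',num2str(tls_slope),'*x'])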
%% 14. Passing extra information/variables into an optimization
%{
Many optimizations involve extraneous variables to the objective
function. With respect to the optimizer, they are fixed, but the
user may need to change these variables at will. A simple example
comes from nonlinear root-finding. My example will use erf, a
function for which an explicit inverse already exists. I'll give
several solutions to passing in these extra variables. (One I will
not recommend is the use of global variables. While they will work
for this problem, I rarely like to use them when any other solution
exists. There are many reasons for disliking globals; my favorite
is the confusion they can sometimes cause in debugging your code.)

The problem we will solve is computation of the inverse of y = erf(x)
when y is known, and subject to change.
%}
%%
% 1. We can embed the parameters inside an anonymous function.
y = 0.5;
fun = @(x) erf(x) - y;
% solve using fzero
start = 0;
x = fzero(fun,start,optimset('disp','iter'))
%%
% To convince ourselves that fzero was successful, make a direct
% call to erfinv.
erfinv(y)
%%
% The value of y has been embedded in fun. If we choose to change y,
% we must redefine the anonymous function.
y = -.5;
fun = @(x) erf(x) - y;
x = fzero(fun,start,optimset('disp','off'))
%%
% 2. An alternative is to pass in the value as an argument, passing
% it through fzero. The optimizers in the optimization toolbox allow
% the user to do so, by appending them after the options argument.
% This is an undocumented feature, because the Mathworks would prefer
% that we use anonymous functions.
% Here fun is defined as a function of two variables; y is no
% longer fixed at its value when the function is created.
fun = @(x,y) erf(x) - y;
y = 0.5;
start = 0;
x = fzero(fun,start,optimset('disp','off'),y)
x = fzero(fun,start,optimset('disp','off'),-0.5)
%%
% 3. For those individuals who prefer inline functions over anonymous
% functions, or who do not use release 14 or above, this solution
% looks just like the one above.
fun = inline('erf(x) - y','x','y');
y = 0.5;
start = 0;
x = fzero(fun,start,optimset('disp','off'),y)
x = fzero(fun,start,optimset('disp','off'),-0.5)
%%
%{
4. Using a nested function. Nested functions can only exist inside
other functions (also only in release 14 and above), so I'll define
a function that will enclose the nested function. It's naturally
called testnestfun. This function is already saved as an m-file
for your convenience. (Note that the nested function takes only x
as an argument; it picks up yi directly from the workspace of the
enclosing function.)

function x = testnestfun(y)
  function res = nestfun(x)
    res = erf(x) - yi;
  end
x = zeros(size(y));
start = 0;
for i = 1:prod(size(y))
  yi = y(i);
  x(i) = fzero(@nestfun,start,optimset('disp','off'));
end
end % testnestfun terminator
%}
% Solve a series of inverse problems.
x = [-.9:.1:.9]';
disp([x,testnestfun(x)])
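%%
% (An added sketch, not part of the original text: the anonymous-function
% idea from solution 1 can also be wrapped so that y is supplied at call
% time without redefining anything. The name erfinverse is mine, purely
% for illustration; the result should agree with erfinv.)
erfinverse = @(y) fzero(@(x) erf(x) - y, 0, optimset('disp','off'));
disp([erfinverse(0.5), erfinv(0.5)])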
%% 15. Minimizing the sum of absolute deviations
%{
Minimizing the sum of squares of errors is appropriate when
the noise in your model is normally distributed. It's not
uncommon to expect a normal error structure. But sometimes
we choose instead to minimize the sum of absolute errors.
How do we do this? It's a linear programming trick this time.
For each data point, we add a pair of unknowns called slack
variables. Thus

  y(i) = a + b*x(i) + u(i) - v(i)

Here the scalars a and b, and the vectors u and v are all unknowns.
We will constrain both u(i) and v(i) to be non-negative. Solve
the linear programming system with equality constraints as
above, and the objective will be to minimize sum(u) + sum(v).
The total number of unknowns will be 2+2*n, where n is the
number of data points in our "regression" problem.
%}
%%
x = sort(rand(100,1));
y = 1+2*x + rand(size(x))-.5;
close
plot(x,y,'o')
title 'Linear data with noise'
xlabel 'x'
ylabel 'y'
%%
% formulate the linear programming problem.
n = length(x);
% our objective sums both u and v, and ignores the regression
% coefficients themselves.
f = [0 0 ones(1,2*n)]';
% a and b are unconstrained; the u and v vectors must be non-negative.
LB = [-inf -inf , zeros(1,2*n)];
% no upper bounds at all.
UB = [];
% Build the regression problem as EQUALITY constraints, when
% the slack variables are included in the problem.
Aeq = [ones(n,1), x, eye(n,n), -eye(n,n)];
beq = y;
% estimation using linprog
params = linprog(f,[],[],Aeq,beq,LB,UB);
% we can now drop the slack variables
coef = params(1:2)
% and plot the fit
plot(x,y,'o',x,coef(1) + coef(2)*x,'-')
title 'Linprog solves the sum of absolute deviations problem (1 norm)'
xlabel 'x'
ylabel 'y'
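%%
% (An added check, as a sketch not in the original text: at the linprog
% solution, u(i) - v(i) equals the i'th residual with at most one of the
% pair nonzero, so the LP objective should match the sum of absolute
% deviations of the fitted line, up to solver tolerance.)
resid = y - (coef(1) + coef(2)*x);
disp([sum(abs(resid)), f'*params])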
%% 16. Minimize the maximum absolute deviation
%{
We can take a similar approach to this problem as we did for the
sum of absolute deviations, although here we only need a pair of
slack variables to formulate this as a linear programming problem.
The slack variables will correspond to the maximally positive
deviation and the maximally negative deviation. (As long as a
constant term is present in the model, only one slack variable
is truly needed. I'll develop this for the general case.)

Suppose we want to solve the linear "least squares" problem

  M*coef = y

in a mini-max sense. We really don't care what the other errors
do as long as the maximum absolute error is minimized. So we
simply formulate the linear programming problem (for non-negative
scalars u and v)

  min (u+v)

  M*coef - y <= u
  M*coef - y >= -v

If the coefficient vector (coef) has length p, then there are
2+p parameters to estimate in total.
%}
%%
% As usual, let's make up some data.
x = sort(rand(100,1));
y = pi - 3*x + rand(size(x))-.5;
close
plot(x,y,'o')
title 'Linear data with noise'
xlabel 'x'
ylabel 'y'
%%
% Build the regression matrix for a model y = a+b*x + noise
n = length(x);
M = [ones(n,1),x];
% Our objective here is to minimize u+v
f = [0 0 1 1]';
% The slack variables have non-negativity constraints
LB = [-inf -inf 0 0];
UB = [];
% Augment the design matrix to include the slack variables;
% the result will be a set of general INEQUALITY constraints.
A = [[M,-ones(n,1),zeros(n,1)];[-M,zeros(n,1),-ones(n,1)]];
b = [y;-y];
% estimation using linprog
params = linprog(f,A,b,[],[],LB,UB);
% strip off the slack variables
coef = params(1:2)
%%
% The maximum positive residual
params(3)
%%
% And the most negative residual
params(4)
%%
% plot the result
plot(x,y,'o',x,coef(1) + coef(2)*x,'-')
title 'Linprog solves the infinity norm (minimax) problem'
xlabel 'x'
ylabel 'y'

%% 17. Batching small problems into large problems
%{
Suppose we wanted to solve many simple nonlinear optimization
problems, all of which were related. To pick one such example,
I'll arbitrarily decide to invert a zeroth order Bessel
function at a large set of points. I'll choose to know only
that the root lies in the interval [0,4].
%}
%%
% Solve for x(i), given that y(i) = bessel(0,x(i)).
n = 1000;
y = rand(n,1);
fun = @(x,y_i) bessel(0,x) - y_i;
% first, in a loop
tic
x = zeros(n,1);
for i = 1:n
  x(i) = fzero(fun,[0 4],optimset('disp','off'),y(i));
end
toc
% as a test, compare the min and max residuals
yhat = bessel(0,x);
disp(['Min & max residuals: ',num2str([min(y-yhat),max(y-yhat)])])
% tic and toc reported that this took roughly 9 seconds to run on
% my computer.
%%
% Can we do better? Suppose we considered this as a multivariable
% optimization problem, with hundreds of unknowns. We could batch
% many small problems into one large one, solving all our problems
% simultaneously. With the optimization toolbox, this is possible.
% At least it is if we use the LargeScale solver in conjunction
% with the JacobPattern option.
% I'll use lsqnonlin because I chose to bound my solutions in the
% interval [0,4]. Fsolve does not accept bound constraints.

% define a batched objective function
batchfun = @(x,y) bessel(0,x) - y
options = optimset('lsqnonlin');
options.LargeScale = 'on';
options.TolX = 1.e-13;
options.TolFun = 1.e-13;
% I'll just put the first p=10 problems in a batch
p = 10;
start = ones(p,1);
LB = repmat(0,p,1);
UB = repmat(4,p,1);
tic
xb = lsqnonlin(batchfun,start,LB,UB,options,y(1:p));
toc
% This took .1 seconds on my computer, roughly the same amount
% of time per problem as did the loop, so no gain was achieved.
%%
% Why was the call to lsqnonlin so slow? Because I did not tell
% lsqnonlin to expect that the Jacobian matrix would be sparse.
% How sparse is it? Recall that each problem is really independent
% of the others: residual i depends only on x(i), so the Jacobian
% is diagonal, and therefore very sparse indeed.
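%%
% (An added sketch, not part of the original text: supplying that diagonal
% sparsity pattern through the JacobPattern option, so the large-scale
% solver can build its finite-difference Jacobian cheaply. The batch size
% of 100 and the reuse of batchfun and options from above are my choices
% for illustration; actual timings will vary with machine and release.)
p = 100;
options.JacobPattern = speye(p);   % residual i depends only on x(i)
start = ones(p,1);
LB = zeros(p,1);
UB = repmat(4,p,1);
tic
xb = lsqnonlin(batchfun,start,LB,UB,options,y(1:p));
toc
% As a check, the recovered roots can be compared against the looped
% fzero solution for the same subset of problems.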