This article consists of two parts:

  • Logistic Regression
    Problem background: predict whether a student will be admitted to university.

  • Regularized logistic regression
    Problem background: predict whether a microchip passes quality assurance testing.

Logistic Regression

Problem background: predict whether a student will be admitted to university.
Dataset: historical data on past applicants: the two exam scores and the admission result (0 means not admitted, 1 means admitted).
Goal: build a classification model that estimates an applicant's probability of admission based on the two exam scores.

The two exam scores serve as the input features.

Initialization and loading the data
Dataset: ex2data1.txt (listed at the end of this article)

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the exam scores and the third column
%  contains the label.

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);

Part 1: Plotting

%% ==================== Part 1: Plotting ====================
%  We start the exercise by first plotting the data to understand the
%  problem we are working with.

fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
         'indicating (y = 0) examples.\n']);

plotData(X, y);   % plotData is the function we implement ourselves

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

Desired result: the two axes are the two exam scores, and each point's color indicates the admission outcome. Black means admitted, yellow means not admitted.

plotData.m

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

pos = find(y==1);
neg = find(y==0);
plot(X(pos,1), X(pos,2), 'k+', 'LineWidth', 2, ...
     'MarkerSize', 7);
plot(X(neg,1), X(neg,2), 'ko', 'MarkerFaceColor', 'y', ...
     'MarkerSize', 7);
% =========================================================================

hold off;

end

The resulting plot:

Reference notes
The find function
Purpose: returns the indices of nonzero elements (i.e., their positions in the array).

find - Find indices and values of nonzero elements
This MATLAB function returns a vector containing the linear indices of each
nonzero element in array X.
k = find(X)
k = find(X,n)
k = find(X,n,direction)
[row,col] = find(___)
[row,col,v] = find(___)

In this exercise we use pos = find(y==1). Here y is a column vector, so find(y==1) returns the positions where y equals 1; pos then selects the corresponding rows of X for plotting.
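A minimal illustration of how find works with a logical condition (the vector y_demo here is a made-up example, not part of the exercise data):

y_demo = [1; 0; 1; 0];
pos = find(y_demo == 1)   % returns [1; 3]
neg = find(y_demo == 0)   % returns [2; 4]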

Plotting options

'MarkerFaceColor' - marker fill color
In this exercise, 'MarkerFaceColor','y' sets the fill color to yellow.
(Source: MATLAB Help Center.)
Without 'MarkerFaceColor','y', the markers are drawn in black and white only.

Part 2: Compute Cost and Gradient

%% ============ Part 2: Compute Cost and Gradient ============
%  In this part of the exercise, you will implement the cost and gradient
%  for logistic regression. You need to complete the code in
%  costFunction.m

%  Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to x and X_test
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');

% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];
[cost, grad] = costFunction(test_theta, X, y);

fprintf('\nCost at test theta: %f\n', cost);
fprintf('Expected cost (approx): 0.218\n');
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

The sigmoid.m function
Hypothesis function:
$h_\theta(x) = g(\theta^T x)$
where g is the sigmoid function:
$g(z) = \frac{1}{1 + e^{-z}}$

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ (1 + exp(-z));   % element-wise division is required here; plain / errors for vector or matrix z
% =============================================================

end

Properties of the function:

sigmoid(100000)
ans = 1
sigmoid(0)
ans = 0.5000
sigmoid(-100000)
ans = 0
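Because the implementation uses element-wise operations, sigmoid can also be applied directly to a vector or matrix; for example (output rounded to four decimals):

sigmoid([-5 0 5])
ans = 0.0067    0.5000    0.9933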

Cost function:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

The gradient of the cost is a vector whose $j^{th}$ element is defined as
$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
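The costFunction.m below implements these formulas in their equivalent vectorized form, with $h = g(X\theta)$:
$J(\theta) = -\frac{1}{m}\left[ y^{T}\log(h) + (1-y)^{T}\log(1-h) \right], \qquad \nabla_\theta J(\theta) = \frac{1}{m}X^{T}(h - y)$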

costFunction.m

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
h = sigmoid(X*theta);
first = y .* log(h);            % first term, element-wise multiplication
second = (1 - y) .* log(1 - h); % second term, also element-wise
J = -1/m * sum(first + second); % sum up to get the cost

grad = 1/m * X' * (h - y);
% =============================================================

end

Part 3: Optimizing using fminunc

Next we call the built-in function fminunc.

First we set the options for fminunc. Setting GradObj to on tells fminunc that our objective function returns two values: the cost and the gradient.
Setting MaxIter to 400 tells fminunc to run for at most 400 iterations.

options = optimset('GradObj', 'on', 'MaxIter', 400);

@(t)(costFunction(t, X, y)) creates an anonymous function of a single argument t that calls costFunction with X and y fixed.
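A minimal illustration of MATLAB anonymous functions (the square function sq here is just for demonstration):

sq = @(x) x.^2;   % handle to a one-argument function
sq(3)             % returns 9

In the same way, @(t)(costFunction(t, X, y)) fixes X and y and leaves t as the only free variable, which is the form fminunc expects.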

[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
%% ============= Part 3: Optimizing using fminunc  =============
%  In this exercise, you will use a built-in function (fminunc) to find the
%  optimal parameters theta.

%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('Expected cost (approx): 0.203\n');
fprintf('theta: \n');
fprintf(' %f \n', theta);
fprintf('Expected theta (approx):\n');
fprintf(' -25.161\n 0.206\n 0.201\n');

% Plot Boundary
plotDecisionBoundary(theta, X, y);

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

With fminunc you do not have to write the gradient descent loop yourself or pick a learning rate; you only supply costFunction, which computes the cost and gradient. fminunc converges to the optimal parameters and returns the final cost and θ.

Cost at theta found by fminunc: 0.203498
Expected cost (approx): 0.203
theta: 
 -25.161343 
 0.206232 
 0.201472 
Expected theta (approx):
 -25.161
 0.206
 0.201

Next we plot the decision boundary.

For the hypothesis
$h_\theta(x) = g(\theta_1 + \theta_2 x_2 + \theta_3 x_3)$
the sigmoid's decision threshold is at z = 0: when z > 0 the prediction is 1, and when z < 0 the prediction is 0.

So on the decision boundary, with the variables being $(x_2, x_3)$:
$\theta_1 + \theta_2 x_2 + \theta_3 x_3 = 0$
The subscripts 1, 2, 3 are used here because MATLAB indexing starts at 1 rather than 0, which keeps the math aligned with the code.
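Solving this equation for $x_3$ gives the line that is actually drawn (this is what the plot_y expression in plotDecisionBoundary.m computes):
$x_3 = -\frac{1}{\theta_3}\left(\theta_2 x_2 + \theta_1\right)$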

The plotting function is as follows:
plotDecisionBoundary.m

function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the
%   positive examples and o for the negative examples. X is assumed to be
%   a either
%   1) Mx3 matrix, where the first column is an all-ones column for the
%      intercept.
%   2) MxN, N>3 matrix, where the first column is all-ones

% Plot Data
plotData(X(:,2:3), y);   % reuse the plotData function from above
hold on

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)

    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour

    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off

end

To draw a line we need the pairs of (x, y) endpoint values.
For example:

plot_x=[0,10];
plot_y=[3,13];
plot(plot_x,plot_y);

This draws the line segment from (0, 3) to (10, 13).

For this exercise, what are the corresponding point pairs?
The variables are $(x_2, x_3)$: $x_2$ is plot_x, and $x_3$ (i.e., plot_y) is obtained by solving $\theta_1 + \theta_2 x_2 + \theta_3 x_3 = 0$.

% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

% Calculate the decision boundary line (this is in fact x_3)
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

Part 4: Predict and Accuracies

%% ============== Part 4: Predict and Accuracies ==============
%  After learning the parameters, you'll like to use it to predict the outcomes
%  on unseen data. In this part, you will use the logistic regression model
%  to predict the probability that a student with score 45 on exam 1 and
%  score 85 on exam 2 will be admitted.
%
%  Furthermore, you will compute the training and test set accuracies of
%  our model.
%
%  Your task is to complete the code in predict.m

%  Predict probability for a student with score 45 on exam 1
%  and score 85 on exam 2
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n'], prob);
fprintf('Expected value: 0.775 +/- 0.002\n\n');

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (approx): 89.0\n');
fprintf('\n');

Below is the content of predict.m.

The return value p holds the predictions, each of which is 0 or 1: 0 means not admitted, 1 means admitted.
X has dimensions m×(n+1), where m is the number of training examples; theta is an (n+1)×1 vector.

index = find(sigmoid(X*theta) >= 0.5);   % find the indices where the prediction is >= 0.5
p(index)=1;

Below is a simplified version using logical indexing, which is the style MATLAB recommends and runs faster.

% or, equivalently
index=sigmoid(X*theta)>=0.5;
p(index)=1;

The complete function:

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%

index = sigmoid(X*theta) >= 0.5;
p(index) = 1;
% =========================================================================

end

Then we compute the training accuracy.
The accuracy is computed by comparing the predictions p with the true labels y element by element: each match contributes 1 and each mismatch contributes 0, and the mean of these values is the accuracy. For example, if 90 out of 100 predictions match the true labels, the mean is (90×1 + 10×0)/100 = 0.9.
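A minimal illustration with made-up vectors (p_demo and y_demo are hypothetical names, just for this example):

p_demo = [1; 0; 1; 1];
y_demo = [1; 0; 0; 1];
mean(double(p_demo == y_demo)) * 100   % 3 of the 4 predictions match, so this prints 75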

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

Output:

For a student with scores 45 and 85, we predict an admission probability of 0.776291
Expected value: 0.775 +/- 0.002

Train Accuracy: 89.000000
Expected accuracy (approx): 89.0

Dataset
ex2data1.txt

34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
38.78580379679423,64.99568095539578,0
61.379289447425,72.80788731317097,1
85.40451939411645,57.05198397627122,1
52.10797973193984,63.12762376881715,0
52.04540476831827,69.43286012045222,1
40.23689373545111,71.16774802184875,0
54.63510555424817,52.21388588061123,0
33.91550010906887,98.86943574220611,0
64.17698887494485,80.90806058670817,1
74.78925295941542,41.57341522824434,0
34.1836400264419,75.2377203360134,0
83.90239366249155,56.30804621605327,1
51.54772026906181,46.85629026349976,0
94.44336776917852,65.56892160559052,1
82.36875375713919,40.61825515970618,0
51.04775177128865,45.82270145776001,0
62.22267576120188,52.06099194836679,0
77.19303492601364,70.45820000180959,1
97.77159928000232,86.7278223300282,1
62.07306379667647,96.76882412413983,1
91.56497449807442,88.69629254546599,1
79.94481794066932,74.16311935043758,1
99.2725269292572,60.99903099844988,1
90.54671411399852,43.39060180650027,1
34.52451385320009,60.39634245837173,0
50.2864961189907,49.80453881323059,0
49.58667721632031,59.80895099453265,0
97.64563396007767,68.86157272420604,1
32.57720016809309,95.59854761387875,0
74.24869136721598,69.82457122657193,1
71.79646205863379,78.45356224515052,1
75.3956114656803,85.75993667331619,1
35.28611281526193,47.02051394723416,0
56.25381749711624,39.26147251058019,0
30.05882244669796,49.59297386723685,0
44.66826172480893,66.45008614558913,0
66.56089447242954,41.09209807936973,0
40.45755098375164,97.53518548909936,1
49.07256321908844,51.88321182073966,0
80.27957401466998,92.11606081344084,1
66.74671856944039,60.99139402740988,1
32.72283304060323,43.30717306430063,0
64.0393204150601,78.03168802018232,1
72.34649422579923,96.22759296761404,1
60.45788573918959,73.09499809758037,1
58.84095621726802,75.85844831279042,1
99.82785779692128,72.36925193383885,1
47.26426910848174,88.47586499559782,1
50.45815980285988,75.80985952982456,1
60.45555629271532,42.50840943572217,0
82.22666157785568,42.71987853716458,0
88.9138964166533,69.80378889835472,1
94.83450672430196,45.69430680250754,1
67.31925746917527,66.58935317747915,1
57.23870631569862,59.51428198012956,1
80.36675600171273,90.96014789746954,1
68.46852178591112,85.59430710452014,1
42.0754545384731,78.84478600148043,0
75.47770200533905,90.42453899753964,1
78.63542434898018,96.64742716885644,1
52.34800398794107,60.76950525602592,0
94.09433112516793,77.15910509073893,1
90.44855097096364,87.50879176484702,1
55.48216114069585,35.57070347228866,0
74.49269241843041,84.84513684930135,1
89.84580670720979,45.35828361091658,1
83.48916274498238,48.38028579728175,1
42.2617008099817,87.10385094025457,1
99.31500880510394,68.77540947206617,1
55.34001756003703,64.9319380069486,1
74.77589300092767,89.52981289513276,1

Regularized logistic regression

Problem background: predict whether a microchip passes quality assurance testing.
Dataset: historical results of two tests on past microchips.

Visualizing the data

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;

Result: the two test scores for each chip are plotted, again using the plotData function from above.

We can see that the decision boundary here is not linear, so a linear function of the raw features will not separate the classes; we need polynomial features.

Part 1: Regularized Logistic Regression

To create more features from each data point, we map $x_1$ and $x_2$ onto all polynomial terms of $x_1$ and $x_2$ up to the sixth power:
$\mathrm{mapFeature}(x) = \left[1,\; x_1,\; x_2,\; x_1^2,\; x_1 x_2,\; x_2^2,\; x_1^3,\; \ldots,\; x_1 x_2^5,\; x_2^6\right]^T$

This transforms the two features ($x_1$ and $x_2$) into a 28×1 feature vector.
A logistic regression classifier trained on this higher-dimensional feature vector has a more complex, nonlinear decision boundary.
The polynomial feature mapping is implemented by the following function.
mapFeature.m

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
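A quick sanity check of the mapping (the inputs 0.5 and 0.3 are arbitrary made-up values): with degree = 6 the output has 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28 columns, matching the 28-dimensional feature vector mentioned above.

out = mapFeature(0.5, 0.3);
size(out)   % 1  28
out(1:4)    % 1.0000  0.5000  0.3000  0.2500   (i.e., 1, x1, x2, x1^2)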

Next we compute the cost function and gradient for regularized logistic regression.
The cost function is
$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=2}^{n} \theta_j^2$
Note that the intercept term $\theta_1$ (theta(1) in MATLAB, since MATLAB indexing starts at 1) is not regularized, which is why the regularization sum starts at j = 2.
The corresponding gradient is
$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_1^{(i)}$
$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 2$

costFunctionReg.m
The core of the file is the MATLAB implementation of the two formulas above.

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

first = y .* log(sigmoid(X*theta));
second = (1 - y) .* log(1 - sigmoid(X*theta));
theta_1 = [0; theta(2:end)];   % zero out theta(1) so it does not take part in regularization

J = -sum(first + second)/m + lambda/(2*m)*(theta_1'*theta_1);
% Note: this is not element-wise multiplication; element-wise would give a vector,
% but we need a scalar here: (row vector) * (column vector) = scalar.

% calculate the gradient (theta(1) is not regularized)
grad = X'*(sigmoid(X*theta) - y)/m + lambda/m*theta_1;
% =============================================================

end

Below we call this function and run regularized logistic regression.

%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);   % size(X, 2) is the length of the second dimension of X, i.e., the number of columns

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);

fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

Test output:

Cost at initial theta (zeros): 0.693147
Expected cost (approx): 0.693
Gradient at initial theta (zeros) - first five values only:
 0.008475 
 0.018788 
 0.000078 
 0.050345 
 0.011501 
Expected gradients (approx) - first five values only:
 0.0085
 0.0188
 0.0001
 0.0503
 0.0115

Program paused. Press enter to continue.

Cost at test theta (with lambda = 10): 3.164509
Expected cost (approx): 3.16
Gradient at test theta - first five values only:
 0.346045 
 0.161352 
 0.194796 
 0.226863 
 0.092186 
Expected gradients (approx) - first five values only:
 0.3460
 0.1614
 0.1948
 0.2269
 0.0922

Program paused. Press enter to continue.

Part 2: Regularization and Accuracies

Regularization and accuracy: try different values of λ and observe how the decision boundary and the training-set accuracy change.

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');

Result: the script plots the decision boundary, with the current value of λ shown in the title.

With a smaller λ (less regularization, a more tightly fit boundary), the corresponding training accuracy is higher:

Train Accuracy: 88.983051
Expected accuracy (with lambda = 1): 83.1 (approx)

With a much larger λ (heavy regularization, an underfit boundary), the corresponding training accuracy drops:

Train Accuracy: 61.016949
Expected accuracy (with lambda = 1): 83.1 (approx)
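Such comparisons can be generated by wrapping the optimization in a loop over λ. A minimal sketch reusing the functions defined above (the loop itself is not part of the original exercise script):

for lambda = [0 1 10 100]
    initial_theta = zeros(size(X, 2), 1);
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    theta = fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
    p = predict(theta, X);
    fprintf('lambda = %g: train accuracy = %.2f%%\n', lambda, mean(double(p == y)) * 100);
end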

The decision boundary itself is drawn by the same plotDecisionBoundary.m function listed in Part 3 above.


Dataset
The contents of ex2data2.txt are as follows:


0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0
0.29896,0.61915,0
0.50634,0.75804,0
0.61578,0.7288,0
0.60426,0.59722,0
0.76555,0.50219,0
0.92684,0.3633,0
0.82316,0.27558,0
0.96141,0.085526,0
0.93836,0.012427,0
0.86348,-0.082602,0
0.89804,-0.20687,0
0.85196,-0.36769,0
0.82892,-0.5212,0
0.79435,-0.55775,0
0.59274,-0.7405,0
0.51786,-0.5943,0
0.46601,-0.41886,0
0.35081,-0.57968,0
0.28744,-0.76974,0
0.085829,-0.75512,0
0.14919,-0.57968,0
-0.13306,-0.4481,0
-0.40956,-0.41155,0
-0.39228,-0.25804,0
-0.74366,-0.25804,0
-0.69758,0.041667,0
-0.75518,0.2902,0
-0.69758,0.68494,0
-0.4038,0.70687,0
-0.38076,0.91886,0
-0.50749,0.90424,0
-0.54781,0.70687,0
0.10311,0.77997,0
0.057028,0.91886,0
-0.10426,0.99196,0
-0.081221,1.1089,0
0.28744,1.087,0
0.39689,0.82383,0
0.63882,0.88962,0
0.82316,0.66301,0
0.67339,0.64108,0
1.0709,0.10015,0
-0.046659,-0.57968,0
-0.23675,-0.63816,0
-0.15035,-0.36769,0
-0.49021,-0.3019,0
-0.46717,-0.13377,0
-0.28859,-0.060673,0
-0.61118,-0.067982,0
-0.66302,-0.21418,0
-0.59965,-0.41886,0
-0.72638,-0.082602,0
-0.83007,0.31213,0
-0.72062,0.53874,0
-0.59389,0.49488,0
-0.48445,0.99927,0
-0.0063364,0.99927,0
0.63265,-0.030612,0
