Binary Logistic Regression in Octave/MATLAB

This post is based on Andrew Ng's online machine learning course. It is a simplified and completed version of the assignment code.

The basic idea of logistic regression

The sigmoid function

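In a classification problem we use the variable x to decide which category (a discrete value) the variable y belongs to; logistic regression is an algorithm for this kind of problem. In the simplest case, binary classification, y takes one of two values, 0 and 1, where 0 denotes the negative class and 1 the positive class.
We need a function h(x) to predict the value of y. Here we use the sigmoid function: h(x) = g(theta' * x), where g(z) = 1 / (1 + e^(-z)).

g(z) is the sigmoid function: g(z) > 0.5 when z > 0, g(z) < 0.5 when z < 0, and g(0) = 0.5. Since z = theta' * x, we predict y = 1 when theta' * x is greater than or equal to 0, and y = 0 otherwise.
As in the linear regression covered earlier, theta is still a column vector of coefficients. X is slightly different: it is a matrix that may also contain polynomial terms of the original features.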
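The code later in this post calls sigmoid but never lists it. A minimal sketch of that helper, assuming it is saved in its own file sigmoid.m and should work element-wise on scalars, vectors, and matrices, could look like this:

```octave
function g = sigmoid(z)
  % Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise,
  % so z may be a scalar, a vector, or a matrix.
  g = 1 ./ (1 + exp(-z));
end
```

With this definition, sigmoid(0) returns 0.5, and sigmoid(X * theta) gives one predicted probability per row of X.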

The cost function

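Unlike linear regression, this problem needs a redefined cost function that better matches its nature. For a single example the cost is -log(h(x)) when y = 1 and -log(1 - h(x)) when y = 0.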
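For reference, the standard logistic regression cost from the course, which agrees with the costFunction code given later in this post, can be written as follows. Here m is the number of training examples and (x^(i), y^(i)) is the i-th example; the notation is an assumption chosen to match the course.

```latex
\[
\mathrm{Cost}\bigl(h_\theta(x), y\bigr) =
\begin{cases}
-\log\bigl(h_\theta(x)\bigr)     & \text{if } y = 1 \\
-\log\bigl(1 - h_\theta(x)\bigr) & \text{if } y = 0
\end{cases}
\]
\[
J(\theta) = \frac{1}{m} \sum_{i=1}^{m}
\Bigl[ -y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
       - \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]
\]
```

The second line is just the two cases folded into one expression, since y is always 0 or 1.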

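With this definition, the cost tends to infinity when the prediction contradicts the actual value (for example, h(x) approaching 0 while y = 1), and it is zero when the prediction matches the actual value exactly.
We can still use gradient descent to find the theta that minimizes the cost function: each iteration updates all components of theta simultaneously, moving each one against the partial derivative of the cost, scaled by a learning rate alpha.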
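Written out, the gradient descent step for this cost would be the following. This is the standard rule from the course, stated here as a reference sketch; alpha is the learning rate and all theta_j are updated simultaneously.

```latex
\[
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
          = \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
            \bigl( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \bigr) \, x_j^{(i)},
\qquad
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}
\]
```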

Because h(x) is defined differently from linear regression, the update is actually different even though its form looks the same.
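To make the update concrete, here is a small sketch of batch gradient descent for logistic regression. The toy data, the learning rate alpha, and the iteration count are all made-up values chosen for illustration, not taken from the course data set.

```octave
% Toy data (hypothetical): an intercept column plus one centered feature.
x = [-1.5; -0.5; 0.5; 1.5];
X = [ones(4, 1), x];
y = [0; 0; 1; 1];
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));  % logistic function, element-wise

theta = zeros(2, 1);  % start from all zeros
alpha = 0.1;          % learning rate (assumed value)
numIters = 1000;      % number of iterations (assumed value)

for iter = 1:numIters
  h = sigmoid(X * theta);         % current predictions, m x 1
  grad = (1 / m) * X' * (h - y);  % gradient of the cost, same size as theta
  theta = theta - alpha * grad;   % simultaneous update of every theta_j
end

p = sigmoid(X * theta) >= 0.5;    % predicted labels on the toy data
```

The loop is the same as for linear regression except for the line computing h, which is exactly the difference the paragraph above points out.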

The fminunc function

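Besides gradient descent, other algorithms are commonly used to minimize the cost function. fminunc is a minimization routine available in both Octave and MATLAB; to use it we supply a function that returns the cost and, with the GradObj option turned on, the partial derivative with respect to each parameter.
In Octave, fminunc is called with a handle to that function, an initial theta, and an options structure created by optimset.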
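A minimal sketch of such a call, using a simple quadratic cost rather than the logistic one so that the answer is obvious (the minimum is at theta = [5; 5]). The function name quadCost is hypothetical. The leading `1;` is an Octave convention that lets a script file define a function inline; in MATLAB the function would go in its own file or at the end of the script.

```octave
1;  % Octave: mark this file as a script so a function can be defined below

function [jVal, gradient] = quadCost(theta)
  % Cost J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2 and its gradient.
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = [2 * (theta(1) - 5); 2 * (theta(2) - 5)];
end

% 'GradObj' 'on' tells fminunc that quadCost also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@quadCost, initialTheta, options);
```

The logistic regression scripts below use the same pattern, wrapping the cost function in an anonymous function so that X and y are passed along with theta.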

Implementing binary logistic regression with a linear boundary

Main script:

clear; close all; clc
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3); % X holds the two exam scores, y whether the student was admitted
% Plot the examples
fprintf('Plotting...\n');
plotData(X, y); % plotData draws the examples, marked by admission result
hold on;
xlabel('Exam 1 score');
ylabel('Exam 2 score');
legend('Admitted', 'Not admitted');
hold off;
% Compute the cost and gradient
[m, n] = size(X);
X = [ones(m, 1), X]; % prepend a column of ones to X for the intercept term x0
initial_theta = zeros(n + 1, 1); % initialize theta to zeros; X gained a column, so theta gains a row
[cost, grad] = costFunction(initial_theta, X, y); % cost and gradient at the initial theta
fprintf('the cost by initial_theta is: %f\n', cost);
fprintf('the grad by initial_theta is: %f\n', grad);
% Optimize theta with fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options); % optimal theta and the corresponding cost
fprintf('theta found by fminunc: %f\n', theta);
fprintf('cost at the theta found by fminunc: %f\n', cost);
% Plot the decision boundary
plotDecisionBoundary(theta, X, y); % plotDecisionBoundary draws the boundary on top of the data
hold on;
xlabel('Exam 1 score');
ylabel('Exam 2 score');
legend('Admitted', 'Not admitted', 'Decision boundary');
hold off;
% Predict a specific case and compute the training accuracy
prob = sigmoid([1 45 85] * theta);
fprintf('For a student with scores 45 and 85, we predict an admission probability of %f\n\n', prob);
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

plotData function (plots the example data):

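function plotData(X, y)
figure; hold on; % open a new figure and keep both classes on the same axes
pos = find(y == 1); % indices of positive examples
neg = find(y == 0); % indices of negative examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
hold off;
end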

costFunction function (computes the cost and its gradient):

function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
h = sigmoid(X * theta); % predicted probabilities, m x 1
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
grad = (1 / m) * X' * (h - y); % column vector, same size as theta
end

plotDecisionBoundary function (draws the decision boundary):

function plotDecisionBoundary(theta, X, y)
plotData(X(:,2:3), y);
hold on
if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];
    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)
    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);
    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour
    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off
end

predict function (predicts y for a given x):

function p = predict(theta, X)
p = sigmoid(X * theta) >= 0.5; % 1 where the predicted probability is at least 0.5, else 0
end

Results

The first plot is the output of plotData. It shows the two kinds of examples: students who were admitted and students who were not.

The second plot is the output of plotDecisionBoundary. The two kinds of examples are separated by a straight line, so the classification essentially works.

Binary logistic regression with regularization

Basic idea

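Most real classification problems cannot be separated by a simple straight line, so it is necessary to add higher-order terms to the hypothesis so that the boundary can follow the data more closely. Higher-order terms, however, bring the problem of overfitting: the fitted curve hugs the existing examples too closely and does not generalize well to new examples fed into the classifier.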
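The scripts below call mapFeature to build these higher-order terms, but the post does not list it. The following sketch is modeled on the helper from the course exercise and assumes polynomial terms up to degree 6 (the degree is an assumption here), giving 28 columns including the column of ones:

```octave
function out = mapFeature(X1, X2)
  % Map two feature columns to all polynomial terms X1^(i-j) * X2^j
  % for i = 1..degree and j = 0..i, preceded by a column of ones.
  degree = 6;  % assumed maximum polynomial degree
  out = ones(size(X1(:, 1)));
  for i = 1:degree
    for j = 0:i
      out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);
    end
  end
end
```

Because the first column is already all ones, the main script does not need to add an intercept column after calling mapFeature.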

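Overfitting usually arises because the parameters of the higher-order terms are too large, which makes the decision curve too wiggly. Simply removing the higher-order terms leads to underfitting instead (in the illustration for this section, the left panel shows underfitting and the right panel overfitting). Regularization is the compromise used here: keep all the features but reduce the magnitude of the parameters.
Concretely, regularization modifies the original cost function by adding a penalty term, lambda / (2m) times the sum of the squared parameters theta_j for j >= 1, where lambda is the regularization parameter.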
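In full, the regularized cost would read as follows. This is the standard form from the course and it matches the costFunctionReg code later in this post, where the sum in the penalty skips the intercept (theta(1) in Octave indexing, theta_0 in the formula):

```latex
\[
J(\theta) = \frac{1}{m} \sum_{i=1}^{m}
\Bigl[ -y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
       - \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]
+ \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2}
\]
```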

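This puts the parameters themselves into the cost function, so minimizing the cost also shrinks the parameters of the higher-order terms. The intercept theta_0 is not penalized.
The algorithm for reducing the cost changes accordingly: for j >= 1 the gradient gains an extra term (lambda / m) * theta_j, while the update for theta_0 stays the same as before.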
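As a reference sketch, the regularized gradient descent updates in the course's notation would be (alpha is the learning rate; both lines are applied simultaneously in each iteration):

```latex
\[
\theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
            \bigl( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \bigr) \, x_0^{(i)}
\]
\[
\theta_j := \theta_j - \alpha \Bigl[ \frac{1}{m} \sum_{i=1}^{m}
            \bigl( h_\theta\bigl(x^{(i)}\bigr) - y^{(i)} \bigr) \, x_j^{(i)}
            + \frac{\lambda}{m} \, \theta_j \Bigr],
\qquad j = 1, \dots, n
\]
```

The bracketed expression in the second line is the gradient that costFunctionReg returns for j >= 1.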

In the implementation that follows, the optimal theta is still found with fminunc.

Implementation

Main script

clear; close all; clc
%% Load Data
data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);
plotData(X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
% Specified in plot order
legend('y = 1', 'y = 0')
hold off;
%% =========== Part 1: Regularized Logistic Regression ============
X = mapFeature(X(:,1), X(:,2));
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1
lambda = 1;
% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);
fprintf('Cost at initial theta (zeros): %f\n', cost);
%% ============= Part 2: Regularization and Accuracies =============
% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);
% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;
% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);
% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))
% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend('y = 1', 'y = 0', 'Decision boundary')
hold off;
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

Regularized cost function

function [J, grad] = costFunctionReg(theta, X, y, lambda)
m = length(y); % number of training examples
h = sigmoid(X * theta); % predicted probabilities, m x 1
% Unregularized cost plus the penalty on every parameter except the intercept theta(1)
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
grad = (1 / m) * X' * (h - y); % column vector, same size as theta
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end); % the intercept is not regularized
end

Results

The first plot shows the examples used in this part.

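With regularization, a reasonably accurate decision boundary can be fitted.
Note that the exercise asks you to try different values of lambda. Only lambda = 1, which gave the best result here, is shown; the other values tried fit less well.
That covers the basics of binary logistic regression.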