机器学习-Andrew Ng课程笔记

本文是参考Andrew Ng网上课程和别人总结的机器学习笔记，作为笔者学习的参考，笔者主要实现了一些程序问题。

持续更新中…

1 两个参数梯度下降法

现在有一组数据，plot之后如下图

x = np.array([1,3,4,7,9,10,11,18])
y = np.array([5,12,12,20,32,30,33,60])

现在问：当x等于8的时候，y的取值是多少？

首先看这个图，具有线性，于是我们假设先用一条直线拟合（即一次函数）
采用梯度下降法
我们的hypothesis为（两个参数θ0和θ1\theta_0和\theta_1θ0和θ1）

loss function或说cost function或说目标函数（objective function）为
J(θ0,θ1)=12m∑i=1m(hθ(x(i))−y(i))2J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2J(θ0,θ1)=2m1i=1∑m(hθ(x(i))−y(i))2
有两个参数，使用梯度下降法参数更新为

用python写code

import numpy as np
import matplotlib.pyplot as pltx = np.array([1,3,4,7,9,10,11,18])
y = np.array([5,12,12,20,32,30,33,60])def Gradient_descent(x):alpha = 0.001; m = len(x)theta0,theta1 = np.random.randint(1,6,2) #initialize two parameters with int between [1,6)while(True):h = theta0 + theta1*xJ = 1/(2*m)*np.sum(((h-y)**2))theta0 = theta0 - alpha*1/m*np.sum((h-y))theta1 = theta1 - alpha*1/m*np.sum(((h-y)*x))print(J,theta0,theta1)if J <=5:return theta0, theta1breakif __name__ == "__main__":theta0,theta1 = Gradient_descent(x)#plot datax1 = np.linspace(np.min(x),np.max(x)+1)y1 = theta0 + theta1*x1plt.scatter(x,y,marker='x')plt.plot(x1,y1)plt.show()

结果如下

43.375 3.00675 2.083625
36.64715182128906 3.012834703125 2.159869203125
31.055311490596658 3.0183128984472654 2.2293845909140626
26.407644730247434 3.02323818189537 2.2927655520599455
22.544711758531772 3.0276594149910023 2.3505539698782667
19.333996925479305 3.03162114306322 2.4032438603746296
16.665362113113904 3.0351639765197063 2.45128560060768
14.447267906771756 3.038324938438401 2.4950897835393393
12.603632886217634 3.0411377814545903 2.5350307323665024
11.071223279127338 3.0436332766557497 2.571449704413933
9.797483416187205 3.045839476956834 2.604657812010998
8.738732550924968 3.0477819572102907 2.634938685352499
7.858666177326327 3.0494840331059296 2.6625509001354555
7.127110425582591 3.050966960734257 2.687730190750377
6.5189867994023265 3.0522501185213633 2.710691467970093
6.013451735119135 3.0533511730925773 2.731630658405858
5.593181460948169 3.0542862304845384 2.7507263814749403
5.243777619977898 3.0550699739999385 2.768141478233133
4.9532732638769845 3.055715789884853 2.784024405157705

第一列是平方误差，可以看到误差在逐渐减小。第二第三列分别是θ0和θ1\theta_0和\theta_1θ0和θ1
注意：code中的alpha是learning rate，这里我取0.001；此处不能取很大，比如取1则结果会越来越差，平方误差会越来越大。

另外，这里设置了J<5时输出，也可以设置<0.01输出，但时间久一点，或者通常的，可以设置迭代的次数，比如对平方误差损失函数迭代计算100次就输出。

1 对结果分析

画出两个参数和损失函数的关系图如下

我们通过上面梯度下降法找到一条路径使得J越来越小，即找到最小值点。
在上面程序的基础上，如果我们画出theta0,theta1和J的函数的三维图，则如下：

可以看到找了一条路径来找到最小值
代码：

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3Dx = np.array([1,3,4,7,9,10,11,18])
y = np.array([5,12,12,20,32,30,33,60])x_scale = []
y_scale = []
z_scale = []
def Gradient_descent(x):alpha = 0.001; m = len(x)theta0,theta1 = np.random.randint(1,6,2) #initialize two parameters with int between [1,6)while(True):h = theta0 + theta1*xJ = 1/(2*m)*np.sum(((h-y)**2))theta0 = theta0 - alpha*1/m*np.sum((h-y))theta1 = theta1 - alpha*1/m*np.sum(((h-y)*x))x_scale.append(theta0)y_scale.append(theta1)z_scale.append(J)
#        print(J,theta0,theta1)if J <=5:return theta0, theta1breakif __name__ == "__main__":theta0,theta1 = Gradient_descent(x)#plot data
#    x1 = np.linspace(np.min(x),np.max(x)+1)
#    y1 = theta0 + theta1*x1
#    plt.scatter(x,y,marker='x')
#    plt.plot(x1,y1)
#    plt.show()fig = plt.figure()ax = fig.add_subplot(111,projection='3d')ax.scatter(x_scale,y_scale,z_scale)plt.show()

2 三个参数梯度下降法

现在将参数调到三个，即
hθ(x)=θ0+θ1x1+θ2x2h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2hθ(x)=θ0+θ1x1+θ2x2
比如我先假定结果是
y=2+3x1+7x2y=2+3x_1+7x_2y=2+3x1+7x2
先随机给出x1和x2的几个值x_1和x_2的几个值x1和x2的几个值，求出y作为真实结果

import matplotlib.pyplot as plt
import numpy as npx1 = [0,1,2.2,3,3.4,5,8,10,13,15]
x2 = [2,3,4,5,2,3,4,5,18,23]y = 2+3*np.asfarray(x1) + 7*np.asfarray(x2)fig = plt.figure()
ax = fig.gca(projection = '3d')ax.plot(x1,x2,y)
plt.show()

y对应的值为

图如下

因此再总结一下数据是：

x1 = [0,1,2.2,3,3.4,5,8,10,13,15]
x2 = [2,3,4,5,2,3,4,5,18,23]
y = [ 16. ,  26. ,  36.6,  46. ,  26.2,  38. ,  54. ,  67. , 167. ,208. ]

现在假设参数θ0,θ1,θ2\theta_0,\theta_1,\theta_2θ0,θ1,θ2都是未知的，要通过上面的数据使用梯度下降法得到这三个参数。
使用同上面两个参数一样的更新方法（这里x0=1x_0=1x0=1）

我J的结果小于1的时候输出值，即

结果的最后几行为：第一列是J，第二列到第四列分别是θ0,θ1,θ2第一列是J，第二列到第四列分别是\theta_0,\theta_1,\theta_2第一列是J，第二列到第四列分别是θ0,θ1,θ2

即结果近似是
h=4+3x1+7x2h=4+3x_1+7x_2h=4+3x1+7x2
增加数据量并将J小于0.01才输出，则结果如下
θ0=2.077,θ1=2.997,θ2=6.997\theta_0=2.077,\theta_1=2.997,\theta_2=6.997 θ0=2.077,θ1=2.997,θ2=6.997
已经很接近真实函数
对于N个参数的拟合与上面一样

使用正规方程Normal Equation求解

对于线性回归问题，其实要得到最小值就是求导等于0即可得到参数
推导过程要用到矩阵的导数，在此不说了，可以参考课程的文本。参数求导=0可以推导得到，参数向量为

代码为

import numpy as np
x1 = [0,1,2.2,3,3.4,5,8,10,13,15]
x2 = [2,3,4,5,2,3,4,5,18,23]
x0 = [1]*len(x1)
y = [ 16. ,  26. ,  36.6,  46. ,  26.2,  38. ,  54. ,  67. , 167. ,208. ]
y = np.asfarray(y).TX = np.mat((x0,x1,x2)).T
theta_left = np.dot(np.dot((np.dot(X.T,X)).I,X.T),y)

最后给出直接调用sklearn包的方法。

通过调用包可以很快的完成，因为所有的方法已经被优秀的人写入包中，代码和解释如下：

# import packages that may be used
import numpy as np
from sklearn import linear_model#make data linear regression from datasets
x1 = [0,1,2.2,3,3.4,5,8,10,13,15]
x2 = [2,3,4,5,2,3,4,5,18,23]X = np.array([[0,2],[1,3],[2.2,4],[3,5],[3.4,2],[5,3],[6,4],[10,5],[13,18],[15,23]])# y = 2+3*x1+7*x2
Y = 2+np.dot(X,np.array([3,2]))#create linear regression object
reg = linear_model.LinearRegression()#Train the model using the training sets
reg.fit(X,Y)#Make predictions using the testing set
test_pred = reg.predict(np.array([[4,2]]))# The coefficients
print('coefficients and intercept >> ',(reg.coef_, reg.intercept_))

从输出的结果知道
θ0=1.999999,θ1=3,θ2=7\theta_0=1.999999, \theta_1=3,\theta_2=7θ0=1.999999,θ1=3,θ2=7