《机器学习实战》支持向量机(手稿+代码)
注释:已经看过有半年了,而且当时只是看了理论和opencv的函数调用,现在忘的九霄云外了,Ng的视频也没看SVM,现在准备系统的再学一遍吧。
之前记录的博客:http://www.cnblogs.com/wjy-lulu/p/6979436.html
一.SVM理论重点和难点剖析
注释:这里主要讲解公式推导和证明的重难点,不是按部就班的讲解SVM的求解过程,算是对推导过程的补充吧!
一点未接触过SVM的请看大神的博客:
http://blog.csdn.net/c406495762/article/details/78072313
http://www.cnblogs.com/jerrylead/archive/2011/03/13/1982684.html
http://blog.csdn.net/zouxy09/article/details/17291543
http://blog.pluskid.org/?p=685
http://blog.csdn.net/v_july_v/article/details/7624837
1.1点到直线距离的由来
我们先讨论点到平面的距离,由此推广到点到直线和点到超平面的距离公式。
点到平面公式推导
SVM公式推导一
SVM公式推导二
1.2拉格朗日对偶问题
用于求解带条件的最优化问题,其实到最后你就明白了SVM从头到尾最主要做的就是如何高效的求解目标值。而其它的学习算法做的都是对数据的求解优化问题,这点是SVM和其它算法根本的区别。
原始问题
对偶问题一
对偶问题2
原始问题和对偶问题的关系
KKT条件
SVM公式推导三
SVM公式推导四
1.3核函数的推导
目的:1.处理线性不可分的情况。2.求解方便。
过程:二维情况的不可分割,就映射到三维、四维....等高维空间去分割。
通俗解释:https://www.zhihu.com/question/24627666 知乎大神开始装逼的时刻了。
理论部分:如下公式推导.......
核函数引出一
1.4松弛变量的引入
目的:防止部分异常点的干扰。
原理:和其它算法引入惩罚系数一样的,允许有异常点但是要接受惩罚。比如:异常的点肯定都是偏离群体的点,既然偏离群体,那么它的值就为负数且绝对值愈大惩罚程度越大。
具体推导:见下文......
松弛变量的引入
1.5.SMO算法
SMO算法一
SMO算法二
注释:后面还有参数如何最优选择,有点看不懂而且也有点不想看了,干脆从下面的代码去分析SMO的具体过程吧!
二.程序实现
代码实现强烈推荐:http://blog.csdn.net/c406495762/article/details/78072313
给了程序伪代码很详细,程序读起来很方便。
2.1.SMO实现
2.1.1简化版SMO
简化版:针对理论中“SMO”的最后一句话,最优选择的问题!简化版是随机选择,选择上不做优化。
1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6 dataMat = []7 labelMat = []8 fr = open(fileName,'r')9 for line in fr.readlines():
10 lineArr = line.strip().split('\t')
11 dataMat.append([float(lineArr[0]),float(lineArr[1])])
12 labelMat.append(float(lineArr[2]))
13 a = np.mat(dataMat)
14 b = np.mat(labelMat).transpose()
15 DataLabel = np.array(np.hstack((a,b)))
16 return dataMat, labelMat, DataLabel
17 #随机
18 def selectJrand(i,m):
19 j = i
20 while(j==i):
21 j = int(np.random.uniform(0,m))
22 return j
23 #约束之后的aj
24 def clipAlpha(aj,H,L):
25 if aj>H:
26 aj = H
27 elif aj<L:
28 aj = L
29 return aj
30 #C:松弛系数
31 #maxIter:迭代次数
32 def smoSimple(dataMatIn,classLabels,C,toler,maxIter):
33 dataMatraix = np.mat(dataMatIn)
34 labelMatraix = np.mat(classLabels).transpose()
35 b =0
36 m,n = dataMatraix.shape
37 alphas = np.mat(np.zeros((m,1)))#系数都初始化为0
38 iter = 0
39 while(iter<maxIter):#大循环是最大迭代次数
40 alphaPairsChange = 0
41 for i in range(m):#样本训练
42 #预测值,具体看博客手写图片
43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b
44 #误差
45 Ei = fXi - float(labelMatraix[i])
46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果
47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\
48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):
49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]
50 fXj = float(np.multiply(alphas,labelMatraix).transpose()\
51 *(dataMatraix*dataMatraix[j,:].T))+b
52 Ej = fXj - float(labelMatraix[j])
53 alphaIold = alphas[i].copy()
54 alphaJold = alphas[j].copy()
55 if (labelMatraix[i] != labelMatraix[j]):
56 L = max(0,alphas[j]-alphas[i])
57 H = min(C,C+alphas[j]-alphas[i])
58 else:
59 L = max(0,alphas[i]+alphas[j]-C)
60 H = min(C,alphas[i]+alphas[j])
61 if (L==H):
62 print('L==H')
63 continue
64 #计算
65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\
66 - dataMatraix[i,:]*dataMatraix[i,:].T\
67 - dataMatraix[j,:]*dataMatraix[j,:].T
68 if (eta>0):
69 print("eta>0")
70 continue
71 #更新a的新值j
72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta
73 #修剪aj
74 alphas[j] = clipAlpha(alphas[j],H,L)
75 if (abs(alphas[j] - alphaJold) < 0.00001):
76 print("aj not moving")
77 #更新ai
78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])
79 #更新b1,b2
80 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\
81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T
82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\
83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T
84 #通过b1和b2计算b
85 if (0< alphas[i] <C): b = b1
86 elif (0<alphas[j]<C): b = b2
87 else: b = (b1+b2)/2
88 #计算更新次数,用来判断是否训练数据是否全部达标
89 alphaPairsChange += 1
90 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))
91 #连续的达标次数
92 if(alphaPairsChange==0): iter +=1
93 else: iter = 0
94 print("iteration number: %d"%(iter))
95 return b, alphas
2.1.2效果图
main.py文件
1 import svm2 import matplotlib.pyplot as plt3 import numpy as np4 5 if __name__ == '__main__':6 fig = plt.figure()7 axis = fig.add_subplot(111)8 dataMat, labelMat,DataLabel= svm.loadDataSet("testSet.txt")9 #b, alphas = svm.smoSimple(dataMat,labelMat,0.6,0.001,40)
10 #ws = svm.calsWs(alphas,dataMat,labelMat)
11 pData0 = [0,0,0]
12 pData1 = [0,0,0]
13 for hLData in DataLabel:
14 if (hLData[-1]==1):pData0 = np.vstack((pData0,hLData))
15 elif(hLData[-1]==-1):pData1 = np.vstack((pData1,hLData))
16 else:continue
17 vmax = np.max(pData0[:,0:1])
18 vmin = np.min(pData0[:,0:1])
19 axis.scatter(pData0[:,0:1],pData0[:,1:2],marker = 'v')
20 axis.scatter(pData1[:,0:1],pData1[:,1:2],marker = 's')
21 xdata = np.random.uniform(2.0,8.0,[1,20])
22 ydata = xdata*(0.81445/0.27279) - (3.837/0.27279)
23 axis.plot(xdata.tolist()[0],ydata.tolist()[0],'r')
24
25 fig.show()
26
27 #print("alphas = ",alphas[alphas>0])
View Code
svm.py文件
1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6 dataMat = []7 labelMat = []8 fr = open(fileName,'r')9 for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13 a = np.mat(dataMat)14 b = np.mat(labelMat).transpose()15 DataLabel = np.array(np.hstack((a,b)))16 return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19 j = i20 while(j==i):21 j = int(np.random.uniform(0,m))22 return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25 if aj>H:26 aj = H27 elif aj<L:28 aj = L29 return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smoSimple(dataMatIn,classLabels,C,toler,maxIter):33 dataMatraix = np.mat(dataMatIn)34 labelMatraix = np.mat(classLabels).transpose()35 b =036 m,n = dataMatraix.shape37 alphas = np.mat(np.zeros((m,1)))#系数都初始化为038 iter = 039 while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpose()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66 - dataMatraix[i,:]*dataMatraix[i,:].T\67 - dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95 return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99 X = np.mat(dataArr)
100 labelMat = np.mat(classLabels).T
101 alphas = np.mat(np.array(alphas).reshape((1,100))).T
102 m, n = X.shape
103 w = np.zeros([n,1])
104 for i in range(m):
105 w += np.multiply(alphas[i]*labelMat[i],X[i,:].T)
106 return w
View Code
2.1.3非线性分类(核向量)
程序代码:
1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6 dataMat = []7 labelMat = []8 fr = open(fileName,'r')9 for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13 a = np.mat(dataMat)14 b = np.mat(labelMat).T15 DataLabel = np.array(np.hstack((a,b)))16 return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19 j = i20 while(j==i):21 j = int(np.random.uniform(0,m))22 return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25 if aj>H:26 aj = H27 elif aj<L:28 aj = L29 return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smselfimple(dataMatIn,classLabels,C,toler,maxIter):33 dataMatraix = np.mat(dataMatIn)34 labelMatraix = np.mat(classLabels).transpselfe()35 b =036 m,n = dataMatraix.shape37 alphas = np.mat(np.zerself((m,1)))#系数都初始化为038 iter = 039 while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpselfe()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66 - dataMatraix[i,:]*dataMatraix[i,:].T\67 - dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95 return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99 X = np.mat(dataArr)
100 labelMat = np.mat(classLabels).T
101 alphas = np.mat(np.array(alphas).reshape((1,100))).T
102 w = np.sum(np.multiply(np.multiply(alphas,labelMat),X),axis=0)
103 return w
104
105 #完整版SMO
106 class optStruct:
107 def __init__(self,dataMatIn,classLabel,C,toler,kTup):
108 self.X = dataMatIn
109 self.labelMat = classLabel
110 self.C = C
111 self.tol = toler
112 self.m = np.shape(dataMatIn)[0]
113 self.alphas = np.mat(np.zeros((self.m,1)))
114 self.b = 0
115 self.eCache = np.mat(np.zeros((self.m,2)))#存储误差,用于启发式搜索
116 self.K = np.mat(np.zeros((self.m,self.m)))#存储核函数转化后的dataMatIn
117 for i in range(self.m):
118 self.K[:,i] = kernelTrans(self.X,self.X[i,:],kTup)
119 def clacEk(self,k):
120 #计算当前值
121 fXk = float(np.multiply(self.alphas,self.labelMat).T*self.K[:,k]+ self.b)
122 Ek = fXk - float(self.labelMat[k])
123 return Ek
124 def selecJ(self,i,Ei):
125 maxK = -1
126 maxDeltaE = 0
127 Ej = 0
128 self.eCache[i] = [1,Ei]#存储偏差
129 #A代表mat转换成array类型,nonzero返回非零元素的下标
130 validEcacheList = np.nonzero(self.eCache[:,0].A)[0]
131 if (len(validEcacheList)>1):#启发式选择
132 for k in validEcacheList:
133 if (k==i):continue#k不能等于i
134 Ek = self.clacEk(k)#计算绝对偏差
135 deltaE = abs(Ei - Ek)#相对偏差
136 if (deltaE >maxDeltaE):
137 maxK = k
138 maxDeltaE = deltaE
139 Ej = Ek
140 return maxK, Ej
141
142 else:
143 j = selectJrand(i,self.m)#随机选择
144 Ej = self.clacEk(j)#随机绝对偏差
145 return j,Ej
146 def updataEk(self,k):
147 Ek = self.clacEk(k)
148 self.eCache[k] = [k,Ek]
149 def innerL(self,i):
150 Ei = self.clacEk(i)
151 if ((self.labelMat[i]*Ei<-self.tol and self.alphas[i]<self.C)\
152 or(self.labelMat[i]*Ei>self.tol and self.alphas[i]>0)):
153 j,Ej = self.selecJ(i,Ei)#选择J
154 alphaIold = self.alphas[i].copy()
155 alphaJold = self.alphas[j].copy()
156 #计算L和H的值
157 if (self.labelMat[i] != self.labelMat[j]):
158 L = max(0,self.alphas[j]-self.alphas[i])
159 H = min(self.C,self.C+self.alphas[j]-self.alphas[i])
160 else:
161 L = max(0,self.alphas[j]+self.alphas[i]-self.C)
162 H = min(self.C,self.alphas[i] +self.alphas[j])
163 if (L==H): return 0
164 #eta = 2.0* self.X[i,:]*self.X[j,:].T - self.X[i,:]*self.X[i,:].T - \
165 # self.X[j,:]*self.X[j,:].T
166 eta = 2.0*self.K[i,j] - self.K[i,i] - self.K[j,j]#在此应用核函数
167 if (eta>=0): return 0
168 self.alphas[j] -= self.labelMat[j] * (Ei - Ej)/eta
169 self.alphas[j] = clipAlpha(self.alphas[j],H,L)
170 #更新新出现的aj
171 self.updataEk(j)
172 if (abs(self.alphas[j] - alphaJold)<0.00001):
173 print('J not move')
174 self.alphas[i] += self.labelMat[j]*self.labelMat[i]*(alphaJold-self.alphas[j])
175 self.updataEk(i)#更新新出现的ai
176 b1 = self.b - Ei - self.labelMat[i] *(self.alphas[i] - alphaIold)*\
177 self.K[i,i] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[i,j]
178 b2 = self.b - Ej - self.labelMat[i] *(self.alphas[i] - alphaIold)*\
179 self.K[i,j] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[j,j]
180 if (self.alphas[i]>0 and self.alphas[i]<self.C): self.b = b1
181 elif (self.alphas[j]>0 and self.alphas[j]<self.C): self.b = b2
182 else:self.b = (b1+b2)/2.0
183 return 1
184 else:
185 return 0
186
187 def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup): # full Platt SMO
188 self = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler, kTup)
189 iter = 0
190 entireSet = True;
191 alphaPairsChanged = 0
192 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
193 alphaPairsChanged = 0
194 if entireSet: # go over all
195 for i in range(self.m):
196 alphaPairsChanged += self.innerL(i)
197 print('fullSet iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))
198 iter += 1
199 else: # go over non-bound (railed) alphas
200 nonBoundIs = np.nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
201 for i in nonBoundIs:
202 alphaPairsChanged += self.innerL(i)
203 print('nonBound iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))
204 iter += 1
205 if entireSet:
206 entireSet = False # toggle entire set loop
207 elif (alphaPairsChanged == 0):
208 entireSet = True
209 print("iteration number: %d" % iter)
210 return self.b, self.alphas
211 #生成核函数,注意这里核函数的计算公式,见博文对其进行说明!
212 def kernelTrans(X,A,kTup):
213 m,n = X.shape
214 K = np.mat(np.zeros([m,1]))
215 if (kTup[0]=='lin'):K = X*A.T
216 elif(kTup[0]=='rbf'):
217 for j in range(m):
218 deltaRow = X[j,:] - A
219 K[j] = deltaRow * deltaRow.T
220 K = np.exp(K/(-1*kTup[1]**2))
221 else:raise NameError('Houston We have a problem')
222 return K
223 def testRbf(k1 = 1.3):
224 dataArr, labelArr, dataLbel = loadDataSet('testSetRBF.txt')
225 b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,('rbf',k1))
226 dataMat = np.mat(dataArr)
227 labelMat = np.mat(labelArr).T
228 svInd = np.nonzero(alphas.A>0)[0]#支持向量a
229 sVs = dataMat[svInd]#支持向量X
230 labelSV = labelMat[svInd]
231 print("There are %d Support Vector"%(sVs.shape[0]))
232 m, n = dataMat.shape
233 errorCount = 0
234 for i in range(m):
235 kernelEval = kernelTrans(sVs,dataMat[i,:],('rbf',k1))
236 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd])+b
237 if(np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1
238 print("The training error is: %f"%(float(errorCount/m)))
239 dataArr, labelArr, datalabel = loadDataSet('testSetRBF2.txt')
240 errorCount = 0
241 dataMat = np.mat(dataArr)
242 labelMat = np.mat(labelArr).T
243 m, n = dataMat.shape
244 for i in range(m):
245 kernelEval = kernelTrans(sVs, dataMat[i,:], ('rbf', k1))
246 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
247 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1
248 print("The test error is: %f" %(float(errorCount / m)))
249
250 if __name__ == '__main__':
251
252 testRbf(1.3)
2.1.4手写数字识别测试
1 import numpy as np2 import matplotlib.pyplot as plt3 4 #预处理数据5 def loadDataSet(fileName):6 dataMat = []7 labelMat = []8 fr = open(fileName,'r')9 for line in fr.readlines():10 lineArr = line.strip().split('\t')11 dataMat.append([float(lineArr[0]),float(lineArr[1])])12 labelMat.append(float(lineArr[2]))13 a = np.mat(dataMat)14 b = np.mat(labelMat).T15 DataLabel = np.array(np.hstack((a,b)))16 return dataMat, labelMat, DataLabel17 #随机18 def selectJrand(i,m):19 j = i20 while(j==i):21 j = int(np.random.uniform(0,m))22 return j23 #约束之后的aj24 def clipAlpha(aj,H,L):25 if aj>H:26 aj = H27 elif aj<L:28 aj = L29 return aj30 #C:松弛系数31 #maxIter:迭代次数32 def smselfimple(dataMatIn,classLabels,C,toler,maxIter):33 dataMatraix = np.mat(dataMatIn)34 labelMatraix = np.mat(classLabels).transpselfe()35 b =036 m,n = dataMatraix.shape37 alphas = np.mat(np.zerself((m,1)))#系数都初始化为038 iter = 039 while(iter<maxIter):#大循环是最大迭代次数40 alphaPairsChange = 041 for i in range(m):#样本训练42 #预测值,具体看博客手写图片43 fXi = float(np.multiply(alphas,labelMatraix).T*(dataMatraix*dataMatraix[i,:].T))+b44 #误差45 Ei = fXi - float(labelMatraix[i])46 #满足条件才能更新a,b的值,感觉这里的labelMat[i]不影响结果47 if(((labelMatraix[i]*Ei<-toler) and (alphas[i]<C))\48 or ((labelMatraix[i]*Ei)>toler) and (alphas[i]>0)):49 j = selectJrand(i,m)#随机选择一个不同于i的[0,m]50 fXj = float(np.multiply(alphas,labelMatraix).transpselfe()\51 *(dataMatraix*dataMatraix[j,:].T))+b52 Ej = fXj - float(labelMatraix[j])53 alphaIold = alphas[i].copy()54 alphaJold = alphas[j].copy()55 if (labelMatraix[i] != labelMatraix[j]):56 L = max(0,alphas[j]-alphas[i])57 H = min(C,C+alphas[j]-alphas[i])58 else:59 L = max(0,alphas[i]+alphas[j]-C)60 H = min(C,alphas[i]+alphas[j])61 if (L==H):62 print('L==H')63 continue64 #计算65 eta = 2.0*dataMatraix[i,:]*dataMatraix[j,:].T\66 - dataMatraix[i,:]*dataMatraix[i,:].T\67 - dataMatraix[j,:]*dataMatraix[j,:].T68 if (eta>0):69 print("eta>0")70 continue71 #更新a的新值j72 alphas[j] -= labelMatraix[j]*(Ei - Ej)/eta73 #修剪aj74 alphas[j] = clipAlpha(alphas[j],H,L)75 if (abs(alphas[j] - alphaJold) < 0.00001):76 print("aj not moving")77 #更新ai78 alphas[i] += labelMatraix[j]*labelMatraix[i]*(alphaJold - alphas[j])79 #更新b1,b280 b1 = b - Ei - labelMatraix[i]*(alphas[i]-alphaIold)*dataMatraix[i,:]\81 *dataMatraix[i,:].T - labelMatraix[j]*(alphas[j]-alphaJold)*dataMatraix[i,:]*dataMatraix[j,:].T82 b2 = b - Ej - labelMatraix[i]*(alphas[i] - alphaIold)*dataMatraix[i,:]\83 * dataMatraix[j, :].T - labelMatraix[j]*(alphas[j] - alphaJold) * dataMatraix[j,:] * dataMatraix[j,:].T84 #通过b1和b2计算b85 if (0< alphas[i] <C): b = b186 elif (0<alphas[j]<C): b = b287 else: b = (b1+b2)/288 #计算更新次数,用来判断是否训练数据是否全部达标89 alphaPairsChange += 190 print("iter: %d i:%d pairs:%d "%(iter,i,alphaPairsChange))91 #连续的达标次数92 if(alphaPairsChange==0): iter +=193 else: iter = 094 print("iteration number: %d"%(iter))95 return b, alphas96 97 #分类函数98 def calsWs(alphas,dataArr,classLabels):99 X = np.mat(dataArr)
100 labelMat = np.mat(classLabels).T
101 alphas = np.mat(np.array(alphas).reshape((1,100))).T
102 w = np.sum(np.multiply(np.multiply(alphas,labelMat),X),axis=0)
103 return w
104 #文本转化为int
105 def img2vector(filename):
106 returnVect = np.zeros((1,1024))
107 fr = open(filename)
108 for i in range(32):
109 lineStr = fr.readline()
110 for j in range(32):
111 returnVect[0,32*i+j] = int(lineStr[j])
112 return returnVect
113 #完整版SMO
114 class optStruct:
115 def __init__(self,dataMatIn,classLabel,C,toler,kTup):
116 self.X = dataMatIn
117 self.labelMat = classLabel
118 self.C = C
119 self.tol = toler
120 self.m = np.shape(dataMatIn)[0]
121 self.alphas = np.mat(np.zeros((self.m,1)))
122 self.b = 0
123 self.eCache = np.mat(np.zeros((self.m,2)))#存储误差,用于启发式搜索
124 self.K = np.mat(np.zeros((self.m,self.m)))#存储核函数转化后的dataMatIn
125 for i in range(self.m):
126 self.K[:,i] = kernelTrans(self.X,self.X[i,:],kTup)
127 def clacEk(self,k):
128 #计算当前值
129 fXk = float(np.multiply(self.alphas,self.labelMat).T*self.K[:,k]+ self.b)
130 Ek = fXk - float(self.labelMat[k])
131 return Ek
132 def selecJ(self,i,Ei):
133 maxK = -1
134 maxDeltaE = 0
135 Ej = 0
136 self.eCache[i] = [1,Ei]#存储偏差
137 #A代表mat转换成array类型,nonzero返回非零元素的下标
138 validEcacheList = np.nonzero(self.eCache[:,0].A)[0]
139 if (len(validEcacheList)>1):#启发式选择
140 for k in validEcacheList:
141 if (k==i):continue#k不能等于i
142 Ek = self.clacEk(k)#计算绝对偏差
143 deltaE = abs(Ei - Ek)#相对偏差
144 if (deltaE >maxDeltaE):
145 maxK = k
146 maxDeltaE = deltaE
147 Ej = Ek
148 return maxK, Ej
149
150 else:
151 j = selectJrand(i,self.m)#随机选择
152 Ej = self.clacEk(j)#随机绝对偏差
153 return j,Ej
154 def updataEk(self,k):
155 Ek = self.clacEk(k)
156 self.eCache[k] = [k,Ek]
157 def innerL(self,i):
158 Ei = self.clacEk(i)
159 if ((self.labelMat[i]*Ei<-self.tol and self.alphas[i]<self.C)\
160 or(self.labelMat[i]*Ei>self.tol and self.alphas[i]>0)):
161 j,Ej = self.selecJ(i,Ei)#选择J
162 alphaIold = self.alphas[i].copy()
163 alphaJold = self.alphas[j].copy()
164 #计算L和H的值
165 if (self.labelMat[i] != self.labelMat[j]):
166 L = max(0,self.alphas[j]-self.alphas[i])
167 H = min(self.C,self.C+self.alphas[j]-self.alphas[i])
168 else:
169 L = max(0,self.alphas[j]+self.alphas[i]-self.C)
170 H = min(self.C,self.alphas[i] +self.alphas[j])
171 if (L==H): return 0
172 #eta = 2.0* self.X[i,:]*self.X[j,:].T - self.X[i,:]*self.X[i,:].T - \
173 # self.X[j,:]*self.X[j,:].T
174 eta = 2.0*self.K[i,j] - self.K[i,i] - self.K[j,j]#在此应用核函数
175 if (eta>=0): return 0
176 self.alphas[j] -= self.labelMat[j] * (Ei - Ej)/eta
177 self.alphas[j] = clipAlpha(self.alphas[j],H,L)
178 #更新新出现的aj
179 self.updataEk(j)
180 if (abs(self.alphas[j] - alphaJold)<0.00001):
181 print('J not move')
182 self.alphas[i] += self.labelMat[j]*self.labelMat[i]*(alphaJold-self.alphas[j])
183 self.updataEk(i)#更新新出现的ai
184 b1 = self.b - Ei - self.labelMat[i] *(self.alphas[i] - alphaIold)*\
185 self.K[i,i] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[i,j]
186 b2 = self.b - Ej - self.labelMat[i] *(self.alphas[i] - alphaIold)*\
187 self.K[i,j] - self.labelMat[j]*(self.alphas[j]-alphaJold)*self.K[j,j]
188 if (self.alphas[i]>0 and self.alphas[i]<self.C): self.b = b1
189 elif (self.alphas[j]>0 and self.alphas[j]<self.C): self.b = b2
190 else:self.b = (b1+b2)/2.0
191 return 1
192 else:
193 return 0
194
195 def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup): # full Platt SMO
196 self = optStruct(np.mat(dataMatIn), np.mat(classLabels).transpose(), C, toler, kTup)
197 iter = 0
198 entireSet = True;
199 alphaPairsChanged = 0
200 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
201 alphaPairsChanged = 0
202 if entireSet: # go over all
203 for i in range(self.m):
204 alphaPairsChanged += self.innerL(i)
205 print('fullSet iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))
206 iter += 1
207 else: # go over non-bound (railed) alphas
208 nonBoundIs = np.nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
209 for i in nonBoundIs:
210 alphaPairsChanged += self.innerL(i)
211 print('nonBound iter:%d, i=%d, pair change:%d' % (iter, i, alphaPairsChanged))
212 iter += 1
213 if entireSet:
214 entireSet = False # toggle entire set loop
215 elif (alphaPairsChanged == 0):
216 entireSet = True
217 print("iteration number: %d" % iter)
218 return self.b, self.alphas
219 #生成核函数,注意这里核函数的计算公式,见博文对其进行说明!
220 def kernelTrans(X,A,kTup):
221 m,n = X.shape
222 K = np.mat(np.zeros([m,1]))
223 if (kTup[0]=='lin'):K = X*A.T
224 elif(kTup[0]=='rbf'):
225 for j in range(m):
226 deltaRow = X[j,:] - A
227 K[j] = deltaRow * deltaRow.T
228 K = np.exp(K/(-1*kTup[1]**2))
229 else:raise NameError('Houston We have a problem')
230 return K
231
232 def testRbf(k1 = 1.3):
233 dataArr, labelArr, dataLbel = loadDataSet('testSetRBF.txt')
234 b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,('rbf',k1))
235 dataMat = np.mat(dataArr)
236 labelMat = np.mat(labelArr).T
237 svInd = np.nonzero(alphas.A>0)[0]#支持向量a
238 sVs = dataMat[svInd]#支持向量X
239 labelSV = labelMat[svInd]
240 print("There are %d Support Vector"%(sVs.shape[0]))
241 m, n = dataMat.shape
242 errorCount = 0
243 for i in range(m):
244 kernelEval = kernelTrans(sVs,dataMat[i,:],('rbf',k1))
245 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd])+b
246 if(np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1
247 print("The training error is: %f"%(float(errorCount/m)))
248 dataArr, labelArr, datalabel = loadDataSet('testSetRBF2.txt')
249 errorCount = 0
250 dataMat = np.mat(dataArr)
251 labelMat = np.mat(labelArr).T
252 m, n = dataMat.shape
253 for i in range(m):
254 kernelEval = kernelTrans(sVs, dataMat[i,:], ('rbf', k1))
255 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
256 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1
257 print("The test error is: %f" %(float(errorCount / m)))
258
259 def loadImage(dirName):
260 from os import listdir
261 hwLabel = []
262 trainingFileList = listdir(dirName)
263 m = len(trainingFileList)
264 trainingMat = np.zeros((m,1024))
265 for i in range(m):
266 fileNameStr = trainingFileList[i]
267 fileStr = fileNameStr.split('.')[0]
268 classNumStr = int(fileStr.split('_')[0])
269 if (classNumStr==9):
270 hwLabel.append(-1)
271 else:hwLabel.append(1)
272 trainingMat[i,:] = img2vector('%s/%s'%(dirName,fileNameStr))
273 return trainingMat, np.array(hwLabel)
274
275 def testDigits(kTup = ('rbf',10)):
276 dataArr, labelArr = loadImage('trainingDigits')
277 b, alphas = smoP(dataArr,labelArr,200,0.0001,10000,kTup)
278 dataMat = np.mat(dataArr);labelMat = np.mat(labelArr).T
279 svInd = np.nonzero(alphas.A>0)[0]
280 sVs = dataMat[svInd]
281 labelSV = labelMat[svInd]
282 print("there are %d Support Vectors"%(int(sVs.shape[0])))
283 m,n = dataMat.shape
284 errorCount = 0
285 for i in range(m):
286 kernelEval = kernelTrans(sVs,dataMat[i,:],kTup)
287 predict = kernelEval.T * np.multiply(labelSV,alphas[svInd]) + b
288 if (np.sign(predict)!=np.sign(labelArr[i])):errorCount+=1
289 print("The training error is: %f"%(float((errorCount)/m)))
290 dataArr, labelArr = loadImage('testDigits')
291 errorCount = 0
292 dataMat = np.mat(dataArr);labelMat = np.mat(labelArr).T
293 m, n = dataMat.shape
294 for i in range(m):
295 kernelEval = kernelTrans(sVs, dataMat[i, :], kTup)
296 predict = kernelEval.T * np.multiply(labelSV, alphas[svInd]) + b
297 if (np.sign(predict) != np.sign(labelArr[i])): errorCount += 1
298 print("The test error is: %f" % (float((errorCount) / m)))
三.参考文献
参考:
注释:以下是参考链接里面的内容,按以下中文描述排列!都是大神的结晶,没有主观次序。
点到平面距离、拉格朗日二次优化、二次曲线公式推导、SMO公式推导
https://www.cnblogs.com/graphics/archive/2010/07/10/1774809.html
http://www.cnblogs.com/90zeng/p/Lagrange_duality.html
https://wenku.baidu.com/view/e28aa3040b1c59eef9c7b40a.html
http://blog.csdn.net/xuanyuansen/article/details/41153023
《机器学习实战》支持向量机(手稿+代码)相关推荐
- 机器学习实战 支持向量机SVM 代码解析
机器学习实战 支持向量机SVM 代码解析 <机器学习实战>用代码实现了算法,理解源代码更有助于我们掌握算法,但是比较适合有一定基础的小伙伴.svm这章代码看起来风轻云淡,实则对于新手来说有 ...
- apriori算法代码_资源 | 《机器学习实战》及代码(基于Python3)
〇.<机器学习实战> 今天推荐给大家的是<机器学习实战>这本书. 机器学习作为人工智能研究领域中一个极其重要的研究方向(一文章看懂人工智能.机器学习和深度学习),在当下极其热门 ...
- 《机器学习实战》配套代码下载
<机器学习实战>配套代码资源下载网址:http://www.ituring.com.cn/book/1021(图灵社区),网址里有随书下载,可以下载配套资源.
- 机器学习实战——决策树(代码)
最近在学习Peter Harrington的<机器学习实战>,代码与书中的略有不同,但可以顺利运行. from math import log import operator# 计算熵 d ...
- 基于pyhton3.6-机器学习实战-支持向量机SVM代码解释
本人是一名数学系研究生,于2017年底第一次接触python和机器学习,作为一名新手,欢迎与大家交流. 我主要给大家讲解代码,理论部分给大家推荐3本书: <机器学习实战中文版> <机 ...
- 【机器学习】支持向量机(SVM)代码练习
本课程是中国大学慕课<机器学习>的"支持向量机"章节的课后代码. 课程地址: https://www.icourse163.org/course/WZU-1464096 ...
- python支持向量机回归_机器学习实战-支持向量机原理、Python实现和可视化(分类)...
支持向量机(SVM)广泛应用于模式分类和非线性回归领域. SVM算法的原始形式由Vladimir N.Vapnik和Alexey Ya提出.自从那以后,SVM已经被巨大地改变以成功地用于许多现实世界问 ...
- 机器学习实战-第二章代码+注释-KNN
#-*- coding:utf-8 -*- #https://blog.csdn.net/fenfenmiao/article/details/52165472 from numpy import * ...
- Python3《机器学习实战》学习笔记(八):支持向量机原理篇之手撕线性SVM
原 Python3<机器学习实战>学习笔记(八):支持向量机原理篇之手撕线性SVM 置顶 2017年09月23日 17:50:18 阅读数:12644 转载请注明作者和出处: https: ...
- 机器学习实战——绘制决策树(代码)
最近在学习Peter Harrington的<机器学习实战>,代码与书中的略有不同,但可以顺利运行. import matplotlib.pyplot as plt# 定义文本框和箭头格式 ...
最新文章
- 华南理工大学计算机应用基础随堂作业,华南理工大学计算机应用基础随堂练习题目及答案...
- Java8 LocalDateTime获取时间戳(毫秒/秒)、LocalDateTime与String互转、Date与LocalDateTime互转
- boost::hana::remove_range用法的测试程序
- python项目开发视频
- mysql必知必会的数据_MySQL必知必会---数据过滤
- c++ 显示三维散点图_Matplotlib中的三维绘图
- 数字阵列麦克风处理技术概述
- 廖雪峰git教程学习记录
- 全国python一级考试_全国青少年软件编程(Python)等级考试试卷(一级)测试卷...
- 阿里巴巴 html圆代码,阿里巴巴国际站HTML代码全透视
- UEFI实战 gST、gBS和gImageHandle
- 已知华氏温度f c语言,编程题:已知两种温度的换算公式C=(5/9)(F-32),试编写一个程序输入华氏度F,输出摄氏度。...
- 「ZBrush」学习ZB出来可以从事什么工作
- linux图像显示(五)使用freetype处理矢量字体
- ubuntu搜狗输入法
- 小傻蛋的妹妹跟随小甲鱼学习Python的第四节004
- Opencv 简单视频播放器
- 泰凌微ble mesh蓝牙模组天猫精灵学习之旅 ② 如何实现 微信小程序蓝牙控制 Ble Mesh模组 安信可TB02,全部开源!
- win7系统重装之u盘装系统教程,u盘安装win7系统
- 绕过Windows正版验证新方法
热门文章
- VS提示无可用源,此模块的调试信息…
- Python字符串的转义字符
- 多空博弈主力资金控盘强度指标公式 主/副图
- c语言撇号的用法,不同的语言标点使用也不同,英语中的撇号如何使用?美联英语带你了解...
- 静态博客网页中的网易云音乐播放器
- check the manual that corresponds to your MySQL server version for the right syntax to use near
- ROS系统之安装系列(一):安装步骤
- TCP/IP协议号和端口
- TRUE PARTNER迎来戴维斯双击,资产规模业绩双增长
- python学习资源整理