文章目录

  • The Preface
  • Compare
    • Logistic Regression
    • Decision tree
    • SVM
    • Analysis
      • Logistic Regression Analysis
        • Advantages
        • Disadvantages
      • Decision Tree Analysis
        • Advantages
        • Disadvantages
      • SVM Analysis
        • Advantages
        • Disadvantages
    • Advice
  • Mathematical principle of SVM
    • Target Function
    • Lagrange duality
    • Simplify
    • Example
    • Soft margin
    • Kernel function
  • Coding
    • Data type
    • Core
    • Forecast
    • All code
    • Get data
  • Real case

The Preface

To improve my English and to prepare for the postgraduate entrance examination, I will write my blog in English. Although I have already written a lot of content in Chinese, I still have to translate it into English by myself, which is why my blog updates will be slow. Of course, if I have time, I will publish the corresponding Chinese version.

Okay, let's get to today's blog. Welcome to my channel!

Before you read this blog, I hope you already know something about machine learning; it is not suited to readers who have not studied it yet.

Targets:

  1. The mathematical principle of the SVM algorithm
  2. How to code it
  3. A real case

No coding, no future. Let's go, guys!

Compare

When we talk about the SVM algorithm, we must also talk about other machine learning algorithms such as decision trees and logistic regression, because all of them can help us classify objects. But why should we use SVM, and what makes it different?

Logistic Regression

This algorithm is very easy to learn, and it gives you a probability for each classification.


If we have objects of the form (x1, x2, label) in our decision space, you are likely to see this:

You will find that the result is not good. No matter what you do, the decision boundary obtained by logistic regression is always linear, and you cannot get the ring-shaped boundary you need here. Therefore, logistic regression is suitable for classification problems that are nearly linearly separable.

Decision tree

If we use this algorithm, you will see this:

and then you will get this:

If you continue to increase the size of the tree, you will notice that the decision boundary keeps being surrounded by parallel lines. Therefore, if the boundary is nonlinear and can be approximated by repeatedly dividing the feature space into rectangles, then the decision tree is a better choice than logistic regression.

SVM

Although our data is two-dimensional, we can use kernel functions to map it to a higher-dimensional space for classification.

so maybe you will see this:

and then you can get this:

Note: the decision boundary is not a perfect circle, but it is very close (more likely a polygon). To keep things simple, we draw it as a ring instead.

So now we can simply analyze which algorithm we can use in different scenarios.

Analysis

Logistic Regression Analysis

A convenient and useful thing about logistic regression is that the output is not a discrete value or an exact category. Instead, you get a probability associated with each observation sample. You can apply different criteria and common performance metrics to this probability score, pick a threshold, and then categorize the output in the way that best suits your business problem. In the financial industry, this technique is widely used in scorecards: with the same model, you can adjust your threshold to get different classification results. Few other algorithms provide such a score as a direct result; instead, their output is a hard, direct classification. At the same time, logistic regression is quite efficient in terms of time and memory requirements.

In addition, logistic regression is robust to small and medium amounts of noise and is not particularly affected by slight multicollinearity. Severe multicollinearity can be handled by combining logistic regression with L2 regularization, but if you want a reduced model, L2 regularization is not the best choice, because the model it produces keeps all the features.

When you have a large number of features and much of your data is missing, logistic regression will struggle. Too many categorical variables are also a problem for logistic regression. Logistic regression derives its probabilities from the whole data set, so when you try to draw a separating curve, it may assume that the "obvious" data points at the two ends of the score deserve no attention, while ideally it should rely on the boundary points. At the same time, if some features are nonlinear, you have to rely on transformations, and this becomes another problem as the dimension of your feature space increases.

Advantages

  1. Convenient probability score of observation sample.

  2. Efficient implementation of existing tools.

  3. For logistic regression, multicollinearity is not a problem; it can be handled with L2 regularization.

  4. Logistic regression is widely used in industrial problems (this is very important).

Disadvantages

  1. When the feature space is very large, the performance of logistic regression is not very good.

  2. Cannot handle a large number of multi-class features or variables well.

  3. For nonlinear features, a transformation is needed.

  4. Depends on all of the data (over-fitting can be serious).

Decision Tree Analysis

The inherent characteristic of decision trees is that they do not care about monotone transformations or nonlinear features (which is different from nonlinear correlation among predictors), because they simply insert rectangles into the feature space, and these shapes can adapt to any monotone transformation. Since decision trees are designed to deal with discrete data or categorical predictors, any number of categorical variables is not a real problem for them. A model trained with a decision tree is quite intuitive and easy to explain in business terms. Decision trees do not give a probability score as a direct result, but you can assign class probabilities to the leaf nodes. This brings us to the biggest problem associated with decision trees: they are highly biased models. You can build a decision tree model on the training set whose results may be better than other algorithms, but it will eventually prove to be a poor predictor on your test set. You must prune the tree and combine it with cross-validation to get a decision tree model that does not over-fit.

Random forests overcome the over-fitting problem to a great extent. There is nothing particularly special about them, but they are an excellent extension of decision trees. Random forests, however, give up the easy interpretability of business rules, because now you have thousands of such trees, and the majority-voting rule they use makes the model more complex. At the same time, decision tree variables interact with each other, which can make the results very inefficient if most of your variables do not interact or interact only weakly. On the other hand, this design also makes them less susceptible to multicollinearity.

Advantages

  1. Intuitive decision rules.

  2. Can deal with non-linear features.

  3. The interaction between variables is considered.

Disadvantages

It is easy to over-fit; of course, this can be mitigated by random forests.

SVM Analysis

The characteristic of support vector machines is that they rely on boundary samples to establish the desired separating curve. As we have seen above, they can deal with nonlinear decision boundaries. Their dependence on boundary samples also gives them the ability to handle missing data among the "obvious" sample instances. Support vector machines can deal with large feature spaces, which is why they have become one of the most popular algorithms in text analysis: text data almost always produces a large number of features, so logistic regression is not a very good choice in that case.

The results of an SVM are not as intuitive as those of a decision tree. At the same time, when a nonlinear kernel is used, training a support vector machine on large data sets is very time-consuming.

Advantages

  1. Ability to handle large feature spaces.

  2. Ability to deal with the interaction between nonlinear features.

  3. No need to rely on the entire data set.

Disadvantages

  1. When there are many observation samples, the efficiency is not very high.

  2. Sometimes it is difficult to find a suitable kernel function.

Advice

  1. Logistic regression should be the first thing to try. If its performance is not good, its results can still be used as a benchmark for reference.

  2. Then check whether a decision tree (random forest) can greatly improve the performance of the model. Even if you do not use it as the final model, you can use random forests to remove noise variables.

  3. If the number of features and observation samples is particularly large, then, when resources and time are sufficient, SVM is a good option.

Don't find this troublesome: in practice, you just need to call the different APIs in sklearn.
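As a rough illustration of how little code this takes, here is a minimal sketch with sklearn (the ring-shaped toy data set and all parameters are my own assumptions, not taken from the original experiments):

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# a ring-shaped data set similar to the one discussed above
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))

On data like this, the RBF-kernel SVM is usually the clear winner, which matches the comparison above.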

Mathematical principle of SVM

Target Function

Okay, it is time to talk about how SVM works and what the mathematics behind it is.

Our task is simple: we just need to classify objects into two categories.

just like this:

(If we don’t use kernel functions to calculate)

We need to find a straight line or hyperplane to make the nearest point on both sides farthest from the plane, so as to achieve a good classification effect.

just like this:

We can assume that this hyperplane looks like this.

like this:
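In the usual notation, the hyperplane is

$$ w^{\top}x + b = 0, $$

where $w$ is the normal vector of the plane and $b$ is the bias.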

So, by deriving the distance from the point to the straight line, we can actually derive the distance from the point to the hyperplane.

(For example, the distance from a point (x0, y0) to the straight line Ax+By+C=0:)

The distance from the point to the hyperplane:
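For completeness, these are the standard formulas: the distance from a point $(x_0, y_0)$ to the line $Ax + By + C = 0$ is

$$ d = \frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}, $$

and the distance from a point $x$ to the hyperplane $w^{\top}x + b = 0$ is

$$ d(x) = \frac{|w^{\top}x + b|}{\lVert w \rVert}. $$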


However, since SVM is a binary classification algorithm, we can stipulate the following:

This means your input data will look like this:
(x1_1, x1_2, 1), (x2_1, x2_2, 0), …
where the numbers 1 and 0 are the labels.
So the distance will be this:
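With the usual SVM convention that the two labels are mapped to $y_i \in \{+1, -1\}$, the distance of a correctly classified point can be written without the absolute value:

$$ \gamma_i = \frac{y_i\,(w^{\top}x_i + b)}{\lVert w \rVert}. $$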

Find a hyperplane (w and b) so that the point closest to the plane is as far away from it as possible:
argmax over (w, b) of the min over points of (the distance from the point to the plane).

Tips:

For easier calculation, and since rescaling w and b does not affect the distance from a point to the plane, we assume:

So, we can reduce the distance formula to:

For easier calculation, we change the maximization into a minimization.

And for convenience when differentiating, we square the norm.

So we need to solve:
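Putting the steps together in the standard hard-margin form: after scaling so that $\min_i y_i(w^{\top}x_i + b) = 1$, the problem

$$ \arg\max_{w,b}\ \min_i \frac{y_i(w^{\top}x_i + b)}{\lVert w \rVert} $$

becomes

$$ \min_{w,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i(w^{\top}x_i + b) \ge 1,\ i = 1, \dots, m. $$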

Lagrange duality

To solve this constrained problem, we need to use the Lagrange multiplier method:


and then we get this:

but if we use it directly, the problem looks like this:

So we transform it into the dual problem.


so we can take the partial derivatives.
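A sketch of these steps in the standard notation: the Lagrangian is

$$ L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{m} \alpha_i \bigl( y_i(w^{\top}x_i + b) - 1 \bigr), \qquad \alpha_i \ge 0. $$

The primal problem $\min_{w,b}\max_{\alpha \ge 0} L$ is exchanged for its dual $\max_{\alpha \ge 0}\min_{w,b} L$, and setting the partial derivatives to zero gives

$$ \frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{m} \alpha_i y_i = 0. $$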

Simplify

In the end we get this:

and then we make it

extend this:

(To make the equation clearer, we changed it a little bit.)

To:

with these conditions:

Similarly, in order to facilitate the operation, we convert the maximum value to the minimum value.

We just need to add a minus sign.

Finally, we get all the values we need.

Substitute these values into the partial-derivative formulas obtained earlier to get w and b.
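For reference, the resulting dual problem in its usual form (with the maximization already flipped to a minimization by the minus sign mentioned above) is

$$ \min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i^{\top}x_j) - \sum_{i=1}^{m}\alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{m}\alpha_i y_i = 0,\ \alpha_i \ge 0, $$

and once the $\alpha_i$ are known, $w = \sum_i \alpha_i y_i x_i$ and $b = y_j - \sum_i \alpha_i y_i (x_i^{\top}x_j)$ for any support vector $x_j$ (a point with $\alpha_j > 0$).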

Example

This example comes from this video:
https://www.bilibili.com/video/BV15A4y1X7K1?p=31&spm_id_from=pageDriver&vd_source=641e71dfd1a118fb834c4a5d156688d5

and the mathematical principle comes from here:
https://zhuanlan.zhihu.com/p/270298485




Soft margin

If we strictly follow the maximum margin we have just calculated, this kind of situation can easily occur.

So we change

to

and then the target function becomes this:


Finally, the result is:
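In the standard soft-margin form, a slack variable $\xi_i \ge 0$ is introduced for every sample and the target function becomes

$$ \min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \quad y_i(w^{\top}x_i + b) \ge 1 - \xi_i,\ \xi_i \ge 0, $$

and in the dual the only change is that every multiplier is now bounded: $0 \le \alpha_i \le C$. This $C$ is the soft-margin parameter passed into the code below.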

Kernel function

From the derivation and examples above, you will find that everything so far is linear, and what we end up with is not a nonlinear boundary. So we may need more dimensions, converting the distance from a point to a line into the distance from a point to a plane, and this is handled by a kernel function.

such as this:

It is very difficult for us to separate these into two categories with a line segment.

But if the points above can look like this:

If this is the case, we can directly use a plane to split them into two classes from the middle.
For this we can use the following kernel function:

def kernelTrans(X, A, kTup):
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':        # linear kernel: nothing to do here
        K = X * A.T
    elif kTup[0] == 'gaosi':    # Gaussian (RBF) kernel
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError("Son of a bitch doesn't have this kernel function.")
    return K

Here we can still use the example from the video above (about kernel functions; of course we have more direct examples in the actual code and will add a few things, but the general process is similar).


After that, the calculation is the same as before, but one more mapping is done.
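The Gaussian (RBF) kernel used above has the general form

$$ K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right) $$

(in the code above the factor of 2 is absorbed into the parameter kTup[1]), and in the dual problem every inner product $x_i^{\top}x_j$ is simply replaced by $K(x_i, x_j)$.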

I don't want to say too much here, because it is straightforward and we will show it in the code. In fact, I had been ready to do this a long time ago, but did not know how to explain it. I also have to thank Dr. Tang Yudi, the author of the above-mentioned video; until then I had only an abstract idea of the mathematics behind SVM and just knew how to use the final formula.

Coding

Data type

Okay, the first important thing is the data type, because a program equals data structures plus algorithms. Today our data type will be like yesterday's, when we coded the decision tree algorithm, but I will save the data in a txt file.

class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("Fuck you no such file")
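Usage is then just (the path here matches the one used in the full listing at the end of this post; adjust it to wherever you store the data):

train_data = DataSet(r'\Data\svm_train.txt')
features, labels = train_data.LoadDataSet()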

Core

When we start coding, we must know what our core task, or target, is.

Now we have to know what we should do. We have a target function, and when we input our data we will get a function like this:


We need to find the alphas for our input data, and then we know our function will look like this:
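In terms of the dual solution, this decision function has the standard form (written with a kernel so that it also covers the nonlinear case):

$$ f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b \right). $$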


But if we want to get this, we need to find a way for the machine to do it for us:

That is, how to let the machine help us solve the equation?

So here, our core is actually how to use a method to make the machine complete the operation of automatically solving equations.

The SMO algorithm is described in the book Statistical Learning Methods. Of course, we can also use other similar heuristic algorithms.

def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
    self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            for i in range(self.m):  # iterate over all samples
                alphaPairsChanged += self.KKTGoing(i)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        else:
            nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += self.KKTGoing(i)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iter)
    return self.b, self.alphas

Forecast

When we get the alphas and b, we have our hyperplane. We then substitute new samples to calculate the distance between these samples and the hyperplane, and complete the classification with the sign function.

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  # compute the prediction for each new sample
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res

All code

Now we show all of the code:

from numpy import *
import os


class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("Fuck you no such file")


class SVMModel(object):
    def __init__(self, Ktype):
        self.Ktype = Ktype

    def __SMO_init(self, Features, Labels, C, toler, Ktype):
        """
        :param Features:
        :param Labels:
        :param C: soft-margin parameter
        :param toler: stop threshold
        :param Ktype: kernel type, (name, param)
        """
        self.X = Features
        self.labelMat = Labels
        self.C = C
        self.tol = toler
        self.m = shape(Features)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros((self.m, 2)))  # [flag, E-value] cache for every sample
        self.K = mat(zeros((self.m, self.m)))
        self.sVs = None
        self.labelSV = None
        self.svInd = None
        for i in range(self.m):
            self.K[:, i] = self.kernelFunction(self.X, self.X[i, :], Ktype)

    def kernelFunction(self, X, A, Ktype):
        """
        :param X:
        :param A:
        :param Ktype: (type, param)
        :return:
        """
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if Ktype[0] == 'lin':
            K = X * A.T
        elif Ktype[0] == 'rbf':
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * Ktype[1] ** 2))
        else:
            raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
        return K

    def __SelectRand(self, i, m):
        j = i
        while j == i:
            j = int(random.uniform(0, m))
        return j

    def __SelectAj(self, i, oS, Ei):
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        oS.eCache[i] = [1, Ei]
        validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # row indices of the non-zero entries
        if (len(validEcacheList)) > 1:
            for k in validEcacheList:
                if k == i:
                    continue
                Ek = self.__calcEk(k)
                deltaE = abs(Ei - Ek)
                if deltaE > maxDeltaE:
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            j = self.__SelectRand(i, self.m)
            Ej = self.__calcEk(j)
        return j, Ej

    def __HoldAlpha(self, al, H, L):
        # clip so that L <= a <= H
        if al > H:
            al = H
        elif L > al:
            al = L
        return al

    def __calcEk(self, k):
        fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
        Ek = fXk - float(self.labelMat[k])
        return Ek

    def __updateEk(self, k):
        Ek = self.__calcEk(k)
        self.eCache[k] = [1, Ek]

    def KKTGoing(self, i):
        """
        Refer to the book "Statistical Learning Methods".
        First, check whether ai meets the KKT conditions.
        If not, select aj for optimization and update the values of ai, aj and b.
        """
        Ei = self.__calcEk(i)  # compute the E value
        if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or ((self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)):
            j, Ej = self.__SelectAj(i, self, Ei)
            alphaIold = self.alphas[i].copy()
            alphaJold = self.alphas[j].copy()
            if self.labelMat[i] != self.labelMat[j]:
                L = max(0, self.alphas[j] - self.alphas[i])
                H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
            else:
                L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                H = min(self.C, self.alphas[j] + self.alphas[i])
            if L == H:
                print("L==H")
                return 0
            eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if eta >= 0:
                print("eta>=0")
                return 0
            self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta
            self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)
            self.__updateEk(j)
            if abs(self.alphas[j] - alphaJold) < self.tol:
                print("j not moving enough")
                return 0
            self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j])
            self.__updateEk(i)
            b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (self.alphas[j] - alphaJold) * self.K[i, j]
            b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (self.alphas[j] - alphaJold) * self.K[j, j]
            if 0 < self.alphas[i] < self.C:
                self.b = b1
            elif 0 < self.alphas[j] < self.C:
                self.b = b2
            else:
                self.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0

    def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
        """
        SMO is a heuristic algorithm; I do not know every detail of its principle.
        The code comes from GitHub, and I plan to try a PSO algorithm later.
        """
        self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):  # iterate over all samples
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif alphaPairsChanged == 0:
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

    def fit(self, dataSet: DataSet):
        dataArr, labelArr = dataSet.LoadDataSet()
        b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype)
        self.b = b
        self.alphas = alphas
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas)[0]  # rows where alpha is not 0, i.e. the support vectors
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        self.sVs = sVs
        self.labelSV = labelSV
        self.svInd = svInd
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            kernelEval = self.kernelFunction(sVs, datMat[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x == 0, 1 if x > 0
                errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))

    def save_model(self, path):
        dict = {}
        dict['b'] = self.b
        dict['alphas'] = self.alphas
        dict['sVs'] = self.sVs
        dict['labelSV'] = self.labelSV
        dict['svInd'] = self.svInd
        with open(path, 'w') as file:
            file.write(str(dict))  # naive persistence: write the dict as text

    def load_mode(self, path):
        if os.path.exists(path):
            with open(path) as file:
                model = file.read()
            model = eval(model)
            self.b = model['b']
            self.alphas = model['alphas']
            self.sVs = model['sVs']
            self.labelSV = model['labelSV']
            self.svInd = model['svInd']
        else:
            raise Exception("Fuck you no such file")

    def predict(self, dataSet: DataSet):
        dataArr_test, labelArr_test = dataSet.LoadDataSet()
        errorCount_test = 0
        datMat_test = mat(dataArr_test)
        m, n = shape(datMat_test)
        for i in range(m):  # check the error rate on the test data
            kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            if sign(predict) != sign(labelArr_test[i]):
                errorCount_test += 1
        print("the test error rate is: %f" % (float(errorCount_test) / m))

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  # compute the prediction for each new sample
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res


if __name__ == '__main__':
    train_path = r'\Data\svm_train.txt'
    test_path = r'\Data\svm_eval.txt'
    train_data = DataSet(train_path)
    test_data = DataSet(test_path)
    SVM = SVMModel(('rbf', 1.3))
    SVM.fit(train_data)
    SVM.predict(test_data)

Get data

If you want to get my data, you can go here:
Link: https://pan.baidu.com/s/1rTmao4zkQJiRW9zGcWpXHQ
Extraction code: 6666
Mr. Wu Enda (Andrew Ng) is excellent, and I will not accept any objection to this sentence.

Real case

This is just an online case.
from:https://blog.csdn.net/weixin_48577398/article/details/117465475

(Mr. Wu)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
from sklearn import svm

'''
1. Prepare datasets
'''
mat = loadmat('data/ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plotData(X, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('x1')
    plt.ylabel('x2')
    pass


def plotBoundary(clf, X):
    '''Plot Decision Boundary'''
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    # np.linspace(x_min, x_max, 500).shape ----> (500,)   500 is the number of grid samples
    # xx.shape, yy.shape ----> (500, 500) (500, 500)
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # model.predict: predict for the grid, output shape (250000,)
    # ravel() flattens the array: xx.ravel().shape ----> (250000,)
    # np.c_ stacks the two arrays column by column (left-right); the row counts must match.
    # np.c_[xx.ravel(), yy.ravel()].shape ----> (250000, 2), i.e. 250000 grid samples
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)
    # the contour line draws the separating boundary
    pass


models = [svm.SVC(C, kernel='linear') for C in [1, 100]]
# support vector machine models (kernel: kernel choice, here a linear kernel; C: weight, here 1 and 100)
# the decision boundary drawn by a linear kernel is a straight line
clfs = [model.fit(X, y.ravel()) for model in models]    # model.fit: fit the model
score = [model.score(X, y) for model in models]         # [0.9803921568627451, 1.0]
# title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]


def plot():
    title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]
    for model, title in zip(clfs, title):
        # zip() packs the corresponding elements of the iterables into tuples and returns them
        plt.figure(figsize=(8, 5))
        plotData(X, y)
        plotBoundary(model, X)  # use the fitted model (predicting those 250000 grid samples) to draw the decision boundary
        plt.title(title)
        pass
    pass
# plt.show()

'''
2. SVM with Gaussian Kernels
'''
def gaussKernel(x1, x2, sigma):
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))

a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
# print(a)

'''
Example Dataset 2
'''
mat = loadmat('data/ex6data2.mat')
x2 = mat['X']
y2 = mat['y']
plotData(x2, y2)
plt.show()

sigma = 0.1
gamma = np.power(sigma, -2) / 2
'''
The larger gamma is in the Gaussian kernel, the smaller sigma is in the Gaussian function, and the taller and thinner the distribution curve.
The smaller gamma is, the larger sigma is, and the shorter and fatter (smoother) the curve: higher bias, lower variance.
'''
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(x2, y2.flatten())       # kernel='rbf' means the SVM uses a Gaussian kernel
# https://blog.csdn.net/guanyuqiu/article/details/85109441
# plotData(x2, y2)
# plotBoundary(model, x2)
# plt.show()

'''
Example Dataset 3
'''
mat3 = loadmat('data/ex6data3.mat')
x3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plotData(x3, y3)
# plt.show()

Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the weight C
sigmavalues = Cvalues   # candidate values for the kernel parameter
best_pair, best_score = (0, 0), 0        # best (C, sigma) pair and its score so far
# search for the best (C, sigma)
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma, -2.) / 2
        model = svm.SVC(C=C, kernel='rbf', gamma=gamma)     # SVM with a Gaussian kernel
        model.fit(x3, y3.flatten())      # fit the model
        this_score = model.score(Xval, yval)        # use the cross-validation set to choose the best weights
        '''
        model.score returns the mean accuracy of the classifier on the given data;
        the higher it is, the better the model fits.
        '''
        # keep the best-scoring pair
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
        pass
    pass
print('best (C, sigma):', best_pair, 'score:', best_score)
# best (C, sigma): (1.0, 0.1), score: 0.965
model = svm.SVC(1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
# declare the SVM again with the chosen weights
model.fit(x3, y3.flatten())
plotData(x3, y3)
plotBoundary(model, x3)
# plt.show()
