机器学习基础算法29-EM实践
文章目录
- 1.EM算法的实现
- 2.EM算法估算GMM的参数
- 3.GMM调参:covariance_type
- 4.EM算法无监督分类鸢尾花数据
- 5.GMM/DPGMM(贝叶斯高斯分布)比较
1.EM算法的实现
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import pairwise_distances_argminmpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = Falseif __name__ == '__main__':style = 'myself'np.random.seed(0)# 构造均值与方差mu1_fact = (0, 0, 0)# numpy.diag()返回一个矩阵的对角线元素,或者创建一个对角阵( diagonal array.)cov1_fact = np.diag((1, 2, 3))# 依据指定的均值和协方差生成高斯分布数据data1 = np.random.multivariate_normal(mu1_fact, cov1_fact, 400)# print(data1)mu2_fact = (2, 2, 1)cov2_fact = np.array(((1, 1, 3), (1, 2, 1), (0, 0, 1)))data2 = np.random.multivariate_normal(mu2_fact, cov2_fact, 100)# print(data2)# 纵向叠加矩阵data = np.vstack((data1, data2))# print(data)y = np.array([True] * 400 + [False] * 100)if style == 'sklearn':# 高斯分布# n_components :混合元素(聚类)的数量,默认为1# covariance_type:描述要使用的协方差参数类型的字符串,必选一个(‘full’ , ‘tied’, ‘diag’, ‘spherical’),默认为full,# full:每个混合元素有它公用的协方差矩阵;即表示不要求方差相等,不要求方差平行坐标轴# tol:float类型, 默认值: 0.001.收敛阈值,当平均增益低于这个值时迭代停止# max_iter:最大迭代次数,默认为100g = GaussianMixture(n_components=2, covariance_type='full', tol=1e-6, max_iter=1000)g.fit(data)# weights_ : array-like, shape (n_components,),每个混合元素权重# means_ : array-like, shape (n_components, n_features),每个混合元素均值# covariances_ : array-like,每个混合元素的协方差,它的形状依靠协方差类型print('类别概率:\t', g.weights_[1])print('均值:\n', g.means_, '\n')print('方差:\n', g.covariances_, '\n')mu1, mu2 = g.means_sigma1, sigma2 = g.covariances_# 自定义的基于高斯分布的EM算法else:num_iter = 100n, d = data.shape# 随机指定# mu1 = np.random.standard_normal(d)# print mu1# mu2 = np.random.standard_normal(d)# print mu2# 均值、方差、先验概率pimu1 = data.min(axis=0)mu2 = data.max(axis=0)# Numpy.identity()的document. 输入n为行数或列数,返回一个n*n的对角阵,对角线元素为1,其余为0sigma1 = np.identity(d)sigma2 = np.identity(d)pi = 0.5# EMfor i in range(num_iter):# E Step# 两个模型norm1 = multivariate_normal(mu1, sigma1)norm2 = multivariate_normal(mu2, sigma2)# 先验概率*概率密度tau1 = pi * norm1.pdf(data)tau2 = (1 - pi) * norm2.pdf(data)# ganma值,对于样本x_i,它由第一个组分生成的概率gamma = tau1 / (tau1 + tau2)# M Stepmu1 = np.dot(gamma, data) / np.sum(gamma)mu2 = np.dot((1 - gamma), data) / np.sum((1 - gamma))sigma1 = np.dot(gamma * (data - mu1).T, data - mu1) / np.sum(gamma)sigma2 = np.dot((1 - gamma) * (data - mu2).T, data - mu2) / np.sum(1 - gamma)pi = np.sum(gamma) / n# print(i, ":\t", mu1, mu2)print('类别概率:\t', pi)print('均值:\t', mu1, mu2)print('方差:\n', sigma1, '\n\n', sigma2, '\n')# 预测分类# multivariate_normal根据实际情况生成一个多元正态分布矩阵norm1 = multivariate_normal(mu1, sigma1)norm2 = multivariate_normal(mu2, sigma2)# 计算在data下的概率密度值tau1 = norm1.pdf(data)tau2 = norm2.pdf(data)# 面向对象fig = plt.figure(figsize=(13, 7), facecolor='w')# 3dax = fig.add_subplot(121, projection='3d')ax.scatter(data[:, 0], data[:, 1], data[:, 2], c='b', s=30, marker='o', depthshade=True)ax.set_xlabel('X')ax.set_ylabel('Y')ax.set_zlabel('Z')ax.set_title(u'原始数据', fontsize=18)ax = fig.add_subplot(122, projection='3d')# pairwise_distances_argmin使用欧几里得距离,返回的是X距离Y最近点的index,如果是[0,1]则没问题。如果[1,0]则反了order = pairwise_distances_argmin([mu1_fact, mu2_fact], [mu1, mu2], metric='euclidean')print(order) # [1 0]# 调整顺序if order[0] == 0:c1 = tau1 > tau2else:c1 = tau1 < tau2c2 = ~c1acc = np.mean(y == c1)print(u'准确率:%.2f%%' % (100*acc))ax.scatter(data[c1, 0], data[c1, 1], data[c1, 2], c='r', s=30, marker='o', depthshade=True)ax.scatter(data[c2, 0], data[c2, 1], data[c2, 2], c='g', s=30, marker='^', depthshade=True)ax.set_xlabel('X')ax.set_ylabel('Y')ax.set_zlabel('Z')ax.set_title(u'EM算法分类', fontsize=18)plt.suptitle(u'EM算法的实现', fontsize=21)plt.subplots_adjust(top=0.90)plt.tight_layout()plt.show()
类别概率: 0.7650337783291882
均值: [-0.123994 -0.02138048 -0.06003756] [1.9076683 1.79622192 1.11752474]
方差:[[ 0.82563399 -0.10180706 -0.0414597 ][-0.10180706 2.15816316 -0.16360603][-0.0414597 -0.16360603 2.79283956]] [[0.69690051 0.90370392 0.73552321][0.90370392 1.8856117 0.76747618][0.73552321 0.76747618 2.94819132]] [0 1]
准确率:89.80%
2.EM算法估算GMM的参数
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt# 指定字体
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
# from matplotlib.font_manager import FontProperties
# font_set = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=15)
# fontproperties=font_setdef expand(a, b):d = (b - a) * 0.05return a-d, b+dif __name__ == '__main__':data = np.loadtxt('HeightWeight.csv', dtype=np.float, delimiter=',', skiprows=1)print(data.shape)y, x = np.split(data, [1, ], axis=1)x, x_test, y, y_test = train_test_split(x, y, train_size=0.6, random_state=0)# 高斯混合模型gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)x_min = np.min(x, axis=0)x_max = np.max(x, axis=0)gmm.fit(x)print('均值 = \n', gmm.means_)print('方差 = \n', gmm.covariances_)y_hat = gmm.predict(x)y_test_hat = gmm.predict(x_test)change = (gmm.means_[0][0] > gmm.means_[1][0])if change:z = y_hat == 0y_hat[z] = 1y_hat[~z] = 0z = y_test_hat == 0y_test_hat[z] = 1y_test_hat[~z] = 0acc = np.mean(y_hat.ravel() == y.ravel())acc_test = np.mean(y_test_hat.ravel() == y_test.ravel())acc_str = u'训练集准确率:%.2f%%' % (acc * 100)acc_test_str = u'测试集准确率:%.2f%%' % (acc_test * 100)print(acc_str)print(acc_test_str)# 样本颜色cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0'])# 图的背景色cm_dark = mpl.colors.ListedColormap(['r', 'g'])# 得到最小最大值并扩大x1_min, x1_max = x[:, 0].min(), x[:, 0].max()x2_min, x2_max = x[:, 1].min(), x[:, 1].max()x1_min, x1_max = expand(x1_min, x1_max)x2_min, x2_max = expand(x2_min, x2_max)# np.mgrid生成等间隔数值点x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]# np.stack函数就是一个用于numpy数组堆叠的函数grid_test = np.stack((x1.flat, x2.flat), axis=1)grid_hat = gmm.predict(grid_test)grid_hat = grid_hat.reshape(x1.shape)if change:z = grid_hat == 0grid_hat[z] = 1grid_hat[~z] = 0plt.figure(figsize=(9, 7), facecolor='w')plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)# c=np.squeeze(Y)解决绘制散点图出现的RGBA sequence should have length 3 or 4的错误plt.scatter(x[:, 0], x[:, 1], s=50, c=np.squeeze(y), marker='o', cmap=cm_dark, edgecolors='k')plt.scatter(x_test[:, 0], x_test[:, 1], s=60, c=np.squeeze(y_test), marker='^', cmap=cm_dark, edgecolors='k')# predict_proba在给定数据的情况下,预测每个分量的后验概率p = gmm.predict_proba(grid_test)# 设置显示宽度np.set_printoptions(suppress=True)print(p)p = p[:, 0].reshape(x1.shape)# 绘制等值线CS = plt.contour(x1, x2, p, levels=(0.1, 0.5, 0.8), colors=list('rgb'), linewidths=2)plt.clabel(CS, fontsize=15, fmt='%.1f', inline=True)ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()xx = 0.9*ax1_min + 0.1*ax1_maxyy = 0.1*ax2_min + 0.9*ax2_maxplt.text(xx, yy, acc_str, fontsize=18)yy = 0.15*ax2_min + 0.85*ax2_maxplt.text(xx, yy, acc_test_str, fontsize=18)plt.xlim((x1_min, x1_max))plt.ylim((x2_min, x2_max))plt.xlabel(u'身高(cm)', fontsize='large')plt.ylabel(u'体重(kg)', fontsize='large')plt.title(u'EM算法估算GMM的参数', fontsize=20)plt.grid()plt.show()
(114, 3)
均值 = [[160.13983374 55.93370575][173.50243688 65.03359308]]
方差 = [[[ 18.82128194 12.30370549][ 12.30370549 31.23596113]][[ 23.22794989 28.48688647][ 28.48688647 105.81824734]]]
训练集准确率:77.94%
测试集准确率:82.61%
[[0.99999775 0.00000225][0.99999784 0.00000216][0.99999792 0.00000208]...[0. 1. ][0. 1. ][0. 1. ]]
3.GMM调参:covariance_type
import numpy as np
from sklearn.mixture import GaussianMixture
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as pltmpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = Falsedef expand(a, b, rate=0.05):d = (b - a) * ratereturn a-d, b+ddef accuracy_rate(y1, y2):acc = np.mean(y1 == y2)return acc if acc > 0.5 else 1-accif __name__ == '__main__':np.random.seed(0)#cov1 = np.diag((1, 2))print(cov1)N1 = 500N2 = 300N = N1 + N2# 两个多元高斯分布x1 = np.random.multivariate_normal(mean=(1, 2), cov=cov1, size=N1)m = np.array(((1, 1), (1, 3)))x1 = x1.dot(m)x2 = np.random.multivariate_normal(mean=(-1, 10), cov=cov1, size=N2)x = np.vstack((x1, x2))y = np.array([0]*N1 + [1]*N2)types = ('spherical', 'diag', 'tied', 'full')err = np.empty(len(types))bic = np.empty(len(types))for i, type in enumerate(types):gmm = GaussianMixture(n_components=2, covariance_type=type, random_state=0)gmm.fit(x)err[i] = 1 - accuracy_rate(gmm.predict(x), y)bic[i] = gmm.bic(x)print('错误率:', err.ravel())print('BIC:', bic.ravel())xpos = np.arange(4)plt.figure(facecolor='w')ax = plt.axes()b1 = ax.bar(xpos-0.3, err, width=0.3, color='#77E0A0')b2 = ax.twinx().bar(xpos, bic, width=0.3, color='#FF8080')plt.grid(True)bic_min, bic_max = expand(bic.min(), bic.max())plt.ylim((bic_min, bic_max))plt.xticks(xpos, types)plt.legend([b1[0], b2[0]], (u'错误率', u'BIC'))plt.title(u'不同方差类型的误差率和BIC', fontsize=18)plt.show()optimal = bic.argmin()gmm = GaussianMixture(n_components=2, covariance_type=types[optimal], random_state=0)gmm.fit(x)print('均值 = \n', gmm.means_)print('方差 = \n', gmm.covariances_)y_hat = gmm.predict(x)cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0'])cm_dark = mpl.colors.ListedColormap(['r', 'g'])x1_min, x1_max = x[:, 0].min(), x[:, 0].max()x2_min, x2_max = x[:, 1].min(), x[:, 1].max()x1_min, x1_max = expand(x1_min, x1_max)x2_min, x2_max = expand(x2_min, x2_max)x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]grid_test = np.stack((x1.flat, x2.flat), axis=1)grid_hat = gmm.predict(grid_test)grid_hat = grid_hat.reshape(x1.shape)if gmm.means_[0][0] > gmm.means_[1][0]:z = grid_hat == 0grid_hat[z] = 1grid_hat[~z] = 0plt.figure(figsize=(9, 7), facecolor='w')plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)plt.scatter(x[:, 0], x[:, 1], s=30, c=y, marker='o', cmap=cm_dark, edgecolors='k')ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()plt.xlim((x1_min, x1_max))plt.ylim((x2_min, x2_max))plt.title(u'GMM调参:covariance_type=%s' % types[optimal], fontsize=20)plt.grid()plt.show()
[[1 0][0 2]]
错误率: [0.385 0.315 0.3 0.00125]
BIC: [7990.71460065 7855.56050855 8006.49834359 6845.79374805]
均值 = [[ 2.88444448 6.69484552][-0.97642254 10.06927801]]
方差 = [[[ 2.87015473 6.64421303][ 6.64421303 18.00318872]][[ 0.91302546 -0.04298504][-0.04298504 1.9603531 ]]]
4.EM算法无监督分类鸢尾花数据
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import pairwise_distances_argminmpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = Falseiris_feature = u'花萼长度', u'花萼宽度', u'花瓣长度', u'花瓣宽度'def expand(a, b, rate=0.05):d = (b - a) * ratereturn a-d, b+dif __name__ == '__main__':path = 'iris.data'data = pd.read_csv(path, header=None)x_prime, y = data[np.arange(4)], data[4]y = pd.Categorical(y).codesn_components = 3feature_pairs = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]plt.figure(figsize=(10, 9), facecolor='#FFFFFF')for k, pair in enumerate(feature_pairs):x = x_prime[pair]m = np.array([np.mean(x[y == i], axis=0) for i in range(3)]) # 均值的实际值print('实际均值 = \n', m)gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)gmm.fit(x)print('预测均值 = \n', gmm.means_)print('预测方差 = \n', gmm.covariances_)y_hat = gmm.predict(x)# 使用欧几里得距离,返回的是X距离Y最近点的indexorder = pairwise_distances_argmin(m, gmm.means_, axis=1, metric='euclidean')print('顺序:\t', order)# 变换顺序的方法n_sample = y.sizen_types = 3change = np.empty((n_types, n_sample), dtype=np.bool)for i in range(n_types):change[i] = y_hat == order[i]for i in range(n_types):y_hat[change[i]] = iacc = u'准确率:%.2f%%' % (100*np.mean(y_hat == y))print(acc)cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0', '#A0A0FF'])cm_dark = mpl.colors.ListedColormap(['r', 'g', '#6060FF'])x1_min, x2_min = x.min()x1_max, x2_max = x.max()x1_min, x1_max = expand(x1_min, x1_max)x2_min, x2_max = expand(x2_min, x2_max)x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]grid_test = np.stack((x1.flat, x2.flat), axis=1)grid_hat = gmm.predict(grid_test)change = np.empty((n_types, grid_hat.size), dtype=np.bool)for i in range(n_types):change[i] = grid_hat == order[i]for i in range(n_types):grid_hat[change[i]] = igrid_hat = grid_hat.reshape(x1.shape)plt.subplot(3, 2, k+1)plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)plt.scatter(x[pair[0]], x[pair[1]], s=30, c=y, marker='o', cmap=cm_dark, edgecolors='k')xx = 0.95 * x1_min + 0.05 * x1_maxyy = 0.1 * x2_min + 0.9 * x2_maxplt.text(xx, yy, acc, fontsize=14)plt.xlim((x1_min, x1_max))plt.ylim((x2_min, x2_max))plt.xlabel(iris_feature[pair[0]], fontsize=14)plt.ylabel(iris_feature[pair[1]], fontsize=14)plt.grid()plt.tight_layout(2)plt.suptitle(u'EM算法无监督分类鸢尾花数据', fontsize=20)plt.subplots_adjust(top=0.92)plt.show()
实际均值 = [[5.006 3.418][5.936 2.77 ][6.588 2.974]]
预测均值 = [[5.01493896 3.4404862 ][6.6814044 3.0285628 ][5.90114537 2.74385294]]
预测方差 = [[[0.1194876 0.08969867][0.08969867 0.12147459]][[0.36087007 0.05158991][0.05158991 0.08923683]][[0.27544608 0.08866062][0.08866062 0.09382524]]]
顺序: [0 2 1]
准确率:79.33%
实际均值 = [[5.006 1.464][5.936 4.26 ][6.588 5.552]]
预测均值 = [[5.0060006 1.46399865][6.58888904 5.63329718][6.04240777 4.41742864]]
预测方差 = [[[0.12176525 0.01581631][0.01581631 0.0295045 ]][[0.48521779 0.36602418][0.36602418 0.32601109]][[0.28119672 0.23746926][0.23746926 0.31503012]]]
顺序: [0 2 1]
准确率:91.33%
实际均值 = [[5.006 0.244][5.936 1.326][6.588 2.026]]
预测均值 = [[5.00607264 0.23754806][6.56291563 2.02408174][5.94928821 1.32089151]]
预测方差 = [[[0.1239802 0.01055412][0.01055412 0.00914172]][[0.41146765 0.0558402 ][0.0558402 0.07689828]][[0.29249903 0.07999787][0.07999787 0.0509237 ]]]
顺序: [0 2 1]
准确率:96.00%
实际均值 = [[3.418 1.464][2.77 4.26 ][2.974 5.552]]
预测均值 = [[3.41800009 1.46400001][2.80062882 4.43004172][2.97017899 5.56073357]]
预测方差 = [[[0.14227691 0.01144799][0.01144799 0.029505 ]][[0.09376548 0.10702236][0.10702236 0.34454954]][[0.11477629 0.07760424][0.07760424 0.38871245]]]
顺序: [0 1 2]
准确率:92.67%
实际均值 = [[3.418 0.244][2.77 1.326][2.974 2.026]]
预测均值 = [[3.41800003 0.244 ][2.93629236 1.98607968][2.79657869 1.31224583]]
预测方差 = [[[0.14227697 0.011208 ][0.011208 0.011265 ]][[0.11263095 0.06192916][0.06192916 0.08966439]][[0.09554395 0.04869984][0.04869984 0.03787478]]]
顺序: [0 2 1]
准确率:93.33%
实际均值 = [[1.464 0.244][4.26 1.326][5.552 2.026]]
预测均值 = [[1.46399926 0.24399973][4.32760641 1.36230588][5.60395923 2.0545418 ]]
预测方差 = [[[0.02950475 0.00558391][0.00558391 0.01126496]][[0.25410036 0.09152045][0.09152045 0.05088321]][[0.29156635 0.03719346][0.03719346 0.07073705]]]
顺序: [0 1 2]
准确率:97.33%
5.GMM/DPGMM(贝叶斯高斯分布)比较
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import scipy as sp
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipsedef expand(a, b, rate=0.05):d = (b - a) * ratereturn a-d, b+dmatplotlib.rcParams['font.sans-serif'] = [u'SimHei']
matplotlib.rcParams['axes.unicode_minus'] = Falseif __name__ == '__main__':np.random.seed(0)cov1 = np.diag((1, 2))N1 = 500N2 = 300N = N1 + N2x1 = np.random.multivariate_normal(mean=(3, 2), cov=cov1, size=N1)m = np.array(((1, 1), (1, 3)))x1 = x1.dot(m)x2 = np.random.multivariate_normal(mean=(-1, 10), cov=cov1, size=N2)x = np.vstack((x1, x2))y = np.array([0]*N1 + [1]*N2)n_components = 3# 绘图使用colors = '#A0FFA0', '#2090E0', '#FF8080'cm = mpl.colors.ListedColormap(colors)x1_min, x1_max = x[:, 0].min(), x[:, 0].max()x2_min, x2_max = x[:, 1].min(), x[:, 1].max()x1_min, x1_max = expand(x1_min, x1_max)x2_min, x2_max = expand(x2_min, x2_max)x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]grid_test = np.stack((x1.flat, x2.flat), axis=1)plt.figure(figsize=(9, 9), facecolor='w')plt.suptitle(u'GMM/DPGMM比较', fontsize=23)ax = plt.subplot(211)gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)gmm.fit(x)centers = gmm.means_covs = gmm.covariances_print('GMM均值 = \n', centers)print('GMM方差 = \n', covs)y_hat = gmm.predict(x)grid_hat = gmm.predict(grid_test)grid_hat = grid_hat.reshape(x1.shape)plt.pcolormesh(x1, x2, grid_hat, cmap=cm)plt.scatter(x[:, 0], x[:, 1], s=30, c=y, cmap=cm, marker='o')clrs = list('rgbmy')for i, (center, cov) in enumerate(zip(centers, covs)):value, vector = sp.linalg.eigh(cov)width, height = value[0], value[1]v = vector[0] / sp.linalg.norm(vector[0])angle = 180* np.arctan(v[1] / v[0]) / np.pie = Ellipse(xy=center, width=width, height=height,angle=angle, color=clrs[i], alpha=0.5, clip_box = ax.bbox)ax.add_artist(e)ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()plt.xlim((x1_min, x1_max))plt.ylim((x2_min, x2_max))plt.title(u'GMM', fontsize=20)plt.grid(True)# DPGMMdpgmm = BayesianGaussianMixture(n_components=n_components, covariance_type='full', max_iter=1000, n_init=5,weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=0.1)dpgmm.fit(x)centers = dpgmm.means_covs = dpgmm.covariances_print('DPGMM均值 = \n', centers)print('DPGMM方差 = \n', covs)y_hat = dpgmm.predict(x)print(y_hat)ax = plt.subplot(212)grid_hat = dpgmm.predict(grid_test)grid_hat = grid_hat.reshape(x1.shape)plt.pcolormesh(x1, x2, grid_hat, cmap=cm)plt.scatter(x[:, 0], x[:, 1], s=30, c=y, cmap=cm, marker='o')for i, cc in enumerate(zip(centers, covs)):if i not in y_hat:continuecenter, cov = ccvalue, vector = sp.linalg.eigh(cov)width, height = value[0], value[1]v = vector[0] / sp.linalg.norm(vector[0])angle = 180* np.arctan(v[1] / v[0]) / np.pie = Ellipse(xy=center, width=width, height=height,angle=angle, color='m', alpha=0.5, clip_box = ax.bbox)ax.add_artist(e)plt.xlim((x1_min, x1_max))plt.ylim((x2_min, x2_max))plt.title('DPGMM', fontsize=20)plt.grid(True)plt.tight_layout()plt.subplots_adjust(top=0.9)plt.show()
GMM均值 = [[ 3.77430768 5.86579463][ 6.0239399 11.61448122][-0.98543679 10.0756839 ]]
GMM方差 = [[[ 1.5383593 3.21210121][ 3.21210121 9.04107582]][[ 1.6667472 3.58655076][ 3.58655076 10.40673433]][[ 0.89079177 -0.02572518][-0.02572518 1.95106592]]]
DPGMM均值 = [[ 4.87807808 8.69858646][-0.97320511 10.07279749][ 2.68191465 9.21436833]]
DPGMM方差 = [[[ 2.88507577 6.60477348][ 6.60477348 17.92760296]][[ 0.9632641 -0.02865042][-0.02865042 1.98179578]][[ 5.104264 1.12039777][ 1.12039777 6.21286898]]]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
机器学习基础算法29-EM实践相关推荐
- 机器学习基础算法之随机森林
英文原文<The Random Forest Algorithm> 专知 编译<机器学习基础算法之随机森林> [导读]在当今深度学习如此火热的背景下,其他基础的机器学习算法显得 ...
- 小白机器学习基础算法学习必经之路
https://www.toutiao.com/a6657427848900379150/ 2019-02-14 15:21:13 未来,人工智能是生产力,是变革社会的主要技术力量之一. 掌握人工智能 ...
- 深度学习(二) 神经网络基础算法推导与实践
深度学习的核心就是各种不同的神经网络模型(CNN.RNN.GCN.GNN等)的学习和训练过程.这些神经网络模型的共同点都是一个"黑盒子",通过一定的学习算法将大量数据交给模型训练, ...
- 【机器学习】十大机器学习基础算法
十大机器学习算法入门 近年来,机器学习与人工智能已广泛应用于学术与工程,比如数据挖掘.计算机视觉.自然语言处理.生物特征识别.搜索引擎.医学诊断.检测信用卡欺诈.证券市场分析.DNA序列测序.语音和手 ...
- 机器学习--基础算法--机器学习基础
1 机器学习世界的数据 1.数据 数据整体叫数据集(data set) 每一行数据称为一个样本(sample) 除最后一列,每一列表达样本的一个特征(eature) 最后一列,称为标记(label) ...
- 机器学习基础算法概述
机器学习算法大致可以分为三类: 监督学习算法 (Supervised Algorithms):在监督学习训练过程中,可以由训练数据集学到或建立一个模式(函数 / learning model),并依此 ...
- 机器学习基础算法四:逻辑回归算法实验
逻辑回归算法实验 一.逻辑回归介绍 逻辑回归是一种分类模型 z=WTX=w0+w1x1+w2x2+......+wnxnz =W^{T}X=w^{0}+w^{1}x^{1}+w^{2}x^{2}+.. ...
- 机器学习基础算法(2)
2.分类算法-k-邻近算法 分类算法的判定依据:目标值是离散型的 2.1 k-邻近算法基本概念.原理及应用 k近邻算法是一种基本分类和回归方法.本篇文章只讨论分类问题的k近邻法. K近邻算法,即是给定 ...
- 20/03/07 机器学习---基础算法 (2)
回归 线性回归 使用极大似然估计解释最小二乘 y(i)=θTx(i)+ε(i)y^{(i)}=\theta^Tx^{(i)}+\varepsilon^{(i)}y(i)=θTx(i)+ε(i) 误差ε ...
- 【机器学习基础】数学推导+纯Python实现机器学习算法27:EM算法
Python机器学习算法实现 Author:louwill Machine Learning Lab 从本篇开始,整个机器学习系列还剩下最后三篇涉及导概率模型的文章,分别是EM算法.CRF条件随机场和 ...
最新文章
- 简洁好用的数据库表结构文档生成工具!
- 多种分布式文件系统简介
- cf1009F. Dominant Indices
- php 大批量的删除图片,PHP批量删除记录同时删除图片文件
- Java笔记-concurrent集合及线程池
- canvas路径剪切和判断是否在路径内
- TimeSpan 用法 求离最近发表时间的函数
- java 改像素不改尺寸_如何不改变分辨率的情况下缩小尺寸PNG图片
- SRv6技术课堂(一):SRv6概述
- atitit uke产品线 attilax总结.docx 1. 知识聚合 知识检索 产品线	1 2. 爬虫产品线	1 3. 发帖机产品线	1 4. 发动机产品线	1 5. O2o产品线(旅游
- java实习周记_计算机java开发实习周记20篇
- 硬件芯片----74HC595芯片的运用原理
- 淘宝天猫融合能拉回“出淘”的用户吗?
- 甲骨文CEO独家揭秘企业转型秘诀,就一个字
- 商店英雄显示无法连接服务器,商店英雄攻略 新手常见问题FAQ汇总[视频][多图]...
- ps水花飞溅效果制作
- 电视端虚拟鼠标的设计
- Java 使用Reactive Redis
- 关于 .NET Core(.NET Core 指南)
- Android+jenkins自动打包教程
热门文章
- Android 拦截TextView中超链接点击事件
- java day10【接口、多态】
- wampserver下修改mysql root用户的登录密码
- 两个栈实现队列 以及两个队列实现栈
- 杭电 看归并排序和快速排序
- MacOS平台上编译 hadoop 3.1.2 源码
- jqgrid 使用小记——与springboot jpa 一起使用的分页,翻页。(使用springboot jpa 原生的分页)...
- Java中多态的一些简单理解
- Hadoop平台简述
- Rational Rose 2003 逆向工程转换C++ / VC++ 6.0源代码成UML类图