Table of Contents

  • 1. Implementing the EM algorithm
  • 2. Estimating GMM parameters with EM
  • 3. Tuning GMM: covariance_type
  • 4. Unsupervised classification of the iris data with EM
  • 5. Comparing GMM and DPGMM (Bayesian Gaussian mixture)

1. Implementing the EM algorithm

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import pairwise_distances_argmin

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

if __name__ == '__main__':
    style = 'myself'
    np.random.seed(0)
    # construct the true means and covariances
    mu1_fact = (0, 0, 0)
    # np.diag() returns the diagonal of a matrix, or builds a diagonal array from a vector
    cov1_fact = np.diag((1, 2, 3))
    # draw Gaussian samples with the given mean and covariance
    data1 = np.random.multivariate_normal(mu1_fact, cov1_fact, 400)
    # print(data1)
    mu2_fact = (2, 2, 1)
    cov2_fact = np.array(((1, 1, 3), (1, 2, 1), (0, 0, 1)))
    data2 = np.random.multivariate_normal(mu2_fact, cov2_fact, 100)
    # print(data2)
    # stack the two samples vertically
    data = np.vstack((data1, data2))
    # print(data)
    y = np.array([True] * 400 + [False] * 100)

    if style == 'sklearn':
        # GaussianMixture parameters:
        # n_components: number of mixture components (clusters), default 1
        # covariance_type: one of 'full', 'tied', 'diag', 'spherical', default 'full';
        #     'full' gives every component its own covariance matrix, i.e. the variances
        #     need not be equal or axis-aligned
        # tol: float, default 1e-3; EM stops when the average gain falls below this threshold
        # max_iter: maximum number of EM iterations, default 100
        g = GaussianMixture(n_components=2, covariance_type='full', tol=1e-6, max_iter=1000)
        g.fit(data)
        # weights_: array, shape (n_components,), the weight of each component
        # means_: array, shape (n_components, n_features), the mean of each component
        # covariances_: array, the covariance of each component; its shape depends on covariance_type
        print('class prior:\t', g.weights_[1])
        print('means:\n', g.means_, '\n')
        print('covariances:\n', g.covariances_, '\n')
        mu1, mu2 = g.means_
        sigma1, sigma2 = g.covariances_
    # hand-rolled EM for a mixture of two Gaussians
    else:
        num_iter = 100
        n, d = data.shape
        # random initialization would also work:
        # mu1 = np.random.standard_normal(d)
        # mu2 = np.random.standard_normal(d)
        # initialize the means, covariances and prior probability pi
        mu1 = data.min(axis=0)
        mu2 = data.max(axis=0)
        # np.identity(n) returns the n*n identity matrix
        sigma1 = np.identity(d)
        sigma2 = np.identity(d)
        pi = 0.5
        # EM
        for i in range(num_iter):
            # E step: the two component densities
            norm1 = multivariate_normal(mu1, sigma1)
            norm2 = multivariate_normal(mu2, sigma2)
            # prior probability * probability density
            tau1 = pi * norm1.pdf(data)
            tau2 = (1 - pi) * norm2.pdf(data)
            # gamma: for sample x_i, the probability that it came from the first component
            gamma = tau1 / (tau1 + tau2)
            # M step
            mu1 = np.dot(gamma, data) / np.sum(gamma)
            mu2 = np.dot((1 - gamma), data) / np.sum(1 - gamma)
            sigma1 = np.dot(gamma * (data - mu1).T, data - mu1) / np.sum(gamma)
            sigma2 = np.dot((1 - gamma) * (data - mu2).T, data - mu2) / np.sum(1 - gamma)
            pi = np.sum(gamma) / n
            # print(i, ":\t", mu1, mu2)
        print('class prior:\t', pi)
        print('means:\t', mu1, mu2)
        print('covariances:\n', sigma1, '\n\n', sigma2, '\n')

    # prediction: build the two fitted component distributions
    norm1 = multivariate_normal(mu1, sigma1)
    norm2 = multivariate_normal(mu2, sigma2)
    # density of every sample under each component
    tau1 = norm1.pdf(data)
    tau2 = norm2.pdf(data)

    # object-oriented plotting
    fig = plt.figure(figsize=(13, 7), facecolor='w')
    # 3D scatter of the raw data
    ax = fig.add_subplot(121, projection='3d')
    ax.scatter(data[:, 0], data[:, 1], data[:, 2], c='b', s=30, marker='o', depthshade=True)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    ax.set_title(u'Original data', fontsize=18)
    ax = fig.add_subplot(122, projection='3d')
    # pairwise_distances_argmin uses Euclidean distance and returns, for each row of X,
    # the index of the closest row of Y; [0, 1] means the estimated components are in
    # the same order as the true ones, [1, 0] means they are flipped
    order = pairwise_distances_argmin([mu1_fact, mu2_fact], [mu1, mu2], metric='euclidean')
    print(order)
    # fix the component order
    if order[0] == 0:
        c1 = tau1 > tau2
    else:
        c1 = tau1 < tau2
    c2 = ~c1
    acc = np.mean(y == c1)
    print(u'accuracy: %.2f%%' % (100 * acc))
    ax.scatter(data[c1, 0], data[c1, 1], data[c1, 2], c='r', s=30, marker='o', depthshade=True)
    ax.scatter(data[c2, 0], data[c2, 1], data[c2, 2], c='g', s=30, marker='^', depthshade=True)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    ax.set_title(u'EM classification', fontsize=18)
    plt.suptitle(u'Implementing the EM algorithm', fontsize=21)
    plt.subplots_adjust(top=0.90)
    plt.tight_layout()
    plt.show()
class prior:	 0.7650337783291882
means:
 [-0.123994   -0.02138048 -0.06003756] [1.9076683  1.79622192 1.11752474]
covariances:
 [[ 0.82563399 -0.10180706 -0.0414597 ]
 [-0.10180706  2.15816316 -0.16360603]
 [-0.0414597  -0.16360603  2.79283956]]

 [[0.69690051 0.90370392 0.73552321]
 [0.90370392 1.8856117  0.76747618]
 [0.73552321 0.76747618 2.94819132]]

[0 1]
accuracy: 89.80%
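For reference, the hand-rolled loop above performs the standard EM updates for a two-component Gaussian mixture. Written out, with $\gamma_i$ the responsibility of the first component for sample $x_i$, the loop computes

$$\gamma_i=\frac{\pi\,\mathcal N(x_i\mid\mu_1,\Sigma_1)}{\pi\,\mathcal N(x_i\mid\mu_1,\Sigma_1)+(1-\pi)\,\mathcal N(x_i\mid\mu_2,\Sigma_2)}\qquad\text{(E step)}$$

$$\mu_1=\frac{\sum_i\gamma_i x_i}{\sum_i\gamma_i},\qquad\Sigma_1=\frac{\sum_i\gamma_i(x_i-\mu_1)(x_i-\mu_1)^{T}}{\sum_i\gamma_i},\qquad\pi=\frac{1}{n}\sum_i\gamma_i\qquad\text{(M step)}$$

The updates for $\mu_2$ and $\Sigma_2$ are identical with $\gamma_i$ replaced by $1-\gamma_i$.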

2. Estimating GMM parameters with EM

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt

# set the font
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
# from matplotlib.font_manager import FontProperties
# font_set = FontProperties(fname=r"c:\windows\fonts\simsun.ttc", size=15)
# fontproperties=font_set


def expand(a, b):
    d = (b - a) * 0.05
    return a - d, b + d


if __name__ == '__main__':
    data = np.loadtxt('HeightWeight.csv', dtype=float, delimiter=',', skiprows=1)
    print(data.shape)
    y, x = np.split(data, [1, ], axis=1)
    x, x_test, y, y_test = train_test_split(x, y, train_size=0.6, random_state=0)
    # Gaussian mixture model
    gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
    x_min = np.min(x, axis=0)
    x_max = np.max(x, axis=0)
    gmm.fit(x)
    print('means = \n', gmm.means_)
    print('covariances = \n', gmm.covariances_)
    y_hat = gmm.predict(x)
    y_test_hat = gmm.predict(x_test)
    # the component labels are arbitrary; flip them if component 0 has the larger mean height
    change = (gmm.means_[0][0] > gmm.means_[1][0])
    if change:
        z = y_hat == 0
        y_hat[z] = 1
        y_hat[~z] = 0
        z = y_test_hat == 0
        y_test_hat[z] = 1
        y_test_hat[~z] = 0
    acc = np.mean(y_hat.ravel() == y.ravel())
    acc_test = np.mean(y_test_hat.ravel() == y_test.ravel())
    acc_str = u'training accuracy: %.2f%%' % (acc * 100)
    acc_test_str = u'test accuracy: %.2f%%' % (acc_test * 100)
    print(acc_str)
    print(acc_test_str)
    # colormap for the decision regions (background)
    cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0'])
    # colormap for the samples
    cm_dark = mpl.colors.ListedColormap(['r', 'g'])
    # get the min/max of each feature and widen the range a little
    x1_min, x1_max = x[:, 0].min(), x[:, 0].max()
    x2_min, x2_max = x[:, 1].min(), x[:, 1].max()
    x1_min, x1_max = expand(x1_min, x1_max)
    x2_min, x2_max = expand(x2_min, x2_max)
    # np.mgrid generates evenly spaced grid points
    x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]
    # np.stack joins arrays along a new axis
    grid_test = np.stack((x1.flat, x2.flat), axis=1)
    grid_hat = gmm.predict(grid_test)
    grid_hat = grid_hat.reshape(x1.shape)
    if change:
        z = grid_hat == 0
        grid_hat[z] = 1
        grid_hat[~z] = 0
    plt.figure(figsize=(9, 7), facecolor='w')
    plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
    # c=np.squeeze(y) avoids the "RGBA sequence should have length 3 or 4" scatter error
    plt.scatter(x[:, 0], x[:, 1], s=50, c=np.squeeze(y), marker='o', cmap=cm_dark, edgecolors='k')
    plt.scatter(x_test[:, 0], x_test[:, 1], s=60, c=np.squeeze(y_test), marker='^', cmap=cm_dark, edgecolors='k')
    # predict_proba returns the posterior probability of each component given the data
    p = gmm.predict_proba(grid_test)
    # suppress scientific notation when printing
    np.set_printoptions(suppress=True)
    print(p)
    p = p[:, 0].reshape(x1.shape)
    # contour lines of the posterior probability of component 0
    CS = plt.contour(x1, x2, p, levels=(0.1, 0.5, 0.8), colors=list('rgb'), linewidths=2)
    plt.clabel(CS, fontsize=15, fmt='%.1f', inline=True)
    ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()
    xx = 0.9 * ax1_min + 0.1 * ax1_max
    yy = 0.1 * ax2_min + 0.9 * ax2_max
    plt.text(xx, yy, acc_str, fontsize=18)
    yy = 0.15 * ax2_min + 0.85 * ax2_max
    plt.text(xx, yy, acc_test_str, fontsize=18)
    plt.xlim((x1_min, x1_max))
    plt.ylim((x2_min, x2_max))
    plt.xlabel(u'Height (cm)', fontsize='large')
    plt.ylabel(u'Weight (kg)', fontsize='large')
    plt.title(u'Estimating GMM parameters with EM', fontsize=20)
    plt.grid()
    plt.show()
(114, 3)
means = 
 [[160.13983374  55.93370575]
 [173.50243688  65.03359308]]
covariances = 
 [[[ 18.82128194  12.30370549]
  [ 12.30370549  31.23596113]]

 [[ 23.22794989  28.48688647]
  [ 28.48688647 105.81824734]]]
training accuracy: 77.94%
test accuracy: 82.61%
[[0.99999775 0.00000225]
 [0.99999784 0.00000216]
 [0.99999792 0.00000208]
 ...
 [0.         1.        ]
 [0.         1.        ]
 [0.         1.        ]]
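A caveat on the `change` flip above: it only works because we know the component with the smaller first mean coordinate should get label 0. When no such prior knowledge is available, a more general approach is to try every label permutation and keep the best one. A minimal sketch (the `align_labels` helper is my own naming, not part of the original script):

import itertools
import numpy as np

def align_labels(y_true, y_pred, n_components):
    # y_true, y_pred: 1-D integer arrays of the same length
    best_acc, best = -1.0, y_pred
    for perm in itertools.permutations(range(n_components)):
        # relabel every predicted cluster c as perm[c]
        remapped = np.array([perm[c] for c in y_pred])
        acc = np.mean(remapped == y_true)
        if acc > best_acc:
            best_acc, best = acc, remapped
    return best, best_acc

This is exhaustive in the number of components, so it is only sensible for small mixtures such as the two-component one here.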

3. Tuning GMM: covariance_type

import numpy as np
from sklearn.mixture import GaussianMixture
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False


def expand(a, b, rate=0.05):
    d = (b - a) * rate
    return a - d, b + d


def accuracy_rate(y1, y2):
    # with two clusters the predicted labels may be flipped relative to the truth
    acc = np.mean(y1 == y2)
    return acc if acc > 0.5 else 1 - acc


if __name__ == '__main__':
    np.random.seed(0)
    cov1 = np.diag((1, 2))
    print(cov1)
    N1 = 500
    N2 = 300
    N = N1 + N2
    # two multivariate Gaussian samples
    x1 = np.random.multivariate_normal(mean=(1, 2), cov=cov1, size=N1)
    m = np.array(((1, 1), (1, 3)))
    # a linear map tilts the first cluster so its covariance is no longer axis-aligned
    x1 = x1.dot(m)
    x2 = np.random.multivariate_normal(mean=(-1, 10), cov=cov1, size=N2)
    x = np.vstack((x1, x2))
    y = np.array([0] * N1 + [1] * N2)

    types = ('spherical', 'diag', 'tied', 'full')
    err = np.empty(len(types))
    bic = np.empty(len(types))
    for i, cov_type in enumerate(types):
        gmm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=0)
        gmm.fit(x)
        err[i] = 1 - accuracy_rate(gmm.predict(x), y)
        bic[i] = gmm.bic(x)
    print('error rate:', err.ravel())
    print('BIC:', bic.ravel())
    xpos = np.arange(4)
    plt.figure(facecolor='w')
    ax = plt.axes()
    b1 = ax.bar(xpos - 0.3, err, width=0.3, color='#77E0A0')
    b2 = ax.twinx().bar(xpos, bic, width=0.3, color='#FF8080')
    plt.grid(True)
    bic_min, bic_max = expand(bic.min(), bic.max())
    plt.ylim((bic_min, bic_max))
    plt.xticks(xpos, types)
    plt.legend([b1[0], b2[0]], (u'error rate', u'BIC'))
    plt.title(u'Error rate and BIC for each covariance type', fontsize=18)
    plt.show()

    # refit with the covariance type that minimizes BIC
    optimal = bic.argmin()
    gmm = GaussianMixture(n_components=2, covariance_type=types[optimal], random_state=0)
    gmm.fit(x)
    print('means = \n', gmm.means_)
    print('covariances = \n', gmm.covariances_)
    y_hat = gmm.predict(x)
    cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0'])
    cm_dark = mpl.colors.ListedColormap(['r', 'g'])
    x1_min, x1_max = x[:, 0].min(), x[:, 0].max()
    x2_min, x2_max = x[:, 1].min(), x[:, 1].max()
    x1_min, x1_max = expand(x1_min, x1_max)
    x2_min, x2_max = expand(x2_min, x2_max)
    x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]
    grid_test = np.stack((x1.flat, x2.flat), axis=1)
    grid_hat = gmm.predict(grid_test)
    grid_hat = grid_hat.reshape(x1.shape)
    if gmm.means_[0][0] > gmm.means_[1][0]:
        z = grid_hat == 0
        grid_hat[z] = 1
        grid_hat[~z] = 0
    plt.figure(figsize=(9, 7), facecolor='w')
    plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
    plt.scatter(x[:, 0], x[:, 1], s=30, c=y, marker='o', cmap=cm_dark, edgecolors='k')
    ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()
    plt.xlim((x1_min, x1_max))
    plt.ylim((x2_min, x2_max))
    plt.title(u'GMM tuning: covariance_type=%s' % types[optimal], fontsize=20)
    plt.grid()
    plt.show()
[[1 0]
 [0 2]]
error rate: [0.385   0.315   0.3     0.00125]
BIC: [7990.71460065 7855.56050855 8006.49834359 6845.79374805]
means = 
 [[ 2.88444448  6.69484552]
 [-0.97642254 10.06927801]]
covariances = 
 [[[ 2.87015473  6.64421303]
  [ 6.64421303 18.00318872]]

 [[ 0.91302546 -0.04298504]
  [-0.04298504  1.9603531 ]]]
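BIC can select more than the covariance type; the same criterion also picks the number of components. A small extension sketch under the same data array `x` (the `best_gmm_by_bic` helper name is mine, not from the original post):

import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(x, types=('spherical', 'diag', 'tied', 'full'), max_k=5):
    # fit every (covariance_type, n_components) pair and keep the lowest BIC
    best_bic, best_gmm = np.inf, None
    for cov_type in types:
        for k in range(1, max_k + 1):
            gmm = GaussianMixture(n_components=k, covariance_type=cov_type,
                                  random_state=0).fit(x)
            bic = gmm.bic(x)
            if bic < best_bic:
                best_bic, best_gmm = bic, gmm
    return best_gmm

On this dataset one would expect it to agree with the bar chart and settle on covariance_type='full' with two components, since BIC penalizes the extra parameters of 'full' only when they do not buy enough likelihood.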


4. Unsupervised classification of the iris data with EM

import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import pairwise_distances_argmin

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

iris_feature = u'sepal length', u'sepal width', u'petal length', u'petal width'


def expand(a, b, rate=0.05):
    d = (b - a) * rate
    return a - d, b + d


if __name__ == '__main__':
    path = 'iris.data'
    data = pd.read_csv(path, header=None)
    x_prime, y = data[np.arange(4)], data[4]
    y = pd.Categorical(y).codes

    n_components = 3
    feature_pairs = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
    plt.figure(figsize=(10, 9), facecolor='#FFFFFF')
    for k, pair in enumerate(feature_pairs):
        x = x_prime[pair]
        m = np.array([np.mean(x[y == i], axis=0) for i in range(3)])  # true class means
        print('true means = \n', m)
        gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)
        gmm.fit(x)
        print('predicted means = \n', gmm.means_)
        print('predicted covariances = \n', gmm.covariances_)
        y_hat = gmm.predict(x)
        # Euclidean distance; returns, for each true mean, the index of the closest fitted mean
        order = pairwise_distances_argmin(m, gmm.means_, axis=1, metric='euclidean')
        print('order:\t', order)
        # remap the predicted cluster ids to the true label order
        n_sample = y.size
        n_types = 3
        change = np.empty((n_types, n_sample), dtype=bool)
        for i in range(n_types):
            change[i] = y_hat == order[i]
        for i in range(n_types):
            y_hat[change[i]] = i
        acc = u'accuracy: %.2f%%' % (100 * np.mean(y_hat == y))
        print(acc)

        cm_light = mpl.colors.ListedColormap(['#FF8080', '#77E0A0', '#A0A0FF'])
        cm_dark = mpl.colors.ListedColormap(['r', 'g', '#6060FF'])
        x1_min, x2_min = x.min()
        x1_max, x2_max = x.max()
        x1_min, x1_max = expand(x1_min, x1_max)
        x2_min, x2_max = expand(x2_min, x2_max)
        x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]
        grid_test = np.stack((x1.flat, x2.flat), axis=1)
        grid_hat = gmm.predict(grid_test)
        change = np.empty((n_types, grid_hat.size), dtype=bool)
        for i in range(n_types):
            change[i] = grid_hat == order[i]
        for i in range(n_types):
            grid_hat[change[i]] = i
        grid_hat = grid_hat.reshape(x1.shape)
        plt.subplot(3, 2, k + 1)
        plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
        plt.scatter(x[pair[0]], x[pair[1]], s=30, c=y, marker='o', cmap=cm_dark, edgecolors='k')
        xx = 0.95 * x1_min + 0.05 * x1_max
        yy = 0.1 * x2_min + 0.9 * x2_max
        plt.text(xx, yy, acc, fontsize=14)
        plt.xlim((x1_min, x1_max))
        plt.ylim((x2_min, x2_max))
        plt.xlabel(iris_feature[pair[0]], fontsize=14)
        plt.ylabel(iris_feature[pair[1]], fontsize=14)
        plt.grid()
    plt.tight_layout(pad=2)
    plt.suptitle(u'Unsupervised classification of the iris data with EM', fontsize=20)
    plt.subplots_adjust(top=0.92)
    plt.show()
true means = 
 [[5.006 3.418]
 [5.936 2.77 ]
 [6.588 2.974]]
predicted means = 
 [[5.01493896 3.4404862 ]
 [6.6814044  3.0285628 ]
 [5.90114537 2.74385294]]
predicted covariances = 
 [[[0.1194876  0.08969867]
  [0.08969867 0.12147459]]

 [[0.36087007 0.05158991]
  [0.05158991 0.08923683]]

 [[0.27544608 0.08866062]
  [0.08866062 0.09382524]]]
order:	 [0 2 1]
accuracy: 79.33%
true means = 
 [[5.006 1.464]
 [5.936 4.26 ]
 [6.588 5.552]]
predicted means = 
 [[5.0060006  1.46399865]
 [6.58888904 5.63329718]
 [6.04240777 4.41742864]]
predicted covariances = 
 [[[0.12176525 0.01581631]
  [0.01581631 0.0295045 ]]

 [[0.48521779 0.36602418]
  [0.36602418 0.32601109]]

 [[0.28119672 0.23746926]
  [0.23746926 0.31503012]]]
order:	 [0 2 1]
accuracy: 91.33%
true means = 
 [[5.006 0.244]
 [5.936 1.326]
 [6.588 2.026]]
predicted means = 
 [[5.00607264 0.23754806]
 [6.56291563 2.02408174]
 [5.94928821 1.32089151]]
predicted covariances = 
 [[[0.1239802  0.01055412]
  [0.01055412 0.00914172]]

 [[0.41146765 0.0558402 ]
  [0.0558402  0.07689828]]

 [[0.29249903 0.07999787]
  [0.07999787 0.0509237 ]]]
order:	 [0 2 1]
accuracy: 96.00%
true means = 
 [[3.418 1.464]
 [2.77  4.26 ]
 [2.974 5.552]]
predicted means = 
 [[3.41800009 1.46400001]
 [2.80062882 4.43004172]
 [2.97017899 5.56073357]]
predicted covariances = 
 [[[0.14227691 0.01144799]
  [0.01144799 0.029505  ]]

 [[0.09376548 0.10702236]
  [0.10702236 0.34454954]]

 [[0.11477629 0.07760424]
  [0.07760424 0.38871245]]]
order:	 [0 1 2]
accuracy: 92.67%
true means = 
 [[3.418 0.244]
 [2.77  1.326]
 [2.974 2.026]]
predicted means = 
 [[3.41800003 0.244     ]
 [2.93629236 1.98607968]
 [2.79657869 1.31224583]]
predicted covariances = 
 [[[0.14227697 0.011208  ]
  [0.011208   0.011265  ]]

 [[0.11263095 0.06192916]
  [0.06192916 0.08966439]]

 [[0.09554395 0.04869984]
  [0.04869984 0.03787478]]]
order:	 [0 2 1]
accuracy: 93.33%
true means = 
 [[1.464 0.244]
 [4.26  1.326]
 [5.552 2.026]]
predicted means = 
 [[1.46399926 0.24399973]
 [4.32760641 1.36230588]
 [5.60395923 2.0545418 ]]
predicted covariances = 
 [[[0.02950475 0.00558391]
  [0.00558391 0.01126496]]

 [[0.25410036 0.09152045]
  [0.09152045 0.05088321]]

 [[0.29156635 0.03719346]
  [0.03719346 0.07073705]]]
order:	 [0 1 2]
accuracy: 97.33%
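The mean-matching via `pairwise_distances_argmin` is one way to score the clustering; metrics that are invariant to label permutation sidestep the remapping entirely. A sketch using scikit-learn's adjusted Rand index (assuming `x` is one feature pair and `y` the iris label codes, as in the loop above):

from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0).fit(x)
# ARI is 1.0 for a perfect clustering and near 0.0 for a random one,
# regardless of how the cluster ids happen to be numbered
print('ARI:', adjusted_rand_score(y, gmm.predict(x)))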

5. Comparing GMM and DPGMM (Bayesian Gaussian mixture)

import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
import scipy as sp
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import scipy.linalg  # ensures sp.linalg is available below

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False


def expand(a, b, rate=0.05):
    d = (b - a) * rate
    return a - d, b + d


if __name__ == '__main__':
    np.random.seed(0)
    cov1 = np.diag((1, 2))
    N1 = 500
    N2 = 300
    N = N1 + N2
    x1 = np.random.multivariate_normal(mean=(3, 2), cov=cov1, size=N1)
    m = np.array(((1, 1), (1, 3)))
    x1 = x1.dot(m)
    x2 = np.random.multivariate_normal(mean=(-1, 10), cov=cov1, size=N2)
    x = np.vstack((x1, x2))
    y = np.array([0] * N1 + [1] * N2)

    n_components = 3
    # plotting setup
    colors = '#A0FFA0', '#2090E0', '#FF8080'
    cm = mpl.colors.ListedColormap(colors)
    x1_min, x1_max = x[:, 0].min(), x[:, 0].max()
    x2_min, x2_max = x[:, 1].min(), x[:, 1].max()
    x1_min, x1_max = expand(x1_min, x1_max)
    x2_min, x2_max = expand(x2_min, x2_max)
    x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]
    grid_test = np.stack((x1.flat, x2.flat), axis=1)

    plt.figure(figsize=(9, 9), facecolor='w')
    plt.suptitle(u'GMM/DPGMM comparison', fontsize=23)

    # plain GMM
    ax = plt.subplot(211)
    gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)
    gmm.fit(x)
    centers = gmm.means_
    covs = gmm.covariances_
    print('GMM means = \n', centers)
    print('GMM covariances = \n', covs)
    y_hat = gmm.predict(x)
    grid_hat = gmm.predict(grid_test)
    grid_hat = grid_hat.reshape(x1.shape)
    plt.pcolormesh(x1, x2, grid_hat, cmap=cm)
    plt.scatter(x[:, 0], x[:, 1], s=30, c=y, cmap=cm, marker='o')
    clrs = list('rgbmy')
    for i, (center, cov) in enumerate(zip(centers, covs)):
        value, vector = sp.linalg.eigh(cov)
        width, height = value[0], value[1]
        # eigh returns eigenvectors as columns; use the first column for the orientation
        v = vector[:, 0] / sp.linalg.norm(vector[:, 0])
        angle = 180 * np.arctan(v[1] / v[0]) / np.pi
        e = Ellipse(xy=center, width=width, height=height,
                    angle=angle, color=clrs[i], alpha=0.5, clip_box=ax.bbox)
        ax.add_artist(e)
    ax1_min, ax1_max, ax2_min, ax2_max = plt.axis()
    plt.xlim((x1_min, x1_max))
    plt.ylim((x2_min, x2_max))
    plt.title(u'GMM', fontsize=20)
    plt.grid(True)

    # DPGMM
    dpgmm = BayesianGaussianMixture(n_components=n_components, covariance_type='full',
                                    max_iter=1000, n_init=5,
                                    weight_concentration_prior_type='dirichlet_process',
                                    weight_concentration_prior=0.1)
    dpgmm.fit(x)
    centers = dpgmm.means_
    covs = dpgmm.covariances_
    print('DPGMM means = \n', centers)
    print('DPGMM covariances = \n', covs)
    y_hat = dpgmm.predict(x)
    print(y_hat)
    ax = plt.subplot(212)
    grid_hat = dpgmm.predict(grid_test)
    grid_hat = grid_hat.reshape(x1.shape)
    plt.pcolormesh(x1, x2, grid_hat, cmap=cm)
    plt.scatter(x[:, 0], x[:, 1], s=30, c=y, cmap=cm, marker='o')
    for i, cc in enumerate(zip(centers, covs)):
        # only draw ellipses for components that actually received samples
        if i not in y_hat:
            continue
        center, cov = cc
        value, vector = sp.linalg.eigh(cov)
        width, height = value[0], value[1]
        v = vector[:, 0] / sp.linalg.norm(vector[:, 0])
        angle = 180 * np.arctan(v[1] / v[0]) / np.pi
        e = Ellipse(xy=center, width=width, height=height,
                    angle=angle, color='m', alpha=0.5, clip_box=ax.bbox)
        ax.add_artist(e)
    plt.xlim((x1_min, x1_max))
    plt.ylim((x2_min, x2_max))
    plt.title('DPGMM', fontsize=20)
    plt.grid(True)
    plt.tight_layout()
    plt.subplots_adjust(top=0.9)
    plt.show()
GMM means = 
 [[ 3.77430768  5.86579463]
 [ 6.0239399  11.61448122]
 [-0.98543679 10.0756839 ]]
GMM covariances = 
 [[[ 1.5383593   3.21210121]
  [ 3.21210121  9.04107582]]

 [[ 1.6667472   3.58655076]
  [ 3.58655076 10.40673433]]

 [[ 0.89079177 -0.02572518]
  [-0.02572518  1.95106592]]]
DPGMM means = 
 [[ 4.87807808  8.69858646]
 [-0.97320511 10.07279749]
 [ 2.68191465  9.21436833]]
DPGMM covariances = 
 [[[ 2.88507577  6.60477348]
  [ 6.60477348 17.92760296]]

 [[ 0.9632641  -0.02865042]
  [-0.02865042  1.98179578]]

 [[ 5.104264    1.12039777]
  [ 1.12039777  6.21286898]]]
[0 0 0 ... 0 1 1 ... 1]
(predicted labels, truncated: 500 samples assigned to component 0 followed by 300 assigned to component 1; the third component is never used)
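A quick way to see the Dirichlet-process prior at work is to compare the mixture weights: with n_components=3 the plain GMM keeps all three components busy, while the DPGMM should drive the weight of the superfluous component toward zero, which is why only labels 0 and 1 appear in the prediction above. A sketch assuming the fitted `gmm` and `dpgmm` objects from the script (the exact weight values are not shown in the original output):

import numpy as np

np.set_printoptions(precision=3, suppress=True)
print('GMM weights:  ', gmm.weights_)    # three non-negligible weights
print('DPGMM weights:', dpgmm.weights_)  # expect one weight close to zero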
