3. GMM Model: the EM Algorithm

Project template and description: link

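The data for this experiment are English recordings of 11 spoken characters: the digits 0-9 (where 0 carries the label Z, for "zero") and the letter o. Both the raw recording and its 39-dimensional MFCC features are provided for every utterance. Each character is modeled by its own GMM. At test time, the character whose model gives the highest log-likelihood for an utterance is taken as the predicted label (target) of that utterance.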
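The decision rule described above can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the project template: \lambda_c is the GMM of character c, X = (x_1, ..., x_T) are the MFCC frames of one utterance, and the frames are treated as independent, which is what summing per-frame log-likelihoods amounts to.

```latex
\begin{align}
p(x_t \mid \lambda_c) &= \sum_{k=1}^{K} \pi_{c,k}\,
    \mathcal{N}\!\left(x_t \mid \mu_{c,k}, \Sigma_{c,k}\right) \\
\log p(X \mid \lambda_c) &= \sum_{t=1}^{T} \log p(x_t \mid \lambda_c) \\
\hat{c} &= \arg\max_{c}\; \log p(X \mid \lambda_c)
\end{align}
```

Here K is the number of Gaussian components per model, and \hat{c} is the predicted label.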

Training data: 330 utterances, 30 per character, 11 characters
Test data: 110 utterances, 10 per character, 11 characters

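digit_test/digit_train contain the test and training data, including:

wav.scp, maps each utterance id to the path of its wav file (relative paths of the wav files used)
feats.scp, one of the feature files extracted by the speech recognition toolkit Kaldi; maps each utterance id to the actual path and position of its feature data
feats.ark, one of the feature files extracted by Kaldi; the features themselves are stored in the ark file, in binary
text, maps each utterance id to its label; in this experiment the label (the text of the utterance) can only be one of the 11 characters 0-9 and o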
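To make the feats.scp mapping concrete, here is a minimal sketch of how such a file could be parsed by hand. It assumes the usual Kaldi layout of one `utt_id path/to/feats.ark:offset` entry per line; the function names `parse_scp_line` and `read_scp` and the sample line in the usage note are hypothetical, and the project's own utils.py and kaldi_io.py may do this differently.

```python
def parse_scp_line(line):
    """Split one feats.scp line into (utt_id, ark_path, byte_offset)."""
    # The first whitespace separates the utterance id from the rest.
    utt_id, location = line.strip().split(maxsplit=1)
    # Split at the LAST ':' so that paths containing ':' still work.
    ark_path, _, offset = location.rpartition(":")
    return utt_id, ark_path, int(offset)


def read_scp(path):
    """Read a whole feats.scp file into {utt_id: (ark_path, byte_offset)}."""
    table = {}
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                utt_id, ark_path, offset = parse_scp_line(line)
                table[utt_id] = (ark_path, offset)
    return table
```

For a made-up entry, `parse_scp_line("utt_001 /data/feats.ark:17")` yields the utterance id, the ark path, and the integer offset 17. The offset tells a reader where in the binary ark file that utterance's feature matrix starts.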

Programs:

kaldi_io.py provides functions for reading Kaldi features

utils.py provides a feature-reading helper

gmm_estimator.py is the core code; it provides GMM training and testing
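For reference, the standard EM updates for a GMM, which the training code is meant to carry out, can be written as below. The symbols are assumptions chosen here for illustration: x_i is the i-th frame, N the number of frames, K the number of components, and \gamma_{ik} the posterior probability (responsibility) of component k for frame i.

```latex
\begin{align}
\gamma_{ik} &= \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                    {\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
  && \text{(E-step)} \\
N_k &= \sum_{i=1}^{N} \gamma_{ik}, \qquad \pi_k = \frac{N_k}{N} \\
\mu_k &= \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, x_i \\
\Sigma_k &= \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\,
            (x_i - \mu_k)(x_i - \mu_k)^{\top}
  && \text{(M-step)}
\end{align}
```

Each iteration of these updates does not decrease the log-likelihood of the training data.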

gmm_estimator.py:

from utils import *
import numpy as np
import kaldi_io
import scipy.cluster.vq as vq

# As the logic shows, the unit of observation is one frame's 39-dim MFCC
# vector, not a whole utterance.

# Number of K-means clusters (= number of Gaussian components)
num_gaussian = 5
# Number of EM iterations for the GMM
num_iterations = 5
# Labels of the 11 classes
targets = ['Z', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']


class GMM:
    def __init__(self, D, K=5):
        assert(D > 0)
        self.dim = D
        self.K = K
        # K-means initialization
        # mu[5, 39]: centre of each cluster, i.e. the initial means
        # sigma[5, 39, 39]: the initial covariance matrices
        # pi[5,]: fraction of all sample points that fall in each cluster,
        #         i.e. the initial pi_k
        self.mu, self.sigma, self.pi = self.kmeans_initial()

    def kmeans_initial(self):
        mu = []
        sigma = []
        # MFCC feature matrix of all 330 utterances, [18539, 39]. Utterances
        # differ in duration and hence in frame count; 18539 is the total.
        data = read_all_data('train/feats.scp')
        # centroids[5, 39]: 39-dim centres of the K Gaussians;
        # labels[num_frames,]: the cluster each frame belongs to.
        # minit="points" picks k random observations (rows) as the initial
        # centroids and iter=100 runs 100 iterations, so every GMM object
        # gets a different result from this step.
        (centroids, labels) = vq.kmeans2(data, self.K, minit="points", iter=100)
        # One list per cluster, holding the observations assigned to it
        clusters = [[] for i in range(self.K)]
        for (l, d) in zip(labels, data):
            clusters[l].append(d)
        for cluster in clusters:
            # axis=0 averages over the rows (frames), giving one mean per
            # dimension: the initial mean of this Gaussian
            mu.append(np.mean(cluster, axis=0))
            # Covariance matrix of the cluster (rows are observations)
            sigma.append(np.cov(cluster, rowvar=False))
        pi = np.array([len(c) * 1.0 / len(data) for c in clusters])
        return mu, sigma, pi

    # Gaussian probability density
    def gaussian(self, x, mu, sigma):
        """Calculate gaussian probability.

        :param x: The observed data, dim*1.
        :param mu: The mean vector of gaussian, dim*1
        :param sigma: The covariance matrix, dim*dim
        :return: the gaussian probability, scalar
        """
        D = x.shape[0]
        det_sigma = np.linalg.det(sigma)
        # 0.0001 is added to every entry of sigma before inversion as a
        # guard against a near-singular matrix
        inv_sigma = np.linalg.inv(sigma + 0.0001)
        mahalanobis = np.dot(np.transpose(x - mu), inv_sigma)
        mahalanobis = np.dot(mahalanobis, (x - mu))
        const = 1 / ((2 * np.pi) ** (D / 2))
        return const * (det_sigma) ** (-0.5) * np.exp(-0.5 * mahalanobis)

    # Log-likelihood of the data under the model
    def calc_log_likelihood(self, X):
        """Calculate log likelihood of GMM

        param: X: A matrix including data samples, num_samples * D
        return: log likelihood of current model
        """
        m = X.shape[0]
        pdfs = np.zeros((m, self.K))
        for k in range(self.K):
            for i in range(m):
                # Weighted density pi_k * N(x_i | mu_k, sigma_k)
                pdfs[i, k] = self.pi[k] * self.gaussian(X[i], self.mu[k], self.sigma[k])
        return np.sum(np.log(np.sum(pdfs, axis=1)))

    def em_estimator(self, X):
        """Update parameters of GMM

        param: X: A matrix including data samples, num_samples * D
        return: log likelihood of updated model
        """
        # pdfs[n, k]: weighted probability that frame n was generated by
        # the k-th Gaussian
        pdfs = np.zeros((X.shape[0], self.K))
        for k in range(self.K):
            for i in range(X.shape[0]):
                # Pass mu and sigma of the k-th Gaussian and the observation
                # X[i]; weight the returned density by pi_k
                pdfs[i, k] = self.pi[k] * self.gaussian(X[i], self.mu[k], self.sigma[k])
        # E-step: posterior (responsibility) gamma
        gamma = pdfs / np.sum(pdfs, axis=1).reshape(-1, 1)
        # M-step: update pi, mu, sigma
        pi = np.sum(gamma, axis=0) / np.sum(gamma)
        mu = np.zeros((self.K, self.dim))
        sigma = np.zeros((self.K, self.dim, self.dim))
        for k in range(self.K):
            mu[k] = np.average(X, axis=0, weights=gamma[:, k])
            cov = np.zeros((self.dim, self.dim))
            for i in range(X.shape[0]):
                tmp = (X[i] - mu[k]).reshape(-1, 1)
                cov += gamma[i, k] * np.dot(tmp, tmp.T)
            sigma[k, :, :] = cov / np.sum(gamma[:, k])
        self.pi = pi
        self.mu = mu
        self.sigma = sigma
        log_llh = self.calc_log_likelihood(X)
        return log_llh


def train(gmms, num_iterations=num_iterations):
    # dict_utt2feat{330}: scp entries split on whitespace, keyed by utterance id
    # dict_target2utt{11, 30}: utterance ids of each class, keyed by class label
    dict_utt2feat, dict_target2utt = read_feats_and_targets('train/feats.scp', 'train/text')
    # Apart from the random K-means initialization, this is where the GMMs
    # become different: gmms[target] is trained only on the utterances
    # labelled target.
    for target in targets:
        # feats[n, 39]: n is the total frame count of the 30 utterances of this class
        feats = get_feats(target, dict_utt2feat, dict_target2utt)
        # Run num_iterations EM iterations on gmms[target]
        for i in range(num_iterations):
            log_llh = gmms[target].em_estimator(feats)
        print('GMM-Type \'' + target + '\' training succeeded!')
    return gmms


def test(gmms):
    correction_num = 0
    error_num = 0
    acc = 0.0
    dict_utt2feat, dict_target2utt = read_feats_and_targets('test/feats.scp', 'test/text')
    # Label of every utterance, keyed by utterance id
    dict_utt2target = {}
    for target in targets:
        utts = dict_target2utt[target]
        for utt in utts:
            dict_utt2target[utt] = target
    # Loop over the test set
    for utt in dict_utt2feat.keys():
        # All frames of one utterance, [n, 39]
        feats = kaldi_io.read_mat(dict_utt2feat[utt])
        scores = []
        for target in targets:
            scores.append(gmms[target].calc_log_likelihood(feats))
        # Label of the model with the highest log-likelihood
        predict_target = targets[scores.index(max(scores))]
        # Count correct and wrong predictions
        if predict_target == dict_utt2target[utt]:
            correction_num += 1
        else:
            error_num += 1
    acc = correction_num * 1.0 / (correction_num + error_num)
    print('Testing finished!')
    print('Tested %d utterances in total' % (error_num + correction_num))
    print('Correct: ' + str(correction_num) + ', wrong: ' + str(error_num) + ', accuracy: ' + str(acc))
    return acc


def main():
    gmms = {}
    for target in targets:
        gmms[target] = GMM(39, K=num_gaussian)  # Initial model
    print('GMM Initialization succeeded!')
    # Training fits each GMM by maximum likelihood, so that gmms[target]
    # becomes the model most likely to generate the features of class target
    gmms = train(gmms)
    acc = test(gmms)
    with open('acc.txt', 'w') as fid:
        fid.write(str(acc))


if __name__ == '__main__':
    main()
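The listing above evaluates each 39-dimensional Gaussian density directly and only then takes the logarithm. For frames far from every component the density can underflow to zero, which makes the log-likelihood minus infinity. A common remedy is to stay in the log domain and combine components with the log-sum-exp trick. The following is a minimal sketch under that assumption, not part of the project template; `log_gaussian` and `gmm_log_likelihood` are hypothetical names, and the function assumes the standard weighted mixture with weights `pi`.

```python
import numpy as np


def log_gaussian(X, mu, sigma):
    """Log density of N(mu, sigma) for every row of X, shape [n, D]."""
    D = X.shape[1]
    diff = X - mu
    # slogdet returns (sign, log|det|) and avoids overflow/underflow of det.
    _, logdet = np.linalg.slogdet(sigma)
    # Solve sigma * Y = diff^T instead of forming the inverse explicitly.
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + maha)


def gmm_log_likelihood(X, pi, mu, sigma):
    """Total log-likelihood of X under a GMM, computed in the log domain."""
    K = len(pi)
    # log_terms[i, k] = log(pi_k) + log N(x_i | mu_k, sigma_k)
    log_terms = np.stack(
        [np.log(pi[k]) + log_gaussian(X, mu[k], sigma[k]) for k in range(K)],
        axis=1,
    )
    # log-sum-exp over components: subtract the row maximum before exp.
    m = np.max(log_terms, axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.sum(np.exp(log_terms - m), axis=1))
    return float(np.sum(per_frame))
```

Called as `gmm_log_likelihood(feats, gmm.pi, gmm.mu, gmm.sigma)`, this could stand in for `calc_log_likelihood`. It is also vectorized over frames, so it avoids the per-frame Python loop.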

Terminal output:

GMM Initialization succeeded!
GMM-Type 'Z' training succeeded!
GMM-Type '1' training succeeded!
GMM-Type '2' training succeeded!
GMM-Type '3' training succeeded!
GMM-Type '4' training succeeded!
GMM-Type '5' training succeeded!
GMM-Type '6' training succeeded!
GMM-Type '7' training succeeded!
GMM-Type '8' training succeeded!
GMM-Type '9' training succeeded!
GMM-Type 'O' training succeeded!
Testing finished!
Tested 110 utterances in total
Correct: 109, wrong: 1, accuracy: 0.990909090909091
Process finished with exit code 0
