matlab注释分析高斯混合模型
源码from:http://blog.csdn.net/abcjennifer/article/details/8198352#reply
Rachel-Zhang 提供的源码。
高斯混合模型
没有输入参数判断,没有协方差是否可逆验证。
我要用语音处理的,电脑卡死机,逆矩阵不是所有的都有的。
或者用文库里面的代码
http://wenku.baidu.com/link?url=-bK88ETqMCURoMq1z1Lyiz4nzhpSmDq_J23o8yA0P1NVFM7GoFiH9UjC1b6R-jyYpimEuwYXebmU5WPd7Ek0UsnI_Zq6R5cjeMwmQVM5lWq
function varargout = gmm(X, K_or_centroids)
% ============================================================% Expectation-Maximization iteration implementation of% Gaussian Mixture Model.%% PX = GMM(X, K_OR_CENTROIDS)% [PX MODEL] = GMM(X, K_OR_CENTROIDS)%% - X: N-by-D data matrix.----------NxD的矩阵% - K_OR_CENTROIDS: either K indicating the number of--------单个数字K/[K] 或者 KxD矩阵的聚类% components or a K-by-D matrix indicating the% choosing of the initial K centroids.%% - PX: N-by-K matrix indicating the probability of each--------NxK矩阵,第N个数据点占第K个单一高斯概率密度% component generating each point.% - MODEL: a structure containing the parameters for a GMM:% MODEL.Miu: a K-by-D matrix.-------------KxD矩阵,初始化聚类样本,后面循环为每个数据点概率归一化后再聚类概率归一化后的均值矩阵% MODEL.Sigma: a D-by-D-by-K matrix.------DxDxK矩阵,初始化数据点对于聚类的方差[聚类等概率],后面循环为均值矩阵改变以后的方差% MODEL.Pi: a 1-by-K vector.-----------1xK矩阵,初始化数据点使用聚类的概率分布,后面循环高斯混合概率系数归一化的分母Nk/N,高斯混合加权系数向量% ============================================================threshold = 1e-15;%阈值[N, D] = size(X);%矩阵X的行N,列Dif isscalar(K_or_centroids)%判断值是否为1x1矩阵即单个数字K = K_or_centroids;%取[k]的k% randomly pick centroidsrndp = randperm(N);%返回一行由1到N个的整数无序排列的向量centroids = X(rndp(1:K), :);%取出X矩阵行打乱后的矩阵的前K行elseK = size(K_or_centroids, 1);%取矩阵K_or_centroids的行数centroids = K_or_centroids;%取矩阵K_or_centroids矩阵end% initial values[pMiu pPi pSigma] = init_params();%初始化 嵌套函数后面可见。KxD矩阵pMiu聚类采样点,1*K向量pPi使用同一个聚类采样点出现概率,D*D*K的矩阵pSigma矩阵X的列项j对于聚类采样点的协方差Lprev = -inf; %inf表示正无究大,-inf表示为负无究大while truePx = calc_prob();%NxK矩阵Px存放聚类点k(共有聚类点K个)对于全部数据点的正态分布生成样本的概率密度% new value for pGammapGamma = Px .* repmat(pPi, N, 1);%NxK矩阵pGamma在使用聚类采样点k的条件下某个数据点n生成样本概率密度(条件概率密度)pGamma = pGamma ./ repmat(sum(pGamma, 2), 1, K); %NxK矩阵pGamma对于使用数据点条件概率密度行向归一化。%求每个样本由第K个聚类,也叫“component“生成的概率)% new value for parameters of each ComponentNk = sum(pGamma, 1);%1xK矩阵Nk第k个聚类点被数据点使用的概率总和pMiu = diag(1./Nk) * pGamma' * X;%KxD矩阵 重新计算每个component的均值 维数变化KxK*KxN*NxD=KxD 数据点进行 聚类概率归一化.条件概率密度.数据点=得到均值(期望)%均值=Σ概率Pi*数据点某一属性Xi;而这里还多了个聚类概率NkpPi = Nk/N; %更新混合高斯的加权系数for kk = 1:K %重新计算每个component的协方差矩阵Xshift = X-repmat(pMiu(kk, :), N, 1);%NxD矩阵Xshift在某一个聚类点的情况下,每个属性在这个聚类下的对均值(期望)差数(X-μi)pSigma(:, :, kk) = (Xshift' * ...(diag(pGamma(:, kk)) * Xshift)) / Nk(kk);%DxD矩阵pSigma(::,i) 概率Pi 聚类概率=1/Nk(i)%第i个方差矩阵= Σ(X-μi)转置*概率Pi*(X-μi)*第i个聚类概率end% check for convergenceL = sum(log(Px*pPi')); %求混合高斯分布的似然函数if L-Lprev < threshold %随着迭代次数的增加,似然函数越来越大,直至不变break; %似然函数收敛则退出endLprev = L;endif nargout == 1 %如果返回是一个参数的话,那么varargout=Px;varargout = {Px};else %否则,返回[Px model],其中model是结构体model = [];model.Miu = pMiu;model.Sigma = pSigma;model.Pi = pPi;varargout = {Px, model};endfunction [pMiu pPi pSigma] = init_params()pMiu = centroids;%得X矩阵中的任意K行,KxD矩阵 聚类点pPi = zeros(1, K);%获取K维零向量[0 0 ...0] 加权系数(每行数据与聚类中点最小距离的概率)pSigma = zeros(D, D, K);%获取K个DxD的零矩阵%distmat为D维距离差平方和% hard assign x to each centroids %X有NxD;sum(X.*X, 2)为Nx1; repmat(sum(X.*X, 2), 1, K)行向整体1倍,列向整体K倍;结果NxKdistmat = repmat(sum(X.*X, 2), 1, K) + ... %distmat第j行的第i个元素表示第j个数据与第i个聚类点的距离,如果数据有4个,聚类2个,那么distmat就是4*2矩阵repmat(sum(pMiu.*pMiu, 2)', N, 1) - 2*X*pMiu'; %sum(A,2)结果为每行求和列向量,第i个元素是第i行的求和;[dummy labels] = min(distmat, [], 2); %返回列向量dummy和labels,dummy向量记录distmat的每行的最小值,labels向量记录每行最小值的列号(多个取第一个),即是第几个聚类,labels是N×1列向量,N为样本数for k=1:KXk = X(labels == k, :); %把标志为同一个聚类的样本组合起来pPi(k) = size(Xk, 1)/N; %求混合高斯模型的加权系数,pPi为1*K的向量 pSigma(:, :, k) = cov(Xk); %如果X为Nx1数组,那么cov(Xk)求单个高斯模型的协方差矩阵%如果X为NxD(D>1)的矩阵,那么cov(Xk)求聚类样本的协方差矩阵%cov()求出的为方阵--《概率论与数理统计》-多维随机变量的数字特征,且是对称矩阵(上三角和下三角对称)--《线性代数》%pSigma为D*D*K的矩阵endendfunction Px = calc_prob()Px = zeros(N, K);%NxK零矩阵for k = 1:KXshift = X-repmat(pMiu(k, :), N, 1); %NxD矩阵Xshift表示为对于一个1xD聚类点向量行向增N倍的样本矩阵-Uk,第i行表示xi-ukinv_pSigma = inv(pSigma(:, :, k)); %求协方差矩阵的逆,pSigmaD*D*K的矩阵, inv_pSigmaD*D的矩阵,Σ^-1tmp = sum((Xshift*inv_pSigma) .* Xshift, 2); %tmp为N*1矩阵,第i行表示(xi-uk)^T*Sigma^-1*(xi-uk) 即-(x-μ)转置x(Σ^-1).(x-μ) ----矩阵有叉乘(矩阵乘)和点乘coef = (2*pi)^(-D/2) * sqrt(det(inv_pSigma));%求多维正态分布中指数前面的系数,(2π)^(-d/2)*|(Σ^-(1/2))|Px(:, k) = coef * exp(-0.5*tmp); %NxK矩阵求单独一个D维正态分布生成样本的概率密度或贡献endendend
%聚类 数据挖掘
%矩阵算法 线性代数
%概率密度,近似然 数理统计
被我稍微修改备注的
function model = gmmEM(data,Kk,option)
%
% K 为model数分为几个聚类
% Reference: PRML p438-439
tic
if nargin < 3option.eps = 1e-12;%收敛设置option.maxiter = 1000;%end
global num_data K;
K=Kk;
X = data'; %X为D*N型数据,跟PRML对样本数据描述相反
[dim,num_data] = size(X);
%Initialize
%-------------------------------
%K = numel(unique(data.y));
[inx, C,~] = kmeans(data,K);%[inx, C,~] = kmeans(X',K);把对象分为K个聚类
mu = C';%聚类点均值转置
%inx D个数据项的聚类编号
pai = zeros(1,K);%加权系数清零
E = zeros(dim,dim,K);for k=1:Kpai(k) = sum(inx==k); %inx==index 统一聚类的个数统计E(:,:,k) = eye(dim);%E单位矩阵 初始化协方差矩阵
end
pai = pai/num_data;%各个聚类的加权系数
iter = 0;%游标、统计迭代次数
log_val = logGMM(X,mu,E,pai);
tryiter
while iter<option.maxiter iter = iter+1; % E step Yz = compu_Yz(X,mu,E,pai);%聚类k归一化后的多维条件正态概率 %前面使用了数据点使用k聚类归一化(纵向归一化)而后造成没有横向归一化% M step 高斯迭代(横向归一化需要进行均值,协方差矩阵更新)NK = sum(Yz);%每个聚类被使用的比例总和存放1Xk, 用于后面的横向归一化for k = 1:Kmu(:,k) = 1/NK(k)*X*Yz(:,k); %均值矩阵更新 DXN NXK%均值=Σ概率Pi*数据点某一属性Xi;而这里还多了个聚类概率Nk Ek = zeros(dim,dim);%协方差矩阵清零%n=1:num_data;for n=1:num_dataEk = Ek+Yz(n,k)*(X(:,n)-mu(:,k))*(X(:,n)-mu(:,k))';endE(:,:,k) = 1/NK(k)*Ek; %第i个方差矩阵= Σ(X-μi)转置*概率Pi*(X-μi)*第i个聚类概率 endpai = NK/num_data;%检查是否收敛 log_val_new = logGMM(X,mu,E,pai);if abs(log_val_new-log_val) < option.eps model.Yz = Yz;model.mu = mu;model.E = E;model.iter = iter;return;endlog_val = log_val_new;if mod(iter,10)==0disp(['-----进行第' num2str(iter) '迭代...'])end
end
catchfprintf('出错!!\n已经迭代次数:%d\n最大迭代次数\n',iter,option.maxiter);model.Yz = Yz;model.mu = mu;model.E = E;model.iter = iter;%return;
end
if iter==option.maxiterfprintf('达到最大迭代次数%d',maxiter)model.Yz = Yz;model.mu = mu;model.E = E;model.iter = iter;
end
toc
%model.usedTime = toc-tic;
end%------------------------------------
function val = logGMM(X,mu,E,pai)
%样本数据X(行数据项,列属性)。均值矩阵。协方差矩阵(方阵,对角对称)。加权系数
global num_data K
val = 0;
%N = size(X,2);
%K = size(mu,2);
for n=1:num_datatmp = 0;for k=1:K p = mvnpdf(X(:,n),mu(:,k),E(:,:,k));tmp = tmp+pai(k)*p;endval = val + log(tmp);
end
end%---------------------------------------------------------------
function Yz = compu_Yz(X,mu,E,pai)
%样本数据X(行数据项,列属性)。均值矩阵。协方差矩阵。加权系数global num_data KYz = zeros(num_data,K);%清零for n = 1:num_dataY_nK = zeros(1,K);%清零 第k个聚类的for k=1:KY_nK(k) = pai(k)*mvnpdf(X(:,n),mu(:,k),E(:,:,k));%库内mvnpdf 返回DX1数组Y_nk (DX1;DX1;DXD)%mvnpdf 单独一个N维正态分布生成样本的概率密度(离散此处值当作密度)%pai(k)*正态概率密度=条件概率密度(在使用某个聚类的前提下)endYz(n,:) = Y_nK/sum(Y_nK);%概率归一化 (一个数据使用k个聚类的比例)%求第n个数据分别在聚类中的单独一个N维正态分布生成样本的概率(密度)end
end
matlab自带mvnpdf的源代码
function y = mvnpdf(X, Mu, Sigma)
%MVNPDF Multivariate normal probability density function (pdf).
% Y = MVNPDF(X) returns the probability density of the multivariate normal
% distribution with zero mean and identity covariance matrix, evaluated at
% each row of X. Rows of the N-by-D matrix X correspond to observations or
% points, and columns correspond to variables or coordinates. Y is an
% N-by-1 vector.
%
% Y = MVNPDF(X,MU) returns the density of the multivariate normal
% distribution with mean MU and identity covariance matrix, evaluated
% at each row of X. MU is a 1-by-D vector, or an N-by-D matrix, in which
% case the density is evaluated for each row of X with the corresponding
% row of MU. MU can also be a scalar value, which MVNPDF replicates to
% match the size of X.
%
% Y = MVNPDF(X,MU,SIGMA) returns the density of the multivariate normal
% distribution with mean MU and covariance SIGMA, evaluated at each row
% of X. SIGMA is a D-by-D matrix, or an D-by-D-by-N array, in which case
% the density is evaluated for each row of X with the corresponding page
% of SIGMA, i.e., MVNPDF computes Y(I) using X(I,:) and SIGMA(:,:,I).
% If the covariance matrix is diagonal, containing variances along the
% diagonal and zero covariances off the diagonal, SIGMA may also be
% specified as a 1-by-D matrix or a 1-by-D-by-N array, containing
% just the diagonal. Pass in the empty matrix for MU to use its default
% value when you want to only specify SIGMA.
%
% If X is a 1-by-D vector, MVNPDF replicates it to match the leading
% dimension of MU or the trailing dimension of SIGMA.
%
% Example:
%
% mu = [1 -1]; Sigma = [.9 .4; .4 .3];
% [X1,X2] = meshgrid(linspace(-1,3,25)', linspace(-3,1,25)');
% X = [X1(:) X2(:)];
% p = mvnpdf(X, mu, Sigma);
% surf(X1,X2,reshape(p,25,25));
%
% See also MVTPDF, MVNCDF, MVNRND, NORMPDF.% Copyright 1993-2011 The MathWorks, Inc.if nargin<1error(message('stats:mvnpdf:TooFewInputs'));
elseif ndims(X)~=2error(message('stats:mvnpdf:InvalidData'));
end% Get size of data. Column vectors provisionally interpreted as multiple scalar data.
[n,d] = size(X);
if d<1error(message('stats:mvnpdf:TooFewDimensions'));
end% Assume zero mean, data are already centered
if nargin < 2 || isempty(Mu)X0 = X;% Get scalar mean, and use it to center data
elseif numel(Mu) == 1X0 = X - Mu;% Get vector mean, and use it to center data
elseif ndims(Mu) == 2[n2,d2] = size(Mu);if d2 ~= d % has to have same number of coords as Xerror(message('stats:mvnpdf:ColSizeMismatch'));elseif n2 == n % lengths matchX0 = X - Mu;elseif n2 == 1 % mean is a single row, rep it out to match dataX0 = bsxfun(@minus,X,Mu);elseif n == 1 % data is a single row, rep it out to match meann = n2;X0 = bsxfun(@minus,X,Mu); else % sizes don't matcherror(message('stats:mvnpdf:RowSizeMismatch'));endelseerror(message('stats:mvnpdf:BadMu'));
end% Assume identity covariance, data are already standardized
if nargin < 3 || isempty(Sigma)% Special case: if Sigma isn't supplied, then interpret X% and Mu as row vectors if they were both column vectorsif (d == 1) && (numel(X) > 1)X0 = X0';d = size(X0,2);endxRinv = X0;logSqrtDetSigma = 0;% Single covariance matrix
elseif ndims(Sigma) == 2sz = size(Sigma);if sz(1)==1 && sz(2)>1% Just the diagonal of Sigma has been passed in.sz(1) = sz(2);sigmaIsDiag = true;elsesigmaIsDiag = false;end% Special case: if Sigma is supplied, then use it to try to interpret% X and Mu as row vectors if they were both column vectors.if (d == 1) && (numel(X) > 1) && (sz(1) == n)X0 = X0';d = size(X0,2);end%Check that sigma is the right sizeif sz(1) ~= sz(2)error(message('stats:mvnpdf:BadCovariance'));elseif ~isequal(sz, [d d])error(message('stats:mvnpdf:CovSizeMismatch'));elseif sigmaIsDiagif any(Sigma<=0)error(message('stats:mvnpdf:BadDiagSigma'));endR = sqrt(Sigma);xRinv = bsxfun(@rdivide,X0,R);logSqrtDetSigma = sum(log(R));else% Make sure Sigma is a valid covariance matrix[R,err] = cholcov(Sigma,0);if err ~= 0error(message('stats:mvnpdf:BadMatrixSigma'));end% Create array of standardized data, and compute log(sqrt(det(Sigma)))xRinv = X0 / R;logSqrtDetSigma = sum(log(diag(R)));endend% Multiple covariance matrices
elseif ndims(Sigma) == 3sz = size(Sigma);if sz(1)==1 && sz(2)>1% Just the diagonal of Sigma has been passed in.sz(1) = sz(2);Sigma = reshape(Sigma,sz(2),sz(3))';sigmaIsDiag = true;elsesigmaIsDiag = false;end% Special case: if Sigma is supplied, then use it to try to interpret% X and Mu as row vectors if they were both column vectors.if (d == 1) && (numel(X) > 1) && (sz(1) == n)X0 = X0';[n,d] = size(X0);end% Data and mean are a single row, rep them out to match covarianceif n == 1 % already know size(Sigma,3) > 1n = sz(3);X0 = repmat(X0,n,1); % rep centered data out to match covend% Make sure Sigma is the right sizeif sz(1) ~= sz(2)error(message('stats:mvnpdf:BadCovarianceMultiple'));elseif (sz(1) ~= d) || (sz(2) ~= d) % Sigma is a stack of dxd matriceserror(message('stats:mvnpdf:CovSizeMismatchMultiple'));elseif sz(3) ~= nerror(message('stats:mvnpdf:CovSizeMismatchPages'));elseif sigmaIsDiagif any(any(Sigma<=0))error(message('stats:mvnpdf:BadDiagSigma'));endR = sqrt(Sigma);xRinv = X0./R;logSqrtDetSigma = sum(log(R),2);else% Create array of standardized data, and vector of log(sqrt(det(Sigma)))xRinv = zeros(n,d,superiorfloat(X0,Sigma));logSqrtDetSigma = zeros(n,1,class(Sigma));for i = 1:n% Make sure Sigma is a valid covariance matrix[R,err] = cholcov(Sigma(:,:,i),0);if err ~= 0error(message('stats:mvnpdf:BadMatrixSigmaMultiple'));endxRinv(i,:) = X0(i,:) / R;logSqrtDetSigma(i) = sum(log(diag(R)));endendendelseif ndims(Sigma) > 3error(message('stats:mvnpdf:BadCovariance'));
end% The quadratic form is the inner products of the standardized data
quadform = sum(xRinv.^2, 2);y = exp(-0.5*quadform - logSqrtDetSigma - d*log(2*pi)/2);
matlab注释分析高斯混合模型相关推荐
- 高斯混合模型图像聚类、图像生成、可视化分析实战
高斯混合模型图像聚类.图像生成.可视化分析实战 目录 高斯混合模型图像聚类.图像生成.可视化分析实战 PCA图像数据降维
- K-means算法、高斯混合模型 matlab
K-means算法.高斯混合模型 简介: 本节介绍STANFORD机器学习公开课中的第12.13集视频中的算法:K-means算法.高斯混合模型(GMM).(9.10.11集不进行介绍,略过了哈) 一 ...
- c++版本的高斯混合模型的源代码完全注释
之前看到过C版本的,感觉写的很长,没有仔细看,但是C++版本的写的还是很不错的.我仔细看了一下,并对内容进行了仔细的注释,如果有人没有看懂,欢迎留言讨论.先看一眼头文件,在background_seg ...
- 数字图像处理拓展题目——利用Matlab实现动态目标检测 二帧差法、ViBe法、高斯混合模型法,可应用于学生递东西行为检测
1.二帧差法实现动态目标检测 先上效果图: 利用GUI界面显示出来效果图为: 实现流程 1.利用matlab中的VideoReader函数读取视频流. 2.帧差法:获得视频帧数,用for循环对图像每相 ...
- GMM高斯混合模型聚类的EM估计过程matlab仿真
目录 1.算法概述 2.仿真效果 3.MATLAB源码 1.算法概述 高斯混合模型(Gaussian Mixed Model)指的是多个高斯分布函数的线性组合,理论上GMM可以拟合出任意类型的分布,通 ...
- matlab高斯拟合多峰,MATLAB用“fitgmdist”函数拟合高斯混合模型(一维数据)
MATLAB用"fitgmdist"函数拟合高斯混合模型(一维数据) 在MATLAB中"fitgmdist"的用法及其GMM聚类算法中介绍过"fitg ...
- 高斯混合模型聚类(GMM)matlab实现
Gaussian Mixture Model ,就是假设数据服从 Mixture Gaussian Distribution ,换句话说,数据可以看作是从数个 Gaussian Distributio ...
- 高斯混合模型(GaussianMixture Model, GMM)聚类、可视化最优协方差形式、通过TSNE进行结果可视化分析、抽取核心特征因子
高斯混合模型模型: sklearn.mixture.GaussianMixture 混合高斯模型(Gaussian Mixture Model,简称GMM)是用高斯概率密度函数(正态分布曲线)精确地量 ...
- GMM高斯混合模型及EM算法(matlab实现)
单元 %绘制男女生身高的GMM clc clear all %男女生共取2000人,女生平均身高163,男声平均身高180 male=180+sqrt(10)*randn(1,1000); %产生均值 ...
最新文章
- java批量执行sql语句_Java中批量执行sql语句
- 2021-07-27 对labelme标注出来的JSON文件进行灰度图转化(标签值0.1.2.3.4)
- mysql 数据范围总结
- PHP 7.0新增特性详解
- 车和家李想:在智能电动车的红海里,这是我唯一能够胜出的机会所在...
- Error querying database. Cause: java.sql.SQLSyntaxErrorException: ORA-00911: 无效字符
- how is webdynpro component class initialized
- 双柱状图柱子数量比较多_一条代码完成堆叠柱状图-冲击图的操作-终结版
- drools动态配置规则_微服务实战系列(八)-网关springcloud gateway自定义规则
- js知识学习图谱,新手必看
- Loadrunner如何监控Linux系统资源
- 怎么在大数据里面删除不了_数据库删除大数据怎么操作
- 大数据系统架构是什么
- 西安工程大学计算机是几本专业,2016年西安工程大学计算机科学与技术(卓越班)专业在陕西录取分数线...
- 托福、雅思、托业有什么区别?
- 笨方法学python在线_“笨办法”学Python(第3版)
- python爬取斗鱼主播图片_F_hawk189_新浪博客
- java报错java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that
- java环境JDK的安装及判断是否安装成功
- 球球速刷LC之DP问题 三轮