Whiteboard Derivation Series in PyTorch: Hidden Markov Models

Roundup of my HMM posts

  • All three problems in one post (separate single-problem posts were also published, for readers put off by long articles)
  • The probability computation (evaluation) problem
  • The learning problem
  • The decoding problem
  • Part-of-speech tagging

The State Transition Matrix and the Observation Probability Matrix

State transition matrix
$$A=\left[a_{ij}\right]_{N\times N},\qquad a_{ij}=P(i_{t+1}=q_j\mid i_t=q_i)$$
Observation probability matrix
$$B=\left[b_j(k)\right]_{N\times M},\qquad b_j(k)=P(o_t=v_k\mid i_t=q_j)$$

One Model

Below I use the box-and-ball model to explain the parameters of an HMM.

Suppose there are 4 boxes, each containing red and white balls:

| Box | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Red balls | 5 | 3 | 6 | 8 |
| White balls | 5 | 7 | 4 | 2 |

Balls are drawn according to the following procedure to produce an observation sequence of colors. To start, pick one of the 4 boxes uniformly at random, draw a ball from it, record its color, and put it back. Then move from the current box to the next one by this rule: if the current box is box 1, the next box is always box 2; if the current box is box 2 or box 3, move to the box on its left with probability 0.4 or to the box on its right with probability 0.6; if the current box is box 4, stay at box 4 or move to box 3, each with probability 0.5. After the move, draw a ball from the new box, record its color, and put it back. Repeating this for 5 draws yields an observation sequence of ball colors:
$$O=\{\text{red},\text{red},\text{white},\text{white},\text{red}\}$$
Throughout this process the observer sees only the sequence of colors; which box each ball came from, i.e. the sequence of boxes, is unobservable.

  1. State set

$$Q=\{\text{box }1,\text{box }2,\text{box }3,\text{box }4\}$$

  2. Observation set

$$V=\{\text{red},\text{white}\}$$

  3. Initial probability distribution

$$\pi=(0.25,0.25,0.25,0.25)^T$$

  4. State transition matrix $P(i_{t+1}\mid i_1,i_2,\dots,i_t,o_1,o_2,\dots,o_t)$

$$A=\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.5 & 0.5 \end{bmatrix}$$

  5. Observation probability distribution $P(o_{t+1}\mid i_1,i_2,\dots,i_{t+1},o_1,o_2,\dots,o_t)$

$$B=\begin{bmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.8 & 0.2 \end{bmatrix}$$

Two Assumptions

Homogeneous Markov assumption
$$P(i_t\mid i_{t-1},o_{t-1},\cdots,i_1,o_1)=P(i_t\mid i_{t-1}),\quad t=1,2,\cdots,T$$
Observation independence assumption
$$P(o_t\mid i_T,o_T,i_{T-1},o_{T-1},\cdots,i_{t+1},o_{t+1},i_t,i_{t-1},o_{t-1},\cdots,i_1,o_1)=P(o_t\mid i_t)$$

That is everything we know; let us see what we can do with it.

Three Problems

I. The Probability Computation Problem (Evaluation)

Given a model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,\cdots,o_T)$, compute the probability $P(O\mid\lambda)$ of observing $O$ under the model $\lambda$.

Direct computation

$$P(O\mid\lambda)=\sum_I P(O,I\mid\lambda)=\sum_I P(O\mid I,\lambda)\cdot P(I\mid\lambda)$$
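Summing over all $N^T$ state sequences takes $O(TN^T)$ operations, which is infeasible for long sequences and is exactly what the forward and backward algorithms below avoid. For a model as small as ours, though, the sum can be evaluated directly as a sanity check. Here is a minimal brute-force sketch of my own (`brute_force_prob` is not from the original post; it expects the `O`, `Pi`, `A`, `B` defined in the implementation section below):

```python
import itertools

def brute_force_prob(O, Pi, A, B):
    """P(O|lambda) by enumerating every state path I and summing P(O|I,lambda) * P(I|lambda)."""
    N, T = len(Pi), len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):
        # P(I|lambda) = pi_{i_1} * prod_{t>=2} a_{i_{t-1} i_t}
        p_path = Pi[I[0]]
        for t in range(1, T):
            p_path *= A[I[t-1]][I[t]]
        # P(O|I,lambda) = prod_t b_{i_t}(o_t)
        p_obs = 1.0
        for t in range(T):
            p_obs *= B[I[t]][O[t]]
        total += p_path * p_obs
    return total
```

With $T=5$ and $N=4$ this only enumerates $4^5=1024$ paths, and its result should agree with the forward algorithm below.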

The forward algorithm

Define the forward probability as
$$\alpha_t(i)=P(o_1,o_2,\dots,o_t,i_t=q_i\mid\lambda)$$
Then the quantity we want is
$$P(O\mid\lambda)=P(o_1,o_2,\dots,o_T\mid\lambda)=\sum_{i=1}^N P(o_1,o_2,\dots,o_T,i_T=q_i\mid\lambda)=\sum_{i=1}^N\alpha_T(i)$$

Algorithm:

(1) Initialization
$$\alpha_1(i)=P(o_1,i_1=q_i\mid\lambda)=P(i_1=q_i\mid\lambda)\cdot P(o_1\mid i_1=q_i,\lambda)=\pi_i\,b_i(o_1)$$
(2) Recursion
$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,\dots,o_t,o_{t+1},i_{t+1}=q_i\mid\lambda) \\ &= P(o_1,\dots,o_t,i_{t+1}=q_i\mid\lambda)\cdot P(o_{t+1}\mid o_1,\dots,o_t,i_{t+1}=q_i,\lambda) \\ &= \Big[\sum_{j=1}^N P(o_1,\dots,o_t,i_t=q_j,i_{t+1}=q_i\mid\lambda)\Big]\cdot P(o_{t+1}\mid i_{t+1}=q_i) \\ &= \Big[\sum_{j=1}^N P(o_1,\dots,o_t,i_t=q_j\mid\lambda)\cdot P(i_{t+1}=q_i\mid o_1,\dots,o_t,i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^N \alpha_t(j)\cdot P(i_{t+1}=q_i\mid i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^N \alpha_t(j)\,a_{ji}\Big]\cdot b_i(o_{t+1}) \end{aligned}$$
(3) Termination
$$P(O\mid\lambda)=\sum_{i=1}^N\alpha_T(i)$$

Implementation

Define the observation data and the $\lambda$ parameters ($\pi$, $A$, $B$):

```python
import numpy as np

O = [1, 1, 0, 0, 1]
Pi = [0.25, 0.25, 0.25, 0.25]
A = [[0,   1,   0,   0  ],
     [0.4, 0,   0.6, 0  ],
     [0,   0.4, 0,   0.6],
     [0,   0,   0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
```

Define the model:

```python
class ForwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of states

    def predict(self, O):
        # initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = np.zeros(shape=(self.N,), dtype=float)
        for i in range(self.N):
            alpha[i] = self.Pi[i] * self.B[i][O[0]]
        # recursion over o_2, ..., o_T
        for observation in O[1:]:
            temp = np.zeros_like(alpha)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += alpha[j] * self.A[j][i]
                temp[i] *= self.B[i][observation]
            alpha = temp
        # termination: P(O|lambda) = sum_i alpha_T(i)
        return np.sum(alpha)
```

Prediction:

```python
model = ForwardModel(Pi, A, B)
model.predict(O)
```

0.024612384

You can verify this value by hand, or with the brute-force function given earlier.
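As a further cross-check (my own addition, assuming `numpy` has been imported as `np` and using the parameters above), the recursion collapses into matrix-vector products, because $\alpha_{t+1}(i)=\big[\sum_j\alpha_t(j)a_{ji}\big]b_i(o_{t+1})$ is a row vector times $A$ followed by an elementwise product:

```python
def forward_prob(O, Pi, A, B):
    # vectorized forward algorithm; alpha is a row vector over the states
    Pi, A, B = np.array(Pi), np.array(A), np.array(B)
    alpha = Pi * B[:, O[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_{t+1} = (alpha_t A) * b(o_{t+1})
    return alpha.sum()                 # P(O|lambda) = sum_i alpha_T(i)
```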

The backward algorithm

Define the backward probability as
$$\beta_t(i)=P(o_{t+1},o_{t+2},\cdots,o_T\mid i_t=q_i,\lambda)$$
Then
$$\begin{aligned} P(O\mid\lambda) &= P(o_1,o_2,\dots,o_T\mid\lambda) \\ &= \sum_{i=1}^N P(o_1,o_2,\dots,o_T,i_1=q_i\mid\lambda) \\ &= \sum_{i=1}^N P(o_1\mid o_2,\dots,o_T,i_1=q_i,\lambda)\cdot P(o_2,\dots,o_T,i_1=q_i\mid\lambda) \\ &= \sum_{i=1}^N P(o_1\mid i_1=q_i,\lambda)\cdot P(o_2,\dots,o_T\mid i_1=q_i,\lambda)\cdot P(i_1=q_i\mid\lambda) \\ &= \sum_{i=1}^N b_i(o_1)\cdot\beta_1(i)\cdot\pi_i \end{aligned}$$

Algorithm:

(1) Initialization
$$\beta_T(i)=1$$

(2) Recursion
$$\begin{aligned} \beta_t(i) &= P(o_{t+1},o_{t+2},\cdots,o_T\mid i_t=q_i,\lambda) \\ &= \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j\mid i_t=q_i,\lambda) \\ &= \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T\mid i_{t+1}=q_j,i_t=q_i,\lambda)\cdot P(i_{t+1}=q_j\mid i_t=q_i,\lambda) \\ &= \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T\mid i_{t+1}=q_j,i_t=q_i,\lambda)\cdot a_{ij} \end{aligned}$$
The next step is the crucial one. By the structure of the probabilistic graphical model, once $i_{t+1}$ is known, the observations $o_{t+1},\dots,o_T$ are conditionally independent of $i_t$ (every path from $i_t$ to these observations passes through $i_{t+1}$), so the conditioning on $i_t=q_i$ can be dropped:
$$\begin{aligned} \beta_t(i) &= \sum_{j=1}^N P(o_{t+1},\dots,o_T\mid i_{t+1}=q_j,\lambda)\cdot a_{ij} \\ &= \sum_{j=1}^N P(o_{t+2},\dots,o_T\mid i_{t+1}=q_j,\lambda)\cdot P(o_{t+1}\mid o_{t+2},\dots,o_T,i_{t+1}=q_j,\lambda)\cdot a_{ij} \\ &= \sum_{j=1}^N \beta_{t+1}(j)\cdot b_j(o_{t+1})\cdot a_{ij} \end{aligned}$$
(3) Termination
$$P(O\mid\lambda)=\sum_{i=1}^N b_i(o_1)\cdot\beta_1(i)\cdot\pi_i$$
(PS: if you have read this far, may I ask for a like?)

Implementation

The data and parameter definitions are identical to the ones above and are omitted here.

Define the model:

```python
class BackwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of states

    def predict(self, O):
        # initialization: beta_T(i) = 1
        beta = np.ones(shape=(self.N,))
        # recursion over o_T, ..., o_2
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += beta[j] * self.B[j][o] * self.A[i][j]
            beta = temp
        # termination: P(O|lambda) = sum_i b_i(o_1) * beta_1(i) * pi_i
        res = 0.0
        for i in range(self.N):
            res += self.B[i][O[0]] * beta[i] * self.Pi[i]
        return res
```

Prediction:

```python
model = BackwardModel(Pi, A, B)
model.predict(O)
```

0.024612384

This matches the value obtained with the forward algorithm.

II. The Learning Problem

The Baum-Welch algorithm

In fact, the Baum-Welch algorithm is just the EM algorithm applied to the HMM learning problem.

The $\lambda$ we are looking for should satisfy
$$\lambda=\underset{\lambda}{\operatorname{argmax}}\ P(O\mid\lambda)$$
Let us first write down the general EM update (using $s$ for the iteration index, so that $t$ can keep denoting time):
$$\theta^{(s+1)}=\underset{\theta}{\operatorname{argmax}}\int_z \log P(X,Z\mid\theta)\cdot P(Z\mid X,\theta^{(s)})\,dz$$
where $Z$ is the latent variable and $X$ is the observed variable. In the HMM the latent variable is $I$ and the observed variable is $O$, and both are discrete, so
$$\begin{aligned} \lambda^{(s+1)} &= \underset{\lambda}{\operatorname{argmax}}\sum_I \log P(O,I\mid\lambda)\cdot P(I\mid O,\lambda^{(s)}) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_I \log P(O,I\mid\lambda)\cdot\frac{P(O,I\mid\lambda^{(s)})}{P(O\mid\lambda^{(s)})} \\ &= \underset{\lambda}{\operatorname{argmax}}\frac{1}{P(O\mid\lambda^{(s)})}\sum_I \log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^{(s)}) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_I \log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^{(s)}) \end{aligned}$$
The last step drops the factor $1/P(O\mid\lambda^{(s)})$ because it does not depend on $\lambda$. Next we expand $P(O,I\mid\lambda)$. Writing $F(T)=P(O,I\mid\lambda)$,
$$\begin{aligned} F(T)=P(O,I\mid\lambda) &= P(o_1,o_2,\dots,o_T,i_1,i_2,\dots,i_T\mid\lambda) \\ &= P(o_T\mid o_1,\dots,o_{T-1},i_1,\dots,i_T,\lambda)\cdot P(o_1,\dots,o_{T-1},i_1,\dots,i_T\mid\lambda) \\ &= P(o_T\mid i_T,\lambda)\cdot P(o_1,\dots,o_{T-1},i_1,\dots,i_{T-1}\mid\lambda)\cdot P(i_T\mid o_1,\dots,o_{T-1},i_1,\dots,i_{T-1},\lambda) \\ &= b_{i_T}(o_T)\cdot F(T-1)\cdot a_{i_{T-1}i_T} \end{aligned}$$
Hence, unrolling this simple recursion down to $F(1)$,
$$\begin{aligned} F(T) &= \prod_{t=2}^T b_{i_t}(o_t)\cdot a_{i_{t-1}i_t}\cdot F(1) \\ &= \Big[\prod_{t=2}^T b_{i_t}(o_t)\cdot a_{i_{t-1}i_t}\Big]\cdot P(o_1,i_1\mid\lambda) \\ &= \Big[\prod_{t=2}^T b_{i_t}(o_t)\cdot a_{i_{t-1}i_t}\Big]\cdot P(o_1\mid i_1,\lambda)\cdot P(i_1\mid\lambda) \\ &= \Big[\prod_{t=2}^T b_{i_t}(o_t)\cdot a_{i_{t-1}i_t}\Big]\cdot b_{i_1}(o_1)\cdot\pi_{i_1} \\ &= \Big[\prod_{t=1}^T b_{i_t}(o_t)\Big]\cdot\Big[\prod_{t=2}^T a_{i_{t-1}i_t}\Big]\cdot\pi_{i_1} \end{aligned}$$
We have now obtained
$$P(O,I\mid\lambda)=\Big[\prod_{t=1}^T b_{i_t}(o_t)\Big]\cdot\Big[\prod_{t=2}^T a_{i_{t-1}i_t}\Big]\cdot\pi_{i_1}$$
Substituting back,
$$\begin{aligned} \lambda^{(s+1)} &= \underset{\lambda}{\operatorname{argmax}}\sum_I \log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^{(s)}) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)+\sum_{t=2}^T\log a_{i_{t-1}i_t}+\log\pi_{i_1}\Big)\cdot P(O,I\mid\lambda^{(s)}) \end{aligned}$$

Denote the function to be maximized by
$$Q(\lambda,\lambda^{(s)})=\sum_I\Big(\sum_{t=1}^T\log b_{i_t}(o_t)+\sum_{t=2}^T\log a_{i_{t-1}i_t}+\log\pi_{i_1}\Big)\cdot P(O,I\mid\lambda^{(s)})$$
To obtain $\pi^{(s+1)}$ we need each component $\pi_i^{(s+1)}$; only the $\log\pi_{i_1}$ term of $Q$ involves $\pi$.

Under the constraint $\sum_{i=1}^N\pi_i=1$,

write the Lagrangian
$$\begin{aligned} L(\pi,\eta) &= \sum_I \log\pi_{i_1}\cdot P(O,I\mid\lambda^{(s)})+\eta\cdot\Big(\sum_{i=1}^N\pi_i-1\Big) \\ &= \sum_{i_1}\cdots\sum_{i_T}\log\pi_{i_1}\cdot P(O,I\mid\lambda^{(s)})+\eta\cdot\Big(\sum_{i=1}^N\pi_i-1\Big) \\ &= \sum_{i_1}\log\pi_{i_1}\cdot P(O,i_1\mid\lambda^{(s)})+\eta\cdot\Big(\sum_{i=1}^N\pi_i-1\Big) \\ &= \sum_{i=1}^N\log\pi_i\cdot P(O,i_1=q_i\mid\lambda^{(s)})+\eta\cdot\Big(\sum_{i=1}^N\pi_i-1\Big) \end{aligned}$$

$$\frac{\partial L}{\partial\pi_i}=\frac{P(O,i_1=q_i\mid\lambda^{(s)})}{\pi_i}+\eta=0$$

Multiplying each of the $N$ equations by $\pi_i$ and summing them lets us solve for $\eta$:
$$\sum_{i=1}^N\Big(P(O,i_1=q_i\mid\lambda^{(s)})+\eta\cdot\pi_i\Big)=\sum_{i=1}^N P(O,i_1=q_i\mid\lambda^{(s)})+\eta=0$$

$$\eta=-\sum_{i=1}^N P(O,i_1=q_i\mid\lambda^{(s)})=-P(O\mid\lambda^{(s)})$$

$$\pi_i^{(s+1)}=-\frac{P(O,i_1=q_i\mid\lambda^{(s)})}{\eta}=\frac{P(O,i_1=q_i\mid\lambda^{(s)})}{P(O\mid\lambda^{(s)})}$$
**To obtain $A^{(s+1)}$, we need each $a_{ij}^{(s+1)}$.**

Subject to each row of the transition matrix summing to 1, define the Lagrangian
$$\begin{aligned} L(a,\eta) &= \sum_I\Big(\sum_{t=2}^T\log a_{i_{t-1}i_t}\Big)\cdot P(O,I\mid\lambda^{(s)})+\sum_{i=1}^N\eta_i\cdot\Big(\sum_{j=1}^N a_{ij}-1\Big) \\ &= \sum_{i=1}^N\sum_{j=1}^N\sum_{t=2}^T\log a_{ij}\,P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})+\sum_{i=1}^N\eta_i\cdot\Big(\sum_{j=1}^N a_{ij}-1\Big) \end{aligned}$$
Then take the partial derivative with respect to $a_{ij}$:
$$\frac{\partial L}{\partial a_{ij}}=\sum_{t=2}^T\frac{P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})}{a_{ij}}+\eta_i=0$$

$$\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})+\eta_i\cdot a_{ij}=0$$

Summing over $j$,
$$\sum_{j=1}^N\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})+\eta_i\cdot\sum_{j=1}^N a_{ij}=0$$
we get
$$\eta_i=-\sum_{j=1}^N\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})=-\sum_{t=2}^T P(O,i_{t-1}=q_i\mid\lambda^{(s)})$$

$$\begin{aligned} a_{ij}^{(s+1)} &= -\frac{\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})}{\eta_i} \\ &= \frac{\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})}{\sum_{t=2}^T P(O,i_{t-1}=q_i\mid\lambda^{(s)})} \end{aligned}$$

$$A^{(s+1)}=\big[a_{ij}^{(s+1)}\big]_{N\times N}$$
To obtain $B^{(s+1)}$, we need each $b_j(k)^{(s+1)}$.

Subject to each row of $B$ summing to 1, the Lagrangian is
$$\begin{aligned} L(b,\eta) &= \sum_I\Big[\sum_{t=1}^T\log b_{i_t}(o_t)\Big]\cdot P(O,I\mid\lambda^{(s)})+\sum_{j=1}^N\eta_j\cdot\Big[\sum_{k=1}^M b_j(k)-1\Big] \\ &= \sum_{j=1}^N\sum_{t=1}^T\log b_j(o_t)\,P(O,i_t=q_j\mid\lambda^{(s)})+\sum_{j=1}^N\eta_j\cdot\Big[\sum_{k=1}^M b_j(k)-1\Big] \end{aligned}$$
Then take the partial derivative with respect to $b_j(k)$; the indicator $I(o_t=v_k)$ appears because $b_j(o_t)$ involves $b_j(k)$ only at the times when $o_t=v_k$:
$$\frac{\partial L}{\partial b_j(k)}=\sum_{t=1}^T\frac{P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)}{b_j(k)}+\eta_j=0$$
$$\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)+\eta_j\cdot b_j(k)=0$$

Summing over $k$,
$$\sum_{k=1}^M\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)+\eta_j=0$$

$$\begin{aligned} \eta_j &= -\sum_{k=1}^M\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k) \\ &= -\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\sum_{k=1}^M I(o_t=v_k) \\ &= -\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)}) \end{aligned}$$
and hence
$$\begin{aligned} b_j(k)^{(s+1)} &= -\frac{\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)}{\eta_j} \\ &= \frac{\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)}{\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})} \end{aligned}$$
Putting everything together, the update equations are:
$$\begin{aligned} \pi_i^{(s+1)} &= \frac{P(O,i_1=q_i\mid\lambda^{(s)})}{P(O\mid\lambda^{(s)})} \\ a_{ij}^{(s+1)} &= \frac{\sum_{t=2}^T P(O,i_{t-1}=q_i,i_t=q_j\mid\lambda^{(s)})}{\sum_{t=2}^T P(O,i_{t-1}=q_i\mid\lambda^{(s)})} \\ b_j(k)^{(s+1)} &= \frac{\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})\cdot I(o_t=v_k)}{\sum_{t=1}^T P(O,i_t=q_j\mid\lambda^{(s)})} \end{aligned}$$

Define
$$\xi_t(i,j)=\frac{P(O,i_t=q_i,i_{t+1}=q_j\mid\lambda^{(s)})}{P(O\mid\lambda^{(s)})}$$
$$\gamma_t(i)=\frac{P(O,i_t=q_i\mid\lambda^{(s)})}{P(O\mid\lambda^{(s)})}$$
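Both quantities can be computed from the forward and backward probabilities of the evaluation section. The derivation above leaves this step implicit, so I state the standard identities, which follow directly from the definitions of $\alpha_t$ and $\beta_t$:
$$\gamma_t(i)=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\,\beta_t(j)},\qquad \xi_t(i,j)=\frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i'=1}^N\sum_{j'=1}^N\alpha_t(i')\,a_{i'j'}\,b_{j'}(o_{t+1})\,\beta_{t+1}(j')}$$
Both denominators equal $P(O\mid\lambda^{(s)})$; the `_gamma` and `_xi` helpers in the implementation below compute exactly these ratios.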
Then
$$\begin{aligned} \pi_i^{(s+1)} &= \gamma_1(i) \\ a_{ij}^{(s+1)} &= \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)} \\ b_j(k)^{(s+1)} &= \frac{\sum_{t=1}^T\gamma_t(j)\cdot I(o_t=v_k)}{\sum_{t=1}^T\gamma_t(j)} \end{aligned}$$

Algorithm summary

(1) Initialization

Choose initial values $a_{ij}^{(0)}$, $b_j(k)^{(0)}$, $\pi_i^{(0)}$.

(2) Recursion
$$\begin{aligned} \pi_i^{(s+1)} &= \gamma_1(i) \\ a_{ij}^{(s+1)} &= \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)} \\ b_j(k)^{(s+1)} &= \frac{\sum_{t=1}^T\gamma_t(j)\cdot I(o_t=v_k)}{\sum_{t=1}^T\gamma_t(j)} \end{aligned}$$
(3) Termination

Upon convergence we obtain the model parameters $\lambda=(\pi,A,B)$.

Implementation

We again take the box-and-ball model as our example.

Generate an observation sequence:

```python
def generate(T):
    # ball composition of each box: 0 = red, 1 = white
    boxes = [[0,0,0,0,0,1,1,1,1,1],
             [0,0,0,1,1,1,1,1,1,1],
             [0,0,0,0,0,0,1,1,1,1],
             [0,0,0,0,0,0,0,0,1,1]]
    I = []
    O = []
    # step 1: pick an initial box uniformly at random
    index = np.random.choice(len(boxes))
    for t in range(T):
        I.append(index)
        # step 2: draw a ball
        O.append(np.random.choice(boxes[index]))
        # step 3: pick the next box
        if index == 0:
            index = 1
        elif index == 1 or index == 2:
            # 0.4 to the left, 0.6 to the right
            index = np.random.choice([index-1, index-1, index+1, index+1, index+1])
        elif index == 3:
            # stay or move left, 0.5 each
            index = np.random.choice([index, index-1])
    return I, O
```

Define the model:

```python
class Model:
    def __init__(self, M, N) -> None:
        self.N = N  # size of the state set
        self.M = M  # size of the observation set

    def _gamma(self, t, i):
        return self.alpha[t][i]*self.beta[t][i] / np.sum(self.alpha[t]*self.beta[t])

    def _xi(self, t, i, j):
        # fenzi/fenmu = numerator/denominator
        fenmu = 0
        for i1 in range(self.N):
            for j1 in range(self.N):
                fenmu += self.alpha[t][i1]*self.A[i1][j1]*self.B[j1][self.O[t+1]]*self.beta[t+1][j1]
        return (self.alpha[t][i]*self.A[i][j]*self.B[j][self.O[t+1]]*self.beta[t+1][j]) / fenmu

    def backward(self):
        # initialization: beta_T(i) = 1
        self.beta = np.zeros(shape=(self.T, self.N))
        self.beta[self.T-1] = 1.0
        # recursion
        for t in reversed(range(1, self.T)):
            for i in range(self.N):
                for j in range(self.N):
                    self.beta[t-1][i] += self.beta[t][j]*self.B[j][self.O[t]]*self.A[i][j]

    def forward(self):
        self.alpha = np.zeros(shape=(self.T, self.N), dtype=float)
        # initialization
        for i in range(self.N):
            self.alpha[0][i] = self.pi[i]*self.B[i][self.O[0]]
        # recursion
        for t in range(self.T-1):
            for i in range(self.N):
                for j in range(self.N):
                    self.alpha[t+1][i] += self.alpha[t][j]*self.A[j][i]
                self.alpha[t+1][i] *= self.B[i][self.O[t+1]]

    def train(self, O, max_iter=100, toler=1e-5):
        self.O = O
        self.T = len(O)
        # initialization: uniform parameters
        self.pi = np.ones(shape=(self.N,))/self.N          # pi is an N-vector
        self.A = np.ones(shape=(self.N, self.N))/self.N    # A is an NxN matrix
        self.B = np.ones(shape=(self.N, self.M))/self.M    # B is an NxM matrix
        # buffers holding the updated parameters
        pi = np.zeros_like(self.pi)
        A = np.zeros_like(self.A)
        B = np.zeros_like(self.B)
        for epoch in range(max_iter):
            # E step: forward and backward probabilities under the current parameters
            self.forward()
            self.backward()
            # M step: recompute pi, A, B from gamma and xi
            for i in range(self.N):
                pi[i] = self._gamma(0, i)
                for j in range(self.N):
                    fenzi = 0
                    fenmu = 0
                    for t2 in range(self.T-1):
                        fenzi += self._xi(t2, i, j)
                        fenmu += self._gamma(t2, i)
                    A[i][j] = fenzi/fenmu
            for j in range(self.N):
                for k in range(self.M):
                    fenzi = 0
                    fenmu = 0
                    for t2 in range(self.T):
                        fenzi += self._gamma(t2, j)*(self.O[t2] == k)
                        fenmu += self._gamma(t2, j)
                    B[j][k] = fenzi/fenmu
            # stop when the update is smaller than the tolerance
            if np.max(abs(self.pi - pi)) < toler and \
               np.max(abs(self.A - A)) < toler and \
               np.max(abs(self.B - B)) < toler:
                self.pi, self.A, self.B = pi, A, B
                return pi, A, B
            self.pi = pi.copy()
            self.A = A.copy()
            self.B = B.copy()
        return self.pi, self.A, self.B

    def predict(self, O):
        self.O = O
        self.T = len(O)
        self.forward()
        return np.sum(self.alpha[self.T-1])
```

Training:

```python
model = Model(2, 4)
model.train(O, toler=1e-10)
```

The output is as follows:

```
(array([0.25, 0.25, 0.25, 0.25]),
 array([[0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]]),
 array([[0.625, 0.375],
        [0.625, 0.375],
        [0.625, 0.375],
        [0.625, 0.375]]))
```

The result is completely off, but understandably so: with every parameter initialized to be exactly uniform, the model sits at a symmetric fixed point of EM. The updates can never break the tie between the states, so all states remain identical, $\pi$ and $A$ stay uniform, and each row of $B$ collapses to the empirical frequency of the observations.
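One plausible remedy (a sketch of my own, not part of the original post) is to break the symmetry by initializing with random row-stochastic parameters inside `train()` instead of uniform ones:

```python
def random_row_stochastic(rng, rows, cols):
    # each row is a random point on the probability simplex
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

# e.g., inside train(), replace the uniform initialization with something like:
#   rng = np.random.default_rng(0)
#   self.pi = random_row_stochastic(rng, 1, self.N)[0]
#   self.A  = random_row_stochastic(rng, self.N, self.N)
#   self.B  = random_row_stochastic(rng, self.N, self.M)
```

Even then, Baum-Welch only finds a local maximum of the likelihood from a single sequence, and the recovered states are identifiable at best up to a permutation.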

Supervised learning: maximum likelihood estimation

Having derived the Baum-Welch algorithm, the maximum likelihood derivation is much easier.

Suppose our samples are
$$\{(O_1,I_1),(O_2,I_2),\cdots,(O_S,I_S)\}$$
Maximum likelihood estimation then gives
$$\begin{aligned} \hat\lambda &= \underset{\lambda}{\operatorname{argmax}}\ P(O,I\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \prod_{j=1}^S P(O_j,I_j\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^S\log P(O_j,I_j\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^S\log\Big(\prod_{t=1}^T b_{i_t^j}(o_t^j)\cdot\prod_{t=2}^T a_{i_{t-1}^j i_t^j}\cdot\pi_{i_1^j}\Big) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^S\Big[\sum_{t=1}^T\log b_{i_t^j}(o_t^j)+\sum_{t=2}^T\log a_{i_{t-1}^j i_t^j}+\log\pi_{i_1^j}\Big] \end{aligned}$$
We first estimate $\pi$. The only term involving $\pi$ is the last one:
$$\hat\pi=\underset{\pi}{\operatorname{argmax}}\sum_{j=1}^S\log\pi_{i_1^j}$$
subject to the constraint on $\pi$
$$\sum_{i=1}^N\pi_i=1$$
Write the Lagrangian
$$L(\pi,\eta)=\sum_{j=1}^S\log\pi_{i_1^j}+\eta\cdot\Big(\sum_{i=1}^N\pi_i-1\Big)$$
and take the partial derivative with respect to $\pi_i$:
$$\frac{\partial L}{\partial\pi_i}=\frac{\sum_{j=1}^S I(i_1^j=q_i)}{\pi_i}+\eta=0$$

$$\sum_{j=1}^S I(i_1^j=q_i)+\eta\cdot\pi_i=0$$
Summing over $i$,
$$\sum_{i=1}^N\sum_{j=1}^S I(i_1^j=q_i)+\eta\cdot\sum_{i=1}^N\pi_i=0$$

$$\begin{aligned} &\sum_{j=1}^S\sum_{i=1}^N I(i_1^j=q_i)+\eta=0 \\ &\sum_{j=1}^S 1+\eta=0 \\ &\eta=-S \end{aligned}$$
(each sample's initial state equals exactly one of the $q_i$, so the inner sum is 1).
Therefore
$$\hat\pi_i=-\frac{\sum_{j=1}^S I(i_1^j=q_i)}{\eta}=\frac{\sum_{j=1}^S I(i_1^j=q_i)}{S}$$
The remaining derivations for $A$ and $B$ follow the same Lagrangian pattern; please allow me to paste two screenshots instead, as I am tired of typesetting formulas.

The screenshots contain a few mistakes. I hope you can spot them while reading; if not, you will notice once you reach the results below.

We thus obtain the final estimates (with $s$ indexing the samples):
$$\begin{aligned} \hat\pi_i &= \frac{\sum_{s=1}^S I(i_1^s=q_i)}{S} \\ \hat a_{ij} &= \frac{\sum_{s=1}^S\sum_{t=1}^{T-1} I(i_t^s=q_i,\ i_{t+1}^s=q_j)}{\sum_{s=1}^S\sum_{t=1}^{T-1} I(i_t^s=q_i)} \\ \hat b_j(k) &= \frac{\sum_{s=1}^S\sum_{t=1}^T I(i_t^s=q_j,\ o_t^s=v_k)}{\sum_{s=1}^S\sum_{t=1}^T I(i_t^s=q_j)} \end{aligned}$$

Implementation

Generate the data:

```python
data = []
for i in range(100):
    I, O = generate(100)
    data.append([I, O])
```

Define the model:

```python
class SupervisedModel:
    def __init__(self, M, N) -> None:
        self.M = M  # size of the observation set
        self.N = N  # size of the state set

    def train(self, data):
        # data: S x 2 x T, i.e. [[[i1,i2,...],[o1,o2,...]], ...]
        self.pi = np.zeros(shape=(self.N,))
        self.A = np.zeros(shape=(self.N, self.N))
        self.B = np.zeros(shape=(self.N, self.M))
        S = len(data)
        self.T = len(data[0][0])
        # estimate pi by counting initial states
        for i in range(self.N):
            for j in range(S):
                self.pi[i] += data[j][0][0] == i
            self.pi[i] = self.pi[i] / S
        # estimate A by counting transitions (fenzi/fenmu = numerator/denominator)
        for i in range(self.N):
            for j in range(self.N):
                fenzi = 0
                fenmu = 0
                for k in range(S):
                    for t in range(self.T - 1):
                        fenzi += data[k][0][t] == i and data[k][0][t+1] == j
                        fenmu += data[k][0][t] == i
                self.A[i][j] = fenzi / fenmu
        # estimate B by counting emissions
        for j in range(self.N):
            for k in range(self.M):
                fenzi = 0
                fenmu = 0
                for i in range(S):
                    for t in range(self.T):
                        fenzi += data[i][0][t] == j and data[i][1][t] == k
                        fenmu += data[i][0][t] == j
                self.B[j][k] = fenzi / fenmu
        return self.pi, self.A, self.B

    def predict(self, O):
        # backward algorithm under the estimated parameters
        beta = np.ones(shape=(self.N,))
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += beta[j] * self.B[j][o] * self.A[i][j]
            beta = temp
        res = 0.0
        for i in range(self.N):
            res += self.B[i][O[0]] * beta[i] * self.pi[i]
        return res
```

Train the model:

```python
model = SupervisedModel(2, 4)
model.train(data)
```

```
(array([0.19, 0.25, 0.2 , 0.36]),
 array([[0.        , 1.        , 0.        , 0.        ],
        [0.38930233, 0.        , 0.61069767, 0.        ],
        [0.        , 0.4073957 , 0.        , 0.5926043 ],
        [0.        , 0.        , 0.49933066, 0.50066934]]),
 array([[0.47313084, 0.52686916],
        [0.30207852, 0.69792148],
        [0.60162602, 0.39837398],
        [0.80851627, 0.19148373]]))
```

The supervised estimates are very good: compare $A$ and $B$ with the true matrices defined earlier.

III. The Decoding Problem

The decoding problem asks for $\underset{I}{\operatorname{argmax}}\ P(I\mid O,\lambda)$; since $P(I\mid O,\lambda)\propto P(I,O\mid\lambda)$, it suffices to maximize the joint probability.

The Viterbi algorithm

The Viterbi algorithm is in fact a dynamic programming algorithm.

(The illustration here, a trellis with five candidate states per time step, comes from Zhihu.)

Think of the probabilities as distances.

Then, once the endpoint is fixed, the maximum distance to that endpoint must equal the maximum, over the 5 nodes of the previous time step, of the best distance to each of those nodes times the distance from that node to the endpoint.

We can also derive this property rigorously.

Define the distance as
$$\delta_t(i)=\underset{i_1,i_2,\dots,i_{t-1}}{\max}\ P(o_1,o_2,\dots,o_t,i_1,i_2,\dots,i_{t-1},i_t=q_i)$$

$$\begin{aligned} \delta_{t+1}(i) &= \underset{i_1,i_2,\dots,i_t}{\max}\ P(o_1,\dots,o_t,o_{t+1},i_1,\dots,i_t,i_{t+1}=q_i) \\ &= \underset{i_1,i_2,\dots,i_t}{\max}\ P(o_{t+1}\mid i_{t+1}=q_i)\cdot P(o_1,\dots,o_t,i_1,\dots,i_t,i_{t+1}=q_i) \\ &= b_i(o_{t+1})\cdot\underset{i_1,i_2,\dots,i_t}{\max}\ P(o_1,\dots,o_t,i_1,\dots,i_t,i_{t+1}=q_i) \\ &= b_i(o_{t+1})\cdot\underset{j}{\max}\ \underset{i_1,\dots,i_{t-1}}{\max}\ P(o_1,\dots,o_t,i_1,\dots,i_t=q_j,i_{t+1}=q_i) \\ &= b_i(o_{t+1})\cdot\underset{j}{\max}\ \underset{i_1,\dots,i_{t-1}}{\max}\ P(i_{t+1}=q_i\mid i_t=q_j)\cdot P(o_1,\dots,o_t,i_1,\dots,i_t=q_j) \\ &= b_i(o_{t+1})\cdot\underset{j}{\max}\Big(a_{ji}\cdot\underset{i_1,\dots,i_{t-1}}{\max}\ P(o_1,\dots,o_t,i_1,\dots,i_t=q_j)\Big) \\ &= b_i(o_{t+1})\cdot\underset{j}{\max}\big(a_{ji}\cdot\delta_t(j)\big) \end{aligned}$$

This completes the derivation.

But this does not yet recover the path itself.

For a given endpoint, we need to know which node of the previous step led to it, so define


$$\psi_{t+1}(i)=\underset{j}{\operatorname{argmax}}\ \delta_t(j)\cdot a_{ji}$$

Algorithm:

(1) Initialization
$$\delta_1(i)=P(o_1,i_1=q_i)=P(o_1\mid i_1=q_i)\cdot P(i_1=q_i)=b_i(o_1)\,\pi_i$$
(2) Recursion
$$\begin{aligned} \delta_{t+1}(i) &= b_i(o_{t+1})\cdot\underset{j}{\max}\big(a_{ji}\cdot\delta_t(j)\big) \\ \psi_{t+1}(i) &= \underset{j}{\operatorname{argmax}}\big(\delta_t(j)\cdot a_{ji}\big) \end{aligned}$$
(3) Termination
$$P^*=\underset{1\leqslant i\leqslant N}{\max}\ \delta_T(i),\qquad i_T^*=\underset{1\leqslant i\leqslant N}{\operatorname{argmax}}\ \delta_T(i)$$
(4) Backtracking (for $t=T-1,T-2,\dots,1$)
$$i_t^*=\psi_{t+1}(i_{t+1}^*)$$

Implementation

```python
import numpy as np

pi = [0.25, 0.25, 0.25, 0.25]
A = [[0,   1,   0,   0  ],
     [0.4, 0,   0.6, 0  ],
     [0,   0.4, 0,   0.6],
     [0,   0,   0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
```

Define the model:

```python
class Model:
    def __init__(self, pi, A, B) -> None:
        self.pi = np.array(pi)
        self.A = np.array(A)
        self.B = np.array(B)
        self.N = len(A)
        self.M = len(B[0])

    def decode(self, O):
        T = len(O)
        delta = np.zeros(shape=(T, self.N))
        # fi[t][i] stores psi_t(i), the best predecessor of state i at time t
        fi = np.zeros(shape=(T, self.N), dtype=int)
        # initialization: delta_1(i) = b_i(o_1) * pi_i
        delta[0] = self.B[:, O[0]] * self.pi
        # forward pass
        for t in range(0, T-1):
            for i in range(self.N):
                p = self.A[:, i] * delta[t]  # delta_t(j) * a_{ji} for all j
                delta[t+1][i] = self.B[i, O[t+1]] * p.max()
                fi[t+1][i] = p.argmax()
        # backtracking
        I = []
        index = delta[T-1].argmax()
        I.append(index)
        for t in reversed(range(1, T)):
            index = fi[t, index]
            I.insert(0, index)
        return I
```

Decoding:

```python
model = Model(pi, A, B)
I, O = generate(5)
print(I)
print(O)
model.decode(O)
```
