HMM Blog Series

  • All three problems in one post (the three problems were also published as separate posts, in case some readers find the combined one tiresome)
  • The probability computation problem
  • The learning problem
  • The decoding problem
  • Part-of-speech tagging

The learning problem for hidden Markov models splits into a supervised case and an unsupervised case: supervised learning uses maximum likelihood estimation, while unsupervised learning uses the Baum-Welch algorithm. We go straight to the Baum-Welch algorithm first.

The Baum-Welch algorithm (an application of EM)

In fact, the Baum-Welch algorithm is simply the EM algorithm applied to the HMM learning problem.

The estimate $\hat\lambda$ we are after is
$$\hat{\lambda} = \underset{\lambda}{\operatorname{argmax}}\ P(O\mid\lambda)$$
As before, let us first write down the general EM update:
$$\theta^{t+1} = \underset{\theta}{\operatorname{argmax}}\int_Z \log P(X,Z\mid\theta)\cdot P(Z\mid X,\theta^t)\,dZ$$
where $Z$ denotes the latent variables and $X$ the observed variables. In the HMM the latent variable is the state sequence $I$ and the observed variable is the observation sequence $O$, both discrete, so the integral becomes a sum:
$$\begin{aligned} \lambda^{t+1} &= \underset{\lambda}{\operatorname{argmax}}\sum_{I}\log P(O,I\mid\lambda)\cdot P(I\mid O,\lambda^t) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_{I}\log P(O,I\mid\lambda)\cdot \frac{P(O,I\mid\lambda^t)}{P(O\mid\lambda^t)} \\ &= \underset{\lambda}{\operatorname{argmax}}\frac{1}{P(O\mid\lambda^t)}\sum_{I}\log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^t) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_{I}\log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^t) \end{aligned}$$
(the factor $1/P(O\mid\lambda^t)$ does not depend on $\lambda$ and can be dropped from the argmax).
Here, writing $F(T)=P(O,I\mid\lambda)$ and factorizing with the chain rule, the observation-independence assumption, and the Markov property:
$$\begin{aligned} F(T) = P(O,I\mid \lambda) &= P(o_1,o_2,\dots,o_T,i_1,i_2,\dots,i_T\mid\lambda) \\ &= P(o_T\mid o_1,\dots,o_{T-1},i_1,\dots,i_T,\lambda)\cdot P(o_1,\dots,o_{T-1},i_1,\dots,i_T\mid\lambda) \\ &= P(o_T\mid i_T,\lambda)\cdot P(o_1,\dots,o_{T-1},i_1,\dots,i_{T-1}\mid\lambda)\cdot P(i_T\mid o_1,\dots,o_{T-1},i_1,\dots,i_{T-1},\lambda) \\ &= b_{i_T}(o_T)\cdot F(T-1)\cdot \alpha_{i_{T-1}i_T} \end{aligned}$$
Unrolling this recursion down to $F(1)$:
$$\begin{aligned} F(T) &= \left[\prod_{t=2}^{T}b_{i_t}(o_t)\cdot \alpha_{i_{t-1}i_t}\right]\cdot F(1) \\ &= \left[\prod_{t=2}^{T}b_{i_t}(o_t)\cdot \alpha_{i_{t-1}i_t}\right]\cdot P(o_1,i_1\mid \lambda) \\ &= \left[\prod_{t=2}^{T}b_{i_t}(o_t)\cdot \alpha_{i_{t-1}i_t}\right]\cdot P(o_1\mid i_1,\lambda)\cdot P(i_1\mid \lambda) \\ &= \left[\prod_{t=2}^{T}b_{i_t}(o_t)\cdot \alpha_{i_{t-1}i_t}\right]\cdot b_{i_1}(o_1)\cdot \pi_{i_1} \\ &= \left[\prod_{t=1}^{T}b_{i_t}(o_t)\right]\cdot \left[\prod_{t=2}^{T}\alpha_{i_{t-1}i_t}\right]\cdot \pi_{i_1} \end{aligned}$$
We have therefore obtained
$$P(O,I\mid \lambda) = \left[\prod_{t=1}^{T}b_{i_t}(o_t)\right]\cdot \left[\prod_{t=2}^{T}\alpha_{i_{t-1}i_t}\right]\cdot \pi_{i_1}$$
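As a quick sanity check of this factorization, here is a tiny sketch (my own illustration, not from the original post) that evaluates $P(O,I\mid\lambda)$ for a concrete state path; `pi`, `A`, `B` stand for $\pi$, the transition matrix $[\alpha_{ij}]$, and the emission matrix, and may be nested lists or NumPy arrays:

def joint_prob(pi, A, B, I, O):
    # P(O, I | lambda) = pi_{i_1} * b_{i_1}(o_1) * prod_{t>=2} alpha_{i_{t-1} i_t} * b_{i_t}(o_t)
    p = pi[I[0]] * B[I[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[I[t - 1]][I[t]] * B[I[t]][O[t]]
    return p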
Substituting this into the EM update gives
$$\begin{aligned} \lambda^{t+1} &= \underset{\lambda}{\operatorname{argmax}}\sum_{I}\log P(O,I\mid\lambda)\cdot P(O,I\mid\lambda^t) \\ &= \underset{\lambda}{\operatorname{argmax}}\sum_{I}\left(\sum_{t=1}^T\log b_{i_t}(o_t)+\sum_{t=2}^{T}\log \alpha_{i_{t-1}i_t}+\log \pi_{i_1}\right)\cdot P(O,I\mid\lambda^t) \end{aligned}$$

Define
$$Q(\lambda,\lambda^t) = \sum_{I}\left(\sum_{t=1}^T\log b_{i_t}(o_t)+\sum_{t=2}^{T}\log \alpha_{i_{t-1}i_t}+\log \pi_{i_1}\right)\cdot P(O,I\mid\lambda^t)$$
To obtain $\pi^{t+1}$ we need each component $\pi_i^{t+1}$.

Subject to the constraint $\sum_{i=1}^N \pi_i = 1$,

write the Lagrangian, keeping only the term of $Q$ that involves $\pi$:
$$\begin{aligned} L(\pi,\eta) &= \sum_{I}\log(\pi_{i_1})\cdot P(O,I\mid\lambda^t) +\eta\cdot \left(\sum_{i=1}^N \pi_i-1\right)\\ &= \sum_{i_1}\cdots\sum_{i_T}\log(\pi_{i_1})\cdot P(O,I\mid\lambda^t) +\eta\cdot \left(\sum_{i=1}^N \pi_i-1\right)\\ &= \sum_{i_1}\log(\pi_{i_1})\cdot P(O,i_1\mid\lambda^t)+\eta\cdot \left(\sum_{i=1}^N \pi_i-1\right) \\ &= \sum_{i=1}^N\log(\pi_i)\cdot P(O,i_1=q_i\mid\lambda^t) +\eta\cdot \left(\sum_{i=1}^N \pi_i-1\right) \end{aligned}$$

Setting the derivative with respect to $\pi_i$ to zero:
$$\frac{\partial L}{\partial \pi_i} = \frac{P(O,i_1=q_i\mid\lambda^t)}{\pi_i} +\eta = 0$$

Multiplying by $\pi_i$ and summing the resulting $N$ equations lets us solve for $\eta$:
$$\sum_{i=1}^{N}\left(P(O,i_1=q_i\mid\lambda^t) +\eta\cdot \pi_i\right) =\sum_{i=1}^{N}P(O,i_1=q_i\mid\lambda^t) +\eta = 0$$

$$\eta = -\sum_{i=1}^{N}P(O,i_1=q_i\mid\lambda^t) =-P(O\mid\lambda^t)$$

$$\pi_i^{t+1} = -\frac{P(O,i_1=q_i\mid\lambda^t)}{\eta} = \frac{P(O,i_1=q_i\mid\lambda^t)}{P(O\mid\lambda^t)}$$
To obtain $A^{t+1}$ we need each entry $\alpha_{ij}^{t+1}$.

Each row of the transition matrix must sum to 1, so define the Lagrangian
$$\begin{aligned} L(\alpha,\eta) &= \sum_{I}\left(\sum_{t=2}^{T}\log(\alpha_{i_{t-1}i_t})\right)\cdot P(O,I\mid\lambda^t) + \sum_{i=1}^{N}\eta_i\cdot \left(\sum_{j=1}^N \alpha_{ij}-1\right)\\ &=\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=2}^{T} \log(\alpha_{ij})\, P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)+\sum_{i=1}^{N}\eta_i\cdot \left(\sum_{j=1}^N \alpha_{ij}-1\right) \end{aligned}$$
Then differentiate with respect to $\alpha_{ij}$:
$$\frac{\partial L}{\partial \alpha_{ij}} = \sum_{t=2}^{T}\frac{P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)}{\alpha_{ij}}+\eta_i = 0$$

$$\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right) + \eta_i\cdot \alpha_{ij} = 0$$

Summing over $j$:
$$\sum_{j=1}^{N}\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)+\eta_i\cdot \sum_{j=1}^N \alpha_{ij} = 0$$
which gives
$$\eta_i = -\sum_{j=1}^{N}\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right) = -\sum_{t=2}^{T}P(O,i_{t-1}=q_i\mid\lambda^t)$$

$$\begin{aligned} \alpha_{ij}^{t+1} &= -\frac{\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)}{\eta_i} \\ &=\frac{\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)}{\sum_{t=2}^{T}P(O,i_{t-1}=q_i\mid\lambda^t)} \end{aligned}$$

$$A^{t+1} = [\alpha_{ij}^{t+1}]_{N\times N}$$
To obtain $B^{t+1}$ we need each entry $b_j(k)^{t+1}$.

Each row of $B$ must also sum to 1, which gives the Lagrangian
$$\begin{aligned} L(b,\eta) &= \sum_{I}\left[\sum_{t=1}^T\log(b_{i_t}(o_t))\right]\cdot P(O,I\mid\lambda^t) + \sum_{j=1}^{N}\eta_j\cdot \left[\sum_{k=1}^{M}b_j(k)-1\right]\\ &=\sum_{j=1}^{N} \sum_{t=1}^{T} \log b_{j}\left(o_{t}\right) P(O, i_{t}=q_j \mid \lambda^t)+\sum_{j=1}^{N}\eta_j\cdot \left[\sum_{k=1}^{M}b_j(k)-1\right] \end{aligned}$$
Then differentiate with respect to $b_j(k)$; the indicator $I(o_t=v_k)$ appears because $b_j(o_t)$ involves $b_j(k)$ only at those times where $o_t=v_k$:
$$\frac{\partial L}{\partial b_j(k)} = \sum_{t=1}^{T}\frac{P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)}{b_j(k)}+\eta_j = 0$$

$$\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)+\eta_j\cdot b_j(k)= 0$$

Summing over $k$:
$$\sum_{k=1}^{M}\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)+\eta_j = 0$$

$$\begin{aligned} \eta_j &= -\sum_{k=1}^{M}\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k) \\ &=-\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\sum_{k=1}^M I(o_t=v_k) \\ &=-\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t) \end{aligned}$$
and hence
$$\begin{aligned} b_j(k)^{t+1} &= -\frac{\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)}{\eta_j} \\ &=\frac{\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)}{\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)} \end{aligned}$$
Putting everything together, the re-estimation formulas are:
$$\begin{aligned} &\pi_i^{t+1} = \frac{P(O,i_1=q_i\mid\lambda^t)}{P(O\mid\lambda^t)} \\ &\alpha_{ij}^{t+1} =\frac{\sum_{t=2}^{T}P\left(O, i_{t-1}=q_i, i_{t}=q_j \mid \lambda^t\right)}{\sum_{t=2}^{T}P(O,i_{t-1}=q_i\mid\lambda^t)} \\ &b_j(k)^{t+1} = \frac{\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)\cdot I(o_t=v_k)}{\sum_{t=1}^{T}P(O,i_t=q_j\mid \lambda^t)} \end{aligned}$$

Define
$$\xi_t(i,j) = \frac{P(O,i_{t}=q_i,i_{t+1}=q_j\mid\lambda^t)}{P(O\mid\lambda^t)}$$

$$\gamma_t(i) = \frac{P(O,i_t=q_i\mid\lambda^t)}{P(O\mid\lambda^t)}$$

Then the updates become
$$\begin{aligned} &\pi_i^{t+1} =\gamma_1(i) \\ &\alpha_{ij}^{t+1} =\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)} \\ &b_j(k)^{t+1} = \frac{\sum_{t=1}^{T}\gamma_t(j)\cdot I(o_t=v_k)}{\sum_{t=1}^{T}\gamma_t(j)} \end{aligned}$$
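Before the step-by-step implementation below, here is a compact vectorized sketch of one re-estimation step (my own code, not from the original post). It assumes `alpha` and `beta` are the forward and backward probability tables of shape (T, N), computed as in the `forward`/`backward` methods further down, `A` and `B` are the current transition and emission matrices, and `O` is the observation sequence:

import numpy as np

def reestimate(alpha, beta, A, B, O):
    # One Baum-Welch re-estimation step, mirroring the gamma/xi formulas above (a sketch).
    O = np.asarray(O)
    T, N = alpha.shape
    M = B.shape[1]
    # gamma[t, i] = P(i_t = q_i | O, lambda): normalize alpha_t(i) * beta_t(i) over states
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # unnormalized xi[t, i, j] = alpha_t(i) * A[i, j] * B[j, O[t+1]] * beta_{t+1}(j)
    Bo = B[:, O[1:]].T                                    # (T-1, N): Bo[t, j] = B[j, O[t+1]]
    xi = alpha[:-1, :, None] * A[None, :, :] * (Bo * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # parameter updates
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = gamma.T @ np.eye(M)[O] / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B

This is exactly the $\gamma$/$\xi$ form of the updates above, just written with NumPy broadcasting instead of explicit loops.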

Summary of the algorithm

(1) Initialization

Choose initial values $\alpha_{ij}^0$, $b_j(k)^0$, $\pi_i^0$.

(2) Iteration

Repeatedly apply the updates
$$\begin{aligned} &\pi_i^{t+1} =\gamma_1(i) \\ &\alpha_{ij}^{t+1} =\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)} \\ &b_j(k)^{t+1} = \frac{\sum_{t=1}^{T}\gamma_t(j)\cdot I(o_t=v_k)}{\sum_{t=1}^{T}\gamma_t(j)} \end{aligned}$$
(3) Termination

The model parameters $\lambda^{T}=(\pi^{T},A^{T},B^{T})$ from the final iteration are the result (this superscript $T$ is the index of the last iteration, not the sequence length).

Note: I have not shown how to compute $\gamma$ and $\xi$ themselves. Readers can try to derive them, or consult the HMM chapter of 《统计学习方法》, and then compare with the `_gamma` and `_xi` functions below.
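For reference (standard results, stated here without derivation), both quantities can be written in terms of the forward probabilities $\alpha_t(i)$ and backward probabilities $\beta_t(i)$ from the probability-computation post; note that $\alpha_t(i)$, with a time subscript and a state argument, is the forward probability, not the transition probability $\alpha_{ij}$ used above:
$$\gamma_t(i)=\frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N}\alpha_t(j)\,\beta_t(j)},\qquad \xi_t(i,j)=\frac{\alpha_t(i)\,\alpha_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{i'=1}^{N}\sum_{j'=1}^{N}\alpha_t(i')\,\alpha_{i'j'}\,b_{j'}(o_{t+1})\,\beta_{t+1}(j')}$$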

Implementation

We again use the box-and-ball model as the example.

Generating an observation sequence

import numpy as np

def generate(T):
    boxes = [[0,0,0,0,0,1,1,1,1,1],
             [0,0,0,1,1,1,1,1,1,1],
             [0,0,0,0,0,0,1,1,1,1],
             [0,0,0,0,0,0,0,0,1,1]]
    I = []
    O = []
    # step 1: pick a box
    index = np.random.choice(len(boxes))
    for t in range(T):
        I.append(index)
        # step 2: draw a ball from the current box
        O.append(np.random.choice(boxes[index]))
        # step 3: move to the next box
        if index == 0:
            index = 1
        elif index == 1 or index == 2:
            index = np.random.choice([index-1, index-1, index+1, index+1, index+1])
        elif index == 3:
            index = np.random.choice([index, index-1])
    return I, O
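The unsupervised training call further down assumes an observation sequence `O` produced by this function; a minimal usage sketch (the sequence length 100 and the fixed seed are my choices, matching the scale of the later experiments):

np.random.seed(0)      # only to make the example reproducible
I, O = generate(100)   # hidden state path and observation sequence (entries of O are 0 or 1)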

Defining the model

class Model:
    def __init__(self, M, N) -> None:
        # N: size of the state set, M: size of the observation set
        self.N = N
        self.M = M

    def _gamma(self, t, i):
        # gamma_t(i) = alpha_t(i) * beta_t(i) / sum_j alpha_t(j) * beta_t(j)
        return self.alpha[t][i]*self.beta[t][i]/np.sum(self.alpha[t]*self.beta[t])

    def _xi(self, t, i, j):
        # xi_t(i,j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O|lambda)
        fenmu = 0  # fenmu = denominator
        for i1 in range(self.N):
            for j1 in range(self.N):
                fenmu += self.alpha[t][i1]*self.A[i1][j1]*self.B[j1][self.O[t+1]]*self.beta[t+1][j1]
        return (self.alpha[t][i]*self.A[i][j]*self.B[j][self.O[t+1]]*self.beta[t+1][j])/fenmu

    def backward(self):
        # initial value: beta_T(i) = 1; earlier rows start at 0 and are filled by the recursion
        self.beta = np.zeros(shape=(self.T, self.N))
        self.beta[self.T-1] = 1.0
        # recursion: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        for t in reversed(range(1, self.T)):
            for i in range(self.N):
                for j in range(self.N):
                    self.beta[t-1][i] += self.beta[t][j]*self.B[j][self.O[t]]*self.A[i][j]

    def forward(self):
        self.alpha = np.zeros(shape=(self.T, self.N), dtype=float)
        # initial value: alpha_1(i) = pi_i * b_i(o_1)
        for i in range(self.N):
            self.alpha[0][i] = self.pi[i]*self.B[i][self.O[0]]
        # recursion: alpha_{t+1}(i) = (sum_j alpha_t(j) * a_ji) * b_i(o_{t+1})
        for t in range(self.T-1):
            for i in range(self.N):
                for j in range(self.N):
                    self.alpha[t+1][i] += self.alpha[t][j]*self.A[j][i]
                self.alpha[t+1][i] = self.alpha[t+1][i]*self.B[i][self.O[t+1]]

    def train(self, O, max_iter=100, toler=1e-5):
        self.O = O
        self.T = len(O)
        # initialization: self.pi/self.A/self.B hold the current parameters,
        # pi/A/B hold the newly re-estimated ones
        # pi is an N-dimensional vector
        self.pi = np.ones(shape=(self.N,))/self.N
        # A is an N x N matrix
        self.A = np.ones(shape=(self.N, self.N))/self.N
        # B is an N x M matrix
        self.B = np.ones(shape=(self.N, self.M))/self.M
        pi = np.ones(shape=(self.N,))/self.N
        A = np.ones(shape=(self.N, self.N))/self.N
        B = np.ones(shape=(self.N, self.M))/self.M
        # iteration
        for epoch in range(max_iter):
            # compute the forward and backward probabilities
            self.forward()
            self.backward()
            # re-estimate pi and A
            for i in range(self.N):
                pi[i] = self._gamma(0, i)
                for j in range(self.N):
                    fenzi = 0  # fenzi = numerator
                    fenmu = 0  # fenmu = denominator
                    for t2 in range(self.T-1):
                        fenzi += self._xi(t2, i, j)
                        fenmu += self._gamma(t2, i)
                    A[i][j] = fenzi/fenmu
            # re-estimate B
            for j in range(self.N):
                for k in range(self.M):
                    fenzi = 0
                    fenmu = 0
                    for t2 in range(self.T):
                        fenzi += self._gamma(t2, j)*(self.O[t2] == k)
                        fenmu += self._gamma(t2, j)
                    B[j][k] = fenzi/fenmu
            # stop once the parameters stop changing
            if np.max(abs(self.pi - pi)) < toler and \
               np.max(abs(self.A - A)) < toler and \
               np.max(abs(self.B - B)) < toler:
                self.pi = pi
                self.A = A
                self.B = B
                return pi, A, B
            self.pi = pi.copy()
            self.A = A.copy()
            self.B = B.copy()
        return self.pi, self.A, self.B

    def predict(self, O):
        # P(O|lambda) via the forward algorithm
        self.O = O
        self.T = len(O)
        self.forward()
        return np.sum(self.alpha[self.T-1])

Training

model = Model(2,4)
model.train(O,toler=1e-10)

The output is

(array([0.25, 0.25, 0.25, 0.25]),
 array([[0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]]),
 array([[0.625, 0.375],
        [0.625, 0.375],
        [0.625, 0.375],
        [0.625, 0.375]]))

This is completely off, but it is understandable: with a fully uniform initialization all hidden states are interchangeable, so the updates have nothing to break the symmetry. $\pi$ and $A$ stay uniform, and $B$ simply converges to the empirical frequencies of the observation symbols in $O$; the uniform starting point is a symmetric stationary point of the updates.
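If you want to experiment, one simple option (my suggestion, not part of the original code) is to replace the uniform initialization inside `train` with normalized random draws so that the symmetry is broken:

import numpy as np

def random_init(N, M, seed=0):
    # randomly initialized (pi, A, B); every distribution (row) sums to 1
    rng = np.random.default_rng(seed)
    pi = rng.random(N)
    pi /= pi.sum()
    A = rng.random((N, N))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, M))
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B

Even then, Baum-Welch on a single sequence only reaches a local optimum and identifies the states only up to relabeling, so the learned parameters will still not match the generating process exactly.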

Supervised learning: maximum likelihood estimation

Having worked through the Baum-Welch derivation, the maximum likelihood derivation is much easier.

Suppose our samples are
$$\left\{\left(O_{1}, I_{1}\right),\left(O_{2}, I_{2}\right), \cdots,\left(O_{S}, I_{S}\right)\right\}$$
Maximum likelihood estimation then gives
$$\begin{aligned} \hat\lambda &= \underset{\lambda}{\operatorname{argmax}}\ P(O,I\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \prod_{j=1}^{S}P(O_j,I_j\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^{S}\log P(O_j,I_j\mid\lambda) \\ &= \underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^{S}\log\left(\prod_{t=1}^{T}b_{i_t^j}(o_t^j)\cdot \prod_{t=2}^{T}\alpha_{i_{t-1}^j i_t^j}\cdot \pi_{i_1^j}\right) \\ &=\underset{\lambda}{\operatorname{argmax}}\ \sum_{j=1}^{S}\left[\sum_{t=1}^{T}\log b_{i_t^j}(o_t^j)+ \sum_{t=2}^{T}\log \alpha_{i_{t-1}^j i_t^j}+\log \pi_{i_1^j}\right] \end{aligned}$$
where the superscript $j$ indexes the $j$-th sample.
We first estimate $\pi$; only the last term involves $\pi$. Writing $\pi^{j}$ as shorthand for $\pi_{i_1^j}$ (the initial-state probability of sample $j$),
$$\hat{\pi}=\underset{\pi}{\operatorname{argmax}} \sum_{j=1}^{S} \log \pi^{j}$$
subject to the constraint
$$\sum_{i=1}^{N}\pi_i = 1$$
The Lagrangian is
$$L(\pi, \eta)=\sum_{j=1}^{S} \log \pi^{j}+\eta \cdot\left(\sum_{i=1}^{N} \pi_i-1\right)$$
Differentiating with respect to $\pi_i$ (the derivative of $\log\pi^{j}$ with respect to $\pi_i$ is nonzero exactly for the samples whose first state is $q_i$, hence the indicator):
$$\frac{\partial L}{\partial \pi_{i}}=\frac{\sum_{j=1}^{S} I\left(i_1^j=q_i\right)}{\pi_{i}}+\eta=0$$

$$\sum_{j=1}^{S} I\left(i_1^j=q_i\right) + \eta \cdot \pi_i = 0$$
Summing over $i$:
$$\sum_{i=1}^{N}\sum_{j=1}^{S}I\left(i_1^j=q_i\right) + \eta \cdot \sum_{i=1}^{N}\pi_i = 0$$

$$\begin{aligned} &\sum_{j=1}^{S}\sum_{i=1}^{N}I\left(i_1^j=q_i\right) + \eta = 0 \\ &\sum_{j=1}^{S}1 +\eta = 0 \\ &\eta = -S \end{aligned}$$
Hence
$$\hat \pi_i = -\frac{\sum_{j=1}^{S} I\left(i_1^j=q_i\right)}{\eta} = \frac{\sum_{j=1}^{S} I\left(i_1^j=q_i\right)}{S}$$
For the remaining derivations of $A$ and $B$, allow me to just paste two images of the handwritten steps; typing out the formulas got tedious.

The images contain a few mistakes. Hopefully you can spot them yourself while reading; if not, you will notice them once you reach the results below.

So the final estimates are (using $s$ to index the samples, to avoid clashing with the state indices $i,j$ and the symbol index $k$):
$$\begin{aligned} &\hat \pi_i = \frac{\sum_{s=1}^{S} I\left(i_1^s=q_i\right)}{S} \\ &\hat \alpha_{ij} = \frac{\sum_{s=1}^{S} \sum_{t=1}^{T-1} I\left(i_{t}^s=q_i,\ i_{t+1}^s=q_j\right)}{\sum_{s=1}^{S} \sum_{t=1}^{T-1} I\left(i_{t}^s=q_i\right)} \\ &\hat b_j(k) =\frac{\sum_{s=1}^{S} \sum_{t=1}^{T} I\left(i_t^s=q_{j},\ o_{t}^s=v_{k}\right) }{\sum_{s=1}^{S} \sum_{t=1}^{T} I\left(i_{t}^s=q_{j}\right)} \end{aligned}$$

Implementation

Generating data

data = []
for i in range(100):
    I, O = generate(100)
    data.append([I, O])

Defining the model

class SupervisedModel:
    def __init__(self, M, N) -> None:
        self.M = M
        self.N = N

    def train(self, data):
        # data: S x 2 x T, i.e. [[[i1,i2,...],[o1,o2,...]], ...]
        self.pi = np.zeros(shape=(self.N,))
        self.A = np.zeros(shape=(self.N, self.N))
        self.B = np.zeros(shape=(self.N, self.M))
        S = len(data)
        self.T = len(data[0][0])
        # estimate pi: frequency of each initial state
        for i in range(self.N):
            for j in range(S):
                self.pi[i] += data[j][0][0] == i
            self.pi[i] = self.pi[i]/S
        # estimate A: transition counts normalized per source state
        for i in range(self.N):
            for j in range(self.N):
                fenzi = 0  # numerator
                fenmu = 0  # denominator
                for k in range(S):
                    for t in range(self.T-1):
                        fenzi += data[k][0][t] == i and data[k][0][t+1] == j
                        fenmu += data[k][0][t] == i
                self.A[i][j] = fenzi/fenmu
        # estimate B: emission counts normalized per state
        for j in range(self.N):
            for k in range(self.M):
                fenzi = 0
                fenmu = 0
                for i in range(S):
                    for t in range(self.T):
                        fenzi += data[i][0][t] == j and data[i][1][t] == k
                        fenmu += data[i][0][t] == j
                self.B[j][k] = fenzi/fenmu
        return self.pi, self.A, self.B

    def predict(self, O):
        # backward algorithm: beta_T(i) = 1 for every state i
        beta = np.ones(shape=(self.N,))
        # recursion over o_T, ..., o_2
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += beta[j]*self.B[j][o]*self.A[i][j]
            beta = temp
        # termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
        res = 0
        for i in range(self.N):
            res += self.B[i][O[0]]*beta[i]*self.pi[i]
        return res

Training the model

model = SupervisedModel(2,4)
model.train(data)
(array([0.19, 0.25, 0.2 , 0.36]),
 array([[0.        , 1.        , 0.        , 0.        ],
        [0.38930233, 0.        , 0.61069767, 0.        ],
        [0.        , 0.4073957 , 0.        , 0.5926043 ],
        [0.        , 0.        , 0.49933066, 0.50066934]]),
 array([[0.47313084, 0.52686916],
        [0.30207852, 0.69792148],
        [0.60162602, 0.39837398],
        [0.80851627, 0.19148373]]))

As you can see, supervised learning works very well: the estimated $A$ matches the transition rules hard-coded in `generate` (state 0 always moves to state 1; states 1 and 2 move left with probability 0.4 and right with probability 0.6; state 3 stays or moves to state 2 with probability 0.5 each), and $\hat\pi$ is a noisy estimate of the uniform initial distribution from only 100 samples.
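For completeness, a small usage sketch of `predict` (my own example; it assumes the `model` trained above and the `generate` function from earlier). It scores a new observation sequence with the backward algorithm and returns $P(O\mid\lambda)$, which is a very small number for a length-100 sequence:

_, O_new = generate(100)    # a fresh observation sequence
p = model.predict(O_new)    # P(O_new | lambda) under the supervised estimates
print(p)                    # tiny, since it multiplies about a hundred factors smaller than 1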
