Table of Contents

  • Markov Decision Processes
    • Markov Processes
      • Definition
      • Markov Property
      • State Transition Matrix
    • Markov Reward Process
      • Definition
      • Return
        • Why discount
      • Value Function
      • Bellman Equation
    • Markov Decision Processes
      • Definition
      • Policy
      • Value Function
      • Bellman Expectation Equation
      • Optimal Value Function
        • Finding an optimal Policy
      • Bellman Optimality Equation

Markov Decision Processes

MDPs formally describe an environment for reinforcement learning.

Markov Processes

Definition

A memoryless random process,

i.e. a sequence of random states $S_1, S_2, \dots$ with the [Markov property](#Markov Property).

A Markov Process (or Markov Chain) is a tuple $\langle \mathcal{S}, \mathcal{P} \rangle$

  • $\mathcal{S}$ is a (finite) set of states
  • $\mathcal{P}$ is a state [transition probability matrix](#state transition matrix),
    $\mathcal{P}_{ss'} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s\right]$

Markov Property

“The future is independent of the past given the present”
$$\mathbb{P}\left[S_{t+1} \mid S_t\right] = \mathbb{P}\left[S_{t+1} \mid S_1, \dots, S_t\right]$$
The state is a sufficient statistic of the future

State Transition Matrix

The state transition probability from a source state $s$ to a destination state $s'$ is
$$\mathcal{P}_{ss'} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s\right]$$
The state transition matrix $\mathcal{P}$ defines transition probabilities from all states $s$ to all successor states $s'$:
$$\mathcal{P} = \begin{bmatrix} \mathcal{P}_{11} & \cdots & \mathcal{P}_{1n} \\ \vdots & & \vdots \\ \mathcal{P}_{n1} & \cdots & \mathcal{P}_{nn} \end{bmatrix}$$
By the properties of probability, each row of the matrix must sum to one:
$$\sum_{j=1}^n \mathcal{P}_{ij} = 1 \qquad \forall i = 1, \dots, n$$
Example: the example diagram from the lecture is not reproduced here; a toy stand-in follows below.
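
Since the diagram is not shown, here is a minimal sketch with a hypothetical three-state chain (made-up states and probabilities, not the lecture's example) that builds a transition matrix, checks the row-sum property, and samples a short state sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-state Markov chain (made-up numbers)
states = ["A", "B", "C"]
P = np.array([[0.5, 0.5, 0.0],   # P[i, j] = P(S_{t+1} = j | S_t = i)
              [0.0, 0.2, 0.8],
              [0.6, 0.0, 0.4]])

# Each row of a state transition matrix must sum to one
assert np.allclose(P.sum(axis=1), 1.0)

# Sample a short trajectory: by the Markov property, the next state
# depends only on the current state.
s = 0
trajectory = [states[s]]
for _ in range(5):
    s = rng.choice(len(states), p=P[s])
    trajectory.append(states[s])
print(" -> ".join(trajectory))
```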

Markov Reward Process

Definition

a Markov chain with values

A Markov Reward Process is a tuple $\langle \mathcal{S}, \mathcal{P}, \mathcal{R}, \gamma \rangle$

  • $\mathcal{S}$ is a finite set of states
  • $\mathcal{P}$ is a state transition probability matrix,
    $\mathcal{P}_{ss'} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s\right]$
  • $\mathcal{R}$ is a reward function, $\mathcal{R}_s = \mathbb{E}\left[R_{t+1} \mid S_t = s\right]$
  • $\gamma$ is a discount factor, $\gamma \in [0, 1]$

Note that at this point the reward depends only on the state, $\mathcal{R}_s$; for example, in the diagram from the lecture (not reproduced here), Class 1 has $R = -2$ regardless of whether the next state is Facebook or Class 2.

Return

The return $G_t$ is the total discounted reward from time-step $t$:
$$G_t = R_{t+1} + \gamma R_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

  • $\gamma \in [0,1]$ expresses how much a future reward is worth at the current time step: we prefer rewards obtained now, so future rewards are discounted (a tiny numerical sketch follows after the bullets below)

    • $\gamma$ close to 0 gives a "myopic" evaluation that emphasizes short-term rewards
    • $\gamma$ close to 1 gives a "far-sighted" evaluation that values future rewards almost as much as immediate ones
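
As a tiny numerical illustration of the return formula, here is a sketch with a made-up reward sequence and $\gamma = 0.9$ (these values are hypothetical, not taken from the lecture):

```python
# Discounted return G_t = sum_k gamma^k * R_{t+k+1} for a made-up reward sequence
rewards = [-2, -2, -2, 10]   # R_{t+1}, R_{t+2}, R_{t+3}, R_{t+4}
gamma = 0.9

g_t = sum(gamma ** k * r for k, r in enumerate(rewards))
print(g_t)   # -2 - 1.8 - 1.62 + 7.29 = 1.87
```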

Why discount

  • Some Markov processes contain cycles and never terminate; discounting avoids infinite returns.
  • We do not have a perfect model of the environment, so our estimates of the future are not necessarily accurate and we do not fully trust the model; because of this uncertainty, we discount estimates of future reward, preferring to obtain reward as soon as possible rather than at some distant point in the future.
  • If the reward has real value, we usually prefer to receive it immediately rather than later (money now is worth more than money later).
  • Human behaviour also shows a preference for immediate reward.
  • Sometimes the discount can be set to 0, in which case only the immediate reward matters; it can also be set to 1, in which case future rewards are not discounted and count exactly as much as the current reward.

Value Function

The state value function $v(s)$ of an MRP is the expected return starting from state $s$:
$$v(s) = \mathbb{E}\left[G_t \mid S_t = s\right]$$

Bellman Equation

Starting from the definition of the value function as an expectation, and writing $S_{t+1}$ as $s'$:

$$\begin{aligned} v(s) &= \mathbb{E}\left[G_t \mid S_t = s\right] \\ &= \mathbb{E}\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s\right] \\ &= \mathbb{E}\left[R_{t+1} + \gamma G_{t+1} \mid S_t = s\right] \\ &= \mathbb{E}\left[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s\right] \end{aligned}$$

$$\color{red}{v(s) = \mathcal{R}_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}\, v(s')} \tag{4}$$

The step from $\mathbb{E}\left[R_{t+1} + \gamma G_{t+1} \mid S_t = s\right]$ to $\mathbb{E}\left[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s\right]$ is not entirely obvious; it requires showing that $\mathbb{E}[G_{t+1} \mid S_t] = \mathbb{E}\left[v(S_{t+1}) \mid S_t\right] = \mathbb{E}\left[\mathbb{E}[G_{t+1} \mid S_{t+1}] \mid S_t\right]$.

First recall the definition of conditional expectation: $\mathbb{E}[X \mid Y = y] = \sum_{x} x \operatorname{P}(X = x \mid Y = y)$.

Write $G_{t+1} = g'$, $S_{t+1} = s'$, $S_t = s$.

$$\begin{aligned} \mathbb{E}\left[\mathbb{E}[G_{t+1} \mid S_{t+1}] \mid S_t\right] &= \mathbb{E}\left[\mathbb{E}[g' \mid s'] \mid S_t\right] \\ &= \mathbb{E}\left[\sum_{g'} g'\, p(g' \mid s') \,\middle|\, s\right] \\ &= \sum_{s'} \left(\sum_{g'} g'\, p(g' \mid s', s)\right) p(s' \mid s) \\ &= \sum_{s'} \sum_{g'} g'\, \frac{p(g', s', s)}{p(s', s)} \frac{p(s', s)}{p(s)} \\ &= \sum_{s'} \sum_{g'} g'\, p(g', s' \mid s) \\ &= \sum_{g'} g'\, p(g' \mid s) \\ &= \mathbb{E}[G_{t+1} \mid S_t] \end{aligned}$$

(Going from the second to the third line uses the Markov property, $p(g' \mid s') = p(g' \mid s', s)$, together with expanding the outer expectation over $s'$.)
The Bellman equation can also be written in matrix form:
$$\mathbf{v} = \mathcal{R} + \gamma \mathcal{P} \mathbf{v}$$

From the matrix form, the value function has a direct analytic solution: $\mathbf{v} = \left(I - \gamma \mathcal{P}\right)^{-1} \mathcal{R}$

However, for an MRP with $n$ states the matrix inversion costs $O(n^3)$, so the analytic solution is only practical for small MRPs. For large MRPs, iterative methods are used instead (a small numerical sketch of the closed-form solution follows the list below):

  • Dynamic programming
  • Monte-Carlo evaluation
  • Temporal-Difference learning
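
As a quick illustration of the closed-form solution above, here is a minimal NumPy sketch on a made-up three-state MRP (the transition matrix, rewards, and $\gamma$ are hypothetical, not the lecture's example):

```python
import numpy as np

# Toy 3-state MRP (hypothetical values)
P = np.array([[0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9],
              [0.0, 0.0, 1.0]])   # the last state is absorbing
R = np.array([-2.0, -2.0, 0.0])   # expected immediate reward R_s
gamma = 0.9

# Closed-form solution of the Bellman equation: v = (I - gamma * P)^{-1} R
v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
print(v)
```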

Markov Decision Processes

Definition

An MDP is an MRP with decisions. It is an environment in which all states are Markov.

A Markov Decision Process is a tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$

  • $\mathcal{S}$ is a finite set of states
  • $\mathcal{A}$ is a finite set of actions
  • $\mathcal{P}$ is a state transition probability matrix,
    $\mathcal{P}_{ss'}^{\color{red}a} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s, {\color{red}A_t = a}\right]$
  • $\mathcal{R}$ is a reward function, $\mathcal{R}_s^{\color{red}a} = \mathbb{E}\left[R_{t+1} \mid S_t = s, {\color{red}A_t = a}\right]$
  • $\gamma$ is a discount factor, $\gamma \in [0, 1]$

Policy

A policy $\pi$ is a distribution over actions given states:
$$\pi(a \mid s) = \mathbb{P}\left[A_t = a \mid S_t = s\right]$$
Given an MDP $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ and a policy $\pi$:

  • The state sequence $S_1, S_2, \dots$ is a Markov process $\langle \mathcal{S}, \mathcal{P}^\pi \rangle$

  • The state and reward sequence $S_1, R_2, S_2, \dots$ is a Markov reward process $\langle \mathcal{S}, \mathcal{P}^\pi, \mathcal{R}^\pi, \gamma \rangle$

  • where (see the sketch below)
    $$\mathcal{P}_{ss'}^\pi = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{P}_{ss'}^a \qquad \mathcal{R}_s^\pi = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}_s^a
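
A minimal sketch of how a fixed policy collapses an MDP into the MRP $\langle \mathcal{S}, \mathcal{P}^\pi, \mathcal{R}^\pi, \gamma \rangle$: the array layout ($\mathcal{P}$ of shape (A, S, S), $\mathcal{R}$ of shape (S, A), $\pi$ of shape (S, A)) and all numbers are hypothetical choices for illustration, not part of the lecture.

```python
import numpy as np

def induced_mrp(P, R, pi):
    """Average MDP dynamics over a policy pi.

    P:  (A, S, S) array, P[a, s, t]  = P(S_{t+1}=t | S_t=s, A_t=a)
    R:  (S, A)    array, R[s, a]     = E[R_{t+1} | S_t=s, A_t=a]
    pi: (S, A)    array, pi[s, a]    = pi(a | s)
    Returns (P_pi, R_pi), the transition matrix and reward vector of the MRP.
    """
    # P^pi_{ss'} = sum_a pi(a|s) * P^a_{ss'}
    P_pi = np.einsum('sa,ast->st', pi, P)
    # R^pi_s = sum_a pi(a|s) * R^a_s
    R_pi = (pi * R).sum(axis=1)
    return P_pi, R_pi

# Toy 2-state, 2-action MDP with a uniform random policy (made-up numbers)
P = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.4, 0.6], [0.9, 0.1]]])
R = np.array([[1.0, 0.0], [-1.0, 2.0]])
pi = np.full((2, 2), 0.5)
print(induced_mrp(P, R, pi))
```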

Value Function

  • State-value function: $v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right]$, the expected return starting from state $s$ and then following policy $\pi$
  • Action-value function: $q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid S_t = s, A_t = a\right]$, the expected return starting from state $s$, taking action $a$, and then following policy $\pi$

Bellman Expectation Equation

  1. The value function can be decomposed into the immediate reward plus the discounted value of the successor state (or state–action pair):
     $${\color{blue} v_{\color{red}\pi}(s) = \mathbb{E}_{\color{red}\pi}\left[R_{t+1} + \gamma v_{\color{red}\pi}(S_{t+1}) \mid S_t = s\right]} \tag{5}$$

     $${\color{blue} q_{\color{red}\pi}(s, a) = \mathbb{E}_{\color{red}\pi}\left[R_{t+1} + \gamma q_{\color{red}\pi}(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]} \tag{6}$$
     Equations 5 and 6 express the relationship between the value function at the current state and at the successor state.

  2. Now consider the relationship between the state-value function and the action-value function (the backup diagrams from the lecture are not reproduced here):

$$v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, q_\pi(s, a) \qquad \text{(sum over all actions } a \text{, the filled black circles in the backup diagram)} \tag{7}$$
$$q_\pi(s, a) = \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, v_\pi(s') \qquad \text{(sum over all successor states } s' \text{, the open circles in the backup diagram)} \tag{8}$$

  3. Substituting equations 7 and 8 into each other yields equations 5 and 6 with the expectation $\mathbb{E}[\,\cdot\,]$ written out explicitly:

$$v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \left(\mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, v_\pi(s')\right) \tag{9}$$

$$q_\pi(s, a) = \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \sum_{a' \in \mathcal{A}} \pi(a' \mid s')\, q_\pi(s', a') \tag{10}$$
Alternatively, equations 9 and 10 can be derived directly from equations 5 and 6:
$$\begin{aligned} v_{\color{red}\pi}(s) &= \mathbb{E}_{\color{red}\pi}\left[R_{t+1} + \gamma v_{\color{red}\pi}(S_{t+1}) \mid S_t = s\right] \\ &= \mathbb{E}_{\color{red}\pi}\left[R_{t+1} \mid S_t = s\right] + \gamma\, \mathbb{E}_{\color{red}\pi}\left[v_{\color{red}\pi}(S_{t+1}) \mid S_t = s\right] \\ &= \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^\pi\, v_\pi(s') \\ &= \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \left[\sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{P}_{ss'}^a\right] v_\pi(s') \\ &= \sum_{a \in \mathcal{A}} \pi(a \mid s) \left(\mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, v_\pi(s')\right) \triangleq \text{Equation 9} \end{aligned}$$

$$\begin{aligned} q_{\color{red}\pi}(s, a) &= \mathbb{E}_{\color{red}\pi}\left[R_{t+1} + \gamma q_{\color{red}\pi}(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right] \\ &= \mathbb{E}_{\color{red}\pi}\left[R_{t+1} \mid S_t = s, A_t = a\right] + \gamma\, \mathbb{E}_{\color{red}\pi}\left[q_{\color{red}\pi}(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right] \\ &= \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \sum_{a' \in \mathcal{A}} \pi(a' \mid s')\, q_\pi(s', a') \triangleq \text{Equation 10} \end{aligned}$$

Equation 9 can be applied directly to evaluate a policy; the worked example from the lecture is not reproduced here, but a small sketch of iterative policy evaluation follows below.
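
A minimal sketch of iterative policy evaluation built on Equations 7–9, reusing the hypothetical (A, S, S)/(S, A) toy layout and numbers from the earlier sketch; it illustrates the formulas rather than reproducing the lecture's example.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8):
    """Repeatedly apply Equation 9 until v_pi stops changing.

    P: (A, S, S) transition tensor, R: (S, A) rewards, pi: (S, A) policy.
    """
    v = np.zeros(R.shape[0])
    while True:
        # q(s,a) = R_s^a + gamma * sum_{s'} P^a_{ss'} v(s')   (Equation 8)
        q = R + gamma * np.einsum('ast,t->sa', P, v)
        # v(s) = sum_a pi(a|s) q(s,a)                          (Equation 7)
        v_new = (pi * q).sum(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

# Uniform random policy on the toy 2-state, 2-action MDP from the earlier sketch
P = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.4, 0.6], [0.9, 0.1]]])
R = np.array([[1.0, 0.0], [-1.0, 2.0]])
pi = np.full((2, 2), 0.5)
print(policy_evaluation(P, R, pi))
```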

Optimal Value Function

Define $v_*(s) = \underset{\pi}{\operatorname{max}}\, v_\pi(s)$ and $q_*(s, a) = \underset{\pi}{\operatorname{max}}\, q_\pi(s, a)$: the optimal value functions are the best achievable value functions over all policies.

An MDP is “solved” when we know the optimal value function

Policies can be compared (ordered) as follows: $\pi \geq \pi'$ if $v_\pi(s) \geq v_{\pi'}(s), \ \forall s$

Finding an optimal Policy

If we know $q_*(s, a)$, we immediately have the optimal policy:
$$\pi_*(a \mid s) = \begin{cases} 1, & \text{if } a = \underset{a \in \mathcal{A}}{\operatorname{arg\,max}}\ q_*(s, a) \\ 0, & \text{otherwise} \end{cases}$$
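
Assuming $q_*$ is available as a hypothetical (S, A) array, extracting this deterministic policy is just a row-wise argmax — a minimal sketch:

```python
import numpy as np

def greedy_policy(q_star):
    """Deterministic policy: pi_*(a|s) = 1 for a = argmax_a q_*(s, a)."""
    num_states, num_actions = q_star.shape
    pi_star = np.zeros((num_states, num_actions))
    pi_star[np.arange(num_states), q_star.argmax(axis=1)] = 1.0
    return pi_star

q_star = np.array([[0.5, 2.0],     # made-up q_* values
                   [1.5, 1.0]])
print(greedy_policy(q_star))       # [[0, 1], [1, 0]]
```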

Bellman Optimality Equation

$$v_*(s) = \underset{a}{\operatorname{max}}\ q_{\color{red}*}(s, a) \tag{11}$$

$$q_*(s, a) = \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, v_{\color{red}*}(s') \tag{12}$$

Substituting equations 11 and 12 into each other gives:
$$v_*(s) = \underset{a}{\operatorname{max}} \left(\mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, v_{\color{red}*}(s')\right) \tag{13}$$
$$q_*(s, a) = \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a\, \underset{a'}{\operatorname{max}}\ q_{\color{red}*}(s', a') \tag{14}$$
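
The Bellman optimality equation is non-linear and has no general closed-form solution; one standard way to solve it is to iterate Equations 11 and 12 until $q_*$ converges (value iteration). Below is a minimal sketch under the same hypothetical toy-MDP layout and numbers as the earlier sketches:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality equations until q_* converges.

    P: (A, S, S) transition tensor, R: (S, A) reward array (toy layout).
    """
    q = np.zeros_like(R)
    while True:
        v = q.max(axis=1)                                   # Equation 11
        q_new = R + gamma * np.einsum('ast,t->sa', P, v)    # Equation 12
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new

# Same made-up 2-state, 2-action MDP as in the earlier sketches
P = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.4, 0.6], [0.9, 0.1]]])
R = np.array([[1.0, 0.0], [-1.0, 2.0]])
q_star = value_iteration(P, R)
print(q_star.max(axis=1))      # v_*(s)
print(q_star.argmax(axis=1))   # greedy optimal action in each state
```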
