Machine learning approximation algorithms for high-dimensional fully nonlinear PDEs
1. Overall framework

The paper proposes a new algorithm for solving fully nonlinear partial differential equations (PDEs) and nonlinear second-order backward stochastic differential equations (2BSDEs).
- Section 2: derives (2.1–2.6) and states (2.7), a special case of the proposed algorithm; the core idea is the simplified framework of (2.7).
- Section 3: derives (3.1–3.5) and states (3.7), the algorithm in the general case.
- Section 4: numerical results for the algorithm on several high-dimensional PDEs.
  - 4.1: uses the algorithm in the simplified framework of (2.7) to approximate the solution of a 20-dimensional Allen-Cahn equation.
2. Section 2 (main idea of the deep 2BSDE method)

This section mainly conveys the idea, so the derivation is kept rough.
The more precise, general definitions of the deep 2BSDE method appear in (2.7) and (3.7).
The derivation is largely based on the ideas of E, Han, and Jentzen [33] and of Cheridito et al. [22].
2.1 Fully nonlinear second-order PDEs
Let
- $d \in \mathbb{N} = \{1, 2, 3, \ldots\}$ (the dimension),
- $T \in (0, \infty)$ (the terminal time),
- $u = (u(t,x))_{t \in [0,T], x \in \mathbb{R}^d} \in C^{1,2}([0,T] \times \mathbb{R}^d, \mathbb{R})$,
  - ① $t \in [0,T]$ is a time point; $x \in \mathbb{R}^d$ is a $d$-dimensional real vector;
  - ② $C^{1,2}$ means once continuously differentiable in $t$ and twice continuously differentiable in $x$;
- $f \in C([0,T] \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d \times d}, \mathbb{R})$ (a function of five arguments),
- $g \in C(\mathbb{R}^d, \mathbb{R})$,
and assume that for all $t \in [0,T)$, $x \in \mathbb{R}^d$ it holds that
$$u(T,x) = g(x)$$
and
$$\frac{\partial u}{\partial t}(t, x) = f\big(t, x, u(t, x), (\nabla_x u)(t, x), (\operatorname{Hess}_x u)(t, x)\big) \tag{1}$$
The deep 2BSDE method allows the approximate computation of the function $u(0,x)$, $x \in \mathbb{R}^d$. This section approximates the real number $u(0,\xi) \in \mathbb{R}$ for a fixed $\xi \in \mathbb{R}^d$; see (3.7) for the general algorithm.
2.2 The connection between fully nonlinear second-order PDEs and 2BSDEs
Let
- $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space,
- $W: [0,T] \times \Omega \to \mathbb{R}^d$ a standard Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous sample paths,
- $\mathbb{F} = (\mathbb{F}_t)_{t \in [0,T]}$ the normal filtration generated by $W$ on $(\Omega, \mathcal{F}, \mathbb{P})$,
- $Y: [0,T] \times \Omega \to \mathbb{R}$, $Z: [0,T] \times \Omega \to \mathbb{R}^d$, $\Gamma: [0,T] \times \Omega \to \mathbb{R}^{d \times d}$, and $A: [0,T] \times \Omega \to \mathbb{R}^d$ be $\mathbb{F}$-adapted stochastic processes with continuous sample paths
such that for all $t \in [0,T]$ it holds almost surely that
$$Y_t = g(\xi + W_T) - \int_t^T \Big( f(s, \xi + W_s, Y_s, Z_s, \Gamma_s) + \tfrac{1}{2} \operatorname{Trace}(\Gamma_s) \Big)\, ds - \int_t^T \langle Z_s, dW_s \rangle_{\mathbb{R}^d} \tag{2}$$
$$Z_t = Z_0 + \int_0^t A_s\, ds + \int_0^t \Gamma_s\, dW_s \tag{3}$$
Under suitable smoothness and regularity assumptions, the fully nonlinear PDE (1) is related to the 2BSDE (2)–(3) in the sense that for all $t \in [0,T]$ it holds almost surely that
$$Y_t = u(t, \xi + W_t) \in \mathbb{R}, \qquad Z_t = (\nabla_x u)(t, \xi + W_t) \in \mathbb{R}^d \tag{4}$$
$$\Gamma_t = (\operatorname{Hess}_x u)(t, \xi + W_t) \in \mathbb{R}^{d \times d} \tag{5}$$
$$A_t = \Big( \tfrac{\partial}{\partial t} \nabla_x u \Big)(t, \xi + W_t) + \tfrac{1}{2} (\nabla_x \Delta_x u)(t, \xi + W_t) \in \mathbb{R}^d \tag{6}$$
See Cheridito et al. [22] and Lemma 3.1.
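The relations (5)–(6) can be seen by a formal componentwise application of Itô's formula to $Z_t = (\nabla_x u)(t, \xi + W_t)$ (a sketch only; the rigorous statement is Lemma 3.1):

```latex
% Ito's formula for v_i(t,x) = (\partial u / \partial x_i)(t,x) along X_t = \xi + W_t:
dv_i(t, X_t) = \Big( \frac{\partial v_i}{\partial t} + \frac{1}{2}\Delta_x v_i \Big)(t, X_t)\, dt
             + \big\langle (\nabla_x v_i)(t, X_t),\, dW_t \big\rangle_{\mathbb{R}^d}
% Collecting the d components (i = 1, \ldots, d):
dZ_t = \underbrace{\Big( \tfrac{\partial}{\partial t}\nabla_x u
        + \tfrac{1}{2}\nabla_x \Delta_x u \Big)(t, X_t)}_{=\,A_t \text{ by } (6)}\, dt
     + \underbrace{(\operatorname{Hess}_x u)(t, X_t)}_{=\,\Gamma_t \text{ by } (5)}\, dW_t
```

which is exactly the dynamics (3) of $Z$.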
2.3 Merged formulation of the PDE and the 2BSDE

This subsection derives a merged formulation ((9) and (10)) of the PDE (1) and the 2BSDE (2)–(3).
More specifically, (2) and (3) imply that for any $\tau_1, \tau_2 \in [0,T]$ with $\tau_1 \leq \tau_2$:
$$Y_{\tau_2} = Y_{\tau_1} + \int_{\tau_1}^{\tau_2} \Big( f(s, \xi + W_s, Y_s, Z_s, \Gamma_s) + \tfrac{1}{2} \operatorname{Trace}(\Gamma_s) \Big)\, ds + \int_{\tau_1}^{\tau_2} \langle Z_s, dW_s \rangle_{\mathbb{R}^d} \tag{7}$$
$$Z_{\tau_2} = Z_{\tau_1} + \int_{\tau_1}^{\tau_2} A_s\, ds + \int_{\tau_1}^{\tau_2} \Gamma_s\, dW_s \tag{8}$$
Equation (7) is obtained as follows: substitute $\tau_1$ and $\tau_2$ for $t$ in (2) to get expressions for $Y_{\tau_1}$ and $Y_{\tau_2}$, then form the difference $Y_{\tau_2} - Y_{\tau_1}$ and simplify. Equation (8) follows in the same way from (3).
Substituting (5) and (6) into (7) and (8) shows that for any $\tau_1, \tau_2 \in [0,T]$ with $\tau_1 \leq \tau_2$:
$$\begin{aligned} Y_{\tau_2} = {} & Y_{\tau_1} + \int_{\tau_1}^{\tau_2} \langle Z_s, dW_s \rangle_{\mathbb{R}^d} \\ & + \int_{\tau_1}^{\tau_2} \Big( f\big(s, \xi + W_s, Y_s, Z_s, (\operatorname{Hess}_x u)(s, \xi + W_s)\big) + \tfrac{1}{2} \operatorname{Trace}\big((\operatorname{Hess}_x u)(s, \xi + W_s)\big) \Big)\, ds \end{aligned} \tag{9}$$
$$\begin{aligned} Z_{\tau_2} = {} & Z_{\tau_1} + \int_{\tau_1}^{\tau_2} \Big( \Big(\tfrac{\partial}{\partial t} \nabla_x u\Big)(s, \xi + W_s) + \tfrac{1}{2} (\nabla_x \Delta_x u)(s, \xi + W_s) \Big)\, ds \\ & + \int_{\tau_1}^{\tau_2} (\operatorname{Hess}_x u)(s, \xi + W_s)\, dW_s \end{aligned} \tag{10}$$
2.4 Forward-discretization of the merged PDE-2BSDE system

This subsection derives a forward discretization of the merged PDE-2BSDE system (9)–(10).
Let $t_0, t_1, \ldots, t_N \in [0,T]$ be real numbers with
$$0 = t_0 < t_1 < \ldots < t_N = T \tag{11}$$
and assume the time steps $(t_{k+1} - t_k)$, $k \in \{0, 1, \ldots, N-1\}$, are sufficiently small.
Then (9) and (10) suggest that for sufficiently large $N \in \mathbb{N}$ and every $n \in \{0, 1, \ldots, N-1\}$:
$$\begin{aligned} Y_{t_{n+1}} \approx {} & Y_{t_n} + \Big( f\big(t_n, \xi + W_{t_n}, Y_{t_n}, Z_{t_n}, (\operatorname{Hess}_x u)(t_n, \xi + W_{t_n})\big) \\ & + \tfrac{1}{2} \operatorname{Trace}\big((\operatorname{Hess}_x u)(t_n, \xi + W_{t_n})\big) \Big)(t_{n+1} - t_n) + \langle Z_{t_n}, W_{t_{n+1}} - W_{t_n} \rangle_{\mathbb{R}^d} \end{aligned} \tag{12}$$
$$\begin{aligned} Z_{t_{n+1}} \approx {} & Z_{t_n} + \Big( \Big(\tfrac{\partial}{\partial t} \nabla_x u\Big)(t_n, \xi + W_{t_n}) + \tfrac{1}{2} (\nabla_x \Delta_x u)(t_n, \xi + W_{t_n}) \Big)(t_{n+1} - t_n) \\ & + (\operatorname{Hess}_x u)(t_n, \xi + W_{t_n})(W_{t_{n+1}} - W_{t_n}) \end{aligned} \tag{13}$$
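A minimal numpy sketch of the forward recursion (12)–(13) on a single simulated Brownian path. It assumes, hypothetically, that exact derivative maps `hess_u` and `a_fn` of a known solution $u$ are available in closed form (in the actual method these are precisely the unknowns that the networks of Section 2.5 replace); the toy instance takes $f(t,x,y,z,\gamma) = -\tfrac{1}{2}\operatorname{Trace}(\gamma)$ and $g(x) = \|x\|^2$, so $u(t,x) = \|x\|^2 + d(T-t)$.

```python
import numpy as np

# Toy instance: f(t, x, y, z, gamma) = -1/2 * Trace(gamma), g(x) = ||x||^2.
# Then the PDE (1) has the explicit solution u(t, x) = ||x||^2 + d*(T - t),
# so the derivative maps below are known in closed form (hypothetical
# stand-ins for the quantities the networks approximate):
#   Hess_x u = 2*I,   (d/dt grad_x u) + 1/2 grad_x Delta_x u = 0.

d, T, N = 5, 1.0, 1000
rng = np.random.default_rng(0)
xi = np.ones(d)

def f(t, x, y, z, gamma):
    return -0.5 * np.trace(gamma)

def g(x):
    return float(x @ x)

def hess_u(t, x):          # Hess_x u(t, x) = 2 I
    return 2.0 * np.eye(d)

def a_fn(t, x):            # (d/dt grad_x u + 1/2 grad_x Delta_x u)(t, x) = 0
    return np.zeros(d)

dt = T / N
t = np.linspace(0.0, T, N + 1)
W = np.zeros(d)
Y = g(xi) + d * T          # Y_0 = u(0, xi)
Z = 2.0 * xi               # Z_0 = grad_x u(0, xi)

for n in range(N):
    x = xi + W
    dW = rng.normal(0.0, np.sqrt(dt), size=d)
    gamma = hess_u(t[n], x)
    # (12): forward Euler step for Y
    Y = Y + (f(t[n], x, Y, Z, gamma) + 0.5 * np.trace(gamma)) * dt + Z @ dW
    # (13): forward Euler step for Z
    Z = Z + a_fn(t[n], x) * dt + gamma @ dW
    W = W + dW

print(abs(Y - g(xi + W)))  # Y_N should match the terminal value g(xi + W_T)
```

With the exact derivative maps plugged in, the terminal mismatch $|Y_N - g(\xi + W_T)|$ is only the time-discretization error, which shrinks as $N$ grows; the training objective in Section 2.6 exploits exactly this mismatch.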
2.5 Deep learning approximations

Next, for each $n \in \{0, 1, \ldots, N-1\}$, one uses suitable approximations of the functions
$$\mathbb{R}^d \ni x \mapsto (\operatorname{Hess}_x u)(t_n, x) \in \mathbb{R}^{d \times d} \tag{14}$$
$$\mathbb{R}^d \ni x \mapsto \Big(\tfrac{\partial}{\partial t} \nabla_x u\Big)(t_n, x) + \tfrac{1}{2} (\nabla_x \Delta_x u)(t_n, x) \in \mathbb{R}^d \tag{15}$$
Let $\nu \in \mathbb{N} \cap [d+1, \infty)$.
For every $\theta \in \mathbb{R}^{\nu}$ and $n \in \{0, 1, \ldots, N\}$, let $\mathbb{G}_n^{\theta}: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ and $\mathbb{A}_n^{\theta}: \mathbb{R}^d \to \mathbb{R}^d$ be continuous functions.
For every $\theta = (\theta_1, \theta_2, \ldots, \theta_{\nu}) \in \mathbb{R}^{\nu}$, let $\mathcal{Y}^{\theta}: \{0, 1, \ldots, N\} \times \Omega \to \mathbb{R}$ and $\mathcal{Z}^{\theta}: \{0, 1, \ldots, N\} \times \Omega \to \mathbb{R}^d$ be stochastic processes with $\mathcal{Y}_0^{\theta} = \theta_1$ and $\mathcal{Z}_0^{\theta} = (\theta_2, \theta_3, \ldots, \theta_{d+1})$ such that for every $n \in \{0, 1, \ldots, N-1\}$:
$$\begin{aligned} \mathcal{Y}_{n+1}^{\theta} = {} & \mathcal{Y}_n^{\theta} + \langle \mathcal{Z}_n^{\theta}, W_{t_{n+1}} - W_{t_n} \rangle_{\mathbb{R}^d} \\ & + \Big( f\big(t_n, \xi + W_{t_n}, \mathcal{Y}_n^{\theta}, \mathcal{Z}_n^{\theta}, \mathbb{G}_n^{\theta}(\xi + W_{t_n})\big) + \tfrac{1}{2} \operatorname{Trace}\big(\mathbb{G}_n^{\theta}(\xi + W_{t_n})\big) \Big)(t_{n+1} - t_n) \end{aligned} \tag{16}$$
$$\mathcal{Z}_{n+1}^{\theta} = \mathcal{Z}_n^{\theta} + \mathbb{A}_n^{\theta}(\xi + W_{t_n})(t_{n+1} - t_n) + \mathbb{G}_n^{\theta}(\xi + W_{t_n})(W_{t_{n+1}} - W_{t_n}) \tag{17}$$
For all suitable $\theta \in \mathbb{R}^{\nu}$ and all $n \in \{0, 1, \ldots, N\}$, one regards $\mathcal{Y}_n^{\theta}: \Omega \to \mathbb{R}$ as a suitable approximation of $Y_{t_n}: \Omega \to \mathbb{R}$:
$$\mathcal{Y}_n^{\theta} \approx Y_{t_n} \tag{18}$$
and $\mathcal{Z}_n^{\theta}: \Omega \to \mathbb{R}^d$ as a suitable approximation of $Z_{t_n}: \Omega \to \mathbb{R}^d$:
$$\mathcal{Z}_n^{\theta} \approx Z_{t_n} \tag{19}$$
For all suitable $\theta \in \mathbb{R}^{\nu}$, $x \in \mathbb{R}^d$ and all $n \in \{0, 1, \ldots, N-1\}$, one regards $\mathbb{G}_n^{\theta}(x) \in \mathbb{R}^{d \times d}$ as a suitable approximation of $(\operatorname{Hess}_x u)(t_n, x) \in \mathbb{R}^{d \times d}$:
$$\mathbb{G}_n^{\theta}(x) \approx (\operatorname{Hess}_x u)(t_n, x) \tag{20}$$
and $\mathbb{A}_n^{\theta}(x) \in \mathbb{R}^d$ as a suitable approximation of $\big(\tfrac{\partial}{\partial t} \nabla_x u\big)(t_n, x) + \tfrac{1}{2} (\nabla_x \Delta_x u)(t_n, x) \in \mathbb{R}^d$:
$$\mathbb{A}_n^{\theta}(x) \approx \Big(\tfrac{\partial}{\partial t} \nabla_x u\Big)(t_n, x) + \tfrac{1}{2} (\nabla_x \Delta_x u)(t_n, x) \tag{21}$$
In particular, $\theta_1$ is regarded as an approximation of $u(0, \xi) \in \mathbb{R}$:
$$\theta_1 \approx u(0, \xi) \tag{22}$$
and $(\theta_2, \theta_3, \ldots, \theta_{d+1})$ as an approximation of $(\nabla_x u)(0, \xi) \in \mathbb{R}^d$:
$$(\theta_2, \theta_3, \ldots, \theta_{d+1}) \approx (\nabla_x u)(0, \xi) \tag{23}$$
Now for each $n \in \{0, 1, \ldots, N-1\}$, the functions $\mathbb{G}_n^{\theta}$ and $\mathbb{A}_n^{\theta}$ are chosen to be deep neural networks (cf. [8, 67]).
For example, for every $k \in \mathbb{N}$, let $\mathcal{R}_k: \mathbb{R}^k \to \mathbb{R}^k$ be the function satisfying, for all $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$:
$$\mathcal{R}_k(x) = (\max\{x_1, 0\}, \ldots, \max\{x_k, 0\}) \tag{24}$$
For every $\theta = (\theta_1, \ldots, \theta_{\nu}) \in \mathbb{R}^{\nu}$, $v \in \mathbb{N}_0 = \{0\} \cup \mathbb{N}$, $k, l \in \mathbb{N}$ with $v + k(l+1) \leq \nu$, let $M_{k,l}^{\theta, v}: \mathbb{R}^l \to \mathbb{R}^k$ be the affine-linear function satisfying, for all $x = (x_1, \ldots, x_l)$:
$$M_{k, l}^{\theta, v}(x)=\left(\begin{array}{cccc}\theta_{v+1} & \theta_{v+2} & \ldots & \theta_{v+l} \\ \theta_{v+l+1} & \theta_{v+l+2} & \ldots & \theta_{v+2 l} \\ \theta_{v+2 l+1} & \theta_{v+2 l+2} & \ldots & \theta_{v+3 l} \\ \vdots & \vdots & \vdots & \vdots \\ \theta_{v+(k-1) l+1} & \theta_{v+(k-1) l+2} & \ldots & \theta_{v+k l}\end{array}\right)\left(\begin{array}{c}x_{1} \\ x_{2} \\ x_{3} \\ \vdots \\ x_{l}\end{array}\right)+\left(\begin{array}{c}\theta_{v+k l+1} \\ \theta_{v+k l+2} \\ \theta_{v+k l+3} \\ \vdots \\ \theta_{v+k l+k}\end{array}\right) \tag{25}$$
Assume $\nu \geq (5Nd + Nd^2 + 1)(d+1)$, and assume that for all $\theta \in \mathbb{R}^{\nu}$, $n \in \{0, 1, \ldots, N-1\}$, $x \in \mathbb{R}^d$:
$$\mathbb{A}_{n}^{\theta}=M_{d, d}^{\theta,[(2 N+n) d+1](d+1)} \circ \mathcal{R}_{d} \circ M_{d, d}^{\theta,[(N+n) d+1](d+1)} \circ \mathcal{R}_{d} \circ M_{d, d}^{\theta,(n d+1)(d+1)} \tag{26}$$
$$\mathbb{G}_{n}^{\theta}=M_{d^2, d}^{\theta,(5Nd+nd^2+1)(d+1)} \circ \mathcal{R}_{d} \circ M_{d, d}^{\theta,[(4N+n) d+1](d+1)} \circ \mathcal{R}_{d} \circ M_{d, d}^{\theta,[(3N+n)d+1](d+1)} \tag{27}$$
The function in (26) is a four-layer neural network (an input layer with $d$ neurons, two hidden layers with $d$ neurons each, and an output layer with $d$ neurons) whose activations are the rectifier functions of (24); likewise, the function in (27) is a four-layer neural network (input layer: $d$ neurons, two hidden layers: $d$ neurons each, output layer: $d^2$ neurons) with the same rectifier activations.
The ReLU (Rectified Linear Unit) activation function:
① Expression: $f(x)=\begin{cases}0, & x \leq 0 \\ x, & x > 0\end{cases}$
② Graph: (figure omitted)
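A minimal numpy sketch (with hypothetical variable names, not the paper's code) of the parameter indexing in (25) and the three-affine-layer composition in (26): `affine(theta, v, k, l)` reads a $k \times l$ weight matrix and a length-$k$ bias out of the flat parameter vector $\theta$ starting at offset $v$, and `A_net` chains affine → ReLU → affine → ReLU → affine with the offsets of (26).

```python
import numpy as np

def relu(x):
    # R_k from (24): componentwise max(x, 0)
    return np.maximum(x, 0.0)

def affine(theta, v, k, l):
    """M_{k,l}^{theta,v} from (25): a k x l weight matrix followed by a
    length-k bias, read row-wise from theta starting at offset v
    (0-indexed slice theta[v : v + k*l] holds theta_{v+1}, ..., theta_{v+kl})."""
    W = theta[v : v + k * l].reshape(k, l)
    b = theta[v + k * l : v + k * l + k]
    return lambda x: W @ x + b

def A_net(theta, n, N, d):
    """A_n^theta from (26): affine -> ReLU -> affine -> ReLU -> affine,
    using the parameter offsets given in (26)."""
    m1 = affine(theta, (n * d + 1) * (d + 1), d, d)
    m2 = affine(theta, ((N + n) * d + 1) * (d + 1), d, d)
    m3 = affine(theta, ((2 * N + n) * d + 1) * (d + 1), d, d)
    return lambda x: m3(relu(m2(relu(m1(x)))))

# Usage: with d = 2, N = 3 the bound nu >= (5*N*d + N*d^2 + 1)*(d + 1) holds.
d, N = 2, 3
nu = (5 * N * d + N * d * d + 1) * (d + 1)
theta = np.random.default_rng(1).normal(size=nu)
y = A_net(theta, n=0, N=N, d=d)(np.ones(d))
print(y.shape)  # a d-dimensional output, as required by (15)
```

The point of the offset bookkeeping is that all $2N$ networks $\{\mathbb{A}_n^{\theta}\}_n$ and $\{\mathbb{G}_n^{\theta}\}_n$ share one flat parameter vector $\theta \in \mathbb{R}^{\nu}$, so a single gradient step in Section 2.6 updates every network at once.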
2.6 Stochastic gradient descent-type optimization

A suitable $\theta$ is obtained by applying a stochastic gradient descent-type minimization algorithm to the function
$$\mathbb{R}^{\nu} \ni \theta \mapsto \mathbb{E}\Big[\big|\mathcal{Y}_N^{\theta} - g(\xi + W_{t_N})\big|^2\Big] \in \mathbb{R} \tag{28}$$
This choice of loss is motivated by the terminal condition: (2), or equivalently (4) together with $u(T,x) = g(x)$, implies
$$\mathbb{E}\Big[\big|Y_T - g(\xi + W_T)\big|^2\Big] = 0 \tag{29}$$
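A toy end-to-end sketch of minimizing (28), under heavy simplifications: $f(t,x,y,z,\gamma) = -\tfrac{1}{2}\operatorname{Trace}(\gamma)$, $g(x) = \|x\|^2$, $\xi = 0$ (so $u(0,\xi) = dT$), and, instead of the networks (26)–(27), a single learned constant matrix $G$ with $A \equiv 0$ as a deliberately crude stand-in. The gradient is estimated by finite differences with common random numbers rather than backpropagation, so this illustrates only the structure of the loss (28) over the recursion (16)–(17), not the paper's optimizer.

```python
import numpy as np

# Toy deep-2BSDE training loop: learn theta = (theta_1, Z_0, flattened G)
# by SGD on the Monte Carlo loss (28). Exact targets for this instance:
# theta_1 -> u(0, xi) = d*T, Z_0 -> grad u(0, xi) = 0, G -> Hess u = 2*I.
d, T, N = 2, 1.0, 10
dt = T / N
rng = np.random.default_rng(0)

def g(x):                      # terminal condition, batched over rows
    return np.sum(x * x, axis=1)

def unpack(theta):             # theta = (theta_1, Z_0, flattened G)
    return theta[0], theta[1 : 1 + d], theta[1 + d :].reshape(d, d)

def loss(theta, dW):
    """Monte Carlo estimate of (28): simulate (16)-(17) on a batch of
    Brownian increments dW of shape (batch, N, d)."""
    y1, z0, G = unpack(theta)
    batch = dW.shape[0]
    W = np.zeros((batch, d))
    Y = np.full(batch, y1)
    Z = np.tile(z0, (batch, 1))
    for n in range(N):
        # (16): with f = -1/2*Trace(G), the f and Trace terms cancel
        Y = Y + np.sum(Z * dW[:, n], axis=1)
        # (17): A = 0 and G constant in this toy
        Z = Z + dW[:, n] @ G.T
        W = W + dW[:, n]
    return np.mean((Y - g(W)) ** 2)

theta = 0.1 * rng.normal(size=1 + d + d * d)
lr, h, batch = 0.1, 1e-4, 256
for step in range(250):
    dW = rng.normal(0.0, np.sqrt(dt), size=(batch, N, d))
    grad = np.zeros_like(theta)
    for i in range(theta.size):  # finite-difference gradient, common random numbers
        e = np.zeros_like(theta); e[i] = h
        grad[i] = (loss(theta + e, dW) - loss(theta - e, dW)) / (2 * h)
    theta = theta - lr * grad

print(theta[0])  # theta_1, the approximation (22) of u(0, xi) = d*T
```

Even with this trivial "network", minimizing (28) drives $\theta_1$ toward $u(0,\xi)$ and $(\theta_2, \ldots, \theta_{d+1})$ toward $(\nabla_x u)(0,\xi)$, which is exactly the mechanism behind (22)–(23).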
2.7 Framework for the algorithm in a specific case