
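  • note: Each of these three factors has a mathematical basis. In other words, the paper suggests that the interpretability of deep learning can be studied from three angles.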
  1. architecture
  2. regularization techniques
  3. optimization

1. architecture

2. regularization techniques

3. optimization properties

A neural network is trained with backpropagation, and the most classic optimizer used with it is gradient descent. The paper then traces how optimizers have evolved: GD → SGD → Entropy-SGD.
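To make the first step of that progression concrete, here is a minimal sketch contrasting full-batch GD with mini-batch SGD. It is not taken from the paper: the toy problem (1-D least squares), the helper names (`make_data`, `grad`, `gd`, `sgd`), and all hyperparameters are assumptions chosen for illustration. Entropy-SGD is left out because it needs an additional inner sampling loop.

```python
import random


def make_data(n=200, true_w=3.0, noise=0.1, seed=0):
    """Hypothetical toy data: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def grad(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def gd(xs, ys, lr=0.5, steps=200):
    """Gradient descent: every step uses the gradient over the full dataset."""
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, xs, ys)
    return w


def sgd(xs, ys, lr=0.1, epochs=50, batch_size=10, seed=1):
    """Stochastic gradient descent: every step uses a random mini-batch."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            bx = [xs[i] for i in batch]
            by = [ys[i] for i in batch]
            w -= lr * grad(w, bx, by)  # noisy estimate of the full gradient
    return w


xs, ys = make_data()
w_gd = gd(xs, ys)
w_sgd = sgd(xs, ys)
print(f"GD estimate:  {w_gd:.3f}")
print(f"SGD estimate: {w_sgd:.3f}")
```

Both estimates should land close to the slope used to generate the data (3.0 in this sketch). The difference is cost per step: GD touches every sample for each update, whereas SGD updates from a small batch and accepts a noisier gradient in exchange.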

Paper outline

  • Section II describes the input-output map of a deep network.
  • Section III studies the problem of training deep networks and establishes conditions for global optimality.
  • Se

