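李宏毅 (Hung-yi Lee), "Understanding Deep Learning in One Day" (一天搞懂深度学习) slides, SlideShare link: https://www.slideshare.net/tw_dsconf/ss-62245351?qid=108adce3-2c3d-4758-a830-95d0a57e46bc&v=&b=&from_search=3
The slides can also be downloaded from CSDN Download (the resource includes the full text of these study notes): https://download.csdn.net/download/wozaipermanent/11998637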

1 Introduction of Deep Learning

1.1 Three Steps for Deep Learning

Step1: define a set of functions (Neural Network)
Step2: goodness of function
Step3: pick the best function

1.2 Step1: Neural Network

1.2.1 Fully Connected Feedforward Network
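Each layer of a fully connected feedforward network works the same way. Every neuron takes a weighted sum of all outputs of the previous layer, adds a bias, and applies an activation function. Below is a minimal sketch in plain Python, assuming sigmoid activations. The weights and the helper names `dense_layer` and `forward` are hypothetical and exist only for illustration:

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(x, weights, biases):
    """One fully connected layer: neuron j outputs sigmoid(weights[j] . x + biases[j])."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

def forward(x, layers):
    """Feed x through a list of (weights, biases) layers in order."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

# Hypothetical parameters: one layer with two neurons and two inputs.
layer1 = ([[1.0, -2.0], [-1.0, 1.0]], [1.0, 0.0])
out = forward([1.0, -1.0], [layer1])
print([round(v, 2) for v in out])  # [0.98, 0.12]
```

Appending more `(weights, biases)` pairs to the list makes the network deeper. The output of the last layer is the network's output.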

1.2.2 Output Layer (Optional)

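Softmax (normalized exponential function): it “squashes” a k-dimensional vector $Z$ of arbitrary real numbers into another k-dimensional vector $\sigma(Z)$ in which every element lies in (0, 1) and all elements sum to 1.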
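In formula form, $\sigma(Z)_i = e^{z_i} / \sum_{j=1}^{k} e^{z_j}$. Here is a small sketch of this in plain Python. Subtracting the largest element before exponentiating is a common numerical-stability trick; it is an implementation choice, not part of the definition, and it does not change the result:

```python
import math

def softmax(z):
    """Map a list of real numbers to values in (0, 1) that sum to 1."""
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

y = softmax([3.0, 1.0, -3.0])
print([round(v, 2) for v in y])  # [0.88, 0.12, 0.0]
```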

1.2.3 Example Application

Handwriting Digit Recognition

1.3 Step2: Goodness of Function

1.3.1 Learning Target

1.3.2 Loss

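Total Loss: $L = \sum_{r=1}^{R} l^r$, the sum of the per-example losses $l^r$ over all R training examples. The goal is to find the network parameters that minimize L.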
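The sketch below computes the per-example loss and the total loss over a tiny data set. It assumes cross-entropy as the per-example loss, a common choice with a softmax output; these notes do not fix a particular loss. The predictions and targets are made-up values:

```python
import math

def cross_entropy(y_pred, y_true):
    """Per-example loss l = -sum_i y_true[i] * ln(y_pred[i]) for a one-hot target."""
    return -sum(t * math.log(p) for p, t in zip(y_pred, y_true) if t > 0)

def total_loss(predictions, targets):
    """Total loss L: the sum of the per-example losses over all examples."""
    return sum(cross_entropy(p, t) for p, t in zip(predictions, targets))

# Two hypothetical training examples with 3 classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [[1, 0, 0], [0, 1, 0]]
L = total_loss(preds, targets)
print(round(L, 4))  # 0.5798
```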

1.4 Step3: Pick the Best Function

1.4.1 Gradient Descent

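First, pick an initial value for w (randomly, or by RBM pre-training). RBM (Restricted Boltzmann Machine): for details see https://zhuanlan.zhihu.com/p/22794772
Then compute $\partial L / \partial w$: if it is negative, increase w; if it is positive, decrease w.

Update $w \leftarrow w - \eta \, \partial L / \partial w$, where $\eta$ is called the “learning rate”.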

Gradient Descent Diagram:

Randomly pick a starting point
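Here is a one-parameter sketch of the gradient descent loop $w \leftarrow w - \eta \, \partial L / \partial w$. It assumes a made-up loss $L(w) = (w-3)^2$ whose minimum is at $w = 3$. `gradient_descent` is a hypothetical helper, not a library function:

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Repeat w <- w - eta * dL/dw starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)  # move against the sign of the derivative
    return w

# Hypothetical loss L(w) = (w - 3)^2, whose derivative is dL/dw = 2 (w - 3).
def dL_dw(w):
    return 2.0 * (w - 3.0)

w_star = gradient_descent(dL_dw, w0=0.0)
print(round(w_star, 6))  # 3.0
```

With $\eta = 0.1$, each update on this particular loss shrinks the distance to the minimum by a factor of 0.8. In a real network, the derivative comes from backpropagation rather than from a closed form.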

1.4.2 Gradient Descent Difficulty

Backpropagation: an efficient way to compute $\partial L / \partial w$. See the links below:
- http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
- https://www.jianshu.com/p/2e02bc6384a8
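To make the idea concrete, the sketch below runs backpropagation by hand on a made-up network. The network has one input, one sigmoid hidden neuron and one linear output, and it is trained with squared error. The result is checked against finite differences. The network, its parameter values and all function names are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, x, t):
    """Squared error 0.5 * (y - t)^2 of a tiny network x -> h -> y."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden neuron
    y = w2 * h + b2           # linear output neuron
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Gradients of the loss w.r.t. (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    # Forward pass: keep intermediate values for the backward pass.
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    # Backward pass: from the output back toward the input.
    dy = y - t                # dL/dy
    dw2, db2 = dy * h, dy     # output-layer parameters
    dh = dy * w2              # dL/dh
    dz1 = dh * h * (1.0 - h)  # sigmoid'(z1) = h * (1 - h)
    dw1, db1 = dz1 * x, dz1   # hidden-layer parameters
    return [dw1, db1, dw2, db2]

def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences, used here only to check backprop."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, x, t) - loss(minus, x, t)) / (2 * eps))
    return grads

params = [0.5, -0.2, 1.5, 0.1]  # hypothetical w1, b1, w2, b2
analytic = backprop(params, x=2.0, t=1.0)
numeric = numerical_grad(params, x=2.0, t=1.0)
print(all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric)))  # True
```

The backward pass reuses the forward-pass values `h` and `y`. This is what makes backpropagation cheaper than perturbing each weight separately, as `numerical_grad` does.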

1.5 Deep is Better

1.5.1 Universality Theorem

Any continuous function $f: \mathbb{R}^N \to \mathbb{R}^M$ can be approximated arbitrarily well (on a bounded, closed input domain) by a network with one hidden layer, given enough hidden neurons. Ref: http://neuralnetworksanddeeplearning.com/chap4.html

1.5.2 Thin + Tall is Better

A neural network consists of neurons
A network with one hidden layer can represent (approximate) any continuous function
Using multiple layers of neurons, some functions can be represented much more simply
Fewer parameters, so less training data is needed

1.5.3 Modularization

1.6 Toolkit

1.6.1 Keras

Documentation: https://keras.io or https://morvanzhou.github.io/tutorials/machine-learning/keras/

1.6.2 Example of Handwriting Digit Recognition

Step1: define a set of functions

Step2: goodness of function

Step3: pick the best function

Testing

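```python
# model, x_test and y_test come from Steps 1-3 above.
# evaluate() returns [loss, accuracy] when the model was compiled with metrics=['accuracy'].
score = model.evaluate(x_test, y_test)
print('Total loss on Testing Set: ', score[0])
print('Accuracy on Testing Set: ', score[1])
result = model.predict(x_test)  # network outputs for each test example
```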

1.6.3 Using a GPU to Speed Up Training

Way1

```shell
THEANO_FLAGS=device=gpu0 python YourCode.py
```

Way2

```python
import os
# Must run before Theano (or Keras with the Theano backend) is imported.
os.environ["THEANO_FLAGS"] = "device=gpu0"
```

2 Tips for Training Deep Neural Network

2.1 Good Results on Training Data

2.1.1 Choosing Proper Loss

2.1.2 Mini-Batch

2.1.3 New Activation Function

Vanishing Gradient Problem

ReLU

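```python
# Sigmoid activation (prone to vanishing gradients in deep networks):
# model.add(Activation('sigmoid'))
# Replace it with ReLU:
model.add(Activation('relu'))
```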

ReLU - variant
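The sigmoid's derivative is at most 0.25. On the way back, the gradient reaching the early layers is therefore multiplied by many such small factors, among others. ReLU's derivative is 1 wherever the neuron is active. Below is a small sketch of the activations and the two best-known variants: Leaky ReLU, with a fixed slope assumed here to be 0.01, and Parametric ReLU, with a learned slope $\alpha$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # largest value is 0.25, at z = 0

def relu(z):
    return z if z > 0 else 0.0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # the derivative at exactly 0 is taken as 0 here

def leaky_relu(z, slope=0.01):
    return z if z > 0 else slope * z

def parametric_relu(z, alpha):
    # Same shape as Leaky ReLU, but alpha is learned by gradient descent.
    return z if z > 0 else alpha * z

# Product of the activation-derivative factors alone across 10 layers
# (the weights also enter the real gradient; they are ignored here).
depth = 10
print(sigmoid_grad(0.0) ** depth)  # 9.5367431640625e-07, even in the best case
print(relu_grad(2.0) ** depth)     # 1.0 for active ReLU neurons
```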

2.1.4 Adaptive Learning Rate

Learning Rates

If learning rate is too large, total loss may not decrease after each update
If learning rate is too small, training would be too slow

Adagrad

Notes:

Learning rate is smaller and smaller for all parameters
Smaller derivatives, larger learning rate, and vice versa
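Adagrad divides each parameter's learning rate by the root of the sum of that parameter's past squared derivatives: $w^{t+1} = w^t - \frac{\eta}{\sqrt{\sum_{i=0}^{t} (g^i)^2}} g^t$, where $g^i$ is $\partial L / \partial w$ at step $i$. Below is a single-parameter sketch on a made-up quadratic loss. The small `eps` term, which guards against division by zero, is an implementation assumption:

```python
import math

def adagrad(grad, w0, eta=1.0, steps=100, eps=1e-8):
    """Adagrad on a single parameter w.

    eta is divided by the root of the sum of all past squared derivatives,
    so the effective learning rate never grows, and it shrinks faster when
    the derivatives are large.
    """
    w, sum_sq = w0, 0.0
    lrs = []  # effective learning rate used at each step
    for _ in range(steps):
        g = grad(w)
        sum_sq += g * g
        lr = eta / (math.sqrt(sum_sq) + eps)
        lrs.append(lr)
        w -= lr * g
    return w, lrs

# Hypothetical loss L(w) = (w - 3)^2, derivative 2 (w - 3).
w, lrs = adagrad(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w, 3))                                 # 3.0
print(all(a >= b for a, b in zip(lrs, lrs[1:])))  # True: smaller and smaller

# Smaller derivatives give a larger learning rate: compare two parameters
# that each saw five derivatives of the same size.
print(1.0 / math.sqrt(5 * 0.1 ** 2) > 1.0 / math.sqrt(5 * 10.0 ** 2))  # True
```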

2.1.5 Momentum

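Adam: RMSProp (Advanced Adagrad) + Momentum. Adam (Adaptive Moment Estimation) is essentially RMSProp with a momentum term. It uses estimates of the first moment (mean) and second moment (uncentered variance) of the gradient to adjust each parameter's learning rate dynamically. Its main advantage is that, after bias correction, the step size at every iteration stays within a definite range, which keeps the parameter updates fairly stable.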
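Below is a single-parameter sketch of plain momentum and of the Adam update, on a made-up quadratic loss. The Adam defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) follow commonly used values. The momentum coefficient `lam=0.9` and all function names are illustrative assumptions:

```python
import math

def momentum_step(w, v, g, eta=0.01, lam=0.9):
    """Gradient descent with momentum: v is a decaying sum of past steps."""
    v = lam * v - eta * g
    return w + v, v

def adam_step(w, m, s, g, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter; t counts steps from 1.

    m: moving average of gradients (first moment, the momentum part)
    s: moving average of squared gradients (second moment, the RMSProp part)
    """
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction: m and s start at 0
    s_hat = s / (1 - beta2 ** t)
    return w - eta * m_hat / (math.sqrt(s_hat) + eps), m, s

# Hypothetical loss L(w) = (w - 3)^2, derivative 2 (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

# Momentum: 1000 steps from w = 0.
w, v = 0.0, 0.0
for _ in range(1000):
    w, v = momentum_step(w, v, grad(w))
print(round(w, 4))  # 3.0

# Adam: the first step has size close to eta whether the gradient is tiny or huge.
for g in (0.001, 1.0, 1000.0):
    w_new, _, _ = adam_step(0.0, 0.0, 0.0, g, t=1)
    print(g, round(-w_new, 6))  # about 0.001 each time
```

The second loop shows the bounded-step property: on the first update, the bias-corrected ratio `m_hat / sqrt(s_hat)` is about 1 in magnitude. The step is therefore close to $\eta$ regardless of the gradient's scale.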

2.2 Good Results on Testing Data

2.2.1 Early Stopping

Why Overfitting

Learning target is defined by the training data.
The parameters achieving the learning target do not necessarily have good results on the testing data.

Early Stopping
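Early stopping monitors the loss on a validation set (data held out from training) and stops once that loss stops improving. Below is a minimal sketch. It assumes a made-up list of per-epoch validation losses and a hypothetical `patience` rule. Keras offers the same idea as the `keras.callbacks.EarlyStopping` callback, with `monitor` and `patience` arguments:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose parameters to keep, scanning losses in order.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training here
    return best_epoch

# Hypothetical validation losses: they fall, then rise as the model overfits.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
print(early_stopping_epoch(losses))  # 3
```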

2.2.2 Weight Decay

Weight decay is one kind of regularization.

Our brain prunes out the useless links between neurons.
Doing the same thing to the machine’s brain improves the performance.
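With L2 weight decay, the minimized loss becomes $L'(w) = L(w) + \frac{\lambda}{2}\sum_i w_i^2$. This turns the gradient descent update into $w \leftarrow (1 - \eta\lambda)\, w - \eta\, \partial L / \partial w$. Below is a plain-Python sketch of that update. The values of `eta` and `lam` are arbitrary illustrations:

```python
def sgd_step_with_weight_decay(weights, grads, eta=0.1, lam=0.01):
    """One update w <- (1 - eta*lam) * w - eta * dL/dw for every weight.

    This is plain gradient descent on L(w) + (lam / 2) * sum(w_i ** 2).
    """
    return [(1.0 - eta * lam) * w - eta * g for w, g in zip(weights, grads)]

# With zero gradient from the data loss, the weights still shrink toward 0.
w = [1.0, -2.0]
for _ in range(100):
    w = sgd_step_with_weight_decay(w, grads=[0.0, 0.0])
print([round(x, 4) for x in w])  # [0.9048, -1.8096]
```

Because $1 - \eta\lambda$ is slightly below 1, every weight is pulled toward 0 at each step unless the data loss keeps pushing it back. This mirrors the pruning analogy above.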

2.2.3 Dropout

Training

Each time before updating the parameters:
- Each neuron has a p% chance of being dropped out
  - The structure of the network is changed.
- Use the new, thinner network for training
For each mini-batch, we resample the dropped-out neurons.

Testing
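At testing time, no neuron is dropped. In the classic formulation, if the dropout rate during training is p, the weights are multiplied by (1 - p) at test time; equivalently, the activations are scaled by 1 - p. Below is a plain-Python sketch of both phases for one layer's activations. p is written as a fraction, and the function names are hypothetical:

```python
import random

def dropout_train(activations, p, rng):
    """Training: each neuron is dropped (set to 0) with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(activations, p):
    """Testing: nothing is dropped; outputs are scaled by (1 - p) so their
    expected value matches what the next layer saw during training."""
    return [a * (1.0 - p) for a in activations]

rng = random.Random(0)  # fixed seed so the example is repeatable
acts = [1.0, 2.0, 3.0, 4.0]
p = 0.5
# A new random set of dropped neurons is drawn for every mini-batch.
for batch in range(3):
    print(batch, dropout_train(acts, p, rng))
print(dropout_test(acts, p))  # [0.5, 1.0, 1.5, 2.0]
```

Many libraries use the equivalent inverted dropout instead. For example, Keras's `Dropout` layer scales the kept activations by 1/(1 - rate) during training, so nothing has to change at test time.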

Dropout - Intuitive Reason

Dropout is a Kind of Ensemble

Try It

2.2.4 Network Structure

Choosing a network structure suited to the task also helps on testing data; CNN is a good example.

3 Variants of Neural Network

3.1 Convolutional Neural Network (CNN)

3.1.1 Why CNN for Image

When processing images, the first layer of a fully connected network would be very large.
Some patterns are much smaller than the whole image. A neuron does not have to see the whole image to discover the pattern.
The same patterns appear in different regions.
Subsampling the pixels will not change the object, so we can subsample the pixels to make the image smaller.
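A quick count illustrates the first point and shows how sharing small filters helps. The sizes are hypothetical: a 100 x 100 RGB image, 1000 neurons in the first fully connected layer, and 25 filters of 3 x 3 in a convolution layer. Biases are ignored:

```python
def fc_weights(height, width, channels, neurons):
    """Weights in a fully connected layer: every neuron sees every input value."""
    return height * width * channels * neurons

def conv_weights(filter_size, channels, num_filters):
    """Weights in a convolution layer: each small filter is shared by all positions."""
    return filter_size * filter_size * channels * num_filters

print(fc_weights(100, 100, 3, 1000))  # 30000000 weights for 1000 neurons
print(conv_weights(3, 3, 25))         # 675 weights for 25 filters of size 3 x 3
```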

3.1.2 Three Steps

Step1: Convolutional Neural Network

Convolution

Max Pooling

The result is smaller than the original image.
The number of channels equals the number of filters.

Flatten
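The convolution, max pooling and flatten steps can be sketched in plain Python. The example uses a made-up 6 x 6 binary image and two hypothetical 3 x 3 filters, with stride 1 and no padding. Each filter produces one feature map, so two filters give two channels, and 2 x 2 max pooling halves each side:

```python
def convolve(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and take dot products.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(maps):
    """Concatenate all feature maps (one per filter) into a single vector."""
    return [v for fmap in maps for row in fmap for v in row]

image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
filters = [
    [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]],  # responds to a diagonal line
    [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]],  # responds to a vertical line
]
feature_maps = [convolve(image, f) for f in filters]  # two 4 x 4 maps
pooled = [max_pool(fm) for fm in feature_maps]        # two 2 x 2 maps
vector = flatten(pooled)                              # 8 values for a fully connected net
print(len(feature_maps[0]), len(pooled[0]), len(vector))  # 4 2 8
```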

Summary

Step2: goodness of function & Step3: pick the best function

3.2 Recurrent Neural Network (RNN)

Step1: Recurrent Neural Network

LSTM
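An LSTM memory cell uses three gates. The input gate controls writing to the cell, the forget gate controls whether the stored value is kept, and the output gate controls reading it out. Below is a single-cell sketch, assuming sigmoid gates and tanh for the input and output activations. In a real network, the gate signals `z_i`, `z_f`, `z_o` and the input `z` are computed from the current input and the previous output with learned weights; here they are hand-picked:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(z, z_i, z_f, z_o, c):
    """One time step of a single LSTM memory cell.

    z: input to the cell; z_i, z_f, z_o: input, forget and output gate signals;
    c: value currently stored in the memory cell.
    Returns (new memory value, cell output).
    """
    i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)  # near 1 = open, near 0 = closed
    c_new = math.tanh(z) * i + c * f  # gated write plus gated old memory
    a = math.tanh(c_new) * o          # gated read-out
    return c_new, a

# Write: input gate open, forget gate closed, output gate closed.
c, a = lstm_cell(z=2.0, z_i=10.0, z_f=-10.0, z_o=-10.0, c=0.0)
# Hold for 5 steps: new inputs are blocked, the memory is kept, nothing is output.
for _ in range(5):
    c, a = lstm_cell(z=-3.0, z_i=-10.0, z_f=10.0, z_o=-10.0, c=c)
print(round(c, 2), round(a, 2))  # 0.96 0.0
# Read: open the output gate.
c, a = lstm_cell(z=0.0, z_i=-10.0, z_f=10.0, z_o=10.0, c=c)
print(round(a, 2))  # 0.75
```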

Step2: goodness of function

Step3: pick the best function

4 Next Wave

4.1 Supervised Learning

4.1.1 Ultra Deep Network

4.1.2 Attention Model

4.2 Reinforcement Learning

4.2.1 Scenario of Reinforcement Learning

4.2.2 Supervised vs. Reinforcement

4.2.3 Difficulties of Reinforcement Learning

It may be better to sacrifice immediate reward to gain more long-term reward.
Agent’s actions affect the subsequent data it receives.
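The first difficulty can be illustrated with a discounted return, where a reward received t steps in the future is weighted by $\gamma^t$. The discount factor and the reward sequences below are assumptions made for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, with a reward t steps in the future weighted by gamma ** t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

greedy = [1, 0, 0, 0]   # take a small reward now
patient = [0, 0, 0, 5]  # give it up for a larger reward later
print(discounted_return(greedy), round(discounted_return(patient), 3))  # 1.0 3.645
```

With $\gamma = 0.9$ the patient sequence wins. With a much smaller $\gamma$ the immediate reward would win, so how strongly future reward counts is itself a design choice.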

4.3 Unsupervised Learning

4.3.1 Image: Realizing what the World Looks Like

4.3.2 Text: Understanding the Meaning of Words

Machines learn the meaning of words by reading a lot of documents without supervision
A word can be understood by its context
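The idea that a word is characterized by its context can be illustrated with a simple count-based stand-in, not a neural word-embedding method. Each word is represented by the counts of its neighboring words, and words are compared by cosine similarity. The toy corpus and window size are hypothetical:

```python
import math
from collections import Counter

def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    ctx[words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norms = sum(c * c for c in u.values()) * sum(c * c for c in v.values())
    return dot / math.sqrt(norms)

corpus = ["the cat drinks milk", "the dog drinks milk", "the car needs fuel"]  # toy data
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))  # 1.0: identical contexts
print(cosine(vecs["cat"], vecs["car"]))  # 0.5: only "the" is shared
```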

4.3.3 Audio: Learning Human Language Without Supervision

An audio segment corresponding to an unknown word is represented as a fixed-length vector.
Audio segments corresponding to words with similar pronunciations are close to each other.
