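Hung-yi Lee: Study Notes on the "Understanding Deep Learning in One Day" Slides
Hung-yi Lee's "Understanding Deep Learning in One Day" slides, SlideShare link: https://www.slideshare.net/tw_dsconf/ss-62245351?qid=108adce3-2c3d-4758-a830-95d0a57e46bc&v=&b=&from_search=3
The slides can also be downloaded from CSDN (the resource includes the full text of these study notes): https://download.csdn.net/download/wozaipermanent/11998637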
1 Introduction of Deep Learning
1.1 Three Steps for Deep Learning
- Step1: define a set of function (Neural Network)
- Step2: goodness of function
- Step3: pick the best function
1.2 Step1: Neural Network
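1.2.1 Fully Connected Feedforward Network
1.2.2 Output Layer (Option)
- Softmax (normalized exponential function): it "squashes" a k-dimensional vector Z of arbitrary real numbers into another k-dimensional vector $\sigma(Z)$, so that every element lies in the range (0, 1) and all elements sum to 1.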
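The softmax definition above can be sketched in a few lines of plain Python. This is only an illustration: the function name `softmax` and the sample input are chosen here, and subtracting the maximum before exponentiating is a common numerical-stability habit rather than something the slides prescribe.

```python
import math

def softmax(z):
    """Map a list of real numbers to values in (0, 1) that sum to 1."""
    # Subtracting the maximum does not change the result,
    # but it keeps math.exp() from overflowing on large inputs.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative input: three output-layer values before normalization.
probs = softmax([3.0, 1.0, -3.0])
print(probs)
```

The largest input receives the largest share, and the ordering of the inputs is preserved in the outputs.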
1.2.3 Example Application
- Handwriting Digit Recognition
1.3 Step2: Goodness of Function
1.3.1 Learning Target
1.3.2 Loss
- Total Loss:
1.4 Step3: Pick the Best Function
1.4.1 Gradient Descent
RBM (Restricted Boltzmann Machine): for this part, see https://zhuanlan.zhihu.com/p/22794772
Then compute $\partial L / \partial w$: if it is negative, increase w; if it is positive, decrease w.
- $\eta$ is called the "learning rate"
Gradient Descent Diagram:
- Randomly pick a starting point
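The procedure above (pick a random starting point, then repeatedly move against the sign of the derivative, scaled by the learning rate) can be sketched as follows. The loss $L(w) = (w - 3)^2$, the function name `gradient_descent`, and the values of `eta` and `steps` are assumptions made for this illustration; they do not come from the slides.

```python
import random

def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Repeat the update w <- w - eta * dL/dw for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        # A negative derivative increases w, a positive one decreases it.
        w = w - eta * grad(w)
    return w

# Hypothetical loss L(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def grad_L(w):
    return 2.0 * (w - 3.0)

rng = random.Random(0)            # fixed seed so the run is repeatable
w_start = rng.uniform(-10.0, 10.0)  # randomly picked starting point
w_final = gradient_descent(grad_L, w_start)
print(w_start, w_final)
```

On this convex toy loss every starting point leads to the same minimum; on a real network loss the result can depend on where the search starts.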
1.4.2 Gradient Descent Difficulty
- Backpropagation: an efficient way to compute $\partial L / \partial w$; see the links below:
- http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
- https://www.jianshu.com/p/2e02bc6384a8
1.5 Deep is Better
1.5.1 Universality Theorem
- Any continuous function $f: R^N \to R^M$ can be realized by a network with one hidden layer (given enough hidden neurons). Ref: http://neuralnetworksanddeeplearning.com/chap4.html
1.5.2 Thin + Tall is Better
A neural network consists of neurons
A network with one hidden layer can represent any continuous function
Using multiple layers of neurons to represent some functions is much simpler
Fewer parameters, less data
1.5.3 Modularization
1.6 Toolkit
1.6.1 Keras
- Documentation: https://keras.io or https://morvanzhou.github.io/tutorials/machine-learning/keras/
1.6.2 Example of Handwriting Digit Recognition
Step1: define a set of function
Step2: goodness of function
Step3: pick the best function
Testing
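# evaluate() returns [loss, accuracy] when the model was compiled with metrics=['accuracy']
score = model.evaluate(x_test, y_test)
print('Total loss on Testing Set: ', score[0])
print('Accuracy of Testing Set: ', score[1])
# predict() returns the network output for every test sample
result = model.predict(x_test)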
1.6.3 Using GPU to Speed Up Training
Way1
THEANO_FLAGS=device=gpu0 python YourCode.py
Way2
import os
os.environ["THEANO_FLAGS"] = "device=gpu0"
2 Tips for Training Deep Neural Network
2.1 Good Results on Training Data
2.1.1 Choosing Proper Loss
2.1.2 Mini-Batch
2.1.3 New Activation Function
Vanishing Gradient Problem
ReLU
model.add(Activation('sigmoid'))
model.add(Activation('relu'))
ReLU - variant
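As a rough illustration of what these activations compute: ReLU passes positive inputs through unchanged and outputs 0 otherwise. The notes do not say which variant is meant, so the leaky version below (with an assumed slope `a=0.01` for negative inputs) is only one common possibility.

```python
def relu(z):
    """Rectified Linear Unit: z for positive input, 0 otherwise."""
    return z if z > 0 else 0.0

def leaky_relu(z, a=0.01):
    """Assumed variant: a small slope `a` instead of 0 for negative input."""
    return z if z > 0 else a * z

for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z))
```

For positive inputs the derivative of both functions is 1, so the gradient is not scaled down as it passes through an active neuron.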
2.1.4 Adaptive Learning Rate
Learning Rates
- If learning rate is too large, total loss may not decrease after each update
- If learning rate is too small, training would be too slow
Adagrad
Notes:
- Learning rate is smaller and smaller for all parameters
- Smaller derivatives, larger learning rate, and vice versa
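Both notes can be observed in a small sketch of the usual Adagrad rule, where each parameter's learning rate is $\eta$ divided by the square root of the sum of its past squared derivatives. The two-parameter loss, the function names, and the constants `eta` and `eps` are assumptions made for this example.

```python
import math

def adagrad(grad, w0, eta=1.0, steps=200, eps=1e-8):
    """Adagrad: every parameter gets its own, ever-shrinking learning rate."""
    w = list(w0)
    sum_sq = [0.0] * len(w)   # running sum of squared derivatives per parameter
    rate_history = []
    for _ in range(steps):
        g = grad(w)
        sum_sq = [s + gi * gi for s, gi in zip(sum_sq, g)]
        rates = [eta / (math.sqrt(s) + eps) for s in sum_sq]
        w = [wi - r * gi for wi, r, gi in zip(w, rates, g)]
        rate_history.append(rates)
    return w, rate_history

# Hypothetical loss L = (w1 - 3)^2 + 10 * (w2 + 1)^2: w2 has the larger derivatives.
def grad_L(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w_final, rate_history = adagrad(grad_L, [0.0, 0.0])
print(w_final)
print(rate_history[0])  # per-parameter learning rates used in the first update
```

In the first update the parameter with the smaller derivative (`w1`) receives the larger learning rate, and each parameter's rate can only shrink as more squared derivatives accumulate.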
2.1.5 Momentum
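- Adam: RMSProp (Advanced Adagrad) + Momentum. Adam (Adaptive Moment Estimation) is essentially RMSprop with a momentum term: it uses first-moment and second-moment estimates of the gradient to adjust the learning rate of each parameter dynamically. Its main advantage is that, after bias correction, the learning rate in each iteration stays within a definite range, which keeps the parameter updates fairly stable.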
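A minimal single-parameter sketch of the Adam update follows, assuming the commonly used defaults `beta1=0.9`, `beta2=0.999`, `eps=1e-8` and a toy loss $L(w) = (w - 3)^2$ chosen for this example. `m` plays the role of the momentum (first moment) and `v` the RMSProp-style average of squared gradients (second moment).

```python
import math

def adam(grad, w0, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam for one scalar parameter."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate (momentum)
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate (RMSProp part)
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= eta * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Hypothetical loss L(w) = (w - 3)^2 with derivative 2 * (w - 3).
def grad_L(w):
    return 2.0 * (w - 3.0)

print(adam(grad_L, 0.0, steps=1))  # size of the very first step
print(adam(grad_L, 0.0))           # result after 500 steps
```

Because of the bias correction, the first update has a magnitude of roughly `eta` no matter how large the gradient is, which is one way to see the bounded step size mentioned above.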
2.2 Good Results on Testing Data
2.2.1 Early Stopping
Why Overfitting
- Learning target is defined by the training data.
- The parameters achieving the learning target do not necessarily have good results on the testing data.
Early Stopping
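One common way to implement early stopping is to watch the loss on a held-out validation set and keep the weights from the epoch where that loss was lowest. The sketch below assumes a `patience` rule (stop after a given number of epochs without improvement) and uses made-up loss values; both are illustrative choices, not taken from the slides.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights would be kept."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation loss improved: remember this epoch.
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch

# Made-up validation losses: they fall at first, then rise as overfitting sets in.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
keep = early_stopping_epoch(val_losses)
print(keep)
```

Keras offers the same idea through its `EarlyStopping` callback, which takes `monitor` and `patience` arguments.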
2.2.2 Weight Decay
Weight decay is one kind of regularization.
- Our brain prunes out the useless links between neurons.
- Doing the same thing to the machine's brain improves the performance.
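The pruning analogy can be made concrete with the usual L2 form of weight decay, where each update first shrinks the weight by a factor slightly below 1 and then applies the gradient step: $w \leftarrow (1 - \eta\lambda)\,w - \eta\,\partial L / \partial w$. The function name and the values of `eta` and `lam` below are assumptions for illustration.

```python
def sgd_step_with_weight_decay(w, grad, eta=0.1, lam=0.01):
    """One update: shrink the weight, then take the usual gradient step."""
    return (1.0 - eta * lam) * w - eta * grad

# A weight that the loss never "uses" (its gradient stays 0) slowly decays toward 0.
w = 1.0
for _ in range(100):
    w = sgd_step_with_weight_decay(w, grad=0.0)
print(w)
```

Weights that keep receiving useful gradient are maintained, while weights with no gradient signal fade away, which is the sense in which useless links are pruned.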
2.2.3 Dropout
Training
Each time before updating the parameters
- Each neuron has a p% chance of being dropped out
- The structure of the network is changed.
- The new, thinner network is used for training
For each mini-batch, we resample the dropped-out neurons
Testing
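In the standard dropout formulation, no neuron is dropped at test time; instead, if the dropout rate is p, the weights are multiplied by (1 - p) so that the expected input to the next layer matches training. The sketch below works on neuron outputs, which is equivalent to scaling their outgoing weights. The function names and the use of a seeded `random.Random` are illustrative assumptions.

```python
import random

def dropout_train(activations, p, rng):
    """Training: each neuron is dropped (set to 0) with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(activations, p):
    """Testing: keep every neuron, but scale by (1 - p)."""
    return [a * (1.0 - p) for a in activations]

rng = random.Random(0)                          # seeded for repeatability
layer_output = [1.0] * 10000
batch1 = dropout_train(layer_output, 0.5, rng)  # mask for one mini-batch
batch2 = dropout_train(layer_output, 0.5, rng)  # resampled for the next one
kept_fraction = sum(batch1) / len(batch1)
print(kept_fraction)
print(dropout_test([2.0, 4.0], 0.5))
```

Each call to `dropout_train` draws a fresh mask, which mirrors resampling the dropped neurons for every mini-batch.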
Dropout - Intuitive Reason
Dropout is a Kind of Ensemble
Try It
2.2.4 Network Structure
e.g. CNN is another good example.
3 Variants of Neural Network
3.1 Convolutional Neural Network (CNN)
3.1.1 Why CNN for Image
- When processing an image, the first layer of a fully connected network would be very large.
- Some patterns are much smaller than the whole image. A neuron does not have to see the whole image to discover the pattern.
- The same patterns appear in different regions.
- Subsampling the pixels will not change the object, so we can subsample the pixels to make the image smaller.
3.1.2 Three Steps
Step1: Convolutional Neural Network
Convolution
Max Pooling
- Smaller than the original image.
- The number of channels equals the number of filters.
Flatten
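The three operations above can be traced on a tiny example in plain Python. The 6x6 image, the two 3x3 filters, and all function names are made up for this sketch; "convolution" is implemented as the sliding dot product (stride 1, no padding) that deep learning libraries use under that name.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmaps):
    """Concatenate all feature maps into one vector for the dense layers."""
    return [v for fmap in fmaps for row in fmap for v in row]

# Made-up 6x6 image containing a diagonal line.
image = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
filters = [
    [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]],   # responds to diagonal lines
    [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]],   # responds to vertical lines
]
feature_maps = [convolve2d(image, f) for f in filters]   # two 4x4 maps
pooled = [max_pool2d(m) for m in feature_maps]            # two 2x2 maps
vector = flatten(pooled)                                  # 8 values
print(pooled)
print(vector)
```

Two filters produce two channels, each smaller than the original image, and flattening turns them into a single vector that a fully connected network can consume.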
Summary
Step2: goodness of function & Step3: pick the best function
3.2 Recurrent Neural Network (RNN)
Step1: Recurrent Neural Network
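What distinguishes a recurrent network is that the hidden state computed at one time step is fed back in as an extra input at the next step, so the network has memory. The sketch below uses a single scalar hidden unit with a tanh activation; the function name, the weights `w_x`, `w_h`, `b`, and the choice of tanh are all assumptions made to keep the illustration small.

```python
import math

def rnn_forward(inputs, w_x, w_h, b=0.0, h0=0.0):
    """Run a one-unit recurrent layer over a sequence and return all hidden states."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same input value at two positions gives different states because of the memory.
states = rnn_forward([1.0, 1.0], w_x=0.5, w_h=0.8)
print(states)
```

With `w_h=0` the feedback path is cut and the layer behaves like an ordinary feedforward unit applied to each input independently.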
LSTM
Step2: goodness of function
Step3: pick the best function
4 Next Wave
4.1 Supervised Learning
4.1.1 Ultra Deep Network
4.1.2 Attention Model
4.2 Reinforcement Learning
4.2.1 Scenario of Reinforcement Learning
4.2.2 Supervised vs. Reinforcement
4.2.3 Difficulties of Reinforcement Learning
- It may be better to sacrifice immediate reward to gain more long-term reward.
- Agent’s actions affect the subsequent data it receives.
4.3 Unsupervised Learning
4.3.1 Image: Realizing what the World Looks Like
4.3.2 Text: Understanding the Meaning of Words
- Machines learn the meaning of words by reading a lot of documents without supervision
- A word can be understood by its context
4.3.3 Audio: Learning Human Language Without Supervision
- An audio segment corresponding to an unknown word is represented as a fixed-length vector.
- The audio segments corresponding to words with similar pronunciations are close to each other.