Table of Contents

  • Reinforcement learning
    • 1. Reinforcement learning introduction
      • 1.1. What is Reinforcement Learning?
      • 1.2. Mars rover example
      • 1.3. The return in Reinforcement learning
      • 1.4. Making decisions: Policies in reinforcement learning
      • 1.5. Review of key concepts
    • 2. State-action value function
      • 2.1. State-action value function definition
      • 2.2. State-action value function example
      • 2.3. Bellman Equation
      • 2.4. Random (stochastic) environment
    • 3. Continuous state spaces
      • 3.1. Example of continuous state space applications
      • 3.2. Lunar lander
      • 3.3. Learning the state-value function
      • 3.4. Algorithm refinement: Improved neural network architecture
      • 3.5. Algorithm refinement: ε-greedy policy
      • 3.6. Algorithm refinement: Mini-batch and soft update
      • 3.7. The state of reinforcement learning
  • Summary

Reinforcement learning

1. Reinforcement learning introduction

1.1. What is Reinforcement Learning?

The key idea is that rather than telling the algorithm the right output y for every single input, you only have to specify a reward function that tells it when it is doing well and when it is doing poorly.

1.2. Mars rover example

1.3. The return in Reinforcement learning


The reward at the first step is weighted by γ⁰ = 1; each later reward picks up another factor of the discount γ, giving the return R₁ + γR₂ + γ²R₃ + ⋯.

The direction to move is selected by comparing the returns shown in the first two tables.
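The return above can be sketched in a few lines. This is a minimal example; the reward sequence and γ = 0.5 are taken from the Mars-rover illustration, where moving left from state 4 eventually reaches the terminal reward of 100.

```python
# Discounted return: G = R1 + gamma*R2 + gamma^2*R3 + ...
def discounted_return(rewards, gamma):
    """Sum rewards[t] * gamma**t over all time steps t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Going left from state 4: rewards 0, 0, 0, 100 with gamma = 0.5
print(discounted_return([0, 0, 0, 100], 0.5))  # 100 * 0.5**3 = 12.5
```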

1.4. Making decisions: Policies in reinforcement learning

For example, π(2) is left while π(5) is right; the number denotes the state.
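For a small discrete state space like the Mars-rover example, a policy can be represented as a simple lookup table. The specific state-to-action mapping below is just an illustration consistent with the examples above:

```python
# A policy pi maps each state to an action (Mars-rover example, 6 states).
# States 2-4 head left toward the larger reward; state 5 heads right.
policy = {2: "left", 3: "left", 4: "left", 5: "right"}

print(policy[2])  # left
print(policy[5])  # right
```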

1.5. Review of key concepts


2. State-action value function

2.1. State-action value function definition


Q(s, a) will be computed by repeated iteration.

2.2. State-action value function example

2.3. Bellman Equation

Q(s, a) = R(s) + γ · max_{a′} Q(s′, a′), where γ is the discount factor and s′ is the state reached after taking action a in state s.
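Applying the Bellman equation repeatedly converges to the true Q-values. Below is a minimal value-iteration sketch on the Mars-rover setup (six states, terminal rewards 100 at state 1 and 40 at state 6, γ = 0.5); the state numbering and rewards follow the example used in the lectures.

```python
# Value iteration with the Bellman equation:
#   Q(s, a) = R(s) + gamma * max_a' Q(s', a')
GAMMA = 0.5
R = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}

def next_state(s, a):
    return s - 1 if a == "left" else s + 1

def value_iteration(iters=50):
    Q = {(s, a): 0.0 for s in range(1, 7) for a in ("left", "right")}
    for _ in range(iters):
        newQ = {}
        for s in range(1, 7):
            for a in ("left", "right"):
                if s in TERMINAL:
                    newQ[(s, a)] = R[s]  # terminal: no future reward
                else:
                    s2 = next_state(s, a)
                    newQ[(s, a)] = R[s] + GAMMA * max(Q[(s2, "left")],
                                                     Q[(s2, "right")])
        Q = newQ
    return Q

Q = value_iteration()
print(Q[(2, "left")])   # 50.0  (0 + 0.5 * 100)
print(Q[(5, "right")])  # 20.0  (0 + 0.5 * 40)
```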

2.4. Random (stochastic) environment

Sometimes the rover accidentally slips and ends up going in the opposite direction.
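A stochastic transition like this can be simulated directly. The 0.1 slip probability below is an illustrative assumption (the lectures use a similar "misstep" probability); in the stochastic case the Bellman equation takes an expectation over next states.

```python
import random

random.seed(0)

def step(s, a, slip_prob=0.1):
    """Move in the chosen direction, but slip the opposite way
    with probability slip_prob."""
    if random.random() < slip_prob:
        a = "right" if a == "left" else "left"
    return s - 1 if a == "left" else s + 1

# Estimate how often choosing 'left' from state 4 actually moves left
moves_left = sum(step(4, "left") == 3 for _ in range(10000))
print(moves_left / 10000)  # close to 0.9
```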

3. Continuous state spaces

3.1. Example of continuous state space applications

Every variable is continuous.

3.2. Lunar lander

3.3. Learning the state-value function



Q is initialized with random values at first. We then train the neural network so that its estimate of Q improves.
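Concretely, training examples are built from stored experience tuples (s, a, r, s′, done), with targets given by the Bellman equation; the current (initially random) network supplies the Q(s′, a′) estimates. A minimal sketch of the target computation, assuming γ = 0.995 as in the lunar-lander lab:

```python
# Build a DQN-style training target:
#   y = r                       if the episode ended at s'
#   y = r + gamma * max_a' Q(s', a')   otherwise
GAMMA = 0.995

def build_target(r, q_next, done):
    """q_next: list of Q(s', a') over all actions, from the current network."""
    return r if done else r + GAMMA * max(q_next)

print(build_target(100.0, [0.0, 0.0], True))   # 100.0 (terminal)
print(build_target(0.0, [10.0, 4.0], False))   # about 9.95
```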

3.4. Algorithm refinement: Improved neural network architecture

3.5. Algorithm refinement: ε-greedy policy

ε = 0.05: with probability 0.05, pick an action at random (explore); with probability 0.95, pick the action that maximizes Q (exploit).

If we choose a bad ε, learning may take 100 times as long.
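The ε-greedy rule described above can be sketched as:

```python
import random

def epsilon_greedy(q_values, epsilon=0.05, rng=random):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the choice is purely greedy
print(epsilon_greedy([1.0, 3.0, 2.0], epsilon=0.0))  # 1
```

In practice ε often starts high (e.g. 1.0) and is decayed toward a small value such as 0.05 as the Q-estimates improve.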

3.6. Algorithm refinement: Mini-batch and soft update

The idea of mini-batch gradient descent is to not use all 100 million training examples on every single iteration through the loop. Instead, we pick a smaller number, call it m′, say 1,000. On every step, instead of using all 100 million examples, we use some subset of m′ examples.
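Sampling a mini-batch from a large buffer of examples is a one-liner; the buffer below is a small stand-in for the 100 million examples in the text.

```python
import random

def sample_minibatch(buffer, m_prime=1000, rng=random):
    """Pick m' examples at random instead of using the full dataset."""
    return rng.sample(buffer, min(m_prime, len(buffer)))

buffer = list(range(100_000))  # stand-in for a much larger dataset
batch = sample_minibatch(buffer, 1000)
print(len(batch))  # 1000
```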

  • Soft update
    Setting Q equal to Q_new outright can make a very abrupt change to Q, so instead we blend the parameters in Q:
    W = 0.01 · W_new + 0.99 · W
    B = 0.01 · B_new + 0.99 · B

3.7. The state of reinforcement learning

Summary
