from:https://zhuanlan.zhihu.com/p/23600620

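Author: Alex-zhai
Link: https://zhuanlan.zhihu.com/p/23600620
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, cite the source.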

I. The pioneering work: DQN

1. Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.

2. Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

II. DQN variants (algorithmic improvements)

1. Dueling Network Architectures for Deep Reinforcement Learning. Z. Wang et al., arXiv, 2015.

2. Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

3. Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

4. Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

5. Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

6. Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

7. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.

8. Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver

9. Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

10. State of the Art Control of Atari Games using shallow reinforcement learning

11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)

12. Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)

III. DQN variants (model improvements)

1. Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

2. Deep Attention Recurrent Q-Network

3. Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

4. Progressive Neural Networks

5. Language Understanding for Text-based Games Using Deep Reinforcement Learning

6. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

7. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

8. Recurrent Reinforcement Learning: A Hybrid Approach

IV. Policy-gradient-based deep reinforcement learning

Deep policy gradient:

1. End-to-End Training of Deep Visuomotor Policies

2. Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

3. Trust Region Policy Optimization

Deep actor-critic algorithms:

1. Deterministic Policy Gradient Algorithms

2. Continuous control with deep reinforcement learning

3. High-Dimensional Continuous Control Using Generalized Advantage Estimation

4. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

5. Deep Reinforcement Learning in Parameterized Action Space

6. Memory-based control with recurrent neural networks

7. Terrain-adaptive locomotion skills using deep reinforcement learning

8. Sample Efficient Actor-Critic with Experience Replay (updated 11.13)

Search and supervision:

1. End-to-End Training of Deep Visuomotor Policies

2. Interactive Control of Diverse Complex Characters with Neural Networks

Improved exploration in continuous action spaces:

1. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

Combining policy gradient and Q-learning:

1. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic (updated 11.13)

2. PGQ: Combining Policy Gradient and Q-learning (updated 11.13)

Other policy gradient papers:

1. Gradient Estimation Using Stochastic Computation Graphs

2. Continuous Deep Q-Learning with Model-based Acceleration

3. Benchmarking Deep Reinforcement Learning for Continuous Control

4. Learning Continuous Control Policies by Stochastic Value Gradients

V. Hierarchical DRL

1. Deep Successor Reinforcement Learning

2. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

3. Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks

4. Stochastic Neural Networks for Hierarchical Reinforcement Learning, Carlos Florensa, Yan Duan, Pieter Abbeel (updated 11.14)

VI. Multi-task and transfer learning in DRL

1. ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources

2. A Deep Hierarchical Approach to Lifelong Learning in Minecraft

3. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

4. Policy Distillation

5. Progressive Neural Networks

6. Universal Value Function Approximators

7. Multi-task learning with deep model based reinforcement learning (updated 11.14)

8. Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)

VII. DRL models with external memory modules

1. Control of Memory, Active Perception, and Action in Minecraft

2. Model-Free Episodic Control

VIII. Exploration and exploitation in DRL

1. Action-Conditional Video Prediction using Deep Networks in Atari Games

2. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

3. Deep Exploration via Bootstrapped DQN

4. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

5. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

6. Unifying Count-Based Exploration and Intrinsic Motivation

7. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)

8. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)

IX. Multi-agent DRL

1. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

2. Multiagent Cooperation and Competition with Deep Reinforcement Learning

X. Inverse DRL

1. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

2. Maximum Entropy Deep Inverse Reinforcement Learning

3. Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)

XI. Exploration + supervised learning

1. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning

2. Better Computer Go Player with Neural Network and Long-term Prediction

3. Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

XII. Asynchronous DRL

1. Asynchronous Methods for Deep Reinforcement Learning

2. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)

XIII. Harder game scenarios

1. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

2. Strategic Attentive Writer for Learning Macro-Actions

3. Unifying Count-Based Exploration and Intrinsic Motivation

XIV. A single network playing multiple games

1. Policy Distillation

2. Universal Value Function Approximators

3. Learning values across many orders of magnitude

XV. Texas Hold'em poker

1. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

2. Fictitious Self-Play in Extensive-Form Games

3. Smooth UCT search in computer poker

XVI. Doom

1. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

2. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

3. Playing FPS Games with Deep Reinforcement Learning

4. Learning to Act by Predicting the Future (updated 11.13)

5. Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)

XVII. Large action spaces

1. Deep Reinforcement Learning in Large Discrete Action Spaces

XVIII. Parameterized continuous action spaces

1. Deep Reinforcement Learning in Parameterized Action Space

XIX. Deep models

1. Learning Visual Predictive Models of Physics for Playing Billiards

2. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.

3. Learning Continuous Control Policies by Stochastic Value Gradients

4. Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

5. Action-Conditional Video Prediction using Deep Networks in Atari Games

6. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

XX. DRL applications

Robotics:

1. Trust Region Policy Optimization

2. Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

3. Path Integral Guided Policy Search

4. Memory-based control with recurrent neural networks

5. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

6. Learning Deep Neural Network Policies with Continuous Memory States

7. High-Dimensional Continuous Control Using Generalized Advantage Estimation

8. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

9. End-to-End Training of Deep Visuomotor Policies

10. DeepMPC: Learning Deep Latent Features for Model Predictive Control

11. Deep Visual Foresight for Planning Robot Motion

12. Deep Reinforcement Learning for Robotic Manipulation

13. Continuous Deep Q-Learning with Model-based Acceleration

14. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

15. Asynchronous Methods for Deep Reinforcement Learning

16. Learning Continuous Control Policies by Stochastic Value Gradients

Machine translation:

1. Simultaneous Machine Translation using Deep Reinforcement Learning

Object localization:

1. Active Object Localization with Deep Reinforcement Learning

Target-driven visual navigation:

1. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Automatic hyperparameter tuning:

1. Using Deep Q-Learning to Control Optimization Hyperparameters

Human-machine dialogue:

1. Deep Reinforcement Learning for Dialogue Generation

2. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

3. Strategic Dialogue Management via Deep Reinforcement Learning

4. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

Video prediction:

1. Action-Conditional Video Prediction using Deep Networks in Atari Games

Text-to-speech:

1. WaveNet: A Generative Model for Raw Audio

Text generation:

1. Generating Text with Deep Reinforcement Learning

Text-based games:

1. Language Understanding for Text-based Games Using Deep Reinforcement Learning

Radio control and signal detection:

1. Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent

Learning to perform physics experiments with DRL:

1. Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)

Accelerating convergence with DRL:

1. Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)

Designing neural networks with DRL:

1. Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)

2. Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)

3. Neural Architecture Search with Reinforcement Learning (updated 11.14)

Traffic signal control:

1. Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)

XXI. Other directions

Avoiding dangerous states:

1. Combating Deep Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear (updated 11.14)

On-policy vs. off-policy comparison in DRL:

1. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)

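Note 1: If downloading the papers one by one is too much trouble, send me a private message and I will send you a bundle.

Note 2: Additions of new papers, or of papers I have missed, are welcome at any time.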

Many ICLR 2017 submissions have been released recently, and quite a few of them are about DRL. Of those I have read so far, the more interesting ones are:
1. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
2. PGQ: Combining Policy Gradient and Q-learning
3. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
4. Sample Efficient Actor-Critic with Experience Replay
5. Learning to Act by Predicting the Future
Papers 1, 2, and 4 are applied to Atari games.
Papers 3 and 4 are applied to robotics continuous control.
Paper 5 won first place in the Doom Full Deathmatch track.
