Papers

Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
Dimitri P Bertsekas and John N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
Ronald A Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos. “Finite-sample analysis of least-squares policy iteration”. In: The Journal of Machine Learning Research 13 (2012), pp. 3041–3074.
Odalric-Ambrym Maillard et al. “Finite-sample analysis of Bellman residual minimization”. In: Asian Conference on Machine Learning (ACML). 2010, pp. 299–314.
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
Rémi Munos and Csaba Szepesvári. “Finite-time bounds for fitted value iteration”. In: Journal of Machine Learning Research 9 (2008), pp. 815–857.
Rémi Munos. Introduction to Reinforcement Learning and multi-armed bandits. NETADIS Summer School, 2013.
Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Richard S Sutton et al. “Policy gradient methods for reinforcement learning with function approximation”. In: Advances in Neural Information Processing Systems (NeurIPS). 1999, pp. 1057–1063.
Leslie G Valiant. “A theory of the learnable”. In: Communications of the ACM 27.11 (1984), pp. 1134–1142.
Christopher John Cornish Hellaby Watkins. “Learning From Delayed Rewards”. PhD Thesis. University of Cambridge, 1989.
Ronald J. Williams and Leemon C. Baird III. Tight performance bounds on greedy policies based on imperfect value functions. Tech. rep. NU-CCS-93-14, College of Computer Science, Northeastern University. 1993.
Ronald J Williams. “Simple statistical gradient-following algorithms for connectionist reinforcement learning”. In: Machine learning 8.3-4 (1992).
Shuang Wu and Jun Wang. Decision making and AI: a white paper. 2020.
Pan Xu and Quanquan Gu. “A finite-time analysis of Q-learning with neural network function approximation”. In: arXiv preprint arXiv:1912.04511 (2019).
Zhuoran Yang, Yuchen Xie, and Zhaoran Wang. “A theoretical analysis of deep Q-learning”. In: arXiv preprint arXiv:1901.00137 (2019).


[RLChina Lecture 2] Reinforcement learning theory study materials recommended by Prof. Jun Wang
