Implementing the DQN Algorithm to Play CartPole

First install Python 3.6 and TensorFlow 1.2.

Next, install NumPy from this wheel:

numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl

This wheel is for 64-bit Windows; on Linux, running pip install numpy is enough.

Then install SciPy, again using the 64-bit build for Windows 10:

scipy-0.19.1-cp36-cp36m-win_amd64.whl

Both wheels can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/ , which provides prebuilt Windows packages.
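
The wheel filenames above encode which interpreter and platform they target, so it is worth checking them against your own setup before downloading. The following is a minimal sketch that splits a wheel filename into its fields, assuming the common five-field layout without the optional build tag; the names WheelTags and parse_wheel_filename are made up for this illustration.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated fields.

    Assumes the five-field form without an optional build tag:
    {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated fields, got %d" % len(parts))
    return WheelTags(*parts)


numpy_wheel = parse_wheel_filename("numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl")
scipy_wheel = parse_wheel_filename("scipy-0.19.1-cp36-cp36m-win_amd64.whl")
print(numpy_wheel)
print(scipy_wheel)
```

Here cp36 means CPython 3.6 and win_amd64 means 64-bit Windows, which matches the Python 3.6 requirement stated at the start.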

Next, install the gym module:

D:\AI\sample\tensorforce>pip install gym

Its PyPI page is https://pypi.python.org/pypi/gym/0.2.0
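
To confirm that gym works before moving on, you can play one episode with random actions. This is only a sketch: it assumes the classic gym interface from that period, where reset() returns an observation and step(action) returns (observation, reward, done, info). Newer gym releases changed these signatures. The helper name run_random_episode is made up for this illustration.

```python
def run_random_episode(env, max_steps=200):
    """Play one episode with random actions and return the total reward.

    Assumes the classic gym interface: reset() returns an observation,
    and step(action) returns (observation, reward, done, info).
    """
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # Pick a random action from the environment's action space.
        action = env.action_space.sample()
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


# With gym installed, usage would look like this:
#   import gym
#   env = gym.make("CartPole-v0")
#   print(run_random_episode(env))
```

A random policy usually drops the pole after a few dozen steps, which gives a baseline for judging the trained agent's rewards.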

Finally, install the TensorForce library:

git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .

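With that in place, you can run the reinforcement learning example. Note that the command below selects TRPOAgent with the TRPO config files rather than a DQN agent: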

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json

The output looks like this:

[2017-07-24 11:05:28,928] Making new env: CartPole-v0
2017-07-24 11:05:30.056045: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.056185: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.057611: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059002: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059684: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.060401: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061129: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061744: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.406967: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 4.99GiB
2017-07-24 11:05:30.407097: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-07-24 11:05:30.408846: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-07-24 11:05:30.409518: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
[2017-07-24 11:05:31,317] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-24 11:05:34,728] Finished episode 50 after 16 timesteps
[2017-07-24 11:05:34,728] Episode reward: 16.0
[2017-07-24 11:05:34,729] Average of last 500 rewards: 2.6
[2017-07-24 11:05:34,730] Average of last 100 rewards: 13.0
[2017-07-24 11:05:39,224] Finished episode 100 after 81 timesteps
[2017-07-24 11:05:39,224] Episode reward: 81.0
[2017-07-24 11:05:39,226] Average of last 500 rewards: 6.49
[2017-07-24 11:05:39,226] Average of last 100 rewards: 32.45
[2017-07-24 11:05:47,185] Finished episode 150 after 177 timesteps
[2017-07-24 11:05:47,185] Episode reward: 177.0
[2017-07-24 11:05:47,186] Average of last 500 rewards: 13.364
[2017-07-24 11:05:47,187] Average of last 100 rewards: 53.82
[2017-07-24 11:06:00,082] Finished episode 200 after 136 timesteps
[2017-07-24 11:06:00,083] Episode reward: 136.0
[2017-07-24 11:06:00,084] Average of last 500 rewards: 24.414
[2017-07-24 11:06:00,085] Average of last 100 rewards: 89.62
[2017-07-24 11:06:18,207] Finished episode 250 after 144 timesteps
[2017-07-24 11:06:18,208] Episode reward: 144.0
[2017-07-24 11:06:18,209] Average of last 500 rewards: 40.096
[2017-07-24 11:06:18,210] Average of last 100 rewards: 133.66
[2017-07-24 11:06:35,440] Finished episode 300 after 143 timesteps
[2017-07-24 11:06:35,440] Episode reward: 143.0
[2017-07-24 11:06:35,442] Average of last 500 rewards: 55.014
[2017-07-24 11:06:35,443] Average of last 100 rewards: 153.0
[2017-07-24 11:06:52,119] Finished episode 350 after 169 timesteps
[2017-07-24 11:06:52,119] Episode reward: 169.0
[2017-07-24 11:06:52,120] Average of last 500 rewards: 69.586
[2017-07-24 11:06:52,121] Average of last 100 rewards: 147.45
[2017-07-24 11:07:05,441] Finished episode 400 after 73 timesteps
[2017-07-24 11:07:05,441] Episode reward: 73.0
[2017-07-24 11:07:05,441] Average of last 500 rewards: 81.13
[2017-07-24 11:07:05,441] Average of last 100 rewards: 130.58
[2017-07-24 11:07:15,715] Finished episode 450 after 123 timesteps
[2017-07-24 11:07:15,715] Episode reward: 123.0
[2017-07-24 11:07:15,716] Average of last 500 rewards: 89.798
[2017-07-24 11:07:15,716] Average of last 100 rewards: 101.06
[2017-07-24 11:07:23,885] Finished episode 500 after 51 timesteps
[2017-07-24 11:07:23,885] Episode reward: 51.0
[2017-07-24 11:07:23,886] Average of last 500 rewards: 96.714
[2017-07-24 11:07:23,886] Average of last 100 rewards: 77.92
[2017-07-24 11:07:32,642] Finished episode 550 after 62 timesteps
[2017-07-24 11:07:32,642] Episode reward: 62.0
[2017-07-24 11:07:32,643] Average of last 500 rewards: 101.236
[2017-07-24 11:07:32,643] Average of last 100 rewards: 70.19
[2017-07-24 11:07:42,235] Finished episode 600 after 53 timesteps
[2017-07-24 11:07:42,235] Episode reward: 53.0
[2017-07-24 11:07:42,236] Average of last 500 rewards: 104.336
[2017-07-24 11:07:42,237] Average of last 100 rewards: 70.56
[2017-07-24 11:07:52,850] Finished episode 650 after 72 timesteps
[2017-07-24 11:07:52,851] Episode reward: 72.0
[2017-07-24 11:07:52,851] Average of last 500 rewards: 105.88
[2017-07-24 11:07:52,852] Average of last 100 rewards: 77.04
[2017-07-24 11:08:02,817] Finished episode 700 after 80 timesteps
[2017-07-24 11:08:02,817] Episode reward: 80.0
[2017-07-24 11:08:02,818] Average of last 500 rewards: 102.986
[2017-07-24 11:08:02,818] Average of last 100 rewards: 82.87
[2017-07-24 11:08:13,726] Finished episode 750 after 77 timesteps
[2017-07-24 11:08:13,726] Episode reward: 77.0
[2017-07-24 11:08:13,727] Average of last 500 rewards: 96.038
[2017-07-24 11:08:13,727] Average of last 100 rewards: 84.45
[2017-07-24 11:08:25,828] Finished episode 800 after 59 timesteps
[2017-07-24 11:08:25,828] Episode reward: 59.0
[2017-07-24 11:08:25,829] Average of last 500 rewards: 90.658
[2017-07-24 11:08:25,829] Average of last 100 rewards: 91.36
[2017-07-24 11:08:42,711] Finished episode 850 after 177 timesteps
[2017-07-24 11:08:42,711] Episode reward: 177.0
[2017-07-24 11:08:42,712] Average of last 500 rewards: 90.34
[2017-07-24 11:08:42,712] Average of last 100 rewards: 118.96
[2017-07-24 11:09:03,141] Finished episode 900 after 167 timesteps
[2017-07-24 11:09:03,141] Episode reward: 167.0
[2017-07-24 11:09:03,142] Average of last 500 rewards: 95.556
[2017-07-24 11:09:03,142] Average of last 100 rewards: 155.07
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
[2017-07-24 11:10:45,962] Finished episode 1100 after 200 timesteps
[2017-07-24 11:10:45,963] Episode reward: 200.0
[2017-07-24 11:10:45,963] Average of last 500 rewards: 144.198
[2017-07-24 11:10:45,963] Average of last 100 rewards: 199.83
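
To follow the learning progress without reading every line, the log above can be reduced to the 100-episode average reported at each checkpoint. The sketch below does this with regular expressions written against the line format shown in this log; the function names are made up for this illustration. The threshold of 195.0 is the commonly cited "solved" criterion for CartPole-v0 (average reward over 100 consecutive episodes), treated here as an assumption rather than something the log itself states.

```python
import re

EPISODE_RE = re.compile(r"Finished episode (\d+) after (\d+) timesteps")
AVG100_RE = re.compile(r"Average of last 100 rewards: ([\d.]+)")


def parse_training_log(lines):
    """Return a list of (episode, avg_last_100) pairs from the log lines."""
    results = []
    episode = None
    for line in lines:
        match = EPISODE_RE.search(line)
        if match:
            episode = int(match.group(1))
            continue
        match = AVG100_RE.search(line)
        if match and episode is not None:
            results.append((episode, float(match.group(1))))
            episode = None
    return results


def first_episode_reaching(results, threshold):
    """Return the first logged episode whose average meets the threshold."""
    for episode, avg in results:
        if avg >= threshold:
            return episode
    return None


# A few lines copied from the log above.
sample_log = """\
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
""".splitlines()

results = parse_training_log(sample_log)
print(results)
print(first_episode_reaching(results, 195.0))
```

In this run the 100-episode average first reaches the threshold at the episode 1050 checkpoint, after dipping to around 70 between episodes 550 and 600.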


