First, install Python 3.6 and TensorFlow 1.2.

Next, install NumPy from this wheel:

numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl

This build is for 64-bit Windows. On Linux, running pip install numpy is enough.

Then install SciPy, again using the 64-bit build for Windows 10:

scipy-0.19.1-cp36-cp36m-win_amd64.whl

Both wheel files can be downloaded from http://www.lfd.uci.edu/~gohlke/pythonlibs/ , which provides prebuilt packages for Windows.
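The wheel filenames above encode which interpreter and platform each file targets, so it is worth checking them against the local setup before downloading. The sketch below splits a filename into the fields defined by the wheel naming convention; `parse_wheel_filename` and `WheelTags` are hypothetical names introduced here for illustration, not part of pip or any library.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    """Fields of a wheel filename (hypothetical helper type)."""
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # An optional build tag may sit between the version and the Python tag,
    # giving six fields instead of five.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    distribution, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(distribution, version, python_tag, abi_tag, platform_tag)


# The two wheels used in this post.
for name in ("numpy-1.13.1+mkl-cp36-cp36m-win_amd64.whl",
             "scipy-0.19.1-cp36-cp36m-win_amd64.whl"):
    print(parse_wheel_filename(name))
```

Here cp36 means CPython 3.6 and win_amd64 means 64-bit Windows, which is why these files fit the Python 3.6 setup described above.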

Next, install the gym module:

D:\AI\sample\tensorforce>pip install gym

Its page on PyPI is https://pypi.python.org/pypi/gym/0.2.0

Finally, install the tensorforce library:

git clone git@github.com:reinforceio/tensorforce.git
cd tensorforce
pip install -e .
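Before running anything, it can help to confirm that every package installed so far is visible to the interpreter. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the list assumes the usual import names of the packages above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed in the steps above (assumed).
required = ["tensorflow", "numpy", "scipy", "gym", "tensorforce"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the modules can be located; it does not import them, so it will not catch a broken native library inside an installed package.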

Now the reinforcement learning example can be run:

python examples/openai_gym.py CartPole-v0 -a TRPOAgent -c examples/configs/trpo_cartpole.json -n examples/configs/trpo_cartpole_network.json

The output looks like this:

[2017-07-24 11:05:28,928] Making new env: CartPole-v0
2017-07-24 11:05:30.056045: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.056185: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.057611: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059002: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.059684: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.060401: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061129: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.061744: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-24 11:05:30.406967: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 4.99GiB
2017-07-24 11:05:30.407097: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-07-24 11:05:30.408846: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y
2017-07-24 11:05:30.409518: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
[2017-07-24 11:05:31,317] Starting TRPOAgent for Environment 'OpenAIGym(CartPole-v0)'
[2017-07-24 11:05:34,728] Finished episode 50 after 16 timesteps
[2017-07-24 11:05:34,728] Episode reward: 16.0
[2017-07-24 11:05:34,729] Average of last 500 rewards: 2.6
[2017-07-24 11:05:34,730] Average of last 100 rewards: 13.0
[2017-07-24 11:05:39,224] Finished episode 100 after 81 timesteps
[2017-07-24 11:05:39,224] Episode reward: 81.0
[2017-07-24 11:05:39,226] Average of last 500 rewards: 6.49
[2017-07-24 11:05:39,226] Average of last 100 rewards: 32.45
[2017-07-24 11:05:47,185] Finished episode 150 after 177 timesteps
[2017-07-24 11:05:47,185] Episode reward: 177.0
[2017-07-24 11:05:47,186] Average of last 500 rewards: 13.364
[2017-07-24 11:05:47,187] Average of last 100 rewards: 53.82
[2017-07-24 11:06:00,082] Finished episode 200 after 136 timesteps
[2017-07-24 11:06:00,083] Episode reward: 136.0
[2017-07-24 11:06:00,084] Average of last 500 rewards: 24.414
[2017-07-24 11:06:00,085] Average of last 100 rewards: 89.62
[2017-07-24 11:06:18,207] Finished episode 250 after 144 timesteps
[2017-07-24 11:06:18,208] Episode reward: 144.0
[2017-07-24 11:06:18,209] Average of last 500 rewards: 40.096
[2017-07-24 11:06:18,210] Average of last 100 rewards: 133.66
[2017-07-24 11:06:35,440] Finished episode 300 after 143 timesteps
[2017-07-24 11:06:35,440] Episode reward: 143.0
[2017-07-24 11:06:35,442] Average of last 500 rewards: 55.014
[2017-07-24 11:06:35,443] Average of last 100 rewards: 153.0
[2017-07-24 11:06:52,119] Finished episode 350 after 169 timesteps
[2017-07-24 11:06:52,119] Episode reward: 169.0
[2017-07-24 11:06:52,120] Average of last 500 rewards: 69.586
[2017-07-24 11:06:52,121] Average of last 100 rewards: 147.45
[2017-07-24 11:07:05,441] Finished episode 400 after 73 timesteps
[2017-07-24 11:07:05,441] Episode reward: 73.0
[2017-07-24 11:07:05,441] Average of last 500 rewards: 81.13
[2017-07-24 11:07:05,441] Average of last 100 rewards: 130.58
[2017-07-24 11:07:15,715] Finished episode 450 after 123 timesteps
[2017-07-24 11:07:15,715] Episode reward: 123.0
[2017-07-24 11:07:15,716] Average of last 500 rewards: 89.798
[2017-07-24 11:07:15,716] Average of last 100 rewards: 101.06
[2017-07-24 11:07:23,885] Finished episode 500 after 51 timesteps
[2017-07-24 11:07:23,885] Episode reward: 51.0
[2017-07-24 11:07:23,886] Average of last 500 rewards: 96.714
[2017-07-24 11:07:23,886] Average of last 100 rewards: 77.92
[2017-07-24 11:07:32,642] Finished episode 550 after 62 timesteps
[2017-07-24 11:07:32,642] Episode reward: 62.0
[2017-07-24 11:07:32,643] Average of last 500 rewards: 101.236
[2017-07-24 11:07:32,643] Average of last 100 rewards: 70.19
[2017-07-24 11:07:42,235] Finished episode 600 after 53 timesteps
[2017-07-24 11:07:42,235] Episode reward: 53.0
[2017-07-24 11:07:42,236] Average of last 500 rewards: 104.336
[2017-07-24 11:07:42,237] Average of last 100 rewards: 70.56
[2017-07-24 11:07:52,850] Finished episode 650 after 72 timesteps
[2017-07-24 11:07:52,851] Episode reward: 72.0
[2017-07-24 11:07:52,851] Average of last 500 rewards: 105.88
[2017-07-24 11:07:52,852] Average of last 100 rewards: 77.04
[2017-07-24 11:08:02,817] Finished episode 700 after 80 timesteps
[2017-07-24 11:08:02,817] Episode reward: 80.0
[2017-07-24 11:08:02,818] Average of last 500 rewards: 102.986
[2017-07-24 11:08:02,818] Average of last 100 rewards: 82.87
[2017-07-24 11:08:13,726] Finished episode 750 after 77 timesteps
[2017-07-24 11:08:13,726] Episode reward: 77.0
[2017-07-24 11:08:13,727] Average of last 500 rewards: 96.038
[2017-07-24 11:08:13,727] Average of last 100 rewards: 84.45
[2017-07-24 11:08:25,828] Finished episode 800 after 59 timesteps
[2017-07-24 11:08:25,828] Episode reward: 59.0
[2017-07-24 11:08:25,829] Average of last 500 rewards: 90.658
[2017-07-24 11:08:25,829] Average of last 100 rewards: 91.36
[2017-07-24 11:08:42,711] Finished episode 850 after 177 timesteps
[2017-07-24 11:08:42,711] Episode reward: 177.0
[2017-07-24 11:08:42,712] Average of last 500 rewards: 90.34
[2017-07-24 11:08:42,712] Average of last 100 rewards: 118.96
[2017-07-24 11:09:03,141] Finished episode 900 after 167 timesteps
[2017-07-24 11:09:03,141] Episode reward: 167.0
[2017-07-24 11:09:03,142] Average of last 500 rewards: 95.556
[2017-07-24 11:09:03,142] Average of last 100 rewards: 155.07
[2017-07-24 11:09:27,219] Finished episode 950 after 200 timesteps
[2017-07-24 11:09:27,219] Episode reward: 200.0
[2017-07-24 11:09:27,220] Average of last 500 rewards: 105.26
[2017-07-24 11:09:27,220] Average of last 100 rewards: 175.66
[2017-07-24 11:09:53,139] Finished episode 1000 after 200 timesteps
[2017-07-24 11:09:53,139] Episode reward: 200.0
[2017-07-24 11:09:53,141] Average of last 500 rewards: 118.344
[2017-07-24 11:09:53,141] Average of last 100 rewards: 191.86
[2017-07-24 11:10:19,722] Finished episode 1050 after 200 timesteps
[2017-07-24 11:10:19,722] Episode reward: 200.0
[2017-07-24 11:10:19,723] Average of last 500 rewards: 131.222
[2017-07-24 11:10:19,724] Average of last 100 rewards: 200.0
[2017-07-24 11:10:45,962] Finished episode 1100 after 200 timesteps
[2017-07-24 11:10:45,963] Episode reward: 200.0
[2017-07-24 11:10:45,963] Average of last 500 rewards: 144.198
[2017-07-24 11:10:45,963] Average of last 100 rewards: 199.83
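A note on reading the two averages in this log. At episode 100 the values are 6.49 and 32.45, and 6.49 × 500 = 32.45 × 100 = 3245; at episode 50 they are 2.6 and 13.0, and 2.6 × 500 = 13.0 × 100 = 1300. This suggests that the script sums the most recent rewards and divides by the window size itself, even when fewer episodes have run, so the early "last 500" figures understate the true mean. That is an inference from the numbers above, not something checked against the script's source. The sketch below reproduces the arithmetic with made-up rewards; `fixed_window_average` is a hypothetical name.

```python
def fixed_window_average(rewards, window):
    """Sum the last `window` rewards and divide by `window` itself.

    Assumption inferred from the log: the divisor is the window size,
    not the number of episodes actually available.
    """
    return sum(rewards[-window:]) / window


# Made-up rewards for 100 episodes whose total is 3245, the total implied
# by the log at episode 100 (the real per-episode rewards are not shown).
rewards = [32.0] * 55 + [33.0] * 45

print("last 500:", fixed_window_average(rewards, 500))
print("last 100:", fixed_window_average(rewards, 100))
print("true mean:", sum(rewards) / len(rewards))
```

Under this reading, the "last 500" value only becomes a real average once at least 500 episodes have finished.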

