Table of Contents

1. Purpose

1.1 Keep a record so I can look it up in this blog next time

2. References

1. My blog posts

3. Notes

4. Lesson 12: 109 - Randomize the Target position and collect observations

5. Lesson 13: 110 - Collect observations and finish pre-training preparation

6. Lesson 14: 111 - Let the red ball keep eating the green ball

7. Lesson 15: 112 - Start training the model

7.1 Fixing the error: resolved

7.2 Fixing the error: success

8. Lesson 16: 113 - Finish training the model


1. Purpose

1.1 Keep a record so I can look it up in this blog next time

2. References

1. My blog posts

Quick Start to Unity Machine Learning (1) (my earlier post): https://blog.csdn.net/qq_40544338/article/details/124746037

https://blog.csdn.net/qq_40544338/article/details/124763962

3. Notes

Versions

  • Windows 10 Home
  • Anaconda Navigator 1.9.7
  • Unity 2019.3.15f1

4. Lesson 12: 109 - Randomize the Target position and collect observations

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;          // [104 Create the scene: added]
using Unity.MLAgents.Sensors;  // [104 Create the scene: added]

/// <summary>
/// [function: ball agent] [Time: 2022-05-14] [104 Create the scene: added]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>The target's transform [105 The four functions inside Agent: added]</summary>
    public Transform target;
    /// <summary>Rigidbody [105 The four functions inside Agent: added]</summary>
    private Rigidbody rBody;
    /// <summary>Movement speed [106 Manually controlling the agent: added]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>(); // [105 The four functions inside Agent: added]
    }

    /// <summary>
    /// [function: called when a new episode begins] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        print("OnEpisodeBegin");
        // Restart: put the ball back at its initial position
        this.transform.position = new Vector3(0, 0.5f, 0);  // [107 The game-reset function: added]
        this.rBody.velocity = Vector3.zero;                 // velocity [107 The game-reset function: added]
        this.rBody.angularVelocity = Vector3.zero;          // rotation [107 The game-reset function: added]
        // Randomize the target's position [109 Randomize the Target position and collect observations: added]
        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    /// <summary>
    /// [function: collect the observations] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        base.CollectObservations(sensor);
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        print("Horizontal:" + vectorAction[0]); // [106 Manually controlling the agent: added]
        print("Vertical:" + vectorAction[1]);   // [106 Manually controlling the agent: added]
        Vector3 control = Vector3.zero;         // [106 Manually controlling the agent: added]
        control.x = vectorAction[0];            // [106 Manually controlling the agent: added]
        control.z = vectorAction[1];            // [106 Manually controlling the agent: added]
        rBody.AddForce(control * speed);        // move the ball [106 Manually controlling the agent: added]

        // The agent fell off the platform: check via the y coordinate
        if (this.transform.position.y < 0)
        {
            EndEpisode(); // end this episode [108 Setting the agent's reward: added]
        }

        // The agent reached (ate) the target
        float distance = Vector3.Distance(this.transform.position, target.position); // [108 Setting the agent's reward: added]
        if (distance < 1.41f)
        {
            SetReward(1); // give the reward [108 Setting the agent's reward: added]
            EndEpisode(); // [108 Setting the agent's reward: added]
        }
    }

    /// <summary>
    /// [function: manually control the agent] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        // Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal"); // [106 Manually controlling the agent: added]
        actionsOut[1] = Input.GetAxis("Vertical");   // [106 Manually controlling the agent: added]
    }
}
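
A quick sanity check on the random placement (my own note, not from the course): Random.value returns a float in [0, 1], so Random.value * 8 - 4 always lands in [-4, 4]. Assuming the default Unity plane floor used by the RollerBall example (10 × 10 units), the target is therefore always placed somewhere inside an 8 × 8 square centred on the origin, at the same height (0.5) as the ball, and never off the edge.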

5. Lesson 13: 110 - Collect observations and finish pre-training preparation

A total of 8 float values are observed (two Vector3 positions contribute 3 floats each, plus the x and z velocities: 3 + 3 + 1 + 1 = 8).

The observed values are fed into the neural network as a single input vector.

Discrete: countable (the actions can be enumerated)

Continuous: continuous values

Default: the agent acts on its own

Heuristic Only: controlled manually by a person
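
Putting these notes together (my own summary of the standard RollerBall setup, not something shown verbatim in the course), the values end up on the agent's Behavior Parameters component roughly like this:

  • Vector Observation > Space Size = 8 (the 8 floats collected in CollectObservations below)
  • Vector Action > Space Type = Continuous, Space Size = 2 (the horizontal and vertical force values)
  • Behavior Type = Default while training, or Heuristic Only to drive the ball with the keyboard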

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;          // [104 Create the scene: added]
using Unity.MLAgents.Sensors;  // [104 Create the scene: added]

/// <summary>
/// [function: ball agent] [Time: 2022-05-14] [104 Create the scene: added]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>The target's transform [105 The four functions inside Agent: added]</summary>
    public Transform target;
    /// <summary>Rigidbody [105 The four functions inside Agent: added]</summary>
    private Rigidbody rBody;
    /// <summary>Movement speed [106 Manually controlling the agent: added]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>(); // [105 The four functions inside Agent: added]
    }

    /// <summary>
    /// [function: called when a new episode begins] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        print("OnEpisodeBegin");
        // Restart: put the ball back at its initial position
        this.transform.position = new Vector3(0, 0.5f, 0);  // [107 The game-reset function: added]
        this.rBody.velocity = Vector3.zero;                 // velocity [107 The game-reset function: added]
        this.rBody.angularVelocity = Vector3.zero;          // rotation [107 The game-reset function: added]
        // Randomize the target's position [109 Randomize the Target position and collect observations: added]
        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    /// <summary>
    /// [function: collect the observations] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        // 8 float values are observed in total
        // 2 positions (the agent's current position and the Target's position)
        sensor.AddObservation(target.position);         // [110 Collect observations, pre-training preparation: added]
        sensor.AddObservation(this.transform.position); // [110 Collect observations, pre-training preparation: added]
        // 2 velocities (x and z)
        sensor.AddObservation(rBody.velocity.x);        // [110 Collect observations, pre-training preparation: added]
        sensor.AddObservation(rBody.velocity.z);        // [110 Collect observations, pre-training preparation: added]
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        print("Horizontal:" + vectorAction[0]); // [106 Manually controlling the agent: added]
        print("Vertical:" + vectorAction[1]);   // [106 Manually controlling the agent: added]
        Vector3 control = Vector3.zero;         // [106 Manually controlling the agent: added]
        control.x = vectorAction[0];            // [106 Manually controlling the agent: added]
        control.z = vectorAction[1];            // [106 Manually controlling the agent: added]
        rBody.AddForce(control * speed);        // move the ball [106 Manually controlling the agent: added]

        // The agent fell off the platform: check via the y coordinate
        if (this.transform.position.y < 0)
        {
            EndEpisode(); // end this episode [108 Setting the agent's reward: added]
        }

        // The agent reached (ate) the target
        float distance = Vector3.Distance(this.transform.position, target.position); // [108 Setting the agent's reward: added]
        if (distance < 1.41f)
        {
            SetReward(1); // give the reward [108 Setting the agent's reward: added]
            EndEpisode(); // [108 Setting the agent's reward: added]
        }
    }

    /// <summary>
    /// [function: manually control the agent] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        // Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal"); // [106 Manually controlling the agent: added]
        actionsOut[1] = Input.GetAxis("Vertical");   // [106 Manually controlling the agent: added]
    }
}

6. Lesson 14: 111 - Let the red ball keep eating the green ball

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;          // [104 Create the scene: added]
using Unity.MLAgents.Sensors;  // [104 Create the scene: added]

/// <summary>
/// [function: ball agent] [Time: 2022-05-14] [104 Create the scene: added]
/// </summary>
public class MyRollerAgent : Agent
{
    /// <summary>The target's transform [105 The four functions inside Agent: added]</summary>
    public Transform target;
    /// <summary>Rigidbody [105 The four functions inside Agent: added]</summary>
    private Rigidbody rBody;
    /// <summary>Movement speed [106 Manually controlling the agent: added]</summary>
    private float speed = 10;

    void Start()
    {
        rBody = GetComponent<Rigidbody>(); // [105 The four functions inside Agent: added]
    }

    /// <summary>
    /// [function: called when a new episode begins] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    public override void OnEpisodeBegin()
    {
        //print("OnEpisodeBegin");
        // Only reset the ball's position when it has fallen off the platform
        // (so the ball can keep eating targets without being reset every episode)
        if (this.transform.position.y < 0)
        {
            // Restart: put the ball back at its initial position
            this.transform.position = new Vector3(0, 0.5f, 0);  // [107 The game-reset function: added]
            this.rBody.velocity = Vector3.zero;                 // velocity [107 The game-reset function: added]
            this.rBody.angularVelocity = Vector3.zero;          // rotation [107 The game-reset function: added]
        }
        // Randomize the target's position [109 Randomize the Target position and collect observations: added]
        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    /// <summary>
    /// [function: collect the observations] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        // 8 float values are observed in total
        // 2 positions (the agent's current position and the Target's position)
        sensor.AddObservation(target.position);         // [110 Collect observations, pre-training preparation: added]
        sensor.AddObservation(this.transform.position); // [110 Collect observations, pre-training preparation: added]
        // 2 velocities (x and z)
        sensor.AddObservation(rBody.velocity.x);        // [110 Collect observations, pre-training preparation: added]
        sensor.AddObservation(rBody.velocity.z);        // [110 Collect observations, pre-training preparation: added]
    }

    /// <summary>
    /// [function: receive actions and decide whether to give a reward] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="vectorAction"></param>
    public override void OnActionReceived(float[] vectorAction)
    {
        //print("Horizontal:" + vectorAction[0]); // [106 Manually controlling the agent: added]
        //print("Vertical:" + vectorAction[1]);   // [106 Manually controlling the agent: added]
        Vector3 control = Vector3.zero;           // [106 Manually controlling the agent: added]
        control.x = vectorAction[0];              // [106 Manually controlling the agent: added]
        control.z = vectorAction[1];              // [106 Manually controlling the agent: added]
        rBody.AddForce(control * speed);          // move the ball [106 Manually controlling the agent: added]

        // The agent fell off the platform: check via the y coordinate
        if (this.transform.position.y < 0)
        {
            EndEpisode(); // end this episode [108 Setting the agent's reward: added]
        }

        // The agent reached (ate) the target
        float distance = Vector3.Distance(this.transform.position, target.position); // [108 Setting the agent's reward: added]
        if (distance < 1.41f)
        {
            SetReward(1); // give the reward [108 Setting the agent's reward: added]
            EndEpisode(); // [108 Setting the agent's reward: added]
        }
    }

    /// <summary>
    /// [function: manually control the agent] [Time: 2022-05-14] [105 The four functions inside Agent: added]
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(float[] actionsOut)
    {
        // Read the horizontal and vertical input axes
        actionsOut[0] = Input.GetAxis("Horizontal"); // [106 Manually controlling the agent: added]
        actionsOut[1] = Input.GetAxis("Vertical");   // [106 Manually controlling the agent: added]
    }
}

behaviors:
  RollerBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 300000
    time_horizon: 1000
    summary_freq: 1000
    threaded: true
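
A few notes on this file (my own reading, based on the values above and on the training log later in this post): the RollerBall key must match the Behavior Name set on the agent's Behavior Parameters component (the log later prints "Connected new brain: RollerBall"); max_steps: 300000 is why training stops at step 300000; and summary_freq: 1000 is why a statistics line is printed every 1000 steps.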

Modifications to the base environment

7. Lesson 15: 112 - Start training the model

Enter the following commands to start training according to config.yaml.

In my case, however, this threw an error.

activate unity_py_3.6_siki
D:
cd D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train
mlagents-learn config.yaml
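
For reference (my own reading of the command, based on the log output further down rather than on the course): mlagents-learn loads config.yaml from the current directory and writes its checkpoints and the exported model under results\<run-id>. Since no --run-id is passed here it falls back to the default ppo, which is why the files later show up under results\ppo\RollerBall. A hypothetical variant with an explicit run ID would look like this:

mlagents-learn config.yaml --run-id=RollerBall01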

7.1 Fixing the error: resolved

Reference:

Unity ML-Agents: fixing "A compatible version of PyTorch was not installed" (CSDN blog): https://blog.csdn.net/weixin_44813895/article/details/110312591

The cause is that torch was not installed, so install it as follows.

pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
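
To double-check that the CPU build of torch is importable from this environment (my own addition, not part of the course), you can print its version:

python -c "import torch; print(torch.__version__)"

It should report 1.7.0+cpu, which matches the version the mlagents-learn banner prints later on.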

The download succeeded.

Start again:

mlagents-learn config.yaml

Then just click the Play button in the Unity Editor.

7.2 Fixing the error: success

It still would not run properly:

mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found. Either specify a new run ID, use --resume to resume this run, or use the --force parameter to overwrite existing data.
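
The message itself lists the ways out (the example commands are mine, not from the course): continue the previous run with --resume, overwrite it with --force, or start a fresh run with a new --run-id, for example:

mlagents-learn config.yaml --force
mlagents-learn config.yaml --run-id=RollerBall02

In the end I went with --resume, as shown below.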

The Unity side reported the following error.

I guessed that TensorFlow might not have been installed.

So I entered the following command to install it:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==2.2.0

Still failed (unsurprising in hindsight: the mlagents-learn banner below shows this version runs on PyTorch, and the real problem was the run-ID error above).

So I typed the command again:

mlagents-learn config.yaml --resume

Then click Play in Unity. If it errors again, re-enter the command in the terminal and press Play in Unity once more; after a few attempts it worked for me, as shown below.


(base) C:\Users\Lenovo>activate unity_py_3.6_siki

(unity_py_3.6_siki) C:\Users\Lenovo>D:

(unity_py_3.6_siki) D:\>cd D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml

        [Unity ML-Agents ASCII-art logo]

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 75, in run_training
    checkpoint_settings.maybe_init_path,
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\directory_utils.py", line 26, in validate_existing_directories
    "Previous data from this run ID was found. "
mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found. Either specify a new run ID, use --resume to resume this run, or use the --force parameter to overwrite existing data.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>--resume
'--resume' is not recognized as an internal or external command, operable program or batch file.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml -resume
usage: mlagents-learn.exe [-h] [--env ENV_PATH] [--resume] [--deterministic]
                          [--force] [--run-id RUN_ID]
                          [--initialize-from RUN_ID] [--seed SEED]
                          [--inference] [--base-port BASE_PORT]
                          [--num-envs NUM_ENVS] [--num-areas NUM_AREAS]
                          [--debug] [--env-args ...]
                          [--max-lifetime-restarts MAX_LIFETIME_RESTARTS]
                          [--restarts-rate-limit-n RESTARTS_RATE_LIMIT_N]
                          [--restarts-rate-limit-period-s RESTARTS_RATE_LIMIT_PERIOD_S]
                          [--torch] [--tensorflow] [--results-dir RESULTS_DIR]
                          [--width WIDTH] [--height HEIGHT]
                          [--quality-level QUALITY_LEVEL]
                          [--time-scale TIME_SCALE]
                          [--target-frame-rate TARGET_FRAME_RATE]
                          [--capture-frame-rate CAPTURE_FRAME_RATE]
                          [--no-graphics] [--torch-device DEVICE]
                          [trainer_config_path]
mlagents-learn.exe: error: unrecognized arguments: -resume

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

        [Unity ML-Agents ASCII-art logo]

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 105, in _reset_env
    env_manager.reset(config=new_config)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\env_manager.py", line 68, in reset
    self.first_step_infos = self._reset_env(config)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 446, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 101, in recv
    raise env_exception
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Agents' Behavior Parameters > Behavior Type is set to "Default"
         The environment and the Python interface have compatible versions.
         If you're running on a headless server without graphics support, turn off display by either passing --no-graphics option or build your Unity executable as server build.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

        [Unity ML-Agents ASCII-art logo]

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 1.2.0-preview and communication version 1.0.0
[INFO] Connected new brain: RollerBall?team=0
2022-05-16 17:10:06.687745: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-05-16 17:10:06.688146: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO] Hyperparameters for behavior name RollerBall:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   64
          buffer_size:  12000
          learning_rate:        0.0003
          beta: 0.001
          epsilon:      0.2
          lambd:        0.99
          num_epoch:    3
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    True
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      300000
        time_horizon:   1000
        summary_freq:   1000
        threaded:       True
        self_play:      None
        behavioral_cloning:     None
[INFO] Resuming from results\ppo\RollerBall.
[INFO] Exported results\ppo\RollerBall\RollerBall-0.onnx
[INFO] Copied results\ppo\RollerBall\RollerBall-0.onnx to results\ppo\RollerBall.onnx.
Traceback (most recent call last):
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 260, in main
    run_cli(parse_command_line())
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 256, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\learn.py", line 132, in run_training
    tc.start_learning(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
    self._register_new_behaviors(env_manager, env_manager.first_step_infos)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\trainer_controller.py", line 142, in _create_trainer_and_manager
    trainer.add_policy(parsed_behavior_id, policy)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\ppo\trainer.py", line 265, in add_policy
    self.model_saver.initialize_or_load()
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\model_saver\torch_model_saver.py", line 82, in initialize_or_load
    reset_global_steps=reset_steps,
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\mlagents\trainers\model_saver\torch_model_saver.py", line 91, in _load_model
    saved_state_dict = torch.load(load_path)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "D:\Software\Anaconda3\Exe\envs\unity_py_3.6_siki\lib\site-packages\torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'results\\ppo\\RollerBall\\checkpoint.pt'

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>mlagents-learn config.yaml --resume

        [Unity ML-Agents ASCII-art logo]

 Version information:
  ml-agents: 0.28.0,
  ml-agents-envs: 0.28.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.0+cpu
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 1.2.0-preview and communication version 1.0.0
[INFO] Connected new brain: RollerBall?team=0
2022-05-16 17:11:01.472214: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-05-16 17:11:01.472336: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO] Hyperparameters for behavior name RollerBall:
        trainer_type:   ppo
        hyperparameters:
          batch_size:   64
          buffer_size:  12000
          learning_rate:        0.0003
          beta: 0.001
          epsilon:      0.2
          lambd:        0.99
          num_epoch:    3
          learning_rate_schedule:       linear
          beta_schedule:        linear
          epsilon_schedule:     linear
        network_settings:
          normalize:    True
          hidden_units: 128
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        checkpoint_interval:    500000
        max_steps:      300000
        time_horizon:   1000
        summary_freq:   1000
        threaded:       True
        self_play:      None
        behavioral_cloning:     None
[INFO] Resuming from results\ppo\RollerBall.
[INFO] Resuming training from step 0.
[INFO] RollerBall. Step: 1000. Time Elapsed: 14.898 s. Mean Reward: 0.317. Std of Reward: 0.465. Training.
[INFO] RollerBall. Step: 2000. Time Elapsed: 21.509 s. Mean Reward: 0.128. Std of Reward: 0.334. Training.
[INFO] RollerBall. Step: 3000. Time Elapsed: 28.361 s. Mean Reward: 0.083. Std of Reward: 0.276. Training.
[INFO] RollerBall. Step: 4000. Time Elapsed: 35.126 s. Mean Reward: 0.189. Std of Reward: 0.392. Training.
[INFO] RollerBall. Step: 5000. Time Elapsed: 41.974 s. Mean Reward: 0.081. Std of Reward: 0.273. Training.
[INFO] RollerBall. Step: 6000. Time Elapsed: 48.794 s. Mean Reward: 0.114. Std of Reward: 0.318. Training.
[INFO] RollerBall. Step: 7000. Time Elapsed: 55.667 s. Mean Reward: 0.125. Std of Reward: 0.331. Training.
[INFO] RollerBall. Step: 8000. Time Elapsed: 62.408 s. Mean Reward: 0.114. Std of Reward: 0.318. Training.
[INFO] RollerBall. Step: 9000. Time Elapsed: 69.597 s. Mean Reward: 0.105. Std of Reward: 0.307. Training.
[INFO] RollerBall. Step: 10000. Time Elapsed: 76.625 s. Mean Reward: 0.118. Std of Reward: 0.322. Training.
[INFO] RollerBall. Step: 11000. Time Elapsed: 83.861 s. Mean Reward: 0.162. Std of Reward: 0.369. Training.
[INFO] RollerBall. Step: 12000. Time Elapsed: 91.244 s. Mean Reward: 0.233. Std of Reward: 0.422. Training.
[INFO] RollerBall. Step: 13000. Time Elapsed: 101.414 s. Mean Reward: 0.184. Std of Reward: 0.388. Training.
[INFO] RollerBall. Step: 14000. Time Elapsed: 108.521 s. Mean Reward: 0.220. Std of Reward: 0.414. Training.
[INFO] RollerBall. Step: 15000. Time Elapsed: 115.816 s. Mean Reward: 0.103. Std of Reward: 0.303. Training.
[INFO] RollerBall. Step: 16000. Time Elapsed: 123.151 s. Mean Reward: 0.214. Std of Reward: 0.410. Training.
[INFO] RollerBall. Step: 17000. Time Elapsed: 130.571 s. Mean Reward: 0.239. Std of Reward: 0.427. Training.
[INFO] RollerBall. Step: 18000. Time Elapsed: 137.849 s. Mean Reward: 0.200. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 19000. Time Elapsed: 145.127 s. Mean Reward: 0.256. Std of Reward: 0.436. Training.
[INFO] RollerBall. Step: 20000. Time Elapsed: 152.521 s. Mean Reward: 0.300. Std of Reward: 0.458. Training.
[INFO] RollerBall. Step: 21000. Time Elapsed: 159.957 s. Mean Reward: 0.256. Std of Reward: 0.436. Training.
[INFO] RollerBall. Step: 22000. Time Elapsed: 167.344 s. Mean Reward: 0.200. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 23000. Time Elapsed: 174.863 s. Mean Reward: 0.154. Std of Reward: 0.361. Training.
[INFO] RollerBall. Step: 24000. Time Elapsed: 182.129 s. Mean Reward: 0.244. Std of Reward: 0.430. Training.
[INFO] RollerBall. Step: 25000. Time Elapsed: 192.057 s. Mean Reward: 0.190. Std of Reward: 0.393. Training.
[INFO] RollerBall. Step: 26000. Time Elapsed: 199.477 s. Mean Reward: 0.304. Std of Reward: 0.460. Training.
[INFO] RollerBall. Step: 27000. Time Elapsed: 206.889 s. Mean Reward: 0.227. Std of Reward: 0.419. Training.
[INFO] RollerBall. Step: 28000. Time Elapsed: 214.260 s. Mean Reward: 0.209. Std of Reward: 0.407. Training.
[INFO] RollerBall. Step: 29000. Time Elapsed: 221.703 s. Mean Reward: 0.353. Std of Reward: 0.478. Training.
[INFO] RollerBall. Step: 30000. Time Elapsed: 229.239 s. Mean Reward: 0.358. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 31000. Time Elapsed: 236.568 s. Mean Reward: 0.289. Std of Reward: 0.454. Training.
[INFO] RollerBall. Step: 32000. Time Elapsed: 243.837 s. Mean Reward: 0.356. Std of Reward: 0.479. Training.
[INFO] RollerBall. Step: 33000. Time Elapsed: 251.407 s. Mean Reward: 0.302. Std of Reward: 0.459. Training.
[INFO] RollerBall. Step: 34000. Time Elapsed: 258.593 s. Mean Reward: 0.179. Std of Reward: 0.384. Training.
[INFO] RollerBall. Step: 35000. Time Elapsed: 266.037 s. Mean Reward: 0.377. Std of Reward: 0.485. Training.
[INFO] RollerBall. Step: 36000. Time Elapsed: 273.382 s. Mean Reward: 0.304. Std of Reward: 0.460. Training.
[INFO] RollerBall. Step: 37000. Time Elapsed: 283.443 s. Mean Reward: 0.333. Std of Reward: 0.471. Training.
[INFO] RollerBall. Step: 38000. Time Elapsed: 290.705 s. Mean Reward: 0.317. Std of Reward: 0.465. Training.
[INFO] RollerBall. Step: 39000. Time Elapsed: 298.170 s. Mean Reward: 0.474. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 40000. Time Elapsed: 305.522 s. Mean Reward: 0.300. Std of Reward: 0.458. Training.
[INFO] RollerBall. Step: 41000. Time Elapsed: 313.041 s. Mean Reward: 0.360. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 42000. Time Elapsed: 320.362 s. Mean Reward: 0.396. Std of Reward: 0.489. Training.
[INFO] RollerBall. Step: 43000. Time Elapsed: 327.828 s. Mean Reward: 0.364. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 44000. Time Elapsed: 335.334 s. Mean Reward: 0.429. Std of Reward: 0.495. Training.
[INFO] RollerBall. Step: 45000. Time Elapsed: 342.619 s. Mean Reward: 0.346. Std of Reward: 0.476. Training.
[INFO] RollerBall. Step: 46000. Time Elapsed: 350.089 s. Mean Reward: 0.388. Std of Reward: 0.487. Training.
[INFO] RollerBall. Step: 47000. Time Elapsed: 357.409 s. Mean Reward: 0.380. Std of Reward: 0.485. Training.
[INFO] RollerBall. Step: 48000. Time Elapsed: 364.820 s. Mean Reward: 0.408. Std of Reward: 0.491. Training.
[INFO] RollerBall. Step: 49000. Time Elapsed: 375.179 s. Mean Reward: 0.383. Std of Reward: 0.486. Training.
[INFO] RollerBall. Step: 50000. Time Elapsed: 382.544 s. Mean Reward: 0.500. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 51000. Time Elapsed: 390.747 s. Mean Reward: 0.423. Std of Reward: 0.494. Training.
[INFO] RollerBall. Step: 52000. Time Elapsed: 398.292 s. Mean Reward: 0.517. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 53000. Time Elapsed: 405.611 s. Mean Reward: 0.362. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 54000. Time Elapsed: 412.743 s. Mean Reward: 0.417. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 55000. Time Elapsed: 420.042 s. Mean Reward: 0.418. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 56000. Time Elapsed: 426.894 s. Mean Reward: 0.465. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 57000. Time Elapsed: 433.696 s. Mean Reward: 0.463. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 58000. Time Elapsed: 440.624 s. Mean Reward: 0.453. Std of Reward: 0.498. Training.
[INFO] RollerBall. Step: 59000. Time Elapsed: 447.569 s. Mean Reward: 0.547. Std of Reward: 0.498. Training.
[INFO] RollerBall. Step: 60000. Time Elapsed: 454.539 s. Mean Reward: 0.480. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 61000. Time Elapsed: 463.862 s. Mean Reward: 0.465. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 62000. Time Elapsed: 470.829 s. Mean Reward: 0.511. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 63000. Time Elapsed: 478.034 s. Mean Reward: 0.476. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 64000. Time Elapsed: 485.041 s. Mean Reward: 0.415. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 65000. Time Elapsed: 491.911 s. Mean Reward: 0.489. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 66000. Time Elapsed: 498.539 s. Mean Reward: 0.500. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 67000. Time Elapsed: 505.428 s. Mean Reward: 0.583. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 68000. Time Elapsed: 512.027 s. Mean Reward: 0.522. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 69000. Time Elapsed: 518.833 s. Mean Reward: 0.596. Std of Reward: 0.491. Training.
[INFO] RollerBall. Step: 70000. Time Elapsed: 525.841 s. Mean Reward: 0.522. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 71000. Time Elapsed: 532.647 s. Mean Reward: 0.415. Std of Reward: 0.493. Training.
[INFO] RollerBall. Step: 72000. Time Elapsed: 539.779 s. Mean Reward: 0.591. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 73000. Time Elapsed: 549.495 s. Mean Reward: 0.532. Std of Reward: 0.499. Training.
[INFO] RollerBall. Step: 74000. Time Elapsed: 556.365 s. Mean Reward: 0.636. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 75000. Time Elapsed: 563.009 s. Mean Reward: 0.605. Std of Reward: 0.489. Training.
[INFO] RollerBall. Step: 76000. Time Elapsed: 569.911 s. Mean Reward: 0.564. Std of Reward: 0.496. Training.
[INFO] RollerBall. Step: 77000. Time Elapsed: 576.629 s. Mean Reward: 0.478. Std of Reward: 0.500. Training.
[INFO] RollerBall. Step: 78000. Time Elapsed: 583.416 s. Mean Reward: 0.636. Std of Reward: 0.481. Training.
[INFO] RollerBall. Step: 79000. Time Elapsed: 589.926 s. Mean Reward: 0.591. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 80000. Time Elapsed: 596.663 s. Mean Reward: 0.638. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 81000. Time Elapsed: 603.482 s. Mean Reward: 0.588. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 82000. Time Elapsed: 610.259 s. Mean Reward: 0.571. Std of Reward: 0.495. Training.
[INFO] RollerBall. Step: 83000. Time Elapsed: 617.153 s. Mean Reward: 0.575. Std of Reward: 0.494. Training.
[INFO] RollerBall. Step: 84000. Time Elapsed: 623.906 s. Mean Reward: 0.638. Std of Reward: 0.480. Training.
[INFO] RollerBall. Step: 85000. Time Elapsed: 632.944 s. Mean Reward: 0.698. Std of Reward: 0.459. Training.
[INFO] RollerBall. Step: 86000. Time Elapsed: 639.629 s. Mean Reward: 0.675. Std of Reward: 0.468. Training.
[INFO] RollerBall. Step: 87000. Time Elapsed: 646.264 s. Mean Reward: 0.588. Std of Reward: 0.492. Training.
[INFO] RollerBall. Step: 88000. Time Elapsed: 653.018 s. Mean Reward: 0.725. Std of Reward: 0.447. Training.
[INFO] RollerBall. Step: 89000. Time Elapsed: 660.053 s. Mean Reward: 0.795. Std of Reward: 0.403. Training.
[INFO] RollerBall. Step: 90000. Time Elapsed: 666.831 s. Mean Reward: 0.750. Std of Reward: 0.433. Training.
[INFO] RollerBall. Step: 91000. Time Elapsed: 673.867 s. Mean Reward: 0.756. Std of Reward: 0.429. Training.
[INFO] RollerBall. Step: 92000. Time Elapsed: 681.346 s. Mean Reward: 0.667. Std of Reward: 0.471. Training.
[INFO] RollerBall. Step: 93000. Time Elapsed: 688.432 s. Mean Reward: 0.830. Std of Reward: 0.375. Training.
[INFO] RollerBall. Step: 94000. Time Elapsed: 695.400 s. Mean Reward: 0.686. Std of Reward: 0.464. Training.
[INFO] RollerBall. Step: 95000. Time Elapsed: 702.263 s. Mean Reward: 0.721. Std of Reward: 0.449. Training.
[INFO] RollerBall. Step: 96000. Time Elapsed: 709.423 s. Mean Reward: 0.800. Std of Reward: 0.400. Training.
[INFO] RollerBall. Step: 97000. Time Elapsed: 718.726 s. Mean Reward: 0.880. Std of Reward: 0.325. Training.
[INFO] RollerBall. Step: 98000. Time Elapsed: 725.571 s. Mean Reward: 0.865. Std of Reward: 0.341. Training.
[INFO] RollerBall. Step: 99000. Time Elapsed: 732.557 s. Mean Reward: 0.882. Std of Reward: 0.322. Training.
[INFO] RollerBall. Step: 100000. Time Elapsed: 739.284 s. Mean Reward: 0.827. Std of Reward: 0.378. Training.
[INFO] RollerBall. Step: 101000. Time Elapsed: 745.954 s. Mean Reward: 0.854. Std of Reward: 0.353. Training.
[INFO] RollerBall. Step: 102000. Time Elapsed: 753.131 s. Mean Reward: 0.870. Std of Reward: 0.336. Training.
[INFO] RollerBall. Step: 103000. Time Elapsed: 759.850 s. Mean Reward: 0.900. Std of Reward: 0.300. Training.
[INFO] RollerBall. Step: 104000. Time Elapsed: 766.645 s. Mean Reward: 0.812. Std of Reward: 0.390. Training.
[INFO] RollerBall. Step: 105000. Time Elapsed: 773.365 s. Mean Reward: 0.820. Std of Reward: 0.384. Training.
[INFO] RollerBall. Step: 106000. Time Elapsed: 780.067 s. Mean Reward: 0.851. Std of Reward: 0.356. Training.
[INFO] RollerBall. Step: 107000. Time Elapsed: 787.011 s. Mean Reward: 0.804. Std of Reward: 0.397. Training.
[INFO] RollerBall. Step: 108000. Time Elapsed: 793.614 s. Mean Reward: 0.902. Std of Reward: 0.297. Training.
[INFO] RollerBall. Step: 109000. Time Elapsed: 803.009 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 110000. Time Elapsed: 809.970 s. Mean Reward: 0.860. Std of Reward: 0.347. Training.
[INFO] RollerBall. Step: 111000. Time Elapsed: 816.523 s. Mean Reward: 0.833. Std of Reward: 0.373. Training.
[INFO] RollerBall. Step: 112000. Time Elapsed: 823.426 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 113000. Time Elapsed: 830.236 s. Mean Reward: 0.948. Std of Reward: 0.221. Training.
[INFO] RollerBall. Step: 114000. Time Elapsed: 836.938 s. Mean Reward: 0.865. Std of Reward: 0.341. Training.
[INFO] RollerBall. Step: 115000. Time Elapsed: 843.833 s. Mean Reward: 0.925. Std of Reward: 0.264. Training.
[INFO] RollerBall. Step: 116000. Time Elapsed: 850.819 s. Mean Reward: 0.898. Std of Reward: 0.302. Training.
[INFO] RollerBall. Step: 117000. Time Elapsed: 857.805 s. Mean Reward: 0.942. Std of Reward: 0.233. Training.
[INFO] RollerBall. Step: 118000. Time Elapsed: 865.292 s. Mean Reward: 0.966. Std of Reward: 0.181. Training.
[INFO] RollerBall. Step: 119000. Time Elapsed: 872.311 s. Mean Reward: 0.922. Std of Reward: 0.269. Training.
[INFO] RollerBall. Step: 120000. Time Elapsed: 879.230 s. Mean Reward: 0.837. Std of Reward: 0.370. Training.
[INFO] RollerBall. Step: 121000. Time Elapsed: 888.609 s. Mean Reward: 0.940. Std of Reward: 0.237. Training.
[INFO] RollerBall. Step: 122000. Time Elapsed: 895.353 s. Mean Reward: 0.967. Std of Reward: 0.180. Training.
[INFO] RollerBall. Step: 123000. Time Elapsed: 902.055 s. Mean Reward: 0.906. Std of Reward: 0.292. Training.
[INFO] RollerBall. Step: 124000. Time Elapsed: 908.933 s. Mean Reward: 0.921. Std of Reward: 0.270. Training.
[INFO] RollerBall. Step: 125000. Time Elapsed: 915.936 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 126000. Time Elapsed: 922.447 s. Mean Reward: 0.951. Std of Reward: 0.216. Training.
[INFO] RollerBall. Step: 127000. Time Elapsed: 929.275 s. Mean Reward: 0.985. Std of Reward: 0.121. Training.
[INFO] RollerBall. Step: 128000. Time Elapsed: 936.035 s. Mean Reward: 0.969. Std of Reward: 0.174. Training.
[INFO] RollerBall. Step: 129000. Time Elapsed: 942.664 s. Mean Reward: 0.944. Std of Reward: 0.229. Training.
[INFO] RollerBall. Step: 130000. Time Elapsed: 949.341 s. Mean Reward: 0.962. Std of Reward: 0.191. Training.
[INFO] RollerBall. Step: 131000. Time Elapsed: 956.469 s. Mean Reward: 0.925. Std of Reward: 0.264. Training.
[INFO] RollerBall. Step: 132000. Time Elapsed: 962.963 s. Mean Reward: 0.926. Std of Reward: 0.262. Training.
[INFO] RollerBall. Step: 133000. Time Elapsed: 972.607 s. Mean Reward: 0.952. Std of Reward: 0.213. Training.
[INFO] RollerBall. Step: 134000. Time Elapsed: 979.378 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 135000. Time Elapsed: 986.197 s. Mean Reward: 0.957. Std of Reward: 0.203. Training.
[INFO] RollerBall. Step: 136000. Time Elapsed: 992.932 s. Mean Reward: 0.952. Std of Reward: 0.213. Training.
[INFO] RollerBall. Step: 137000. Time Elapsed: 999.726 s. Mean Reward: 0.984. Std of Reward: 0.127. Training.
[INFO] RollerBall. Step: 138000. Time Elapsed: 1006.405 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 139000. Time Elapsed: 1013.174 s. Mean Reward: 0.985. Std of Reward: 0.121. Training.
[INFO] RollerBall. Step: 140000. Time Elapsed: 1019.885 s. Mean Reward: 0.972. Std of Reward: 0.164. Training.
[INFO] RollerBall. Step: 141000. Time Elapsed: 1026.696 s. Mean Reward: 0.983. Std of Reward: 0.128. Training.
[INFO] RollerBall. Step: 142000. Time Elapsed: 1033.564 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 143000. Time Elapsed: 1040.442 s. Mean Reward: 0.971. Std of Reward: 0.167. Training.
[INFO] RollerBall. Step: 144000. Time Elapsed: 1047.188 s. Mean Reward: 0.986. Std of Reward: 0.119. Training.
[INFO] RollerBall. Step: 145000. Time Elapsed: 1056.882 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 146000. Time Elapsed: 1063.685 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 147000. Time Elapsed: 1070.829 s. Mean Reward: 0.986. Std of Reward: 0.119. Training.
[INFO] RollerBall. Step: 148000. Time Elapsed: 1077.540 s. Mean Reward: 0.973. Std of Reward: 0.163. Training.
[INFO] RollerBall. Step: 149000. Time Elapsed: 1084.568 s. Mean Reward: 0.987. Std of Reward: 0.113. Training.
[INFO] RollerBall. Step: 150000. Time Elapsed: 1091.580 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 151000. Time Elapsed: 1098.682 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 152000. Time Elapsed: 1105.768 s. Mean Reward: 0.975. Std of Reward: 0.156. Training.
[INFO] RollerBall. Step: 153000. Time Elapsed: 1112.896 s. Mean Reward: 0.985. Std of Reward: 0.120. Training.
[INFO] RollerBall. Step: 154000. Time Elapsed: 1119.381 s. Mean Reward: 0.955. Std of Reward: 0.208. Training.
[INFO] RollerBall. Step: 155000. Time Elapsed: 1126.167 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 156000. Time Elapsed: 1133.020 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 157000. Time Elapsed: 1142.865 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 158000. Time Elapsed: 1149.926 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 159000. Time Elapsed: 1157.413 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 160000. Time Elapsed: 1164.465 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 161000. Time Elapsed: 1171.384 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 162000. Time Elapsed: 1178.337 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 163000. Time Elapsed: 1185.140 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 164000. Time Elapsed: 1192.192 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 165000. Time Elapsed: 1199.004 s. Mean Reward: 0.987. Std of Reward: 0.113. Training.
[INFO] RollerBall. Step: 166000. Time Elapsed: 1205.923 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 167000. Time Elapsed: 1213.083 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 168000. Time Elapsed: 1220.061 s. Mean Reward: 0.963. Std of Reward: 0.188. Training.
[INFO] RollerBall. Step: 169000. Time Elapsed: 1229.890 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 170000. Time Elapsed: 1236.702 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 171000. Time Elapsed: 1243.546 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 172000. Time Elapsed: 1250.348 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 173000. Time Elapsed: 1257.201 s. Mean Reward: 0.989. Std of Reward: 0.107. Training.
[INFO] RollerBall. Step: 174000. Time Elapsed: 1264.062 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 175000. Time Elapsed: 1270.864 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 176000. Time Elapsed: 1277.759 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 177000. Time Elapsed: 1284.854 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 178000. Time Elapsed: 1291.846 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 179000. Time Elapsed: 1298.717 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 180000. Time Elapsed: 1305.445 s. Mean Reward: 0.987. Std of Reward: 0.115. Training.
[INFO] RollerBall. Step: 181000. Time Elapsed: 1315.365 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 182000. Time Elapsed: 1322.126 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 183000. Time Elapsed: 1328.928 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 184000. Time Elapsed: 1335.715 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 185000. Time Elapsed: 1342.726 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 186000. Time Elapsed: 1349.645 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 187000. Time Elapsed: 1356.639 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 188000. Time Elapsed: 1363.759 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 189000. Time Elapsed: 1370.695 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 190000. Time Elapsed: 1377.922 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 191000. Time Elapsed: 1384.934 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 192000. Time Elapsed: 1391.961 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 193000. Time Elapsed: 1402.132 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 194000. Time Elapsed: 1409.085 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 195000. Time Elapsed: 1415.912 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 196000. Time Elapsed: 1422.706 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 197000. Time Elapsed: 1429.484 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 198000. Time Elapsed: 1436.254 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 199000. Time Elapsed: 1442.973 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 200000. Time Elapsed: 1449.734 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 201000. Time Elapsed: 1456.486 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 202000. Time Elapsed: 1463.264 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 203000. Time Elapsed: 1470.408 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 204000. Time Elapsed: 1477.545 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 205000. Time Elapsed: 1488.081 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 206000. Time Elapsed: 1495.135 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 207000. Time Elapsed: 1502.245 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 208000. Time Elapsed: 1509.239 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 209000. Time Elapsed: 1516.758 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 210000. Time Elapsed: 1523.670 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 211000. Time Elapsed: 1530.765 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 212000. Time Elapsed: 1537.816 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 213000. Time Elapsed: 1544.803 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 214000. Time Elapsed: 1551.623 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 215000. Time Elapsed: 1558.425 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 216000. Time Elapsed: 1565.344 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 217000. Time Elapsed: 1575.214 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 218000. Time Elapsed: 1582.042 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 219000. Time Elapsed: 1588.804 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 220000. Time Elapsed: 1595.573 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 221000. Time Elapsed: 1602.292 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 222000. Time Elapsed: 1609.019 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 223000. Time Elapsed: 1615.688 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 224000. Time Elapsed: 1622.424 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 225000. Time Elapsed: 1629.219 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 226000. Time Elapsed: 1635.938 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 227000. Time Elapsed: 1642.774 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 228000. Time Elapsed: 1649.652 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 229000. Time Elapsed: 1659.548 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 230000. Time Elapsed: 1666.334 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 231000. Time Elapsed: 1673.136 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 232000. Time Elapsed: 1679.905 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 233000. Time Elapsed: 1686.683 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 234000. Time Elapsed: 1693.385 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 235000. Time Elapsed: 1700.121 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 236000. Time Elapsed: 1706.840 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 237000. Time Elapsed: 1713.752 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 238000. Time Elapsed: 1720.504 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 239000. Time Elapsed: 1727.682 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 240000. Time Elapsed: 1734.426 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 241000. Time Elapsed: 1744.295 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 242000. Time Elapsed: 1751.024 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 243000. Time Elapsed: 1757.835 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 244000. Time Elapsed: 1764.678 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 245000. Time Elapsed: 1771.465 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 246000. Time Elapsed: 1778.285 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 247000. Time Elapsed: 1785.037 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 248000. Time Elapsed: 1791.840 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 249000. Time Elapsed: 1798.643 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 250000. Time Elapsed: 1805.479 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 251000. Time Elapsed: 1812.331 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 252000. Time Elapsed: 1819.059 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 253000. Time Elapsed: 1829.052 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 254000. Time Elapsed: 1835.874 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 255000. Time Elapsed: 1842.718 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 256000. Time Elapsed: 1849.537 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 257000. Time Elapsed: 1856.315 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 258000. Time Elapsed: 1863.017 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 259000. Time Elapsed: 1869.770 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 260000. Time Elapsed: 1876.573 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 261000. Time Elapsed: 1883.308 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 262000. Time Elapsed: 1890.127 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 263000. Time Elapsed: 1896.914 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 264000. Time Elapsed: 1903.724 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 265000. Time Elapsed: 1913.611 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 266000. Time Elapsed: 1920.339 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 267000. Time Elapsed: 1927.133 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 268000. Time Elapsed: 1933.828 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 269000. Time Elapsed: 1940.555 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 270000. Time Elapsed: 1947.273 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 271000. Time Elapsed: 1953.911 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 272000. Time Elapsed: 1960.679 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 273000. Time Elapsed: 1967.425 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 274000. Time Elapsed: 1974.194 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 275000. Time Elapsed: 1980.929 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 276000. Time Elapsed: 1987.641 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 277000. Time Elapsed: 1997.551 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 278000. Time Elapsed: 2004.372 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 279000. Time Elapsed: 2011.273 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 280000. Time Elapsed: 2018.135 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 281000. Time Elapsed: 2024.980 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 282000. Time Elapsed: 2031.774 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 283000. Time Elapsed: 2038.426 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 284000. Time Elapsed: 2045.179 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 285000. Time Elapsed: 2051.956 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 286000. Time Elapsed: 2058.633 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 287000. Time Elapsed: 2065.396 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 288000. Time Elapsed: 2072.172 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 289000. Time Elapsed: 2081.992 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 290000. Time Elapsed: 2088.812 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 291000. Time Elapsed: 2095.606 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 292000. Time Elapsed: 2102.384 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 293000. Time Elapsed: 2109.187 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 294000. Time Elapsed: 2115.998 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 295000. Time Elapsed: 2122.851 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 296000. Time Elapsed: 2129.737 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 297000. Time Elapsed: 2136.564 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 298000. Time Elapsed: 2143.332 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 299000. Time Elapsed: 2150.127 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] RollerBall. Step: 300000. Time Elapsed: 2156.922 s. Mean Reward: 1.000. Std of Reward: 0.000. Training.
[INFO] Exported results\ppo\RollerBall\RollerBall-300005.onnx
[INFO] Copied results\ppo\RollerBall\RollerBall-300005.onnx to results\ppo\RollerBall.onnx.

(unity_py_3.6_siki) D:\Test\Unity\Git\Roll_A_Ball\Code\MyRoll_A_Ball\Assets\My\Train>

Errors appeared at first here as well, but after trying again it worked.

In the end the model trained for 300,000 steps.

8. Lesson 16: 113 - Finish training the model

After putting the model into the project and running it, it kept throwing errors.

On inspection, my training output file location and name were as follows.

Renaming the file to RollerBall: failed.

Renaming the file and moving it to the previous RollerBall location: still failed.

In the end I found that I had never successfully generated a .nn file; after importing the .nn file from the teacher's original project, it worked.
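
One more observation (my own guess, not something covered in the course): the training log above shows that ml-agents 0.28.0 only exports results\ppo\RollerBall.onnx — the PyTorch-based trainers no longer produce the legacy .nn format at all — so it is expected that no .nn file ever appeared. Depending on the version of the com.unity.ml-agents package, the .onnx file itself can usually be imported into Unity and dragged onto Behavior Parameters > Model; if the package only accepts .nn (as the older course project assumes), reusing the teacher's .nn file, as I did here, is the simpler workaround.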
