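Preface: UT Austin Villa has been the dependable world champion of the RoboCup 3D Simulation League in recent years. After each title they publish one or two papers describing their progress, and these papers follow a fixed template. First comes the Introduction, which recounts how many championships they have won recently and can be skimmed. Next is the Domain Description, which restates the RoboCup 3D simulation environment and can also be skimmed. Then comes Changes for 20xx, which explains how that year's improvements were implemented and deserves a close read. Last are Main Competition Results and Analysis and Technical Challenges, which mostly show off results and can be skimmed.

In short, reading Changes for 20xx is enough.

Below I give interpretive translations of the parts we may need while reproducing their work.
Where the papers were unclear to me, I emailed their author, Professor Peter Stone of the University of Texas at Austin. He copied in the team's lead, postdoctoral researcher Patrick MacAlpine, who gave me very patient and detailed answers. I am very grateful to both of them, and I came away appreciating the huge gap between us and the world champions (for example, the original title of this post said "a close read" (品读); it now says "a respectful read" (拜读)).

Note: this post is only a set of annotations and interpretations; you should still read the original papers yourself a few times.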


I. Link to the 2019 paper

The main content of the paper, together with my understanding of some parts, is as follows:

One significant change for the 2019 RoboCup 3D Simulation League competition was penalizing self-collisions. While the simulator’s physics model can detect and simulate self-collisions—when a robot’s body part such as a leg or
arm collides with another part of its own body—having the physics model try
to process and handle the large number of self-collisions occurring during games
often leads to instability in the simulator causing it to crash. To preserve stability of the simulator self-collisions are purposely ignored by the physics model.
However, not modeling self-collisions can result in robots performing physically
impossible motions such as one leg passing through the other when kicking the
ball. In order to discourage teams from having robots with self-colliding behaviors, a new feature was added to the simulator this year to detect and penalize
self-collisions when they happen. This feature signals a self-collision as having
occurred if two body parts of a robot overlap by more than 0.04 meters, and
then all joints in any arm or leg of the robot involved in the self-collision are
frozen and not allowed to move for one second. Freezing the joints in an arm or
leg that has started to collide with another body part is an approximation of the
physics model preventing body parts from moving through each other, and also
detracts from the performance of the robot due to its limb being “numb” and immobile. After the second passes, the joints are unfrozen, and the robot is allowed to move its self-colliding body parts for two seconds without any self-collisions
being reported. This two second period, during which previously collided body
parts are no longer penalized and frozen for self-collisions, allows a robot time
to reposition its body to no longer have a self-collision.
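
The freeze-then-grace timeline described above can be sketched as a small state tracker. This is only an illustration: the three constants come from the quoted passage, while the `LimbCollisionState` class and its `update` method are hypothetical names and not the simulator's actual implementation.

```python
from dataclasses import dataclass

OVERLAP_THRESHOLD_M = 0.04  # overlap that triggers a self-collision (from the paper)
FREEZE_SECONDS = 1.0        # joints are frozen for this long (from the paper)
GRACE_SECONDS = 2.0         # no self-collisions are reported for this long afterwards


@dataclass
class LimbCollisionState:
    """Hypothetical per-limb bookkeeping, not the simulator's real data structure."""
    frozen_until: float = float("-inf")
    grace_until: float = float("-inf")

    def update(self, now: float, overlap_m: float) -> str:
        """Return 'frozen', 'grace', or 'free' for the limb at time `now`."""
        if now < self.frozen_until:
            return "frozen"   # still serving the one-second penalty
        if now < self.grace_until:
            return "grace"    # may move freely; overlaps are not reported
        if overlap_m > OVERLAP_THRESHOLD_M:
            self.frozen_until = now + FREEZE_SECONDS
            self.grace_until = self.frozen_until + GRACE_SECONDS
            return "frozen"   # a new self-collision starts a new penalty
        return "free"
```

A limb that still overlaps when the grace period ends is penalized again, which is why the robot has to use those two seconds to reposition.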

A pass play mode was added:

A player may initiate the pass play mode as long as the following conditions are all met:
– The current play mode is PlayOn.
– The agent is within 0.5 meters of the ball.
– No opponents are within a meter of the ball.
– The ball is stationary as measured by having a speed no greater than 0.05
meters per second.
– At least three seconds have passed since the last time a player’s team has
been in pass mode.
Once pass mode for a team has started the following happens:
– Players from the opponent team are prevented from getting within a meter
of the ball.
– The pass play mode ends as soon as a player touches the ball or four seconds
have passed.
– After pass mode has ended the team who initiated the pass mode is unable
to score for ten seconds—this prevents teams from trying to take a shot on
goal out of pass mode.
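
The entry conditions listed above can be written as a single predicate. This is a minimal sketch: the thresholds are the ones quoted from the paper, but the function name, its parameters, the `"PlayOn"` string, and the handling of exact boundary values are my own assumptions.

```python
import math

AGENT_RANGE_M = 0.5     # the agent must be within this distance of the ball
OPPONENT_RANGE_M = 1.0  # no opponent may be within this distance of the ball
MAX_BALL_SPEED = 0.05   # m/s; the ball counts as stationary at or below this
PASS_COOLDOWN_S = 3.0   # seconds since the team was last in pass mode


def can_start_pass_mode(play_mode, agent_xy, ball_xy, ball_speed,
                        opponents_xy, now, last_pass_mode_time=None):
    """Hypothetical helper: True only if every listed condition holds."""
    if play_mode != "PlayOn":
        return False
    if math.dist(agent_xy, ball_xy) > AGENT_RANGE_M:
        return False
    if any(math.dist(o, ball_xy) < OPPONENT_RANGE_M for o in opponents_xy):
        return False
    if ball_speed > MAX_BALL_SPEED:
        return False
    if (last_pass_mode_time is not None
            and now - last_pass_mode_time < PASS_COOLDOWN_S):
        return False
    return True
```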

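  • Methods for reducing self-collisions
    First, determine which actions produce self-collisions.

They played several thousand games against various other teams and logged the action being performed and the player number whenever a collision occurred.

The strategies used are:
1. Arm adjustment: roughly half of the kicking actions with self-collisions involved the arms, whereas the legs do the real work in a kick. The arm joint angles can therefore be adjusted to avoid the self-collision without changing the original kicking motion.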

When a self-collision occurs, the simulator reports which body parts
of a robot collided with each other. For kicking skills the body parts that
matter the most are those in the legs, so if a robot’s arm is involved in a self-collision the arm’s movement can probably be adjusted without affecting
the kicking motion. Roughly half the kicking skills that had self-collisions
involved the robots’ arms in the self-collisions, so we were able to manually
adjust the arms’ joint angle positions to no longer self-collide while still
exhibiting the same kicking motion through the ball.

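2. Re-optimize the colliding action: in many cases it is hard to remove an action's self-collisions by manual tuning. Instead, take the current action as the starting point and optimize it again with CMA-ES; whenever a self-collision occurs, a large penalty is applied to the agent's fitness.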

In many cases it is not easy
to hand adjust the motions of a skill to avoid a self-collision as doing so fundamentally changes the performance of the skill (e.g. adjusting the position
of the legs of a robot for a kicking skill when the robot’s legs self-collide).
Instead of trying to fix things by hand, the current skill can be relearned
with CMA-ES using the current self-colliding behavior as a starting point
for learning, while also adding a large penalty value to the fitness of an agent
if it has any self-collisions while performing the optimization task it is trying
to learn.

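3. If the current action contains many self-collisions, the optimization may fail to find any action without them. In that case, start the optimization from an action similar to the current one instead.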

If the previous strategy does
not work—possibly because the current behavior has too many self-collisions
such that it is hard to find a behavior that does not have self-collisions when
using the current self-colliding behavior as a starting point—one can instead
attempt to learn using a similar related skill (e.g. similar distance kick) that
has fewer collisions as a starting point for learning.

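4. When an action self-collides only rarely, so that collisions do not always show up in a learning trial yet still occur a fair number of times in games, optimize with a smaller self-collision threshold. For example, suppose the competition treats an arm-torso overlap of 10 units as a self-collision; lowering that to 5 units during optimization lets the optimizer detect the action's self-collisions.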

Some skills have
infrequent enough self-collisions that they do not always occur during a learning trial, but still experience a significant number of self-collisions during
games. It can be especially hard to reduce the number of self-collisions for
skills when self-collisions are not always detected during learning. As a way
to decrease the chance of the robot assuming body positions that are right on
the border of having self-collisions, one can decrease the allowed amount of
overlap between body parts in the simulator before a self-collision is considered to have occurred. By decreasing the amount of allowed overlap between
body parts during learning it is less likely that a learned behavior will have
self-collisions exceeding the actual allowed amount of overlap.
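
Strategies 2 and 4 can be combined in one fitness wrapper: penalize any self-collision, and detect collisions with a stricter overlap threshold while learning. The sketch below is an illustration under assumptions. Only the 0.04 m game threshold comes from the paper; the training threshold, the penalty magnitude, and the function names are hypothetical, and fitness is assumed to be maximized.

```python
GAME_OVERLAP_M = 0.04            # overlap that counts as a self-collision in games (from the paper)
TRAINING_OVERLAP_M = 0.02        # assumed stricter value, used only during learning
SELF_COLLISION_PENALTY = 1000.0  # assumed "large" penalty; the paper gives no number


def count_self_collisions(overlaps_m, threshold_m):
    """Count recorded body-part overlaps that exceed the given threshold."""
    return sum(1 for o in overlaps_m if o > threshold_m)


def penalized_fitness(base_fitness, overlaps_m, threshold_m=TRAINING_OVERLAP_M):
    """Subtract a large penalty if any overlap in the trial exceeds the threshold."""
    if count_self_collisions(overlaps_m, threshold_m) > 0:
        return base_fitness - SELF_COLLISION_PENALTY
    return base_fitness
```

With these numbers, a trial whose worst overlap is 0.03 m goes unpenalized under the game threshold but is penalized during learning, which pushes the optimizer away from poses that sit right on the border.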

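  • Pass mode strategy
    To get the most out of pass mode, a player must decide carefully when to activate it. Activating it every time the conditions are met would force us to wait 10 s after each pass before we can score; never using it would forfeit the chance to kick without opponents nearby.
    UT's strategy for using pass mode is as follows:
    1. Only activate pass mode when an opponent is within 1.25 m of the ball. If the opponent is far away it poses no threat to our kick, so activating pass mode is unnecessary. In addition, the later pass mode is activated, the more kicking time remains before it finally ends. (My reading: suppose our player is 0.4 m from the ball and the opponent is 1.2 m away. The later pass mode is activated, the closer our player can get to the ball, or the further along it is in its pre-kick preparation, which saves kicking time once pass mode is on.)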

Only activate pass mode when an opponent is within 1.25 meters of the
ball. Activating pass mode before the opponent is close is unnecessary as
the opponent is not yet a threat to interfere with a kick, and the later pass
mode is activated the later it will time out leaving more time to kick the ball
before pass mode eventually ends.

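2. Do not activate pass mode when the player is close enough to the opponent's goal to shoot and score directly; otherwise no goal can be scored for 10 s.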

Do not use pass mode when a player is close enough to take a shot on goal
and score. Goals cannot be scored for ten seconds after pass mode ends, so
it is better to attempt a shot and try to score than to pass the ball and then
have to wait ten seconds to score.

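3. When the player is not behind the ball, use pass mode even if it is close enough to the opponent's goal to shoot directly. Walking from in front of the ball to the kicking position behind it takes time, and without pass mode the opponent could get close enough to interfere with the kick.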

Do use pass mode if a player is not behind the ball even if the player is close
enough to the opponent’s goal to take a shot and score. The player will have
to take some time to walk around the ball to get in position to take a shot,
and at that point it is likely the opponent will have gotten close enough to
the ball to interfere with a potential shot.
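
The three rules above reduce to a short decision function. This is a sketch of my reading, not UT's code: the function and parameter names are hypothetical, and I assume rule 1 acts as a precondition for the other two and that the pass mode entry conditions are checked separately.

```python
ACTIVATION_RANGE_M = 1.25  # opponent distance to the ball that makes it a threat (from the paper)


def should_use_pass_mode(nearest_opponent_dist_m, can_shoot_and_score, is_behind_ball):
    """Hypothetical policy: decide whether to activate pass mode right now."""
    # Rule 1: wait until an opponent is actually close to the ball.
    if nearest_opponent_dist_m > ACTIVATION_RANGE_M:
        return False
    # Rule 2: prefer a direct shot, since scoring is blocked for 10 s after pass mode.
    # Rule 3: the exception is when the player still has to walk around the ball.
    if can_shoot_and_score and is_behind_ball:
        return False
    return True
```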

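II. Link to the 2018 paper

Main progress in 2018:

1. Falling over
2. Walking too far and bumping into the ball, or not walking far enough and missing the ball
3. Taking too long to kick (no contact with the ball for more than 12 s), causing a timeout

That is, the fitness assigned when a penalty occurs is the same as the result when the ball does not move. Because CMA-ES uses only the rank order of the fitness values during training, the relative size of the fitness differences between kicking motions has no effect.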
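
A tiny illustration of why only the ordering matters: if the optimizer consumes ranks, making the worst candidate's fitness ten times worse changes nothing. The fitness values below are invented for the example, and `rank_order` is a hypothetical helper, not part of any CMA-ES library.

```python
def rank_order(fitnesses):
    """Indices of the candidates from best to worst (higher fitness is better)."""
    return sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)


# Two made-up generations that differ only in how bad the penalized kick is.
mild_penalty = [-0.4, -2.5, -15.0]
harsh_penalty = [-0.4, -2.5, -150.0]
same_ranking = rank_order(mild_penalty) == rank_order(harsh_penalty)
```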

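Number of generations: 300
Population size per generation: 300
Optimization result: fitness > -1, i.e. the average distance between the ball's final position and the target point is less than one meter.
Optimization order: first use existing long-distance kick parameters as the seed to optimize a good set of long-distance kick parameters. Then reduce the target kick distance step by step, seeding each run with the parameters from the previous one. For example, use the 19 m parameters as the seed for the 18 m kick, then the 18 m parameters as the seed for the 17 m kick, and so on.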
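
The seeding order can be sketched as a loop that walks the kick distances from longest to shortest. The `optimize` callback is a hypothetical stand-in for one full CMA-ES run (300 generations of 300 individuals in the paper); nothing here is UT's actual code.

```python
def optimize_kick_chain(distances_m, initial_seed, optimize):
    """Optimize kicks from longest to shortest, seeding each run with the previous result.

    `optimize(distance_m, seed_params)` is a hypothetical callback that runs the
    optimizer for one kick distance and returns the learned parameters.
    """
    learned = {}
    seed = initial_seed
    for distance in sorted(distances_m, reverse=True):
        seed = optimize(distance, seed)  # the result becomes the next run's seed
        learned[distance] = seed
    return learned
```

Because neighbouring distances need similar motions, each run starts close to a good solution instead of from scratch.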

  • Deep-learning pass strategy
    In 2018 UT used a deep-learning-based method to train the passing strategy.

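How the dataset is obtained: let $S$ be a dataset of size $m$, $S = \{(x^i, y^i)\}_{i=1}^{m}$. Each input $x^i$ is a 49-dimensional feature vector describing the game state: the play mode, the coordinates of the 22 players, the coordinates of the ball, and the coordinates of the potential pass target. (My understanding: the play mode is 1 dimension, the x and y coordinates of the 22 players are 44 dimensions, the x and y coordinates of the ball are 2 dimensions, and the x and y coordinates of the potential pass target are 2 more, 49 in total.) The output $y^i$ is a single scalar in [0, 1] representing the value of the potential pass location ("score" is a more fitting translation). During data collection, the field is first restored to a fixed game state according to $x^i$, and $y^i$ is determined from 10 repetitions: in each repetition a reward of +1 is given if a goal is scored within 20 s and 0 otherwise, and $y^i$ is the average of these 10 rewards. Clearly, every arrangement of players and ball has many valid pass locations, so each arrangement yields many training examples. (Here a valid pass location is one within 20 m of the ball's initial position and inside the field.)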
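
The 49-dimensional input and the averaged label can be sketched as follows. The layout follows my dimension count above; encoding the play mode as a single number, the ordering of the entries, and both function names are assumptions.

```python
def build_features(play_mode_id, players_xy, ball_xy, target_xy):
    """Assemble the 49-dimensional input: 1 + 22*2 + 2 + 2 (assumed layout)."""
    if len(players_xy) != 22:
        raise ValueError("expected the coordinates of all 22 players")
    features = [float(play_mode_id)]
    for x, y in players_xy:
        features.extend([x, y])
    features.extend(ball_xy)
    features.extend(target_xy)
    return features


def label_from_rollouts(goal_scored_flags):
    """Average the 0/1 rewards of the repeated rollouts (10 in the paper)."""
    rewards = [1.0 if scored else 0.0 for scored in goal_scored_flags]
    return sum(rewards) / len(rewards)
```

For instance, if 3 of the 10 rollouts from a state end in a goal within 20 s, the label for that pass target is 0.3.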
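In addition, the dataset was improved in the following ways:

1. The network input is normalized. Specifically, the player coordinates fed to the network are sorted from left to right along the field's x axis.
2. The data is preprocessed to exploit symmetry. Specifically, if the ball's y coordinate is negative, all y coordinates are flipped so that the ball's y coordinate fed to the network is never negative. This amounts to considering only the cases where the ball is in the upper half of the field, which halves the number of possible situations and speeds up convergence.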
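
Both preprocessing steps fit in one small function. This is a sketch under assumptions: `normalize_state` is a hypothetical name, the pass target is mirrored together with everything else, and the whole player list is sorted at once (call it once per team if teammates and opponents should be ordered separately).

```python
def normalize_state(players_xy, ball_xy, target_xy):
    """Mirror the state so the ball has y >= 0, then sort players by x."""
    players = list(players_xy)
    ball = tuple(ball_xy)
    target = tuple(target_xy)
    if ball[1] < 0:
        # Flip every y coordinate so the ball ends up in the upper half of the field.
        players = [(x, -y) for x, y in players]
        ball = (ball[0], -ball[1])
        target = (target[0], -target[1])
    players.sort(key=lambda p: p[0])  # left to right along the x axis
    return players, ball, target
```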

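Details of optimizing passing: first a suitable neural network size has to be chosen. Two factors drive the choice: whether the network overfits, and whether it can finish its computation within 0.02 s.
The table below lists, for different network sizes, the average time taken, the maximum time taken, and the maximum number of dropped messages; times are in milliseconds. (The maximum number of dropped messages is the number of messages lost in communication between the server and the agent.)
UT chose the third option in the table above.
The training details for option 3 are as follows:
Once training is complete, the network can score the potential pass locations at any moment from the current game state, and the robot passes to the location with the highest score.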
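
Choosing the pass target at run time amounts to filtering the valid candidates and taking the one with the highest score. In this sketch `score_fn` is a hypothetical stand-in for the trained network, and the field half-dimensions (15 m by 10 m) are my assumption about the field size; only the 20 m limit comes from the text above.

```python
import math

MAX_PASS_DIST_M = 20.0  # a valid target lies within 20 m of the ball


def is_valid_target(target_xy, ball_xy, half_length_m=15.0, half_width_m=10.0):
    """True if the target is inside the (assumed) field and within 20 m of the ball."""
    x, y = target_xy
    inside_field = abs(x) <= half_length_m and abs(y) <= half_width_m
    return inside_field and math.dist(target_xy, ball_xy) <= MAX_PASS_DIST_M


def best_pass_target(candidates_xy, ball_xy, score_fn):
    """Return the valid candidate with the highest score, or None if there is none."""
    valid = [c for c in candidates_xy if is_valid_target(c, ball_xy)]
    return max(valid, key=score_fn) if valid else None
```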

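The above is what the paper says. After reading it I had one question:

When collecting the dataset, we first set $x^i$ as the input. Second, we need to build an environment on the RoboCup 3D simulation platform according to $x^i$ and design a strategy to test whether a goal is scored within 20 seconds. Finally, we obtain $y^i$ from the test results.
What I did not understand is how to design the strategy in the second step.

Dr. Patrick MacAlpine gave a detailed answer (a translation would lose its flavor, so the original text is pasted directly):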

III. Link to the 2017 paper

Main progress in 2017:

IV. Link to the 2016 paper

Main progress in 2016:

V. Link to the 2015 paper

Main progress in 2015:


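Appendix: URL for all of UT's publications
Worth mentioning:
Required reading for getting started with RoboCup 3D simulation: the user manual. For its communication part, see: Introduction to the RoboCup 3D Simulation Low-Level Communication Module (Part 1) and Introduction to the RoboCup 3D Simulation Low-Level Communication Module (Part 2)

Required reading for getting hands-on with RoboCup 3D simulation: a detailed introduction to UT's open-source base code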
