Learning to Drive in a Day

Do you remember learning to ride a bicycle as a child? Excited and mildly anxious, you probably sat on a bicycle for the first time and pedalled while an adult hovered over you, prepared to catch you if you lost balance. After some wobbly attempts, you perhaps managed to balance for a few metres. Several hours in, you probably were zipping around the park on gravel and grass alike.

The adult would have only given you brief tips along the way. You did not need a dense 3D map of the park nor a high fidelity laser on your head. You did not need a long list of rules to follow to be able to balance on the bicycle. The adult simply gave you a safe environment for you to learn how to map what you see to what you should do, to successfully ride a bicycle.

Today’s self-driving cars have been packed with a large array of sensors, and are told how to drive with a long list of carefully hand-engineered rules through slow development cycles. In this blogpost, we go back to basics, and let a car learn to follow a lane from scratch, with clever trial and error, much like how you learnt to ride a bicycle. Have a look at what we did:

In just 15–20 minutes, we were able to teach a car to follow a lane from scratch, using only the safety driver's takeovers as training feedback.

No dense 3D map.
No hand-written rules.

This is the first example where an autonomous car has learnt online, getting better with every trial. So, how did we do it?

We adapted a popular model-free deep reinforcement learning algorithm (deep deterministic policy gradients, DDPG) to solve the lane following task. Our model input was a single monocular camera image. Our system iterated through 3 processes: exploration, optimisation and evaluation.
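For readers unfamiliar with DDPG, the update rules of the standard algorithm (as published by Lillicrap et al.) are summarised below. The post itself gives no equations, so the notation is the usual textbook one and is an assumption here: $\mu$ is the actor (policy), $Q$ the critic, primes mark slowly updated target networks, $\mathcal{N}_t$ is exploration noise, $\gamma$ the discount factor and $\tau$ the soft-update rate. Any adaptations made for the lane-following task are not shown.

```latex
\begin{align}
a_t &= \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t && \text{(exploration)} \\
y_i &= r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_i \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \\
\theta' &\leftarrow \tau\,\theta + (1-\tau)\,\theta'
\end{align}
```

In terms of the three processes above, the first line corresponds to exploration and the remaining lines to optimisation; evaluation would run the actor without the noise term.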

Our network architecture was a deep network with 4 convolutional layers and 3 fully connected layers, with a total of just under 10k parameters. For comparison, state-of-the-art image classification architectures have tens of millions of parameters.
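To get a feel for how small such a network is, the sketch below counts the parameters of one network with 4 convolutional and 3 fully connected layers. The post does not give layer sizes, so every number here is a hypothetical choice made only to land in the right range: a 64×64 RGB input, 3×3 kernels with stride 2 and padding 1, 16 filters per layer, hidden width 8 and 2 outputs.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of one convolution along a single dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a 2D convolution with square kernels."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(image_size=64, channels=3, filters=16, hidden=8, actions=2):
    """Total parameters of 4 conv + 3 dense layers (all sizes are hypothetical)."""
    total, size, c_in = 0, image_size, channels
    for _ in range(4):                      # four convolutional layers
        total += conv_params(c_in, filters)
        size, c_in = conv_out(size), filters
    n_in = size * size * filters            # flattened feature map
    for n_out in (hidden, hidden, actions): # three fully connected layers
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total


print(count_parameters())  # 9554 with the hypothetical sizes above
```

With these assumed sizes most of the parameters sit in the convolutions and the first dense layer, and the total stays below 10,000.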

All processing was performed on one graphics processing unit (GPU) on-board the car.

Working on a real robot in a dangerous real environment poses many new problems. In order to better understand the task at hand and find suitable model architectures and hyperparameters, we did a lot of testing in simulation.

Above is an example of our simulated lane-following environment shown from different angles. The algorithm only sees the driver's perspective, i.e. the image with the teal border. At every episode, we randomly generate a curved lane to follow, as well as the road texture and lane markings. The agent explores until it leaves the lane, at which point the episode terminates. The policy is then optimised on the collected data, and we repeat.
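The explore, terminate, optimise, repeat cycle can be sketched with a toy stand-in for the simulator. Everything below is hypothetical and far simpler than the real environment: the state is a single lateral offset rather than a camera image, a random per-episode `drift` plays the role of the randomly generated curve, and `optimise` is a caller-supplied placeholder where a DDPG update would go.

```python
import random

HALF_WIDTH = 1.0  # hypothetical lane half-width
MAX_STEPS = 200   # hypothetical cap on episode length


def run_episode(policy, rng, noise_std=0.0):
    """Drive one randomly generated lane; return the transitions collected."""
    drift = rng.choice([-1.0, 1.0]) * rng.uniform(0.05, 0.2)  # random curve
    offset, transitions = 0.0, []
    for _ in range(MAX_STEPS):
        action = policy(offset) + rng.gauss(0.0, noise_std)   # exploration noise
        next_offset = offset + drift + action
        done = abs(next_offset) > HALF_WIDTH                  # left the lane
        transitions.append((offset, action, next_offset, done))
        if done:
            break                                             # episode terminates
        offset = next_offset
    return transitions


def training_loop(policy, optimise, episodes, seed=0):
    """Alternate exploration and optimisation; return policy and episode lengths."""
    rng = random.Random(seed)
    replay_buffer, distances = [], []
    for _ in range(episodes):
        transitions = run_episode(policy, rng, noise_std=0.05)
        replay_buffer.extend(transitions)
        distances.append(len(transitions))       # steps survived this episode
        policy = optimise(policy, replay_buffer)  # placeholder for the real update
    return policy, distances
```

Episode length here stands in for the distance travelled before the episode ends. A policy that never steers, such as `lambda offset: 0.0`, leaves the lane within a couple of dozen steps, while `lambda offset: -offset` cancels the accumulated offset and reaches the step cap.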

**Distance travelled by the car before a safety driver takeover, plotted against the number of exploration episodes.**

We used simulated tests to try out different neural network architectures and hyperparameters until we found settings which consistently solved the task of lane following in very few training episodes i.e. with little data. For example, one of our findings was that training the convolutional layers using an auto-encoder reconstruction loss significantly improved stability and data-efficiency of training. See our full technical report for more details.
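As a rough illustration of what an auto-encoder reconstruction loss measures, the sketch below computes the mean squared error between an input image and its reconstruction and adds it to a task loss. The post does not say how the two terms were combined, so the weighted sum and the `weight` parameter are assumptions, and images are represented as flat lists of pixel values for simplicity.

```python
def reconstruction_loss(image, reconstruction):
    """Mean squared error between flattened input pixels and the decoder output."""
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same length")
    return sum((x - y) ** 2 for x, y in zip(image, reconstruction)) / len(image)


def total_loss(task_loss, image, reconstruction, weight=1.0):
    """Task loss plus a weighted reconstruction term (the weighting is hypothetical)."""
    return task_loss + weight * reconstruction_loss(image, reconstruction)
```

The intuition is that the reconstruction term gives the convolutional layers a dense training signal from every frame, even when the driving reward is sparse.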

The potential implications of our approach are huge.

Imagine deploying a fleet of autonomous cars, with a driving algorithm which initially is 95% the quality of a human driver. Such a system would not be wobbly like the randomly initialised model in our demonstration video, but rather would be almost capable of dealing with traffic lights, roundabouts, intersections, etc. After a full day of driving and online improvement from human safety-driver takeovers, perhaps the system would improve to 96%. After a week, 98%. After a month, 99%. After a few months, the system may be super-human, having benefited from the feedback of many different safety drivers.

Today’s self-driving cars are stuck at good but not good enough performance levels. Here, we have provided evidence for the first viable framework for quickly improving driving algorithms from mediocre to roadworthy. The ability to quickly learn to solve tasks through clever trial and error is what has made humans incredibly versatile machines capable of evolution and survival. We learn through a mixture of imitation and lots of trial and error, for everything from riding a bicycle to learning how to cook.

DeepMind have shown us that deep reinforcement learning methods can lead to super-human performance in many games including Go, Chess and computer games, almost always outperforming any rule-based system. Here we show that a similar philosophy is also possible in the real world, and in particular, in autonomous vehicles. A crucial point to note is that DeepMind’s Atari playing algorithms required millions of trials to solve a task. It is remarkable that we consistently learnt to lane-follow in under 20 trials.

We learnt to follow lanes from scratch in 20 minutes. Imagine what we could learn to do in a day…?

Wayve has a philosophy that to build robotic intelligence we do not need massive models, fancy sensors and endless data. What we need is a clever training process that learns rapidly and efficiently, like in our video above. Hand-engineered approaches to the self-driving problem have reached an unsatisfactory glass ceiling in performance. Wayve is attempting to unlock autonomous driving capabilities with smarter machine learning.

Read our full scientific paper here, published at the International Conference on Robotics and Automation 2019.

We’re hiring! wayve.ai/careers/

Follow us: twitter / linkedin

Special thanks: We would like to thank StreetDrone for building us an awesome robotic vehicle, Admiral for insuring our vehicle trials and the Cambridge Polo Club for granting us access to their private land for our lane-following research.

This story was originally published at https://wayve.ai/blog/learning-to-drive-in-a-day-with-reinforcement-learning on 28th June 2018.

Translated from: https://medium.com/wayve/learning-to-drive-in-a-day-30f0b616dd27
