Background assumptions:

A fundamental aspect of future state prediction is that it is inherently stochastic, as agents cannot know each other’s motivations
【CC】The assumption here is that we cannot truly know the behavior of other agents. This differs from computing a Nash equilibrium in RL, where every agent is assumed to be rational and to pursue its own optimum.

We seek a model of the future that can provide both (1) a weighted, parsimonious set of discrete trajectories that covers the space of likely outcomes and (2) a closed-form evaluation of the likelihood of any trajectory
【CC】The model output is expected to (1) quantify the quality of predicted trajectories so that a small set of likely futures can be selected (otherwise the possibilities explode combinatorially), and (2) provide a closed-form likelihood for any trajectory.

Main idea:

MultiPath leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution.
【CC】A fixed (predefined) set of state sequences (predefined trajectories) is used, and the distribution is then predicted relative to them; the overall idea is much like detection, where a fixed set of template anchor boxes is defined first and regression is done against them.

Our method is influenced heavily by the concept of predefined anchors, which have a rich history in machine learning applications to handle multi-modal problems
【CC】How the anchors (trajectories) are "predefined" has a significant impact.

MultiPath predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step.
【CC】For each agent (think of each as a vehicle), offsets relative to the fixed anchors (predefined trajectories) are regressed, with a GMM assumed as the prior; it feels like scattering points first and then regressing each of them.

The MultiPath model addresses these issues with a key insight: it employs a fixed set of trajectory anchors as the basis of our modeling
【CC】To avoid the combinatorial explosion of feasible future space-time, a fixed set of anchors (predefined trajectories) is introduced.

assume control uncertainty is normally distributed at each future time step, parameterized such that the mean corresponds to a context-specific offset from the anchor state, with the associated covariance capturing the unimodal aleatoric uncertainty
【CC】Control uncertainty is explained later: it describes the distribution of offsets from the anchor, assumed a priori to be Gaussian; its mean is a quantity tied to the current scene/context, and its covariance captures the unimodal (aleatoric) spread.

Our trajectory anchors are modes found in our training data in state-sequence space via unsupervised learning.
【CC】The anchors (predefined trajectories) are learned from the dataset in an unsupervised way (k-means).

Our complete model predicts a Gaussian mixture model (GMM) at each time step, with the mixture weights (intent distribution) fixed over time
【CC】The mixture weights of the GMM stay fixed over time; this is logically reasonable if you think of it as the agent's intended manoeuvre staying the same for a given scene throughout the horizon.

Given such a parametric distribution model, we can directly evaluate the likelihood of any future trajectory and also have a simple way to obtain a compact, diverse weighted set of trajectory samples: the MAP sample from each anchor-intent
【CC】Once the model is given, the likelihood of any trajectory can be computed directly, and the MAP (maximum a posteriori) sample per anchor intent can serve as the quantitative summary; this answers the opening question of how to quantify the quality of the output trajectories.

Formal description:

Given observations x in the form of past trajectories of all agents in a scene and possibly additional contextual information, MultiPath seeks to provide (1) a parametric distribution over future trajectories s: p(s|x), and (2) a compact weighted set of explicit trajectories which summarizes this distribution well.
【CC】Given the observation x, which contains all agents' past trajectories and the current environment (how the input is structured is described later), the model aims to (1) output the parameters of the future-trajectory distribution p(s|x) (i.e., fit the data distribution with a model) and (2) quantitatively assess the quality of predicted trajectories (as said earlier, via MAP).

Let t denote a discrete time step, and let s_t denote the state of an agent at time t; the future trajectory s = [s_1, . . . , s_T] is a sequence of states from t = 1 to a fixed time horizon T
【CC】s_t denotes the state of the agent at time t.

We factorize the notion of uncertainty into independent quantities. Intent uncertainty models uncertainty about the agent's latent coarse-scale intent or desired goal. Control uncertainty describes the uncertainty over the sequence of states the agent will follow to satisfy its intent. Both intent and control uncertainty depend on the past observations of static and dynamic world context x
【CC】Uncertainty is split into two kinds (i.e., the prediction involves two random variables whose distributions we want to fit): intent uncertainty and control uncertainty. The former is coarse-grained (e.g., turn left / keep following / turn right); the latter is the deviation of the state sequence conditioned on the former, itself a random variable. Both are taken to depend on the input x. My own view is that the latter should be a distribution conditioned on the former rather than an independent one; the paper presumably simplifies here.

We model a discrete set of intents as a set of K anchor trajectories A = {a^k}_{k=1..K}, where each anchor trajectory is a sequence of states: a^k = [a^k_1, . . . , a^k_T]. We model uncertainty over this discrete set of intents with a softmax distribution:

π(a^k | x) = exp(f_k(x)) / Σ_{k'} exp(f_{k'}(x))

where f_k(x) : R^{d(x)} → R is the output of a deep neural network.
【CC】The set A contains all predefined trajectories and a^k is the k-th one. The "probability" of observing the k-th anchor a^k given the input x is measured with a softmax; this π(a^k|x) is exactly the intent uncertainty above, i.e., the probability of picking a particular predefined trajectory. Note that f_k(x) is a function learned by the network: intent uncertainty is assumed to depend only on the input x, and since we do not know the form of f_k, it is learned by the NN. Also note there are K such functions, i.e., each predefined trajectory has its own mapping from x.
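As a minimal sketch of this step, assuming the network has already produced the K logits f_k(x) (array name `f_x` is illustrative, not from the paper):

```python
import numpy as np

def intent_distribution(f_x: np.ndarray) -> np.ndarray:
    """Softmax over the K anchor logits f_k(x), giving pi(a^k | x).

    f_x: shape (K,), one logit per predefined anchor trajectory.
    """
    z = f_x - f_x.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```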

We make the simplifying assumption that uncertainty is unimodal given intent, and model control uncertainty as a Gaussian distribution dependent on each waypoint state of an anchor trajectory:

φ(s_t | a^k, x) = N(s_t | a^k_t + µ^k_t(x), Σ^k_t(x))

The Gaussian parameters µ^k_t and Σ^k_t are directly predicted by our model as a function of x for each time step of each anchor trajectory a^k_t. Note that in the Gaussian mean, a^k_t + µ^k_t, the term µ^k_t represents a scene-specific offset from the anchor state a^k_t; it can be thought of as modeling a scene-specific residual or error term on top of the prior anchor distribution
【CC】Control uncertainty is simply assumed to be Gaussian. This can be understood two ways: it is mathematically convenient, and since the coarse trajectory shape is already fixed, this step only refines it, so the result should not stray far from the anchor (i.e., from the mean a^k_t + µ^k_t). Here µ^k_t / Σ^k_t are treated as functions of x only and not of a^k; recalling the earlier discussion, this says control uncertainty is independent of intent uncertainty and depends only on x, hence my earlier doubt: intuitively the two should be related. If we wrote µ^k as µ^k(x, a^k) it would take two inputs, but since µ already comes in K variants (one per anchor) it is effectively class-specific anyway, so it may not matter much. The φ(s^k_t | a^k, x) here is simply the control-uncertainty density, direct enough.

The time-step distributions are assumed to be conditionally independent given an anchor, i.e., we write φ(s_t | ·) instead of φ(s_t | ·, s_{1:t−1}). This modeling assumption allows us to predict all time steps jointly with a single inference pass, making our model simple to train and efficient to evaluate. If desired, it is straightforward to add a conditional next-time-step dependency to our model, using a recurrent structure (RNN).
【CC】Time steps are assumed to be independently distributed. In plain terms, the prediction at time t has nothing to do with time t−1; intuitively that feels off, but it has the advantage of speed, since everything is computed in one pass. If that bothers you, it can be changed to an RNN that predicts time t from steps [1, t−1].

To obtain a distribution over the entire state space, we marginalize over agent intent:

p(s | x) = Σ_{k=1..K} π(a^k | x) Π_{t=1..T} N(s_t | a^k_t + µ^k_t(x), Σ^k_t(x))

Note that this yields a Gaussian Mixture Model distribution, with mixture weights fixed over all time steps.
【CC】Combining intent uncertainty and control uncertainty with the law of total probability gives a standard GMM form.
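A rough sketch of the closed-form likelihood evaluation this enables: given the model outputs `pi` (anchor probabilities), per-anchor per-step means `mu` (anchor waypoint plus predicted offset) and covariances `cov`, the log-likelihood of an arbitrary trajectory can be computed by marginalizing over intents. Array names and shapes are assumptions for illustration:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def trajectory_log_likelihood(s, pi, mu, cov):
    """log p(s | x) for a single trajectory under the anchor GMM.

    s:   (T, 2)        trajectory to evaluate
    pi:  (K,)          intent distribution pi(a^k | x)
    mu:  (K, T, 2)     per-anchor means a^k_t + mu^k_t(x)
    cov: (K, T, 2, 2)  per-anchor, per-step covariances Sigma^k_t(x)
    """
    K, T = mu.shape[0], mu.shape[1]
    log_terms = np.empty(K)
    for k in range(K):
        # time steps are conditionally independent given the anchor
        log_phi = sum(
            multivariate_normal.logpdf(s[t], mean=mu[k, t], cov=cov[k, t])
            for t in range(T)
        )
        log_terms[k] = np.log(pi[k]) + log_phi
    return logsumexp(log_terms)  # marginalize over the K intents
```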

Main pipeline:


Figure 1: MultiPath estimates the distribution over future trajectories per agent in a scene, as follows: 1) Based on a top-down scene representation, the Scene CNN extracts mid-level features that encode the state of individual agents and their interactions. 2) For each agent in the scene, we crop an agent-centric view of the mid-level feature representation and predict the probabilities over the fixed set of K predefined anchor trajectories. 3) For each anchor, the model regresses offsets from the anchor states and uncertainty distributions for each future time step.

  • Input representation

We follow other recent approaches [2, 11] and represent a history of dynamic and static scene context as a 3-dimensional array of data rendered from a top-down orthographic perspective. The first two dimensions represent spatial locations in the top-down image. The channels in the depth dimension hold static and time-varying (dynamic) content of a fixed number of previous time steps.
【CC】The input representation follows IntentNet / ChauffeurNet: a 3-D array whose first two dimensions are spatial and whose depth dimension stores static and dynamic content over a fixed number of past frames. Incidentally, could the representation be learned directly instead, in the style of VectorNet? A schematic tensor layout is sketched below.
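A schematic of the top-down input tensor; the spatial resolution, channel layout and history length used here are illustrative assumptions, not the paper's numbers:

```python
import numpy as np

H, W = 400, 400   # top-down spatial grid (assumed resolution)
T_HIST = 5        # number of past frames kept (assumed)
STATIC_CH = 3     # e.g. road mask, lane lines, crosswalks (assumed)

# Depth dimension holds the static map channels plus one dynamic channel
# (agent occupancy / state) per previous time step.
scene = np.zeros((H, W, STATIC_CH + T_HIST), dtype=np.float32)

static_view = scene[..., :STATIC_CH]    # rendered once per scene
dynamic_view = scene[..., STATIC_CH:]   # one channel per past time step
```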

  • Obtaining anchor trajectories

As noted by [6, 5], directly learning a mixture suffers from issues of mode collapse
【CC】Letting the network learn A itself as part of the mixture is prone to mode collapse, so the anchors are obtained by preprocessing instead.
we used the k-means algorithm as a simple approximation to obtain A with the following squared distance between trajectories:

d(u, v) = Σ_t || M_u u_t − M_v v_t ||²

where M_u, M_v are affine transformation matrices which put trajectories into a canonical rotation- and translation-invariant agent-centric coordinate frame.
【CC】Plain k-means is used directly; the distance is just a squared L2 norm, computed after transforming each trajectory into a canonical agent-centric frame.
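A sketch of how the anchor set A could be obtained, assuming each training trajectory has already been transformed into its canonical agent-centric frame (i.e., the M_u transform above has been applied); the flattening trick and the use of scikit-learn's KMeans are implementation assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_anchors(trajs: np.ndarray, k: int) -> np.ndarray:
    """Cluster trajectories into K anchor trajectories.

    trajs: (N, T, 2) trajectories already expressed in a canonical
           rotation- and translation-invariant agent-centric frame.
    Flattening to (N, T*2) makes the Euclidean distance used by k-means
    equal to the summed squared waypoint distance defined above.
    """
    n, t, d = trajs.shape
    km = KMeans(n_clusters=k, n_init=10).fit(trajs.reshape(n, t * d))
    return km.cluster_centers_.reshape(k, t, d)  # anchors a^k, shape (K, T, 2)
```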

  • Learning

We train our model via imitation learning by fitting our parameters to maximize the log-likelihood of recorded driving trajectories.
Let our data be of the form {(x^m, ŝ^m)}_{m=1..M}. We learn to predict the distribution parameters π(a^k|x), µ^k_t(x) and Σ^k_t(x) as outputs of a deep neural network parameterized by weights θ, with the following negative log-likelihood loss built upon Equation 2:

ℓ(θ) = − Σ_{m=1..M} Σ_{k=1..K} 1(k = k̂^m) [ log π(a^k | x^m; θ) + Σ_{t=1..T} log N(ŝ^m_t | a^k_t + µ^k_t(x^m; θ), Σ^k_t(x^m; θ)) ]

The notation 1(·) is the indicator function, and k̂^m is the index of the anchor most closely matching the ground-truth trajectory ŝ^m, measured as L2-norm distance in state-sequence space. This hard assignment of ground-truth anchors sidesteps the intractability of direct GMM likelihood fitting and avoids resorting to an expectation-maximization procedure.
【CC】The usual recipe: the so-called imitation learning here just fits predictions to ground truth by minimizing a loss. The loss function is direct as well, simply the NLL of p(s|x), with an extra index m running over all samples. The assignment of the ground truth to the nearest a^k is also blunt, just an L2 distance; the paper frames this as a way to avoid an EM procedure. Later implementers could adjust it: could the distance be learned, e.g., by softening 1(·) into a softmax and letting the network learn it?
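A minimal PyTorch-style sketch of the hard-assignment NLL for a single training sample; tensor names, shapes and the use of `torch.distributions` are assumptions, not the paper's code:

```python
import torch
from torch.distributions import MultivariateNormal

def multipath_nll(logits, mu, cov, gt_traj, anchors):
    """Negative log-likelihood with hard anchor assignment (one sample).

    logits:  (K,)          f_k(x), pre-softmax intent scores
    mu:      (K, T, 2)     predicted means a^k_t + mu^k_t(x)
    cov:     (K, T, 2, 2)  predicted covariances Sigma^k_t(x)
    gt_traj: (T, 2)        ground-truth future trajectory s_hat
    anchors: (K, T, 2)     predefined anchor trajectories a^k
    """
    # k_hat: anchor closest to the ground truth in state-sequence space (L2)
    with torch.no_grad():
        d2 = ((anchors - gt_traj.unsqueeze(0)) ** 2).sum(dim=(1, 2))
        k_hat = d2.argmin()

    log_pi = torch.log_softmax(logits, dim=0)[k_hat]
    step_dists = MultivariateNormal(mu[k_hat], covariance_matrix=cov[k_hat])
    log_phi = step_dists.log_prob(gt_traj).sum()  # sum over T independent steps
    return -(log_pi + log_phi)
```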

  • Inferring a diverse weighted set of test-time trajectories
we take the MAP trajectory estimates from each of our K anchor modes, and consider the distribution over anchors π(a^k|x) as the sample weights
【CC】Already covered above; a small sketch of the summary step follows.
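Since each anchor's control distribution is a unimodal Gaussian, its MAP trajectory is simply the predicted mean, and the anchor probabilities serve as the sample weights. Names here are illustrative:

```python
import numpy as np

def map_trajectory_set(pi: np.ndarray, mu: np.ndarray):
    """Return K weighted trajectories summarizing p(s | x).

    pi: (K,)       anchor probabilities pi(a^k | x), used as sample weights
    mu: (K, T, 2)  per-anchor means; for a Gaussian the mean is the MAP point
    """
    order = np.argsort(-pi)                 # most likely intents first
    return [(pi[k], mu[k]) for k in order]  # list of (weight, trajectory)
```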

  • Neural network details

We opt to use ResNet-based architectures [24] for this scene-level feature extractor. The second phase extracts patches of size 11×11 centered on agent locations in this feature map. The extracted features are also rotated to an agent-centric coordinate system via a differentiable bilinear warping. The second, agent-centric network then operates on a per-agent basis. It contains 4 convolutional layers with kernel size 3 and 8 or 16 depth channels
【CC】The first stage uses a ResNet variant for representation learning. An 11×11 feature patch centered on each agent is cropped, transformed into a coordinate frame aligned with the agent's heading, and fed into a 4-layer conv network, with a prediction made per agent. Two points to note: first, the 11×11 size is a bit tricky, since it crops a square region around the agent, whereas human intuition would pay more attention to a trapezoid-shaped region ahead; second, the backbone must not downsample (or must upsample back), otherwise the coordinate frames will not line up.

It produces K×T×5 parameters describing a bivariate Gaussian per time step per anchor (parameterized by µ_x, µ_y, log σ_x, log σ_y and ρ; the last 3 parameters define the 2×2 covariance matrix Σ_xy in the agent-centric x, y coordinate space), as well as K softmax logits to represent π(a|x)
【CC】Outputs a K×T×5 tensor, where K is the number of predefined anchors and T the prediction horizon; µ_x, µ_y give the mean and Σ_xy the covariance, i.e., the learnable quantities in φ(s^k_t | a^k, x), while the additional K softmax logits represent π(a|x).
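A sketch of unpacking the K×T×5 head output into Gaussian parameters; the channel order (µ_x, µ_y, log σ_x, log σ_y, ρ) follows the quoted description, while the tanh clamping of ρ and the function name are assumptions:

```python
import numpy as np

def unpack_head(raw: np.ndarray):
    """Split the regression head output into offsets and covariances.

    raw: (K, T, 5) with channels (mu_x, mu_y, log_sig_x, log_sig_y, rho_raw).
    Returns mu (K, T, 2), the offsets from the anchor waypoints (add a^k_t to
    get the distribution mean), and cov (K, T, 2, 2) in agent-centric coords.
    """
    mu = raw[..., 0:2]
    sig_x, sig_y = np.exp(raw[..., 2]), np.exp(raw[..., 3])
    rho = np.tanh(raw[..., 4])   # keep the correlation in (-1, 1); an assumption
    cov = np.empty(raw.shape[:2] + (2, 2))
    cov[..., 0, 0] = sig_x ** 2
    cov[..., 1, 1] = sig_y ** 2
    cov[..., 0, 1] = cov[..., 1, 0] = rho * sig_x * sig_y
    return mu, cov
```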

Supplementary background:
Conditional variational autoencoders (CVAEs)
