LiDAR R-CNN: A 3D Object Detection Network for Sparse Point Clouds (CVPR 2021)

Author | 柒柒 @ Zhihu

Source | https://zhuanlan.zhihu.com/p/390322842

Editor | 3D视觉工坊 (3D Vision Workshop)

Paper title: LiDAR R-CNN: An Efficient and Universal 3D Object Detector
Affiliation: TuSimple
Code: https://github.com/tusimple/LiDAR_RCNN
Paper: https://arxiv.org/pdf/2103.15297.pdf

The paper in one sentence:

It tackles the proposal size ambiguity problem caused by the sparsity of point clouds.

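The authors' view:

1. The size ambiguity problem. Size ambiguity typically shows up as: various proposals → same features. In other words, different proposals enclose exactly the same points, so a network that relies only on point features cannot regress an accurate box. The authors argue that the root cause is that the R-CNN stage does not know the size of the proposal: it has no way to perceive how large the proposal is. My interpretation: from an optimization standpoint, this means that for a given set of points there are infinitely many equally good proposals.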

The problem is related to the property of point cloud. Unlike 2D images, in which each position is filled with RGB values, the point cloud is sparse and irregular. Nevertheless, if we directly use a point-based method, in which we only consider existing points in network, we would ignore the scale information indicated by the spacing.

2. So, in the authors' view, the central question is: how do we tell the R-CNN network the size of the proposal?

Different from 2D RCNN, we should equip our LiDAR-RCNN with the ability to perceive the spacing and the size of proposals.
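The size ambiguity in point 1 is easy to reproduce. The sketch below is a minimal illustration, not the authors' code: it assumes axis-aligned proposals (no yaw) and a hypothetical point-based feature, `point_features`, that keeps only the coordinates of the points inside the proposal relative to its center. Two proposals of very different size that enclose the same sparse points produce exactly the same features.

```python
# Illustrative sketch of the size ambiguity; not LiDAR R-CNN code.
# Assumes axis-aligned proposals (no yaw rotation).

def in_box(p, center, size):
    """True if point p lies inside the axis-aligned box (center, size)."""
    return all(abs(p[i] - center[i]) <= size[i] / 2 for i in range(3))

def point_features(points, center, size):
    """Hypothetical point-based feature: coordinates of the points inside
    the proposal, expressed relative to the proposal center."""
    return sorted(
        tuple(p[i] - center[i] for i in range(3))
        for p in points
        if in_box(p, center, size)
    )

# A sparse object: only a handful of LiDAR returns.
points = [(0.5, 0.2, 0.1), (-0.8, 0.3, -0.2), (1.0, -0.4, 0.0)]
center = (0.0, 0.0, 0.0)

tight = point_features(points, center, (2.2, 1.0, 0.6))
loose = point_features(points, center, (4.0, 2.0, 1.5))  # a much larger proposal

print(tight == loose)  # True: different proposals, identical features
```

Nothing in `tight` or `loose` depends on the proposal size, which is exactly why a network fed only these features cannot learn the correct size target.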

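Concretely, the authors examine five approaches, as shown in the figure above: three existing ones, (a) to (c), and two of their own, (d) and (e).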

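a) Normalization. As I understand it, the point of normalization is to tell the network where the proposal's boundary is. With this boundary information, the R-CNN network can judge whether its prediction is accurate. For example, if the predicted proposal is too large, no points will lie near its boundary, and all normalized point coordinates shrink proportionally.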

The simplest solution to the ambiguity problem is to normalize the point coordinates by the proposal size. If the proposal is enlarged, the point coordinates will be smaller and the size target will be higher. Consequently, the model could be aware of the size of the proposal.

The problem with this approach, however, is that it erases the object's size information.

When the R-CNN model is applied on multiple categories, it totally ignores the scale difference of different categories. The size normalization makes it more difficult for the model to distinguish different categories.
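A minimal sketch of option (a), again assuming axis-aligned proposals; `normalize` is a hypothetical helper, not taken from the LiDAR_RCNN repository. It shows both sides of the trade-off: enlarging the proposal shrinks the normalized coordinates (so the ambiguity is resolved), but a large and a small object with the same shape become identical after normalization (so the size cue needed for classification is lost).

```python
# Illustrative sketch of option (a), size normalization; not LiDAR R-CNN code.
# Assumes axis-aligned proposals (no yaw rotation).

def normalize(points, center, size):
    """Hypothetical helper: center-relative coordinates divided by the
    proposal size, so points inside the box fall in [-0.5, 0.5]."""
    return [tuple((p[i] - center[i]) / size[i] for i in range(3)) for p in points]

center = (0.0, 0.0, 0.0)

# 1) Enlarging the proposal shrinks the normalized coordinates, so the
#    network can now tell that the proposal is too big.
points = [(1.0, 0.5, 0.25)]
tight = normalize(points, center, (2.0, 1.0, 0.5))  # [(0.5, 0.5, 0.5)]
loose = normalize(points, center, (4.0, 2.0, 1.0))  # [(0.25, 0.25, 0.25)]

# 2) Drawback: a large and a small object with the same shape, each in a
#    tight proposal, map to identical features, so absolute scale is lost.
big = normalize([(2.0, 1.0, 0.75)], center, (4.0, 2.0, 1.5))
small = normalize([(0.5, 0.25, 0.1875)], center, (1.0, 0.5, 0.375))
print(big == small)  # True
```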

b) Anchors. This approach clearly does not work: attaching an anchor gives the network no information that separates categories, so we end up with anchor → same features but different categories → confusion.

Since the network is still not aware of the boundary of the proposal, this method does not solve the classification ambiguity. Furthermore, when there are few points in the boxes, objects from different categories will also have similar features. In this case, there will be a regression ambiguity because different categories have different anchors, which correspond to different regression targets.

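c) Voxelization. As I see it, voxels are not fundamentally different from anchors; at most a voxel is a "small" anchor. From that perspective, voxelization shares the anchor's weakness: it provides no explicit boundary information, which on the one hand confuses classification and on the other hand makes regression hard.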

The points in it are still not aware of the voxel boundary. The model only has coarse information about the proposal size at voxel level, but not at point level. As a result, this solution alleviates, but does not fully solve, the ambiguity problem.
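To see why voxelization only alleviates the problem, the sketch below splits each proposal into a fixed grid of `grid`³ voxels and records which voxel each point falls into. The helper `voxel_indices` and the grid size of 4 are hypothetical choices for illustration, and proposals are assumed axis-aligned. A 10% enlargement leaves every voxel index unchanged, so the two proposals still look identical; only a large change in size shows up at voxel resolution.

```python
import math

# Illustrative sketch of option (c), voxelization; not LiDAR R-CNN code.
# Assumes axis-aligned proposals and a hypothetical fixed grid per proposal.

def voxel_indices(points, center, size, grid=4):
    """Split the proposal into grid**3 equal voxels and return the
    (ix, iy, iz) voxel index of every point (points assumed inside)."""
    out = []
    for p in points:
        idx = []
        for i in range(3):
            voxel = size[i] / grid
            lo = center[i] - size[i] / 2
            # Clamp so a point exactly on the upper face stays in the last voxel.
            idx.append(min(grid - 1, math.floor((p[i] - lo) / voxel)))
        out.append(tuple(idx))
    return out

points = [(1.5, 0.5, -1.5)]
center = (0.0, 0.0, 0.0)

base = voxel_indices(points, center, (4.0, 4.0, 4.0))    # [(3, 2, 0)]
slight = voxel_indices(points, center, (4.4, 4.4, 4.4))  # 10% larger: [(3, 2, 0)]
large = voxel_indices(points, center, (8.0, 8.0, 8.0))   # 2x larger: [(2, 2, 1)]
```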

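From these failed attempts, the authors conclude: "the key to solving the size problem is to let the R-CNN network know the proposal's size while preserving the object's shape."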

Revisiting the previous solutions, we can conclude that the key is to provide the size information to network, while preserving the shape of the object.

The authors therefore propose two further approaches:

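d) Boundary offset. Once the network knows where the boundaries are, it can easily infer the proposal size. The authors "append to every point its normalized distances to the six boundaries of the proposal (top, bottom, left, right, front, back)".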

To provide the proposal boundary information, a simple way is to append the boundary offset to the point features. From the offset, the network will be able to know how far the points are from the proposal's boundary, which should solve the ambiguity problem.
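A minimal sketch of option (d). The post mentions normalized distances to the six faces, but the exact normalization is not given here, so this version appends raw metric distances; the helper `with_boundary_offsets` and the axis-aligned boxes are assumptions for illustration. The first three features are the unchanged center-relative coordinates, so the object's shape is preserved, while the six offsets change whenever the proposal's extent changes.

```python
# Illustrative sketch of option (d), boundary offsets; not LiDAR R-CNN code.
# Assumes axis-aligned proposals and uses raw (unnormalized) distances.

def with_boundary_offsets(points, center, size):
    """Hypothetical helper: per-point feature = center-relative (x, y, z)
    followed by the distances to the six faces of the proposal."""
    feats = []
    for p in points:
        local = [p[i] - center[i] for i in range(3)]
        offsets = []
        for i in range(3):
            half = size[i] / 2
            offsets.append(half - local[i])  # distance to the +axis face
            offsets.append(local[i] + half)  # distance to the -axis face
        feats.append(tuple(local + offsets))
    return feats

points = [(0.5, 0.25, 0.0)]
center = (0.0, 0.0, 0.0)

tight = with_boundary_offsets(points, center, (2.0, 1.0, 0.5))
loose = with_boundary_offsets(points, center, (4.0, 2.0, 1.0))

print(tight[0][:3] == loose[0][:3])  # True: shape information is unchanged
print(tight[0][3:])  # (0.5, 1.5, 0.25, 0.75, 0.25, 0.25)
print(loose[0][3:])  # (1.5, 2.5, 0.75, 1.25, 0.5, 0.5)
```

Unlike option (a), the coordinates themselves are not rescaled, which is how this design keeps the object's shape while still exposing the proposal size.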

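e) Virtual points. A key difference between 2D RGB images and 3D point clouds is that images consist of dense pixels: a bounding box contains background pixels as well as foreground pixels, so an inaccurate box necessarily contains more background, which the network can easily pick up from the features. 3D point clouds, by contrast, are sparse: apart from the foreground points, the rest of the box is empty space that carries no information, so background cannot be used to judge how good a box is. The authors therefore pad the proposal with virtual points that play the role of the background in an image.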

Since the R-CNN model ignores the spacing, another natural idea is to augment the spacing with "virtual points" to indicate the existence of them. Here we generate the grid points which are evenly distributed in the proposal. Through the virtual points, the RCNN module will have the ability to perceive the size of the proposal since the spacing is no longer underrepresented.
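A minimal sketch of option (e). The grid resolution `n`, the helpers `virtual_grid` and `augment`, and the real/virtual flag in the last feature channel are assumptions made for this illustration; the quoted text only states that the grid points are evenly distributed in the proposal. Because the virtual points spread over the whole box, their extent encodes the proposal size even when the real points are identical.

```python
import itertools

# Illustrative sketch of option (e), virtual points; not LiDAR R-CNN code.
# Assumptions: axis-aligned proposals, an n x n x n grid, and a flag in the
# last channel (1.0 = real LiDAR point, 0.0 = virtual point).

def virtual_grid(center, size, n=3):
    """Hypothetical helper: n**3 points evenly spaced inside the proposal
    (the centers of an n x n x n grid of cells)."""
    axes = [
        [center[i] - size[i] / 2 + (k + 0.5) * size[i] / n for k in range(n)]
        for i in range(3)
    ]
    return list(itertools.product(*axes))

def augment(points, center, size, n=3):
    """Real points followed by virtual points, all center-relative."""
    real = [tuple(p[i] - center[i] for i in range(3)) + (1.0,) for p in points]
    virt = [tuple(v[i] - center[i] for i in range(3)) + (0.0,)
            for v in virtual_grid(center, size, n)]
    return real + virt

points = [(0.5, 0.2, 0.1)]
center = (0.0, 0.0, 0.0)

tight = augment(points, center, (3.0, 3.0, 3.0))
loose = augment(points, center, (6.0, 6.0, 6.0))

# The real point is identical, but the virtual points span the whole box,
# so their extent now reveals the proposal size.
print(tight[0] == loose[0])                     # True
print(max(f[0] for f in tight if f[3] == 0.0))  # 1.0
print(max(f[0] for f in loose if f[3] == 0.0))  # 2.0
```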

Experimental results:

Waymo Open Dataset

As usual, there is not much to comment on here.

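Thoughts after reading:

First, the authors repeatedly point out that single-class detection works fine and that the problem only appears with multiple classes. The likely reason is that objects within a single class tend to have similar sizes, whereas objects of different classes can differ greatly in size.

Second, the core of the proposal size ambiguity is really a boundary problem.

Third, 王峰 (Wang Feng) gives an excellent explanation in "LiDAR R-CNN：一种快速、通用的二阶段3D检测器" (LiDAR R-CNN: a fast and universal two-stage 3D detector, https://zhuanlan.zhihu.com/p/359800738), which is well worth reading.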
