MMW Object Detection Paper Reading Notes | Radar Transformer: An Object Classification Network Based on 4D MMW Imaging Radar
Jie Bai, Lianqing Zheng, Sen Li, Bin Tan, Sihan Chen and Libo Huang
Tongji University
Sensors
Original paper: https://www.mdpi.com/1424-8220/21/11/3854
These are reading notes on the MMW object detection paper Radar Transformer: An Object Classification Network Based on 4D MMW Imaging Radar, originally published on R.X. NLOS's blog.
The notes may contain mistakes; corrections are welcome at 981591477@qq.com.
The content is updated in sync on:
- CSDN blog
- Zhihu
- WeChat public account
Abstract
- Importance of millimeter-wave (MMW) 4D radar
- essential in autonomous vehicles
- because of its robustness in all weather conditions
- but conventional automotive radar has low resolution
- making object classification difficult
- hence 4D imaging radar
- high azimuth and elevation resolution + includes Doppler information
- can produce high-quality 3D point clouds + speed
- Work in this paper
- proposes Radar Transformer
- for object classification on radar point clouds
- core: attention mechanisms
- including vector attention and scalar attention
- to make full use of spatial, Doppler, and point-cloud intensity information, achieving deep fusion
- Experimental results:
- collected a dataset and completed the annotation
- recognition accuracy of 94.9%
- the proposed method is well suited to radar point-cloud recognition tasks
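The scalar vs. vector attention distinction above can be sketched as follows. This is a generic NumPy illustration of the two mechanisms, not the paper's exact formulation; the subtraction relation used in `vector_attention` is one common choice:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scalar_attention(q, k, v):
    # one scalar weight per (query, key) pair, shared by all channels
    w = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N)
    return w @ v                                          # (N, d)

def vector_attention(q, k, v):
    # one weight per (query, key, channel); relation is q_i - k_j
    rel = q[:, None, :] - k[None, :, :]                   # (N, N, d)
    w = softmax(rel, axis=1)                              # normalize over keys
    return (w * v[None, :, :]).sum(axis=1)                # (N, d)

N, d = 4, 8
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, N, d))
print(scalar_attention(q, k, v).shape, vector_attention(q, k, v).shape)
```

Scalar attention weighs whole feature vectors with a single coefficient, while vector attention modulates each channel independently, which is what allows channels carrying spatial, Doppler, and intensity information to be fused with different weights.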
Introduction
p1: Why study object detection on MMW point clouds: autonomous driving matters + the 4D MMW sensor has advantages + existing research is limited
- In recent years, autonomous driving has developed rapidly
- Autonomous vehicles comprise several modules:
- perception (significant)
- (prediction), path planning
- decision and control
- 4D MMW radar is important for perception:
- cameras and LiDAR are not robust to varying weather and lighting
- conventional MMW radar has low resolution and lacks elevation information, so it serves only as a final warning
- 4D MMW radar can form point clouds, contains Doppler information, and is robust to weather, but related algorithms are still in the initial stage
Articles introducing 4D MMW radar [5-7]:
[5]. Brisken, S.; Ruf, F.; Höhne, F. Recent evolution of automotive imaging radar and its information content. IET Radar Sonar Navig. 2018.
[6]. Li, G.; Sit, Y.L.; Manchala, S.; Kettner, T.; Ossowska, A.; Krupinski, K.; Sturm, C.; Lubbert, U. Novel 4D 79 GHz Radar Concept for Object Detection and Active Safety Applications. In Proceedings of the 2019 12th German Microwave Conference (GeMiC), Stuttgart, Germany, 2019.
[7]. Stolz, M.; Wolf, M.; Meinl, F.; Kunert, M.; Menzel, W. A New Antenna Array and Signal Processing Concept for an Automotive 4D Radar. In Proceedings of the 2018 15th European Radar Conference (EuRAD), Spain, 2018.
p2: Related works on 4D imaging radar (hardware + corresponding detection algorithms)
- imaging radar hardware:
- [6]: a 4D radar operating at 79 GHz, using FMCW with a bandwidth of 1.6 GHz
- uses MIMO and BPSK transmitting signals to obtain elevation information
- can be used for simple detection tasks (e.g., road-edge height estimation)
- [7]: used a new antenna array that can measure angles in azimuth and elevation
- combining them to estimate the direction of arrival
- [8]: exploited high-resolution MMW radar to obtain radar point-cloud representations
- then used a GMM for point-cloud segmentation (traffic scenes)
- imaging radar hardware → algorithms
- [9]: used planar phased FMCW radar to produce 3D point clouds for detecting human motions
- obtains the 3D point cloud by calculating the direction of arrival
- CNN for classification: accuracy 80%
- [10]: built a dataset including radar, LiDAR, and camera
- radar: Astyx 6455 HiRes [5], a high-resolution imaging radar
- most of the objects are cars → hard to apply
p3: Related works on point-cloud object detection
- Deep learning has made impressive achievements
- including on data structures like point clouds
- point-cloud properties: permutation and orientation invariance
- conventional CNNs are unsuited to such irregularly structured data
- MVCNN: from different views
- 3DMV: integrates RGB + geometric features
- Voxel-based methods: VoxNet; 3DCNN
- GCN: DGCNN, EdgeConv
- Point-wise networks: PointNet series
- often extract and combine features hierarchically
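The hierarchical feature extraction in the PointNet series typically begins by choosing well-spread centroid points via farthest point sampling; a minimal sketch of that step (my own illustration, not code from any of the cited works):

```python
import numpy as np

def farthest_point_sampling(xyz, k):
    """Greedily pick k points, each as far as possible from those already chosen."""
    chosen = [0]                                   # start from the first point
    dist = np.linalg.norm(xyz - xyz[0], axis=1)    # distance to the chosen set
    for _ in range(k - 1):
        nxt = int(dist.argmax())                   # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)

# two clusters plus a midpoint: FPS covers the whole cloud with few points
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.5]])
print(farthest_point_sampling(pts, 3))  # → [0 3 4]
```

Local features are then extracted around each sampled centroid and combined level by level.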
p4: Related works on Transformers and their applications in point-cloud object detection
- Transformers have dominated NLP:
- BERT, Transformer-XL, BioBERT, etc.
- Transformers have also been extended to CV
- The core of the Transformer is the self-attention module
- this mechanism is well suited to dealing with data like point clouds
- PCT [31] applies the Transformer to point clouds and achieves good results
p5: Introduces the network proposed in this paper
- Transformer architecture
- uses attention mechanisms to fuse local and global features at multiple levels
- performs object classification on MMW radar point clouds
- achieved the highest accuracy
- Generated an MMW imaging radar classification dataset
- collected dynamic and static road participants
- persons, cyclists, motorcyclists, cars and buses
- manually annotated them
- A total of 10,000 frames of data
- each data containing spatial (XYZ) and Doppler velocity (V) information
- Proposed a radar point-cloud classification network based on the Transformer
- input is 5-dimensional point-cloud information → deep features are obtained after embedding, hierarchical feature extraction, multilevel deep fusion, and scalar + vector attention
- Experiments show that the proposed network exhibits SOTA performance
p7: Organization of the remainder
- Section 2: Describes the network
- Section 3: Experiments
- Section 4: Discussion
- Section 5: Conclusion
Methodology
- The network architecture
- I did not go through the detailed network structure this time
- the overall architecture is roughly as shown in the figure
- input: radar point cloud
- output: classification results
- Note that these are classification results, not detection results
Results
Dataset
- Existing public datasets containing radar information:
- either include only 2D radar data
- nuScenes
- CRUW
- Oxford Radar RobotCar Dataset
- or are of low quality (too few frames + unbalanced classes)
- Astyx Dataset
- only 500 frames, most of which are cars
- therefore this paper collected and created its own dataset:
- contains 10,000 frames
- five classes
- persons, cyclists, motorcyclists, cars, buses
Acquisition setup
- Radar: TI imaging radar TIDEP01012
- composed of four AWR2243 cascaded radar boards
- a MIMO antenna built across the cascaded AWR2243 chips maximizes the number of active antennas
- enabling substantially improved angular resolution
Photos and radar parameters are shown below
Radar Signal Processing
- The processing flowchart is shown below
- The imaging radar development board was designed in a cascade of four devices.
The radar signal processing pipeline consists of:
Step 0: Preprocessing (not shown in the flowchart above)
- Antenna calibration
- prevents mismatches in frequency, phase, and amplitude between the master device and the three slave devices caused by differences in chip and antenna coupling
- calibration method:
- a one-time boresight calibration using TI's official calibration matrix
- chirp configuration parameters
- set to those of MIMO mode
Step 1: Read and parse the ADC data
Step 2: Perform frequency and phase calibration
Step 3: Apply the range FFT and Doppler FFT to the calibrated data
Step 4: Perform non-coherent integration
- since there are multiple channels
Step 5: Run the constant false-alarm rate (CFAR) algorithm
- to filter out noise and interference
Step 6: Perform maximum-velocity extension and phase compensation
Step 7: Estimate azimuth and elevation angles
- finally obtaining the point cloud
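Steps 3-5 (range/Doppler FFTs, non-coherent integration, CFAR) can be sketched on toy data as follows. This is a generic 1D cell-averaging CFAR, not TI's implementation; all sizes and thresholds are illustrative:

```python
import numpy as np

def range_doppler_map(adc):
    """Range FFT over samples (axis 1), then Doppler FFT over chirps (axis 0)."""
    rfft = np.fft.fft(adc, axis=1)
    dfft = np.fft.fftshift(np.fft.fft(rfft, axis=0), axes=0)
    return np.abs(dfft) ** 2                       # power map (chirps, samples)

def ca_cfar_1d(power, guard=2, train=8, scale=4.0):
    """Cell-averaging CFAR: compare each cell to its local noise estimate."""
    det = np.zeros(len(power), dtype=bool)
    for i in range(train + guard, len(power) - train - guard):
        left = power[i - train - guard:i - guard]
        right = power[i + guard + 1:i + guard + 1 + train]
        det[i] = power[i] > scale * np.concatenate([left, right]).mean()
    return det

# toy frame: complex noise plus one target at range bin 30, Doppler bin 10
chirps, samples = 64, 128
rng = np.random.default_rng(0)
adc = 0.1 * (rng.standard_normal((chirps, samples))
             + 1j * rng.standard_normal((chirps, samples)))
t = np.arange(samples)
c = np.arange(chirps)[:, None]
adc += np.exp(2j * np.pi * 30 * t / samples) * np.exp(2j * np.pi * 10 * c / chirps)
rd = range_doppler_map(adc)
profile = rd.sum(axis=0)                  # non-coherent integration over Doppler
print(np.flatnonzero(ca_cfar_1d(profile)))  # range bins flagged by CFAR
```

The real pipeline runs CFAR on the 2D range-Doppler map across channels and then estimates angles for each detection; the 1D version above only shows the thresholding idea.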
Data Acquisition and Production
The collected data include static scenes and dynamic scenes
- Static scenes:
- collected data at a distance interval of 1 m and an angle interval of 45°
- to fully represent the distribution of the object point cloud
- collected different types of samples for each class of objects
- to make the object classes more representative
- Dynamic scenes:
- collected them on campus roads and experimental sites
- different objects moved at different speeds and angles
Per-frame format and coordinate transformation
- The format of the reflected points:
- $p_{i}=\left\{r_{i}, \theta_{i}, \varphi_{i}, v_{i}, s_{i}\right\}$
- $r_{i}$: range; $\theta_{i}$: azimuth angle; $\varphi_{i}$: elevation angle; $v_{i}$: radial velocity; $s_{i}$: signal-to-noise ratio
- Coordinate transformation for subsequent analysis, visualization, and labeling:
- from the spherical coordinate system to the Cartesian coordinate system
$$\left[\begin{array}{c}x_{i} \\ y_{i} \\ z_{i}\end{array}\right]=r_{i}\left[\begin{array}{c}\cos \left(\theta_{i}\right) \cos \left(\varphi_{i}\right) \\ \sin \left(\theta_{i}\right) \cos \left(\varphi_{i}\right) \\ \sin \left(\varphi_{i}\right)\end{array}\right]$$
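The conversion in code form (a direct transcription of the equation; angles in radians):

```python
import numpy as np

def spherical_to_cartesian(r, azimuth, elevation):
    """Map (range, azimuth, elevation) to (x, y, z) per the equation above."""
    x = r * np.cos(azimuth) * np.cos(elevation)
    y = r * np.sin(azimuth) * np.cos(elevation)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)

# a point 10 m straight ahead at zero elevation lands at (10, 0, 0)
print(spherical_to_cartesian(10.0, 0.0, 0.0))  # → [10.  0.  0.]
```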
Data annotation method
- 1: cluster the obtained point cloud of each frame
- to get an approximate 3D bounding box
- 2: label it using the information recorded by the camera
- 3: the final dataset contains 10,000 frames (static + dynamic)
- Visualization of some experimental data
Experimental Details
Details of dataset processing (split + normalization)
- A total of 10,000 frames of data
- including 5 classes
- classes are proportionally balanced
- each class had 2000 frames of data
- Train : Test = 7:3
- The information in each point:
- XYZ spatial information + Doppler velocity V + intensity information S
- Normalization:
- for each point $p_{i}=\left\{x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right\}$ in one frame
- norm:
$$\left(x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right)=\frac{\left(x_{i}, y_{i}, z_{i}, v_{i}, s_{i}\right)}{\max\limits_{i}\left(\sqrt{x_{i}^{2}+y_{i}^{2}+z_{i}^{2}+v_{i}^{2}+s_{i}^{2}}\right)}, \quad i=1,2, \ldots, N$$
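The frame-wise normalization in code (a direct transcription of the formula; `points` is the N×5 array of one frame):

```python
import numpy as np

def normalize_frame(points):
    """Divide every 5D point (x, y, z, v, s) of a frame by the largest
    5D vector norm in that frame, as in the formula above."""
    norms = np.linalg.norm(points, axis=1)  # per-point norm sqrt(x²+y²+z²+v²+s²)
    return points / norms.max()

frame = np.array([[3.0, 4.0, 0.0, 0.0, 0.0],   # norm 5
                  [0.0, 0.0, 1.0, 2.0, 2.0]])  # norm 3
normed = normalize_frame(frame)
print(np.linalg.norm(normed, axis=1))  # → [1.  0.6]
```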
Details of network training
- 128 input points
- pad with zeros if there are too few points, sample if there are too many
- using PyTorch, SGD optimizer with momentum + weight decay
- learning rate: 0.001
- decayed by 30% every 20 epochs
- loss function:
- softmax cross-entropy
- training:
- with data augmentation
- no data augmentation at test time
- 200 epochs, batch size = 24; one 1080 Ti
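A sketch of the input-size handling and learning-rate schedule described above (function names are my own, and the uniform random subsampling is an assumption — the source does not specify how excess points are sampled):

```python
import numpy as np

def fix_num_points(points, target=128, rng=None):
    """Zero-pad frames with fewer than `target` points; subsample larger ones.
    The subsampling strategy (uniform, without replacement) is an assumption."""
    n, dim = points.shape
    if n < target:
        return np.concatenate([points, np.zeros((target - n, dim))], axis=0)
    rng = rng or np.random.default_rng(0)
    return points[rng.choice(n, size=target, replace=False)]

def lr_at_epoch(epoch, base_lr=1e-3, decay=0.7, step=20):
    """Learning rate 0.001, decayed by 30% every 20 epochs."""
    return base_lr * decay ** (epoch // step)

print(fix_num_points(np.ones((50, 5))).shape)   # → (128, 5)
print(fix_num_points(np.ones((200, 5))).shape)  # → (128, 5)
print(round(lr_at_epoch(40), 6))                # → 0.00049
```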
Experimental Results