论文地址：https://www.mdpi.com/1424-8220/18/10/3337
代码地址：https://github.com/traveller59/second.pytorch
OpenPCDdet：https://github.com/open-mmlab/OpenPCDet

简介

大多数现有的 3D 对象检测方法将点云数据转换为 2D 表示，例如 BEV 和前视图表示，因此丢失了原始点云中包含的大部分空间信息。在本文中介绍了一种新的角度损失回归方法，成功地将稀疏卷积应用于基于激光雷达的网络中，并提出了一种充分利用点云优势的数据增强新方法。在 KITTI 数据集上的实验表明，所提出的网络优于其他最先进的方法。

主要贡献如下：

在基于 LiDAR 的目标检测中应用稀疏卷积，从而大大提高了训练和推理的速度。
提出了一种改进的稀疏卷积方法，使其运行得更快。
提出了一种新颖的角度损失回归方法，该方法展示了比其他方法更好的方向回归性能。
为仅限 LiDAR 的学习问题引入了一种新的数据增强方法，大大提高了收敛速度和性能。

网络结构

SECOND 检测器的结构。检测器将原始点云作为输入，将其转换为体素特征和坐标，并应用两个 VFE（体素特征编码）层和一个线性层。然后应用稀疏 CNN。最后，RPN 生成检测。

SECOND与VoxelNet网络结构异同

注：VoxelNet中的点云特征提取VFE模块在作者最新的实现中已经被替换；因为原来的VFE操作速度太慢，并且对显存不友好。具体可以查看这个issue：

点云分组

将点云变成Voxel的方法和VoxelNet中一样。首先根据指定的体素数量限制预先分配缓冲区；然后，遍历点云并将这些点分配给它们相关的体素，并保存体素坐标和每个体素的点数，在迭代过程中根据哈希表检查体素的存在，如果与某个点相关的体素还不存在，在哈希表中设置相应的值；否则，将体素的数量加一，一旦体素的数量达到指定的限制，迭代过程将停止。

经过对点云数据进行Grouping操作后得到三份数据：

得到所有的voxel shape为(N, 5 , 4) ; 5为每个voxel最大的点数，4为每个point的数据（x，y，z，reflect intensity）
得到每个voxel的位置坐标 shape（N， 3）
得到每个voxel中有多少个非空点 shape （N）

对于汽车和其他物体， zyxz y xzyx轴点云的范围为[0, -40, -3, 70.4, 40, 1]

对于行人和骑车人检测，zyxz y xzyx轴点云的范围为[0, -20, -3,70.4, 20, 1]

对于我们较小的模型，zyxz y xzyx轴点云的范围为[0, -32, -3,52.8, 32, 1]

裁剪后的区域需要根据体素大小进行微调，以确保生成的特征图的大小可以在后续网络中正确下采样。体素大小vD=0.4×vH=0.2×vW=0.2v_D = 0.4 × v_H = 0.2 × v_W= 0.2vD=0.4×vH=0.2×vW=0.2 。汽车检测的每个voxel的最大点数设置为T = 35，行人和骑自行车的人设置为 T = 45，因为行人和骑自行车的人相对较小，体素特征提取需要更多的点。

    def transform_points_to_voxels(self, data_dict=None, config=None):"""将点云转换为voxel,调用spconv的VoxelGeneratorV2"""if data_dict is None:grid_size = (self.point_cloud_range[3:6] - self.point_cloud_range[0:3]) / np.array(config.VOXEL_SIZE)self.grid_size = np.round(grid_size).astype(np.int64)self.voxel_size = config.VOXEL_SIZE# just bind the config, we will create the VoxelGeneratorWrapper later,# to avoid pickling issues in multiprocess spawnreturn partial(self.transform_points_to_voxels, config=config)if self.voxel_generator is None:self.voxel_generator = VoxelGeneratorWrapper(# 给定每个voxel的长宽高  [0.05, 0.05, 0.1]vsize_xyz=config.VOXEL_SIZE,  # [0.16, 0.16, 4]# 给定点云的范围 [  0.  -40.   -3.   70.4  40.    1. ]coors_range_xyz=self.point_cloud_range,# 给定每个点云的特征维度，这里是x，y，z，r 其中r是激光雷达反射强度num_point_features=self.num_point_features,# 给定每个pillar中有采样多少个点，不够则补0max_num_points_per_voxel=config.MAX_POINTS_PER_VOXEL,  # 32# 最多选取多少个voxel，训练16000，推理40000max_num_voxels=config.MAX_NUMBER_OF_VOXELS[self.mode],  # 16000)# 使用spconv生成voxel输出points = data_dict['points']voxel_output = self.voxel_generator.generate(points)# 假设一份点云数据是N*4，那么经过pillar生成后会得到三份数据# voxels代表了每个生成的voxel数据，维度是[M, 5, 4]# coordinates代表了每个生成的voxel所在的zyx轴坐标，维度是[M,3]# num_points代表了每个生成的voxel中有多少个有效的点维度是[m,]，因为不满5会被0填充voxels, coordinates, num_points = voxel_output# Falseif not data_dict['use_lead_xyz']:voxels = voxels[..., 3:]  # remove xyz in voxels(N, 3)data_dict['voxels'] = voxelsdata_dict['voxel_coords'] = coordinatesdata_dict['voxel_num_points'] = num_pointsreturn data_dict

Voxelwise Feature Extractor

论文中使用Voxelwise Feature Extractor（VFE）层来提取体素特征，VFE模块是和VoxelNet中一样的。

将将同一体素中的所有点作为输入，通过全连接网络 FCN （每个全连接层由linear layer、batch normalization（BN）和linear unit（ReLU）组成。）映射到一个高维特征空间得到fi∈Rmf_i \in {\mathbb R}_mfi∈Rm ，在获得逐点point-wise特征后，然后对 fif_ifi 使用element-wise MaxPooling来获得voxel的局部聚合特征 f~∈Rm\tilde f \in {\mathbb R}_mf~∈Rm，最后，将局部聚合特征f~\tilde ff~和point-wise特征 fif_ifi 拼接得到输出特征集 Vout={fiout=[fiT,f~T]T∈R2m}i...tV_{out} =\{ f_i^{out }= [f_i^T , \tilde f^T ]^T \in {\mathbb R}^{2m}\} _{i...t}Vout={fiout=[fiT,f~T]T∈R2m}i...t 。体素特征提取器由2个 VFE 层和一个 FCN 层组成。

原来的VFE操作速度太慢，并且对显存不友好，在新的实现中，去掉了原来Stacked Voxel Feature Encoding，直接计算每个voxel内点的平均值，当成这个voxel的特征；大幅提高了计算的速度，并且也取得了不错的检测效果。得到voxel特征的维度变换为(Batch*16000, 5, 4) --> (Batch*16000, 4)

OpenPcdet里实现代码：pcdet/models/backbones_3d/vfe/mean_vfe.py

class MeanVFE(VFETemplate):def __init__(self, model_cfg, num_point_features, **kwargs):super().__init__(model_cfg=model_cfg)# 每个点多少个特征(x，y，z，r)self.num_point_features = num_point_featuresdef get_output_feature_dim(self):return self.num_point_featuresdef forward(self, batch_dict, **kwargs):"""Args:batch_dict:voxels: (num_voxels, max_points_per_voxel, C)voxel_num_points: optional (num_voxels) how many points in a voxel**kwargs:Returns:vfe_features: (num_voxels, C)"""# here use the mean_vfe module to substitute for the original pointnet extractor architecturevoxel_features, voxel_num_points = batch_dict['voxels'], batch_dict['voxel_num_points']# 求每个voxel内 所有点的和# eg:SECOND  shape (Batch*16000, 5, 4) -> (Batch*16000, 4)points_mean = voxel_features[:, :, :].sum(dim=1, keepdim=False)# 正则化项， 保证每个voxel中最少有一个点，防止除0normalizer = torch.clamp_min(voxel_num_points.view(-1, 1), min=1.0).type_as(voxel_features)# 求每个voxel内点坐标的平均值points_mean = points_mean / normalizer# 将处理好的voxel_feature信息重新加入batch_dict中batch_dict['voxel_features'] = points_mean.contiguous()return batch_dict

稀疏卷积中间提取器

稀疏卷积网络回顾

参考。 [25] 是第一篇介绍空间稀疏卷积的论文。在这种方法中，如果没有相关的输入点，则不计算输出点。这种方法在基于 LiDAR 的检测中提供了计算优势，因为 KITTI 中点云的分组步骤将生成 5k-8k 体素，稀疏度接近 0.005。作为正常稀疏卷积的替代方案，submanifold convolution当且仅当相应的输入位置处于活动状态时才限制输出位置处于活动状态。这避免了生成过多的活动位置，这可能导致由于活动点数量过多而导致后续卷积层的速度下降。

稀疏卷积算法

参考：https://zhuanlan.zhihu.com/p/97367489
让我们首先考虑二维密集卷积算法。我们使用 Wu,v,l,mW_{u,v,l,m}Wu,v,l,m 表示过滤后的元素，使用 Du,v,lD_{u,v,l}Du,v,l表示图像元素，其中 uuu 和 vvv 是空间位置索引，lll 表示输入通道，mmm表示输出通道。给定输出位置，函数 P(x,y)P ( x, y )P(x,y)生成需要计算的输入位置。因此，Yx,y,mY_{x,y,m}Yx,y,m 的卷积输出由以下公式给出：

Yx,y,m=∑u,v∈P(x,y)∑Wu−u0,v−v0,l,mDu,v,l(1)Y_{x,y,m}=\sum_{u,v \in P(x,y)} \sum W_{u-u_0,v-v_0,l,m} D_{u,v,l} \tag{1}Yx,y,m=u,v∈P(x,y)∑∑Wu−u0,v−v0,l,mDu,v,l(1)

其中：x,yx,yx,y是是输出空间索引，u−u0u - u_0u−u0 和 v−v0v - v_0v−v0 表示 uuu 和 vvv 坐标kernel-offset。基于通用矩阵乘法 (GEMM) 的算法（也称为基于 im2col 的算法可用于获取构建矩阵 D~P(x,y),l\tilde D_{P(x,y),l}D~P(x,y),l所需的所有数据，然后执行 GEMM 本身：

Yx,y,m=∑l∑W∗,l,mD~P(x,y),l(2)Y_{x,y,m}=\sum_{l} \sum W_{*,l,m} \tilde D_{P(x,y),l} \tag{2}Yx,y,m=l∑∑W∗,l,mD~P(x,y),l(2)

其中 $W_{∗,l,m} 对应于 Wu−u0,v−v0,l,mW_{u − u_0 ,v − v_0 ,l,m}Wu−u0,v−v0,l,m 但采用 GEMM 形式。对于稀疏数据 Di,l′D_{i,l}^{'}Di,l′和关联的输出 Yj,m′Y_{j,m}^{'}Yj,m′ ，直接计算算法可以写成：

Yj,m′=∑i∈P′(j)∑Wk,l,mDi,l′(3)Y_{j,m}^{'}=\sum_{i \in P^{'}(j)} \sum W_{k,l,m} D_{i,l}^{'} \tag{3}Yj,m′=i∈P′(j)∑∑Wk,l,mDi,l′(3)

其中：P′(j)P^{'}(j)P′(j)是获取输入索引iii和滤波器偏移量的函数，下标 k 是1D kernel offset，对应方程（1）中的 u−u0u - u_0u−u0 和 v−v0v - v_0v−v0，下标iii 对应方程（1）中的 uuu 和 vvv 。等式 (3) 的基于 GEMM 的版本由以下公式给出：

Yj,m′=∑l∑W∗,l,mD~P′(j),l′(4)Y_{j,m}^{'}=\sum_{l} \sum W_{*,l,m} \tilde D_{P^{'}(j),l}^{'} \tag{4}Yj,m′=l∑∑W∗,l,mD~P′(j),l′(4)

D~P′(j),l′\tilde D_{P^{'}(j),l}^{'}D~P′(j),l′稀疏数据的聚集矩阵仍然包含许多不需要零计算。为了解决这个问题，我们不直接将式（3）转化为式（4）如下：

Yj,m′=∑k∑lWk,l,mD~Rk,j,k,l′(5)Y_{j,m}^{'}=\sum_{k} \sum _{l}W_{k,l,m} \tilde D_{R_{k,j},k,l}^{'} \tag{5}Yj,m′=k∑l∑Wk,l,mD~Rk,j,k,l′(5)

其中 Rk,jR_{k,j}Rk,j ，也称为 Rule，是一个矩阵，它指定给定kernel offset k 和输出索引 j 的输入索引 i。等式（5）中的inner sum无法通过 GEMM 计算，因此我们需要获取必要的输入来构造矩阵，执行 GEMM，然后将数据分散回去。在实践中，我们可以通过使用预先构建的输入-输出索引规则矩阵直接从原始稀疏数据中获取数据。这增加了速度。具体来说，我们构造了一个规则矩阵表 Rk,i,t=R[k,i,t]R_{k,i,t}= R[ k, i, t ]Rk,i,t=R[k,i,t]，维度为 K×Nin×2K × N_{in} × 2K×Nin×2，其中 K 是内核大小（表示为体积），NinN_{in}Nin是输入特征的数量，ttt是输入/输出索引。元素 R[:,:,0]R [ :, :, 0 ]R[:,:,0]存储用于收集的输入索引，元素 R[:,:,1]R [ :, :, 1 ]R[:,:,1] 存储用于散射的输出索引。图 2 的顶部显示了我们提出的算法。

图2 稀疏卷积算法如上图，GPU规则生成算法如下图。NinN_{in}Nin表示输入特征的数量，NoutN_{out}Nout表示输出特征的数量。 N 是收集的特征的数量。 Rule是规则矩阵，其中Rule[i,:,:]是卷积核中第i个核矩阵对应的第i个规则。除白色外的颜色框表示数据稀疏的点，白色框表示空点。

Rule Generation Algorithm

当前实现面临的主要性能挑战与规则生成算法有关。通常使用使用哈希表的基于 CPU 的规则生成算法，但这种算法速度较慢，并且需要在 CPU 和 GPU 之间传输数据。更直接的规则生成方法是遍历输入点以找到与每个输入点相关的输出，并将相应的索引存储到规则中。在迭代过程中，需要一个表来检查每个输出位置的存在，以决定是否使用全局输出索引计数器来累积数据。这是阻碍算法使用并行计算的最大挑战。

作者设计了一种基于 GPU 的规则生成算法（算法 1），它在 GPU 上运行得更快。图 1 的底部显示了我们提出的算法。

首先，收集输入索引和相关的空间索引而不是输出索引（算法 1 中的第一个循环）。在此阶段获得重复的输出位置。
然后，对空间索引数据执行独特的并行算法，以获得输出索引及其关联的空间索引。从先前的结果生成与稀疏数据具有相同空间维度的缓冲区，用于下一步中的查表（算法 1 中的第二个循环）。
最后，迭代规则并使用存储的空间索引来获得每个输入索引的输出索引（算法 1 中的第三个循环）。

表 1 显示了我们的实现与现有方法之间的性能比较。

SparseConvNet 是子流形卷积的官方实现

代码在：pcdet/models/backbones_3d/spconv_backbone.py

class VoxelBackBone8x(nn.Module):def __init__(self, model_cfg, input_channels, grid_size, **kwargs):super().__init__()self.model_cfg = model_cfgnorm_fn = partial(nn.BatchNorm1d, eps=1e-3, momentum=0.01)self.sparse_shape = grid_size[::-1] + [1, 0, 0]self.conv_input = spconv.SparseSequential(spconv.SubMConv3d(input_channels, 16, 3, padding=1, bias=False, indice_key='subm1'),norm_fn(16),nn.ReLU(),)block = post_act_blockself.conv1 = spconv.SparseSequential(block(16, 16, 3, norm_fn=norm_fn, padding=1, indice_key='subm1'),)self.conv2 = spconv.SparseSequential(# [1600, 1408, 41] <- [800, 704, 21]block(16, 32, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),block(32, 32, 3, norm_fn=norm_fn, padding=1, indice_key='subm2'),)self.conv3 = spconv.SparseSequential(# [800, 704, 21] <- [400, 352, 11]block(32, 64, 3, norm_fn=norm_fn, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm3'),)self.conv4 = spconv.SparseSequential(# [400, 352, 11] <- [200, 176, 5]block(64, 64, 3, norm_fn=norm_fn, stride=2, padding=(0, 1, 1), indice_key='spconv4', conv_type='spconv'),block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),block(64, 64, 3, norm_fn=norm_fn, padding=1, indice_key='subm4'),)last_pad = 0last_pad = self.model_cfg.get('last_pad', last_pad)self.conv_out = spconv.SparseSequential(# [200, 150, 5] -> [200, 150, 2]spconv.SparseConv3d(64, 128, (3, 1, 1), stride=(2, 1, 1), padding=last_pad,bias=False, indice_key='spconv_down2'),norm_fn(128),nn.ReLU(),)self.num_point_features = 128self.backbone_channels = {'x_conv1': 16,'x_conv2': 32,'x_conv3': 64,'x_conv4': 64}def forward(self, batch_dict):"""Args:batch_dict:batch_size: intvfe_features: (num_voxels, C)voxel_coords: (num_voxels, 4), [batch_idx, z_idx, y_idx, x_idx]Returns:batch_dict:encoded_spconv_tensor: sparse tensor"""# voxel_features, voxel_coords  shape (Batch * 16000, 4)voxel_features, voxel_coords = batch_dict['voxel_features'], batch_dict['voxel_coords']batch_size = batch_dict['batch_size']# 根据voxel坐标，并将每个voxel放置voxel_coor对应的位置，建立成稀疏tensorinput_sp_tensor = spconv.SparseConvTensor(# (Batch * 16000, 4)features=voxel_features,# (Batch * 16000, 4) 其中4为 batch_idx, x, y, zindices=voxel_coords.int(),# [41,1600,1408] ZYX 每个voxel的长宽高为0.05，0.05，0.1 点云的范围为[0, -40, -3, 70.4, 40, 1]spatial_shape=self.sparse_shape,# 4batch_size=batch_size)"""稀疏卷积的计算中，feature，channel，shape，index这几个内容都是分开存放的，在后面用out.dense才把这三个内容组合到一起了，变为密集型的张量spconv卷积的输入也是一样，输入和输出更像是一个  字典或者说元组注意卷积中pad与no_pad的区别"""# # 进行submanifold convolution# [batch_size, 4, [41, 1600, 1408]] --> [batch_size, 16, [41, 1600, 1408]]x = self.conv_input(input_sp_tensor)# [batch_size, 16, [41, 1600, 1408]] --> [batch_size, 16, [41, 1600, 1408]]x_conv1 = self.conv1(x)# [batch_size, 16, [41, 1600, 1408]] --> [batch_size, 32, [21, 800, 704]]x_conv2 = self.conv2(x_conv1)# [batch_size, 32, [21, 800, 704]] --> [batch_size, 64, [11, 400, 352]]x_conv3 = self.conv3(x_conv2)# [batch_size, 64, [11, 400, 352]] --> [batch_size, 64, [5, 200, 176]]x_conv4 = self.conv4(x_conv3)# for detection head# [200, 176, 5] -> [200, 176, 2]# [batch_size, 64, [5, 200, 176]] --> [batch_size, 128, [2, 200, 176]]out = self.conv_out(x_conv4)batch_dict.update({'encoded_spconv_tensor': out,'encoded_spconv_tensor_stride': 8})batch_dict.update({'multi_scale_3d_features': {'x_conv1': x_conv1,'x_conv2': x_conv2,'x_conv3': x_conv3,'x_conv4': x_conv4,}})batch_dict.update({'multi_scale_3d_strides': {'x_conv1': 1,'x_conv2': 2,'x_conv3': 4,'x_conv4': 8,}})return batch_dict

其中block为稀疏卷积构建：

def post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,conv_type='subm', norm_fn=None):# 后处理执行块，根据conv_type选择对应的卷积操作并和norm与激活函数封装为块if conv_type == 'subm':conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)elif conv_type == 'spconv':conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,bias=False, indice_key=indice_key)elif conv_type == 'inverseconv':conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)else:raise NotImplementedErrorm = spconv.SparseSequential(conv,norm_fn(out_channels),nn.ReLU(),)return m

稀疏卷积中间提取器

中间提取器用于学习有关 z 轴的信息并将稀疏的 3D 数据转换为 2D BEV 图像。图 3 显示了中间提取器的结构。它由稀疏卷积的两个阶段组成。每个阶段包含几个子流形卷积层和一个正常稀疏卷积，以在 z 轴上执行下采样。在 z 维被下采样到一维或二维后，稀疏数据被转换为密集特征图。然后，将数据简单地重新整形为类似图像的 2D 数据。

稀疏中间特征提取器的结构。黄色框代表稀疏卷积，白色框代表子流形卷积，红色框代表稀疏到密集层。图的上半部分显示了稀疏数据的空间维度。

由于前面VoxelBackBone8x得到的tensor是稀疏的，数据为：[batch_size, 128, [2, 200, 176]]

这里需要将原来的稀疏数据转换为密集数据；同时将得到的密集数据在Z轴方向上进行堆叠，因为在KITTI数据集中，没有物体会在Z轴上重合；同时这样做的好处有：

简化了网络检测头的设计难度
增加了高度方向上的感受野
加快了网络的训练、推理速度

最终得到的BEV特征图为：(batch_size, 128*2, 200, 176)

代码在pcdet/models/backbones_2d/map_to_bev/height_compression.py

# 在高度方向上进行压缩
class HeightCompression(nn.Module):def __init__(self, model_cfg, **kwargs):super().__init__()self.model_cfg = model_cfg# 高度的特征数self.num_bev_features = self.model_cfg.NUM_BEV_FEATURESdef forward(self, batch_dict):"""Args:batch_dict:encoded_spconv_tensor: sparse tensorReturns:batch_dict:spatial_features:"""# 得到VoxelBackBone8x的输出特征encoded_spconv_tensor = batch_dict['encoded_spconv_tensor']# 将稀疏的tensor转化为密集tensor,[bacth_size, 128, 2, 200, 176]# 结合batch，spatial_shape、indice和feature将特征还原到密集tensor中对应位置spatial_features = encoded_spconv_tensor.dense()# batch_size，128，2，200，176N, C, D, H, W = spatial_features.shape"""将密集的3D tensor reshape为2D鸟瞰图特征    将两个深度方向内的voxel特征拼接成一个 shape : (batch_size, 256, 200, 176)z轴方向上没有物体会堆叠在一起，这样做可以增大Z轴的感受野，同时加快网络的速度，减小后期检测头的设计难度"""spatial_features = spatial_features.view(N, C * D, H, W)# 将特征和采样尺度加入batch_dictbatch_dict['spatial_features'] = spatial_features# 特征图的下采样倍数 8倍batch_dict['spatial_features_stride'] = batch_dict['encoded_spconv_tensor_stride']return batch_dict

RPN

使用类似于 (SSD) 架构来构建 RPN 架构。 RPN 的输入包括来自稀疏卷积中间提取器的特征图。 RPN 架构由三个阶段组成。每个阶段都从一个下采样的卷积层开始，然后是几个卷积层。在每个卷积层之后，应用 BatchNorm 和 ReLU 层。然后，我们将每个阶段的输出上采样为相同大小的特征图，并将这些特征图连接成一个特征图。最后，应用三个 1×1 卷积来预测类别、回归偏移和方向。

SECOND中存在两个下采样分支结构，则对应存在两个反卷积结构：经过HeightCompression得到的BEV特征图是：(batch_size, 128*2, 200, 176)

下采样分支一：(batch_size, 128*2, 200, 176) --> (batch,128, 200, 176)，对应反卷积分支一：(batch, 128, 200, 176) --> (batch, 256, 200, 176)
下采样分支二：(batch_size, 128*2, 200, 176) --> (batch,128, 200, 176)，对应反卷积分支二：(batch, 256, 100, 88) --> (batch, 256, 200, 176)

最终将结构在通道维度上进行拼接的特征图维度：(batch, 256 * 2, 200, 176)

代码在：pcdet/models/backbones_2d/base_bev_backbone.py

    def forward(self, data_dict):"""Args:data_dict:spatial_features : (4, 64, 496, 432)Returns:"""spatial_features = data_dict['spatial_features']ups = []ret_dict = {}x = spatial_features# 对不同的分支部分分别进行conv和deconv的操作for i in range(len(self.blocks)):"""SECOND中一共存在两个下采样分支，分支一: (batch,128,200,176)分支二: (batch,256,100,88)"""x = self.blocks[i](x)stride = int(spatial_features.shape[2] / x.shape[2])ret_dict['spatial_features_%dx' % stride] = x# 如果存在deconv，则对经过conv的结果进行反卷积操作"""SECOND中存在两个下采样，则分别对两个下采样分支进行反卷积操作分支一: (batch,128,200,176)-->(batch,256,200,176)分支二: (batch,256,100,88)-->(batch,256,200,176)"""if len(self.deblocks) > 0:ups.append(self.deblocks[i](x))else:ups.append(x)# 将上采样结果在通道维度拼接if len(ups) > 1:"""最终经过所有上采样层得到的2个尺度的的信息每个尺度的 shape 都是 (batch,256,200,176)在第一个维度上进行拼接得到x  维度是 (batch,512,200,176)"""x = torch.cat(ups, dim=1)elif len(ups) == 1:x = ups[0]# Fasleif len(self.deblocks) > len(self.blocks):x = self.deblocks[-1](x)# 将结果存储在spatial_features_2d中并返回data_dict['spatial_features_2d'] = xreturn data_dict

Anchors and Targets

每个anchor都被分配了一个分类目标的one-hot向量、一个7-vector的框回归目标和一个方向分类目标的one-hot向量。不同的类别有不同的匹配和不匹配阈值。对于汽车，锚点使用 0.6 的交并联合 (IoU) 阈值分配给ground-truth对象，如果它们的 IoU 小于 0.45，则将锚点分配给背景（负值）。 IoU 介于 0.45 和 0.6 之间的锚点在训练期间会被忽略。

anchor生成

使用固定大小的anchor，anchor基于 KITTI 训练集中所有ground truths的大小和中心位置的均值确定，每个类别的anchor都有两个方向角为0度和90度。。

对于汽车，anchor尺寸为 [ 1.6，3.9，1.5]的锚点，以 z = − 1.0 m 为中心
对于行人，anchor尺寸为 [0.6，0.8，1.73] 的锚点，以 z = − 0.6 m 为中心
对于骑自行车的人，anchor尺寸为 [ 0.6，1.76，1.73]，以 z = − 0.6 m 为中心

pcdet/models/dense_heads/target_assigner/anchor_generator.py

class AnchorGenerator(object):def __init__(self, anchor_range, anchor_generator_config):super().__init__()self.anchor_generator_cfg = anchor_generator_configself.anchor_range = anchor_range # [0, -39.68, -3, 69.12, 39.68, 1]# car:[[3.9, 1.6, 1.56]] ，Pedestrian:[[0.8, 0.6, 1.73]]，Cyclist:[[1.76, 0.6, 1.73]]self.anchor_sizes = [config['anchor_sizes'] for config in anchor_generator_config]# [0, 1.57],[0, 1.57],[0, 1.57]self.anchor_rotations = [config['anchor_rotations'] for config in anchor_generator_config]# [-0.6],[-0.6],[-0.6]self.anchor_heights = [config['anchor_bottom_heights'] for config in anchor_generator_config]# False,False,Falseself.align_center = [config.get('align_center', False) for config in anchor_generator_config]assert len(self.anchor_sizes) == len(self.anchor_rotations) == len(self.anchor_heights)self.num_of_anchor_sets = len(self.anchor_sizes) # 3def generate_anchors(self, grid_sizes):assert len(grid_sizes) == self.num_of_anchor_sets# 1.初始化all_anchors = []num_anchors_per_location = []# 2.三个类别的anchors逐个生成for grid_size, anchor_size, anchor_rotation, anchor_height, align_center in zip(grid_sizes, self.anchor_sizes, self.anchor_rotations, self.anchor_heights, self.align_center):# 2 = 2x1x1 --> 每个位置产生2个anchornum_anchors_per_location.append(len(anchor_rotation) * len(anchor_size) * len(anchor_height))if align_center:x_stride = (self.anchor_range[3] - self.anchor_range[0]) / grid_size[0]y_stride = (self.anchor_range[4] - self.anchor_range[1]) / grid_size[1]x_offset, y_offset = x_stride / 2, y_stride / 2 # 中心对齐，平移半个网格else:# 2.1计算每个网格的实际大小x_stride = (self.anchor_range[3] - self.anchor_range[0]) / (grid_size[0] - 1) # (69.12 - 0) / (216 - 1) = 0.321488y_stride = (self.anchor_range[4] - self.anchor_range[1]) / (grid_size[1] - 1) # (39.68 - (-39.68)) / (248 - ) = 0.321295x_offset, y_offset = 0, 0 # 由于没有进行中心对齐，这里采用的是左上角# 2.2 生成单个维度x_shifts，y_shifts和z_shifts# 以x_stride为step，在self.anchor_range[0] + x_offset和self.anchor_range[3] + 1e-5，产生x坐标 --> 216个点 [0, 69.12]x_shifts = torch.arange(self.anchor_range[0] + x_offset, self.anchor_range[3] + 1e-5, step=x_stride, dtype=torch.float32,).cuda()# 248个点 [-39.68, 39.68]y_shifts = torch.arange(self.anchor_range[1] + y_offset, self.anchor_range[4] + 1e-5, step=y_stride, dtype=torch.float32,).cuda()z_shifts = x_shifts.new_tensor(anchor_height) # [-1.78]num_anchor_size, num_anchor_rotation = anchor_size.__len__(), anchor_rotation.__len__() # 1,2anchor_rotation = x_shifts.new_tensor(anchor_rotation) # tensor([0.0000, 1.5700])anchor_size = x_shifts.new_tensor(anchor_size) # tensor([[3.9000, 1.6000, 1.5600]])# # 2.3 调用meshgrid生成网格坐标# torch.meshgrid（）的功能是生成网格，可以用于生成坐标# 维度：[第一个输入张量的元素个数，第二个输入张量的元素个数,第三个输入张量的元素个数]# 第一个输出张量填充第一个输入张量中的元素，第二个输出张量填充第二个输入张量中的元素,第三个输出张量填充第三个输入张量中的元素x_shifts, y_shifts, z_shifts = torch.meshgrid([x_shifts, y_shifts, z_shifts])  # [x_grid, y_grid, z_grid] torch.Size([216, 248, 1]) torch.Size([216, 248, 1]) torch.Size([216, 248, 1])# meshgrid可以理解为在原来的维度上进行扩展,例如:# x原来为（216，）-->（216，1, 1）--> (216,248,1)# y原来为（248，）--> (1，248，1）--> (216,248,1) # z原来为 (1,) --> (1,1,1) --> (216,248,1)# 2.4.anchor各个维度堆叠组合，生成最终anchor(1,248,216,1,2,7）# 2.4.1.堆叠anchor的位置# [x, y, z, 3]  # torch.Size([216, 248, 1, 3])"""anchors tensor([[[[  0.0000, -39.6800,  -1.7800]],[[  0.0000, -39.3587,  -1.7800]],[[  0.0000, -39.0374,  -1.7800]],...,"""anchors = torch.stack((x_shifts, y_shifts, z_shifts), dim=-1)   # [x, y, z, 3]  # torch.Size([216, 248, 1, 3])# 2.4.2.将anchor的位置和大小进行组合，编程套路为将anchor扩展并复制为相同维度（除了最后一维），然后进行组合anchors = anchors[:, :, :, None, :].repeat(1, 1, 1, anchor_size.shape[0], 1) # (216,248,1,1,3)-->(216,248,1,1,3)anchor_size = anchor_size.view(1, 1, 1, -1, 3).repeat([*anchors.shape[0:3], 1, 1]) # (1,1,1,1,3)-->(216,248,1,1,3)anchors = torch.cat((anchors, anchor_size), dim=-1) # anchors的位置+大小 --> (216,248,1,1,6)# 2.4.3.将anchor的位置和大小和旋转角进行组合anchors = anchors[:, :, :, :, None, :].repeat(1, 1, 1, 1, num_anchor_rotation, 1) # (216, 248, 1, 1，1, 6)-->(216, 248, 1, 1, 2, 6)anchor_rotation = anchor_rotation.view(1, 1, 1, 1, -1, 1).repeat([*anchors.shape[0:3], num_anchor_size, 1, 1]) #(1,1,1,1,2,1)--> (216, 248, 1, 1, 2, 1)anchors = torch.cat((anchors, anchor_rotation), dim=-1)  # anchors的位置+大小+旋转方向 --> [x, y, z, num_size, num_rot, 7] --> (216,248,1,1,2,7)# 2.5 调整anchor的维度anchors = anchors.permute(2, 1, 0, 3, 4, 5).contiguous() #（1，248，216，1，2，7）--> [x, y, z, dx, dy, dz, rot]# anchors = anchors.view(-1, anchors.shape[-1])anchors[..., 2] += anchors[..., 5] / 2  # z轴方向-->shift to box centersall_anchors.append(anchors)return all_anchors, num_anchors_per_location # list:3 [(1，248，216，1，2，7），(1，248，216，1，2，7），(1，248，216，1，2，7）], [2,2,2]

Target assignment

处理一批数据中所有点云的anchors和gt_boxes，分类别计算每个anchor属于前景还是背景，为每个前景的anchor分配类别和计算box的回归残差和回归权重，不同于在ssd中整体计算iou并取最大值

车使用 0.45 作为非匹配阈值，0.6 作为匹配阈值
行人和骑自行车的人，使用 0.35 作为非匹配阈值，0.5 作为匹配阈值

对于回归目标，我们使用以下框编码函数：

pcdet/models/dense_heads/target_assigner/axis_aligned_target_assigner.py

import numpy as np
import torchfrom ....ops.iou3d_nms import iou3d_nms_utils
from ....utils import box_utilsclass AxisAlignedTargetAssigner(object):def __init__(self, model_cfg, class_names, box_coder, match_height=False):super().__init__()anchor_generator_cfg = model_cfg.ANCHOR_GENERATOR_CONFIG # anchor生成配置参数anchor_target_cfg = model_cfg.TARGET_ASSIGNER_CONFIG # 为预测box找对应anchor的参数# 编码box的7个残差参数(x, y, z, w, l, h, θ) --> pcdet.utils.box_coder_utils.ResidualCoderself.box_coder = box_coder # pcdet.utils.box_coder_utils.ResidualCoder# 在PointPillars中指定正负样本的时候由BEV视角计算GT和先验框的iou，不需要进行z轴上的高度的匹配，# 想法是：1、点云中的物体都在同一个平面上，没有物体在Z轴发生重叠的情况#        2、每个类别的高度相差不是很大，直接使用SmoothL1损失就可以达到很好的高度回归效果self.match_height = match_height # Falseself.class_names = np.array(class_names) # ['Car', 'Pedestrian', 'Cyclist']self.anchor_class_names = [config['class_name'] for config in anchor_generator_cfg]# ['Car', 'Pedestrian', 'Cyclist']# anchor_target_cfg.POS_FRACTION = -1.0 < 0 --> None# 前景、背景采样系数 PointPillars不考虑self.pos_fraction = anchor_target_cfg.POS_FRACTION if anchor_target_cfg.POS_FRACTION >= 0 else Noneself.sample_size = anchor_target_cfg.SAMPLE_SIZE # SAMPLE_SIZE: 512# False 前景权重由 1/前景anchor数量 PointPillars不考虑self.norm_by_num_examples = anchor_target_cfg.NORM_BY_NUM_EXAMPLES # False# 类别iou匹配为正样本阈值{'Car':0.6, 'Pedestrian':0.5, 'Cyclist':0.5}self.matched_thresholds = {}# 类别iou匹配为负样本阈值{'Car':0.45, 'Pedestrian':0.35, 'Cyclist':0.35}self.unmatched_thresholds = {}for config in anchor_generator_cfg:# {'Car':0.6, 'Pedestrian':0.5, 'Cyclist':0.5}self.matched_thresholds[config['class_name']] = config['matched_threshold']# {'Car':0.45, 'Pedestrian':0.35, 'Cyclist':0.35}self.unmatched_thresholds[config['class_name']] = config['unmatched_threshold']self.use_multihead = model_cfg.get('USE_MULTIHEAD', False) # False# self.separate_multihead = model_cfg.get('SEPARATE_MULTIHEAD', False)# if self.seperate_multihead:#     rpn_head_cfgs = model_cfg.RPN_HEAD_CFGS#     self.gt_remapping = {}#     for rpn_head_cfg in rpn_head_cfgs:#         for idx, name in enumerate(rpn_head_cfg['HEAD_CLS_NAME']):#             self.gt_remapping[name] = idx + 1# 完成对一帧点云数据中所有的类别和anchor的正负样本分配def assign_targets(self, all_anchors, gt_boxes_with_classes):"""处理一批数据中所有点云的anchors和gt_boxes,计算每个anchor属于前景还是背景,为每个前景的anchor分配类别和计算box的回归残差和回归权重Args:all_anchors: [(N, 7), ...] [(1,248,216,1,2,7),(1,248,216,1,2,7),(1,248,216,1,2,7)]gt_boxes: (B, M, 8) # 最后维度数据为 (x, y, z, w, l, h, θ,class)Returns:all_targets_dict = {# 每个anchor的类别'box_cls_labels': cls_labels, # (batch_size,num_of_anchors)# 每个anchor的回归残差 -->(∆x, ∆y, ∆z, ∆l, ∆w, ∆h, ∆θ）'box_reg_targets': bbox_targets, # (batch_size,num_of_anchors,7)# 每个box的回归权重'reg_weights': reg_weights # (batch_size,num_of_anchors)}"""# 1.初始化结果list并提取对应的gt_box和类别bbox_targets = []cls_labels = []reg_weights = []# 得到批大小batch_size = gt_boxes_with_classes.shape[0]# 得到所有GT的类别gt_classes = gt_boxes_with_classes[:, :, -1] # （4，num_of_gt）# 2.对batch中的所有数据逐帧匹配anchor的前景和背景gt_boxes = gt_boxes_with_classes[:, :, :-1] # （4，num_of_gt，7）for k in range(batch_size):cur_gt = gt_boxes[k] # 取出当前帧中的 gt_boxes (num_of_gt，7）"""由于在OpenPCDet的数据预处理时,以一批数据中拥有GT数量最多的帧为基准,其他帧中GT数量不足,则会进行补0操作,使其成为一个矩阵，例:[[1,1,2,2,3,2],[2,2,3,1,0,0],[3,1,2,0,0,0]]因此这里从每一行的倒数第二个类别开始判断，截取最后一个非零元素的索引,来取出当前帧中真实的GT数据"""cnt = cur_gt.__len__() - 1 # 得到一批数据中最多有多少个GT# 这里的循环是找到最后一个非零的box，因为预处理的时候会按照batch最大box的数量处理，不足的进行补0while cnt > 0 and cur_gt[cnt].sum() == 0:cnt -= 1# 2.1提取当前帧非零的box和类别cur_gt = cur_gt[:cnt + 1]# cur_gt_classes 例: tensor([1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3], device='cuda:0', dtype=torch.int32)cur_gt_classes = gt_classes[k][:cnt + 1].int()target_list = []target_list = []# 2.2 对每帧中的anchor和GT分类别，单独计算前背景# 计算时候 每个类别的anchor是独立计算的 不同于在ssd中整体计算iou并取最大值for anchor_class_name, anchors in zip(self.anchor_class_names, all_anchors):# anchor_class_name : 车 | 行人 | 自行车# anchors : (1, 248, 216, 1, 2, 7)  7 --> (x, y, z, l, w, h, θ)if cur_gt_classes.shape[0] > 1:# self.class_names : ["car", "person", "cyclist"]# 这里减1是因为列表索引从0开始，目的是得到属于列表中gt中哪些类别是与当前处理的类别相同，得到类别maskmask = torch.from_numpy(self.class_names[cur_gt_classes.cpu() - 1] == anchor_class_name)else:mask = torch.tensor([self.class_names[c - 1] == anchor_class_namefor c in cur_gt_classes], dtype=torch.bool)# 在检测头中是否使用多头，默认为Falseif self.use_multihead:anchors = anchors.permute(3, 4, 0, 1, 2, 5).contiguous().view(-1, anchors.shape[-1])# if self.seperate_multihead:#     selected_classes = cur_gt_classes[mask].clone()#     if len(selected_classes) > 0:#         new_cls_id = self.gt_remapping[anchor_class_name]#         selected_classes[:] = new_cls_id# else:#     selected_classes = cur_gt_classes[mask]selected_classes = cur_gt_classes[mask]else:# 2.2.1 计算所需的变量 得到特征图的大小feature_map_size = anchors.shape[:3]  # （1, 248, 216）# 将所有的anchors展平  shape : (216, 248, 1, 1, 2, 7) -->  (107136, 7)anchors = anchors.view(-1, anchors.shape[-1])# List: 根据mask索引得到该帧中当前需要处理的类别  --> 车 | 行人 | 自行车selected_classes = cur_gt_classes[mask]# 2.2.2 使用assign_targets_single来单独为某一类别的anchors分配gt_boxes，# 并为前景、背景的box设置编码和回归权重# assign_targets_single：完成对一帧中每个类别的GT和anchor的正负样本分配single_target = self.assign_targets_single(anchors, # 该类的所有anchorcur_gt[mask], # GT_box  shape : （num_of_GT_box, 7）gt_classes=selected_classes, # 当前选中的类别matched_threshold=self.matched_thresholds[anchor_class_name], # 当前类别anchor与GT匹配为正样本的阈值unmatched_threshold=self.unmatched_thresholds[anchor_class_name] # 当前类别anchor与GT匹配为负样本的阈值)# 到目前为止，处理完该帧单个类别和该类别anchor的前景和背景分配target_list.append(single_target)if self.use_multihead:target_dict = {'box_cls_labels': [t['box_cls_labels'].view(-1) for t in target_list],'box_reg_targets': [t['box_reg_targets'].view(-1, self.box_coder.code_size) for t in target_list],'reg_weights': [t['reg_weights'].view(-1) for t in target_list]}target_dict['box_reg_targets'] = torch.cat(target_dict['box_reg_targets'], dim=0)target_dict['box_cls_labels'] = torch.cat(target_dict['box_cls_labels'], dim=0).view(-1)target_dict['reg_weights'] = torch.cat(target_dict['reg_weights'], dim=0).view(-1)else:# 对该帧各类anchor的assign结果进行view并拼接target_dict = {'box_cls_labels': [t['box_cls_labels'].view(*feature_map_size, -1) for t in target_list], # feature_map_size:(1，248，216）--> (1，248，216, 2)'box_reg_targets': [t['box_reg_targets'].view(*feature_map_size, -1, self.box_coder.code_size) # (1，248，216, 2, 7)for t in target_list],'reg_weights': [t['reg_weights'].view(*feature_map_size, -1) for t in target_list] # (1，248，216, 2)}# list:3 (1，248，216, 2, 7) --> (1，248，216, 6, 7) -> (321408, 7)target_dict['box_reg_targets'] = torch.cat(target_dict['box_reg_targets'], dim=-2).view(-1, self.box_coder.code_size)# list:3 (1，248，216, 2) --> (1，248，216, 6) -> (321408,)target_dict['box_cls_labels'] = torch.cat(target_dict['box_cls_labels'], dim=-1).view(-1)# list:3 (1，248，216, 2) --> (1，248，216, 6) -> (321408,)target_dict['reg_weights'] = torch.cat(target_dict['reg_weights'], dim=-1).view(-1)# 将结果填入对应的容器bbox_targets.append(target_dict['box_reg_targets'])cls_labels.append(target_dict['box_cls_labels'])reg_weights.append(target_dict['reg_weights'])# 到这里该batch的点云全部处理完# 3.将结果stack并返回bbox_targets = torch.stack(bbox_targets, dim=0) # (4，321408，7）cls_labels = torch.stack(cls_labels, dim=0) # (4，321408）reg_weights = torch.stack(reg_weights, dim=0)all_targets_dict = {'box_cls_labels': cls_labels, # (4，321408）'box_reg_targets': bbox_targets, # (4，321408，7）'reg_weights': reg_weights # (4，321408）}return all_targets_dictdef assign_targets_single(self, anchors, gt_boxes, gt_classes, matched_threshold=0.6, unmatched_threshold=0.45):"""针对某一类别的anchors和gt_boxes，计算前景和背景anchor的类别，box编码和回归权重Args:anchors: (107136,7)gt_boxes: （15，7）gt_classes: (15,1)matched_threshold:0.6unmatched_threshold:0.45Returns:前景anchorret_dict = {'box_cls_labels': labels, # (107136,)'box_reg_targets': bbox_targets,  # (107136,7)'reg_weights': reg_weights, # (107136,)}"""#----------------------------1.初始化-------------------------------#        num_anchors = anchors.shape[0] # 107136 该帧中该类别的GT数量num_gt = gt_boxes.shape[0] # 15 # 初始化anchor对应的label和gt_id ，并置为 -1，-1表示loss计算时候不会被考虑，背景的类别被设置为0labels = torch.ones((num_anchors,), dtype=torch.int32, device=anchors.device) * -1gt_ids = torch.ones((num_anchors,), dtype=torch.int32, device=anchors.device) * -1# ---------------------2.计算该类别中anchor的前景和背景------------------------#if len(gt_boxes) > 0 and anchors.shape[0] > 0:# 1.计算该帧中某一个类别gt和对应anchors之间的iou（jaccard index）# anchor_by_gt_overlap    shape : (107136, num_gt)# anchor_by_gt_overlap代表当前类别的所有anchor和当前类别中所有GT的iouanchor_by_gt_overlap = iou3d_nms_utils.boxes_iou3d_gpu(anchors[:, 0:7], gt_boxes[:, 0:7]) \if self.match_height else box_utils.boxes3d_nearest_bev_iou(anchors[:, 0:7], gt_boxes[:, 0:7])# NOTE: The speed of these two versions depends the environment and the number of anchors# anchor_to_gt_argmax = torch.from_numpy(anchor_by_gt_overlap.cpu().numpy().argmax(axis=1)).cuda()# 2.得到每一个anchor与哪个的GT的的iou最大# anchor_to_gt_argmax表示数据维度是anchor的长度，索引是gt# (107136，）找到每个anchor最匹配的gt的索引anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(dim=1)# anchor_to_gt_max得到每一个anchor最匹配的gt的iou数值# （107136，）找到每个anchor最匹配的gt的iouanchor_to_gt_max = anchor_by_gt_overlap[torch.arange(num_anchors, device=anchors.device), anchor_to_gt_argmax]# gt_to_anchor_argmax = torch.from_numpy(anchor_by_gt_overlap.cpu().numpy().argmax(axis=0)).cuda()# 3.找到每个gt最匹配anchor的索引和iougt_to_anchor_argmax = anchor_by_gt_overlap.argmax(dim=0)gt_to_anchor_max = anchor_by_gt_overlap[gt_to_anchor_argmax, torch.arange(num_gt, device=anchors.device)]# 4.标记没有匹配的gt并将iou置为-1empty_gt_mask = gt_to_anchor_max == 0 # 没有匹配anchor的gt的maskgt_to_anchor_max[empty_gt_mask] = -1 # 让没有匹配anchor的gt的iou值为-1# 5.找到anchor中和gt存在最大iou的anchor索引，即前景anchor# 以gt为基础，逐个anchor对应，比如第一个gt的最大iou为0.9，则在所有anchor中找iou为0.9的anchoranchors_with_max_overlap = (anchor_by_gt_overlap == gt_to_anchor_max).nonzero()[:, 0] # (35,)# 找到anchor中和gt存在最大iou的gt索引# 其实和(anchor_by_gt_overlap == gt_to_anchor_max).nonzero()[:, 1]的结果一样gt_inds_force = anchor_to_gt_argmax[anchors_with_max_overlap] # （35，）labels[anchors_with_max_overlap] = gt_classes[gt_inds_force] # 将gt的类别赋值到对应的anchor的label中gt_ids[anchors_with_max_overlap] = gt_inds_force.int() # 将gt的索引赋值到对应的anchor的gt_id中# 6.根据matched_threshold和unmatched_threshold以及anchor_to_gt_max计算前景和背景索引，并更新labels和gt_ids# 这里应该对labels和gt_ids的操作应该包含了上面的anchors_with_max_overlappos_inds = anchor_to_gt_max >= matched_threshold # 找到最匹配的anchor中iou大于给定阈值的mask #(107136,)gt_inds_over_thresh = anchor_to_gt_argmax[pos_inds] # 找到最匹配的anchor中iou大于给定阈值的gt的索引 #(105,)labels[pos_inds] = gt_classes[gt_inds_over_thresh] # 将pos anchor对应gt的类别赋值到对应的anchor的label中gt_ids[pos_inds] = gt_inds_over_thresh.int() # 将pos anchor对应gt的索引赋值到对应的anchor的gt_id中bg_inds = (anchor_to_gt_max < unmatched_threshold).nonzero()[:, 0] # 找到背景anchor索引 (106879，)else:bg_inds = torch.arange(num_anchors, device=anchors.device)fg_inds = (labels > 0).nonzero()[:, 0] # 找到前景anchor的索引--> (119,)  106879 + 119 = 106998 < 107136 有一些anchor既不是背景也不是前景# 到目前为止得到了anchor的前景和背景#------------------3.对anchor的前景和背景进行筛选和赋值--------------------## 如果存在前景采样比例，则分别采样前景和背景anchor if self.pos_fraction is not None: # anchor_target_cfg.POS_FRACTION = -1 < 0 --> Nonenum_fg = int(self.pos_fraction * self.sample_size) # self.sample_size=512# 如果前景anchor大于采样前景数if len(fg_inds) > num_fg:# 计算要丢弃的前景anchor数目num_disabled = len(fg_inds) - num_fg# 在前景数目中随机产生索引值，并取前num_disabled个关闭索引# 比如：torch.randperm(4)# 输出：tensor([ 2,  1,  0,  3])disable_inds = torch.randperm(len(fg_inds))[:num_disabled]# 将被丢弃的anchor的iou设置为-1labels[disable_inds] = -1# 更新前景索引fg_inds = (labels > 0).nonzero()[:, 0]# 计算所需背景数num_bg = self.sample_size - (labels > 0).sum()# 如果当前背景数大于所需背景数if len(bg_inds) > num_bg:# torch.randint在0到len(bg_inds)之间，随机产生size为(num_bg,)的数组enable_inds = bg_inds[torch.randint(0, len(bg_inds), size=(num_bg,))]# 将enable_inds的标签设置为0labels[enable_inds] = 0# bg_inds = torch.nonzero(labels == 0)[:, 0]else:if len(gt_boxes) == 0 or anchors.shape[0] == 0:labels[:] = 0else:# 将背景赋0labels[bg_inds] = 0# 将前景赋对应类别labels[anchors_with_max_overlap] = gt_classes[gt_inds_force]#------------------4.计算bbox_targets和reg_weights--------------------## 初始化bbox_targetsbbox_targets = anchors.new_zeros((num_anchors, self.box_coder.code_size)) # (107136,7)if len(gt_boxes) > 0 and anchors.shape[0] > 0:fg_gt_boxes = gt_boxes[anchor_to_gt_argmax[fg_inds], :]fg_anchors = anchors[fg_inds, :]bbox_targets[fg_inds, :] = self.box_coder.encode_torch(fg_gt_boxes, fg_anchors) # 编码gt和前景anchor，并赋值到bbox_targets的对应位置# 初始化回归权重reg_weights = anchors.new_zeros((num_anchors,)) # (107136,)if self.norm_by_num_examples: # Falsenum_examples = (labels >= 0).sum()num_examples = num_examples if num_examples > 1.0 else 1.0reg_weights[labels > 0] = 1.0 / num_examples # 将前景anchor的权重赋1else:reg_weights[labels > 0] = 1.0ret_dict = {'box_cls_labels': labels, # (107136,)'box_reg_targets': bbox_targets, # (107136,7) 编码后的结果'reg_weights': reg_weights, # (107136,)}return ret_dict

pcdet/utils/box_coder_utils.py

class ResidualCoder(object):def __init__(self, code_size=7, encode_angle_by_sincos=False, **kwargs):"""loss中anchor和gt的编码与解码7个参数的编码的方式为∆x = (x^gt − xa^da)/d^a , ∆y = (y^gt − ya^da)/d^a , ∆z = (z^gt − za^ha)/h^a∆w = log (w^gt / w^a) ∆l = log (l^gt / l^a) , ∆h = log (h^gt / h^a)∆θ = sin(θ^gt - θ^a)"""super().__init__()self.code_size = code_sizeself.encode_angle_by_sincos = encode_angle_by_sincosif self.encode_angle_by_sincos: # Fasleself.code_size += 1def encode_torch(self, boxes, anchors):"""Args:boxes: (N, 7 + C) [x, y, z, dx, dy, dz, heading, ...]anchors: (N, 7 + C) [x, y, z, dx, dy, dz, heading or *[cos, sin], ...]Returns:"""# 截断anchors的[dx,dy,dz]，每个anchor_box的l, w, h数值如果小于1e-5则为1e-5anchors[:, 3:6] = torch.clamp_min(anchors[:, 3:6], min=1e-5)# 截断boxes的[dx,dy,dz] 每个GT_box的l, w, h数值如果小于1e-5则为1e-5boxes[:, 3:6] = torch.clamp_min(boxes[:, 3:6], min=1e-5)# torch.split(tensor, split_size, dim=)  split_size是切分后每块的大小，不是切分为多少块！，多余的参数使用*cags接收xa, ya, za, dxa, dya, dza, ra, *cas = torch.split(anchors, 1, dim=-1)xg, yg, zg, dxg, dyg, dzg, rg, *cgs = torch.split(boxes, 1, dim=-1)# 计算anchor对角线长度diagonal = torch.sqrt(dxa ** 2 + dya ** 2)# 计算loss的公式，Δx,Δy,Δz,Δw,Δl,Δh,Δθxt = (xg - xa) / diagonal # Δxyt = (yg - ya) / diagonal # Δyzt = (zg - za) / dza # Δzdxt = torch.log(dxg / dxa) # Δwdyt = torch.log(dyg / dya) # Δldzt = torch.log(dzg / dza) # Δh# Falseif self.encode_angle_by_sincos:rt_cos = torch.cos(rg) - torch.cos(ra)rt_sin = torch.sin(rg) - torch.sin(ra)rts = [rt_cos, rt_sin]else:rts = [rg - ra] # Δθcts = [g - a for g, a in zip(cgs, cas)]return torch.cat([xt, yt, zt, dxt, dyt, dzt, *rts, *cts], dim=-1)

Loss

总损失函数最终形式如下：

Ltotal=β1Lcls+β2(Lreg−θ+Lreg−other)+β3LdirL_{total} = β_1 L_{cls} + β_2 ( L_{reg − θ} + L_{reg − other}) + β_3 L_{dir}Ltotal=β1Lcls+β2(Lreg−θ+Lreg−other)+β3Ldir
其中 LclsL_{cls}Lcls 是分类损失，Lreg−otherL_{reg − other}Lreg−other 是位置和维度的回归损失，Lreg−θL_{reg − θ}Lreg−θ 是新的角度损失，LdirL_{dir}Ldir 是方向分类损失。 β1β_1β1 = 1.0、β2β_2β2 = 2.0 和 β3β_3β3 = 0.2 是损失公式的常数系数，使用相对较小的 β3β_3β3 值来避免网络难以识别物体方向的情况。

    def get_loss(self):# 计算classfiction layer的loss，tb_dict内容和cls_loss相同，形式不同，一个是torch.tensor一个是字典值cls_loss, tb_dict = self.get_cls_layer_loss()# 计算regression layer的lossbox_loss, tb_dict_box = self.get_box_reg_layer_loss()# 在tb_dict中添加tb_dict_box，在python的字典中添加值，如果添加的也是字典，用updae方法，如果是键值对则采用赋值的方式tb_dict.update(tb_dict_box)rpn_loss = cls_loss + box_loss# 在tb_dict中添加rpn_loss，此时tb_dict中包含cls_loss,reg_loss和rpn_losstb_dict['rpn_loss'] = rpn_loss.item()return rpn_loss, tb_dict

分类loss

Focal Loss for Classification

为了解决样本中前景和背景类的不平衡，作者引入focal loss，分类损失具有以下形式：

FL(pt)=−αt(1−pt)γlog(pt)FL(p_t)=− α_t ( 1 − p_t ) ^γ log ( p_t )FL(pt)=−αt(1−pt)γlog(pt)

其中 ptp_tpt 是模型的估计概率，α 和 γ 是focal loss的参数。在训练过程中使用 ααα = 0.25 和 γγγ = 2。

pcdet/models/dense_heads/anchor_head_template.py

    def get_cls_layer_loss(self):# (batch_size, 248, 216, 18) 网络类别预测cls_preds = self.forward_ret_dict['cls_preds']# (batch_size, 321408) 前景anchor类别box_cls_labels = self.forward_ret_dict['box_cls_labels']batch_size = int(cls_preds.shape[0])# [batch_szie, num_anchors]--> (batch_size, 321408)# 关心的anchor  选取出前景背景anchor, 在0.45到0.6之间的设置为仍然是-1，不参与loss计算cared = box_cls_labels >= 0  # [N, num_anchors]# (batch_size, 321408) 前景anchorpositives = box_cls_labels > 0# (batch_size, 321408) 背景anchornegatives = box_cls_labels == 0# 将背景anchor赋予权重negative_cls_weights = negatives * 1.0# 将每个anchor分类的损失权重设为1cls_weights = (negative_cls_weights + 1.0 * positives).float()# 每个anchor正样本的回归损失权重reg_weights = positives.float()# 如果只有1类if self.num_class == 1:# class agnosticbox_cls_labels[positives] = 1# 正则化并计算权重 求出每个数据中有多少个正例pos_normalizer = positives.sum(1, keepdim=True).float()# 正则化回归损失reg_weights /= torch.clamp(pos_normalizer, min=1.0)# 正则化分类损失cls_weights /= torch.clamp(pos_normalizer, min=1.0)# care包含了背景和前景的anchor，但是这里只需要得到前景部分的类别即可不关注-1和0# cared.type_as(box_cls_labels) 将cared中为False的那部分不需要计算loss的anchor变成了0# 对应位置相乘后，所有背景和iou介于match_threshold和unmatch_threshold之间的anchor都设置为0cls_targets = box_cls_labels * cared.type_as(box_cls_labels)cls_targets = cls_targets.unsqueeze(dim=-1)cls_targets = cls_targets.squeeze(dim=-1)# (batch_size,321408,4),类别数加1世纪考虑背景one_hot_targets = torch.zeros(*list(cls_targets.shape), self.num_class + 1, dtype=cls_preds.dtype, device=cls_targets.device)# target.scatter(dim, index, src)# scatter_函数的一个典型应用就是在分类问题中，# 将目标标签转换为one-hot编码形式 https://blog.csdn.net/guofei_fly/article/details/104308528# 这里表示在最后一个维度，将cls_targets.unsqueeze(dim=-1)所索引的位置设置为1"""dim=1: 表示按照列进行填充index=batch_data.label:表示把batch_data.label里面的元素值作为下标，去下标对应位置(这里的"对应位置"解释为列，如果dim=0，那就解释为行)进行填充src=1:表示填充的元素值为1"""# (batch_size,321408,4)one_hot_targets.scatter_(-1, cls_targets.unsqueeze(dim=-1).long(), 1.0)# (batch_size,321408,18)-->(batch_size,321408,3)cls_preds = cls_preds.view(batch_size, -1, self.num_class)# (batch_size,321408,3) 不计算背景分类损失one_hot_targets = one_hot_targets[..., 1:]# 计算分类损失cls_loss_src = self.cls_loss_func(cls_preds, one_hot_targets, weights=cls_weights)  # [N, M]# 求和并处以batch_sizecls_loss = cls_loss_src.sum() / batch_size# loss乘以分类权重 cls_weight= 1.0cls_loss = cls_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['cls_weight']tb_dict = {'rpn_loss_cls': cls_loss.item()}return cls_loss, tb_dict

focal loss 类SigmoidFocalClassificationLoss代码在：pcdet/utils/loss_utils.py

class SigmoidFocalClassificationLoss(nn.Module):"""多分类Sigmoid focal cross entropy loss."""def __init__(self, gamma: float = 2.0, alpha: float = 0.25):"""Args:gamma: Weighting parameter to balance loss for hard and easy examples.alpha: Weighting parameter to balance loss for positive and negative examples."""super(SigmoidFocalClassificationLoss, self).__init__()self.alpha = alpha  # 0.25self.gamma = gamma  # 2.0@staticmethoddef sigmoid_cross_entropy_with_logits(input: torch.Tensor, target: torch.Tensor):""" PyTorch Implementation for tf.nn.sigmoid_cross_entropy_with_logits:max(x, 0) - x * z + log(1 + exp(-abs(x))) inhttps://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logitsArgs:input: (B, #anchors, #classes) float tensor.Predicted logits for each classtarget: (B, #anchors, #classes) float tensor.One-hot encoded classification targetsReturns:loss: (B, #anchors, #classes) float tensor.Sigmoid cross entropy loss without reduction"""loss = torch.clamp(input, min=0) - input * target + \torch.log1p(torch.exp(-torch.abs(input)))return lossdef forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor):"""Args:input: (B, #anchors, #classes) float tensor. eg:(4, 321408, 3)Predicted logits for each class :一个anchor会预测三种类别target: (B, #anchors, #classes) float tensor. eg:(4, 321408, 3)One-hot encoded classification targets，:真值weights: (B, #anchors) float tensor. eg:(4, 321408)Anchor-wise weights.Returns:weighted_loss: (B, #anchors, #classes) float tensor after weighting."""pred_sigmoid = torch.sigmoid(input)  # (batch_size, 321408, 3) f(x) = 1 / (1 + e^(-x))# 这里的加权主要是解决正负样本不均衡的问题:正样本的权重为0.25，负样本的权重为0.75# 交叉熵来自KL散度，衡量两个分布之间的相似性，针对二分类问题：# 合并形式: L = -(y * log(y^) + (1 - y) * log(1 - y^)) <--># 分段形式:y = 1, L = -y * log(y^); y = 0, L = -(1 - y) * log(1 - y^)# 这两种形式等价，只要是0和1的分类问题均可以写成两种等价形式，针对focal loss做类似处理# 相对熵 = 信息熵 + 交叉熵， 且交叉熵是凸函数，求导时能够得到全局最优值-->(sigma(s)- y)x# https://zhuanlan.zhihu.com/p/35709485alpha_weight = target * self.alpha + (1 - target) * (1 - self.alpha)  # (4, 321408, 3)pt = target * (1.0 - pred_sigmoid) + (1.0 - target) * pred_sigmoidfocal_weight = alpha_weight * torch.pow(pt, self.gamma)# (batch_size, 321408, 3) 交叉熵损失的一种变形,具体推到参考上面的链接bce_loss = self.sigmoid_cross_entropy_with_logits(input, target)loss = focal_weight * bce_loss  # (batch_size, 321408, 3)if weights.shape.__len__() == 2 or \(weights.shape.__len__() == 1 and target.shape.__len__() == 2):weights = weights.unsqueeze(-1)assert weights.shape.__len__() == loss.shape.__len__()# weights参数使用正anchor数目进行平均，使得每个样本的损失与样本中目标的数量无关return loss * weights

回归loss

以前的角度回归方法，包括角编码、直接编码和矢量编码，通常表现不佳。角点预测方法无法确定对象的方向，也不能用于行人检测，因为 BEV 框几乎是正方形的。矢量编码方法保留了冗余信息，导致难以基于 LiDAR 检测远处的物体。 VoxelNet 直接预测弧度偏移，但在 0 和 π 弧度的情况下会遇到对抗性示例问题，因为这两个角度对应于同一个框。作者引入新的角度损失回归解决该问题

在VoxelNet中，一个3D BBox被建模为一个7维向量表示，分别为(xc,yc,zc,l,w,h,θ)(x_c,y_c,z_c,l,w,h,θ)(xc,yc,zc,l,w,h,θ)，训练过程中，对这7个变量采用Smooth L1损失进行回归训练。当同一个3D检测框的预测方向恰好与真实方向相反的时候，上述的7个变量的前6个变量的回归损失较小，而最后一个方向的回归损失会很大，这其实并不利于模型训练。为了解决这个问题，作者引入角度回归的正弦误差损失，定义如下：

Lθ=SmoothL1(sin(θp−θt))L_θ = SmoothL1 ( sin ( θ_p − θ_t ))Lθ=SmoothL1(sin(θp−θt))

θpθ_pθp为预测的方向角，θtθ_tθt为真实的方向角。那么当θpθ_pθp 与θtθ_tθt相差π\piπ的时候，该损失趋向于0，这样更利于模型训练。

那这样的话，模型预测方向很可能与真实方向相差180度，为了解决这种损失将具有相反方向的框视为相同的问题，在 RPN 的输出中添加了一个简单的方向分类器，该方向分类器使用 softmax 损失函数。围绕z轴的偏航旋转高于0，则结果为正；否则为负数。

pcdet/models/dense_heads/anchor_head_template.py

    def get_box_reg_layer_loss(self):# (batch_size, 248, 216, 42） anchor_box的7个回归参数box_preds = self.forward_ret_dict['box_preds']# (batch_size, 248, 216, 12） anchor_box的方向预测box_dir_cls_preds = self.forward_ret_dict.get('dir_cls_preds', None)# (batch_size, 321408, 7) 每个anchor和GT编码的结果box_reg_targets = self.forward_ret_dict['box_reg_targets']# (batch_size, 321408) 得到每个box的类别box_cls_labels = self.forward_ret_dict['box_cls_labels']batch_size = int(box_preds.shape[0])# 获取所有anchor中属于前景anchor的mask  shape : (batch_size, 321408)positives = box_cls_labels > 0# 设置回归参数为1.    [True, False] * 1. = [1., 0.]reg_weights = positives.float()  # (4, 211200) 只保留标签>0的值# 同cls处理pos_normalizer = positives.sum(1,keepdim=True).float()  # (batch_size, 1) 所有正例的和 eg:[[162.],[166.],[155.],[108.]]reg_weights /= torch.clamp(pos_normalizer, min=1.0)  # (batch_size, 321408)if isinstance(self.anchors, list):if self.use_multihead:anchors = torch.cat([anchor.permute(3, 4, 0, 1, 2, 5).contiguous().view(-1, anchor.shape[-1]) for anchor inself.anchors], dim=0)else:anchors = torch.cat(self.anchors, dim=-3)  # (1, 248, 216, 3, 2, 7)else:anchors = self.anchors# (1, 248*216, 7） --> (batch_size, 248*216, 7）anchors = anchors.view(1, -1, anchors.shape[-1]).repeat(batch_size, 1, 1)# (batch_size, 248*216, 7）box_preds = box_preds.view(batch_size, -1,box_preds.shape[-1] // self.num_anchors_per_location if not self.use_multihead elsebox_preds.shape[-1])# sin(a - b) = sinacosb-cosasinb# (batch_size, 321408, 7)    分别得到sinacosb和cosasinbbox_preds_sin, reg_targets_sin = self.add_sin_difference(box_preds, box_reg_targets)loc_loss_src = self.reg_loss_func(box_preds_sin, reg_targets_sin, weights=reg_weights)loc_loss = loc_loss_src.sum() / batch_sizeloc_loss = loc_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['loc_weight']  # loc_weight = 2.0 损失乘以回归权重box_loss = loc_losstb_dict = {# pytorch中的item()方法，返回张量中的元素值，与python中针对dict的item方法不同'rpn_loss_loc': loc_loss.item()}# 如果存在方向预测，则添加方向损失if box_dir_cls_preds is not None:# (batch_size, 321408, 2) 此处生成每个anchor的方向分类dir_targets = self.get_direction_target(anchors, box_reg_targets,dir_offset=self.model_cfg.DIR_OFFSET,  # 方向偏移量 0.78539 = π/4num_bins=self.model_cfg.NUM_DIR_BINS  # BINS的方向数 = 2)# 方向预测值 (batch_size, 321408, 2)dir_logits = box_dir_cls_preds.view(batch_size, -1, self.model_cfg.NUM_DIR_BINS)# 只要正样本的方向预测值 (batch_size, 321408)weights = positives.type_as(dir_logits)# (4, 211200) 除正例数量，使得每个样本的损失与样本中目标的数量无关weights /= torch.clamp(weights.sum(-1, keepdim=True), min=1.0)# 方向损失计算dir_loss = self.dir_loss_func(dir_logits, dir_targets, weights=weights)dir_loss = dir_loss.sum() / batch_size# 损失权重，dir_weight: 0.2dir_loss = dir_loss * self.model_cfg.LOSS_CONFIG.LOSS_WEIGHTS['dir_weight']# 将方向损失加入box损失box_loss += dir_losstb_dict['rpn_loss_dir'] = dir_loss.item()return box_loss, tb_dict

add_sin_difference函数：

    def add_sin_difference(boxes1, boxes2, dim=6):# 针对角度添加sin损失，有效防止-pi和pi方向相反时损失过大assert dim != -1  # 角度=180°×弧度÷π   弧度=角度×π÷180°# (batch_size, 321408, 1)  torch.sin() - torch.cos() 的 input (Tensor) 都是弧度制数据，不是角度制数据。rad_pred_encoding = torch.sin(boxes1[..., dim:dim + 1]) * torch.cos(boxes2[..., dim:dim + 1])# (batch_size, 321408, 1)rad_tg_encoding = torch.cos(boxes1[..., dim:dim + 1]) * torch.sin(boxes2[..., dim:dim + 1])# (batch_size, 321408, 7) 将编码后的结果放回boxes1 = torch.cat([boxes1[..., :dim], rad_pred_encoding, boxes1[..., dim + 1:]], dim=-1)# (batch_size, 321408, 7) 将编码后的结果放回boxes2 = torch.cat([boxes2[..., :dim], rad_tg_encoding, boxes2[..., dim + 1:]], dim=-1)return boxes1, boxes2

WeightsSmoothL1损失函数，代码在：pcdet/utils/loss_utils.py

class WeightedSmoothL1Loss(nn.Module):"""Code-wise Weighted Smooth L1 Loss modified based on fvcore.nn.smooth_l1_losshttps://github.com/facebookresearch/fvcore/blob/master/fvcore/nn/smooth_l1_loss.py| 0.5 * x ** 2 / beta   if abs(x) < betasmoothl1(x) = || abs(x) - 0.5 * beta   otherwise,where x = input - target."""def __init__(self, beta: float = 1.0 / 9.0, code_weights: list = None):"""Args:beta: Scalar float.L1 to L2 change point.For beta values < 1e-5, L1 loss is computed.code_weights: (#codes) float list if not None.Code-wise weights."""super(WeightedSmoothL1Loss, self).__init__()self.beta = beta  # 默认值1/9=0.111if code_weights is not None:self.code_weights = np.array(code_weights, dtype=np.float32)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]self.code_weights = torch.from_numpy(self.code_weights).cuda()  # 将权重放到GPU上@staticmethoddef smooth_l1_loss(diff, beta):# 如果beta非常小，则直接用abs计算，否则按照正常的Smooth L1 Loss计算if beta < 1e-5:loss = torch.abs(diff)else:n = torch.abs(diff)  # (batch_size, 321408, 7)# smoothL1公式，如上面所示 --> (batch_size, 321408, 7)loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)return lossdef forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor = None):"""Args:input: (B, #anchors, #codes) float tensor.Ecoded predicted locations of objects.target: (B, #anchors, #codes) float tensor.Regression targets.weights: (B, #anchors) float tensor if not None.Returns:loss: (B, #anchors) float tensor.Weighted smooth l1 loss without reduction."""# 如果target为nan,则等于input,否则等于targettarget = torch.where(torch.isnan(target), input, target)  # ignore nan targets# (batch_size, 321408, 7)diff = input - target  # (batch_size, 321408, 7)# code-wise weightingif self.code_weights is not None:diff = diff * self.code_weights.view(1, 1, -1)  #(batch_size, 321408, 7) 乘以box每一项的权重loss = self.smooth_l1_loss(diff, self.beta)# anchor-wise weightingif weights is not None:assert weights.shape[0] == loss.shape[0] and weights.shape[1] == loss.shape[1]# weights参数使用正anchor数目进行平均，使得每个样本的损失与样本中目标的数量无关loss = loss * weights.unsqueeze(-1)return loss

数据增强

Sample Ground Truths from the Database
在训练过程中遇到的主要问题是存在的ground truths太少，限制了网络的收敛速度和最终性能。

SECOND提出了对GT进行采样截取，生成GT的Database，该方法在后续的很多网络中都得到了使用。

首先，从训练数据集中生成了一个数据库，其包含所有ground truths的标签及其相关的点云数据
然后，在训练过程中，从这个数据库中随机选择了几个ground truths，引入到当前的训练点云中使用这种方法，可以大大增加每个点云的ground truths数量并模拟不同环境中存在的对象。
最后，对所有的GT进行碰撞检测，防止放入的GT会相互碰撞，产生不可能在物理世界中出现的结果，并将碰撞的采样GT删除

该方法的加快了网络的收敛速度和提升了网络精度，对比图如下：

Object Noise

为了考虑噪声，我们采用了与 VoxelNet 中使用的相同方法，其中每个GT及其点云都是独立且随机变换的，而不是使用相同参数变换所有点云。Second使用从均匀分布 ∆θ ∈ [− π/2, π/2 ] 中采样的随机旋转和从均值为 0 ，标准差为 1.0 的高斯分布中采样的随机线性变换。

全局旋转和缩放

Second将全局缩放和旋转应用于整个点云和所有真实框。缩放噪声取自均匀分布 [0.95, 1.05]，[− π/4, π/4] 用于全局旋转噪声。

class DataAugmentor(object):def __init__(self, root_path, augmentor_configs, class_names, logger=None):self.root_path = root_pathself.class_names = class_namesself.logger = loggerself.data_augmentor_queue = []# cfg.DATA_CONFIG.DATA_AUGMENTOR.AUG_CONFIG_LIST: # [{'NAME': 'gt_sampling', 'USE_ROAD_PLANE': False, 'DB_INFO_PATH': ['kitti_dbinfos_train.pkl'],#  'PREPARE': {'filter_by_min_points': ['Car:5', 'Pedestrian:5', 'Cyclist:5'], 'filter_by_difficulty': [-1]},#  'SAMPLE_GROUPS': ['Car:20', 'Pedestrian:15', 'Cyclist:15'], 'NUM_POINT_FEATURES': 4, 'DATABASE_WITH_FAKELIDAR': False, # 'REMOVE_EXTRA_WIDTH': [0.0, 0.0, 0.0], 'LIMIT_WHOLE_SCENE': True}, {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x']},#  {'NAME': 'random_world_rotation', 'WORLD_ROT_ANGLE': [-0.78539816, 0.78539816]}, # {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]}]aug_config_list = augmentor_configs if isinstance(augmentor_configs, list) \else augmentor_configs.AUG_CONFIG_LISTfor cur_cfg in aug_config_list:if not isinstance(augmentor_configs, list):# gt_sampling,random_world_flip,random_world_rotation,random_world_scalingif cur_cfg.NAME in augmentor_configs.DISABLE_AUG_LIST:continuecur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)self.data_augmentor_queue.append(cur_augmentor)def gt_sampling(self, config=None):db_sampler = database_sampler.DataBaseSampler(root_path=self.root_path,  # ../data/kittisampler_cfg=config,class_names=self.class_names,logger=self.logger)return db_samplerdef random_world_flip(self, data_dict=None, config=None):if data_dict is None:return partial(self.random_world_flip, config=config)gt_boxes, points = data_dict['gt_boxes'], data_dict['points']for cur_axis in config['ALONG_AXIS_LIST']:assert cur_axis in ['x', 'y']gt_boxes, points = getattr(augmentor_utils, 'random_flip_along_%s' % cur_axis)(gt_boxes, points,)data_dict['gt_boxes'] = gt_boxesdata_dict['points'] = pointsreturn data_dictdef random_world_rotation(self, data_dict=None, config=None):if data_dict is None:return partial(self.random_world_rotation, config=config)rot_range = config['WORLD_ROT_ANGLE']if not isinstance(rot_range, list):rot_range = [-rot_range, rot_range]gt_boxes, points = augmentor_utils.global_rotation(data_dict['gt_boxes'], data_dict['points'], rot_range=rot_range)data_dict['gt_boxes'] = gt_boxesdata_dict['points'] = pointsreturn data_dictdef random_world_scaling(self, data_dict=None, config=None):if data_dict is None:return partial(self.random_world_scaling, config=config)gt_boxes, points = augmentor_utils.global_scaling(data_dict['gt_boxes'], data_dict['points'], config['WORLD_SCALE_RANGE'])data_dict['gt_boxes'] = gt_boxesdata_dict['points'] = pointsreturn data_dict

消融研究

改进的稀疏卷积方法的性能
将其与 SparseConvNet中的原始实现进行了比较。可以看出，Second的稀疏卷积方法比原始实现更快，因为它的规则生成速度更快。
对角度编码的改进

vector是AVOD中的角度编码实现。

采样Ground Truths以加快收敛

SECOND——论文与代码解析相关推荐

ECCV2020｜图像重建(超分辨率，图像恢复，去雨，去雾等)相关论文汇总（附论文链接/代码/解析）
转载自https://zhuanlan.zhihu.com/p/180551773 原帖地址: ECCV2020|图像重建/底层视觉(超分辨率,图像恢复,去雨,去雾,去模糊,去噪等)相关论文汇总(附论 ...
Adaptive Personalized Federated Learning 论文解读+代码解析
论文地址点这里一. 介绍联邦学习强调确保本地隐私情况下,对多个客户端进行训练,客户端之间不交换数据而交换参数来进行通信.目的是聚合成一个全局的模型,使得这个模型再各个客户端上读能取得较好的成果.联 ...
Contextual Transformer Networks for Visual Recognition论文以及代码解析
Contextual Transformer Networks for Visual Recognition 1. Abstract 2. Introduction 3. Approach 3.1. ...
EfficientNet-V2 论文以及代码解析
参看视频:论文解析论文地址:论文地址源码地址:tensorflow官方源码 pytorch代码地址:pytorch代码这里的代码解析参考的是博主噼里啪啦的源码,下面对EfficientNetV2 ...
Decoupled Knowledge Distillation论文阅读+代码解析
本文来自2022年CVPR的文章,论文地址点这里一. 介绍知识蒸馏(KD)的通过最小化师生预测对数之间的KL-Divergence来传递知识(下图a).目前大部分的研究注意力都被吸引到从中间层的深 ...
Gradient Episodic Memory for Continual Learning 论文阅读+代码解析
一. 介绍在开始进行监督学习的时候我们需要收集一个训练集 D t r = { ( x i , y i ) } i = 1 n D_{tr}=\{(x_i,y_i)\}^n_{i=1} Dtr={( ...
ONLINE CORESET SELECTION FOR REHEARSAL-BASED CONTINUAL LEARNING 论文阅读+代码解析
本篇依旧是针对持续学习的工作,也是FedWEIT的团队进行的研究,论文地址点这里一. 介绍(简单描述) 在持续学习中为了应对灾难性遗忘,常见的方法有基于正则化.基于记忆重塑以及基于动态架构.其中基于 ...
Data-Free Knowledge Distillation for Heterogeneous Federated Learning论文阅读+代码解析
论文地址点这里一. 介绍联邦学习具有广阔的应用前景,但面临着来自数据异构的挑战,因为在现实世界中用户数据均为Non-IID分布的.在这样的情况下,传统的联邦学习算法可能会导致无法收敛到各个客户端的 ...
3D目标检测 | PointPillars论文和代码解析
作者丨周威@知乎来源丨https://zhuanlan.zhihu.com/p/357626425 编辑丨3D视觉工坊 1 前言本文要解析的模型叫做PointPillars,是2019年出自工业界 ...
Personalized Federated Learning with Moreau Envelopes论文阅读+代码解析
(好久没更新文章啦,现在开学继续肝) 论文地址点这里一. 介绍尽管FL具有数据隐私和减少通信的优势,但它面临着影响其性能和收敛速度的主要挑战:统计多样性,这意味着客户之间的数据分布是不同的(即非i ...

SECOND——论文与代码解析

简介