Implementing a 3D Object Detection Algorithm from Scratch (3): The PointPillars Backbone Network (work in progress)

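In the previous article, "Implementing a 3D Object Detection Algorithm from Scratch (2): Point Cloud Data Preprocessing", we finished preprocessing the point cloud data.

Starting with this article we implement the PointPillars network itself, following the network structure described in the first article of this series.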

Contents

1. Basic PyTorch Modules
- 1.1 The Empty Module
- 1.2 The Sequential Module
2. Implementing the Pillar Feature Net
- 2.1 The VFE Module
- 2.2 The Pillar Scatter Module

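1. Basic PyTorch Modules

In practice, to make networks easier to build and modify, two basic modules are usually implemented on top of PyTorch: an empty layer (Empty) and a sequential container (Sequential). Both live in pytorch_utils.py.

1.1 The Empty Module

As the name suggests, Empty is a layer that does nothing. Here it only serves as a placeholder that keeps the network structure complete (we will see how it is used later) and takes no part in the computation; it can be modified if it ever needs to. It is simple to build: create a class named Empty that inherits from nn.Module: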

import sys
from collections import OrderedDict

import torch
import torch.nn as nn


class Empty(torch.nn.Module):
    def __init__(self, *args, **kwargs):
        super(Empty, self).__init__()

    def forward(self, *args, **kwargs):
        if len(args) == 1:
            return args[0]
        elif len(args) == 0:
            return None
        return args

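In this code, *args packs the extra positional arguments into a tuple and **kwargs packs the keyword arguments into a dict for use in the function body. In a function signature the three kinds of parameter must appear in the order (arg, *args, **kwargs); for example, putting **kwargs before *args is a SyntaxError. Run the following code to see what each one receives: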

def function(arg, *args, **kwargs):
    print(arg, args, kwargs)


function(6, 7, 8, 9, a=1, b=2, c=3)
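To see how this packing drives Empty.forward, the sketch below mirrors its three return cases in a plain function, so it runs without PyTorch. The name passthrough is hypothetical and used only for illustration.

```python
def passthrough(*args, **kwargs):
    """Mirror the return rules of Empty.forward; keyword arguments are ignored."""
    if len(args) == 1:
        return args[0]      # a single input comes back unchanged
    elif len(args) == 0:
        return None         # nothing in, nothing out
    return args             # several inputs come back as one tuple


single = passthrough("x")
nothing = passthrough()
several = passthrough(1, 2, flag=True)
print(single, nothing, several)
```

Because a single input is returned unchanged, such a layer can stand in for a normalization layer without altering the data that flows through it.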

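1.2 The Sequential Module

PyTorch already ships with such a module. We write our own because the configuration file supplies the network hyperparameters as dictionaries, and a custom version makes it easier to load them later: to change the model we only need to change the parameters in the dictionary. We create a class named Sequential: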

class Sequential(torch.nn.Module):
    """A sequential container.

    Modules will be added to it in the order they are passed in the constructor.
    Alternatively, an ordered dict of modules can also be passed in.
    """

    def __init__(self, *args, **kwargs):
        super(Sequential, self).__init__()
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            for key, module in args[0].items():
                self.add_module(key, module)
        else:
            for idx, module in enumerate(args):
                self.add_module(str(idx), module)
        for name, module in kwargs.items():
            if sys.version_info < (3, 6):
                raise ValueError("kwargs only supported in py36+")
            if name in self._modules:
                raise ValueError("name exists.")
            self.add_module(name, module)

    def __getitem__(self, idx):
        if not (-len(self) <= idx and idx < len(self)):
            raise IndexError('index {} is out of range'.format(idx))
        if idx < 0:
            idx += len(self)
        it = iter(self._modules.values())
        for i in range(idx):
            next(it)
        return next(it)

    def __len__(self):
        return len(self._modules)

    def add(self, module, name=None):
        if name is None:
            name = str(len(self._modules))
            if name in self._modules:
                raise KeyError("name exists")
        self.add_module(name, module)

    def forward(self, input):
        for module in self._modules.values():
            input = module(input)
        return input

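Here are three ways of using the Sequential class. They are equivalent; run them to compare the results:

# 1. Positional modules, named "0", "1", ...
model = Sequential(nn.Conv2d(1, 20, 5), nn.ReLU(), nn.Conv2d(20, 64, 5), nn.ReLU())

# 2. An OrderedDict of named modules
model = Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

# 3. Keyword arguments (Python 3.6+)
model = Sequential(conv1=nn.Conv2d(1, 20, 5), relu1=nn.ReLU(),
                   conv2=nn.Conv2d(20, 64, 5), relu2=nn.ReLU())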
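Because layers can be named through an OrderedDict, a network can be assembled directly from a dictionary-style configuration. The following is a minimal sketch of that idea, not the project's actual configuration loader: the layer_cfg structure and the build_from_config helper are assumptions made for illustration, and the built-in nn.Sequential stands in for the custom class so that the block is self-contained (both accept an OrderedDict).

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Hypothetical configuration: one entry per convolution layer.
layer_cfg = [
    {"name": "conv1", "in_channels": 1, "out_channels": 20, "kernel_size": 5},
    {"name": "conv2", "in_channels": 20, "out_channels": 64, "kernel_size": 5},
]


def build_from_config(cfg):
    """Build conv + ReLU pairs in the order listed in cfg."""
    layers = OrderedDict()
    for i, spec in enumerate(cfg, start=1):
        layers[spec["name"]] = nn.Conv2d(
            spec["in_channels"], spec["out_channels"], spec["kernel_size"])
        layers["relu{}".format(i)] = nn.ReLU()
    return nn.Sequential(layers)


model = build_from_config(layer_cfg)
out = model(torch.zeros(1, 1, 28, 28))
print(list(model._modules.keys()), tuple(out.shape))
```

With this pattern, changing the network only means editing layer_cfg.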

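2. Implementing the Pillar Feature Net

We now implement the first part of PointPillars, the Pillar Feature Net, whose main job is to produce the pseudo-image. It consists of two modules, the VFE module and the Pillar Scatter module. The file is vfe_utils.py.

2.1 The VFE Module

The VFE module partitions the unordered point cloud into pillars and then learns features for each pillar, as shown in the figure.
First we import the required packages: PyTorch and the Empty class we wrote in the previous section.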

import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
sys.path.append('../')
from ..model_utils.pytorch_utils import Empty

First we define a VoxelFeatureExtractor base class. It performs no computation itself; it only fixes the interface that subclasses must implement:

class VoxelFeatureExtractor(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()

    def get_output_feature_dim(self):
        raise NotImplementedError

    def forward(self, **kwargs):
        raise NotImplementedError

Next we define the get_paddings_indicator function, which builds a boolean mask marking the real (non-padded) points in each pillar.

def get_paddings_indicator(actual_num, max_num, axis=0):
    """Create a boolean mask from the actual number of points in a padded tensor.

    Args:
        actual_num (torch.Tensor): number of real points in each pillar, shape [N].
        max_num (int): maximum number of points per pillar (the padded length).
        axis (int): axis of actual_num after which the new dimension is inserted.

    Returns:
        torch.Tensor: boolean mask of shape [N, max_num]; True marks a real point.
    """
    actual_num = torch.unsqueeze(actual_num, axis + 1)
    print('actual_num shape is: ', actual_num.shape)
    # actual_num: [N, 1]
    max_num_shape = [1] * len(actual_num.shape)
    max_num_shape[axis + 1] = -1
    max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
    # actual_num after broadcasting: [[3,3,3,3,3], [4,4,4,4,4], [2,2,2,2,2]]
    # max_num after broadcasting:    [[0,1,2,3,4], [0,1,2,3,4], [0,1,2,3,4]]
    paddings_indicator = actual_num.int() > max_num
    # paddings_indicator shape: [batch_size, max_num]
    return paddings_indicator
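The mask is easiest to understand on a small input. The sketch below reproduces the same comparison with plain broadcasting for three pillars that hold 3, 4 and 2 real points, padded to 5 slots each; the values are made up for illustration.

```python
import torch

num_points = torch.tensor([3, 4, 2])   # real points per pillar (made-up values)
max_points = 5                         # padded length of every pillar

slot_index = torch.arange(max_points).view(1, -1)   # shape [1, 5]
mask = num_points.view(-1, 1) > slot_index          # broadcasts to shape [3, 5]
print(mask.int().tolist())

# Zero out the padded slots of a [3, 5, 4] feature tensor.
features = torch.ones(3, max_points, 4)
features = features * mask.unsqueeze(-1).type_as(features)
kept_per_pillar = features[:, :, 0].sum(dim=1).tolist()
```

Each row of the mask has as many leading True entries as the pillar has real points, so multiplying by it keeps real points and zeroes the padding.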

Next we define the PFNLayer class, a simplified PointNet layer with 10 input features and 64 output features. It is the linear layer proposed in the paper, and only a single layer is used:

class PFNLayer(nn.Module):
    def __init__(self, in_channels, out_channels, use_norm=True, last_layer=False):
        """Pillar Feature Net Layer.

        The Pillar Feature Net could be composed of a series of these layers, but the PointPillars paper results
        only used a single PFNLayer.
        :param in_channels: <int>. Number of input channels.
        :param out_channels: <int>. Number of output channels.
        :param use_norm: <bool>. Whether to include BatchNorm.
        :param last_layer: <bool>. If last_layer, there is no concatenation of features.
        """
        super().__init__()
        self.name = 'PFNLayer'
        self.last_vfe = last_layer
        if not self.last_vfe:
            # Half the width here; concatenation in forward() restores out_channels.
            out_channels = out_channels // 2
        self.units = out_channels

        if use_norm:
            self.linear = nn.Linear(in_channels, self.units, bias=False)
            self.norm = nn.BatchNorm1d(self.units, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, self.units, bias=True)
            self.norm = Empty(self.units)

    def forward(self, inputs):
        x = self.linear(inputs)
        total_points, voxel_points, channels = x.shape
        # BatchNorm1d expects [batch, channels], so flatten pillars and points first.
        x = self.norm(x.view(-1, channels)).view(total_points, voxel_points, channels)
        x = F.relu(x)
        # Max over the points of each pillar.
        x_max = torch.max(x, dim=1, keepdim=True)[0]
        if self.last_vfe:
            return x_max
        else:
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated
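The tensor shapes inside PFNLayer.forward can be traced with a few standalone operations. This is a sketch with random inputs, assuming 5 pillars of 32 points each (both numbers are arbitrary); it follows the last-layer path, where the max over the points of each pillar is the output, and then shows the concatenation a non-final layer would perform.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_pillars, max_points, in_channels, out_channels = 5, 32, 10, 64

linear = nn.Linear(in_channels, out_channels, bias=False)
norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)

inputs = torch.randn(num_pillars, max_points, in_channels)
x = linear(inputs)                                   # [5, 32, 64]
# BatchNorm1d normalizes over channels, so flatten pillars and points first.
x = norm(x.view(-1, out_channels)).view(num_pillars, max_points, out_channels)
x = F.relu(x)
x_max = torch.max(x, dim=1, keepdim=True)[0]         # [5, 1, 64], one vector per pillar

# A non-final layer would instead append the pillar-wise max to every point,
# which doubles the channel count of x.
x_concat = torch.cat([x, x_max.repeat(1, max_points, 1)], dim=2)   # [5, 32, 128]
print(tuple(x_max.shape), tuple(x_concat.shape))
```

The max over dim=1 is what makes the result independent of the order of the points inside a pillar.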

Next we implement the PillarFeatureNetOld2 class. It builds the per-pillar features and extends the original 4-dimensional point features (x, y, z, r) to 10 dimensions (x, y, z, r, x_c, y_c, z_c, x_p, y_p, z_p), where the subscript c denotes the offset from the mean of the points in the pillar and the subscript p denotes the offset from the pillar center:

class PillarFeatureNetOld2(VoxelFeatureExtractor):
    def __init__(self, num_input_features=4, use_norm=True, num_filters=(64, ), with_distance=False,
                 voxel_size=(0.2, 0.2, 4), pc_range=(0, -40, -3, 70.4, 40, 1)):
        """Pillar Feature Net.

        The network prepares the pillar features and performs forward pass through PFNLayers.
        :param num_input_features: <int>. Number of input features, either x, y, z or x, y, z, r.
        :param use_norm: <bool>. Whether to include BatchNorm.
        :param num_filters: (<int>: N). Number of features in each of the N PFNLayers.
        :param with_distance: <bool>. Whether to include Euclidean distance to points.
        :param voxel_size: (<float>: 3). Size of voxels in x, y and z.
        :param pc_range: (<float>: 6). Point cloud range; only the x, y and z minima are used.
        """
        super().__init__()
        self.name = 'PillarFeatureNetOld2'
        assert len(num_filters) > 0
        num_input_features += 6  # (x_c, y_c, z_c) and (x_p, y_p, z_p)
        if with_distance:
            num_input_features += 1
        self.with_distance = with_distance
        self.num_filters = num_filters

        # Create PillarFeatureNetOld layers
        num_filters = [num_input_features] + list(num_filters)
        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            if i < len(num_filters) - 2:
                last_layer = False
            else:
                last_layer = True
            pfn_layers.append(PFNLayer(in_filters, out_filters, use_norm, last_layer=last_layer))
        self.pfn_layers = nn.ModuleList(pfn_layers)

        # Need pillar (voxel) size and x/y offset in order to calculate pillar offset
        self.vx = voxel_size[0]
        self.vy = voxel_size[1]
        self.vz = voxel_size[2]
        self.x_offset = self.vx / 2 + pc_range[0]
        self.y_offset = self.vy / 2 + pc_range[1]
        self.z_offset = self.vz / 2 + pc_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]  # 64

    def forward(self, features, num_voxels, coords):
        """
        :param features: (N, max_points_of_each_voxel, 3 + C)
        :param num_voxels: (N)
        :param coords: (N, 4), each row is (batch_idx, z, y, x)
        :return: (N, num_filters[-1])
        """
        dtype = features.dtype

        # Find distance of x, y, and z from cluster center (x, y, z mean)
        points_mean = features[:, :, :3].sum(dim=1, keepdim=True) / num_voxels.type_as(features).view(-1, 1, 1)
        print('points_mean shape is: ', points_mean.shape)
        f_cluster = features[:, :, :3] - points_mean

        # Find distance of x, y, and z from pillar center
        f_center = torch.zeros_like(features[:, :, :3])
        f_center[:, :, 0] = features[:, :, 0] - (coords[:, 3].to(dtype).unsqueeze(1) * self.vx + self.x_offset)
        f_center[:, :, 1] = features[:, :, 1] - (coords[:, 2].to(dtype).unsqueeze(1) * self.vy + self.y_offset)
        f_center[:, :, 2] = features[:, :, 2] - (coords[:, 1].to(dtype).unsqueeze(1) * self.vz + self.z_offset)
        print('f_center shape is: ', f_center.shape)

        # Combine together feature decorations
        features_ls = [features, f_cluster, f_center]
        if self.with_distance:  # False
            points_dist = torch.norm(features[:, :, :3], 2, 2, keepdim=True)
            features_ls.append(points_dist)
        features = torch.cat(features_ls, dim=-1)

        # The feature decorations were calculated without regard to whether pillar was empty.
        # Need to ensure that empty pillars remain set to zeros.
        voxel_count = features.shape[1]
        mask = get_paddings_indicator(num_voxels, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(features)
        features *= mask
        print('161 features shape is: ', features.shape)

        # Forward pass through PFNLayers
        for pfn in self.pfn_layers:
            features = pfn(features)
        # The last PFNLayer returns (N, 1, C); drop only the points axis so that
        # a batch with a single pillar keeps its first dimension.
        return features.squeeze(1)
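As a worked example of the feature decoration, the sketch below computes the 10 values for each point by hand, using the default voxel_size and pc_range from the constructor above. The two points and the pillar index are made-up values, and plain Python lists are used instead of tensors so the arithmetic is easy to follow.

```python
voxel_size = (0.2, 0.2, 4)
pc_range = (0, -40, -3, 70.4, 40, 1)

# Two real points (x, y, z, r) in the pillar with grid index (z=0, y=0, x=10).
points = [(2.15, -39.95, -1.2, 0.5), (2.05, -39.85, -0.8, 0.3)]
grid_zyx = (0, 0, 10)

# Mean of the real points in the pillar (the "cluster center").
mean = [sum(p[i] for p in points) / len(points) for i in range(3)]

# Geometric center of the pillar: index * size + size / 2 + range minimum.
grid_xyz = grid_zyx[::-1]
center = [grid_xyz[i] * voxel_size[i] + voxel_size[i] / 2 + pc_range[i] for i in range(3)]

decorated = []
for p in points:
    f_cluster = [p[i] - mean[i] for i in range(3)]    # (x_c, y_c, z_c)
    f_center = [p[i] - center[i] for i in range(3)]   # (x_p, y_p, z_p)
    decorated.append(list(p) + f_cluster + f_center)

print([round(v, 2) for v in decorated[0]])
```

In this example the mean of the two points happens to coincide with the pillar center, so the two offset triples are equal; in general they differ.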

2.2 The Pillar Scatter Module

This module produces the pseudo-image, whose shape is (1, 64, 496, 432). The file is pillar_scatter.py:

import torch
import torch.nn as nn


class PointPillarsScatter(nn.Module):
    def __init__(self, input_channels=64, **kwargs):
        """Point Pillar's Scatter.

        Converts learned features from dense tensor to sparse pseudo image.
        :param input_channels: <int>. Number of input features per pillar.
        The grid shape (nz, ny, nx) is passed to forward() as the output_shape keyword.
        """
        super().__init__()
        self.nchannels = input_channels

    def forward(self, voxel_features, coords, batch_size, **kwargs):
        output_shape = kwargs['output_shape']
        nz, ny, nx = output_shape

        # batch_canvas will be the final output.
        batch_canvas = []
        for batch_itt in range(batch_size):
            # Create the canvas for this sample
            canvas = torch.zeros(self.nchannels, nz * nx * ny, dtype=voxel_features.dtype,
                                 device=voxel_features.device)

            # Only include non-empty pillars
            batch_mask = coords[:, 0] == batch_itt
            this_coords = coords[batch_mask, :]
            # Flat index into the (nz, ny, nx) grid. For PointPillars nz == 1 and the
            # z index is always 0, so the first term vanishes.
            indices = (this_coords[:, 1].type(torch.long) * (ny * nx)
                       + this_coords[:, 2].type(torch.long) * nx
                       + this_coords[:, 3].type(torch.long))
            voxels = voxel_features[batch_mask, :]
            voxels = voxels.t()

            # Now scatter the blob back to the canvas.
            canvas[:, indices] = voxels

            # Append to a list for later stacking.
            batch_canvas.append(canvas)

        # Stack to 3-dim tensor (batch-size, nchannels, nrows*ncols)
        batch_canvas = torch.stack(batch_canvas, 0)

        # Undo the column stacking to final 4-dim tensor
        batch_canvas = batch_canvas.view(batch_size, self.nchannels * nz, ny, nx)
        return batch_canvas
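The scatter step is easier to follow on a tiny grid. This sketch assumes a single sample, 2 channels and a 2 x 3 grid (nz = 1) with two non-empty pillars; all values are made up.

```python
import torch

nchannels, nz, ny, nx = 2, 1, 2, 3

# Features of two non-empty pillars, shape [num_pillars, nchannels].
voxel_features = torch.tensor([[1.0, 10.0], [2.0, 20.0]])
# Pillar coordinates as (batch_idx, z, y, x).
coords = torch.tensor([[0, 0, 0, 1], [0, 0, 1, 2]])

canvas = torch.zeros(nchannels, nz * ny * nx)
indices = coords[:, 1] * (ny * nx) + coords[:, 2] * nx + coords[:, 3]   # flat grid index
canvas[:, indices.long()] = voxel_features.t()
pseudo_image = canvas.view(1, nchannels * nz, ny, nx)
print(pseudo_image[0, 0].tolist())
```

Every grid cell without a pillar stays zero, which is why the result can be treated as an ordinary dense image by the 2D convolutional layers that follow.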

With this, the feature network is complete and the pseudo-image has been generated.
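The height and width of the pseudo-image follow from the point cloud range divided by the pillar size. As a quick check, the sketch below assumes the commonly used KITTI PointPillars setting of 0.16 m pillars over the range (0, -39.68, -3, 69.12, 39.68, 1); these values are an assumption and are not given in this article. The helper name grid_size is hypothetical.

```python
def grid_size(voxel_size, pc_range):
    """Return (nx, ny): the number of pillars along x and y."""
    nx = round((pc_range[3] - pc_range[0]) / voxel_size[0])
    ny = round((pc_range[4] - pc_range[1]) / voxel_size[1])
    return nx, ny


# Assumed KITTI configuration: matches a 496 x 432 (ny x nx) pseudo-image.
kitti_grid = grid_size((0.16, 0.16, 4), (0, -39.68, -3, 69.12, 39.68, 1))
# Default arguments of PillarFeatureNetOld2 in this article.
default_grid = grid_size((0.2, 0.2, 4), (0, -40, -3, 70.4, 40, 1))
print(kitti_grid, default_grid)
```

With the constructor defaults the grid would be 400 x 352 instead, so the output_shape passed to the scatter module has to match the voxel size and range actually used.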
