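PointNet Explained in Detail

1. Characteristics of Point Cloud Data

Unordered
Dense nearby, sparse far away
Unstructured data
Local structures carry semantics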

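2. PointNet

2.1 The idea behind PointNet

Conventional convolutional neural networks convolve over image pixels. One could project the point cloud from several directions and then apply a CNN to segment it, but that approach is complicated and performs poorly. PointNet instead takes the raw point cloud as input directly, giving an end-to-end network.

Point cloud data differs from image data, however. First, a point set is permutation invariant: swapping the order of any points does not change the whole (ignoring echo and radiometric intensity).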

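PointNet has to satisfy this invariance, that is, for any permutation π:
$$f(x_1, x_2, \ldots, x_n) \equiv f(x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n})$$
Many symmetric functions satisfy this; PointNet uses max pooling (the max function). Applied directly to the raw coordinates, max is too blunt and loses a lot of information, since only the largest value in each dimension survives. To preserve information, PointNet first lifts each point into a higher-dimensional space, building richer latent features before pooling.

The simplest way to do this is feature extraction with fully connected layers or convolutions, applied identically to every point.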
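To make the permutation-invariance argument concrete, here is a minimal NumPy sketch. It is an illustration only, not the author's code: `global_feature`, the random `weight` matrix and the 100-point cloud are all hypothetical. It lifts every point with the same linear map and ReLU, takes the max over points, and checks that shuffling the points leaves the result unchanged.

```python
import numpy as np

def global_feature(points, weight):
    """Lift each point with a shared linear map + ReLU, then max-pool over points."""
    lifted = np.maximum(points @ weight, 0.0)  # [n, 3] @ [3, d] -> [n, d]
    return lifted.max(axis=0)                  # [d]

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))   # hypothetical cloud of 100 xyz points
weight = rng.normal(size=(3, 64))    # hypothetical shared weights (untrained)

# Reorder the points at random and compare the pooled features
shuffled = points[rng.permutation(len(points))]
same = np.allclose(global_feature(points, weight),
                   global_feature(shuffled, weight))
print(same)
```

Because the per-point map is shared and max is symmetric, the order of the rows never enters the result.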

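The overall pipeline of the architecture is roughly as follows:

The input is a set of n points, represented as an n*3 tensor whose three columns are the xyz coordinates.
The input is usually multiplied by a transformation matrix learned by a T-Net in order to align it, which gives the model invariance to certain spatial transformations.
Finally, max pooling is applied over each feature dimension to obtain the global feature.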
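The three steps above can be sketched in a few lines of NumPy. This is a shape walk-through under stated assumptions rather than a trained model: `shared_mlp` is a hypothetical helper with random weights, and the identity matrix stands in for the matrix a T-Net would predict.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(x, dims, rng):
    """Apply the same random linear layer + ReLU to every point, once per width in dims."""
    for d in dims:
        w = rng.normal(scale=0.1, size=(x.shape[1], d))
        x = np.maximum(x @ w, 0.0)
    return x

n = 1000
points = rng.normal(size=(n, 3))                       # step 1: n x 3 input
transform = np.eye(3)                                  # step 2: stand-in for the T-Net output
aligned = points @ transform                           # [n, 3]
features = shared_mlp(aligned, [64, 128, 1024], rng)   # [n, 1024]
global_feat = features.max(axis=0)                     # step 3: max over points -> [1024]

print(aligned.shape, features.shape, global_feat.shape)
# (1000, 3) (1000, 1024) (1024,)
```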

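For classification, the input is first aligned by the input transform, then lifted by a multilayer perceptron. Once the features reach 1024 dimensions, max pooling extracts the global feature, which is then reduced to k dimensions (one score per class) for classification.

For segmentation, the global high-dimensional feature is fused with the 64-dimensional low-level per-point features (that is, the 1024-dimensional vector is appended directly after each 64-dimensional one), and the result is then reduced in dimension.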
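The fusion used for segmentation is just a tile-and-concatenate. A minimal sketch, with placeholder random values standing in for real features:

```python
import numpy as np

n = 1000
point_feat = np.random.rand(n, 64)    # per-point 64-d features (placeholder values)
global_feat = np.random.rand(1024)    # global feature after max pooling (placeholder)

# Copy the global feature to every point, then append it to the local feature
tiled = np.tile(global_feat, (n, 1))                      # [n, 1024]
seg_input = np.concatenate([point_feat, tiled], axis=1)   # [n, 64 + 1024]

print(seg_input.shape)  # (1000, 1088)
```

Every point thus sees both its own local descriptor and a summary of the whole cloud, which is what the per-point classifier needs for segmentation.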

2.2 Implementing PointNet

The code in this article is adapted from: https://link.zhihu.com/?target=https%3A//github.com/yanx27/Pointnet_Pointnet2_pytorch

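2.2.1 T-Net

T-Net is used to give the model invariance to certain spatial transformations. The paper explains it as follows: The semantic labeling of a point cloud has to be invariant if the point cloud undergoes certain geometric transformations, such as rigid transformation. We therefore expect that the learnt representation by our point set is invariant to these transformations.

In essence, the target is invariance to rigid transformations, i.e. transformations under which the distance between any two points stays the same. Note that the 3*3 matrix T-Net predicts is a learned matrix and is not itself constrained to be rigid. See the paper for the details.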
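As a quick check of the distance-preserving property, the sketch below applies a rotation plus translation to a random cloud and compares pairwise distances before and after. The specific rotation angle, translation and the scaling matrix used as a counterexample are arbitrary choices made for illustration.

```python
import numpy as np

def pairwise_dist(p):
    """Matrix of Euclidean distances between all pairs of points."""
    diff = p[:, None, :] - p[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))

# Rigid transformation: rotation about the z axis followed by a translation
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
translation = np.array([1.0, -2.0, 0.5])
rigid = points @ rotation.T + translation

# A general 3x3 matrix (here: stretching x by 2) is not rigid
scaling = np.diag([2.0, 1.0, 1.0])
scaled = points @ scaling

rigid_ok = np.allclose(pairwise_dist(points), pairwise_dist(rigid))
scaled_ok = np.allclose(pairwise_dist(points), pairwise_dist(scaled))
print(rigid_ok, scaled_ok)  # True False
```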

import torch
import torch.nn as nn
import torch.nn.functional as F


class STN3d(nn.Module):
    # T-Net for the 3-D input transform
    def __init__(self, channel):
        super(STN3d, self).__init__()
        self.conv1 = nn.Conv1d(channel, 64, 1)
        self.conv2 = nn.Conv1d(64, 128, 1)
        self.conv3 = nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        # 9 = 3 * 3 entries of the transformation matrix
        self.fc3 = nn.Linear(256, 9)
        self.relu = nn.ReLU()
        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

    def forward(self, x):
        batchsize = x.size(0)
        # Lift each point to a higher-dimensional feature.
        # Conv1d expects channels first, so the input is [batch, channel, num]
        x = F.relu(self.bn1(self.conv1(x)))  # [batch, 64, num]
        x = F.relu(self.bn2(self.conv2(x)))  # [batch, 128, num]
        x = F.relu(self.bn3(self.conv3(x)))  # [batch, 1024, num]
        # Max pooling over the point axis gives the global feature
        x = torch.max(x, 2, keepdim=True)[0]
        # Flatten for the linear layers
        x = x.view(-1, 1024)  # [batch, 1024]
        x = F.relu(self.bn4(self.fc1(x)))  # [batch, 512]
        x = F.relu(self.bn5(self.fc2(x)))  # [batch, 256]
        x = self.fc3(x)  # [batch, 9], i.e. a flattened 3*3 matrix
        # iden is a flattened identity matrix. Adding it means that applying
        # the result is equivalent to adding the input itself to the transformed input
        iden = torch.eye(3, dtype=x.dtype, device=x.device).view(1, 9).repeat(batchsize, 1)  # [batch, 9]
        x = x + iden
        # Reshape to [batch, 3, 3]; this matrix is used to transform the input points
        x = x.view(-1, 3, 3)
        return x

T-Net is effectively a mini-network that produces a transformation matrix, and that matrix adapts to the input.
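The role of the added identity matrix can be seen with a tiny NumPy sketch. The two points and the `fc3_out` values are made up for illustration; they stand in for the output of the last linear layer of T-Net.

```python
import numpy as np

points = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])         # two hypothetical xyz points

# If the last layer outputs all zeros, the transform is exactly the identity
fc3_out = np.zeros(9)
trans = (fc3_out + np.eye(3).flatten()).reshape(3, 3)
unchanged = points @ trans

# A small output nudges the points: points @ (I + W) = points + points @ W
fc3_out = np.full(9, 0.01)
trans = (fc3_out + np.eye(3).flatten()).reshape(3, 3)
nudged = points @ trans
```

So the network only has to learn the offset from "leave the cloud as it is", which is a gentler starting point than learning a full matrix from scratch.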

For an input point cloud of shape 3*1000, the resulting network structure is:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv1d-1             [-1, 64, 1000]             256
       BatchNorm1d-2             [-1, 64, 1000]             128
            Conv1d-3            [-1, 128, 1000]           8,320
       BatchNorm1d-4            [-1, 128, 1000]             256
            Conv1d-5           [-1, 1024, 1000]         132,096
       BatchNorm1d-6           [-1, 1024, 1000]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                    [-1, 9]           2,313
================================================================
Total params: 803,081
Trainable params: 803,081
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 18.57
Params size (MB): 3.06
Estimated Total Size (MB): 21.64
----------------------------------------------------------------
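The parameter counts in the summary can be reproduced by hand. The helper functions below are hypothetical names introduced for this check; they simply count weights plus biases for each layer type, assuming 4-byte float32 parameters for the size in MB.

```python
def conv1d_params(c_in, c_out, kernel=1):
    return c_in * c_out * kernel + c_out   # weights + bias

def linear_params(c_in, c_out):
    return c_in * c_out + c_out            # weights + bias

def batchnorm_params(c):
    return 2 * c                           # scale (gamma) and shift (beta)

layers = [
    conv1d_params(3, 64),     batchnorm_params(64),
    conv1d_params(64, 128),   batchnorm_params(128),
    conv1d_params(128, 1024), batchnorm_params(1024),
    linear_params(1024, 512), batchnorm_params(512),
    linear_params(512, 256),  batchnorm_params(256),
    linear_params(256, 9),
]
total = sum(layers)
size_mb = round(total * 4 / 1024 ** 2, 2)  # float32 = 4 bytes per parameter

print(total)    # 803081
print(size_mb)  # 3.06
```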

Extended to k dimensions, it becomes:

class STNkd(nn.Module):
    # T-Net for a k-dimensional feature transform
    def __init__(self, k=64):
        super(STNkd, self).__init__()
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.relu = nn.ReLU()
        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)
        self.k = k

    def forward(self, x):
        batchsize = x.size()[0]
        # x: [batch, k, num]
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        # Max pooling over the point axis
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)
        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)  # [batch, k * k]
        # Flattened k*k identity matrix, added to the prediction
        iden = torch.eye(self.k, dtype=x.dtype, device=x.device).view(1, self.k * self.k).repeat(batchsize, 1)
        x = x + iden
        x = x.view(-1, self.k, self.k)  # [batch, k, k]
        return x

2.2.2 PointNet

The implementation is also fairly simple and differs little from T-Net.

class PointNet(nn.Module):
    def __init__(self, global_feat=True, feature_transform=False, channel=3, n=9):
        '''
        :param global_feat: if True, classify from the global feature; if False, return the per-point features concatenated with the global feature
        :param feature_transform: whether to apply the 64*64 feature transform
        :param channel: number of input channels per point; the default covers xyz coordinates only
        :param n: number of output classes
        '''
        super(PointNet, self).__init__()
        self.stn = STN3d(channel)
        self.conv1 = torch.nn.Conv1d(channel, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.global_feat = global_feat
        self.feature_transform = feature_transform
        if self.feature_transform:
            self.fstn = STNkd(k=64)
        self.mlp1 = nn.Linear(1024, 128)
        self.mlp2 = nn.Linear(128, 64)
        self.mlp3 = nn.Linear(64, n)

    def forward(self, x):
        # Input is [batch, num, channel]; Conv1d needs [batch, channel, num]
        x = x.transpose(2, 1)
        B, D, N = x.size()  # batch, channels, number of points
        # STN3d was built with `channel` inputs, so it sees all channels
        trans = self.stn(x)  # [batch, 3, 3]
        x = x.transpose(2, 1)  # [batch, num, channel]
        if D > 3:
            # Only the position data is transformed; set the other channels aside
            x, feature = x[:, :, :3], x[:, :, 3:]
        # Batched matrix product (bmm only works on 3-D tensors):
        # [num, 3] * [3, 3] -> [num, 3]
        # Since trans = I + W, this equals [x, y, z] + [x', y', z'],
        # where [x', y', z'] comes from the learned part of trans
        x = torch.bmm(x, trans)
        if D > 3:
            x = torch.cat([x, feature], dim=2)
        x = x.transpose(2, 1)  # [batch, channel, num]
        x = F.relu(self.bn1(self.conv1(x)))
        # Optional feature transform on the 64-dimensional features
        if self.feature_transform:
            trans_feat = self.fstn(x)
            x = x.transpose(2, 1)
            x = torch.bmm(x, trans_feat)
            x = x.transpose(2, 1)
        else:
            trans_feat = None
        # For segmentation, these 64-dimensional features are the part to concatenate
        pointfeat = x
        # Continue with the convolutions
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)
        x = x.to(torch.float32)
        if self.global_feat:
            # Classification
            # trans is the 3*3 input transform matrix
            # trans_feat is the 64*64 feature transform matrix
            # bn2 and bn1 are reused here because their widths (128 and 64) match
            x = F.relu(self.bn2(self.mlp1(x)))
            x = F.relu(self.bn1(self.mlp2(x)))
            x = self.mlp3(x)
            return x, trans, trans_feat
        else:
            # Segmentation: attach the global feature to every point's mid-level feature
            x = x.view(-1, 1024, 1).repeat(1, 1, N)
            return torch.cat([x, pointfeat], 1), trans, trans_feat

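The paper notes that a 64*64 matrix is hard to optimize, but that optimization becomes much easier when the matrix is close to orthogonal. Using the property that an orthogonal matrix multiplied by its transpose gives the identity matrix, the authors add an extra loss term:
$$L_{reg} = \lVert I - AA^T \rVert_F^2$$
Here A is the 64*64 alignment matrix produced by T-Net. The loss function code for this part is: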

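def feature_transform_reguliarzer(trans):
    # Orthogonality regularizer for the feature transform
    d = trans.size()[1]  # feature dimension
    I = torch.eye(d, device=trans.device)[None, :, :]  # [1, d, d]
    # Written to match the formula: the norm of A * A^T - I per sample,
    # averaged over the batch.
    # The referenced code places the parenthesis differently:
    #   torch.bmm(trans, trans.transpose(2, 1) - I)
    # which computes A * (A^T - I) instead.
    loss = torch.mean(torch.norm(torch.bmm(trans, trans.transpose(2, 1)) - I, dim=(1, 2)))
    return loss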

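As an aside, the Frobenius norm is the square root of the sum of the squares of all entries of a matrix, i.e. the Euclidean length of the matrix flattened into a vector. The code uses the norm itself while the formula squares it. Both are zero exactly when A is orthogonal, so whether or not the square root is taken only changes how the penalty is scaled, not what it encourages.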
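A small NumPy experiment illustrates both points: the penalty vanishes for an orthogonal matrix, and moving the parenthesis changes what is computed. The function names `reg_formula` and `reg_misplaced` are hypothetical, and the random 64*64 matrices are only test inputs.

```python
import numpy as np

def reg_formula(a):
    """Frobenius norm of I - A A^T, as in the formula (without the square)."""
    return np.linalg.norm(np.eye(a.shape[0]) - a @ a.T)

def reg_misplaced(a):
    """Frobenius norm of A (A^T - I), the variant with the parenthesis moved."""
    return np.linalg.norm(a @ (a.T - np.eye(a.shape[0])))

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # a random orthogonal matrix
a = rng.normal(size=(64, 64))                    # a random non-orthogonal matrix

print(reg_formula(q))     # essentially zero: Q Q^T = I
print(reg_misplaced(q))   # clearly non-zero: this is the norm of I - Q
print(reg_formula(a))     # large for a generic matrix
```

With the parenthesis moved, an orthogonal matrix is still penalized unless it is the identity itself, which is a stronger constraint than the one stated in the formula.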

We can take a look at the structure of the network.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv1d-1             [-1, 64, 1000]             256
       BatchNorm1d-2             [-1, 64, 1000]             128
            Conv1d-3            [-1, 128, 1000]           8,320
       BatchNorm1d-4            [-1, 128, 1000]             256
            Conv1d-5           [-1, 1024, 1000]         132,096
       BatchNorm1d-6           [-1, 1024, 1000]           2,048
            Linear-7                  [-1, 512]         524,800
       BatchNorm1d-8                  [-1, 512]           1,024
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                    [-1, 9]           2,313
            STN3d-12                 [-1, 3, 3]               0
           Conv1d-13             [-1, 64, 1000]             256
      BatchNorm1d-14             [-1, 64, 1000]             128
           Conv1d-15            [-1, 128, 1000]           8,320
      BatchNorm1d-16            [-1, 128, 1000]             256
           Conv1d-17           [-1, 1024, 1000]         132,096
      BatchNorm1d-18           [-1, 1024, 1000]           2,048
================================================================
Total params: 946,185
Trainable params: 946,185
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 37.12
Params size (MB): 3.61
Estimated Total Size (MB): 40.74
----------------------------------------------------------------

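2.2.3 Loading data

Here we only run a quick test on some data, so the loader is written without much care.

Note that Model is the folder holding the model defined above, and pointnet is the .py file in which PointNet is written.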

import numpy as np
import torch
import torch.nn as nn
import torch.utils.data as Data
import os
from Model import pointnet
from torch.nn import functional as F
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

We take just a single class to try things out.

path=r".modelnet40_normal_resampled\car\\"
file_list=os.listdir(path)

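Next, define how the data is read.

# Dataset class that reads one point cloud file per sample
class PCDataset(Data.Dataset):
    def __init__(self, file_list):
        self.file_list = file_list

    def __getitem__(self, idx):
        with open(path + self.file_list[idx]) as f:
            data = f.readlines()
        # Each line holds comma-separated values for one point
        data = [i.split("\n")[0].split(",") for i in data]
        data = np.array(data, dtype=float)
        # Only one class is loaded, so every sample gets the fixed label 1
        return data, 1

    def __len__(self):
        return len(self.file_list)

# Load the data
Dataset = PCDataset(file_list)
train_loader = Data.DataLoader(dataset=Dataset, shuffle=True, batch_size=9)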

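Building the network

# Build PointNet; the channel count is read from the data (values per point)
channel = Dataset[0][0].shape[1]
pn = pointnet.PointNet(channel=channel)
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(pn.parameters(), lr=0.0003)

# Quick test
loss_list = []
acc_list = []
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pn.to(device)
criterion.to(device)
for step, (bx, by) in enumerate(train_loader):
    bx, by = bx.to(device), by.to(device)
    out = pn(bx.to(torch.float32))[0]
    print("Result's Size", out.shape)
    # NLLLoss expects log-probabilities, so use log_softmax rather than softmax
    out = F.log_softmax(out, dim=1)
    pre_lab = torch.argmax(out, dim=1)
    print("class", pre_lab)
    print(by)
    loss = criterion(out, by)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    loss_list.append(loss.item())
    # accuracy_score works on CPU arrays, with the true labels first
    acc_list.append(accuracy_score(by.cpu().numpy(), pre_lab.cpu().numpy()))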

Visualizing the results

plt.figure(figsize=(12,8))
plt.plot(range(len(train_loader)),loss_list,"ro-",label="loss")
plt.plot(range(len(train_loader)),acc_list,"bs-",label="acc")
plt.legend()
plt.show()

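A negative loss in this kind of test comes from how NLLLoss is used: it expects log-probabilities, and if it is fed plain softmax probabilities it returns the negative of the predicted probability, which lies in [-1, 0]. Applying log_softmax before NLLLoss gives the usual non-negative loss. Also, training with such small batches on a single class is not very meaningful; this is only a quick test.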
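The sign of the loss can be checked without PyTorch. The sketch below reimplements what a negative log-likelihood loss computes, namely the mean of the negated score at the true class; `softmax` and `nll` are hypothetical helpers and the logits are made-up numbers.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def nll(scores, labels):
    """Mean of -scores[i, labels[i]] over the batch."""
    return -scores[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.5, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)
loss_from_probs = nll(probs, labels)              # negative: minus a probability
loss_from_log_probs = nll(np.log(probs), labels)  # non-negative: the usual cross-entropy

print(loss_from_probs, loss_from_log_probs)
```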

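2.3 Shortcomings of PointNet

Unlike today's mainstream networks, PointNet only fuses global information and does not take local semantics into account.
Feature relationships between pairs of points are not considered.

For PointNet++, the improved version of PointNet, see the separate blog post on it.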
