1. Prior research directions, challenges, and contributions of this paper

1.1 Prior research directions

To capture the 3D geometries, prior works mainly rely on exploring sophisticated local geometric extractors using convolution, graph, or attention mechanisms.

1.2 Challenges

These methods, however, incur (bring about, cause) unfavorable latency during inference, and their performance has saturated over the past few years.

1.3 Contributions of this paper

In this paper, we present a novel perspective on this task. We notice that detailed local geometrical information probably is not the key to point cloud analysis – we introduce a pure residual MLP network, called PointMLP, which integrates no “sophisticated” local geometrical extractors but still performs very competitively. Equipped with a proposed lightweight geometric affine module, PointMLP delivers the new state-of-the-art on multiple datasets.

1.4 Performance of PointMLP

On the real-world ScanObjectNN dataset, our method even surpasses the prior best method by 3.3% accuracy. We emphasize that PointMLP achieves this strong performance without any sophisticated operations, hence leading to a superior inference speed. Compared to the most recent CurveNet, PointMLP trains 2× faster, tests 7× faster, and is more accurate on the ModelNet40 benchmark.

1.5 Code URL

https://github.com/ma-xu/pointMLP-pytorch

2. Origin of the paper:

3. Method adopted in the paper: DEEP RESIDUAL MLP FOR POINT CLOUD

We begin with representing local points with simple residual MLPs as they are permutation-invariant and straightforward. Then we introduce a lightweight geometric affine module to boost the performance. To improve efficiency further, we also introduce a lightweight counterpart, dubbed as PointMLP-elite.

3.1 REVISITING POINT-BASED METHODS

The design of point-based methods for point cloud analysis dates back to the PointNet and PointNet++ papers (Qi et al., 2017a;b), if not earlier. The motivation behind this direction is to directly consume point clouds from the beginning and avoid unnecessary rendering processes.

The paper then briefly introduces PointNet++, RSCNN, and Point Transformer.

It goes on to raise the problems these methods leave unsolved:

While these methods can easily take advantage of detailed local geometric information and usually exhibit promising results, two issues limit their development.

  1. First, with the introduction of delicate extractors, the computational complexity is largely increased, leading to prohibitive inference latency. For example, the FLOPs of Equation 3 in Point Transformer would be 14Kd^2, ignoring the summation and subtraction operations. Compared with the conventional FC layer that enjoys 2Kd^2 FLOPs, it increases the computations by a factor of 7 (14Kd^2 / 2Kd^2). Notice that the memory access cost is not considered yet.
  2. Second, with the development of local feature extractors, the performance gain has started to saturate on popular benchmarks. Moreover, empirical analysis in Liu et al. (2020) reveals that most sophisticated local extractors make surprisingly similar contributions to the network performance under the same network input. Both limitations encourage us to develop a new method that circumvents the employment of sophisticated local extractors and provides gratifying results.
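The FLOP comparison in the first point can be checked with a few lines of arithmetic. The sketch below only encodes the two counts quoted above (14Kd^2 and 2Kd^2); the function names and the values of K and d are illustrative assumptions, not taken from the paper.

```python
def fc_flops(k: int, d: int) -> int:
    """FLOPs of a d-to-d fully connected layer applied to k neighbors (2*K*d^2)."""
    return 2 * k * d * d


def point_transformer_eq3_flops(k: int, d: int) -> int:
    """FLOPs quoted in the notes for Equation 3 of Point Transformer (14*K*d^2)."""
    return 14 * k * d * d


# K and D are assumed example values; the ratio does not depend on them.
K, D = 24, 64
ratio = point_transformer_eq3_flops(K, D) / fc_flops(K, D)

print(f"FC layer:          {fc_flops(K, D):,} FLOPs")
print(f"Point Transformer: {point_transformer_eq3_flops(K, D):,} FLOPs")
print(f"ratio:             {ratio:.0f}x")
```

Because both counts scale as Kd^2, the factor of 7 holds for any neighborhood size and feature width; memory access cost is still left out, as the quoted text notes.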

3.2 FRAMEWORK OF POINTMLP

We propose to learn the point cloud representation by a simple feed-forward residual MLP network (named PointMLP), which hierarchically aggregates the local features extracted by MLPs, and abandons the use of delicate local geometric extractors.

Structure of PointMLP


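In order to get rid of the restrictions mentioned above, we present a simple yet effective MLP-based network for point cloud analysis in which no sophisticated or heavy operations are introduced. The key operation of our PointMLP can be formulated as:

$$ g_i = \Phi_{pos}\left(\mathcal{A}\left(\Phi_{pre}(f_{i,j}) \mid j = 1, \dots, K\right)\right) \quad (4) $$

Here f_{i,j} is the feature of the j-th of the K neighbors of point i, Φpre and Φpos are residual MLP blocks, and A(·) is the max-pooling operation. Equation 4 describes one stage of PointMLP. For a hierarchical and deep network, we recursively repeat the operation for s stages.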

Advantages of PointMLP

Albeit (although) the framework of PointMLP is succinct (concise), it exhibits some prominent merits.

  1. Since PointMLP only leverages MLPs, it is naturally invariant to permutation, which perfectly fits the characteristic of point clouds.
  2. By incorporating residual connections, PointMLP can be easily extended to dozens of layers, resulting in deep feature representations.
  3. In addition, since no sophisticated extractors are included and the main operation is only highly optimized feed-forward MLPs, our PointMLP still performs efficiently even if we introduce more layers.
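The permutation-invariance claim can be illustrated with a toy version of one stage: a shared residual block applied to every neighbor, a max-pool over neighbors, then a second residual block. This is a plain-Python sketch under simplifying assumptions (random weights, no batch normalization, hypothetical helper names such as `residual_block` and `stage`); it is not the repository's implementation.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weight, bias):
    """Dense layer: weight is a list of rows, one row per output channel."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weight, bias)]


def residual_block(v, weight, bias):
    """Residual MLP block: relu(linear(v) + v); input and output widths match."""
    return relu([a + b for a, b in zip(linear(v, weight, bias), v)])


def stage(neighbors, pre, pos):
    """Shared block per neighbor, max-pool over neighbors, then a second block."""
    transformed = [residual_block(f, *pre) for f in neighbors]
    pooled = [max(channel) for channel in zip(*transformed)]  # max over neighbors
    return residual_block(pooled, *pos)


def make_params(rng, d):
    """Random d-by-d weight matrix and length-d bias (illustrative values only)."""
    weight = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    bias = [rng.uniform(-1, 1) for _ in range(d)]
    return weight, bias


rng = random.Random(0)
d, k = 4, 8
pre, pos = make_params(rng, d), make_params(rng, d)
neighbors = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]

shuffled = neighbors[:]
rng.shuffle(shuffled)

out_a = stage(neighbors, pre, pos)
out_b = stage(shuffled, pre, pos)
print(out_a == out_b)  # same output regardless of neighbor order
```

The order of neighbors cannot matter because the per-neighbor block uses shared weights and the only operation that mixes neighbors is a channel-wise maximum.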

3.3 GEOMETRIC AFFINE MODULE

To further improve the robustness and improve the performance, we also introduce a lightweight geometric affine module to transform the local points to a normal distribution.

Problems caused by increasing depth through simply stacking more blocks

While it may be easy to simply increase the depth by considering more stages or stacking more blocks in Φpre and Φpos, we notice that a simple deep MLP structure will decrease the accuracy and stability, making the model less robust. This is perhaps caused by the sparse and irregular geometric structures in local regions. Diverse geometric structures among different local regions may require different extractors, but shared residual MLPs struggle to achieve this.

Solution

We flesh out (elaborate, make concrete) this intuition and develop a lightweight geometric affine module to tackle this problem.

That is, a geometric affine module is added to solve this problem.
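A minimal sketch of what such a module does, loosely following the `normalize="anchor"` branch of `LocalGrouper` in the code section: subtract the anchor feature from every neighbor, divide by a standard deviation, then apply a learnable per-channel scale and shift. The simplifications are assumptions made for illustration: the sketch handles a single local group in plain Python, whereas the repository code computes the standard deviation over all groups of a sample, and `geometric_affine` is a hypothetical name.

```python
import statistics


def geometric_affine(grouped, anchor, alpha, beta, eps=1e-5):
    """Normalize one local group around its anchor, then apply an affine map.

    grouped: k neighbor feature vectors, each of length d
    anchor:  feature vector of the group's center point, length d
    alpha, beta: per-channel scale and shift (learnable in the real model)
    """
    deviations = [[f - a for f, a in zip(neighbor, anchor)] for neighbor in grouped]
    flat = [x for row in deviations for x in row]
    sigma = statistics.stdev(flat)  # sample standard deviation of all deviations
    return [
        [al * (x / (sigma + eps)) + be for x, al, be in zip(row, alpha, beta)]
        for row in deviations
    ]


# Toy group: 4 neighbors with 3 channels; the first neighbor is the anchor itself.
grouped = [[1.0, 2.0, 3.0], [2.0, 0.0, 4.0], [0.0, 1.0, 5.0], [3.0, 3.0, 2.0]]
anchor = [1.0, 2.0, 3.0]

out = geometric_affine(grouped, anchor, alpha=[1.0] * 3, beta=[0.0] * 3)
flat_out = [x for row in out for x in row]
print(statistics.stdev(flat_out))  # close to 1 when alpha = 1 and beta = 0
```

With alpha initialized to ones and beta to zeros, as in the repository code, the module starts out as a pure normalization, so local regions with very different scales are presented to the shared MLPs in a comparable range.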

4. Performance achieved by the paper



5. Key code

import torch
import torch.nn as nn
import torch.nn.functional as F

# pointnet2_utils, index_points, knn_point and get_activation are used below but
# defined elsewhere in the linked repository; they are not reproduced in these notes.


class LocalGrouper(nn.Module):
    def __init__(self, channel, groups, kneighbors, use_xyz=True, normalize="center", **kwargs):
        """
        Give xyz[b,p,3] and fea[b,p,d], return new_xyz[b,g,3] and new_fea[b,g,k,d]
        :param groups: groups number
        :param kneighbors: k-neighbors
        :param kwargs: others
        """
        super(LocalGrouper, self).__init__()
        self.groups = groups
        self.kneighbors = kneighbors
        self.use_xyz = use_xyz
        if normalize is not None:
            self.normalize = normalize.lower()
        else:
            self.normalize = None
        if self.normalize not in ["center", "anchor"]:
            print(f"Unrecognized normalize parameter ({self.normalize}), set to None. Should be one of [center, anchor].")
            self.normalize = None
        if self.normalize is not None:
            # Learnable scale and shift of the geometric affine module.
            add_channel = 3 if self.use_xyz else 0
            self.affine_alpha = nn.Parameter(torch.ones([1, 1, 1, channel + add_channel]))
            self.affine_beta = nn.Parameter(torch.zeros([1, 1, 1, channel + add_channel]))

    def forward(self, xyz, points):
        B, N, C = xyz.shape
        S = self.groups
        xyz = xyz.contiguous()  # xyz [batch, points, xyz]

        # fps_idx = torch.multinomial(torch.linspace(0, N - 1, steps=N).repeat(B, 1).to(xyz.device), num_samples=self.groups, replacement=False).long()
        # fps_idx = farthest_point_sample(xyz, self.groups).long()
        fps_idx = pointnet2_utils.furthest_point_sample(xyz, self.groups).long()  # [B, npoint]
        new_xyz = index_points(xyz, fps_idx)  # [B, npoint, 3]
        new_points = index_points(points, fps_idx)  # [B, npoint, d]

        idx = knn_point(self.kneighbors, xyz, new_xyz)
        # idx = query_ball_point(radius, nsample, xyz, new_xyz)
        grouped_xyz = index_points(xyz, idx)  # [B, npoint, k, 3]
        grouped_points = index_points(points, idx)  # [B, npoint, k, d]
        if self.use_xyz:
            grouped_points = torch.cat([grouped_points, grouped_xyz], dim=-1)  # [B, npoint, k, d+3]
        if self.normalize is not None:
            if self.normalize == "center":
                mean = torch.mean(grouped_points, dim=2, keepdim=True)
            if self.normalize == "anchor":
                mean = torch.cat([new_points, new_xyz], dim=-1) if self.use_xyz else new_points
                mean = mean.unsqueeze(dim=-2)  # [B, npoint, 1, d+3]
            # One standard deviation per sample, computed over all groups and channels.
            std = torch.std((grouped_points - mean).reshape(B, -1), dim=-1, keepdim=True).unsqueeze(dim=-1).unsqueeze(dim=-1)
            grouped_points = (grouped_points - mean) / (std + 1e-5)
            grouped_points = self.affine_alpha * grouped_points + self.affine_beta

        # Concatenate each normalized neighbor with its (repeated) anchor feature.
        new_points = torch.cat([grouped_points, new_points.view(B, S, 1, -1).repeat(1, 1, self.kneighbors, 1)], dim=-1)
        return new_xyz, new_points


class ConvBNReLU1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, bias=True, activation='relu'):
        super(ConvBNReLU1D, self).__init__()
        self.act = get_activation(activation)
        self.net = nn.Sequential(
            nn.Conv1d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, bias=bias),
            nn.BatchNorm1d(out_channels),
            self.act
        )

    def forward(self, x):
        return self.net(x)


class ConvBNReLURes1D(nn.Module):
    def __init__(self, channel, kernel_size=1, groups=1, res_expansion=1.0, bias=True, activation='relu'):
        super(ConvBNReLURes1D, self).__init__()
        self.act = get_activation(activation)
        self.net1 = nn.Sequential(
            nn.Conv1d(in_channels=channel, out_channels=int(channel * res_expansion),
                      kernel_size=kernel_size, groups=groups, bias=bias),
            nn.BatchNorm1d(int(channel * res_expansion)),
            self.act
        )
        if groups > 1:
            self.net2 = nn.Sequential(
                nn.Conv1d(in_channels=int(channel * res_expansion), out_channels=channel,
                          kernel_size=kernel_size, groups=groups, bias=bias),
                nn.BatchNorm1d(channel),
                self.act,
                nn.Conv1d(in_channels=channel, out_channels=channel,
                          kernel_size=kernel_size, bias=bias),
                nn.BatchNorm1d(channel),
            )
        else:
            self.net2 = nn.Sequential(
                nn.Conv1d(in_channels=int(channel * res_expansion), out_channels=channel,
                          kernel_size=kernel_size, bias=bias),
                nn.BatchNorm1d(channel)
            )

    def forward(self, x):
        # Residual connection around the two point-wise convolution blocks.
        return self.act(self.net2(self.net1(x)) + x)


class PreExtraction(nn.Module):
    def __init__(self, channels, out_channels, blocks=1, groups=1, res_expansion=1, bias=True,
                 activation='relu', use_xyz=True):
        """
        input: [b,g,k,d]: output:[b,d,g]
        :param channels:
        :param blocks:
        """
        super(PreExtraction, self).__init__()
        in_channels = 3 + 2 * channels if use_xyz else 2 * channels
        self.transfer = ConvBNReLU1D(in_channels, out_channels, bias=bias, activation=activation)
        operation = []
        for _ in range(blocks):
            operation.append(
                ConvBNReLURes1D(out_channels, groups=groups, res_expansion=res_expansion,
                                bias=bias, activation=activation)
            )
        self.operation = nn.Sequential(*operation)

    def forward(self, x):
        b, n, s, d = x.size()  # torch.Size([32, 512, 32, 6])
        x = x.permute(0, 1, 3, 2)
        x = x.reshape(-1, d, s)
        x = self.transfer(x)
        batch_size, _, _ = x.size()
        x = self.operation(x)  # [b, d, k]
        # Max-pool over the k neighbors of each group.
        x = F.adaptive_max_pool1d(x, 1).view(batch_size, -1)
        x = x.reshape(b, n, -1).permute(0, 2, 1)
        return x


class PosExtraction(nn.Module):
    def __init__(self, channels, blocks=1, groups=1, res_expansion=1, bias=True, activation='relu'):
        """
        input[b,d,g]; output[b,d,g]
        :param channels:
        :param blocks:
        """
        super(PosExtraction, self).__init__()
        operation = []
        for _ in range(blocks):
            operation.append(
                ConvBNReLURes1D(channels, groups=groups, res_expansion=res_expansion, bias=bias, activation=activation)
            )
        self.operation = nn.Sequential(*operation)

    def forward(self, x):  # [b, d, g]
        return self.operation(x)


class Model(nn.Module):
    def __init__(self, points=1024, class_num=40, embed_dim=64, groups=1, res_expansion=1.0,
                 activation="relu", bias=True, use_xyz=True, normalize="center",
                 dim_expansion=[2, 2, 2, 2], pre_blocks=[2, 2, 2, 2], pos_blocks=[2, 2, 2, 2],
                 k_neighbors=[32, 32, 32, 32], reducers=[2, 2, 2, 2], **kwargs):
        super(Model, self).__init__()
        self.stages = len(pre_blocks)
        self.class_num = class_num
        self.points = points
        self.embedding = ConvBNReLU1D(3, embed_dim, bias=bias, activation=activation)
        assert len(pre_blocks) == len(k_neighbors) == len(reducers) == len(pos_blocks) == len(dim_expansion), \
            "Please check stage number consistent for pre_blocks, pos_blocks k_neighbors, reducers."
        self.local_grouper_list = nn.ModuleList()
        self.pre_blocks_list = nn.ModuleList()
        self.pos_blocks_list = nn.ModuleList()
        last_channel = embed_dim
        anchor_points = self.points
        for i in range(len(pre_blocks)):
            out_channel = last_channel * dim_expansion[i]
            pre_block_num = pre_blocks[i]
            pos_block_num = pos_blocks[i]
            kneighbor = k_neighbors[i]
            reduce = reducers[i]
            anchor_points = anchor_points // reduce
            # append local_grouper_list
            local_grouper = LocalGrouper(last_channel, anchor_points, kneighbor, use_xyz, normalize)  # [b,g,k,d]
            self.local_grouper_list.append(local_grouper)
            # append pre_block_list
            pre_block_module = PreExtraction(last_channel, out_channel, pre_block_num, groups=groups,
                                             res_expansion=res_expansion,
                                             bias=bias, activation=activation, use_xyz=use_xyz)
            self.pre_blocks_list.append(pre_block_module)
            # append pos_block_list
            pos_block_module = PosExtraction(out_channel, pos_block_num, groups=groups,
                                             res_expansion=res_expansion, bias=bias, activation=activation)
            self.pos_blocks_list.append(pos_block_module)

            last_channel = out_channel

        self.act = get_activation(activation)
        self.classifier = nn.Sequential(
            nn.Linear(last_channel, 512),
            nn.BatchNorm1d(512),
            self.act,
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            self.act,
            nn.Dropout(0.5),
            nn.Linear(256, self.class_num)
        )

    def forward(self, x):
        xyz = x.permute(0, 2, 1)
        batch_size, _, _ = x.size()
        x = self.embedding(x)  # B,D,N
        for i in range(self.stages):
            # Give xyz[b, p, 3] and fea[b, p, d], return new_xyz[b, g, 3] and new_fea[b, g, k, d]
            xyz, x = self.local_grouper_list[i](xyz, x.permute(0, 2, 1))  # [b,g,3]  [b,g,k,d]
            x = self.pre_blocks_list[i](x)  # [b,d,g]
            x = self.pos_blocks_list[i](x)  # [b,d,g]

        # Global max-pool over the remaining points, then classify.
        x = F.adaptive_max_pool1d(x, 1).squeeze(dim=-1)
        x = self.classifier(x)
        return x


def pointMLP(num_classes=40, **kwargs) -> Model:
    return Model(points=1024, class_num=num_classes, embed_dim=64, groups=1, res_expansion=1.0,
                 activation="relu", bias=False, use_xyz=False, normalize="anchor",
                 dim_expansion=[2, 2, 2, 2], pre_blocks=[2, 2, 2, 2], pos_blocks=[2, 2, 2, 2],
                 k_neighbors=[24, 24, 24, 24], reducers=[2, 2, 2, 2], **kwargs)


def pointMLPElite(num_classes=40, **kwargs) -> Model:
    return Model(points=1024, class_num=num_classes, embed_dim=32, groups=1, res_expansion=0.25,
                 activation="relu", bias=False, use_xyz=False, normalize="anchor",
                 dim_expansion=[2, 2, 2, 1], pre_blocks=[1, 1, 2, 1], pos_blocks=[1, 1, 2, 1],
                 k_neighbors=[24, 24, 24, 24], reducers=[2, 2, 2, 2], **kwargs)


if __name__ == '__main__':
    data = torch.rand(2, 3, 1024)
    print("===> testing pointMLP ...")
    model = pointMLP()
    out = model(data)
    print(out.shape)
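To see how the hierarchy shrinks the point set while widening the features, the loop in `Model.__init__` can be replayed without PyTorch. The sketch below mirrors only the two update rules from that loop (`anchor_points // reduce` and `last_channel * dim_expansion[i]`), using the arguments passed by `pointMLP` and `pointMLPElite`; the helper name `stage_shapes` is hypothetical.

```python
def stage_shapes(points, embed_dim, dim_expansion, reducers):
    """Return (anchor_points, channels) after each stage of the Model constructor."""
    shapes = []
    anchors, channels = points, embed_dim
    for expand, reduce in zip(dim_expansion, reducers):
        anchors //= reduce    # fewer anchor points after sampling
        channels *= expand    # wider features after PreExtraction
        shapes.append((anchors, channels))
    return shapes


pointmlp = stage_shapes(1024, 64, [2, 2, 2, 2], [2, 2, 2, 2])
elite = stage_shapes(1024, 32, [2, 2, 2, 1], [2, 2, 2, 2])

print("pointMLP:     ", pointmlp)
print("pointMLPElite:", elite)
```

Both variants end with 64 anchor points, but the final feature width is 1024 for `pointMLP` and 256 for `pointMLPElite`, which is the input size of the first classifier layer in each case.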

6. Words, collocations, and expressions learned from this paper that may be useful

Albeit (although)
opt to (choose to)
succinct (concise)
flesh out (elaborate, make concrete) this intuition
tackle this problem (solve this problem)
incur (bring about, cause) unfavorable latency
prior works

Empirically (based on observation or experience)
performs very competitively

we empirically found that ... (in our experiments, we found that ...)
we emphasize that ... (used to stress a point)

largely hamper (hinder) the performance
