CVPR-2021



Contents

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Building RepVGG via Structural Re-param
    • 4.1 Simple is Fast, Memory-economical, Flexible
    • 4.2 Training-time Multi-branch Architecture
    • 4.3 Reparam for Plain Inference-time Model
    • 4.4 Architectural Specification
  • 5 Experiments
    • 5.1 Datasets
    • 5.2 RepVGG for ImageNet Classification
    • 5.3 Structural Reparameterization is the Key
    • 5.4 Semantic Segmentation
  • 6 Conclusion (own) / Future work

1 Background and Motivation

MVGA: Making VGG-style ConvNets Great Again

Drawbacks of multi-branch CNN architectures (ResNet, Inception):

  • they slow down inference and reduce memory utilization
  • they increase the memory access cost and lack support
    on various devices

A single-branch (plain) structure such as VGG has none of these drawbacks, but its accuracy lags behind multi-branch models.

In this paper, the authors use re-parameterization (the identity and 1x1 branches are each equivalently replaced by a 3x3 conv) to propose a VGG-style (plain) network: RepVGG.

It is multi-branch at training time and single-branch (VGG-style) at inference time:

decouple the training-time multi-branch and inference-time plain architecture via structural re-parameterization

Accuracy remains competitive while inference speed improves.

2 Related Work

  • From Single-path to Multi-branch
  • Effective Training of Single-path Models
  • Model Re-parameterization
  • Winograd Convolution
    (throughput is commonly measured in Tera FLoating-point Operations Per Second, TFLOPS)


Winograd [20] is a classic algorithm for accelerating 3x3 conv (only if the stride is 1), which has been well supported (and enabled by default) by libraries like cuDNN and MKL.
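To see why Winograd trades multiplications for additions, here is an illustrative sketch (not from the paper) of the 1D analogue F(2,3): two outputs of a 3-tap convolution with 4 multiplications instead of 6. The 2D F(2x2, 3x3) algorithm used by cuDNN for 3x3 convs applies the same idea tile by tile.

```python
def conv_direct(d, g):
    # Direct sliding-window correlation: 2 outputs, 6 multiplications.
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def conv_winograd_f23(d, g):
    # Winograd F(2,3): 4 multiplications (m1..m4). The transforms of the
    # filter g can be precomputed once per filter, so they are not counted.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

d = [1.0, 2.0, 3.0, 4.0]   # input tile
g = [0.5, -1.0, 2.0]       # filter taps
print(conv_direct(d, g), conv_winograd_f23(d, g))  # both [4.5, 6.0]
```

This is why "Wino MULs" (counted later in the experiments) is a better speed proxy than raw FLOPs on hardware where a multiplication costs much more than an addition.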

3 Advantages / Contributions

The paper proposes the RepVGG network, which

  • has no branches at inference time
  • uses only 3x3 conv and ReLU
  • involves no heavy designs

run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy

shows a favorable accuracy-speed trade-off compared to state-of-the-art models like EfficientNet and RegNet

4 Building RepVGG via Structural Re-param

an identity branch can be regarded as a degraded 1x1 conv, and the latter can be further regarded as a degraded 3x3 conv

4.1 Simple is Fast, Memory-economical, Flexible

First, the advantages of a simple (plain, single-branch) structure:

(1) Fast

VGG-16 has 8.4x the FLOPs of EfficientNet-B3 but runs 1.8x faster on a 1080Ti, which means the computational density of the former is about 15x that of the latter.

Two important factors that the FLOPs metric ignores:

  • the memory access cost (MAC)
    (MAC constitutes a large portion of time usage in groupwise convolution)
  • degree of parallelism
    a model with high degree of parallelism could be much faster than another one with low degree of parallelism, under the same FLOPs.

(2) Memory-economical



With a skip connection, the addition can only happen once both operands are ready, so the memory holding the block's input must be kept around until then.

(3) Flexible

  • the last conv layers of every residual block have to produce tensors of the same shape as the shortcut

  • multi-branch topology limits the application of channel pruning

4.2 Training-time Multi-branch Architecture

a multi-branch architecture makes the model an implicit ensemble of numerous shallower models

For example, with n blocks, the model can be interpreted as an ensemble of 2^n models, since every block splits the flow into two paths.

Now look at the multi-branch structure the authors use during training:


we use ResNet-like identity (only if the dimensions match) and 1x1 branches so that the training-time information flow of a building block is y = x + g(x) + f(x).

  • g(x) is the 1x1 conv branch
  • f(x) is the 3x3 conv branch
  • x is the identity branch
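This information flow can be sketched in a few lines of NumPy (my illustration, not the paper's code): a single-channel map, stride 1, BN omitted.

```python
import numpy as np

def conv2d_same(x, k):
    # Naive 'same'-padded 2D correlation with stride 1 (single channel).
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x  = rng.standard_normal((5, 5))   # input feature map (identity branch)
k1 = rng.standard_normal((1, 1))   # 1x1 branch: g(x)
k3 = rng.standard_normal((3, 3))   # 3x3 branch: f(x)

# Training-time information flow of one building block:
y = x + conv2d_same(x, k1) + conv2d_same(x, k3)
```

Note that the 1x1 branch is just a per-pixel scaling here; with multiple channels it mixes channels without looking at neighbors.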

4.3 Reparam for Plain Inference-time Model

First, look at the figure:

The 1x1, 3x3, and identity branches are each followed by BN.

Now the formulas:

  • M^(1) ∈ R^(N×C1×H1×W1) is the input feature map

  • M^(2) ∈ R^(N×C2×H2×W2) is the output feature map

  • μ, σ, γ, β are BN's accumulated mean, standard deviation, learned scaling factor, and bias; the superscript (0) marks the identity branch's BN parameters, (1) the 1x1 conv's, and (2) the 3x3 conv's

BN can be fused into the preceding conv, as follows.

The conv weight and bias before (W) and after (W') fusion:
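A small NumPy sketch of this conv-BN fusion (my illustration; here `sigma` is the BN standard deviation, i.e. sqrt(running_var + eps) in a real framework):

```python
import numpy as np

def fuse_conv_bn(W, mu, sigma, gamma, beta):
    # BN(z) = gamma * (z - mu) / sigma + beta, applied per output channel.
    # Since conv is linear, BN(conv(x)) == conv'(x) with
    #   W' = (gamma / sigma) * W,   b' = beta - gamma * mu / sigma
    # W: (out_ch, in_ch, kh, kw); mu, sigma, gamma, beta: (out_ch,)
    scale = gamma / sigma
    return W * scale[:, None, None, None], beta - mu * scale

# Check the per-channel algebra on an arbitrary conv output z:
mu    = np.array([0.1, -0.2]); sigma = np.array([1.5, 0.7])
gamma = np.array([1.2,  0.9]); beta  = np.array([0.3, -0.1])
W = np.zeros((2, 3, 3, 3))           # weights only get rescaled
W_f, b_f = fuse_conv_bn(W, mu, sigma, gamma, beta)
z = np.array([0.5, -1.0])            # hypothetical conv outputs, one per channel
print(np.allclose(gamma * (z - mu) / sigma + beta,
                  (gamma / sigma) * z + b_f))   # True
```

Since the conv output z depends linearly on W, rescaling W by gamma/sigma and folding the rest into a bias reproduces BN(conv(x)) exactly.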


an identity can be viewed as a 1x1 conv with an identity matrix as the kernel.

With that, the authors' multi-branch structure consists of one 3x3 kernel, two 1x1 kernels, and three bias vectors.

Then each 1x1 kernel can be zero-padded (filling its 8-neighborhood with zeros) into a 3x3 kernel.

So the final structure consists of three 3x3 convs and three biases, whose outputs are added together.

By the distributive and associative laws (convolution is linear in its kernel): W1*x + b1 + W2*x + b2 + W3*x + b3 = (W1 + W2 + W3)*x + (b1 + b2 + b3)

Therefore the three convs and three biases can be merged into one conv and one bias, turning the multi-branch structure into a plain one.
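The whole re-parameterization can be checked numerically. A single-channel NumPy sketch (my illustration; BN is assumed already fused into each branch's weight and bias as above):

```python
import numpy as np

def conv2d_same(x, k, b=0.0):
    # Naive 'same'-padded 2D correlation with stride 1 (single channel).
    kh, kw = k.shape
    xp = np.pad(x, kh // 2)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out + b

def pad_1x1_to_3x3(k1):
    # Zero-pad a 1x1 kernel into the centre of a 3x3 kernel.
    k3 = np.zeros((3, 3))
    k3[1, 1] = k1[0, 0]
    return k3

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k3, b3 = rng.standard_normal((3, 3)), 0.2    # 3x3 branch
k1, b1 = rng.standard_normal((1, 1)), -0.1   # 1x1 branch
kid    = pad_1x1_to_3x3(np.ones((1, 1)))     # identity as a 3x3 kernel

# Multi-branch (training-time) output:
multi = conv2d_same(x, k3, b3) + conv2d_same(x, k1, b1) + x

# Re-parameterized plain (inference-time) output: one 3x3 conv, one bias.
k = k3 + pad_1x1_to_3x3(k1) + kid
b = b3 + b1
plain = conv2d_same(x, k, b)
print(np.allclose(multi, plain))   # True
```

With multiple channels, the identity kernel becomes a stack of 3x3 kernels whose centre entries form an identity matrix over channels, exactly as described above.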

4.4 Architectural Specification


Unlike VGG, it does not use max pooling; the network is simply a stack of 3x3 convs.

A multiplier a is used to scale the first four stages and b for the last stage, usually with b > a, because we desire the last layer to have richer features for classification or other downstream tasks.

To further reduce the parameter count, the authors optionally interleave groupwise 3x3 conv layers with dense ones, trading accuracy for efficiency:

  • layers 3, 5, 7, …, 21 for RepVGG-A
  • layers 3, 5, 7, …, 27 for RepVGG-B

The number of groups g is set to 1, 2, or 4.

5 Experiments

5.1 Datasets

  • ImageNet
  • COCO

5.2 RepVGG for ImageNet Classification


Wino MULs is a better proxy than FLOPs for speed on GPUs.

VGG-16 has more FLOPs than ResNet-152 but fewer Wino MULs (a multiplication is much slower than an addition), so VGG-16 ends up faster at inference (remarkably, the 16-layer model has more FLOPs than the 152-layer one).

Of course, the gold standard is still the actual measured speed.


g4 means groups = 4.

Compared to RegNetX-12GF, RepVGG-B3 runs 31% faster (though its accuracy is slightly lower), which is the first time plain models have caught up with the state of the art.

5.3 Structural Reparameterization is the Key


  • Identity w/o BN
  • Post-addition BN: the position of BN is changed from pre-addition to post-addition
  • +ReLU in branches: inserts ReLU into each branch (after BN and before addition)

5.4 Semantic Segmentation

6 Conclusion (own) / Future work

  • multiplications are much more time-consuming than additions

  • RepVGG-style structural re-parameterization is not a generic over-parameterization technique, but a methodology critical for training powerful plain ConvNets.

  • RepVGG models are more parameter-efficient than ResNets but may be less
    favored than the mobile-regime models like MobileNets and ShuffleNets for low-power devices
