CVPR-2020

Table of Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Design Space Design
- 4.1 Tools for Design Space Design
- 4.2 The AnyNet Design Space
- 4.3 The RegNet Design Space
- 4.4 Design Space Generalization
5 Experiments
- 5.1 Datasets
- 5.2 Analyzing the RegNetX Design Space
- 5.3 Comparison to Existing Networks
- 5.4 Appendix
6 Conclusion（own） / Future work

1 Background and Motivation

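This paper presents a new network design paradigm.

The approach resembles hand-crafting a set of network design guidelines. AutoML, by contrast, learns a single network for a specific dataset and a predefined set of operators (finding the best network instances within a fixed, manually designed search space, which we call a design space). The authors instead design the guidelines themselves, from which many well-performing networks can be derived: they design network design spaces (not just particular network instantiations, but also design principles).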

Our goal is to help advance the understanding of network design and discover design principles that generalize across settings.

2 Related Work

Manual network design
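Networks from earlier years all fall into this category, e.g. VGG / ResNet / DenseNet / MobileNet
In fact, our methodology is analogous to manual design but performed at the design space level (still manual, but what gets designed is a set of network design guidelines rather than a single network)
Automated network design
NAS and similar methods
Network scaling
e.g. EfficientNet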
Comparing networks
a methodology for comparing and analyzing populations of networks sampled from a design space（distribution-level view）
Parameterization

3 Advantages / Contributions

The paper proposes a set of design guidelines and uses them to derive a family of networks of different sizes: RegNet

outperform the popular EfficientNet（SOTA） models while being up to 5× faster on GPUs

4 Design Space Design

The overall process is analogous to manual design, elevated to the population level (a family of networks rather than a single network) and guided via distribution estimates of network design spaces

RegNet: regular network
AnyNet: widths and depths vary freely across stages

The resulting design space is meant to generalize across compute regimes, schedule lengths, and network block types

4.1 Tools for Design Space Design

The quality of a design space is quantified by sampling a set of models from that design space and characterizing the resulting model error distribution

comparing distributions is more robust and informative than using search

error empirical distribution function (EDF)
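As a quick illustration, the error EDF is the fraction of sampled models whose error falls below a threshold $e$, i.e. $F(e) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[e_i < e]$. Below is a minimal sketch of that computation; the function name `error_edf` and the five error values are made up for the example.

```python
import numpy as np

def error_edf(errors, thresholds):
    """Fraction of sampled models whose error is below each threshold."""
    errors = np.asarray(errors, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Compare every threshold with every model error, then average over models.
    return (errors[None, :] < thresholds[:, None]).mean(axis=1)

# Hypothetical top-1 errors (%) of five sampled models.
errors = [42.0, 38.5, 55.1, 47.3, 40.2]
edf = error_edf(errors, thresholds=[35.0, 41.0, 50.0, 60.0])
print(edf)
```

A design space whose EDF curve sits further to the left (more models below a given error) is the better one, which is how two design spaces are compared throughout the paper.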

EDF for n = 500 sampled models from the AnyNetX design space

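In the legend, [x | y] gives the min error and the mean error respectively

In the distribution plots, "all" is the dark blue points, "good" is the light blue region, and "best" is the black line

MF = million flops = $10^6$

GF = giga flops = $10^9$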

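The procedure for designing a design space is as follows

The details of the EDF computation are as follows

The bootstrap used here samples with replacement

95% CI (Confidence Interval)

What a 95% CI means in statistics

The computation is the one on the x-axis of the figure above: it involves the mean and the variance, plus the key constant 1.96

How to interpret a 95% confidence interval

In other words, a 95% confidence interval is a range that estimates the population mean. If we repeat the experiment 100 times and build an interval each time, about 95 of those intervals are expected to contain the true mean and about 5 are not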
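To make the two ideas concrete (the 1.96 constant and sampling with replacement), here is a minimal sketch that computes a 95% CI for a mean in two ways: the normal approximation, mean ± 1.96 · standard error, and a percentile bootstrap. It is a generic illustration, not necessarily the exact procedure used in the paper; the function names and the synthetic error values are assumptions made for the example.

```python
import numpy as np

def normal_ci95(samples):
    """Normal-approximation 95% CI for the mean: mean +/- 1.96 * standard error."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
    return mean - 1.96 * sem, mean + 1.96 * sem

def bootstrap_ci95(samples, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% CI for the mean, resampling with replacement."""
    x = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of the same size as the data.
    idx = rng.integers(0, len(x), size=(n_resamples, len(x)))
    means = x[idx].mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return lo, hi

# Synthetic, hypothetical model errors (%), only for demonstration.
rng = np.random.default_rng(1)
errors = rng.normal(loc=45.0, scale=3.0, size=200)
normal_lo, normal_hi = normal_ci95(errors)
boot_lo, boot_hi = bootstrap_ci95(errors)
print(normal_lo, normal_hi, boot_lo, boot_hi)
```

For a reasonably large, well-behaved sample the two intervals should come out close to each other; the bootstrap version has the advantage of not assuming normality.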

4.2 The AnyNet Design Space

The AnyNet network structure

stem (downsamples once)
body (downsamples four times, 4 stages)
head

We keep the stem and head fixed and as simple as possible, and instead focus on the structure of the network body

The design space the authors design is concentrated in the body:

stage $i$, 4 stages in total
the number of blocks $d_i \leqslant 16$, 16 options
block widths $w_i \leqslant 1024$ and divisible by 8, 128 options
bottleneck ratios $b_i \in \{1,2,4\}$, 3 options
group widths $g_i \in \{1,2,4,8,16,32\}$, 6 options
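The degrees of freedom listed above can be sketched as a small sampler that draws one AnyNetX configuration. This is only an illustration: the function name `sample_anynetx` is hypothetical, and uniform sampling is an assumption made here for simplicity (it is not claimed to be the sampling scheme used in the paper).

```python
import random

def sample_anynetx(seed=None, num_stages=4):
    """Draw one hypothetical AnyNetX configuration: (d, w, b, g) per stage."""
    rng = random.Random(seed)
    stages = []
    for _ in range(num_stages):
        stages.append({
            "d": rng.randint(1, 16),                 # number of blocks, d_i <= 16
            "w": 8 * rng.randint(1, 128),            # block width, <= 1024, divisible by 8
            "b": rng.choice([1, 2, 4]),              # bottleneck ratio
            "g": rng.choice([1, 2, 4, 8, 16, 32]),   # group width
        })
    return stages

config = sample_anynetx(seed=0)
for i, stage in enumerate(config, start=1):
    print(f"stage {i}: {stage}")
```

Each stage has 16 · 128 · 3 · 6 possible settings and the four stages are independent, which is where the size of the unconstrained space comes from.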

The authors' goals

to simplify the structure of the design space,
to improve the interpretability of the design space,
to improve or maintain the design space quality,
to maintain model diversity in the design space.

The successive design spaces

1）AnyNetXA

Search space size: $(16 \cdot 128 \cdot 3 \cdot 6)^4 \approx 1.85 \cdot 10^{18}$

2）AnyNetXB

shared bottleneck ratio $b_i = b$ for all stages $i$

The comparison between AnyNetXA and AnyNetXB is shown in Figure 5 (left)

$b \leqslant 2$ is best in this regime (right panel, again drawn as all / good / best)

The best value is 1, which means neither a bottleneck nor an inverted bottleneck

3）AnyNetXC
use a shared group width $g_i = g$ for all stages to obtain AnyNetXC

The comparison between AnyNetXB and AnyNetXC is shown in Figure 5 (middle)

find $g > 1$ is best (not shown in the figure; in other words, depthwise conv is not optimal)

The authors further examine typical network structures of both good and bad networks from AnyNetXC

They find that the better networks follow $w_{i+1} \geq w_i$, so this constraint is added to obtain AnyNetXD

4）AnyNetXD

$w_{i+1} \geq w_i$

See Figure 7 (left)

5）AnyNetXE
$d_{i+1} \geq d_i$

See Figure 7 (right)

The search space becomes $3876 \cdot 11716640 \cdot 3 \cdot 6 \approx 8.17 \cdot 10^{11}$

3876 is the number of non-decreasing length-4 sequences over the 16 depth options, $\binom{19}{4}$
11716640 is obtained the same way from the 128 width options, $\binom{131}{4}$ (this differs slightly from the authors' Table 1)

As the authors describe, this is a cumulative reduction of $O(10^7)$ from AnyNetXA ($10^{18} \to 10^{11}$)
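The counting behind these numbers can be checked with a few lines of Python. The number of non-decreasing sequences of length $k$ over $n$ options is $\binom{n+k-1}{k}$, so the depth constraint leaves $\binom{19}{4} = 3876$ sequences and the width constraint leaves $\binom{131}{4} = 11716640$. The constant and function names below are made up for the sketch, and this exact count is this note's own estimate rather than the figure in the paper's table.

```python
from math import comb

# Options per stage, taken from the AnyNetX definition above.
D_CHOICES, W_CHOICES, B_CHOICES, G_CHOICES, STAGES = 16, 128, 3, 6, 4

def monotone_sequences(n_values, length):
    """Number of non-decreasing sequences of `length` items over `n_values` options."""
    return comb(n_values + length - 1, length)

# AnyNetXA: every stage chooses d, w, b, g independently.
size_xa = (D_CHOICES * W_CHOICES * B_CHOICES * G_CHOICES) ** STAGES

# AnyNetXE: shared b and g, non-decreasing widths and depths across stages.
depth_seqs = monotone_sequences(D_CHOICES, STAGES)
width_seqs = monotone_sequences(W_CHOICES, STAGES)
size_xe = depth_seqs * width_seqs * B_CHOICES * G_CHOICES

print(f"AnyNetXA: {size_xa:.3e}")
print(f"AnyNetXE: {size_xe:.3e}")
print(f"reduction factor: {size_xa / size_xe:.1e}")
```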

4.3 The RegNet Design Space

$i$ and $j$ to index over stages and blocks (i.e. depth; the smallest unit of depth is a block, not a conv)

$w_j = 48 \cdot (j+1)$ for $0 \leqslant j \leqslant 20$ (solid black curve, please note that the y-axis is logarithmic; this point is key, otherwise you will wonder why the black line in the top-left panel of Figure 8 is visibly curved even though it comes from a linear equation).

linear parameterization for block widths

$u_j = w_0 + w_a \cdot j \quad \text{for} \quad 0 \leqslant j < d$

$j$ is the block index, i.e. the depth position in the network (the smallest depth unit is a block, not a single conv)
$w_0$: initial width (number of channels)
$w_a$: slope

From the article "An overall reading of Designing Network Design Spaces (one article beats six)"

Different depths, different initial widths, and different width growth rates (per block) generate a different block width $u_j$

block widths

$u_j = w_0 \cdot w_m^{s_j}$

$u_j$ is re-expressed in terms of $w_0$, $w_m$ and $s_j$: given $w_0$ and $w_a$ we can compute $u_j$, and then, together with $w_m$, solve for $s_j$

quantized per-block widths $w_j$

$w_j = w_0 \cdot w_m^{\lfloor s_j \rceil}$

where $\lfloor \cdot \rceil$ denotes rounding; once $s_j$ is rounded, the block widths of the actual network follow

This is why the figure lists $w_a$, $w_0$ and $w_m$ together

The rounding $\lfloor \cdot \rceil$ appears to be the core operation behind the quantized linear constraint

The x-axis is the block index $j$ (i.e. depth), the y-axis is $w_j$ (number of channels)
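The three steps (linear widths $u_j$, solving for $s_j$, rounding) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `regnet_widths` is hypothetical, the final rounding to a multiple of `q = 8` is an assumption carried over from the AnyNet rule that widths are divisible by 8, and the parameter values in the usage line are illustrative.

```python
import numpy as np

def regnet_widths(w_0, w_a, w_m, d, q=8):
    """Quantized per-block widths from the linear parameterization."""
    j = np.arange(d)
    u = w_0 + w_a * j                          # linear widths u_j = w_0 + w_a * j
    s = np.log(u / w_0) / np.log(w_m)          # solve u_j = w_0 * w_m ** s_j for s_j
    w = w_0 * np.power(w_m, np.round(s))       # quantize: w_j = w_0 * w_m ** round(s_j)
    w = (np.round(w / q) * q).astype(int)      # assumption: snap to multiples of q
    # Blocks sharing a width form one stage; widths are non-decreasing by construction.
    stage_widths, stage_depths = np.unique(w, return_counts=True)
    return w.tolist(), stage_widths.tolist(), stage_depths.tolist()

# Illustrative parameter values (assumed for the example).
widths, stage_widths, stage_depths = regnet_widths(w_0=24, w_a=36.44, w_m=2.49, d=13)
print(widths)
print(stage_widths, stage_depths)
```

Because $s_j$ is rounded to an integer, consecutive blocks collapse onto the same width, so the per-stage widths and depths fall out of the per-block widths without being specified separately.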

RegNet = AnyNetXE + a constraint tying $w$ to $d$ ($d < 64$, $w_0, w_a < 256$, $1.5 \leqslant w_m \leqslant 3$ and $b$ and $g$ as before)

Setting $w_0 = w_a$ performs notably well

4.4 Design Space Generalization

R block: same as the X block except without groups,
V block: a basic block with only a single 3×3 conv,
VR block: same as V block plus residual connections.

5 Experiments

5.1 Datasets

ImageNet
ImageNetv2

5.2 Analyzing the RegNetX Design Space

1）RegNet trends

optimal depth of ~20 blocks (60 layers)
use a bottleneck ratio $b$ of 1.0 (neither a bottleneck nor an inverted bottleneck)
$w_m$ of good models is ~2.5 (not identical to the popular recipe of doubling widths across stages)

$g, w_a, w_0$ increase with complexity

2）Complexity analysis

activations, which we define as the size of the output tensors of all conv layers

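In the top-left panel, $w$ is the number of channels, $r^2$ is the spatial resolution, and note that $g$ is the group width, not the number of groups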
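To see how these symbols combine, here is a minimal sketch of the per-layer cost of a single conv under the usual counting conventions (one multiply-add per weight per output position, biases ignored). The function name `conv_complexity` and the example sizes are assumptions for illustration; the formulas are the standard ones for grouped convolution rather than a transcription of the figure.

```python
def conv_complexity(w, r, k=3, g=None):
    """Cost of a k x k conv with w input and w output channels on an r x r feature map.

    g is the group width (channels per group); g=None means a standard conv (g = w),
    and g=1 corresponds to a depthwise conv.
    """
    g = w if g is None else g
    params = k * k * w * g        # each output channel sees g input channels
    flops = params * r * r        # weights are applied at every output position
    activations = w * r * r       # size of the output tensor
    return {"flops": flops, "params": params, "activations": activations}

dense = conv_complexity(w=64, r=56)
grouped = conv_complexity(w=64, r=56, g=8)
depthwise = conv_complexity(w=64, r=56, g=1)
print(dense, grouped, depthwise, sep="\n")
```

Shrinking the group width cuts flops and parameters but leaves the activation count unchanged, which is one way to see why flops alone can be a poor proxy for runtime.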

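In the last two panels of the top row, $r$ is most likely the correlation coefficient between the two axes (the figure is about how well flops and activations correlate with runtime), not a slope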

activations increase with the square-root of flops, parameters increase linearly, and runtime is best modeled using both a linear and a square-root term due to its dependence on both flops and activations.

The x and y axes use different scales, which is why the linear relationship looks curved

3）RegNetX constrained

Starting from RegNetX Unconstrained, add the following two restrictions

set $b = 1$, $d \leq 40$, and $w_m \geq 2$
we limit parameters and activations, following Figure 12 (bottom). (It looks like the range is clamped to the "good" models; it can hardly be "best", haha)

which gives RegNetX Constrained

4）Alternate design choices

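The left plot shows that the inverted bottleneck degrades the EDF slightly and depthwise conv performs even worse by comparison

The middle plot shows that a fixed resolution of 224×224 is best

5）SE

X + SE is denoted Y (RegNetX with SE becomes RegNetY)

The right plot of Figure 14 shows that SE is quite effective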

5.3 Comparison to Existing Networks

First, some characteristics of RegNetX and RegNetY

higher flop models have a large number of blocks in the third stage and a small number of blocks in the last stage. (Counting the stem, these are the 1/16 and 1/32 resolution stages; the optimization in [FD-MobileNet] "FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy" follows the same pattern)

the group width $g$ increases with complexity, but depth $d$ saturates for large models.

1）State-of-the-Art Comparison: Mobile Regime

2）Standard Baselines Comparison: ResNe(X)t

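Overall, RegNet is somewhat better than ResNet and ResNeXt

3）State-of-the-Art Comparison: Full Regime

At low flops EfficientNet is stronger than the authors' models; as flops go up, RegNet is stronger

A major advantage of RegNet is speed, in both training and inference

The grayed results in the last column take some additional training tricks into account, as follows

EfficientNet is still strong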

5.4 Appendix

1）Test Set Evaluation

Validate the results on the ImageNetV2 dataset

The trend is consistent with ImageNet

2）Additional Ablations

Fixed depth
Roughly the same
Fewer stages
4 stages are better than 3 stages
Inverted Bottleneck
$b < 1$ degrades results further
Swish vs. ReLU

The middle plot of Figure 20 shows that at low flops, without depthwise conv, Swish is somewhat better than ReLU

The right plot of Figure 20 shows that if $g$ is restricted to be 1 (depthwise conv), Swish performs much better than ReLU

3）Optimization Settings

6 Conclusion（own） / Future work

widths and depths of good networks can be explained by a quantized linear function.
the depth of the best models is stable across compute regimes (~20 blocks) and that the best models do not use either a bottleneck or inverted bottleneck.
We highlight the improvements for fixed activations, which is of high practical interest as the number of activations can strongly influence the runtime on accelerators such as GPUs.
regime = flops

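Excerpts from some excellent write-ups

"An overall reading of Designing Network Design Spaces (one article beats six)"

Note: changing the image resolution from 224 to 448 has little effect in the paper

This EfficientNet fan made me laugh, haha. Indeed, EfficientNet performs better than RegNet in the smaller regimes (it came out of AutoML after all, so its results are solid)

Backbone series - RegNet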
