Multi-Scale Context Aggregation by Dilated Convolution

Yu, Fisher and Vladlen Koltun. “Multi-Scale Context Aggregation by Dilated Convolutions.” CoRR abs/1511.07122 (2016): n. pag.

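This post is mainly about understanding dilated (atrous) convolution and the receptive field.

Dilated convolution is a convolution scheme proposed for semantic segmentation, where downsampling reduces image resolution and discards information.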

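In FCN-style semantic segmentation, the output must have the same size as the input image. FCN, however, contains several pooling layers with stride > 1, so the higher the layer, the more of the original image each unit covers, i.e. the larger its receptive field. That gain is paid for by pooling away resolution and losing information from the original image, and because of the pooling layers the feature maps of later layers become smaller and smaller. Since computing the loss (among other things) requires the output to match the input size, the later layers of FCN must upsample the shrunken feature maps back to the original size. Upsampling cannot fully recover what pooling discarded, so information is lost and segmentation accuracy drops.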

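Without pooling layers, small kernels yield a very small receptive field. Enlarging the receptive field by using large kernels in the middle layers makes the computation explode and exhausts memory, because the middle layers usually have very many channels, e.g. 1024 or 2048, hundreds of times more than the 3 channels of the RGB input.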
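To make the cost argument concrete, the sketch below counts weights and multiply-accumulates for a single convolution layer. The 1024 channels, the 64×64 feature map, the kernel sizes and the function name `conv_cost` are illustrative assumptions, not figures from the paper.

```python
def conv_cost(kernel, in_ch, out_ch, out_h, out_w):
    """Return (weights, multiply-accumulates) for one conv layer, bias ignored."""
    weights = kernel * kernel * in_ch * out_ch
    macs = weights * out_h * out_w  # every output position applies all weights
    return weights, macs


# Assumed mid-network layer: 1024 -> 1024 channels on a 64x64 feature map.
small = conv_cost(3, 1024, 1024, 64, 64)
large = conv_cost(7, 1024, 1024, 64, 64)

print("3x3 weights:", small[0])
print("7x7 weights:", large[0])
print("cost ratio 7x7 / 3x3:", large[1] / small[1])
```

Cost grows with the square of the kernel side, so moving from 3×3 to 7×7 multiplies both weights and arithmetic by 49/9, and the channel counts multiply that again.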

Introducing Dilated Convolution

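Adding pooling layers loses information and lowers accuracy; leaving them out shrinks the receptive field, so the model cannot learn global context. What to do?
====> Start with a decision: no pooling layers. Without them, the small kernels in later layers have a small receptive field and no global view.
====> Since small kernels have a small receptive field, enlarge the kernel.
====> But the larger the kernel, the more computation.
====> Can the kernel grow, and the receptive field with it, while the computation stays the same?
====> Yes: make the tightly packed kernel "fluffier" while keeping the number of points it computes unchanged. The gaps that open up are filled with 0, and the convolution is computed exactly as before.
====> The "fluffier" kernel has a larger receptive field, and since the number of effective points is unchanged, so is the computation. In addition, each layer's feature map keeps its size (given suitable padding), so the image information is preserved. Three benefits at once. Done!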
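The "fill the gaps with zeros" picture can be checked directly. The following is a minimal pure-Python sketch, with hypothetical helper names `dilate_kernel` and `conv2d_valid`, showing that a dilated 3×3 convolution gives the same result as an ordinary convolution with the zero-stuffed 5×5 kernel, while still using only 9 non-zero weights.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring taps of a square kernel."""
    k = len(kernel)
    size = (k - 1) * rate + 1
    out = [[0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * rate][j * rate] = kernel[i][j]
    return out


def conv2d_valid(image, kernel, rate=1):
    """Cross-correlation without padding, with an optional dilation rate."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # input extent covered by the kernel
    rows = len(image) - span + 1
    cols = len(image[0]) - span + 1
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i * rate][c + j * rate]
            row.append(acc)
        result.append(row)
    return result


# Arbitrary 8x8 test image and 3x3 kernel (illustrative values).
image = [[(r * 7 + c * 3) % 11 for c in range(8)] for r in range(8)]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

dilated = conv2d_valid(image, kernel, rate=2)
stuffed_kernel = dilate_kernel(kernel, 2)
stuffed = conv2d_valid(image, stuffed_kernel, rate=1)
nonzero_taps = sum(1 for row in stuffed_kernel for v in row if v != 0)

print("same result:", dilated == stuffed)
print("stuffed kernel side:", len(stuffed_kernel), "non-zero taps:", nonzero_taps)
```

Practical implementations skip the zero positions rather than multiplying by them, which is why the arithmetic matches that of the original small kernel.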

F(dilated) = [pow(2, log2(dilated) + 2) - 1] × [pow(2, log2(dilated) + 2) - 1] = (4 × dilated - 1) × (4 × dilated - 1)

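Here dilated is the "radius" of each computed point in the kernel, i.e. the dilation rate, and F(dilated) is the receptive field size. For example, in panel (a) of the paper's Figure 1, dilated=1 and F(dilated) = 3×3; in panel (b), dilated=2 and F(dilated) = 7×7; in panel (c), dilated=4 and F(dilated) = 15×15. These sizes are cumulative: the layer in (b) is applied on top of (a), and the layer in (c) on top of (b).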
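The 3×3, 7×7 and 15×15 sizes can be reproduced by accumulating (kernel_size - 1) × dilation per layer, the usual receptive-field recurrence for stride-1 convolutions. This is a small sketch; the function name and the layer list are illustrative.

```python
def stacked_receptive_field(layers):
    """Receptive field per side after each stride-1 conv layer in a stack.

    layers: list of (kernel_size, dilation) tuples, applied in order.
    """
    field = 1  # a single input pixel before any convolution
    sizes = []
    for kernel_size, dilation in layers:
        field += (kernel_size - 1) * dilation
        sizes.append(field)
    return sizes


# Three 3x3 layers with dilation 1, 2, 4.
sizes = stacked_receptive_field([(3, 1), (3, 2), (3, 4)])
print(sizes)  # [3, 7, 15]
```

With 3×3 kernels and doubling dilations, each entry equals 4 × dilation - 1, so the receptive field grows exponentially with depth while the parameter count grows only linearly.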

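For a k×k kernel with dilation factor n, a single dilated kernel spans (k - 1) × n + 1 pixels per side. For the stacked case above, with k = 3 and dilations doubling up to n, the receptive field is (k + 1) × n - 1 = 4n - 1 per side.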
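For a single layer, the span covered by a k×k kernel with dilation n is commonly written as (k - 1) × n + 1. The sketch below assumes that definition (the helper name is hypothetical) and is separate from the stacked receptive field discussed above.

```python
def effective_kernel_size(kernel_size, dilation):
    """Side length of the input window covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


for k, n in [(3, 1), (3, 2), (3, 4), (7, 4)]:
    print(f"k={k}, dilation={n} -> {effective_kernel_size(k, n)}")
```

The last case, a 7×7 kernel with dilation 4, covers a 25×25 window, which is the size quoted for the `fc6` layer in the code listing.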

import warnings
import os

# Select the Theano backend before Keras is imported.
os.environ['KERAS_BACKEND'] = 'theano'

import keras.backend as K
K.set_image_dim_ordering('th')  # channels first: (batch, channels, height, width)

from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, Input
from keras.layers import Dropout, UpSampling2D, ZeroPadding2D
from keras.layers import Permute, Reshape, Activation
from keras.utils.data_utils import get_file
from datasets import CONFIG


# CITYSCAPES MODEL
def get_dilation_model_cityscapes(input_shape, classes):
    # Build the model
    model_in = Input(shape=input_shape)

    # Front end: VGG-16 style layers without padding
    h = Conv2D(64, (3, 3), activation='relu', name='conv1_1')(model_in)  # (,64,1394,1394)
    h = Conv2D(64, (3, 3), activation='relu', name='conv1_2')(h)         # (,64,1392,1392)
    h = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool1')(h)  # (,64,696,696)
    h = Conv2D(128, (3, 3), activation='relu', name='conv2_1')(h)        # (,128,694,694)
    h = Conv2D(128, (3, 3), activation='relu', name='conv2_2')(h)        # (,128,692,692)
    h = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool2')(h)  # (,128,346,346)
    h = Conv2D(256, (3, 3), activation='relu', name='conv3_1')(h)        # (,256,344,344)
    h = Conv2D(256, (3, 3), activation='relu', name='conv3_2')(h)        # (,256,342,342)
    h = Conv2D(256, (3, 3), activation='relu', name='conv3_3')(h)        # (,256,340,340)
    h = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool3')(h)  # (,256,170,170)
    h = Conv2D(512, (3, 3), activation='relu', name='conv4_1')(h)        # (,512,168,168)
    h = Conv2D(512, (3, 3), activation='relu', name='conv4_2')(h)        # (,512,166,166)
    h = Conv2D(512, (3, 3), activation='relu', name='conv4_3')(h)        # (,512,164,164)

    # Dilated convolutions. Without padding they shrink the map more than ordinary ones.
    h = Conv2D(512, (3, 3), dilation_rate=(2, 2), activation='relu', name='conv5_1')(h)  # (,512,160,160)
    h = Conv2D(512, (3, 3), dilation_rate=(2, 2), activation='relu', name='conv5_2')(h)  # (,512,156,156)
    h = Conv2D(512, (3, 3), dilation_rate=(2, 2), activation='relu', name='conv5_3')(h)  # (,512,152,152)
    # The dilated 7x7 kernel spans 25x25 pixels.
    h = Conv2D(4096, (7, 7), dilation_rate=(4, 4), activation='relu', name='fc6')(h)     # (,4096,128,128)
    h = Dropout(0.5, name='drop6')(h)
    h = Conv2D(4096, (1, 1), activation='relu', name='fc7')(h)   # (,4096,128,128)
    h = Dropout(0.5, name='drop7')(h)
    # classes=19: softmax is taken over channels at the end, so channels == classes.
    h = Conv2D(classes, (1, 1), name='final')(h)                 # (,19,128,128)

    # Context module: padding equals the dilation rate, so the size stays 128x128.
    h = Conv2D(classes, (3, 3), padding='same', activation='relu', name='ctx_conv1_1')(h)  # (,19,128,128)
    h = Conv2D(classes, (3, 3), padding='same', activation='relu', name='ctx_conv1_2')(h)  # (,19,128,128)
    h = ZeroPadding2D(padding=(2, 2))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(2, 2), activation='relu', name='ctx_conv2_1')(h)    # (,19,128,128)
    h = ZeroPadding2D(padding=(4, 4))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(4, 4), activation='relu', name='ctx_conv3_1')(h)    # (,19,128,128)
    h = ZeroPadding2D(padding=(8, 8))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(8, 8), activation='relu', name='ctx_conv4_1')(h)    # (,19,128,128)
    h = ZeroPadding2D(padding=(16, 16))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(16, 16), activation='relu', name='ctx_conv5_1')(h)  # (,19,128,128)
    h = ZeroPadding2D(padding=(32, 32))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(32, 32), activation='relu', name='ctx_conv6_1')(h)  # (,19,128,128)
    h = ZeroPadding2D(padding=(64, 64))(h)
    h = Conv2D(classes, (3, 3), dilation_rate=(64, 64), activation='relu', name='ctx_conv7_1')(h)  # (,19,128,128)
    h = ZeroPadding2D(padding=(1, 1))(h)
    h = Conv2D(classes, (3, 3), activation='relu', name='ctx_fc1')(h)  # (,19,128,128)
    h = Conv2D(classes, (1, 1), name='ctx_final')(h)                   # (,19,128,128)

    # the following two layers pretend to be a Deconvolution with grouping layer.
    # never managed to implement it in Keras
    # since it's just a gaussian upsampling trainable=False is recommended
    # The TensorFlow implementation uses tf.image.resize_bilinear for bilinear upsampling.
    h = UpSampling2D(size=(8, 8))(h)  # 8x upsampling: (,19,1024,1024)
    # Build a Gaussian upsampling with a convolution; trainable=False here.
    logits = Conv2D(classes, (16, 16), padding='same', use_bias=False,
                    trainable=False, name='ctx_upsample')(h)  # (,19,1024,1024)

    # Apply softmax over the channel axis.
    _, c, height, width = logits._keras_shape        # (None,19,1024,1024)
    x = Permute(dims=(2, 3, 1))(logits)              # (None,1024,1024,19)
    x = Reshape(target_shape=(height * width, c))(x) # (None,1048576,19)
    x = Activation('softmax')(x)                     # (None,1048576,19)
    x = Reshape(target_shape=(height, width, c))(x)  # (None,1024,1024,19)
    model_out = Permute(dims=(3, 1, 2))(x)           # (None,19,1024,1024)

    # Build the model
    model = Model(inputs=model_in, outputs=model_out, name='dilation_cityscapes')
    return model


# model function
def DilationNet(dataset, pretrained=True):
    """Initialise the model for the given dataset and load its weights."""
    classes = CONFIG[dataset]['classes']          # number of classes
    input_shape = CONFIG[dataset]['input_shape']  # 'th' ordering: channels first

    # get the model
    if dataset == 'cityscapes':
        model = get_dilation_model_cityscapes(input_shape=input_shape, classes=classes)
    else:
        # Only the Cityscapes model is defined in this listing.
        raise ValueError('unsupported dataset: %s' % dataset)

    # Load the weights
    if pretrained:
        assert K.image_dim_ordering() == 'th'
        # origin (the download URL) is not given in this listing.
        weights_path = get_file("cityscapes.h5", origin=None, cache_subdir='models')
        model.load_weights(weights_path)
    return model


if __name__ == '__main__':
    ds = 'cityscapes'  # choose between cityscapes, kitti, camvid, voc12
    # get the model
    model = DilationNet(dataset=ds)
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    model.summary()
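The shape comments in the listing can be sanity-checked without Keras. The sketch below replays the spatial size through the front end and the context module using the standard output-size arithmetic for unpadded, stride-1 convolutions and 2×2 pooling. The 1396 input side is an assumption inferred from the 1394 comment on `conv1_1` (the real value lives in `CONFIG`, which is not shown), and the helper names are hypothetical.

```python
def conv_out(size, kernel, dilation=1, padding=0):
    """Output side length of a stride-1 convolution."""
    return size + 2 * padding - ((kernel - 1) * dilation + 1) + 1


def pool_out(size, pool=2, stride=2):
    """Output side length of max pooling without padding."""
    return (size - pool) // stride + 1


size = 1396  # assumed input side, inferred from the conv1_1 comment
trace = []

for convs_in_block in (2, 2, 3):  # conv1_x, conv2_x, conv3_x, each followed by a pool
    for _ in range(convs_in_block):
        size = conv_out(size, 3)
    size = pool_out(size)
    trace.append(size)

for _ in range(3):                # conv4_x, ordinary 3x3
    size = conv_out(size, 3)
trace.append(size)

for _ in range(3):                # conv5_x, 3x3 with dilation 2
    size = conv_out(size, 3, dilation=2)
trace.append(size)

size = conv_out(size, 7, dilation=4)  # fc6, 7x7 with dilation 4
trace.append(size)

# Context module: zero padding p followed by a 3x3 conv with dilation p.
context_sizes = [conv_out(size, 3, dilation=p, padding=p) for p in (2, 4, 8, 16, 32, 64)]

print(trace)          # [696, 346, 170, 164, 152, 128]
print(context_sizes)  # every entry is 128
print(size * 8)       # 1024 after the 8x upsampling
```

Padding each context layer by its dilation rate cancels the shrinkage exactly, which is how the context module keeps the 128×128 resolution while its receptive field keeps growing.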
