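U-Net was originally a convolutional neural network for 2D image segmentation, and it won both the ISBI 2015 Cell Tracking Challenge and the caries detection challenge. U-Net extends and modifies the fully convolutional network (FCN). The network consists of two parts: a contracting path that captures context and a symmetric expanding path that enables precise localization. Let's go through the paper in detail.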


1. Abstract

There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net

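  • To use the annotated data more efficiently, the authors rely heavily on data augmentation.
  • The network consists of two parts: a contracting path that captures context and a symmetric expanding path that enables precise localization.

The network can be trained end-to-end from very few images.
The network is very fast.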
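This excerpt stresses data augmentation but does not say which transforms are used. As a minimal sketch, assuming plain horizontal flips and 90-degree rotations (an illustrative choice, not necessarily the authors' scheme), the essential rule is that any geometric transform must be applied identically to the image and its label mask so every label stays on its pixel. `hflip`, `rot90` and `augment_pair` are hypothetical helper names.

```python
import random


def hflip(grid):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment_pair(image, mask, rng):
    """Apply one random flip/rotation to BOTH the image and its mask.

    Assumption: flips and 90-degree rotations stand in for whatever
    augmentation is actually used; what matters is the shared transform.
    """
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    for _ in range(rng.randrange(4)):
        image, mask = rot90(image), rot90(mask)
    return image, mask


# Usage: identical toy image and mask must stay identical after augmentation.
rng = random.Random(0)
aug_image, aug_mask = augment_pair([[1, 2], [3, 4]], [[1, 2], [3, 4]], rng)
```

Sharing one random draw between image and mask is the design choice that keeps the pair consistent; drawing separately for each would silently corrupt the labels.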

2. Introduction

In the last two years, deep convolutional networks have outperformed the state of the art in many visual recognition tasks. While convolutional networks have already existed for a long time, their success was limited due to the size of the available training sets and the size of the considered networks. The breakthrough by Krizhevsky et al. was due to supervised training of a large network with 8 layers and millions of parameters on the ImageNet dataset with 1 million training images. Since then, even larger and deeper networks have been trained.

The typical use of convolutional networks is on classification tasks, where the output to an image is a single class label. However, in many visual tasks, especially in biomedical image processing, the desired output should include localization, i.e., a class label is supposed to be assigned to each pixel. Moreover, thousands of training images are usually beyond reach in biomedical tasks.
Hence, Ciresan et al. trained a network in a sliding-window setup to predict the class label of each pixel by providing a local region (patch) around that pixel as input. First, this network can localize. Secondly, the training data in terms of patches is much larger than the number of training images. The resulting network won the EM segmentation challenge at ISBI 2012 by a large margin.
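The sliding-window setup described above can be sketched as follows: every pixel whose full patch fits inside the image contributes one (patch, label) training pair, which is why patches vastly outnumber images. This is a minimal illustration, assuming odd patch sizes and simply skipping border pixels instead of padding; the function names and the 65-pixel patch size are hypothetical, not taken from Ciresan et al.

```python
def patch_at(image, row, col, size):
    """Return the size x size patch centred on (row, col); size must be odd."""
    half = size // 2
    return [r[col - half:col + half + 1] for r in image[row - half:row + half + 1]]


def sliding_window_samples(image, labels, size):
    """Yield one (patch, label) pair per pixel whose full patch fits inside
    the image. Border pixels are skipped here (no padding)."""
    half = size // 2
    for row in range(half, len(image) - half):
        for col in range(half, len(image[0]) - half):
            yield patch_at(image, row, col, size), labels[row][col]


def count_samples(height, width, size):
    """Number of pairs sliding_window_samples produces for one image."""
    return max(0, height - size + 1) * max(0, width - size + 1)


# One 512x512 image with a hypothetical 65x65 patch gives 448 * 448 samples.
n = count_samples(512, 512, 65)
```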

Obviously, the strategy in Ciresan et al. has two drawbacks. First, it is quite slow because the network must be run separately for each patch (a patch has to be cropped around every single pixel), and there is a lot of redundancy due to overlapping patches. Secondly, there is a trade-off between localization accuracy and the use of context. Larger patches require more max-pooling layers that reduce the localization accuracy (the larger the cropped patch, the more fine local detail is lost), while small patches allow the network to see only little context.
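The redundancy can be made concrete with back-of-the-envelope arithmetic. Assuming a 512x512 image, a hypothetical 65x65 patch and no border padding (none of these numbers come from the paper), every valid centre pixel needs its own forward pass, so each input pixel ends up being read thousands of times.

```python
def sliding_window_redundancy(height, width, patch):
    """Return (forward passes, average reads per input pixel) for per-pixel
    sliding-window inference with an odd patch size and no padding."""
    passes = (height - patch + 1) * (width - patch + 1)
    pixels_read = passes * patch * patch
    return passes, pixels_read / (height * width)


passes, factor = sliding_window_redundancy(512, 512, 65)
print(passes, round(factor))  # 200704 3235
```

Because the factor grows with the square of the patch size, enlarging the patch to see more context makes the redundancy quickly worse, on top of the extra pooling it requires.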

The idea of FCN:

In this paper, we build upon a more elegant architecture, the so-called "fully convolutional network". We modify and extend this architecture such that it works with very few training images and yields more precise segmentations (see Figure 1). The main idea in fully convolutional network is to supplement a usual contracting network by successive layers (the right-hand side of the "U"), where pooling operators are replaced by upsampling operators. Hence, these layers increase the resolution of the output. In order to localize, high resolution features from the contracting path are combined with the upsampled output (skip connection). A successive convolution layer can then learn to assemble a more precise output based on this information.
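A minimal sketch of "replace pooling by upsampling, then combine with high-resolution features", assuming parameter-free nearest-neighbour upsampling and channel stacking. Both are illustrative stand-ins: U-Net itself uses a learned up-convolution, and the original FCN fuses coarse and fine predictions by adding score maps rather than stacking channels. A feature map here is a list of channels, each a list of rows; the function names are hypothetical.

```python
def upsample2x(channel):
    """Nearest-neighbour 2x upsampling of one channel (a list of rows)."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def upsample_and_combine(coarse_channels, skip_channels):
    """Upsample every coarse channel, then stack the result with the
    same-sized high-resolution channels from the contracting path."""
    up = [upsample2x(ch) for ch in coarse_channels]
    if len(up[0]) != len(skip_channels[0]):
        raise ValueError("spatial sizes must match before combining")
    return up + list(skip_channels)


# A 2x2 coarse channel becomes 4x4 and is stacked with one 4x4 skip channel.
combined = upsample_and_combine([[[1, 2], [3, 4]]], [[[0] * 4 for _ in range(4)]])
```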

The authors' modifications:

One important modification in our architecture is that in the upsampling part we have also a large number of feature channels, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, and yields a u-shaped architecture. The network does not have any fully connected layers and only uses the valid part of each convolution, i.e., the segmentation map only contains the pixels, for which the full context is available in the input image. This strategy allows the seamless segmentation of arbitrarily large images by an overlap-tile strategy (see Figure 2). To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images, since otherwise the resolution would be limited by the GPU memory.
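Extrapolating the missing border context "by mirroring the input image" can be sketched as reflection padding. This assumes the reflect convention in which the edge pixel itself is not repeated (a symmetric mirror that repeats the edge would also fit the description); `mirror_pad` is a hypothetical name.

```python
def mirror_pad(image, margin):
    """Extend an image (list of rows) by `margin` pixels on every side by
    reflecting it about its border; the edge pixel is not repeated."""
    height, width = len(image), len(image[0])
    if margin >= min(height, width):
        raise ValueError("margin must be smaller than the image size")

    def reflect(i, n):
        # Map an out-of-range index back into [0, n) by mirroring.
        if i < 0:
            return -i
        if i >= n:
            return 2 * (n - 1) - i
        return i

    return [
        [image[reflect(r, height)][reflect(c, width)]
         for c in range(-margin, width + margin)]
        for r in range(-margin, height + margin)
    ]


padded = mirror_pad([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1)
```

Mirroring keeps local image statistics (edges, textures) plausible near the border, which is usually a better guess for the missing context than padding with zeros.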

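Differences from FCN:

  • The network still has a large number of feature channels in the upsampling part, which lets it propagate context information to higher-resolution layers. As a result, the upsampling path is roughly symmetric to the downsampling path, giving a U-shaped architecture.
  • The network has no fully connected layers and uses only the valid part of each convolution, i.e., the segmentation map contains only those pixels whose full context is present in the input image. This allows an overlap-tile strategy to segment images of arbitrary size seamlessly (see Figure 2).
  • To predict pixels in the border region of the image, the missing context is filled in by mirroring the input image. This tiling approach is very useful for segmenting large images, because otherwise GPU memory would limit the image resolution.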

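  • Figure 2: Overlap-tile strategy for seamless segmentation of arbitrarily large images (here, segmentation of neuronal structures in EM stacks). Predicting the segmentation in the yellow area requires image data within the blue area as input. Missing input data is extrapolated by mirroring.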
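The overlap-tile strategy can be sketched as a tiling plan: cover the image with fixed-size output tiles, and feed each tile the larger input window around it, taken from the mirror-padded image. As an assumption, the sketch uses 388x388 output tiles with a 92-pixel context margin (572x572 inputs), the tile sizes shown in the paper's Figure 1; the function names and the "shift the last tile back" rule are illustrative choices.

```python
def tile_starts(length, tile):
    """Start offsets of output tiles that cover [0, length) along one axis.

    The last tile is shifted back to end exactly at `length`, so it overlaps
    its neighbour instead of running past the image.
    """
    if length < tile:
        raise ValueError("image is smaller than one output tile")
    starts = list(range(0, length - tile, tile))
    starts.append(length - tile)
    return starts


def overlap_tile_plan(height, width, out_tile, margin):
    """Return (row, col, in_size) for every tile.

    The output tile covers image rows row..row+out_tile and columns
    col..col+out_tile. Its input is the window of size in_size starting at
    (row, col) in the image mirror-padded by `margin` on every side, where
    in_size = out_tile + 2 * margin, so the window is centred on the tile.
    """
    in_size = out_tile + 2 * margin
    return [(r, c, in_size)
            for r in tile_starts(height, out_tile)
            for c in tile_starts(width, out_tile)]


plan = overlap_tile_plan(512, 512, 388, 92)  # four tiles, starting at 0 and 124
```

Only the tile size, not the whole image, has to fit in GPU memory, which is exactly why the text says tiling matters for large images.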

3. Network Architecture

The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels.
Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers. To allow a seamless tiling of the output segmentation map (see Figure 2), it is important to select the input tile size such that all 2x2 max-pooling operations are applied to a layer with an even x- and y-size.
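The size bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch assumes four pooling steps (`depth=4`), square inputs, and that each up-convolution exactly doubles the size; under those assumptions it reproduces the 23 convolutional layers and rejects input sizes that would send an odd size into a 2x2 max pooling. The function names are hypothetical.

```python
def unet_output_size(size, depth=4):
    """Output size of the U-Net described above for a square input, or None
    if some 2x2 max pooling would see an odd size (or a size drops to <= 0).

    Each unpadded 3x3 conv removes 2 pixels, pooling halves the size and
    each up-convolution doubles it.
    """
    for _ in range(depth):          # contracting path
        size -= 4                   # two valid 3x3 convs
        if size <= 0 or size % 2:
            return None
        size //= 2                  # 2x2 max pooling, stride 2
    size -= 4                       # two convs at the bottom of the "U"
    if size <= 0:
        return None
    for _ in range(depth):          # expansive path
        size = size * 2 - 4         # up-conv, then two valid 3x3 convs
    return size if size > 0 else None


def conv_layer_count(depth=4):
    """2 convs per contracting level, 2 at the bottom, per expansive level
    one 2x2 up-conv plus two 3x3 convs, and the final 1x1 conv."""
    return 2 * depth + 2 + 3 * depth + 1


print(unet_output_size(572))  # 388
print(unet_output_size(570))  # None: an odd size reaches a pooling layer
print(conv_layer_count())     # 23
```

The difference between input and output, (572 - 388) / 2 = 92 pixels per side, is the context margin that the overlap-tile strategy has to supply by mirroring at the image border.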

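The contracting path follows the typical architecture of a convolutional network:

It consists of a repeated block: each repetition has two 3x3 convolutions (no padding), each followed by a ReLU, and one 2x2 max pooling layer with stride 2. After every downsampling step the number of feature channels is doubled.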
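One repetition of this block can be sketched on a single channel: each valid 3x3 convolution (plus ReLU) shrinks the map by 2 pixels per dimension, and 2x2 max pooling with stride 2 halves it. This is a toy sketch, assuming one channel and a hand-picked kernel; real layers sum over many input channels and learn their kernels. The 64-channel starting width is an assumption consistent with the 64-component vectors mentioned for the final layer.

```python
def conv3x3_valid_relu(image, kernel):
    """Unpadded ('valid') 3x3 cross-correlation of one channel, then ReLU.
    The output is 2 pixels smaller in each dimension."""
    height, width = len(image), len(image[0])
    return [
        [max(0.0, sum(kernel[i][j] * image[r + i][c + j]
                      for i in range(3) for j in range(3)))
         for c in range(width - 2)]
        for r in range(height - 2)
    ]


def maxpool2x2(image):
    """2x2 max pooling with stride 2; height and width must be even."""
    if len(image) % 2 or len(image[0]) % 2:
        raise ValueError("2x2/stride-2 pooling needs even height and width")
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


# Channel widths per level if the first level has 64 channels (assumption).
channels = [64 * 2 ** level for level in range(5)]  # [64, 128, 256, 512, 1024]
```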

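The expansive path uses a similar repeated pattern:

Each step starts with an up-convolution (an upsampling of the feature map followed by a 2x2 convolution), which halves the number of feature channels and doubles the feature-map size. The result is then concatenated with the feature map from the corresponding step of the contracting path.
Because every unpadded convolution loses border pixels, the contracting-path feature map is slightly larger, so it is cropped before concatenation. Two 3x3 convolutions are then applied to the concatenated map.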
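Crop-and-concatenate can be sketched as a centre crop of each contracting-path channel to the upsampled size, followed by stacking along the channel axis. With a 572x572 input the sizes work out, for example, to a 64x64 map being cropped to 56x56. The sketch assumes square maps whose size difference is even (true for valid U-Net sizes); putting the skip channels first is an arbitrary choice, and the function names are hypothetical.

```python
def center_crop(channel, size):
    """Crop a square single-channel map (list of rows) to size x size
    around its centre."""
    offset = (len(channel) - size) // 2
    return [row[offset:offset + size] for row in channel[offset:offset + size]]


def crop_and_concat(skip_channels, up_channels):
    """Crop every contracting-path channel to the upsampled size, then
    concatenate along the channel axis (skip channels first)."""
    size = len(up_channels[0])
    return [center_crop(ch, size) for ch in skip_channels] + list(up_channels)


# Toy example: an 8x8 skip channel is cropped to 4x4 to match two up-channels.
skip = [[r * 8 + c for c in range(8)] for r in range(8)]
merged = crop_and_concat([skip], [[[0] * 4 for _ in range(4)], [[1] * 4 for _ in range(4)]])
```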

The final layer uses a 1x1 convolution that maps the 64-channel feature map to the desired number of classes (2 for binary segmentation).
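A 1x1 convolution is just the same linear map applied independently at every pixel: the C-dimensional feature vector becomes K class scores, and taking the per-pixel argmax turns scores into a label map. This is a minimal sketch, assuming a [C][H][W] list layout and given (not learned) weights; `conv1x1` and `argmax_labels` are hypothetical names.

```python
def conv1x1(channels, weights, bias):
    """1x1 convolution over a [C][H][W] feature map.

    weights is a K x C matrix and bias has K entries; the same linear map
    turns each pixel's C-dim feature vector into K class scores.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [[bias[k] + sum(w_kc * ch[r][c] for w_kc, ch in zip(weights[k], channels))
          for c in range(width)]
         for r in range(height)]
        for k in range(len(weights))
    ]


def argmax_labels(scores):
    """Per-pixel index of the highest-scoring class in a [K][H][W] map."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(num_classes), key=lambda k: scores[k][r][c])
             for c in range(width)]
            for r in range(height)]


# 64 input channels on a 1x1 image, 2 output classes (binary segmentation).
features = [[[0.5]] for _ in range(64)]
scores = conv1x1(features, [[0.1] * 64, [-0.1] * 64], [0.0, 0.0])
labels = argmax_labels(scores)
```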
