






DL: InceptionV2/V3: An Introduction to the InceptionV2 & InceptionV3 Algorithms (Paper Overview), Architecture Details, and Case Applications, an Illustrated Guide



       Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
       We have presented a novel mechanism for dramatically accelerating the training of deep networks. It is based on the premise that covariate shift, which is known to complicate the training of machine learning systems, also applies to sub-networks and layers, and removing it from internal activations of the network may aid in training. Our proposed method draws its power from normalizing activations, and from incorporating this normalization in the network architecture itself. This ensures that the normalization is appropriately handled by any optimization method that is being used to train the network. To enable stochastic optimization methods commonly used in deep network training, we perform the normalization for each mini-batch, and backpropagate the gradients through the normalization parameters. Batch Normalization adds only two extra parameters per activation, and in doing so preserves the representation ability of the network. We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and often do not require Dropout for regularization.
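The per-mini-batch normalization and the "two extra parameters per activation" described above can be illustrated with a minimal sketch. This is pure Python for a single activation; the function name `batch_norm_train`, the default `eps`, and the sample batch are assumptions made for illustration, not code from the paper.

```python
import math

def batch_norm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time BN for ONE activation over a mini-batch (illustrative sketch).

    xs    : values of the same activation across the mini-batch
    gamma : learned scale (one of the two extra parameters per activation)
    beta  : learned shift (the other extra parameter)
    eps   : small constant for numerical stability (assumed value)
    """
    m = len(xs)
    mean = sum(xs) / m
    # Biased (divide-by-m) mini-batch variance.
    var = sum((x - mean) ** 2 for x in xs) / m
    # Normalize to roughly zero mean and unit variance.
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in xs]
    # Scale and shift so the layer keeps its representation ability.
    ys = [gamma * xh + beta for xh in x_hat]
    return ys, mean, var

batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical activation values
ys, mean, var = batch_norm_train(batch)
out_mean = sum(ys) / len(ys)
out_var = sum((y - out_mean) ** 2 for y in ys) / len(ys)
print(mean, var)          # mini-batch statistics
print(out_mean, out_var)  # close to 0 and 1 after normalization
```

In a real network the gradients would also be backpropagated through `mean` and `var`, which this forward-only sketch does not show.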
       Merely adding Batch Normalization to a state-of-the-art image classification model yields a substantial speedup in training. By further increasing the learning rates, removing Dropout, and applying other modifications afforded by Batch Normalization, we reach the previous state of the art with only a small fraction of training steps – and then beat the state of the art in single-network image classification. Furthermore, by combining multiple models trained with Batch Normalization, we perform better than the best known system on ImageNet, by a significant margin.
       Interestingly, our method bears similarity to the standardization layer of (Gülçehre & Bengio, 2013), though the two methods stem from very different goals, and perform different tasks. The goal of Batch Normalization is to achieve a stable distribution of activation values throughout training, and in our experiments we apply it before the nonlinearity since that is where matching the first and second moments is more likely to result in a stable distribution. On the contrary, (Gülçehre & Bengio, 2013) apply the standardization layer to the output of the nonlinearity, which results in sparser activations. In our large-scale image classification experiments, we have not observed the nonlinearity inputs to be sparse, neither with nor without Batch Normalization. Other notable differentiating characteristics of Batch Normalization include the learned scale and shift that allow the BN transform to represent identity (the standardization layer did not require this since it was followed by the learned linear transform that, conceptually, absorbs the necessary scale and shift), handling of convolutional layers, deterministic inference that does not depend on the mini-batch, and batch-normalizing each convolutional layer in the network.
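Two of the properties mentioned above, the learned scale and shift that let the BN transform represent identity, and inference that does not depend on the mini-batch, can be checked with a small sketch. The helper `batch_norm_infer` and the population statistics used here are hypothetical values chosen for illustration.

```python
import math

def batch_norm_infer(x, pop_mean, pop_var, gamma, beta, eps=1e-5):
    """Inference-time BN for one value, using fixed population statistics.

    Because pop_mean and pop_var are constants at inference time, the output
    for x depends only on x, not on other examples in the batch.
    """
    return gamma * (x - pop_mean) / math.sqrt(pop_var + eps) + beta

# Hypothetical population statistics for one activation.
pop_mean, pop_var, eps = 2.5, 1.25, 1e-5

# Setting gamma = sqrt(var + eps) and beta = mean undoes the normalization,
# so the BN transform can represent the identity function.
gamma_id = math.sqrt(pop_var + eps)
beta_id = pop_mean

inputs = [1.0, 2.0, 3.0, 4.0]
restored = [
    batch_norm_infer(x, pop_mean, pop_var, gamma_id, beta_id, eps)
    for x in inputs
]
print(restored)  # approximately equal to the inputs
```

In practice `gamma` and `beta` are learned, so the network can recover the identity only if that turns out to be useful for the loss.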
       In this work, we have not explored the full range of possibilities that Batch Normalization potentially enables. Our future work includes applications of our method to Recurrent Neural Networks (Pascanu et al., 2013), where the internal covariate shift and the vanishing or exploding gradients may be especially severe, and which would allow us to more thoroughly test the hypothesis that normalization improves gradient propagation (Sec. 3.3). We plan to investigate whether Batch Normalization can help with domain adaptation, in its traditional sense – i.e. whether the normalization performed by the network would allow it to more easily generalize to new data distributions, perhaps with just a recomputation of the population means and variances (Alg. 2). Finally, we believe that further theoretical analysis of the algorithm would allow still more improvements and applications.

Sergey Ioffe, Christian Szegedy.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,




Batch Normalization serves two purposes: it speeds up training and convergence, and it helps prevent overfitting.



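  • Batch Normalization (BN). Significance: BN has become a standard technique in almost all convolutional neural networks.
  • One 5x5 convolution kernel → two 3x3 convolution kernels, with the same receptive field.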
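The receptive-field claim in the second bullet can be verified with a little arithmetic. The sketch below assumes stride-1 convolutions and a hypothetical channel count of 64 (not taken from the source), and also compares weight counts, ignoring biases.

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each stride-1 layer widens the field by k - 1
    return rf

def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one square convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

channels = 64  # hypothetical channel count, for illustration only

rf_5x5 = receptive_field([5])
rf_3x3_twice = receptive_field([3, 3])

w_5x5 = conv_weights(5, channels, channels)
w_3x3_twice = 2 * conv_weights(3, channels, channels)
saving = 1 - w_3x3_twice / w_5x5

print(rf_5x5, rf_3x3_twice)  # both cover a 5x5 input region
print(w_5x5, w_3x3_twice)    # 25*C*C versus 18*C*C weights
print(round(saving, 2))      # fraction of weights saved
```

Under these assumptions the two stacked 3x3 layers see the same 5x5 region while using 18/25 of the weights, and they add one extra nonlinearity between the layers.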


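Comparison of Batch-Normalized Inception with the previous state of the art on the provided validation set of 50,000 images. *According to the test server, the BN-Inception ensemble reached 4.82% top-5 error on the 100,000 images of the ImageNet test set.
         Here, BN-Inception Ensemble refers to the result obtained by combining multiple network models through ensemble learning.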
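One simple way to combine several models is to average their predicted class probabilities and rank classes by the averaged score. The sketch below assumes that averaging scheme; the three "models" and their three-class outputs are made-up numbers for illustration, not results from the paper.

```python
def ensemble_average(prob_lists):
    """Arithmetic mean of class probabilities across models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]

def top_k(probs, k):
    """Indices of the k highest-scoring classes, best first."""
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return ranked[:k]

# Hypothetical outputs of three models for one image over three classes.
model_probs = [
    [0.10, 0.60, 0.30],
    [0.20, 0.30, 0.50],
    [0.10, 0.50, 0.40],
]

avg = ensemble_average(model_probs)
top2 = top_k(avg, 2)
print(avg)
print(top2)  # class indices ranked by averaged probability
```

A top-5 error is computed the same way with `k=5`: a prediction counts as correct when the true label appears among the five highest-ranked classes.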


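TF DD: Generating the original Deep Dream image with the Inception model + GD algorithm
TF DD: Generating larger, more refined Deep Dream images with the Inception model + GD algorithm
TF DD: Generating higher-quality Deep Dream images with the Inception model + GD algorithm
TF DD: Generating large, high-quality Deep Dream images with a background using the Inception model + GD algorithm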

