[CV Paper Notes]

  • Paper title: Backpropagation Applied to Handwritten Zip Code Recognition (Yann LeCun), 1989
  • Summary: applies the backpropagation algorithm to handwritten zip code recognition
  • Paper link: http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
  • Backup link: https://download.csdn.net/download/HaoZiHuang/12272643

Note: the author marks the beginning of each original paragraph in green; for ease of reading, each sentence is split onto its own line.

0. Abstract

The ability of learning networks to generalize can be greatly enhanced by providing constraints from the task domain.

This paper demonstrates how such constraints can be integrated into a backpropagation network through the architecture of the network.

The approach has been successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.

A single network learns the entire recognition operation, going from the normalized image of the character to the final classification.

1. Introduction

Previous work performed on recognizing simple digit images (LeCun 1989) showed that good generalization on complex tasks can be obtained by designing a network architecture that contains a certain amount of a priori knowledge about the task.

The basic design principle is to reduce the number of free parameters in the network as much as possible without overly reducing its computational power.
Application of this principle increases the probability of correct generalization because it results in a specialized network architecture that has a reduced entropy (Denker et al. 1987; Patarnello and Carnevali 1987; Tishby et al. 1989; LeCun 1989) and a reduced Vapnik-Chervonenkis dimensionality (Baum and Haussler 1989).
In this paper, we apply the backpropagation algorithm (Rumelhart et al. 1986) to a real-world problem in recognizing handwritten digits taken from the U.S. Mail.
Unlike previous results reported by our group on the problem (Denker et al. 1989), the learning network is directly fed with images, rather than feature vectors, thus demonstrating the ability of backpropagation networks to deal with large amounts of low-level information.

2. Zip Codes

2.1 Data Base

The data base used to train and test the network consists of 9298 segmented numerals digitized from handwritten zip codes that appeared on U.S. mail passing through the Buffalo, NY post office.
Examples of such images are shown in Figure 1. The digits were written by many different people, using a great variety of sizes, writing styles, and instruments, with widely varying amounts of care; 7291 examples are used for training the network and 2007 are used for testing the generalization performance.

One important feature of this data base is that both the training set and the testing set contain numerous examples that are ambiguous, unclassifiable, or even misclassified.

2.2 Preprocessing

Locating the zip code on the envelope and separating each digit from its neighbours, a very hard task in itself, was performed by Postal Service contractors(Wang and Srihari 1988).

At this point, the size of a digit image varies but is typically around 40 by 60 pixels.
A linear transformation is then applied to make the image fit in a 16 by 16 pixel image.
This transformation preserves the aspect ratio of the character, and is performed after extraneous marks in the image have been removed.
Because of the linear transformation, the resulting image is not binary but has multiple gray levels, since a variable number of pixels in the original image can fall into a given pixel in the target image. [Summary: the linear transformation maps the variable-size original image onto the 16 by 16 target image, which turns the original binary image into one with multiple gray levels.]
The gray levels of each image are scaled and translated to fall within the range -1 to 1.
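To make the preprocessing concrete, here is a minimal sketch (not the authors' code) of the two steps just described: a linear rescaling that fits the digit into a 16 by 16 image while preserving its aspect ratio, followed by scaling the gray levels into the range -1 to 1. The function name normalize_digit and the use of scipy.ndimage.zoom for the interpolation are my own choices.

```python
# Minimal sketch of the preprocessing described above (assumptions: the input
# is a 0/1 binary image; scipy.ndimage.zoom stands in for the paper's linear
# transformation).
import numpy as np
from scipy.ndimage import zoom

def normalize_digit(img, target=16):
    """Fit a variable-size binary digit image into a target x target gray-level image."""
    h, w = img.shape
    scale = target / max(h, w)                        # same factor on both axes keeps the aspect ratio
    small = zoom(img.astype(float), scale, order=1)   # linear interpolation introduces gray levels
    out = np.zeros((target, target))                  # background, becomes -1 after rescaling
    top = (target - small.shape[0]) // 2
    left = (target - small.shape[1]) // 2
    out[top:top + small.shape[0], left:left + small.shape[1]] = small
    return 2.0 * out - 1.0                            # map gray levels from [0, 1] to [-1, 1]
```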
3. Network Design

3.1 Input and Output

The remainder of the recognition is entirely performed by a multilayer network.
All of the connections in the network are adaptive, although heavily constrained, and are trained using backpropagation.
This is in contrast with earlier work (Denker et al. 1989) where the first few layers of connections were hand-chosen constants implemented on a neural-network chip.
The input of the network is a 16 by 16 normalized image.
The output is composed of 10 units (one per class) and uses place coding.
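As a small illustration of place coding, the sketch below builds a 10-unit target vector with one unit per class. The choice of +1 for the correct class and -1 for the others is an assumption on my part, matching the -1 to 1 range used for the inputs; it is not spelled out in the text quoted here.

```python
# Hypothetical example of place coding: one output unit per digit class.
import numpy as np

def place_code(digit, n_classes=10):
    target = -np.ones(n_classes)   # assumed "off" level of -1 for the nine wrong classes
    target[digit] = 1.0            # assumed "on" level of +1 for the correct class
    return target

print(place_code(3))   # [-1. -1. -1.  1. -1. -1. -1. -1. -1. -1.]
```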
3.2 Feature Maps and Weight Sharing

Classical work in visual pattern recognition has demonstrated the advantage of extracting local features and combining them to form higher order features.
Such knowledge can be easily built into the network by forcing the hidden units to combine only local sources of information.
Distinctive features of an object can appear at various locations on the input image.
Therefore it seems judicious to have a set of feature detectors that can detect a particular instance of a feature anywhere on the input plane.
Since the precise location of a feature is not relevant to the classification, we can afford to lose some position information in the process.
Nevertheless, approximate position information must be preserved, to allow the next levels to detect higher order, more complex features (Fukushima 1980; Mozer 1987). [Summary: the exact location of a feature in the image is not relevant to the classification, so we can afford to discard some position information; nevertheless, some position information must be kept so that the next level can detect higher-order, more complex features.]
The detection of a particular feature at any location on the input can be easily done using the "weight sharing" technique.
Weight sharing was described in Rumelhart et al. (1989) for the so-called T-C problem and consists in having several connections (links) controlled by a single parameter (weight).
It can be interpreted as imposing equality constraints among the connection strengths.
This technique can be implemented with very little computational overhead.
Weight sharing not only greatly reduces the number of free parameters in the network but also can express information about the geometry and topology of the task.
In our case, the first hidden layer is composed of several planes that we call feature maps. [feature maps: can be thought of as the result of a convolution]
All units in a plane share the same set of weights, thereby detecting the same feature at different locations.
Since the exact position of the feature is not important, the feature maps need not have as many units as the input.
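The sketch below illustrates weight sharing as described above: a single 5 by 5 weight set is applied at every position of the input plane, so every unit of the resulting feature map detects the same feature at a different location, and the map has fewer units than the input. This is my own illustration, not the paper's code; in the paper each unit keeps its own bias, while a single bias is used here only to keep the example short, and tanh stands in for the paper's squashing function.

```python
# Sketch of a feature map produced by weight sharing (one 5x5 kernel swept over the input).
import numpy as np

def feature_map(img, kernel, bias, stride=2):
    kh, kw = kernel.shape
    rows = (img.shape[0] - kh) // stride + 1
    cols = (img.shape[1] - kw) // stride + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.tanh(np.sum(patch * kernel) + bias)   # same weights at every location
    return out

fm = feature_map(np.random.uniform(-1, 1, (16, 16)), np.random.randn(5, 5) * 0.1, 0.0)
print(fm.shape)   # (6, 6): fewer units than the 16x16 input (no boundary padding in this sketch)
```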

3.3 Network Architecture

The network is represented in Figure 2.

Its architecture is a direct extension of the one proposed in LeCun (1989).
The network has three hidden layers named H1, H2, and H3, respectively.
Connections entering H1 and H2 are local and are heavily constrained.

H1 is composed of 12 groups of 64 units arranged as 12 independent 8 by 8 feature maps.

These 12 feature maps will be designated by H1.1, H1.2, ..., H1.12.
Each unit in a feature map takes input on a 5 by 5 neighbourhood on the input plane.
For units in layer H1 that are one unit apart, their receptive fields (in the input layer) are two pixels apart.
Thus, the input image is undersampled and some position information is eliminated.
A similar two-to-one undersampling occurs going from layer H1 to H2.
The motivation is that high resolution may be needed to detect the presence of a feature, while its exact position need not be determined with equally high precision.

It is also known that the kinds of features that are important at one place in the image are likely to be important in other places.

Therefore, corresponding connections on each unit in a given feature map are constrained to have the same weights.

Each unit performs the same operation on corresponding parts of the image.

The function performed by a feature map can thus be interpreted as a nonlinear subsampled convolution with a 5 by 5 kernel.

Of course, units in another map (say H1.4) share another set of 25 weights.

Units do not share their biases (thresholds).

Each unit thus has 25 input lines plus a bias.

Connections extending past the boundaries of the input plane take their input from a virtual background plane whose state is equal to a constant, predetermined background level, in our case -1.

Thus, layer H1 comprises 768 units (8 by 8 times 12), 19968 connections (768 times 26), but only 1068 free parameters (768 biases plus 25 times 12 feature kernels) since many connections share the same weight.
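A quick arithmetic check of the H1 bookkeeping quoted above:

```python
# Counts for layer H1 as stated in the paper.
units_h1  = 12 * 8 * 8            # 12 feature maps of 8x8 units                 -> 768
conns_h1  = units_h1 * (25 + 1)   # 25 inputs plus a bias line per unit          -> 19968
params_h1 = units_h1 + 12 * 25    # 768 unshared biases + 12 shared 5x5 kernels  -> 1068
print(units_h1, conns_h1, params_h1)   # 768 19968 1068
```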

Layer H2 is also composed of 12 feature maps.

Each feature map contains 16 units arranged in a 4 by 4 plane.

As before, these feature maps will be designated as H2.1, H2.2, ..., H2.12.

The connection scheme between H1 and H2 is quite similar to the one between the input and H1, but slightly more complicated because H1 has multiple two-dimensional maps.

Each unit in H2 combines local information coming from 8 of the 12 different feature maps in H1.

Its receptive field is composed of eight 5 by 5 neighborhoods centered around units that are at identical positions within each of the eight maps.

Thus, a unit in H2 has 200 inputs, 200 weights, and a bias.

Once again, all units in a given map share the same set of weights; the maps in H1 on which a map in H2 takes its inputs are chosen according to a scheme that will not be described here.

Connections falling off the boundaries are treated as in H1.

To summarize, layer H2 contains 192 units (12 times 4 by 4) and there is a total of 38592 connections between layers H1 and H2 (192 units times 201 input lines). All these connections are controlled by only 2592 free parameters (12 feature maps times 200 weights plus 192 biases).
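The same kind of check for H2 reproduces these numbers:

```python
# Counts for layer H2 as stated in the paper.
units_h2  = 12 * 4 * 4             # 12 feature maps of 4x4 units                      -> 192
conns_h2  = units_h2 * (200 + 1)   # 200 inputs plus a bias line per unit              -> 38592
params_h2 = 12 * 200 + units_h2    # 12 shared 200-weight sets + 192 unshared biases   -> 2592
print(units_h2, conns_h2, params_h2)   # 192 38592 2592
```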

Layer H3 has 30 units, and is fully connected to H2.

The number of connections between H2 and H3 is thus 5790 (30 times 192 plus 30 biases).

The output layer has 10 units and is also fully connected to H3, adding another 310 weights.

In summary, the network has 1256 units, 64660 connections, and 9760 independent parameters.
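Adding up the per-layer figures (including the 256 input units in the unit count) reproduces the totals stated in this paragraph:

```python
# Whole-network totals built from the per-layer counts given above.
units  = 16 * 16 + 768 + 192 + 30 + 10                      # input + H1 + H2 + H3 + output -> 1256
conns  = 19968 + 38592 + (30 * 192 + 30) + (10 * 30 + 10)   # -> 64660
params = 1068 + 2592 + 5790 + 310                           # -> 9760
print(units, conns, params)   # 1256 64660 9760
```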
