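Recognizing a single handwritten character is easy to implement once you have seen the convolutional neural network MNIST example. But how do you recognize several characters at the same time, as in the figure below?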

The legendary LeCun uses an SDNN (space displacement neural network). What on earth is that?

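After some digging, it turns out to be essentially a sliding window plus an image pyramid plus NMS (non-maximum suppression). A 2015 Yahoo paper, Multi-view Face Detection Using Deep Convolutional Neural Networks, uses the same approach.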

Reference page: https://www.quora.com/What-is-a-space-displacement-neural-network-SDNN

Below are the answers from two people who know the subject:

Alessandro Ferrari, I have had a lot of fun playing with convnets.

A neural network that is slided as a detector across all the possible locations in the image. You have a network with an input layer of size NxN pixels, Then, you have an image with size MxM pixels, with M>N. The objects that you want to detect are somewhere in the image but you do not know where. Thus, you sweep your neural network all over the image. At the first position, in the top-left corner, you have certain classification scores for the objects that you want to detect, and you update your score map at that position. Then, you apply your NN on a position shifted of 1 or few pixels horizontally, and you update the score map for that position as well. This process continue until all the image is processed and all the score map completed.

The score map represents a detection map of your objects. A mechanism of non-maxima suppression have to be implemented in order to avoid multiple matches of the same object.

It avoids you to use segmentation. However, also in this case there is not free lunch. For making it scale invariant, you need to create a scale space of your input image. This requires to perform a number of classification on the order of ten thousands for few scale in 1MP image. Even if you can reuse a great part of the computation for convolutional layers for nearby classifications, you have to recompute the fully connected layers all the time, making the process painfully slow.

That is why people started to research in object proposal techniques. Maybe one day enough computational power may let us not think about these problems.

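Restated more plainly:

A neural network is slid across every possible location in the image, like a detector. Suppose the network takes an N×N pixel input and the image is M×M pixels, with M > N. The objects you want to detect are somewhere in the image, but you do not know where, so you sweep the network over the whole image. At the first position, the top-left corner, the network produces classification scores for the objects of interest, and you write them into the score map at that position. You then shift the network horizontally by one or a few pixels and update the score map there as well. This continues until the whole image has been processed and the score map is complete.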
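
The sweep described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `score_map` and `toy_classifier` are hypothetical names, and the "network" is replaced by a stand-in that returns the mean pixel value of the patch.

```python
def score_map(image, window, classify, stride=1):
    """Slide a window x window patch over the image; one score per position."""
    rows, cols = len(image), len(image[0])
    scores = []
    for top in range(0, rows - window + 1, stride):
        row_scores = []
        for left in range(0, cols - window + 1, stride):
            # Crop the patch whose top-left corner is (top, left).
            patch = [r[left:left + window] for r in image[top:top + window]]
            row_scores.append(classify(patch))
        scores.append(row_scores)
    return scores


def toy_classifier(patch):
    # Hypothetical stand-in for the network: mean pixel intensity of the patch.
    total = sum(sum(r) for r in patch)
    return total / (len(patch) * len(patch[0]))


# 5 x 5 image with a bright 2 x 2 blob at rows 2-3, columns 1-2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
scores = score_map(image, window=2, classify=toy_classifier)
```

With a 2×2 window and stride 1, the score map is 4×4, and `scores[2][1]` is `1.0` because that window covers the blob exactly. Neighboring positions get partial scores, which is why some suppression step is needed afterwards.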
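
The score map is a detection map of the objects. Non-maximum suppression (NMS) is then needed to avoid multiple matches for the same object.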
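
One common way to do this is greedy NMS over boxes with an intersection-over-union (IoU) test. The answer does not say which variant is meant, so the sketch below is an assumption: square boxes stored as `(top, left, size)`, a hypothetical `iou_threshold` of 0.3, and made-up detections.

```python
def iou(a, b):
    """Intersection over union of two square boxes given as (top, left, size)."""
    top = max(a[0], b[0])
    left = max(a[1], b[1])
    bottom = min(a[0] + a[2], b[0] + b[2])
    right = min(a[1] + a[2], b[1] + b[2])
    inter = max(0, bottom - top) * max(0, right - left)
    union = a[2] * a[2] + b[2] * b[2] - inter
    return inter / union


def non_max_suppression(detections, iou_threshold=0.3):
    """Greedy NMS: visit boxes from best to worst score and keep a box
    only if it does not overlap an already kept box too much."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, other) <= iou_threshold for _, other in kept):
            kept.append((score, box))
    return kept


# Made-up (score, box) pairs: three overlapping hits on one object, one elsewhere.
detections = [
    (0.9, (10, 10, 20)),
    (0.8, (12, 11, 20)),
    (0.7, (50, 50, 20)),
    (0.6, (11, 9, 20)),
]
kept = non_max_suppression(detections)
```

Here the two weaker boxes near `(10, 10)` are dropped and `kept` holds one box per object: the 0.9 detection and the 0.7 detection.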
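
This approach spares you from segmenting the input, but there is no free lunch. To make it scale invariant, you have to build a scale space (an image pyramid) of the input image. For a 1 MP image and a few scales, that means on the order of tens of thousands of classifications. Even though much of the convolutional-layer computation can be reused between neighboring windows, the fully connected layers have to be recomputed every time, which makes the process painfully slow.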
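
To get a feel for that number, here is a rough count. The window size, stride, and scales are not given in the answer; a 1000×1000 image, a 32×32 window, a stride of 8 pixels, and three pyramid levels are assumptions chosen only for illustration.

```python
def windows_per_level(side, window, stride):
    """Number of window positions on a square image of the given side length."""
    if side < window:
        return 0
    per_axis = (side - window) // stride + 1
    return per_axis * per_axis


def pyramid_window_count(side, window, stride, scales):
    """Total window positions over all pyramid levels."""
    return sum(windows_per_level(int(side * s), window, stride) for s in scales)


# Assumed setup: 1 MP image, 32 x 32 window, stride 8, three scales.
total = pyramid_window_count(side=1000, window=32, stride=8, scales=(1.0, 0.5, 0.25))
print(total)  # 14884 + 3481 + 784 = 19149
```

Under these assumptions the classifier runs 19149 times, which matches the "tens of thousands" estimate. A stride of 1 pixel would push the count to roughly a million per image.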
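
That is why people started researching object proposal techniques (the usual term is "region proposal"). Maybe one day there will be enough computing power that we no longer have to think about these problems.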

Barath Lakshmanan, works at TVS Motor Company

CNNs extract features from the input and classify them. However, the input has to be size-normalized. In case of a single composite objects, each individual object within them have variable size and it is difficult to segment them. One way to recognize such objects is using a sliding window in the input layer as mentioned by Alessandro Ferrari.

It is to be noted that when convolution is performed, on the inputs which are overlapping regions in an image, same set of features gets extracted repeatedly. In order to avoid this redundant action, convolution is performed on the entire input image till the last conv layer. Finally the classifier is used as sliding window on the obtained feature map to produce the heat map.

Performance of such network should improve drastically as the redundancy is removed. This design is called as Space Displacement Neural Network (SDNN).

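Restated more plainly:

CNNs extract features from the input and classify them, but the input has to be size-normalized. In a single composite object, the individual objects inside it vary in size and are hard to segment. One way to recognize such objects is the sliding window over the input layer that Alessandro Ferrari describes.

Note that when convolution is run on inputs that are overlapping regions of an image, the same features are extracted again and again. To avoid this redundant work, the convolution is run over the entire input image up to the last conv layer. The classifier is then used as a sliding window over the resulting feature map to produce the heat map.

Removing this redundancy should improve the performance of such a network drastically. This design is called a Space Displacement Neural Network (SDNN).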
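
The equivalence behind this trick can be checked on a toy example. The sketch below is not LeCun's network: it assumes a single stride-1 convolution with ReLU, no pooling, and a linear classifier, and all function names, the kernel, and the weights are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation (stride 1) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            acc = sum(image[i + di][j + dj] * kernel[di][dj]
                      for di in range(k) for dj in range(k))
            row.append(max(0, acc))
        out.append(row)
    return out


def classify(features, weights):
    """Linear classifier over a feature patch the same size as `weights`."""
    n = len(weights)
    return sum(features[i][j] * weights[i][j] for i in range(n) for j in range(n))


def crop(grid, top, left, size):
    return [row[left:left + size] for row in grid[top:top + size]]


def naive_scores(image, kernel, weights, window):
    """Sliding window on the input: re-run the convolution for every window."""
    return [[classify(conv2d_valid(crop(image, t, l, window), kernel), weights)
             for l in range(len(image[0]) - window + 1)]
            for t in range(len(image) - window + 1)]


def shared_scores(image, kernel, weights):
    """Convolve the whole image once, then slide the classifier over the feature map."""
    fmap = conv2d_valid(image, kernel)
    n = len(weights)
    return [[classify(crop(fmap, t, l, n), weights)
             for l in range(len(fmap[0]) - n + 1)]
            for t in range(len(fmap) - n + 1)]


# Made-up 6 x 6 integer image, 2 x 2 kernel, and 3 x 3 classifier weights.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
kernel = [[1, -1], [0, 1]]
weights = [[1, 0, 2], [0, 1, 0], [2, 0, 1]]

# A 4 x 4 input window maps to a 3 x 3 feature patch under a 2 x 2 valid convolution.
naive = naive_scores(image, kernel, weights, window=4)
shared = shared_scores(image, kernel, weights)
assert naive == shared
```

Both functions produce the same 3×3 heat map, but `shared_scores` runs the convolution once instead of nine times. With pooling or strided layers the window positions on the feature map correspond to coarser steps in the input, so the match is no longer position for position.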

Reposted from: https://www.cnblogs.com/laiqun/p/6441538.html
