Original tutorial: http://mxnet.readthedocs.io/en/latest/tutorials/imagenet_full.html

Training Deep Net on 14 Million Images by Using A Single Machine

This note describes how to train a neural network on the full ImageNet dataset [1], with 14,197,087 images in 21,841 classes. We achieved a state-of-the-art model using 4 GeForce GTX 980 cards on a single machine in 8.5 days.

There are several technical challenges in this problem.

  1. How to pack and store the massive data.
  2. How to minimize the memory consumption of the network, so we can use a net with more capacity than those used for ImageNet 1K.
  3. How to train the model fast.

We also released our pre-trained model for this full ImageNet dataset.


Data Preprocessing

The raw full ImageNet dataset is more than 1TB. Before training the network, we need to shuffle these images and then load batches of images to feed the neural network. Before we describe how we solve this, let's do some calculations first:


Assume we have two good storage devices [2]:

| Device                    | 4K Random Seek        | Sequential Seek |
| ------------------------- | --------------------- | --------------- |
| WD Black (HDD)            | 0.43 MB/s (110 IOPS)  | 170 MB/s        |
| Samsung 850 PRO (SSD)     | 40 MB/s (10,000 IOPS) | 550 MB/s        |

A very naive approach is loading from a list by random seeking. With this approach, we would spend 677 hours on the HDD or 6.7 hours on the SSD, and that is for the reads alone. Although the SSD numbers do not look bad, a 1TB SSD is not affordable for everyone.
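As a sanity check, those figures follow directly from size divided by throughput. Here is a minimal back-of-envelope sketch (the exact hours depend on how 1TB is rounded):

```python
# Back-of-envelope check of the read times quoted above: size / throughput.
TB = 1024 ** 4  # one (binary) terabyte, in bytes

for device, mb_per_s in [("WD Black (HDD)", 0.43), ("Samsung 850 PRO (SSD)", 40.0)]:
    seconds = TB / (mb_per_s * 1024 ** 2)
    print("%-22s %7.1f hours" % (device, seconds / 3600.0))

# WD Black (HDD)           677.4 hours
# Samsung 850 PRO (SSD)      7.3 hours
```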

But we notice that sequential reads are much faster than random seeks, and loading batch by batch is a sequential action. Can we exploit that? We cannot do sequential reads directly: we need to randomly shuffle the training data first, then pack it into a sequential binary package.

This is the standard solution used by most deep learning packages. However, unlike with the ImageNet 1K dataset, we cannot store the images in raw pixel format, because that would require more than 1TB of space. Instead, we need to pack the images in a compressed format.


The key ingredients are

  • Store the images in JPEG format, and pack them into binary records.
  • Split the list, and pack several record files instead of one.
    • This allows us to pack the images in a distributed fashion, because we will eventually be bounded by the IO cost during packing.
    • We need to make the package able to read from several record files, which is not too hard. This allows us to store the entire ImageNet dataset in around 250GB of space.

After packing, together with a threaded buffer iterator, we can achieve an IO speed of around 3,000 images/sec on a normal HDD.
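A minimal sketch of the reading side with MXNet's `ImageRecordIter` (the paths, shapes, and thread count below are placeholders); the record files themselves are produced offline by the `tools/im2rec.py` packing tool, which shuffles the image list before packing:

```python
import mxnet as mx

# Read a pre-shuffled, pre-packed record file sequentially, with a background
# thread pool decoding the JPEGs into batches.
train_iter = mx.io.ImageRecordIter(
    path_imgrec="data/imagenet21k_train.rec",  # packed binary record file
    data_shape=(3, 224, 224),                  # channels, height, width
    batch_size=128,
    rand_crop=True,                            # light augmentation at read time
    rand_mirror=True,
    preprocess_threads=4,                      # threaded buffer / JPEG decoding
)

for batch in train_iter:
    pass  # feed batch.data / batch.label to the trainer
```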


Training the Model

Now we have the data, and we need to decide which network structure to use. We use an Inception-BN [3] style model; compared to other models such as VGG, it has fewer parameters, which also simplifies the parameter synchronization problem. Considering that our problem is much more challenging than the 1k-class problem, we add suitable capacity to the original Inception-BN structure by increasing the number of filters by a factor of 1.5 in the bottom layers of the network.

This, however, creates a challenge for GPU memory, as the GTX 980 has only 4GB of RAM. We really need to minimize the memory consumption to fit a larger batch size into training. To solve this we use techniques such as node memory reuse and in-place optimization, which cut the memory consumption by half; more details can be found in the memory optimization note.
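As a rough illustration of that widening, here is a minimal sketch using MXNet's symbolic API (the `conv_bn_relu` helper and the 64-filter stem are illustrative placeholders, not the exact Inception-BN definition):

```python
import mxnet as mx

def conv_bn_relu(data, num_filter, kernel, stride=(1, 1), pad=(0, 0), scale=1.0):
    """Conv + BatchNorm + ReLU block; `scale` widens the layer (1.5x here)."""
    conv = mx.sym.Convolution(data=data, num_filter=int(num_filter * scale),
                              kernel=kernel, stride=stride, pad=pad)
    bn = mx.sym.BatchNorm(data=conv)
    return mx.sym.Activation(data=bn, act_type="relu")

data = mx.sym.Variable("data")
# Bottom (stem) layer widened by 1.5x relative to a standard 64-filter stem.
body = conv_bn_relu(data, 64, kernel=(7, 7), stride=(2, 2), pad=(3, 3), scale=1.5)
```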


Finally, we cannot train the model on a single GPU, because this is a really large net with a lot of data. We use data parallelism on four GPUs to train the model, which involves smart synchronization of parameters between the GPUs and overlapping communication with computation. A runtime dependency engine is used to simplify this task, allowing us to run the training at around 170 images/sec.
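A minimal sketch of four-GPU data parallelism with MXNet's Module API (the stand-in network, hyperparameters, and the `train_iter` from the earlier sketch are assumptions, not the exact training script):

```python
import mxnet as mx

# Stand-in symbol: in the real run this is the widened Inception-BN network.
data = mx.sym.Variable("data")
fc = mx.sym.FullyConnected(data=mx.sym.Flatten(data=data), num_hidden=21841)
softmax = mx.sym.SoftmaxOutput(data=fc, name="softmax")

# Data parallelism: give Module a list of GPU contexts; each batch is split
# across the four cards, and the runtime dependency engine overlaps the
# gradient communication with computation.
model = mx.mod.Module(symbol=softmax, context=[mx.gpu(i) for i in range(4)])
model.fit(train_iter,                     # the ImageRecordIter from above
          num_epoch=9,
          optimizer="sgd",
          optimizer_params={"learning_rate": 0.05, "momentum": 0.9},
          kvstore="device")               # aggregate gradients on the GPUs
```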


Here is a learning curve of the training process:


Evaluate the Performance

Train Top-1 Accuracy over 21,841 classes: 37.19%


There is no official validation set over the 21,841 classes, so we use the ILSVRC2012 validation set to check the performance. Here is the result:

| Accuracy | Over 1,000 classes | Over 21,841 classes |
| -------- | ------------------ | ------------------- |
| Top-1    | 68.3%              | 41.9%               |
| Top-5    | 89.0%              | 69.6%               |
| Top-20   | 96.0%              | 83.6%               |

As we can see, we get quite a reasonable result after 9 iterations. Notably, far fewer iterations are needed to achieve stable performance, mainly because we are facing a much larger dataset.

We should note that this result is by no means optimal: we did not carefully tune the parameters, and the experiment cycle is longer than for the 1k dataset. We think there is definite room for improvement, and you are welcome to try it out yourself!

The Code and Model

The code and a step-by-step guide are publicly available at https://github.com/dmlc/mxnet/tree/master/example/image-classification

We also release a pretrained model at https://github.com/dmlc/mxnet-model-gallery/tree/master/imagenet-21k-inception
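Loading the released checkpoint for inference might look like the following sketch (the `"Inception21k"` prefix and epoch number are placeholders; use the file names that ship with the model-gallery download):

```python
import mxnet as mx

# Prefix and epoch are placeholders: point them at the downloaded
# "<prefix>-symbol.json" and "<prefix>-<epoch>.params" files.
sym, arg_params, aux_params = mx.model.load_checkpoint("Inception21k", 9)

# Bind the network for inference on a single 224x224 image.
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)
```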

How to Use the Model

We should point out that 21k classes are much more challenging than 1k, and directly using the raw prediction is not a reasonable approach.

Look at this picture, which I took at Mount Rainier this summer:

We can see there is a mountain, a valley, trees, and a bridge. The prediction probability distribution is:


We notice there are several peaks. Let's print out the label text among the 21k classes and the ImageNet 1k classes:

| Rank  | Over 1,000 classes          | Over 21,841 classes        |
| ----- | --------------------------- | -------------------------- |
| Top-1 | n09468604 valley            | n11620673 Fir              |
| Top-2 | n09332890 lakeside          | n11624531 Spruce           |
| Top-3 | n04366367 suspension bridge | n11621281 Amabilis fir     |
| Top-4 | n09193705 alp               | n11628456 Douglas fir      |
| Top-5 | n09428293 seashore          | n11627908 Mountain hemlock |

There is no doubt that directly using the probability over 21k classes loses diversity in the prediction. If you carefully choose a subset using the WordNet hierarchy relations, I am sure you will find more interesting results.
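For example, here is a minimal sketch of that subset trick, assuming you already have the raw 21k-way softmax vector and a list of class indices picked via the WordNet hierarchy (the helper name and arguments are illustrative):

```python
import numpy as np

def predict_over_subset(probs, subset_idx, labels, top_k=5):
    """Renormalize the 21k-way softmax over a chosen subset of classes.

    `probs` is the raw softmax vector, `subset_idx` the class indices kept
    (e.g. every non-plant synset, as chosen by the caller from the WordNet
    hierarchy), `labels` the synset text for each class index.
    """
    sub = probs[subset_idx]
    sub = sub / sub.sum()                      # renormalize over the subset
    order = np.argsort(sub)[::-1][:top_k]      # top-k within the subset
    return [(labels[subset_idx[i]], float(sub[i])) for i in order]
```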

Note

[1] Deng, Jia, et al. “Imagenet: A large-scale hierarchical image database.” Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

[2] HDD/SSD data is from public websites and may not be accurate.

[3] Ioffe, Sergey, and Christian Szegedy. “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” arXiv preprint arXiv:1502.03167 (2015).
