Evaluation of Deep Learning Toolkits

Reposted from: https://github.com/zer0n/deepframeworks

Abstract. In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch. This is a dynamic document and the evaluation, to the best of my knowledge, is based on the current state of their code.

I also provide ratings in some areas because for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].

If you find something wrong or inadequate, please help improve by filing an issue.

Table of contents

  1. Modeling Capability
  2. Interfaces
  3. Model Deployment
  4. Performance
  5. Architecture
  6. Ecosystem
  7. Cross-platform

Modeling Capability

In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks without writing too much code. Some of these networks are:

  • ConvNets: AlexNet, OxfordNet, GoogLeNet
  • RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
  • Sequential modeling with attention.

In addition, we also evaluate the flexibility to create a new type of model.

Caffe 

Caffe, started in late 2013, is perhaps the first mainstream industry-grade deep learning toolkit, owing to its excellent convnet implementation (at the time). It is still the most popular toolkit within the computer vision community, with many extensions being actively added.

However, its support for recurrent networks and language modeling in general is poor, due to its legacy architecture, whose limitations are detailed in the Architecture section.

CNTK 

CNTK is a deep learning system started by the speech researchers who ignited the deep learning craze, and it has since grown into a more general, platform-independent deep learning system. It is better known in the speech community than in the general deep learning community.

In CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution. A layer is just a composition of those operations. The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).
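
For concreteness, here is a minimal sketch in Theano (one of the graph-based toolkits named above) of a fully-connected layer built as a composition of such fine-grained operations; the shapes and names are illustrative assumptions:

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic input: a minibatch of shape (batch, 100).
x = T.matrix('x')
W = theano.shared(np.random.randn(100, 10).astype('float32'), name='W')
b = theano.shared(np.zeros(10, dtype='float32'), name='b')

# A fully-connected "layer" is just matrix multiply + add + nonlinearity.
h = T.tanh(T.dot(x, W) + b)

# Compile the symbolic graph into a callable function.
layer = theano.function([x], h)
out = layer(np.random.randn(32, 100).astype('float32'))  # shape (32, 10)
```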

As of today, CNTK is not usable for a variety of tasks, such as sequence-to-sequence modeling.

TensorFlow 

State-of-the-art models

New models. Since TF uses the symbolic-graph-of-vector-operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loops yet (at least not well tested/documented, as of 05/2016), RNNs can still be made easy and efficient using the bucketing trick, sketched below.
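
Here is a minimal, framework-agnostic sketch of the bucketing trick; the bucket boundaries and padding token are illustrative assumptions. Sequences are grouped by length and padded only up to their bucket's size, so a static graph is unrolled once per bucket rather than once per sequence length:

```python
BUCKETS = [10, 20, 40, 80]  # maximum unrolled length per bucket (assumption)

def bucket_for(seq):
    """Return the smallest bucket that fits this sequence, or None."""
    for size in BUCKETS:
        if len(seq) <= size:
            return size
    return None  # sequence too long; truncate or drop

def pad_to(seq, size, pad_token=0):
    return seq + [pad_token] * (size - len(seq))

def bucketize(sequences):
    """Group sequences into {bucket_size: [padded sequences]}."""
    buckets = {size: [] for size in BUCKETS}
    for seq in sequences:
        size = bucket_for(seq)
        if size is not None:
            buckets[size].append(pad_to(seq, size))
    return buckets

# Each bucket is then fed to an RNN graph unrolled to exactly that length.
batches = bucketize([[1, 2, 3], [4] * 15, [5] * 33])
```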

However, TF has a major weakness in terms of modeling flexibility. Every computational flow has to be constructed as a static graph. That makes some computations difficult, such as beam search (which is used frequently in sequence prediction tasks).

Theano 

State-of-the-art models. Theano has implementations of most state-of-the-art networks, either in the form of a higher-level framework (e.g. Blocks, Keras, etc.) or in pure Theano.

New models. Theano pioneered the trend of using a symbolic graph for programming a network. Theano's symbolic API supports looping control via its scan operator, which makes implementing RNNs easy and efficient (see the sketch below). Users don't always have to define a new model at the tensor-operations level: there are a few higher-level frameworks, mentioned above, which make model definition and training simpler.
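
A minimal sketch of scan for a plain RNN step (dimensions and initializations are illustrative assumptions):

```python
import numpy as np
import theano
import theano.tensor as T

n_in, n_hid = 8, 16
W_in = theano.shared(np.random.randn(n_in, n_hid).astype('float32'))
W_hid = theano.shared(np.random.randn(n_hid, n_hid).astype('float32'))

x_seq = T.tensor3('x_seq')   # (time, batch, n_in)
h0 = T.matrix('h0')          # (batch, n_hid) initial hidden state

def step(x_t, h_prev):
    # One RNN step: mix the current input with the previous hidden state.
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_prev, W_hid))

# scan applies `step` along the leading (time) axis, carrying h forward,
# so the loop lives inside the symbolic graph itself.
h_seq, _ = theano.scan(step, sequences=x_seq, outputs_info=h0)

rnn = theano.function([x_seq, h0], h_seq)
```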

Torch 

State-of-the-art models

New models. In Torch, there are multiple ways (a stack of layers or a graph of layers) to define a network, but essentially a network is defined as a graph of layers. Because of this coarser granularity, Torch is sometimes considered less flexible: for new layer types, users have to implement the full forward, backward, and gradient-update routines.

However, unlike Caffe, defining a new layer in Torch is much easier because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via Protobuf.

Torch is more flexible than TensorFlow and Theano in that it is imperative, while TF/Theano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch (see the sketch below).
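
A minimal, framework-agnostic beam-search sketch (plain Python; the model hook and names are hypothetical) illustrates why: the per-step pruning and early stopping below are data-dependent control flow that is natural in an imperative setting but awkward inside a static declarative graph.

```python
def beam_search(step_logprobs, beam_size, max_len, eos=0):
    """step_logprobs(prefix) -> {token: logprob} is an assumed model hook."""
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:   # finished hypotheses survive as-is
                candidates.append((prefix, score))
                continue
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Keep only the best `beam_size` hypotheses: data-dependent pruning
        # that a static graph cannot easily express.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break                              # every beam has emitted EOS
    return beams
```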


 
[Figure] Left: graph model of CNTK/Theano/TensorFlow; Right: graph model of Caffe/Torch

Interfaces

Caffe 

Caffe has pycaffe interface but that's a mere secondary alternative to the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.
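
A minimal pycaffe sketch, with illustrative file names, showing that even from Python the architecture itself comes from a protobuf text file:

```python
import caffe

caffe.set_mode_cpu()
# The architecture lives in deploy.prototxt (protobuf text format);
# pycaffe merely loads and runs it -- it does not define the network.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Inference: fill the input blob, then run a forward pass.
# net.blobs['data'].data[...] = preprocessed_input
out = net.forward()
```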

CNTK 

The way to use CNTK, similar to Caffe, is to specify a config file and run a command-line program. CNTK is slightly worse than Caffe because there's no Python or any other high-level language interface.

TensorFlow 

TF supports two interfaces: Python and C++. This means that you can do experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.
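
A minimal sketch of the Python interface circa TF 0.8 (shapes are illustrative): declare a graph, then execute it in a session.

```python
import numpy as np
import tensorflow as tf

# Declare the graph: a single softmax layer over 100-dim inputs.
x = tf.placeholder(tf.float32, shape=[None, 100])
W = tf.Variable(tf.zeros([100, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Execute it in a session.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # TF 0.x-era initializer
    out = sess.run(y, feed_dict={x: np.random.randn(4, 100)})
```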

It would be perfect if TF supported F# or TypeScript; the lack of static typing in Python is just ... painful :).

Theano 

Python

Torch 

Torch runs on LuaJIT, which is amazingly fast (comparable with industrial languages such as C++/C#/Java). Hence developers don't have to resort to symbolic programming, which can be limiting: they can just write all kinds of computations without worrying about a performance penalty.

However, let's face it, Lua is not yet a mainstream language.

Model Deployment

How easy is it to deploy a new model?

Caffe 

Caffe is C++-based and can be compiled on a variety of devices. It is cross-platform (a Windows port is available and maintained here), which makes Caffe the best choice with respect to deployment.

CNTK 

Like Caffe, CNTK is also C++-based and cross-platform, so deployment should be easy in most cases. However, to my understanding, it doesn't work on ARM architectures, which limits its capability on mobile devices.

TensorFlow 

TF supports a C++ interface, and the library can be compiled/optimized on ARM architectures because it uses Eigen (instead of a BLAS library). This means that you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load a Python/LuaJIT interpreter [3].
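
As a hedged sketch (TF circa 0.8 API; paths and names are illustrative), exporting a graph so that a C++ binary can load it without a Python interpreter might look like this:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 100], name='input')
W = tf.Variable(tf.zeros([100, 10]))
y = tf.matmul(x, W, name='output')

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # Serialize the GraphDef; the C++ API can load this file directly,
    # with no Python interpreter on the target device.
    tf.train.write_graph(sess.graph_def, '/tmp/model', 'graph.pb', as_text=False)
    # Variables are checkpointed separately.
    tf.train.Saver().save(sess, '/tmp/model/weights.ckpt')
```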

TF doesn't work on Windows yet, though, so TF models can't be deployed on Windows devices.

Theano 

The lack of a low-level interface and the inefficiency of the Python interpreter make Theano less attractive for industrial users. For a large model, the overhead of Python isn't too bad, but the dogma is still there.

The cross-platform nature (mentioned below) enables a Theano model to be deployed in a Windows environment, which helps it gain some points.

Torch 

Torch requires LuaJIT to run models, which makes it less attractive than the bare-bones C++ support of Caffe/CNTK/TF. It's not just the performance overhead, which is minimal; the bigger problem is integration, at the API level, with a larger production pipeline.

Performance

Single-GPU

All of these toolkits call cuDNN, so as long as there are no major computations or memory allocations at the outer level, they should perform similarly.

Soumith@FB has done some benchmarking for ConvNets. Deep learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith's benchmark is the only notable one as of today, so we will base the single-GPU performance rating on it.

TensorFlow and Torch 

TensorFlow used to be slow when it first came out, but as of 05/2016 it is in the same ballpark as other frameworks in terms of ConvNet speed. This is not surprising, because every framework nowadays calls cuDNN for the actual computations.

Here's my latest micro benchmark of TensorFlow 0.8 vs before. The measurement is latency, in milliseconds, for one full minibatch forward-backward pass on a single Titan X GPU.

Network        TF 0.6 [ref]   TF 0.8 [my run]   Torch FP32 [my run]
AlexNet        292            97                81
Inception v1   1237           518               470
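
For illustration, a latency number like the ones above can be measured roughly as follows; this is a hedged sketch with illustrative shapes and ops (TF circa 0.8 API), not the actual benchmark harness.

```python
import time
import numpy as np
import tensorflow as tf

# A toy "network": one big convolution, so there is something to differentiate.
x = tf.placeholder(tf.float32, shape=[128, 224, 224, 3])
w = tf.Variable(tf.truncated_normal([11, 11, 3, 64], stddev=0.1))
loss = tf.reduce_mean(tf.nn.conv2d(x, w, strides=[1, 4, 4, 1], padding='SAME'))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    batch = np.random.randn(128, 224, 224, 3).astype(np.float32)
    sess.run(train_op, feed_dict={x: batch})       # warm-up pass
    start = time.time()
    for _ in range(10):                            # forward + backward + update
        sess.run(train_op, feed_dict={x: batch})
    print('ms per minibatch: %.1f' % ((time.time() - start) / 10 * 1000))
```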

Theano 

On big networks, Theano’s performance is on par with Torch7, according to this benchmark. The main issue of Theano is startup time, which is terrible, because Theano has to compile C/CUDA code to binary. We don’t always train big models. In fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn’t have this problem. It simply maps the symbolic tensor operations to the already-compiled corresponding function calls.

Even import theano takes time, because this import apparently does a lot of work. Also, after importing Theano, you are stuck with a pre-configured device (e.g. GPU0).
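
A small sketch demonstrating this compile-time overhead: building the theano.function is the expensive part, not calling it. (The device is fixed via THEANO_FLAGS, e.g. THEANO_FLAGS=device=gpu0, before the import.)

```python
import time
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
y = T.nnet.sigmoid(T.dot(x, x.T)).sum()

t0 = time.time()
f = theano.function([x], y)   # compiles C/CUDA code behind the scenes
print('compile time: %.2fs' % (time.time() - t0))

t0 = time.time()
f(np.random.randn(256, 256).astype('float32'))
print('run time: %.4fs' % (time.time() - t0))
```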

Multi-GPU

TBD

Architecture

Developer Zone

Caffe 

Caffe's architecture was considered excellent when it was born, but by modern standards it is considered average. The main pain points of Caffe are its layer-wise design in C++ and the protobuf interface for model definition.

Layer-wise design. The building block of a network in Caffe is the layer. For a new layer type, the full forward, backward, and gradient update have to be implemented in C++.

Protobuf. As noted in the Interfaces section, Caffe's pycaffe interface is a mere secondary alternative to the command line: the model still has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.

[Copied from my own answer on Quora]

CNTK

To be updated ...

TensorFlow 

TF has a clean, modular architecture with multiple frontends and execution platforms. Details are in the white paper.

Theano 

The architecture is fairly hacky: the whole code base is Python, and C/CUDA code is packaged as Python strings. This makes the code hard to navigate, debug, and refactor, and hence hard for developers to contribute to.

Torch 

The Torch7 and nn libraries are also well designed, with clean, modular interfaces.

Ecosystem

  • Caffe and CNTK: C++
  • TensorFlow: Python and C++
  • Theano: Python
  • Torch: Lua is not a mainstream language and hence libraries built for it are not as rich as ones built for Python.

Cross-platform

Caffe, CNTK, and Theano work on all OSes. TensorFlow and Torch do not work on Windows and there's no known plan to port from either camp.


Footnotes

[1] Note that I don’t aggregate ratings because different users/developers have different priorities.

[2] Disclaimer: I haven’t analyzed this extension carefully.

[3] See my blog post for why this is desirable.
