Evaluation of Deep Learning Toolkits

Reposted from: https://github.com/zer0n/deepframeworks

Abstract. In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch. This is a dynamic document and the evaluation, to the best of my knowledge, is based on the current state of their code.

I also provide ratings in some areas because for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].

If you find something wrong or inadequate, please help improve by filing an issue.

Table of contents

  1. Modeling Capability
  2. Interfaces
  3. Model Deployment
  4. Performance
  5. Architecture
  6. Ecosystem
  7. Cross-platform

Modeling Capability

In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks without writing too much code. Some of these networks are:

  • ConvNets: AlexNet, OxfordNet, GoogLeNet
  • RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
  • Sequential modeling with attention.

In addition, we also evaluate the flexibility to create a new type of model.

Caffe 

Caffe, started in late 2013, is perhaps the first mainstream industry-grade deep learning toolkit, owing to its excellent convnet implementation (at the time). It is still the most popular toolkit within the computer vision community, with many extensions being actively added.

However, its support for recurrent networks and language modeling in general is poor, due to its legacy architecture, whose limitations are detailed in the Architecture section.

CNTK 

CNTK is a deep learning system started by the speech researchers who ignited the deep learning craze, and it has since grown into a more general, platform-independent deep learning system. It is better known in the speech community than in the general deep learning community.

In CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution. A layer is just a composition of those operations. The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).
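
For concreteness, here is a minimal sketch in Theano (one of the graph-based toolkits named above) of a fully-connected layer built as a composition of such fine-grained operations; the shapes and names are illustrative assumptions:

```python
import numpy as np
import theano
import theano.tensor as T

# Symbolic input: a minibatch of shape (batch, 100).
x = T.matrix('x')
W = theano.shared(np.random.randn(100, 10).astype('float32'), name='W')
b = theano.shared(np.zeros(10, dtype='float32'), name='b')

# A fully-connected "layer" is just matrix multiply + add + nonlinearity.
h = T.tanh(T.dot(x, W) + b)

# Compile the symbolic graph into a callable function.
layer = theano.function([x], h)
out = layer(np.random.randn(32, 100).astype('float32'))  # shape (32, 10)
```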

As of today, CNTK is not usable for a variety of tasks, such as sequence-to-sequence modeling.

TensorFlow 

State-of-the-art models

New models. Since TF uses the symbolic-graph-of-vector-operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loops yet (at least not well tested/documented, as of 05/2016), RNNs can still be made easy and efficient using the bucketing trick, sketched below.
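
Here is a minimal, framework-agnostic sketch of the bucketing trick; the bucket boundaries and padding token are illustrative assumptions. Sequences are grouped by length and padded only up to their bucket's size, so a static graph is unrolled once per bucket rather than once per sequence length:

```python
BUCKETS = [10, 20, 40, 80]  # maximum unrolled length per bucket (assumption)

def bucket_for(seq):
    """Return the smallest bucket that fits this sequence, or None."""
    for size in BUCKETS:
        if len(seq) <= size:
            return size
    return None  # sequence too long; truncate or drop

def pad_to(seq, size, pad_token=0):
    return seq + [pad_token] * (size - len(seq))

def bucketize(sequences):
    """Group sequences into {bucket_size: [padded sequences]}."""
    buckets = {size: [] for size in BUCKETS}
    for seq in sequences:
        size = bucket_for(seq)
        if size is not None:
            buckets[size].append(pad_to(seq, size))
    return buckets

# Each bucket is then fed to an RNN graph unrolled to exactly that length.
batches = bucketize([[1, 2, 3], [4] * 15, [5] * 33])
```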

However, TF has a major weakness in terms of modeling flexibility. Every computational flow has to be constructed as a static graph. That makes some computations difficult, such as beam search (which is used frequently in sequence prediction tasks).

Theano 

State-of-the-art models. Theano has implementations of most state-of-the-art networks, either in the form of a higher-level framework (e.g. Blocks, Keras, etc.) or in pure Theano.

New models. Theano pioneered the trend of using a symbolic graph for programming a network. Theano's symbolic API supports looping control via its scan operator, which makes implementing RNNs easy and efficient (see the sketch below). Users don't always have to define a new model at the tensor-operations level: there are a few higher-level frameworks, mentioned above, which make model definition and training simpler.
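
A minimal sketch of scan for a plain RNN step (dimensions and initializations are illustrative assumptions):

```python
import numpy as np
import theano
import theano.tensor as T

n_in, n_hid = 8, 16
W_in = theano.shared(np.random.randn(n_in, n_hid).astype('float32'))
W_hid = theano.shared(np.random.randn(n_hid, n_hid).astype('float32'))

x_seq = T.tensor3('x_seq')   # (time, batch, n_in)
h0 = T.matrix('h0')          # (batch, n_hid) initial hidden state

def step(x_t, h_prev):
    # One RNN step: mix the current input with the previous hidden state.
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_prev, W_hid))

# scan applies `step` along the leading (time) axis, carrying h forward,
# so the loop lives inside the symbolic graph itself.
h_seq, _ = theano.scan(step, sequences=x_seq, outputs_info=h0)

rnn = theano.function([x_seq, h0], h_seq)
```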

Torch 

State-of-the-art models

New models. In Torch, there are multiple ways (a stack of layers or a graph of layers) to define a network, but essentially a network is defined as a graph of layers. Because of this coarser granularity, Torch is sometimes considered less flexible: for new layer types, users have to implement the full forward, backward, and gradient-update routines.

However, unlike Caffe, defining a new layer in Torch is much easier because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via Protobuf.

Torch is more flexible than TensorFlow and Theano in that it is imperative, while TF/Theano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch (see the sketch below).
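
A minimal, framework-agnostic beam-search sketch (plain Python; the model hook and names are hypothetical) illustrates why: the per-step pruning and early stopping below are data-dependent control flow that is natural in an imperative setting but awkward inside a static declarative graph.

```python
def beam_search(step_logprobs, beam_size, max_len, eos=0):
    """step_logprobs(prefix) -> {token: logprob} is an assumed model hook."""
    beams = [([], 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:   # finished hypotheses survive as-is
                candidates.append((prefix, score))
                continue
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Keep only the best `beam_size` hypotheses: data-dependent pruning
        # that a static graph cannot easily express.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break                              # every beam has emitted EOS
    return beams
```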


 
[Figure] Left: graph model of CNTK/Theano/TensorFlow; Right: graph model of Caffe/Torch

Interfaces

Caffe 

Caffe has pycaffe interface but that's a mere secondary alternative to the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.
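
A minimal pycaffe sketch, with illustrative file names, showing that even from Python the architecture itself comes from a protobuf text file:

```python
import caffe

caffe.set_mode_cpu()
# The architecture lives in deploy.prototxt (protobuf text format);
# pycaffe merely loads and runs it -- it does not define the network.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Inference: fill the input blob, then run a forward pass.
# net.blobs['data'].data[...] = preprocessed_input
out = net.forward()
```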

CNTK 

The way to use CNTK, similar to Caffe, is to specify a config file and run a command-line program. CNTK is slightly worse than Caffe because there's no Python or any other high-level language interface.

TensorFlow 

TF supports two interfaces: Python and C++. This means that you can do experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.
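
A minimal sketch of the Python interface circa TF 0.8 (shapes are illustrative): declare a graph, then execute it in a session.

```python
import numpy as np
import tensorflow as tf

# Declare the graph: a single softmax layer over 100-dim inputs.
x = tf.placeholder(tf.float32, shape=[None, 100])
W = tf.Variable(tf.zeros([100, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Execute it in a session.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # TF 0.x-era initializer
    out = sess.run(y, feed_dict={x: np.random.randn(4, 100)})
```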

It would be perfect if TF supported F# or TypeScript; the lack of static typing in Python is just ... painful :).

Theano 

Python

Torch 

Torch runs on LuaJIT, which is amazingly fast (comparable with industrial languages such as C++/C#/Java). Hence developers don't have to resort to symbolic programming, which can be limiting: they can just write all kinds of computations without worrying about a performance penalty.

However, let's face it, Lua is not yet a mainstream language.

Model Deployment

How easy is it to deploy a new model?

Caffe 

Caffe is C++-based and can be compiled on a variety of devices. It is cross-platform (a Windows port is available and maintained here), which makes Caffe the best choice with respect to deployment.

CNTK 

Like Caffe, CNTK is also C++-based and cross-platform, so deployment should be easy in most cases. However, to my understanding, it doesn't work on ARM architectures, which limits its capability on mobile devices.

TensorFlow 

TF supports a C++ interface, and the library can be compiled/optimized on ARM architectures because it uses Eigen (instead of a BLAS library). This means that you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load a Python/LuaJIT interpreter [3].
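
As a hedged sketch (TF circa 0.8 API; paths and names are illustrative), exporting a graph so that a C++ binary can load it without a Python interpreter might look like this:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 100], name='input')
W = tf.Variable(tf.zeros([100, 10]))
y = tf.matmul(x, W, name='output')

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    # Serialize the GraphDef; the C++ API can load this file directly,
    # with no Python interpreter on the target device.
    tf.train.write_graph(sess.graph_def, '/tmp/model', 'graph.pb', as_text=False)
    # Variables are checkpointed separately.
    tf.train.Saver().save(sess, '/tmp/model/weights.ckpt')
```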

TF doesn't work on Windows yet, though, so TF models can't be deployed on Windows devices.

Theano 

The lack of a low-level interface and the inefficiency of the Python interpreter make Theano less attractive for industrial users. For a large model, the overhead of Python isn't too bad, but the dogma is still there.

The cross-platform nature (mentioned below) enables a Theano model to be deployed in a Windows environment, which helps it gain some points.

Torch 

Torch requires LuaJIT to run models, which makes it less attractive than the bare-bones C++ support of Caffe/CNTK/TF. It's not just the performance overhead, which is minimal; the bigger problem is integration, at the API level, with a larger production pipeline.

Performance

Single-GPU

All of these toolkits call cuDNN, so as long as there are no major computations or memory allocations at the outer level, they should perform similarly.

Soumith@FB has done some benchmarking for ConvNets. Deep learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith's benchmark is the only notable one as of today, so we will base the single-GPU performance rating on it.

TensorFlow and Torch 

TensorFlow used to be slow when it first came out, but as of 05/2016 it is in the same ballpark as other frameworks in terms of ConvNet speed. This is not surprising, because every framework nowadays calls cuDNN for the actual computations.

Here's my latest micro benchmark of TensorFlow 0.8 vs before. The measurement is latency, in milliseconds, for one full minibatch forward-backward pass on a single Titan X GPU.

Network        TF 0.6 [ref]   TF 0.8 [my run]   Torch FP32 [my run]
AlexNet        292            97                81
Inception v1   1237           518               470
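
For illustration, a latency number like the ones above can be measured roughly as follows; this is a hedged sketch with illustrative shapes and ops (TF circa 0.8 API), not the actual benchmark harness.

```python
import time
import numpy as np
import tensorflow as tf

# A toy "network": one big convolution, so there is something to differentiate.
x = tf.placeholder(tf.float32, shape=[128, 224, 224, 3])
w = tf.Variable(tf.truncated_normal([11, 11, 3, 64], stddev=0.1))
loss = tf.reduce_mean(tf.nn.conv2d(x, w, strides=[1, 4, 4, 1], padding='SAME'))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    batch = np.random.randn(128, 224, 224, 3).astype(np.float32)
    sess.run(train_op, feed_dict={x: batch})       # warm-up pass
    start = time.time()
    for _ in range(10):                            # forward + backward + update
        sess.run(train_op, feed_dict={x: batch})
    print('ms per minibatch: %.1f' % ((time.time() - start) / 10 * 1000))
```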

Theano 

On big networks, Theano’s performance is on par with Torch7, according to this benchmark. The main issue of Theano is startup time, which is terrible, because Theano has to compile C/CUDA code to binary. We don’t always train big models. In fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn’t have this problem. It simply maps the symbolic tensor operations to the already-compiled corresponding function calls.

Even import theano takes time, because this import apparently does a lot of work. Also, after importing Theano, you are stuck with a pre-configured device (e.g. GPU0).
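
A small sketch demonstrating this compile-time overhead: building the theano.function is the expensive part, not calling it. (The device is fixed via THEANO_FLAGS, e.g. THEANO_FLAGS=device=gpu0, before the import.)

```python
import time
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
y = T.nnet.sigmoid(T.dot(x, x.T)).sum()

t0 = time.time()
f = theano.function([x], y)   # compiles C/CUDA code behind the scenes
print('compile time: %.2fs' % (time.time() - t0))

t0 = time.time()
f(np.random.randn(256, 256).astype('float32'))
print('run time: %.4fs' % (time.time() - t0))
```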

Multi-GPU

TBD

Architecture

Developer Zone

Caffe 

Caffe's architecture was considered excellent when it was born, but by modern standards it is considered average. The main pain points of Caffe are its layer-wise design in C++ and the protobuf interface for model definition.

Layer-wise design. The building block of a network in Caffe is the layer. For a new layer type, the full forward, backward, and gradient update have to be implemented in C++.

Protobuf. As noted in the Interfaces section, Caffe's pycaffe interface is a mere secondary alternative to the command line: the model still has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.

[Copied from my own answer on Quora]

CNTK

To be updated ...

TensorFlow 

TF has a clean, modular architecture with multiple frontends and execution platforms. Details are in the white paper.

Theano 

The architecture is fairly hacky: the whole code base is Python, and C/CUDA code is packaged as Python strings. This makes the code hard to navigate, debug, and refactor, and hence hard for developers to contribute to.

Torch 

The Torch7 and nn libraries are also well designed, with clean, modular interfaces.

Ecosystem

  • Caffe and CNTK: C++
  • TensorFlow: Python and C++
  • Theano: Python
  • Torch: Lua is not a mainstream language and hence libraries built for it are not as rich as ones built for Python.

Cross-platform

Caffe, CNTK, and Theano work on all OSes. TensorFlow and Torch do not work on Windows and there's no known plan to port from either camp.


Footnotes

[1] Note that I don’t aggregate ratings because different users/developers have different priorities.

[2] Disclaimer: I haven’t analyzed this extension carefully.

[3] See my blog post for why this is desirable.
