How to do machine learning research with TensorFlow 2.0 + Keras?

Google deep learning researcher and "father of Keras" François Chollet posted a Twitter thread summarizing a crash-course guide to doing deep learning research with TensorFlow 2.0 + Keras.

In the guide, Chollet lays out 12 essential rules, each short and practical and packed with useful material; the thread drew nearly 3K likes and over a thousand retweets.

Without further ado, let's see how the master turns the complex into the simple:

The 12 essential guidelines

1) The first class you need to learn is Layer: a Layer encapsulates some state (weights) and some computation.

import tensorflow as tf
from tensorflow.keras.layers import Layer


class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Instantiate our layer.
linear_layer = Linear(4, 2)

# The layer can be treated as a function.
# Here we call it on some data.
y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)

# Weights are automatically tracked under the `weights` property.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

2) The add_weight method is a shortcut for creating weights.
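
For example, the Linear layer from guideline 1 can be rewritten with add_weight, which creates the variable and registers it with the layer in a single call. A minimal sketch, not copied verbatim from the thread:

# Sketch: the same Linear layer, using the `add_weight` shortcut.
class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super(Linear, self).__init__()
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(units,),
                                 initializer='zeros',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b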

3) It is good practice to create weights in a separate build method, called lazily with the shape of the first input the layer sees, and to call add_weight there. With this pattern you no longer need to specify input_dim up front.

class Linear(Layer):
    """y = w.x + b"""

    def __init__(self, units=32):
        super(Linear, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Instantiate our lazy layer.
linear_layer = Linear(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))

4) To automatically retrieve the gradients of a layer's weights, call the layer inside a GradientTape. With these gradients you can update the weights, either with an optimizer or by hand, and you can also modify the gradients before applying them (a clipping sketch follows the training loop below).

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# Instantiate our linear layer (defined above) with 10 units.
linear_layer = Linear(10)

# Instantiate a logistic loss function that expects integer targets.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Iterate over the batches of the dataset.
for step, (x, y) in enumerate(dataset):

    # Open a GradientTape.
    with tf.GradientTape() as tape:

        # Forward pass.
        logits = linear_layer(x)

        # Loss value for this batch.
        loss = loss_fn(y, logits)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, linear_layer.trainable_weights)

    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))
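
As for modifying gradients before applying them, gradient clipping is a typical case. A minimal self-contained sketch; the clipping threshold of 1.0 and the toy inputs are illustrative assumptions, not part of the original thread:

# Modify gradients before applying them, here via global-norm clipping.
x = tf.ones((2, 2))
y = tf.constant([0, 1])

linear_layer = Linear(10)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

with tf.GradientTape() as tape:
    loss = loss_fn(y, linear_layer(x))

gradients = tape.gradient(loss, linear_layer.trainable_weights)
clipped, _ = tf.clip_by_global_norm(gradients, 1.0)  # Clip before applying.
optimizer.apply_gradients(zip(clipped, linear_layer.trainable_weights))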

5) Weights created by a layer can be either trainable or non-trainable; they are exposed through the trainable_weights and non_trainable_weights properties. Here is a layer with a non-trainable weight:

class ComputeSum(Layer):
    """Returns the sum of the inputs."""

    def __init__(self, input_dim):
        super(ComputeSum, self).__init__()
        # Create a non-trainable weight.
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                                 trainable=False)

    def call(self, inputs):
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total


my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy())  # [2. 2.]

y = my_sum(x)
print(y.numpy())  # [4. 4.]

assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []

6) Layers can be recursively nested to create bigger computation blocks. Each layer tracks the weights of its sublayers, both trainable and non-trainable.

# Let's reuse the Linear class
# with a `build` method that we defined above.

class MLP(Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)


mlp = MLP()

# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))

# Weights are recursively tracked.
assert len(mlp.weights) == 6

7) Layers can create losses during the forward pass. This is especially useful for regularization losses.

class ActivityRegularization(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(ActivityRegularization, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(inputs))
        return inputs


# Let's use the loss layer in a MLP block.

class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""

    def __init__(self):
        super(SparseMLP, self).__init__()
        self.linear_1 = Linear(32)
        self.regularization = ActivityRegularization(1e-2)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.regularization(x)
        return self.linear_3(x)


mlp = SparseMLP()
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar

8) These losses are cleared by the top-level layer at the start of each forward pass, so they don't accumulate: layer.losses only contains the losses created during the last forward pass. In a training loop you would typically sum them into the main loss before computing gradients.

# Losses correspond to the *last* forward pass.
mlp = SparseMLP()
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1  # No accumulation.

# Let's demonstrate how to use these losses in a training loop.

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# A new MLP.
mlp = SparseMLP()

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

for step, (x, y) in enumerate(dataset):
    with tf.GradientTape() as tape:

        # Forward pass.
        logits = mlp(x)

        # External loss value for this batch.
        loss = loss_fn(y, logits)

        # Add the losses created during the forward pass.
        loss += sum(mlp.losses)

    # Get gradients of weights wrt the loss.
    gradients = tape.gradient(loss, mlp.trainable_weights)

    # Update the weights of our linear layer.
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))

    # Logging.
    if step % 100 == 0:
        print(step, float(loss))

9) Compiling your computation into a static graph, rather than running it eagerly, will generally give you better performance (eager execution is better suited to debugging). Static graphs are a researcher's best friend: you can compile any function by wrapping it in the tf.function decorator.

# Prepare our layer, loss, and optimizer.
mlp = MLP()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)


# Create a training step function.

@tf.function  # Make it fast.
def train_on_batch(x, y):
    with tf.GradientTape() as tape:
        logits = mlp(x)
        loss = loss_fn(y, logits)
    gradients = tape.gradient(loss, mlp.trainable_weights)
    optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))
    return loss


# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

for step, (x, y) in enumerate(dataset):
    loss = train_on_batch(x, y)
    if step % 100 == 0:
        print(step, float(loss))

10) Some layers, in particular BatchNormalization and Dropout, behave differently during training and inference. For such layers the standard pattern is to expose a training (boolean) argument in call.

By exposing this argument in call, the built-in training and evaluation loops can use the layer correctly in both training and inference.

class Dropout(Layer):

    def __init__(self, rate):
        super(Dropout, self).__init__()
        self.rate = rate

    @tf.function
    def call(self, inputs, training=None):
        # Note that the tf.function decorator enables us
        # to use imperative control flow like this `if`,
        # while defining a static graph!
        if training:
            return tf.nn.dropout(inputs, rate=self.rate)
        return inputs


class MLPWithDropout(Layer):

    def __init__(self):
        super(MLPWithDropout, self).__init__()
        self.linear_1 = Linear(32)
        self.dropout = Dropout(0.5)
        self.linear_3 = Linear(10)

    def call(self, inputs, training=None):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.dropout(x, training=training)
        return self.linear_3(x)


mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)

The crux of the matter is how, in Keras + TensorFlow 2.0, to handle layers that behave differently in training and testing, and how training with model.fit() differs from a hand-written tf.function training loop; model.fit() in particular seems to hide quite a bit of subtle behavior.
The practical recommendations are as follows (a sketch of the phase switching follows the list):

  1. When training with Keras APIs such as model.fit() or model.train_on_batch(), it is still advisable to set tf.keras.backend.set_learning_phase(True) manually, which can help convergence.
  2. When using eager execution:
  • 1) Build the model by subclassing, pass a training flag to call(), and handle layers such as BatchNormalization and Dropout differently depending on it;
  • 2) Or build the model with the Functional API or Sequential and set tf.keras.backend.set_learning_phase(True), remembering to switch the phase back at testing time.
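
A minimal sketch of the learning-phase switching described above; the toy Sequential model and random data are assumptions used only for illustration:

import numpy as np

# Hypothetical toy model with a layer that behaves differently per phase.
model = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer='sgd',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

x = np.random.random((64, 16)).astype('float32')
y = np.random.randint(0, 10, size=(64,))

tf.keras.backend.set_learning_phase(True)   # training phase: dropout is active
model.fit(x, y, epochs=1, batch_size=16)

tf.keras.backend.set_learning_phase(False)  # switch back before testing/inference
model.evaluate(x, y)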

11) There are many built-in layers available, from Dense, Conv2D and LSTM all the way to Conv2DTranspose and ConvLSTM2D. Learn to reuse this built-in functionality.
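
For instance, the hand-written MLP from guideline 6 could be replaced by a stack of built-in Dense layers. A minimal sketch (the layer sizes here are illustrative assumptions):

# Reuse built-in layers instead of the custom Linear layer defined above.
from tensorflow.keras import layers

mlp = tf.keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10),
])

# As with our custom layers, the weights are created on the first call.
y = mlp(tf.ones((3, 64)))
assert len(mlp.weights) == 6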

12) To build deep learning models, you don't always have to use object-oriented programming. All the layers you've seen so far can also be composed functionally, like this:

# We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focuses on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformations,
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))

# We call layers on these "type" objects
# and they return updated types (new shapes/dtypes).
x = Linear(32)(inputs)  # We are reusing the Linear layer we defined earlier.
x = Dropout(0.5)(x)  # We are reusing the Dropout layer we defined earlier.
outputs = Linear(10)(x)

# A functional `Model` can be defined by specifying inputs and outputs.
# A model is itself a layer like any other.
model = tf.keras.Model(inputs, outputs)

# A functional model already has weights, before being called on any data.
# That's because we defined its input shape in advance (in `Input`).
assert len(model.weights) == 4

# Let's call our model on some data, for fun.
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

# You can pass a `training` argument in `__call__`
# (it will get passed down to the Dropout layer).
y = model(tf.ones((2, 16)), training=True)

This is the Functional API. It is more concise than subclassing, though it can only be used to define DAGs of layers.
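
Because a functional Model is a regular Keras model, it can also be trained with the built-in training loops. A minimal sketch, where the SGD settings and the random toy data are assumptions for illustration rather than part of the original thread:

# Train the functional model defined above with the built-in fit loop.
import numpy as np

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Hypothetical toy data: 256 samples with 16 features, 10 classes.
x_toy = np.random.random((256, 16)).astype('float32')
y_toy = np.random.randint(0, 10, size=(256,))

model.fit(x_toy, y_toy, epochs=1, batch_size=32)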

Master these 12 guidelines and you'll be able to implement most deep learning research. Pretty neat, right?

Links

Finally, the original Twitter thread from Chollet:
https://twitter.com/fchollet/status/1105139360226140160

The Google Colab notebook:
https://colab.research.google.com/drive/17u-pRZJnKN0gO5XZmq8n5A2bKGrfKEUg#scrollTo=rwREGJ7Wiyl9
