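Introduction to automatic training with optimizer.minimize() in TensorFlow, and how to choose which variables get trained
This article covers the details of TensorFlow's automatic training and ties it back to the underlying formulas. Corrections are welcome.
Why I wrote this: some tutorials are vague and fail to convey the intent, the characteristics, or the use cases.
Intended audience: beginners who know a little code but are left with only a half understanding because the few available tutorials are vague.
(Links to related material, such as breakdowns and manual implementations of the optimization algorithms and the use of EMA and BatchNormalization, are at the bottom.)
Main text
TensorFlow provides several optimizers: plain gradient descent (GradientDescent) and variants such as Adagrad, Momentum, Nesterov, and Adam.
The typical learning step is gradient descent, and an optimizer can carry it out automatically: the loss ties all related variables together into a computation graph, and optimizer(learning_rate).minimize(loss) then performs gradient descent on it. minimize() is itself a combination of two operations, which are broken down later.
The computation graph: for a variable to be trained it must be in the computation graph, or more plainly, it must appear in the formula or chain of formulas that produces the loss. A variable with no direct or indirect relation to the loss will not be trained.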
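To make the last point concrete, here is a minimal plain-Python sketch (no TensorFlow involved; the names `loss`, `numerical_grad`, `w`, and `unused` are made up for illustration). It estimates gradients by finite differences and shows that a parameter that never enters the loss gets a zero gradient, so a gradient descent step would leave it unchanged.

```python
def loss(params):
    # Only "w" takes part in the loss; "unused" is not connected to it.
    prediction = params["w"] * 1.0   # fixed training sample x = 1
    return (prediction - 1.0) ** 2   # label = 1

def numerical_grad(f, params, name, eps=1e-6):
    # Central finite difference with respect to one named parameter.
    up = dict(params)
    up[name] += eps
    down = dict(params)
    down[name] -= eps
    return (f(up) - f(down)) / (2 * eps)

params = {"w": 3.0, "unused": 5.0}
grad_w = numerical_grad(loss, params, "w")
grad_unused = numerical_grad(loss, params, "unused")
print(grad_w, grad_unused)
```

The gradient for `w` comes out close to 2*(3 - 1) = 4, while the one for `unused` is exactly zero because changing it never changes the loss.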
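Source code
Training is really the process of modifying the tf.Variable objects in the computation graph, and all of these variables can be thought of as weights. To keep things simple, the following example introduces no placeholder and no x, so there is no distinction between x and w. The variable prediction_to_train = 3 is nevertheless equivalent to:
prediction_to_train (y) = w*x, with initial value w = 3 and a hidden, fixed x = 1 (that is, a single fixed training sample).
The loss here is the squared error and the label is 1, so training uses the data x = 1, y = 1 and, starting from w = 3, moves w toward 1.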
- import tensorflow as tf
- #define variable and error
- label = tf.constant(1,dtype = tf.float32)
- prediction_to_train = tf.Variable(3,dtype=tf.float32)
- #define losses and train
- manual_compute_loss = tf.square(prediction_to_train - label)
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- train_step = optimizer.minimize(manual_compute_loss)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(100):
-         print('variable is ', sess.run(prediction_to_train), ' and the loss is ',sess.run(manual_compute_loss))
-         sess.run(train_step)
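Output (abridged). With a learning rate of 0.01 the distance to the label shrinks by a factor of 0.98 per step, so after 100 iterations w is still about 1.27; the final lines below must come from a longer run: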
- variable is 3.0 and the loss is 4.0
- variable is 2.96 and the loss is 3.8416002
- variable is 2.9208 and the loss is 3.6894724
- variable is 2.882384 and the loss is 3.5433698
- variable is 2.8447363 and the loss is 3.403052
- variable is 2.8078415 and the loss is 3.268291
- ...
- variable is 2.0062745 and the loss is 1.0125883
- variable is 1.986149 and the loss is 0.9724898
- variable is 1.966426 and the loss is 0.9339792
- ...
- variable is 1.0000029 and the loss is 8.185452e-12
- variable is 1.0000029 and the loss is 8.185452e-12
- variable is 1.0000029 and the loss is 8.185452e-12
- variable is 1.0000029 and the loss is 8.185452e-12
- variable is 1.0000029 and the loss is 8.185452e-12
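What the optimizer does on each step can be sketched by hand in plain Python. This is only an illustration under the same assumptions as the listing above (squared error, label 1, start value 3, learning rate 0.01); the names `w`, `grad`, and `history` are mine, not TensorFlow's.

```python
label = 1.0
w = 3.0                 # same starting value as prediction_to_train
learning_rate = 0.01

history = []
for _ in range(3):
    loss = (w - label) ** 2
    history.append((w, loss))
    grad = 2.0 * (w - label)      # d(loss)/dw for the squared error
    w = w - learning_rate * grad  # one gradient descent step

for value, loss in history:
    print('variable is', value, 'and the loss is', loss)
```

The first three values of w and the loss agree with the first three lines of the TensorFlow output above, up to floating-point rounding.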
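How to restrict which Variables are trained:
Since training modifies the tf.Variable objects in the computation graph (by default all trainable tf.Variable objects in the graph; a subset can be specified with var_list), you can keep a value from being trained by making it a tf.constant or a plain Python value. This is also a technique used in transfer learning.
Below is a proper training example:
y = w1*x + w2*x + w3*x
Since y = 1 and x = 1:
1 = w1 + w2 + w3
And since w3 = 4:
-3 = w1 + w2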
- #demo2
- #define variable and error
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32)
- w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.constant(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = tf.square(y_predict - label)
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- train_step = optimizer.minimize(make_up_loss)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(100):
-         w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         sess.run(train_step,{x:1})
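Because w3 is a constant, it is not trained; only w1 and w2 are.
The result matches the expectation -3 = w1 + w2. The lines below come from a run long enough to converge; after only 100 iterations w1 + w2 is still about -2.81.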
- variable is w1: -1.4999986 w2: -1.4999986 w3: 4.0 and the loss is 8.185452e-12
- variable is w1: -1.4999986 w2: -1.4999986 w3: 4.0 and the loss is 8.185452e-12
- variable is w1: -1.4999986 w2: -1.4999986 w3: 4.0 and the loss is 8.185452e-12
- variable is w1: -1.4999986 w2: -1.4999986 w3: 4.0 and the loss is 8.185452e-12
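The same experiment can be sketched in plain Python to see why both trained weights end at -1.5. This is an illustration only, not TensorFlow code: the gradient is written out by hand and the iteration count of 1000 is my own choice so that the run converges.

```python
label, x = 1.0, 1.0
w1, w2 = 4.0, 4.0   # trained
w3 = 4.0            # held fixed, like the tf.constant
learning_rate = 0.01

for _ in range(1000):
    error = w1 * x + w2 * x + w3 * x - label
    grad = 2.0 * error * x        # identical for w1 and w2
    w1 -= learning_rate * grad
    w2 -= learning_rate * grad

print(w1, w2, w3, w1 + w2)
```

Because w1 and w2 start from the same value and always receive the same gradient, they stay equal, so the constraint w1 + w2 = -3 is split evenly between them.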
Below is an example that uses var_list so that only w2 is trained. Since w1 and w3 both stay at 4 and x = 1, the correct answer is for w2 to approach -7.
- #define variable and error
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32)
- w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.constant(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = tf.square(y_predict - label)
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- train_step = optimizer.minimize(make_up_loss,var_list = [w2])
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(500):
-         w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         sess.run(train_step,{x:1})
- variable is w1: 4.0 w2: -6.99948 w3: 4.0 and the loss is 2.7063857e-07
- variable is w1: 4.0 w2: -6.9994903 w3: 4.0 and the loss is 2.5983377e-07
- variable is w1: 4.0 w2: -6.9995003 w3: 4.0 and the loss is 2.4972542e-07
- variable is w1: 4.0 w2: -6.9995103 w3: 4.0 and the loss is 2.398176e-07
- variable is w1: 4.0 w2: -6.9995203 w3: 4.0 and the loss is 2.3011035e-07
- variable is w1: 4.0 w2: -6.99953 w3: 4.0 and the loss is 2.2105178e-07
- variable is w1: 4.0 w2: -6.9995394 w3: 4.0 and the loss is 2.1217511e-07
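The role of var_list can be pictured as a filter over the parameters that an update step is allowed to touch. The sketch below is plain Python with a hypothetical `minimize_step` helper and string names standing in for variables; it mimics the idea, not TensorFlow's implementation.

```python
def minimize_step(params, var_list, learning_rate=0.01, x=1.0, label=1.0):
    # Gradient of (w1*x + w2*x + w3*x - label)**2 is the same for every weight.
    error = sum(params.values()) * x - label
    grad = 2.0 * error * x
    for name in var_list:          # only the listed names are updated
        params[name] -= learning_rate * grad

params = {"w1": 4.0, "w2": 4.0, "w3": 4.0}
for _ in range(2000):
    minimize_step(params, var_list=["w2"])
print(params)
```

Only `w2` moves, and it has to absorb the whole correction by itself, which is why it ends near -7 while `w1` and `w3` keep their initial value of 4.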
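What if w1, w2, and w3 are all tf.constant? No surprise: you get an error, and a fairly friendly one.
There are two cases:
If var_list is left to collect all trainable variables automatically, the error tells you that no trainable variables were found:
ValueError: No variables to optimize.
If var_list is explicitly given a constant, the update is not implemented:
NotImplementedError: ('Trying to update a Tensor ', <tf.Tensor 'Const_1:0' shape=() dtype=float32>)
Another way to get var_list: tf.get_collection
The various get_* lookups are more practical, because it is not always convenient to reach the tensor through a Python reference.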
- #demo2.2 another way to collect var_list
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32)
- with tf.name_scope(name='selected_variable_to_trian'):
-     w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.constant(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = (y_predict - label)**3
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='selected_variable_to_trian')
- train_step = optimizer.minimize(make_up_loss,var_list = output_vars)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(3000):
-         w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         sess.run(train_step,{x:1})
- variable is w1: 4.0 w2: -6.988893 w3: 4.0 and the loss is 1.3702081e-06
- variable is w1: 4.0 w2: -6.988897 w3: 4.0 and the loss is 1.3687968e-06
- variable is w1: 4.0 w2: -6.9889007 w3: 4.0 and the loss is 1.3673865e-06
- variable is w1: 4.0 w2: -6.9889045 w3: 4.0 and the loss is 1.3659771e-06
- variable is w1: 4.0 w2: -6.9889083 w3: 4.0 and the loss is 1.3645688e-06
- variable is w1: 4.0 w2: -6.988912 w3: 4.0 and the loss is 1.3631613e-06
- variable is w1: 4.0 w2: -6.988916 w3: 4.0 and the loss is 1.3617548e-06
- variable is w1: 4.0 w2: -6.9889197 w3: 4.0 and the loss is 1.3603493e-06
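As far as I understand it, the scope argument of tf.get_collection works as a filter on variable names: an item is kept when its name matches the scope at the start, in the sense of Python's re.match. The sketch below imitates that in plain Python. The variable names in the list are hypothetical examples of what a fresh graph might generate, and `filter_by_scope` is my own helper, not a TensorFlow function.

```python
import re

# Hypothetical variable names, shaped like the ones TensorFlow 1.x generates.
trainable_names = [
    "Variable:0",                              # w1, defined outside the scope
    "selected_variable_to_trian/Variable:0",   # w2, defined inside the scope
]

def filter_by_scope(names, scope):
    # Keep the names that match `scope` at the start of the string.
    return [name for name in names if re.match(scope, name)]

selected = filter_by_scope(trainable_names, "selected_variable_to_trian")
print(selected)
```

Only the name created inside the name scope survives the filter, which is why only w2 is trained in the listing above.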
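trainable=False
This is another way to keep a variable from being trained. It works on a similar principle to the method above, since both relate to tf.GraphKeys.TRAINABLE_VARIABLES. The previous method picks a given scope out of that collection, while this one decides at definition time that the variable is never added to it.
Non-trainable is still different from constant, because the value can still be modified manually, as in applications of moving averages. Non-trainable is more like a convention aimed specifically at the optimizer.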
- #demo2.4 another way to avoid variable be train
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
- w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.constant(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = (y_predict - label)**3
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- output_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
- train_step = optimizer.minimize(make_up_loss,var_list = output_vars)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(3000):
-         w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         sess.run(train_step,{x:1})
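Fetching all trainable variables and training on them is the same as training without specifying var_list at all, because that is the default value of the parameter: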
- var_list: Optional list or tuple of `Variable` objects to update to
- minimize `loss`. Defaults to the list of variables collected in
- the graph under the key `GraphKeys.TRAINABLE_VARIABLES`.
- #demo2.3 another way to avoid variable be train
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- #w1 = tf.Variable(4,dtype=tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
- with tf.name_scope(name='selected_variable_to_trian'):
-     w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.constant(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = (y_predict - label)**3
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- train_step = optimizer.minimize(make_up_loss)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(3000):
-         w1_,w2_,w3_,loss_ = sess.run([w1,w2,w3,make_up_loss],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         sess.run(train_step,{x:1})
The actual results are the same as above, so they are omitted.
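The difference between "non-trainable" and "constant" can be sketched with a toy class in plain Python. Everything here (`Param`, `sgd_step`, the gradient value 22.0, the decay 0.9) is a made-up illustration of the convention, not TensorFlow's implementation.

```python
class Param:
    # Toy stand-in for a variable; `trainable` only matters to the optimizer.
    def __init__(self, value, trainable=True):
        self.value = value
        self.trainable = trainable

def sgd_step(params, grads, learning_rate=0.01):
    for name, param in params.items():
        if param.trainable:        # non-trainable parameters are skipped
            param.value -= learning_rate * grads[name]

params = {"w1": Param(4.0, trainable=False), "w2": Param(4.0)}
sgd_step(params, grads={"w1": 22.0, "w2": 22.0})
w1_after_sgd = params["w1"].value   # untouched by the optimizer step

# A non-trainable parameter can still be changed by hand, e.g. a moving average.
decay = 0.9
params["w1"].value = decay * params["w1"].value + (1 - decay) * 5.0
print(w1_after_sgd, params["w1"].value, params["w2"].value)
```

The optimizer step leaves `w1` alone, yet the manual assignment afterwards still changes it, which a true constant would not allow.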
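Breaking down minimize()
minimize() is just a combination of compute_gradients() and apply_gradients().
compute_gradients() computes the gradients, and apply_gradients() updates the parameters. With several optimizers you can set up several learning processes with different learning rates, computing gradients and updating parameters separately for different var_lists. This can be used for transfer learning or to deal with mismatched gradient updates in some deep networks; I won't go into detail here.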
- #demo2.4 combination of compute_gradients() and apply_gradients()
- label = tf.constant(1,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w1 = tf.Variable(4,dtype=tf.float32,trainable=False)
- w2 = tf.Variable(4,dtype=tf.float32)
- w3 = tf.Variable(4,dtype=tf.float32)
- y_predict = w1*x+w2*x+w3*x
- #define losses and train
- make_up_loss = (y_predict - label)**3
- optimizer = tf.train.GradientDescentOptimizer(0.01)
- w2_gradient = optimizer.compute_gradients(loss = make_up_loss, var_list = [w2])
- train_step = optimizer.apply_gradients(grads_and_vars = (w2_gradient))
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     for _ in range(300):
-         w1_,w2_,w3_,loss_,w2_gradient_ = sess.run([w1,w2,w3,make_up_loss,w2_gradient],feed_dict={x:1})
-         print('variable is w1:',w1_,' w2:',w2_,' w3:',w3_, ' and the loss is ',loss_)
-         print('gradient:',w2_gradient_)
-         sess.run(train_step,{x:1})
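The idea of different learning rates for different variable groups can be sketched in plain Python by keeping the two steps apart. The functions below only borrow the names compute_gradients and apply_gradients for readability; they are my own toy versions, they use a squared-error loss instead of the cubed one in the listing, and the two learning rates are arbitrary.

```python
def compute_gradients(params, var_list, x=1.0, label=1.0):
    # Returns (gradient, name) pairs, mirroring the grads_and_vars layout.
    error = sum(params.values()) * x - label
    grad = 2.0 * error * x         # squared-error loss, same for every weight
    return [(grad, name) for name in var_list]

def apply_gradients(params, grads_and_vars, learning_rate):
    for grad, name in grads_and_vars:
        params[name] -= learning_rate * grad

params = {"w1": 4.0, "w2": 4.0, "w3": 4.0}
for _ in range(2000):
    # Compute both groups first, then apply them with different rates.
    slow = compute_gradients(params, ["w2"])
    fast = compute_gradients(params, ["w3"])
    apply_gradients(params, slow, learning_rate=0.001)
    apply_gradients(params, fast, learning_rate=0.01)
print(params)
```

The total correction of -11 is shared in the ratio of the learning rates, so `w2` ends near 3 and `w3` near -6, while `w1` is never touched.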
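The learning rate, the steps, the formulas, and a manual gradient descent implementation:
In prediction, y is a function of x. In training, however, the loss l is a function of w, and x cannot change. So now you know why weights are called Variables (a forced explanation, admittedly).
Below, gradient descent is implemented manually with TensorFlow's API:
To make the formulas easier to write, the code below renames the variables to the first letters of loss, prediction, gradient, weight, y, and x. η is the learning rate, and w0, w1, w2, ... denote the value of w at each iteration, not several different variables.
l = (y - p)^2 = (y - w*x)^2 = y^2 - 2*y*w*x + w^2*x^2
dl/dw = 2*w*x^2 - 2*y*x
Substituting into the gradient descent formula:
w1 = w0 - η*dl/dw|w=w0
w2 = w1 - η*dl/dw|w=w1
w3 = w2 - η*dl/dw|w=w2
Initially: y = 3, x = 1, w = 2, l = 1, dl/dw = -2, η = 1
Update: w = 4
Update: w = 2
Update: w = 4
So in this example, with x = 1 and y = 3, dl/dw happens to equal 2w - 2y, that is, twice the gap between the prediction and the label. A learning rate of 1 makes w bounce back and forth around the correct value without ever converging; it was chosen only to make the arithmetic easy to follow. Reducing the learning rate and adding more iterations makes it converge.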
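The derivative and the oscillation can be checked by hand with a few lines of plain Python. This is a sketch with helper names of my own (`loss`, `grad`, `trajectory`); it compares the analytic formula with a finite-difference estimate and then repeats the update with a learning rate of 1.

```python
def loss(w, x=1.0, y=3.0):
    return (y - w * x) ** 2

def grad(w, x=1.0, y=3.0):
    # Analytic derivative from the text: dl/dw = 2*w*x^2 - 2*y*x
    return 2.0 * w * x ** 2 - 2.0 * y * x

# Check the formula against a central finite difference at w = 2.
eps = 1e-6
numeric = (loss(2.0 + eps) - loss(2.0 - eps)) / (2 * eps)

# Repeat the update with eta = 1 to see the oscillation.
eta, w = 1.0, 2.0
trajectory = [w]
for _ in range(4):
    w = w - eta * grad(w)
    trajectory.append(w)
print(numeric, trajectory)
```

The numeric estimate agrees with the analytic value of -2 at w = 2, and the trajectory alternates between 2 and 4 without approaching 3.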
- #demo4:manual gradient descent in tensorflow
- #y label
- y = tf.constant(3,dtype = tf.float32)
- x = tf.placeholder(dtype = tf.float32)
- w = tf.Variable(2,dtype=tf.float32)
- #prediction
- p = w*x
- #define losses
- l = tf.square(p - y)
- g = tf.gradients(l, w)
- learning_rate = tf.constant(1,dtype=tf.float32)
- #learning_rate = tf.constant(0.11,dtype=tf.float32)
- init = tf.global_variables_initializer()
- #update
- update = tf.assign(w, w - learning_rate * g[0])
- with tf.Session() as sess:
-     sess.run(init)
-     print(sess.run([g,p,w], {x: 1}))
-     for _ in range(5):
-         w_,g_,l_ = sess.run([w,g,l],feed_dict={x:1})
-         print('variable is w:',w_, ' g is ',g_,' and the loss is ',l_)
-         _ = sess.run(update,feed_dict={x:1})
Results:
learning rate = 1
- [[-2.0], 2.0, 2.0]
- variable is w: 2.0 g is [-2.0] and the loss is 1.0
- variable is w: 4.0 g is [2.0] and the loss is 1.0
- variable is w: 2.0 g is [-2.0] and the loss is 1.0
- variable is w: 4.0 g is [2.0] and the loss is 1.0
- variable is w: 2.0 g is [-2.0] and the loss is 1.0
The effect resembles the figure below (figure not included here).
With a smaller learning rate:
- variable is w: 2.9964619 g is [-0.007575512] and the loss is 1.4347095e-05
- variable is w: 2.996695 g is [-0.0070762634] and the loss is 1.2518376e-05
- variable is w: 2.996913 g is [-0.0066099167] and the loss is 1.0922749e-05
- variable is w: 2.9971166 g is [-0.0061740875] and the loss is 9.529839e-06
- variable is w: 2.9973066 g is [-0.0057668686] and the loss is 8.314193e-06
- variable is w: 2.9974842 g is [-0.0053868294] and the loss is 7.2544826e-06
- variable is w: 2.9976501 g is [-0.0050315857] and the loss is 6.3292136e-06
- variable is w: 2.997805 g is [-0.004699707] and the loss is 5.5218115e-06
- variable is w: 2.9979498 g is [-0.004389763] and the loss is 4.8175043e-06
- variable is w: 2.998085 g is [-0.0041003227] and the loss is 4.2031616e-06
- variable is w: 2.9982114 g is [-0.003829956] and the loss is 3.6671408e-06
- variable is w: 2.9983294 g is [-0.0035772324] and the loss is 3.1991478e-06
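Why a smaller learning rate converges follows from the update rule: substituting dl/dw = 2*w*x^2 - 2*y*x shows that each step multiplies the error (w*x - y) by the factor (1 - 2*η*x^2). With x = 1 and η = 1 that factor is -1, which is the oscillation seen earlier. The plain-Python sketch below tries a few rates; the `run` helper and the chosen values of η are mine, purely for illustration.

```python
def run(eta, steps, w=2.0, x=1.0, y=3.0):
    # Plain gradient descent on l = (y - w*x)**2, starting from w = 2.
    for _ in range(steps):
        w = w - eta * (2.0 * w * x ** 2 - 2.0 * y * x)
    return w

# With x = 1, each step multiplies the error (w - y) by (1 - 2*eta).
for eta in (1.0, 0.5, 0.1):
    factor = 1.0 - 2.0 * eta
    print('eta', eta, 'factor', factor, 'w after 50 steps', run(eta, 50))
```

For x = 1 the error shrinks only when the factor has absolute value below 1, that is, for learning rates strictly between 0 and 1.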
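Extension: automatic and manual implementations of Momentum and Adagrad, split into a separate post because this one was getting too long.
Source code
Additional practical notes:
Real projects often use a global_step variable as the basis for dynamic learning rates, EMA, and batch normalization. When an operation is applied to all trainable variables, and especially when EMA selects all trainable variables, global_step is easily affected (it should simply increase by 1 each step, but ends up with inertia and a decay coefficient applied to it). So global_step should always be set to trainable=False, and the targets of operations such as EMA should be chosen carefully.
EMA and trainable=False are not strictly related, but in practice they usually are: EMA may by default pick up all trainable variables, so setting trainable=False on global_step keeps it out of the var_list passed to EMA. This is a common case of "you don't know why it works; you just got lucky and nothing went wrong".
By the same logic, BatchNormalization's average_mean and average_variance should both be set to trainable=False, because they are maintained separately.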
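To see what "inertia" means for a step counter, here is the moving-average formula applied to a counter in plain Python. It illustrates the arithmetic only and does not model how TensorFlow stores its averages; the decay of 0.9 and the names `step` and `shadow` are assumptions of mine.

```python
decay = 0.9
step = 0.0      # plain counter, grows by exactly 1 per iteration
shadow = 0.0    # exponential moving average of the counter

for _ in range(10):
    step += 1.0
    shadow = decay * shadow + (1.0 - decay) * step

print(step, shadow)
```

After ten iterations the counter is 10 while its average is still close to 4, so anything that read the averaged value instead of the counter would see a step count that lags well behind the real one.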