1 Solution

[Solution 1]
Load the model structure at global scope, i.e. outside the TensorFlow session.

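import tensorflow as tf

# Load the model structure (create the Saver) once: this is the key step
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    # Initialize variables before the first step
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save the model; global_step appends the step number to the checkpoint name
        saver.save(sess, save_path="./model/path", global_step=i)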

[Solution 2]
Building on Solution 1, also define the model structure itself outside the session.

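import tensorflow as tf

# Predictions (logits)
train_logits = network_model.inference(inputs, keep_prob)
# Loss
train_loss = network_model.losses(train_logits)
# Optimization op
train_op = network_model.train(train_loss, learning_rate)
# Accuracy
train_acc = network_model.evaluation(train_logits, labels)
# Model inputs
feed_dict = {inputs: x_batch, labels: y_batch, keep_prob: 0.5}
# Load the model structure (create the Saver) once, after the graph is defined
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    # Initialize variables before the first step
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save the model; global_step appends the step number to the checkpoint name
        saver.save(sess, save_path="./model/path", global_step=i)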

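2 Timing tests

We timed the training program under both setups. When the graph structure is loaded again on every training step, the time per step grows from one step to the next: the more steps are run, the slower the later steps become, and eventually the graph can blow up and training terminates.
[Step time accumulates]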

2019-05-15 10:55:29.009205: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
step: 10, time cost: 1.6320762634277344
step: 11, time cost: 1.6317598819732666
step: 12, time cost: 1.6570467948913574
step: 13, time cost: 1.6584930419921875
step: 14, time cost: 1.6765813827514648
step: 15, time cost: 1.6751370429992676
step: 16, time cost: 1.7304580211639404
step: 17, time cost: 1.7583982944488525

[Step time stays flat]

2019-05-15 13:03:49.394354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7048 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:00:0d.0, compute capability: 6.1)
step: 0, time cost: 1.9781079292297363
loss1:6.78, loss2:5.47, loss3:5.27, loss4:7.31, loss5:5.44, loss6:6.87, loss7: 6.84
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
step: 6, time cost: 0.09669041633605957
step: 7, time cost: 0.09666872024536133
step: 8, time cost: 0.09651994705200195
step: 9, time cost: 0.09705543518066406
step: 10, time cost: 0.09690332412719727
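
To put a number on the difference between the two logs, one option is to fit a straight line to the per-step times and look at its slope. The sketch below is only an illustration: the helper names `parse_step_times` and `slope_per_step` are made up for this example, the log excerpts are the first few lines copied from the two runs above, and step 0 is skipped on the assumption that it mostly measures one-off startup cost.

```python
import re

# Excerpts copied from the two logs above.
GROWING_LOG = """\
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
"""

STABLE_LOG = """\
step: 0, time cost: 1.9781079292297363
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
"""


def parse_step_times(log_text):
    """Collect the times from lines that start with 'step: N, time cost: T'."""
    pattern = re.compile(r"^step: \d+, time cost: ([0-9.]+)", re.MULTILINE)
    return [float(t) for t in pattern.findall(log_text)]


def slope_per_step(times, skip=1):
    """Least-squares slope of time against step index, ignoring the first `skip` steps."""
    xs = list(range(skip, len(times)))
    ys = times[skip:]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


growing_slope = slope_per_step(parse_step_times(GROWING_LOG))
stable_slope = slope_per_step(parse_step_times(STABLE_LOG))
print("growing run, seconds added per step:", growing_slope)
print("stable run, seconds added per step:", stable_slope)
```

A clearly positive slope means each step costs more than the one before it; a slope near zero means the step time is flat. The two runs were on different hardware, so only the trend within each run is comparable, not the absolute times.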

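3 Cause analysis

(1) TensorFlow builds its computation as a graph of nodes (operations) and edges (the tensors that flow between them). Code that constructs an op either adds nodes and edges to the graph or reuses structure that already exists.
(2) The graph is a double-edged sword: it speeds up computation and makes models easier to design, but a poorly structured program turns it into a liability and training gets slower and slower.
(3) Training slows down because graph-building code runs inside the training loop. Each step that creates new ops or loads the graph structure again before calling sess.run leaves more nodes and edges in the graph, so every later step has a larger graph to handle.
(4) tf.train.Saver() is such a class: constructing it adds save and restore ops to the graph. If the training program constructs it on every update step, the graph keeps growing and training inevitably slows down.
(5) Therefore, instantiate that class once at global scope so that the graph structure is loaded only once. After that, each step only trains the parameters already in the graph, and the original training speed is preserved.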
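
The growth described in this list can be observed directly by counting the operations in the graph. The following is a minimal sketch, not the original training program: it assumes TensorFlow 1.13 or later (it goes through `tf.compat.v1` so that it also runs on TensorFlow 2.x), the helper `op_counts` and the stand-in variable `w` are made up for the example, and no session is opened because only the graph size matters here.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (assumption: TensorFlow >= 1.13)


def op_counts(num_steps, saver_in_loop):
    """Return the number of ops in the graph after each simulated training step."""
    graph = tf1.Graph()
    counts = []
    with graph.as_default():
        # One stand-in variable; a real model would define its layers here.
        tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
        if not saver_in_loop:
            saver = tf1.train.Saver()  # built once, before the loop
        for _ in range(num_steps):
            if saver_in_loop:
                saver = tf1.train.Saver()  # rebuilt every step: adds new ops each time
            counts.append(len(graph.get_operations()))
    return counts


inside = op_counts(5, saver_in_loop=True)
outside = op_counts(5, saver_in_loop=False)
print("Saver created inside the loop :", inside)
print("Saver created outside the loop:", outside)
```

With the Saver created inside the loop the op count should rise on every step, while with the Saver created once it should stay constant. The exact numbers depend on the TensorFlow version.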

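4 Summary

(1) When designing a training program, load the graph structure only once.
(2) tf.train.Saver() is the class that loads the graph structure; instantiate it once at global scope, i.e. outside the session, to keep training from getting slower and slower.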
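
One way to enforce these two rules is to make the graph read-only once it is built, so that any later attempt to add ops fails immediately instead of slowing training down over time. The sketch below assumes TensorFlow 1.13 or later and uses `tf.compat.v1`; the variable `w` and the `assign_add` op are stand-ins for a real model and training op, and this safeguard is a suggestion rather than part of the original program.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (assumption: TensorFlow >= 1.13)

graph = tf1.Graph()
with graph.as_default():
    # Build everything once: model, training op, initializer, Saver.
    w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
    train_op = tf1.assign_add(w, 1.0)  # stand-in for a real training op
    init_op = tf1.global_variables_initializer()
    saver = tf1.train.Saver()
    graph.finalize()  # from here on, the graph cannot be modified

    with tf1.Session(graph=graph) as sess:
        sess.run(init_op)
        for step in range(3):
            sess.run(train_op)  # running existing ops is still allowed

    # Creating another Saver would add ops, so the finalized graph rejects it.
    try:
        tf1.train.Saver()
        rejected = False
    except RuntimeError:
        rejected = True

print("second Saver rejected:", rejected)
```

If graph-building code slips into the training loop by mistake, the RuntimeError raised on the first offending step points straight at the line that tried to grow the graph.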
