Data level

Increase the amount of training data (see the sketch below).
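
One common way to get more training data is augmentation, i.e. generating extra samples by transforming the existing ones. A minimal tf.keras sketch, where the dataset and the generator settings are illustrative assumptions rather than anything from these notes:

import tensorflow as tf

# build an augmentation generator (the parameter values here are just examples)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True)     # random horizontal flips

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
train_iter = datagen.flow(x_train, y_train, batch_size=32)  # yields augmented batches
# model.fit(train_iter, ...)  # feed the augmented stream into training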

Model level

Use a simpler model; change the type of the input data.

Dropout

Notes:

  1. When building the placeholders, add a boolean training variable; during training, feed the dropout rate together with this training flag (see the minimal sketch after this list).
  2. Set it to True during training and to False during testing.
  3. Dropout is mostly used after fully connected layers, not after CNN layers; placing it after convolutional layers tends to make results worse.
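
A minimal TF 1.x sketch of notes 1 and 2 (the layer sizes are arbitrary assumptions; the full reference code follows in the next section):

import tensorflow as tf

tf_x = tf.placeholder(tf.float32, [None, 10])
tf_is_training = tf.placeholder(tf.bool, None)               # note 1: bool flag built as a placeholder
h = tf.layers.dense(tf_x, 64, tf.nn.relu)                     # an ordinary FC layer
h = tf.layers.dropout(h, rate=0.5, training=tf_is_training)   # dropout rate + the training flag
# note 2: feed {tf_is_training: True} during training and {tf_is_training: False} at test time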

Code implementation

Pay attention to the comments in the reference code below.
Reference: https://mofanpy.com/tutorials/machine-learning/tensorflow/dropout/

"""
Know more, visit my Python tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
tensorflow: 1.1.0
matplotlib
numpy
"""
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

tf.set_random_seed(1)
np.random.seed(1)

# Hyper parameters
N_SAMPLES = 20
N_HIDDEN = 300
LR = 0.01

# training data
x = np.linspace(-1, 1, N_SAMPLES)[:, np.newaxis]
y = x + 0.3*np.random.randn(N_SAMPLES)[:, np.newaxis]

# test data
test_x = x.copy()
test_y = test_x + 0.3*np.random.randn(N_SAMPLES)[:, np.newaxis]

# show data
plt.scatter(x, y, c='magenta', s=50, alpha=0.5, label='train')
plt.scatter(test_x, test_y, c='cyan', s=50, alpha=0.5, label='test')
plt.legend(loc='upper left')
plt.ylim((-2.5, 2.5))
plt.show()

# tf placeholders
tf_x = tf.placeholder(tf.float32, [None, 1])
tf_y = tf.placeholder(tf.float32, [None, 1])
tf_is_training = tf.placeholder(tf.bool, None)  # to control dropout when training and testing

# overfitting net
o1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
o2 = tf.layers.dense(o1, N_HIDDEN, tf.nn.relu)
o_out = tf.layers.dense(o2, 1)
o_loss = tf.losses.mean_squared_error(tf_y, o_out)
o_train = tf.train.AdamOptimizer(LR).minimize(o_loss)

# dropout net
d1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
d1 = tf.layers.dropout(d1, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d2 = tf.layers.dense(d1, N_HIDDEN, tf.nn.relu)
d2 = tf.layers.dropout(d2, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d_out = tf.layers.dense(d2, 1)
d_loss = tf.losses.mean_squared_error(tf_y, d_out)
d_train = tf.train.AdamOptimizer(LR).minimize(d_loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

plt.ion()   # something about plotting

for t in range(500):
    sess.run([o_train, d_train], {tf_x: x, tf_y: y, tf_is_training: True})  # train, set is_training=True

    if t % 10 == 0:
        # plotting
        plt.cla()
        o_loss_, d_loss_, o_out_, d_out_ = sess.run(
            [o_loss, d_loss, o_out, d_out],
            {tf_x: test_x, tf_y: test_y, tf_is_training: False}  # test, set is_training=False
        )
        plt.scatter(x, y, c='magenta', s=50, alpha=0.3, label='train'); plt.scatter(test_x, test_y, c='cyan', s=50, alpha=0.3, label='test')
        plt.plot(test_x, o_out_, 'r-', lw=3, label='overfitting'); plt.plot(test_x, d_out_, 'b--', lw=3, label='dropout(50%)')
        plt.text(0, -1.2, 'overfitting loss=%.4f' % o_loss_, fontdict={'size': 20, 'color': 'red'}); plt.text(0, -1.5, 'dropout loss=%.4f' % d_loss_, fontdict={'size': 20, 'color': 'blue'})
        plt.legend(loc='upper left'); plt.ylim((-2.5, 2.5)); plt.pause(0.1)

plt.ioff()
plt.show()

Batch Normalization

References:

These two are good for a first-pass understanding:
https://mofanpy.com/tutorials/machine-learning/tensorflow/intro-batch-normalization/
https://mofanpy.com/tutorials/machine-learning/tensorflow/BN/
For a deeper understanding, focus on the first and second answers here:
https://www.zhihu.com/question/38102762

When to use it

BN targets slow convergence, sensitivity to the initial solution, and exploding/vanishing gradients; it improves generalization, speeds up training, and allows a higher learning rate. Which normalization to use depends on the setting (see the sketch after this list):
(1) A normal image-processing CNN should use Batch Normalization, as long as the batch size is reasonably large (no smaller than 32) and the input samples are shuffled. If the batch is too small, prefer Group Normalization instead.

(2) For RNNs and other sequence models, the training examples inside one batch often have different lengths (sentences of different lengths), so different time steps would need different statistics; BN cannot be applied correctly there, and Layer Normalization is used instead.

(3) For image generation and style transfer, Instance Normalization is the better fit.
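
A small tf.keras sketch of cases (1) and (2) above (TF 2.x layer names; GroupNormalization and InstanceNormalization live in the separate tensorflow_addons package, which is an assumption here and not something these notes cover):

import tensorflow as tf

# (1) image CNN with a reasonably large, shuffled batch: BatchNormalization
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding='same', input_shape=(32, 32, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
])

# (2) variable-length sequence model: LayerNormalization instead of BN
rnn = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 16)),
    tf.keras.layers.LayerNormalization(),
])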

Usage

  • For the input data and ordinary FC layers
tf.layers.batch_normalization(tf_x, training=tf_is_train)
# the momentum plays an important role; the default 0.99 is too high in this case!
x = tf.layers.batch_normalization(x, momentum=0.4, training=tf_is_train)

https://github.com/MorvanZhou/Tensorflow-Tutorial/blob/master/tutorial-contents/502_batch_normalization.py

  • For CNN layers
    see how ResNet50 is built:
def identity_block(input_tensor, kernel_size, filters, stage, block):
    """The identity block is the block that has no conv layer at shortcut.
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the filters of 3 conv layers at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
    # Returns
        Output tensor for the block.
    """
    filters1, filters2, filters3 = filters
    if IMAGE_ORDERING == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), data_format=IMAGE_ORDERING, name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, data_format=IMAGE_ORDERING, padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), data_format=IMAGE_ORDERING, name=conv_name_base + '2c')(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x

resnet18/34 (tf.keras)


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Sequential


class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        # [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out, training=training)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out, training=training)
        identity = self.downsample(inputs)
        output = layers.add([out, identity])
        output = tf.nn.relu(output)
        return output


class ResNet(keras.Model):
    def __init__(self, layer_dims, num_classes=100):  # [2, 2, 2, 2]
        super(ResNet, self).__init__()
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')])
        self.layer1 = self.build_resblock(64,  layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        # output: [b, 512, h, w]
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        x = self.stem(inputs, training=training)
        x = self.layer1(x, training=training)
        x = self.layer2(x, training=training)
        x = self.layer3(x, training=training)
        x = self.layer4(x, training=training)
        # [b, c]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)
        return x

    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # may down sample
        res_blocks.add(BasicBlock(filter_num, stride))
        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))
        return res_blocks


def resnet18():
    return ResNet([2, 2, 2, 2])


def resnet34():
    return ResNet([3, 4, 6, 3])

Weight Decay / L2 Regularization

Principle

Purpose: weight decay (L2 regularization) helps prevent the model from overfitting.
Question: the L2 penalty pushes the weights w toward smaller values, but why do smaller weights reduce overfitting?
Explanation: (1) From the model-complexity view: smaller weights correspond, in some sense, to a lower-complexity network that still fits the data well (this is also known as Occam's razor), and in practice L2-regularized models usually perform better than unregularized ones. (2) From the mathematical view: when a model overfits, the coefficients of the fitted function tend to be very large. Why? An overfitted function has to accommodate every single point, so the resulting curve oscillates heavily, and in some small intervals the function value changes drastically. That means the derivative (in absolute value) is very large on those intervals; since the input values themselves may be small or large, only sufficiently large coefficients can produce such large derivatives. Regularization constrains the norm of the parameters so they cannot grow too large, and therefore reduces overfitting to some extent.
Formula and derivation: see reference links 1 and 2 (a sketch of the standard form follows).
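
For reference, a common textbook form of the penalized loss and the resulting update step (this notation is my addition; the full derivation is in reference links 1 and 2 below):

L = L_0 + \frac{\lambda}{2n} \sum_w w^2
w \leftarrow w - \eta \frac{\partial L_0}{\partial w} - \frac{\eta \lambda}{n} w

The second line shows where the name "weight decay" comes from: each update shrinks w by a factor proportional to \eta \lambda / n before the usual gradient step.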

Code and implementation

Core idea: 1. create a regularization method; 2. apply that regularizer to the variables (a Keras-style sketch follows).
See reference link 2.
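
A Keras-style sketch of those two steps (the layer sizes and the 1e-4 scale are illustrative assumptions, not taken from the reference link):

import tensorflow as tf

# step 1: create a regularization method
l2_reg = tf.keras.regularizers.l2(1e-4)
# step 2: apply it to the variables (here, the kernel of each Dense layer)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=l2_reg, input_shape=(10,)),
    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg),
])
# Keras collects the penalty terms in model.losses and adds them to the training loss automatically
model.compile(optimizer='adam', loss='mse')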

Reference links:

https://blog.csdn.net/program_developer/article/details/80867468
https://zhuanlan.zhihu.com/p/95883569

My own implementation

Idea: collect the trainable variables, compute the L2 loss of these parameters, and add it to the original loss.

# the original loss
loss = tf.losses.mean_squared_error(y, out)
# l2 loss
loss_regularization = []
for p in net.trainable_variables:  # net is a model built from custom Keras layers
    loss_regularization.append(tf.nn.l2_loss(p))  # add the l2 loss of each trainable variable
loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))
loss = loss + 1e-4 * loss_regularization  # 1e-4 is the scale

Learning Rate Decay

Principle

When training a model, the following situation is common: after balancing training speed against loss, we settle on a reasonably good learning rate, but the training loss stops decreasing once it reaches a certain level, e.g. it keeps oscillating between 0.7 and 0.9 and cannot go any lower.

Learning rate decay is a way to resolve this trade-off. The basic idea is to let the learning rate shrink gradually as training proceeds.
There are essentially two ways to implement it:
Linear decay, e.g. halve the learning rate every 5 epochs.
Exponential decay, e.g. the learning rate decays automatically as the number of iterations grows, say multiplying it by 0.9998 every 5 epochs. The formula is:
decayed_learning_rate = learning_rate * decay_rate^(global_step / decay_steps)
where decayed_learning_rate is the learning rate used at each step, learning_rate is the initial learning rate set beforehand, decay_rate is the decay coefficient, and decay_steps is the decay interval.

Code implementation

  • Reference link:
    https://www.cnblogs.com/baby-lily/p/10962574.html
  • Key parameters of tf.train.exponential_decay():
    learning_rate: the initial learning rate
    global_step: the iteration-counter variable
    decay_steps: how many iterations between decays
    decay_rate: the decay factor applied every decay_steps iterations
    staircase: defaults to False (continuous decay); if True, global_step / decay_steps is truncated to an integer, so the learning rate decays in discrete steps

tf.train.exponential_decay(initial_learning_rate, global_step=global_step, decay_steps=1000, decay_rate=0.9) means that every 1000 iterations the learning rate is multiplied by 0.9.

Increasing the batch size can also play a role similar to decaying the learning rate.

  • Demo code

Idea:

introduce learning_rate = tf.train.exponential_decay(...)

to control the actual learning rate used by the optimizer: opt = tf.train.GradientDescentOptimizer(learning_rate);

global_step serves as the counter, and add_global is the operator that increments it during training,

which together give the learning-rate-decay effect over the course of training.


import tensorflow as tf
import numpy as np

global_step = tf.Variable(tf.constant(0))
initial_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                           global_step=global_step,
                                           decay_steps=10,
                                           decay_rate=0.5)
opt = tf.train.GradientDescentOptimizer(learning_rate)
add_global = global_step.assign_add(1)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(learning_rate))
    for i in range(50):
        g, rate = sess.run([add_global, learning_rate])
        print(g, rate)

Result


0.1
1 0.0933033
2 0.08705506
3 0.08122524
4 0.07578582
5 0.070710674
...
46 0.004123463
47 0.0038473257
48 0.003589682
49 0.0033492916
50 0.003125
  • Combined with training code

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
import os, sys
# os.chdir('../')

# ---------- constants ----------
R1, R2 = 0.6, 0.4  # loss weights for lm and llc
LR = 1e-4
ILR = 1e-3
NTrain = 5 * int(1e4)

format_print_sub_real = lambda x: "{:^10}".format('%0.2f' % (x))
format_print_sub_str = lambda x: "{:^10}".format(x)
format_print = lambda x, ifReal: ''.join(list(map(format_print_sub_real, x))) if ifReal else ''.join(list(map(format_print_sub_str, x)))
# -------------------------------
from db import train_batch, test_batch
from fcnet import CNNSplit

myNetwork = CNNSplit()

inputs = tf.placeholder(tf.float32, shape=[None, 2 * 5 * 240 + 5], name='x')
outputs = tf.placeholder(tf.float32, shape=[None, 2], name='y')
y1, y2 = outputs[:, :1], outputs[:, 1:]
out1, out2 = myNetwork(inputs)

# loss
loss1 = tf.losses.mean_squared_error(y1, out1)
loss2 = tf.losses.mean_squared_error(y2, out2)
Joint_Loss = R1 * loss1 + R2 * loss2

# optimizer
global_step = tf.Variable(tf.constant(0))  # learning rate counter
add_global = global_step.assign_add(1)     # learning rate counter add operator
learning_rate = tf.train.exponential_decay(ILR,
                                           global_step=global_step,
                                           decay_steps=1000,
                                           decay_rate=0.95)  # learning rate decay
JL_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(Joint_Loss)  # introduce learning rate
L1_op = tf.train.AdamOptimizer().minimize(loss1)
L2_op = tf.train.AdamOptimizer().minimize(loss2)

# accuracy estimator
MAPE1 = tf.reduce_mean(tf.abs((y1 - out1) / y1))
MAPE2 = tf.reduce_mean(tf.abs((y2 - out2) / y2))

# training
record = []
record_acc_best = [sys.maxsize, sys.maxsize, sys.maxsize, sys.maxsize]
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for i in range(NTrain + 1):
        bx, by = session.run([train_batch])[0]
        _, l1, l2, Jl, _ = session.run([JL_op, loss1, loss2, Joint_Loss, add_global],
                                       feed_dict={inputs: bx, outputs: by})  # learning rate add-one operator
        if i % 100 == 0:
            # bx, by = session.run([train_batch])[0]
            mape1, mape2, l1, l2, Jl = session.run([MAPE1, MAPE2, loss1, loss2, Joint_Loss],
                                                   feed_dict={inputs: bx, outputs: by})
            bx_, by_ = session.run([test_batch])[0]
            mape1_, mape2_, l1_, l2_, Jl_ = session.run([MAPE1, MAPE2, loss1, loss2, Joint_Loss],
                                                        feed_dict={inputs: bx_, outputs: by_})
            if mape1 <= record_acc_best[0] and mape2 <= record_acc_best[1]:
                record_acc_best[0] = mape1
                record_acc_best[1] = mape2
            if mape1_ <= record_acc_best[2] and mape2_ <= record_acc_best[3]:
                record_acc_best[2] = mape1_
                record_acc_best[3] = mape2_
            results = [mape1, mape2, l1, l2, Jl, mape1_, mape2_, l1_, l2_, Jl_]
            record.append(np.array(results))
            step_i, lri = session.run([global_step, learning_rate])  # learning rate presentation
            print('step %d, %d learning rate %0.4f \n %s \n training results: best %0.2f, %0.2f \n %s \n testing results: best %0.2f %0.2f \n %s \n'
                  % (i, step_i, lri,
                     format_print(['mape1', 'mape2', 'l1', 'l2', 'Joint Loss'], False),
                     record_acc_best[0], record_acc_best[1],
                     format_print(results[:5], True),
                     record_acc_best[2], record_acc_best[3],
                     format_print(results[5:], True)))
    pd.DataFrame(record, columns=['mape1', 'mape2', 'l1', 'l2', 'Joint Loss', 'mape1_', 'mape2_', 'l1_', 'l2_', 'Joint Loss_']).to_csv('input/results_tfonly.csv')

Early stopping

In general, the number of training iterations is set very high; if the model's performance stops improving, training can be stopped, because continuing is likely to lead to overfitting. keras.callbacks.EarlyStopping is the callback used to end training early.

Code implementation

It appears to require the Keras framework, or the Keras API inside TensorFlow; a hedged sketch follows the link below.

https://blog.csdn.net/zhangpeterx/article/details/90897439
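
A minimal sketch of the callback (the model and data names are placeholders and the parameter values are illustrative assumptions):

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=5,                 # stop after 5 epochs without improvement
    restore_best_weights=True)  # roll back to the weights of the best epoch

# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=200,
#           callbacks=[early_stop])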
