Reference:
Module: tf.nn.rnn_cell


tf.contrib.rnn.LSTMCell

tf.nn.rnn_cell.LSTMCell

As of version 1.8 both of these work: tf.contrib.rnn is the primary home (still in the contrib stage), while tf.nn.rnn_cell contains only aliases and links.

By version 1.12, the commonly used cells such as LSTMCell and RNNCell have been moved to tf.nn.rnn_cell.

I. Important functions and classes

This section walks through the TensorFlow APIs most commonly used when building LSTMs. These are the bricks, so getting them straight pays off.
Here is the list:

tf.nn.rnn_cell.LSTMCell
tf.nn.rnn_cell.MultiRNNCell
tf.nn.rnn_cell.LSTMStateTuple
tf.nn.rnn_cell.ResidualWrapper()
tf.nn.rnn_cell.DropoutWrapper
tf.nn.dynamic_rnn()
tf.nn.bidirectional_dynamic_rnn()
tf.sequence_mask()
tf.boolean_mask()
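
Of these, ResidualWrapper and DropoutWrapper do not get their own sections below, so here is a minimal sketch of how they are usually composed around a cell; the unit count and keep probability are arbitrary assumptions, not values from the examples in this article:

import tensorflow as tf

base_cell = tf.nn.rnn_cell.LSTMCell(num_units=128)   # assumed size

# DropoutWrapper applies dropout around the cell; the keep probabilities are assumptions
dropout_cell = tf.nn.rnn_cell.DropoutWrapper(base_cell,
                                             input_keep_prob=1.0,
                                             output_keep_prob=0.8)

# ResidualWrapper adds the cell's input to its output (output = input + cell_output),
# so the input size must match the cell's output size
residual_cell = tf.nn.rnn_cell.ResidualWrapper(dropout_cell)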

While we are here, one more commonly used function, this one related to word vectors, is listed and explained together with the rest below.

tf.nn.embedding_lookup()

Ⅰ.tf.nn.rnn_cell.LSTMCell

Docs: tf.nn.rnn_cell.LSTMCell

BasicLSTMCell is the more basic class for creating an LSTM cell, but it was deprecated in 1.12, leaving LSTMCell; for now BasicLSTMCell still works.

First, how do you create an LSTMCell object? The constructor is:

__init__(
    num_units,
    use_peepholes=False,
    cell_clip=None,
    initializer=None,
    num_proj=None,
    proj_clip=None,
    num_unit_shards=None,
    num_proj_shards=None,
    forget_bias=1.0,
    state_is_tuple=True,
    activation=None,
    reuse=None,
    name=None,
    dtype=None,
    **kwargs
)

Parameters:

num_units: number of units in the LSTM cell.
use_peepholes: if True, use diagonal/peephole connections.
cell_clip: (optional) a float value; if provided, the cell state is clipped to this value prior to the cell output activation.
initializer: (optional) initializer for the weight matrices and the projection matrices.
num_proj: (optional) output dimension after the projection step (think of it as an extra fully connected layer applied to h); if None, no projection is performed.
proj_clip: (optional) a float value; if num_proj > 0 and proj_clip is provided, the projected values are clipped elementwise to within [-proj_clip, proj_clip].
num_unit_shards: deprecated.
num_proj_shards: deprecated.
forget_bias: biases of the forget gate are initialized to 1.0 by default in order to reduce the scale of forgetting at the beginning of training. Must be set manually to 0.0 when restoring from CudnnLSTM-trained checkpoints.
state_is_tuple: return the state as a tuple; this defaults to True and should be left that way.
activation: activation function of the internal state; defaults to tanh.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
name: string, the name of the layer. Layers with the same name will share weights, but to avoid mistakes reuse=True is required in such cases.
dtype: default dtype of the layer (None means use the dtype of the first input). Required when build is called before call.

For example, to define a cell with 128 internal units, you can use the following:

import numpy as np
import tensorflow as tf
from tensorflow.contrib.layers.python.layers import initializers

lstm_cell = tf.nn.rnn_cell.LSTMCell(
    num_units=128,
    use_peepholes=True,
    initializer=initializers.xavier_initializer(),
    num_proj=64,
    name="LSTM_CELL"
)

print("output_size:", lstm_cell.output_size)
print("state_size:", lstm_cell.state_size)
print(lstm_cell.state_size.h)
print(lstm_cell.state_size.c)

Output:
output_size: 64
state_size: LSTMStateTuple(c=128, h=64)
64
128

Notice that this constructor takes no information at all about the inputs! Don't worry: the input plumbing is handled under the hood, and you simply feed inputs in at a later stage (covered below). One more point: rather than calling these 128 units "hidden layer nodes", I prefer to think of them as 128 nodes inside the cell, each receiving the same input vector and producing one value; together the 128 nodes output a 128-dimensional vector. The code above additionally applies a projection (a fully connected step), so the final output is 64-dimensional.

If you are now wondering why it is sometimes 128-dimensional and sometimes 64-dimensional, that is exactly what two of this class's more important properties explain (the class has more than these two): output_size and state_size. As the names suggest, output_size and state_size are the LSTM's output size and state size respectively. There is a blog post that explains this well, so I won't repeat it:
tf.nn.dynamic_rnn的输出outputs和state含义

The 128 units determine that c in state_size is 128-dimensional; that part is simple. The interesting part is the state's format: it is an LSTMStateTuple. For practical purposes, treat it as a tuple (though it differs from a plain tuple). It is a tuple because the state contains both h and c (this requires knowing a little LSTM theory).

The class also has one very important method:

zero_state(batch_size, dtype)

This returns a zero-filled state. Note that it initializes a state, not the whole LSTM.

Parameters:
batch_size: batch size.
dtype: dtype to use for the state.

In the example below, this produces a zero state of "shape" (2, 80, 128): the batch_size parameter is 80, while the 2 (c and h) and the 128 were already fixed when the BasicLSTMCell was defined.

If state_size is an int or a TensorShape, the return value is an N-D tensor of zeros with shape [batch_size, state_size].

If state_size is a tuple, the return value is a tuple with the same structure, where each element is a 2-D tensor of shape [batch_size, s], with s taken from the corresponding entry of state_size.

In the normal case, as in the example below, c and h have the same dimension, so both are [batch_size, num_units].

In the earlier example, where a projection num_proj=64 was added to the h output, h and c have different dimensions: one is [batch_size, 64] (h) and the other is [batch_size, 128] (c).
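
To make the projected case concrete before the normal-case example below, here is a minimal sketch, assuming the lstm_cell defined in the earlier example (num_units=128, num_proj=64):

# assuming lstm_cell from the earlier example (num_units=128, num_proj=64)
init_state = lstm_cell.zero_state(batch_size=80, dtype=tf.float32)
print(init_state.c.shape)  # (80, 128) -- the cell state keeps num_units
print(init_state.h.shape)  # (80, 64)  -- the hidden state uses num_proj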

cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units)   # 128
# the LSTM state is split into two parts (c_state, h_state)
# the LSTMStateTuple is effectively (2, 80, 128):
# LSTMStateTuple( c=(80, 128), h=(80, 128) )
init_state = cell.zero_state(batch_size, dtype=tf.float32)  # batch_size = 80
print(init_state)

Output:
LSTMStateTuple(
c=<tf.Tensor 'BasicLSTMCellZeroState/zeros:0' shape=(80, 128) dtype=float32>,
h=<tf.Tensor 'BasicLSTMCellZeroState/zeros_1:0' shape=(80, 128) dtype=float32>
)

__call__(inputs, state, scope=None)

Purpose: run the RNN cell on the given state and inputs. This method runs the RNN for a single "time step"; it is a fairly low-level function, and very helpful for understanding how an RNN actually executes.
The higher-level interfaces covered later, such as tf.nn.dynamic_rnn(), perform the whole unrolled computation in a single call.

Parameters:
inputs: a 2-D tensor of shape [batch_size, input_size]. In practice you will first arrange your data as [batch_size, time_steps_size, input_size], so for the current step i you simply pass in [:, i, :].
state: if self.state_size is an integer, this should be a tensor of shape [batch_size, self.state_size]; otherwise, if self.state_size is a tuple of integers, this should be a tuple of tensors of shape [batch_size, s] for each s in self.state_size.
scope: the VariableScope for the created subgraph; defaults to the class name.

Return value:
A pair containing:
Output: a 2-D tensor of shape [batch_size, self.output_size].
New state: the new state, with the same structure as the state passed in.

with tf.variable_scope('cell'):
    cell = tf.contrib.rnn.BasicLSTMCell(self._cell_size)
with tf.name_scope('initial_state'):
    self._cell_initial_state = cell.zero_state(self._batch_size, dtype=tf.float32)
print(self._cell_initial_state)  # two tensors of shape (20, 11)

self.cell_outputs = []
cell_state = self._cell_initial_state
for t in range(self._time_steps):
    if t > 0:
        tf.get_variable_scope().reuse_variables()
    cell_output, cell_state = cell(l_in_y[:, t, :], cell_state)
    self.cell_outputs.append(cell_output)
self._cell_final_state = cell_state
# cell_outputs is (TIME_STEP, BATCH, CELL_SIZE), i.e. TIME_STEP tensors of shape (batch, cell_size)

Ⅱ.tf.nn.rnn_cell.MultiRNNCell

The class above defines a single-layer LSTM; how do we define a multi-layer one? This class's main job is stacking single LSTM layers into a multi-layer LSTM.
First, its constructor:

__init__(cells, state_is_tuple=True)

Parameters:
cells: a list of the RNNCells you want to stack.
state_is_tuple: if True (this will be the default from now on, so you can ignore the parameter), accepted and returned states are n-tuples, where n = len(cells).

The other methods and properties are much like BasicLSTMCell's, but it is worth seeing what the two properties output_size and state_size become here. An example:

import numpy as np
import tensorflow as tf
from tensorflow.contrib.layers.python.layers import initializers

lstm_cell_1 = tf.nn.rnn_cell.LSTMCell(
    num_units=128,
    use_peepholes=True,
    initializer=initializers.xavier_initializer(),
    num_proj=64,
)
lstm_cell_2 = tf.nn.rnn_cell.LSTMCell(
    num_units=128,
    use_peepholes=True,
    initializer=initializers.xavier_initializer(),
    num_proj=64,
)
lstm_cell_3 = tf.nn.rnn_cell.LSTMCell(
    num_units=128,
    use_peepholes=True,
    initializer=initializers.xavier_initializer(),
    num_proj=64,
)

multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells=[lstm_cell_1, lstm_cell_2, lstm_cell_3])

print("output_size:", multi_cell.output_size)
print("state_size:", type(multi_cell.state_size))
print("state_size:", multi_cell.state_size)

# index the specific layer first, then take that layer's state
print(multi_cell.state_size[0].h)
print(multi_cell.state_size[0].c)

init_state = multi_cell.zero_state(80, dtype=tf.float32)
print(init_state[0].h)
print(init_state[0].c)

Result:
output_size: 64
state_size: <class 'tuple'>
state_size: (LSTMStateTuple(c=128, h=64), LSTMStateTuple(c=128, h=64), LSTMStateTuple(c=128, h=64))
64
128
Tensor("MultiRNNCellZeroState/LSTMCellZeroState/zeros_1:0", shape=(80, 64), dtype=float32)
Tensor("MultiRNNCellZeroState/LSTMCellZeroState/zeros:0", shape=(80, 128), dtype=float32)

Here three LSTM layers are built and then stacked with the MultiRNNCell constructor, so the output_size in the result is 64, the num_proj of the last layer.

What matters is the shape of state_size: it is a tuple containing three LSTMStateTuple objects. In other words, each layer's LSTMStateTuple is placed into one big tuple.

Ⅲ.tf.nn.dynamic_rnn()

This function unrolls the computation of the network using the given RNN cell.
Its signature:

outputs, state = dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

With dynamic_rnn, every sequence in a batch must have the same (padded) length, but the computation can be cut off per sequence according to sequence_length. dynamic_rnn also builds the graph dynamically.

Parameters:

cell: an RNNCell instance.
inputs: the RNN inputs. If time_major == False (default), this must be a tensor of shape [batch_size, max_time, ...]; if time_major == True, it must be a tensor of shape [max_time, batch_size, ...]. The first two dimensions must match across all inputs.
sequence_length: (optional) an int32/int64 vector of size [batch_size]. This matters for the correctness of the final result: given the true length of each sequence, the result is computed exactly, removing the inaccuracy introduced by padding all sequences to the same length.
initial_state: (optional) the initial state of the RNN. If cell.state_size is an integer, this must be a tensor of shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this must be a tuple of tensors of shape [batch_size, s], with s the corresponding size from cell.state_size.
dtype: (optional) the data type of the inputs and the expected outputs. Must be specified explicitly when no initial state is provided or when the RNN state consists of heterogeneous types.
parallel_iterations: defaults to 32; the number of iterations to run in parallel. Operations without any temporal dependency can be computed in parallel; this is a time-versus-memory trade-off. Values much larger than 1 use more memory but less time, while smaller values use less memory but take longer.
swap_memory: transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: specifies the data layout of the input and output tensors. If True, tensors must have shape [max_time, batch_size, depth]; if False, [batch_size, max_time, depth]. Using time_major = True is a bit more efficient because it avoids transposes at the start and end of the RNN computation; however, most TensorFlow data is batch-major, so batch-major is the default here as well.
scope: scope name for the created subgraph; defaults to "rnn".

Returns (very important):
A pair (outputs, state), where:

  • outputs: the RNN's output hidden states h for all time steps. If time_major == False (default), this tensor has shape [batch_size, max_time, cell.output_size]; if time_major == True, it has shape [max_time, batch_size, cell.output_size]. Note that for a bidirectional LSTM, outputs is instead a tuple whose two elements are the forward and backward outputs; this is covered in detail in the bidirectional LSTM section below.
  • state: the states at the final time step. For a unidirectional network with K layers, state is a tuple of K LSTMStateTuples, one per layer, holding each layer's final state. For a bidirectional network, it is still a tuple, but a tuple of two sub-tuples holding the forward and backward states, and each sub-tuple holds each layer's final-step state.
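
Note that neither example below passes sequence_length, so here is a minimal sketch of how a padded batch is typically handled; the placeholder shapes and lengths here are assumptions for illustration:

import tensorflow as tf

# assumed shapes: a batch of 32 sequences, padded to 40 steps, 5 features each
inputs = tf.placeholder(tf.float32, shape=(32, 40, 5))
seq_len = tf.placeholder(tf.int32, shape=(32,))   # true length of each sequence

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
outputs, state = tf.nn.dynamic_rnn(cell=cell,
                                   inputs=inputs,
                                   sequence_length=seq_len,
                                   dtype=tf.float32)
# for sequence i, outputs[i, t, :] is all zeros for t >= seq_len[i],
# and state holds the state at the last valid step, not at the padded end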

Example 1: single-layer LSTM

inputs = tf.placeholder(np.float32, shape=(32, 40, 5))  # 32 is batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
#lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
#lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
#lstm_cell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1, lstm_cell_2, lstm_cell_3])

print("output_size:", lstm_cell_1.output_size)   # 128
print("state_size:", lstm_cell_1.state_size)     # LSTMStateTuple(c=128, h=128)

output, state = tf.nn.dynamic_rnn(
    cell=lstm_cell_1,
    inputs=inputs,
    dtype=tf.float32
)

print("output.shape:", output.shape)      # (32, 40, 128)
print("len of state tuple", len(state))   # 2
print("state.h.shape:", state.h.shape)    # (32, 128)
print("state.c.shape:", state.c.shape)    # (32, 128)

Example 2: multi-layer LSTM

Note: output has shape (32, 40, 512), because what is output is the short-term memory h of the topmost layer.

For a multi-layer network, state is a tuple, and every element of that tuple is one layer's LSTMStateTuple.

inputs = tf.placeholder(np.float32, shape=(32, 40, 5))  # 32 is batch_size
lstm_cell_1 = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_2 = tf.contrib.rnn.BasicLSTMCell(num_units=256)
lstm_cell_3 = tf.contrib.rnn.BasicLSTMCell(num_units=512)
#multi-layer lstm_cell
lstm_cell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell_1, lstm_cell_2, lstm_cell_3])

print("output_size:", lstm_cell.output_size)   # 512
print("state_size:", lstm_cell.state_size)
# (LSTMStateTuple(c=128, h=128),
#  LSTMStateTuple(c=256, h=256),
#  LSTMStateTuple(c=512, h=512))

output, state = tf.nn.dynamic_rnn(
    cell=lstm_cell,
    inputs=inputs,
    dtype=tf.float32
)

print("output.shape:", output.shape)     # (32, 40, 512)
print("len of state tuple", len(state))  # 3
print(state)
# (
# LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_2:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_3:0' shape=(32, 128) dtype=float32>),
# LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_4:0' shape=(32, 256) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_5:0' shape=(32, 256) dtype=float32>),
# LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_6:0' shape=(32, 512) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_7:0' shape=(32, 512) dtype=float32>)
# )

Ⅳ.tf.nn.bidirectional_dynamic_rnn

outputs, output_states = bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)

Parameters:

cell_fw: an instance of RNNCell, used for the forward direction.
cell_bw: an instance of RNNCell, used for the backward direction.
inputs: the RNN inputs. If time_major == False (default), this must be a tensor of shape [batch_size, max_time, ...], or a nested tuple of such elements. If time_major == True, it must be a tensor of shape [max_time, batch_size, ...], or a nested tuple of such elements.
sequence_length: (optional) an int32/int64 vector of size [batch_size] containing the actual length of each sequence in the batch. If not provided, all batch entries are assumed to be full sequences, and time reversal is applied from time 0 to max_time for every sequence.
initial_state_fw: (optional) the initial state of the forward RNN. Must be a tensor of appropriate type and shape [batch_size, cell_fw.state_size]. If cell_fw.state_size is a tuple, this should be a tuple of tensors of shape [batch_size, s] for s in cell_fw.state_size.
initial_state_bw: (optional) same as initial_state_fw, but using the corresponding properties of cell_bw.
dtype: (optional) data type of the initial states and the expected outputs. Required if the initial states are not provided or the RNN states have heterogeneous dtypes.
parallel_iterations: (default: 32) the number of iterations to run in parallel. Operations without any temporal dependency that can be run in parallel will be. This parameter trades space for time: values >> 1 use more memory but take less time, while smaller values use less memory but take longer to compute.
swap_memory: transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs that would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: the shape format of the inputs and outputs tensors. If True, these tensors have shape [max_time, batch_size, depth]; if False, they have shape [batch_size, max_time, depth].
scope: the VariableScope for the created subgraph; defaults to "bidirectional_rnn".

Returns:

A tuple (outputs, output_states), where:
outputs: a tuple (output_fw, output_bw) containing the forward and backward RNN outputs.
If time_major == False (default), output_fw is a tensor of shape [batch_size, max_time, cell_fw.output_size] and output_bw is a tensor of shape [batch_size, max_time, cell_bw.output_size].
If time_major == True, output_fw is a tensor of shape [max_time, batch_size, cell_fw.output_size] and output_bw is a tensor of shape [max_time, batch_size, cell_bw.output_size].
output_states: also a tuple, (output_state_fw, output_state_bw); that is, the forward state and the backward state placed into one tuple.

An example:

This example also shows how to concatenate states, which you can use as a template when initializing or concatenating states yourself:

state_concat = tf.contrib.rnn.LSTMStateTuple(c=state_c_concat, h=state_h_concat)

This assembles a tuple of (tensor for state c, tensor for state h).

import tensorflow as tf
import numpy as np

tf.reset_default_graph()

inputs = tf.placeholder(np.float32, shape=(32, 40, 5))  # 32 is batch_size
lstm_cell_fw = tf.contrib.rnn.BasicLSTMCell(num_units=128)
lstm_cell_bw = tf.contrib.rnn.BasicLSTMCell(num_units=128)

print("output_fw_size:", lstm_cell_fw.output_size)  # 128
print("state_fw_size:", lstm_cell_fw.state_size)    # LSTMStateTuple(c=128, h=128)
print("output_bw_size:", lstm_cell_bw.output_size)  # 128
print("state_bw_size:", lstm_cell_bw.state_size)    # LSTMStateTuple(c=128, h=128)

output, state = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=lstm_cell_fw,
    cell_bw=lstm_cell_bw,
    inputs=inputs,
    dtype=tf.float32
)
print(output)
# tuple ( <tf.Tensor 'bidirectional_rnn/fw/fw/transpose:0' shape=(32, 40, 128) dtype=float32>,
#         <tf.Tensor 'ReverseV2:0' shape=(32, 40, 128) dtype=float32> )

output_fw = output[0]
output_bw = output[1]
state_fw = state[0]
state_bw = state[1]

print("output_fw.shape:", output_fw.shape)   # (32, 40, 128)
print("output_bw.shape:", output_bw.shape)   # (32, 40, 128)
print("len of state tuple", len(state_fw))   # 2
print("state_fw:", state_fw)
# LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_2:0' shape=(32, 128) dtype=float32>,
#                h=<tf.Tensor 'bidirectional_rnn/fw/fw/while/Exit_3:0' shape=(32, 128) dtype=float32>)
print("state_bw:", state_bw)
# LSTMStateTuple(c=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_2:0' shape=(32, 128) dtype=float32>,
#                h=<tf.Tensor 'bidirectional_rnn/bw/bw/while/Exit_3:0' shape=(32, 128) dtype=float32>)

state_h_concat = tf.concat(values=[state_fw.h, state_bw.h], axis=1)
print("state_h_concat.shape", state_h_concat.shape)   # (32, 256)

state_c_concat = tf.concat(values=[state_fw.c, state_bw.c], axis=1)
print("state_c_concat.shape", state_c_concat.shape)   # (32, 256)

state_concat = tf.contrib.rnn.LSTMStateTuple(c=state_c_concat, h=state_h_concat)
print(state_concat)
# LSTMStateTuple(c=<tf.Tensor 'concat_1:0' shape=(32, 256) dtype=float32>,
#                h=<tf.Tensor 'concat:0' shape=(32, 256) dtype=float32>)

Ⅴ.tf.nn.embedding_lookup()

embedding_lookup(params, ids, partition_strategy='mod', name=None, max_norm=None)

This function is used mainly when embedding within a task: it maps a character or word to a word vector of the corresponding dimension. If the embedding table is set up as a trainable Variable, the word vectors can be trained jointly with the task itself.

params: the complete embedding tensor, or a list of P tensors of identical shape except for the first dimension, representing a sharded embedding tensor.
ids: a Tensor of type int32 or int64 containing the ids to look up in params.
partition_strategy: a string specifying the partitioning strategy; relevant if len(params) > 1. Currently "div" and "mod" are supported; defaults to "mod".
name: name for the operation (optional).
max_norm: if not None, embedding values are l2-normalized to max_norm.
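
A minimal sketch of joint training with a trainable embedding table; the vocabulary size, embedding dimension, and sequence length here are arbitrary assumptions:

import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary size
EMBED_DIM = 300      # assumed embedding dimension

# trainable embedding matrix, trained jointly with the task
embeddings = tf.get_variable("embeddings", shape=(VOCAB_SIZE, EMBED_DIM),
                             initializer=tf.random_uniform_initializer(-1.0, 1.0))

word_ids = tf.placeholder(tf.int32, shape=(None, 40))    # (batch, time_steps)
word_vectors = tf.nn.embedding_lookup(params=embeddings, ids=word_ids)
# word_vectors has shape (batch, 40, 300) and can be fed to dynamic_rnn as inputs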

Ⅵ. tf.sequence_mask()

sequence_mask(lengths, maxlen=None, dtype=tf.bool, name=None)

Purpose: returns a mask tensor marking the first N positions of each sequence.
If lengths has shape [d_1, d_2, ..., d_n], the resulting tensor mask has dtype dtype and shape [d_1, d_2, ..., d_n, maxlen], with
mask[i_1, i_2, ..., i_n, j] = (j < lengths[i_1, i_2, ..., i_n])

Parameters:
lengths: integer tensor; all of its values must be less than or equal to maxlen.
maxlen: scalar integer tensor, the size of the last dimension of the returned tensor. Defaults to the maximum value in lengths.
dtype: output type of the resulting tensor.
name: name for the op.
Returns:
A mask tensor of shape lengths.shape + (maxlen,), cast to the specified dtype.

Examples:

tf.sequence_mask([1, 3, 2], 5)
# [[True, False, False, False, False],
#  [True, True, True, False, False],
#  [True, True, False, False, False]]

tf.sequence_mask([[1, 3], [2, 0]])
# [[[True, False, False],
#   [True, True, True]],
#  [[True, True, False],
#   [False, False, False]]]

Ⅶ.tf.boolean_mask()

boolean_mask(tensor, mask, name='boolean_mask')

Applies a boolean mask to tensor; comparable to numpy's tensor[mask].

Parameters:
tensor: N-D tensor.
mask: K-D boolean tensor, with K <= N, and K must be statically known.
name: name for the operation (optional).
Returns:
An (N-K+1)-dimensional tensor populated by the entries of tensor corresponding to True values in mask.
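
These two functions are often combined to drop padded positions before computing a loss; here is a minimal sketch under assumed shapes:

import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=(32, 40))   # per-step scores, assumed shape
lengths = tf.placeholder(tf.int32, shape=(32,))       # true sequence lengths

mask = tf.sequence_mask(lengths, maxlen=40)           # (32, 40) boolean mask
valid_logits = tf.boolean_mask(logits, mask)          # 1-D tensor of only the valid steps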

That is a basic tour of the APIs we will use. Now for the examples.

II. Examples

As mentioned at the start, the examples that follow are toy examples, but they are absolutely beginner-friendly: they cover the basic ideas and techniques needed for LSTM programming, and digesting them lets you quickly get up to speed and build the more complex network architectures that suit your own needs.
We go through them one by one, starting from the most basic; each example can be run directly as a script.

Ⅰ. Predicting the sin function

Code:

import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

tf.reset_default_graph()

TIME_STEPS = 10
BATCH_SIZE = 128
HIDDEN_UNITS = 1
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 11000
TEST_EXAMPLES = 1100

#------------------------------------Generate Data-----------------------------------------------#
def generate(seq):
    X = []
    y = []
    for i in range(len(seq) - TIME_STEPS):
        X.append([seq[i:i + TIME_STEPS]])
        y.append([seq[i + TIME_STEPS]])   # every ten numbers predict the next one
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

seq_train = np.sin(np.linspace(start=0, stop=100, num=TRAIN_EXAMPLES, dtype=np.float32))
seq_test = np.sin(np.linspace(start=100, stop=110, num=TEST_EXAMPLES, dtype=np.float32))

X_train, y_train = generate(seq_train)
#print(X_train.shape, y_train.shape)   # (10990, 1, 10) and (10990, 1)
X_test, y_test = generate(seq_test)

#reshape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, TIME_STEPS, 1))
X_test = np.reshape(X_test, newshape=(-1, TIME_STEPS, 1))

#draw y_test
plt.plot(range(1000), y_test[:1000, 0], "r*")

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 1), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="pred_placeholder")

    #lstm instance
    lstm_cell = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)

    #initialize to zero
    init_state = lstm_cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)

    #dynamic rnn
    outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=X_p, initial_state=init_state, dtype=tf.float32)
    print(outputs.shape)   # (128, 10, 1)
    h = outputs[:, -1, :]
    print(h.shape)         # (128, 1)

    #---------------------------------define loss and optimizer----------------------------------#
    mse = tf.losses.mean_squared_error(labels=y_p, predictions=h)
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))   # (1100, 1)
        train_losses = []
        test_losses = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss = sess.run(
                fetches=(optimizer, mse),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))

        for j in range(TEST_EXAMPLES // BATCH_SIZE):
            result, test_loss = sess.run(
                fetches=(h, mse),
                feed_dict={
                    X_p: X_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            results[j * BATCH_SIZE:(j + 1) * BATCH_SIZE] = result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000), results[:1000, 0])
    plt.show()

The thick red line in the plot is the ground truth; after 150 epochs of iteration, our results get closer and closer to the true values.

Ⅱ. Predicting the sin function, multi-layer version

Code:

import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

tf.reset_default_graph()

TIME_STEPS = 10
BATCH_SIZE = 128
HIDDEN_UNITS1 = 30
HIDDEN_UNITS = 1
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 11000
TEST_EXAMPLES = 1100

#------------------------------------Generate Data-----------------------------------------------#
def generate(seq):
    X = []
    y = []
    for i in range(len(seq) - TIME_STEPS):
        X.append([seq[i:i + TIME_STEPS]])
        y.append([seq[i + TIME_STEPS]])
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

seq_train = np.sin(np.linspace(start=0, stop=100, num=TRAIN_EXAMPLES, dtype=np.float32))
seq_test = np.sin(np.linspace(start=100, stop=110, num=TEST_EXAMPLES, dtype=np.float32))

X_train, y_train = generate(seq_train)
X_test, y_test = generate(seq_test)

#reshape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, TIME_STEPS, 1))
X_test = np.reshape(X_test, newshape=(-1, TIME_STEPS, 1))

#draw y_test
plt.plot(range(1000), y_test[:1000, 0], "r*")

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 1), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="pred_placeholder")

    #lstm instances
    lstm_cell1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)   # 30
    lstm_cell = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)     # 1
    multi_lstm = rnn.MultiRNNCell(cells=[lstm_cell1, lstm_cell])

    #initialize to zero
    init_state = multi_lstm.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
    print(init_state)
    # (LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros:0' shape=(128, 30) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(128, 30) dtype=float32>),
    #  LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState_1/zeros:0' shape=(128, 1) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/BasicLSTMCellZeroState_1/zeros_1:0' shape=(128, 1) dtype=float32>))

    #dynamic rnn
    outputs, states = tf.nn.dynamic_rnn(cell=multi_lstm, inputs=X_p, initial_state=init_state, dtype=tf.float32)
    print(states)
    # (LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_2:0' shape=(128, 30) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_3:0' shape=(128, 30) dtype=float32>),
    #  LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_4:0' shape=(128, 1) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_5:0' shape=(128, 1) dtype=float32>))
    h = outputs[:, -1, :]   # (128, 1)

    #---------------------------------define loss and optimizer----------------------------------#
    mse = tf.losses.mean_squared_error(labels=y_p, predictions=h)
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses = []
        test_losses = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss = sess.run(
                fetches=(optimizer, mse),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))

        for j in range(TEST_EXAMPLES // BATCH_SIZE):
            result, test_loss = sess.run(
                fetches=(h, mse),
                feed_dict={
                    X_p: X_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            results[j * BATCH_SIZE:(j + 1) * BATCH_SIZE] = result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000), results[:1000, 0])
    plt.show()

Here we find that after only 50 epochs, the result is already clearly better than the first example's.

Custom initial state and manually unrolled computation

# -*- coding: utf-8 -*-
import numpy as np
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

tf.reset_default_graph()

TIME_STEPS = 10
BATCH_SIZE = 128
HIDDEN_UNITS1 = 30
HIDDEN_UNITS = 1
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 11000
TEST_EXAMPLES = 1100

#------------------------------------Generate Data-----------------------------------------------#
def generate(seq):
    X = []
    y = []
    for i in range(len(seq) - TIME_STEPS):
        X.append([seq[i:i + TIME_STEPS]])
        y.append([seq[i + TIME_STEPS]])
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

seq_train = np.sin(np.linspace(start=0, stop=100, num=TRAIN_EXAMPLES, dtype=np.float32))
seq_test = np.sin(np.linspace(start=100, stop=110, num=TEST_EXAMPLES, dtype=np.float32))

X_train, y_train = generate(seq_train)
X_test, y_test = generate(seq_test)

#reshape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, TIME_STEPS, 1))
X_test = np.reshape(X_test, newshape=(-1, TIME_STEPS, 1))

#draw y_test
plt.plot(range(1000), y_test[:1000, 0], "r*")

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 1), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="pred_placeholder")

    #lstm instances
    lstm_cell1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)   # 30
    lstm_cell = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)     # 1
    multi_lstm = rnn.MultiRNNCell(cells=[lstm_cell1, lstm_cell])

    #build the initial state ourselves
    #first-layer state
    lstm_layer1_c = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS1))
    lstm_layer1_h = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS1))
    layer1_state = rnn.LSTMStateTuple(c=lstm_layer1_c, h=lstm_layer1_h)

    #second-layer state
    lstm_layer2_c = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    lstm_layer2_h = tf.zeros(shape=(BATCH_SIZE, HIDDEN_UNITS))
    layer2_state = rnn.LSTMStateTuple(c=lstm_layer2_c, h=lstm_layer2_h)

    init_state = (layer1_state, layer2_state)
    print(init_state)
    # (LSTMStateTuple(c=<tf.Tensor 'zeros:0' shape=(128, 30) dtype=float32>,
    #                 h=<tf.Tensor 'zeros_1:0' shape=(128, 30) dtype=float32>),
    #  LSTMStateTuple(c=<tf.Tensor 'zeros_2:0' shape=(128, 1) dtype=float32>,
    #                 h=<tf.Tensor 'zeros_3:0' shape=(128, 1) dtype=float32>))

    #unroll the RNN computation ourselves
    outputs = list()          # collects each step's output
    state = init_state
    with tf.variable_scope('RNN'):
        for timestep in range(TIME_STEPS):
            if timestep > 0:
                tf.get_variable_scope().reuse_variables()
            # state here holds the state of every LSTM layer
            (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
            outputs.append(cell_output)
    print(outputs)    # a list of 10 tensors of shape (128, 1)
    h = outputs[-1]
    print(h)
    # Tensor("RNN/RNN/multi_rnn_cell/cell_1/cell_1/basic_lstm_cell/mul_29:0", shape=(128, 1), dtype=float32)

    #---------------------------------define loss and optimizer----------------------------------#
    mse = tf.losses.mean_squared_error(labels=y_p, predictions=h)
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=mse)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        results = np.zeros(shape=(TEST_EXAMPLES, 1))
        train_losses = []
        test_losses = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss = sess.run(
                fetches=(optimizer, mse),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
        print("average training loss:", sum(train_losses) / len(train_losses))

        for j in range(TEST_EXAMPLES // BATCH_SIZE):
            result, test_loss = sess.run(
                fetches=(h, mse),
                feed_dict={
                    X_p: X_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_test[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            results[j * BATCH_SIZE:(j + 1) * BATCH_SIZE] = result
            test_losses.append(test_loss)
        print("average test loss:", sum(test_losses) / len(test_losses))
    plt.plot(range(1000), results[:1000, 0])
    plt.show()

This example is identical to Ⅱ above; the only difference is that it uses a custom initial state, so it shows how to define a state yourself. Also, what was previously unrolled automatically is unrolled by hand here. The manual computation is instructive: when you need control over the result of every step, you can unroll the computation this way.

#unroll the RNN computation ourselves
outputs = list()          # collects each step's output
state = init_state
with tf.variable_scope('RNN'):
    for timestep in range(TIME_STEPS):
        if timestep > 0:
            tf.get_variable_scope().reuse_variables()
        # state here holds the state of every LSTM layer
        (cell_output, state) = multi_lstm(X_p[:, timestep, :], state)
        outputs.append(cell_output)
h = outputs[-1]

Ⅲ. MNIST image classification

An LSTM can also do image classification, and the idea here is very simple: an MNIST image can be represented as 28x28, so each of the 28 rows (of 28 pixels) is treated as one time step's input, giving 28 time steps.

You can skim this code:

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

TIME_STEPS = 28
BATCH_SIZE = 128
HIDDEN_UNITS1 = 30
HIDDEN_UNITS = 10
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 42000
TEST_EXAMPLES = 28000

#------------------------------------Generate Data-----------------------------------------------#
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot code them
train_labels_frame = train_frame.pop("label")

# get values, one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train = pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#transform the shape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, 28, 28))
X_test = np.reshape(X_test, newshape=(-1, 28, 28))

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 28), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 10), name="pred_placeholder")

    #lstm instances
    lstm_cell1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_cell = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    multi_lstm = rnn.MultiRNNCell(cells=[lstm_cell1, lstm_cell])

    #initialize to zero
    init_state = multi_lstm.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
    print(init_state)

    #dynamic rnn
    outputs, states = tf.nn.dynamic_rnn(cell=multi_lstm, inputs=X_p, initial_state=init_state, dtype=tf.float32)
    h = outputs[:, -1, :]

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss = tf.losses.softmax_cross_entropy(onehot_labels=y_p, logits=h)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        train_losses = []
        accus = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss, accu = sess.run(
                fetches=(optimizer, cross_loss, accuracy),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:", sum(accus) / len(accus))

Result:

Ⅳ. Image classification with a bidirectional LSTM

Code:

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

TIME_STEPS = 28
BATCH_SIZE = 128
HIDDEN_UNITS1 = 30
HIDDEN_UNITS = 10
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 42000
TEST_EXAMPLES = 28000

#------------------------------------Generate Data-----------------------------------------------#
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot code them
train_labels_frame = train_frame.pop("label")

# get values, one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train = pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#transform the shape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, 28, 28))
X_test = np.reshape(X_test, newshape=(-1, 28, 28))

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 28), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 10), name="pred_placeholder")

    #lstm instances
    lstm_forward = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)   # 10
    lstm_backward = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)  # 10

    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )

    outputs_fw = outputs[0]
    outputs_bw = outputs[1]
    h = outputs_fw[:, -1, :] + outputs_bw[:, -1, :]  # elementwise sum of the two directions
    # print(h.shape)   # (batch, 10)

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss = tf.losses.softmax_cross_entropy(onehot_labels=y_p, logits=h)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        train_losses = []
        accus = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss, accu = sess.run(
                fetches=(optimizer, cross_loss, accuracy),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:", sum(accus) / len(accus))

The result of this example:

You will find that after a while it stops learning anything, no matter how long it trains. That is because we used only one bidirectional layer. With just a small change, the network above becomes a deep bidirectional LSTM.

Ⅴ. Image classification with a deep bidirectional LSTM

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.rnn as rnn
import matplotlib.pyplot as plt

TIME_STEPS = 28
BATCH_SIZE = 128
HIDDEN_UNITS1 = 30
HIDDEN_UNITS = 10
LEARNING_RATE = 0.001
EPOCH = 50

TRAIN_EXAMPLES = 42000
TEST_EXAMPLES = 28000

#------------------------------------Generate Data-----------------------------------------------#
train_frame = pd.read_csv("../Mnist/train.csv")
test_frame = pd.read_csv("../Mnist/test.csv")

# pop the labels and one-hot code them
train_labels_frame = train_frame.pop("label")

# get values, one-hot on labels
X_train = train_frame.astype(np.float32).values
y_train = pd.get_dummies(data=train_labels_frame).values
X_test = test_frame.astype(np.float32).values

#transform the shape to (batch, time_steps, input_size)
X_train = np.reshape(X_train, newshape=(-1, 28, 28))
X_test = np.reshape(X_test, newshape=(-1, 28, 28))

#--------------------------------------Define Graph---------------------------------------------------#
graph = tf.Graph()
with graph.as_default():
    #------------------------------------construct LSTM------------------------------------------#
    #placeholders
    X_p = tf.placeholder(dtype=tf.float32, shape=(None, TIME_STEPS, 28), name="input_placeholder")
    y_p = tf.placeholder(dtype=tf.float32, shape=(None, 10), name="pred_placeholder")

    #lstm instances
    lstm_forward_1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_forward_2 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_forward = rnn.MultiRNNCell(cells=[lstm_forward_1, lstm_forward_2])

    lstm_backward_1 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS1)
    lstm_backward_2 = rnn.BasicLSTMCell(num_units=HIDDEN_UNITS)
    lstm_backward = rnn.MultiRNNCell(cells=[lstm_backward_1, lstm_backward_2])

    outputs, states = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_forward,
        cell_bw=lstm_backward,
        inputs=X_p,
        dtype=tf.float32
    )
    print(outputs)
    print(states)

    outputs_fw = outputs[0]
    outputs_bw = outputs[1]
    h = outputs_fw[:, -1, :] + outputs_bw[:, -1, :]
    print(h.shape)

    #---------------------------------define loss and optimizer----------------------------------#
    cross_loss = tf.losses.softmax_cross_entropy(onehot_labels=y_p, logits=h)
    correct_prediction = tf.equal(tf.argmax(h, 1), tf.argmax(y_p, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss=cross_loss)
    init = tf.global_variables_initializer()

#-------------------------------------------Define Session---------------------------------------#
with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(1, EPOCH + 1):
        train_losses = []
        accus = []
        print("epoch:", epoch)
        for j in range(TRAIN_EXAMPLES // BATCH_SIZE):
            _, train_loss, accu = sess.run(
                fetches=(optimizer, cross_loss, accuracy),
                feed_dict={
                    X_p: X_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE],
                    y_p: y_train[j * BATCH_SIZE:(j + 1) * BATCH_SIZE]
                }
            )
            train_losses.append(train_loss)
            accus.append(accu)
        print("average training loss:", sum(train_losses) / len(train_losses))
        print("accuracy:", sum(accus) / len(accus))

Compared with the single-layer BiLSTM above, this reaches 95% by epoch 35, which shows that for abstracting information, a multi-layer architecture beats a single-layer one.
