Implementing a Convolutional Neural Network (TextCNN) for Sentence Classification

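This post documents my implementation of the experiments in the paper Convolutional Neural Networks for Sentence Classification, which describes how to classify sentences with a CNN. Code for CNN sentence classification (TextCNN) is already available online, but it is built on Theano. Here I reproduce the paper's experiments with TensorFlow, partly to get familiar with the TensorFlow API and partly to deepen my understanding of how CNNs are applied in NLP.
GitHub repository for the example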

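Experimental approach of the paper

1. Model diagram
Figure first, explanation after.

The figure shows the TextCNN architecture. Each word in a sentence is represented by a K-dimensional vector, so a sentence of N words becomes an N*K matrix, which is the input to the CNN.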
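To make the N*K input concrete, here is a minimal sketch that stacks one vector per word into a matrix. The toy vocabulary, the embedding size of 4, and the helper name sentence_to_matrix are assumptions for illustration only; the experiments use 300-dimensional vectors.

```python
import numpy as np

K = 4  # toy embedding size (the experiments use 300)
rng = np.random.default_rng(0)
vocab = ["i", "like", "this", "movie"]
# Hypothetical embedding table: one random K-dimensional vector per word.
embeddings = {w: rng.normal(size=K) for w in vocab}

def sentence_to_matrix(sentence):
    """Stack the K-dimensional vector of every word into an N*K matrix."""
    return np.stack([embeddings[w] for w in sentence.lower().split()])

matrix = sentence_to_matrix("I like this movie")
print(matrix.shape)  # (4, 4): N = 4 words, K = 4 dimensions
```

Each row is one word, so a convolution filter that spans h rows looks at h consecutive words at a time.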

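2. Open questions before the experiment

2.1 Word embedding: which embedding method (one-hot, word2vec, or GloVe) works best?
2.2 How is N in the N*K input defined, that is, how long is the input sequence? Sentences contain different numbers of words, but the CNN needs a fixed-size N*K matrix.
2.3 How are words that are not in the vocabulary embedded?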

3. TextCNN model description and experiments
3.1 Dataset
The paper runs experiments on several datasets. I used only the MR dataset, evaluated with 10-fold cross-validation.

MR: Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews.
Specifically:
rt-polarity.pos contains 5331 positive snippets
rt-polarity.neg contains 5331 negative snippets

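3.2 Model variants in the experiment
CNN-rand: all word vectors are randomly initialized and then optimized as parameters during CNN training.
CNN-static: word vectors are taken from the table pretrained with word2vec on the Google News dataset (about 100 billion words). They are fed to the CNN as fixed input and are not optimized during training.
CNN-non-static: word vectors are taken from the same pretrained word2vec table, but they are treated as parameters and fine-tuned during CNN training.
Notes:

3.2.1 The GoogleNews-vectors-negative300.bin.gz table was pretrained with the word2vec command-line tool, which takes a long time.
Pretrained file: GoogleNews-vectors-negative300.bin.gz, Baidu Cloud download link, password: 18yf
3.2.2 An example word2vec training command (text8 is the corpus, vectors.bin is the output vector table, and -cbow selects the training model): ./word2vec -train text8 -output vectors.bin -cbow 0 -size 48 -window 5 -negative 0 -hs 1 -sample 1e-4 -threads 20 -binary 1 -iter 100
3.2.3 Besides word2vec, GloVe or fastText can also be used to train word vectors.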

3.3 Model architecture
Model parameters

Activation: rectified linear units (ReLU)

Filter window sizes h: 3, 4, 5, with 100 feature maps per size

Dropout rate (p) of 0.5, l2 constraint (s) of 3

Mini-batch size of 50

Gradient descent learning rate of 0.05
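The l2 constraint (s) in the list above is a max-norm rule: after a gradient step, a weight vector whose l2 norm exceeds s is rescaled so that its norm equals s. A minimal NumPy sketch of that rule follows; the helper name clip_l2_norm is my own. Note that the TensorFlow code in Step 4 adds an L2 penalty term to the loss instead of clipping.

```python
import numpy as np

def clip_l2_norm(w, s=3.0):
    """Rescale w so that ||w||_2 == s whenever ||w||_2 > s; otherwise return it unchanged."""
    norm = np.linalg.norm(w)
    return w * (s / norm) if norm > s else w

large = np.array([3.0, 4.0])   # norm 5 > 3, gets rescaled
small = np.array([1.0, 2.0])   # norm below 3, left unchanged

print(np.linalg.norm(clip_l2_norm(large)))
print(clip_l2_norm(small))
```

The direction of the weight vector is preserved; only its length is capped.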

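3.3.1 Input layer
As the figure shows, the input layer receives an N*K matrix built from the word vectors of each sentence, where K is the word-vector length and N is the sentence length. The word vectors come in three flavors: CNN-rand, CNN-static, and CNN-non-static. Words missing from the pretrained vector table (out-of-vocabulary words) are initialized randomly, to 0 or to small positive values, which answers question 2.3 (this can be viewed as a form of smoothing).
3.3.2 Convolution layer
On top of the input layer, filter windows are convolved over the input to produce feature maps. The experiment uses three filter window sizes, 3*K, 4*K, and 5*K, where K is the word-vector length. Each size has 100 filters with different values, and each filter extracts one feature map from the input matrix, which in NLP is called a text feature.
The feature maps are pooled with max-over-time pooling: the largest value of each feature map is extracted, and these values are concatenated into a one-dimensional vector.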
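The convolution and pooling steps can be sketched in plain NumPy. This is an illustration with random filters rather than the TensorFlow implementation from this post; the function name conv_max_over_time is hypothetical, and N = 56 matches the default sentence length used later.

```python
import numpy as np

def conv_max_over_time(sentence, filters, bias):
    """sentence: (N, K); filters: (num_filters, h, K); bias: (num_filters,).
    Returns one max-pooled value per filter."""
    n, _ = sentence.shape
    num_filters, h, _ = filters.shape
    feature_maps = np.empty((num_filters, n - h + 1))
    for i in range(n - h + 1):
        window = sentence[i:i + h]  # h consecutive words, shape (h, K)
        # Dot product of every filter with the current window.
        feature_maps[:, i] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    return feature_maps.max(axis=1)               # max-over-time pooling

rng = np.random.default_rng(0)
N, K, num_filters = 56, 300, 100
sentence = rng.normal(size=(N, K))
pooled = [
    conv_max_over_time(sentence,
                       rng.normal(scale=0.01, size=(num_filters, h, K)),
                       np.zeros(num_filters))
    for h in (3, 4, 5)
]
features = np.concatenate(pooled)
print(features.shape)  # (300,)
```

A filter of height h yields a feature map of length N - h + 1, and pooling reduces it to a single number, so three window sizes with 100 filters each give a 300-dimensional vector regardless of N.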
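3.3.3 Fully connected layer
This layer takes the one-dimensional vector produced by pooling, passes it through the activation function, and applies dropout to reduce overfitting. L2 regularization is also applied to the weights of this layer.
3.3.4 Output layer
This layer takes the output of the fully connected layer and classifies it with a softmax layer. A softmax layer suits multi-class problems; for binary classification a single neuron with a sigmoid activation would also work. The experiment uses a softmax layer.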
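For two classes the two output choices agree: the softmax probability of the positive class equals the sigmoid of the difference between the two logits. A small numerical check, with made-up logit values:

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([0.3, 1.7])  # hypothetical scores for [negative, positive]
p_softmax = softmax(logits)[1]
p_sigmoid = sigmoid(logits[1] - logits[0])
print(np.isclose(p_softmax, p_sigmoid))  # True
```

So choosing softmax for the binary MR task costs nothing in expressiveness, and the same code extends to multi-class datasets.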

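Code walkthrough

Complaints first, lessons after

I have to vent a little about the implementation: writing the code took 2 days, and debugging it took another 2. Maybe that is just because I am still new to TensorFlow (or so I tell myself). Still, the complaints call for some reflection.
1. Before building a multi-layer neural network, pin down its architecture: which layers it has, the input and output of each layer, the activation function of the neurons, and the weights and biases of each layer. Plan this up front, or debugging will be painful.
2. Be careful during text preprocessing. Think through every type conversion in advance, and write a small demo of the training data before constructing the training and test sets.
3. Write the code as a clear sequence of steps (first, then, finally); otherwise hunting for bugs is agonizing.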

Step 1 Overall experiment flow

text_cnn_main.py
1 get parameters -> 2 load data -> 3 create TextCNN model -> 4 start training -> 5 validation

    import argparse
    import pickle

    import numpy as np
    from tqdm import tqdm

    import process_data
    from text_cnn_model import TextCNN

    # 1 get parameters
    parse = argparse.ArgumentParser(description='Paramaters for construct TextCNN Model')
    # Option 1: parse the bool with type=ast.literal_eval
    # parse.add_argument('--nonstatic', type=ast.literal_eval, help='use textcnn nonstatic or not', dest='tt')
    # Option 2: get the bool from mutually exclusive flags
    group_static = parse.add_mutually_exclusive_group(required=True)
    group_static.add_argument('--static', dest='static_flag', action='store_true', help='use static Text_CNN')
    group_static.add_argument('--nonstatic', dest='static_flag', action='store_false', help='use nonstatic Text_CNN')

    group_word_vec = parse.add_mutually_exclusive_group(required=True)
    group_word_vec.add_argument('--word2vec', dest='wordvec_flag', action='store_true', help='word_vec is word2vec')
    group_word_vec.add_argument('--rand', dest='wordvec_flag', action='store_false', help='word_vec is rand')

    group_shuffer_batch = parse.add_mutually_exclusive_group(required=False)
    group_shuffer_batch.add_argument('--shuffer', dest='shuffer_flag', action='store_true', help='the train do shuffer')
    group_shuffer_batch.add_argument('--no-shuffer', dest='shuffer_flag', action='store_false', help='the train do not shuffer')

    parse.add_argument('--learnrate', type=float, dest='learnrate', help='the NN learnRate', default=0.05)
    parse.add_argument('--epochs', type=int, dest='epochs', help='the model train epochs', default=10)
    parse.add_argument('--batch_size', type=int, dest='batch_size', help='the train gd batch size.(50-300)', default=50)
    parse.add_argument('--dropout_pro', type=float, dest='dropout_pro', help='the nn layer dropout_pro', default=0.5)

    parse.set_defaults(static_flag=True)
    parse.set_defaults(wordvec_flag=True)
    parse.set_defaults(shuffer_flag=False)
    args = parse.parse_args()

    # 2 load data
    print('load data. . .')
    with open('./NLP/result/word_vec.p', 'rb') as f:
        X = pickle.load(f)
    word_vecs_rand, word_vecs, word_cab, sentence_max_len, revs = X[0], X[1], X[2], X[3], X[4]
    print('load data finish. . .')

    # model configuration
    filter_sizes = [3, 4, 5]
    filter_numbers = 100
    embedding_size = 300

    # use the word2vec vectors or the random vectors
    W = word_vecs if args.wordvec_flag else word_vecs_rand
    word_ids, W_list = process_data.getWordsVect(W)

    # keep the embeddings fixed (static) or train them (non-static)
    static_falg = args.static_flag
    # shuffle the data or not
    shuffer_falg = args.shuffer_flag

    # cross-validation
    results = []
    for index in tqdm(range(10)):
        # train_x, train_y, test_x, test_y = process_data.get_train_test_data1(W, revs, index, sentence_max_len, default_values=0.0, vec_size=300)
        train_x, train_y, test_x, test_y = process_data.get_train_test_data2(word_ids, revs, index, sentence_max_len)
        # 3 create TextCNN model
        text_cnn = TextCNN(W_list, shuffer_falg, static_falg, filter_numbers, filter_sizes, sentence_max_len,
                           embedding_size, args.learnrate, args.epochs, args.batch_size, args.dropout_pro)
        # 4 start training
        text_cnn.train(train_x, train_y)
        # 5 validation
        accur, loss = text_cnn.validataion(test_x, test_y)
        # collect the fold accuracy so the final mean is computed over all 10 folds
        results.append(accur)
        print('cv {} accur is :{:.3f} loss is {:.3f}'.format(index + 1, accur, loss))
        text_cnn.close()
    print('last accuracy is {}'.format(np.mean(results)))

Step 2 Parameters

Command-line arguments are parsed with argparse.
Example: python ./NLP/Text_CNN/text_cnn_main.py --nonstatic --word2vec

    Paramaters for construct TextCNN Model

    optional arguments:
      -h, --help            show this help message and exit
      --static              use static Text_CNN
      --nonstatic           use nonstatic Text_CNN
      --word2vec            word_vec is word2vec
      --rand                word_vec is rand
      --shuffer             the train do shuffer
      --no-shuffer          the train do not shuffer
      --learnrate LEARNRATE
                            the NN learnRate
      --epochs EPOCHS       the model train epochs
      --batch_size BATCH_SIZE
                            the train gd batch size.(50-300)
      --dropout_pro DROPOUT_PRO
                            the nn layer dropout_pro
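The paired flags rely on argparse's mutually exclusive groups: both flags write to the same destination, so exactly one boolean comes out. Here is a reduced sketch that can be run without the rest of the project; the parser description and the build_parser helper are made up for the demo.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="toy TextCNN options")
    group = parser.add_mutually_exclusive_group(required=True)
    # Both flags share one destination, so the result is a single boolean.
    group.add_argument("--static", dest="static_flag", action="store_true")
    group.add_argument("--nonstatic", dest="static_flag", action="store_false")
    parser.add_argument("--learnrate", type=float, default=0.05)
    return parser

parser = build_parser()
args = parser.parse_args(["--nonstatic", "--learnrate", "0.01"])
print(args.static_flag, args.learnrate)  # False 0.01

# Passing both flags is rejected: argparse reports an error and exits.
try:
    parser.parse_args(["--static", "--nonstatic"])
except SystemExit:
    print("conflicting flags rejected")
```

Passing an explicit list to parse_args is handy for testing; the real script reads sys.argv instead.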

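Step 3 Data processing

process_data.py. The full code is not shown here; see the GitHub repository.
1. Load the dataset from the binary files, and assign each review its label and its cross-validation fold.

def load_data_k_cv(folder, cv=10, clear_flag=True)
Parameters:
folder: path to the MR binary files
cv: number of folds for K-fold cross-validation
clear_flag: whether to replace special characters
Returns:
word_cab = defaultdict(float): the vocabulary of the training set with the frequency count of each word.
revs = []: one record per review.
For example, revs[0] = {"y": 1, "text": 'I like this movie', "num_words": 4, "spilt": np.random.randint(0, cv)}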
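The fold id stored on each review is what drives cross-validation: for fold i, reviews tagged i form the test set and all others form the training set. A standard-library sketch of that idea follows. The helper names and the "fold" key are illustrative choices of mine, not the identifiers used in process_data.py.

```python
import random

def assign_folds(texts, labels, cv=10, seed=0):
    """Attach a random fold id in [0, cv) to every review."""
    rng = random.Random(seed)
    return [
        {"y": y, "text": t, "num_words": len(t.split()), "fold": rng.randrange(cv)}
        for t, y in zip(texts, labels)
    ]

def split_by_fold(revs, cv_id):
    """Reviews in fold cv_id form the test set; all others form the training set."""
    train = [r for r in revs if r["fold"] != cv_id]
    test = [r for r in revs if r["fold"] == cv_id]
    return train, test

texts = ["review number %d" % i for i in range(100)]
labels = [i % 2 for i in range(100)]
revs = assign_folds(texts, labels)
train, test = split_by_fold(revs, cv_id=0)
print(len(train) + len(test))  # 100
```

Because fold ids are drawn at random, the folds are only approximately equal in size.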

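2. Load the binary file of word vectors pretrained with word2vec on the Google News corpus.

# The loading logic follows word2vec.WordVectors.from_binary(fname, *args, **kwargs)
def load_binary_vec(fname, vocab)
Parameters:
fname: name of the file holding the word vectors pretrained with word2vec
vocab: vocabulary of the MR training set
Returns:
word_vecs = {}: the pretrained word2vec vector for each MR training-set word found in the table.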
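For readers who have not seen the word2vec binary layout, the sketch below reads it with the standard library only. It assumes the usual layout: a text header "vocab_size dim", then for each entry the word, a space, and dim little-endian float32 values followed by a newline. It reuses the name load_binary_vec for readability, but the body is my own illustration, not the code from process_data.py, and write_binary_vec exists only to create a tiny demo file.

```python
import os
import struct
import tempfile

def write_binary_vec(fname, vectors):
    """Write vectors in the word2vec binary layout (helper for the demo only)."""
    dim = len(next(iter(vectors.values())))
    with open(fname, "wb") as f:
        f.write("{} {}\n".format(len(vectors), dim).encode("utf-8"))
        for word, vec in vectors.items():
            f.write(word.encode("utf-8") + b" ")
            f.write(struct.pack("<{}f".format(dim), *vec))
            f.write(b"\n")

def load_binary_vec(fname, vocab):
    """Return {word: vector} for the words of vocab found in the file."""
    word_vecs = {}
    with open(fname, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # skip the newline that ends the previous entry
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8")
            vec = struct.unpack("<{}f".format(dim), f.read(4 * dim))
            if word in vocab:    # keep only the words the dataset needs
                word_vecs[word] = list(vec)
    return word_vecs

path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_binary_vec(path, {"movie": [0.5, -1.0], "film": [0.25, 2.0], "zebra": [1.0, 1.0]})
loaded = load_binary_vec(path, vocab={"movie", "film", "unseen"})
print(loaded)  # {'movie': [0.5, -1.0], 'film': [0.25, 2.0]}
```

Filtering by vocab while reading keeps memory use small, since the Google News table is far larger than the MR vocabulary.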

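3. Handle words in the MR training set that do not appear in the Google News corpus (out-of-vocabulary words).

def add_unexist_word_vec(w2v, vocab)
# Initialize the words in the vocabulary that have no embedding.
:param w2v: word vectors pretrained with word2vec
:param vocab: the full vocabulary to embed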
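One possible way to fill in the missing vectors is sketched below. The function name add_unknown_words and the uniform range of plus or minus 0.25 are assumptions of mine for illustration; the post does not state the exact distribution used in add_unexist_word_vec.

```python
import numpy as np

def add_unknown_words(word_vecs, vocab, dim=300, scale=0.25, seed=0):
    """Give every vocabulary word without a pretrained vector a small random one."""
    rng = np.random.default_rng(seed)
    for word in vocab:
        if word not in word_vecs:
            # Assumed range: uniform in [-scale, scale); not taken from the post.
            word_vecs[word] = rng.uniform(-scale, scale, size=dim)
    return word_vecs

pretrained = {"movie": np.ones(300)}
vocab = ["movie", "unwatchable-ish"]
filled = add_unknown_words(pretrained, vocab)
print(sorted(filled))              # ['movie', 'unwatchable-ish']
print(filled["unwatchable-ish"].shape)  # (300,)
```

Pretrained vectors are left untouched, so only genuinely unseen words receive random values.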

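4. Build the training data, i.e. the input and output format of the model.
Option 1: feed the matrix of word vectors for each sentence directly, with shape [sentence_length, embedding_size]. The experiment uses the word count of the longest review as the fixed sentence_length of the CNN input and pads shorter sentences with 0, which answers question 2.2.

input shape: [mini_batch_size, sentence_length, embedding_size]
output shape: [mini_batch_size, label_size]

Option 2: feed the id of each word in the word2vec vector table, to be used later by tf.nn.embedding_lookup.

input shape: [mini_batch_size, sentence_length]
output shape: [mini_batch_size, label_size]

Comparison of the two options:
Option 1: the input is clear and explicit and is fed through a TensorFlow placeholder. It is hard to adapt to CNN-non-static and CNN-rand, but fits CNN-static very well.
Option 2: the dataset is harder to construct, but the code for all three model types becomes very convenient to write.

def get_train_test_data1(word_vecs, revs, cv_id=0, sent_length=56, default_values=0., vec_size=300)
def get_train_test_data2(word_ids, revs, cv_id=0, sent_length=56)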
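Option 2 boils down to two steps: map each sentence to a fixed-length row of word ids, then gather rows of the embedding table. The NumPy sketch below mimics both. Reserving id 0 for padding, the toy vocabulary, and the helper name sentences_to_ids are assumptions for the example, not details confirmed by process_data.py.

```python
import numpy as np

def sentences_to_ids(sentences, word_ids, sent_length):
    """Map words to ids and right-pad with 0 up to sent_length."""
    batch = np.zeros((len(sentences), sent_length), dtype=np.int32)
    for row, sentence in enumerate(sentences):
        ids = [word_ids[w] for w in sentence.split()][:sent_length]
        batch[row, :len(ids)] = ids
    return batch

word_ids = {"i": 1, "like": 2, "this": 3, "movie": 4}  # id 0 is reserved for padding (assumption)
embedding_size = 5
W = np.random.default_rng(0).normal(size=(len(word_ids) + 1, embedding_size))
W[0] = 0.0  # the padding row stays all zeros

x = sentences_to_ids(["i like this movie", "like movie"], word_ids, sent_length=6)
embedded = W[x]  # the same row gather that an embedding lookup performs
print(x)
print(embedded.shape)  # (2, 6, 5)
```

Because the model only ever sees ids, switching between CNN-rand, CNN-static, and CNN-non-static changes the table W and whether it is trained, not the dataset.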

Step 4 Building the CNN-rand/CNN-static/CNN-non-static models

text_cnn_model.py, implemented with TensorFlow (using option 2 above).
A placeholder holds the sample input of the model, fed through feed_dict; a Variable holds a parameter of the model. A tf.Variable created with trainable=False is excluded from training, while trainable=True makes it a trained parameter. This flag is what switches between CNN-static and CNN-non-static.

    # Build the graph (this fragment runs inside the TextCNN class, hence `self`)
    tf.reset_default_graph()
    self.train_graph = tf.Graph()
    with self.train_graph.as_default():
        # 1 input layer
        self.input_x = tf.placeholder(dtype=tf.int32, shape=[None, sentence_length], name='input_x')
        self.input_y = tf.placeholder(dtype=tf.int32, shape=[None, 2], name='input_y')
        self.dropout_pro = tf.placeholder(dtype=tf.float32, name='dropout_pro')
        self.learning_rate = tf.placeholder(dtype=tf.float32, name='learning_rate')
        self.l2_loss = tf.constant(0.0)
        # Option 1 would instead feed the embedded sentences through a placeholder:
        # self.embedding_layer = tf.placeholder(dtype=tf.float32, shape=[self.batch_size, sentence_length, embedding_size],
        #                                       name='embedding_layer')

        # 2 embedding layer
        with tf.name_scope('embedding_layer'):
            # static -> embeddings are frozen; non-static -> embeddings are trained
            train_bool = not self.__static_falg
            self.embedding_layer_W = tf.Variable(initial_value=W_list, dtype=tf.float32,
                                                 trainable=train_bool, name='embedding_layer_W')
            self.embedding_layer_layer = tf.nn.embedding_lookup(self.embedding_layer_W, self.input_x)
            self.embedding_layer_expand = tf.expand_dims(self.embedding_layer_layer, -1)

        # 3 conv layer + max-pool layer for each filter size
        pool_layer_lst = []
        for filter_size in filter_sizes:
            max_pool_layer = self.__add_conv_layer(filter_size, filter_numbers)
            pool_layer_lst.append(max_pool_layer)

        # 4 fully connected part: dropout + softmax + l2
        # combine all max-pooled features
        with tf.name_scope('dropout_layer'):
            max_num = len(filter_sizes) * self.filter_numbers
            h_pool = tf.concat(pool_layer_lst, name='last_pool_layer', axis=3)
            pool_layer_flat = tf.reshape(h_pool, [-1, max_num], name='pool_layer_flat')
            # in TF 1.x the second argument of tf.nn.dropout is keep_prob
            dropout_pro_layer = tf.nn.dropout(pool_layer_flat, self.dropout_pro, name='dropout')

        with tf.name_scope('soft_max_layer'):
            SoftMax_W = tf.Variable(tf.truncated_normal([max_num, 2], stddev=0.01), name='softmax_linear_weight')
            self.__variable_summeries(SoftMax_W)
            SoftMax_b = tf.Variable(tf.constant(0.1, shape=[2]), name='softmax_linear_bias')
            self.__variable_summeries(SoftMax_b)
            self.l2_loss += tf.nn.l2_loss(SoftMax_W)
            self.l2_loss += tf.nn.l2_loss(SoftMax_b)
            self.softmax_values = tf.nn.xw_plus_b(dropout_pro_layer, SoftMax_W, SoftMax_b, name='soft_values')
            self.predictions = tf.argmax(self.softmax_values, axis=1, name='predictions', output_type=tf.int32)

        with tf.name_scope('loss'):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.softmax_values, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + 0.001 * self.l2_loss  # lambda = 0.001
            tf.summary.scalar('last_loss', self.loss)

        with tf.name_scope('accuracy'):
            correct_acc = tf.equal(self.predictions, tf.argmax(self.input_y, axis=1, output_type=tf.int32))
            self.accuracy = tf.reduce_mean(tf.cast(correct_acc, 'float'), name='accuracy')
            tf.summary.scalar('accuracy', self.accuracy)

        with tf.name_scope('train'):
            optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
            self.train_op = optimizer.minimize(self.loss)

        # create the session, summaries, and log writer
        self.session = tf.InteractiveSession(graph=self.train_graph)
        self.merged = tf.summary.merge_all()
        self.train_writer = tf.summary.FileWriter('./NLP/log/text_cnn', graph=self.train_graph)
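The loss in the graph is the mean softmax cross-entropy plus 0.001 times the L2 term, where tf.nn.l2_loss(t) is half the sum of squares of t. The same arithmetic in NumPy, with small made-up inputs, looks roughly like this:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Row-wise cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def l2_loss(t):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return 0.5 * np.sum(t ** 2)

def total_loss(logits, labels, weights, bias, lam=0.001):
    data_term = softmax_cross_entropy(logits, labels).mean()
    return data_term + lam * (l2_loss(weights) + l2_loss(bias))

# Hypothetical values for a batch of two examples.
logits = np.array([[2.0, 0.0], [0.0, 0.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = np.array([[3.0, 4.0]])
bias = np.array([0.0, 0.0])
print(total_loss(logits, labels, weights, bias))
```

With lam = 0.001 the penalty stays small next to the cross-entropy term unless the output weights grow large.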

Step 5 Model training and prediction

Training mainly consists of feeding data to the model batch by batch.

    def train(self, train_x, train_y):
        self.session.run(tf.global_variables_initializer())
        # training loop
        for epoch in range(self.epochs):
            train_batch = self.__get_batchs(train_x, train_y, self.batch_size)
            train_loss, train_acc, count = 0.0, 0.0, 0
            for batch_i in range(len(train_x) // self.batch_size):
                x, y = next(train_batch)
                feed = {self.input_x: x,
                        self.input_y: y,
                        self.dropout_pro: self.dropout_pro_item,
                        self.learning_rate: self.learning_rate_item}
                _, summarys, loss, accuracy = self.session.run(
                    [self.train_op, self.merged, self.loss, self.accuracy], feed_dict=feed)
                train_loss, train_acc, count = train_loss + loss, train_acc + accuracy, count + 1
                self.train_writer.add_summary(summarys, epoch)
                # print a log line every 15 batches
                if (batch_i + 1) % 15 == 0:
                    print('Epoch {:>3} Batch {:>4}/{} train_loss = {:.3f} accuracy = {:.3f}'.format(
                        epoch, batch_i, (len(train_x) // self.batch_size),
                        train_loss / float(count), train_acc / float(count)))
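The training loop pulls batches from a generator whose code is not shown in this post. A plausible minimal version is sketched below; get_batches, its shuffle option, and the choice to drop the trailing partial batch are my assumptions, picked so that it yields exactly len(train_x) // batch_size batches as the loop expects.

```python
import random

def get_batches(xs, ys, batch_size, shuffle=False, seed=0):
    """Yield (x_batch, y_batch) pairs; a trailing partial batch is dropped."""
    order = list(range(len(xs)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield [xs[i] for i in idx], [ys[i] for i in idx]

xs = list(range(10))
ys = [x % 2 for x in xs]
batches = list(get_batches(xs, ys, batch_size=4))
print(len(batches))  # 2
print(batches[0])    # ([0, 1, 2, 3], [0, 1, 0, 1])
```

Shuffling the index list rather than the data keeps inputs and labels aligned.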

References

1. Convolutional Neural Networks for Sentence Classification
2. A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
3. A Neural Probabilistic Language Model
4. 卷积神经网络(CNN)在句子建模上的应用 (Applying Convolutional Neural Networks (CNN) to Sentence Modeling)

Original post: https://juejin.im/entry/5aab428bf265da2392361748
