RNN Learning - Word Vectors

Word embedding layers are hard to train from scratch, so in practice we load a pre-trained model. From this code you will learn three things:

  • Load a pre-trained word embedding layer and measure word similarity with the cosine formula, CosineSimilarity(u, v) = (u · v) / (‖u‖ ‖v‖)
  • Use word embeddings to solve word-analogy problems, e.g. from man → woman the model can infer king → ?
  • Some word embeddings need to be modified (debiased) so they do not encode unwanted biases such as gender bias; a hedged sketch of the neutralization step is given at the end of this section

Exercise code:

# 1 Imports
import numpy as np
from w2v_utils import *
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

# 2 Given the embedding matrix, compute similarity between words
def cosine_similarity(u, v):
    dot = np.dot(u, v)
    norm_u = np.linalg.norm(u)
    norm_v = np.linalg.norm(v)
    cosine_similarity = dot / (norm_u * norm_v)
    return cosine_similarity
# 2.1 Similarity between two words
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))  # 0.89
# 2.2 Given words a, b, c, find the word d that best completes the analogy a:b :: c:d
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
    words = word_to_vec_map.keys()
    max_cosine_sim = -100    # initialize to a large negative number
    best_word = None
    for w in words:
        # skip the input words themselves
        if w in [word_a, word_b, word_c]:
            continue
        cosine_sim = cosine_similarity(e_b - e_a, word_to_vec_map[w] - e_c)
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
    return best_word
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print('{} -> {} :: {} -> {}'.format(*triad, complete_analogy(*triad, word_to_vec_map)))
'''italy -> italian :: spain -> spanish
india -> delhi :: japan -> tokyo
man -> woman :: boy -> girl
small -> smaller :: large -> larger'''
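
The third bullet above mentioned debiasing. Below is a minimal sketch of the neutralization step, not part of the assignment code reproduced in this post: it assumes the bias direction can be approximated as g = e_woman − e_man, then removes the component of a word vector that lies along g. It reuses word_to_vec_map and cosine_similarity defined above.

# Hedged sketch: neutralize a word vector with respect to a bias direction g.
# The choice g = e_woman - e_man is an assumption, not computed elsewhere in this post.
def neutralize(word, g, word_to_vec_map):
    e = word_to_vec_map[word]
    e_biascomponent = (np.dot(e, g) / np.sum(g * g)) * g   # projection of e onto g
    return e - e_biascomponent                              # component orthogonal to g

g = word_to_vec_map["woman"] - word_to_vec_map["man"]
e_debiased = neutralize("receptionist", g, word_to_vec_map)
print(cosine_similarity(e_debiased, g))   # close to 0 after neutralization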

RNN Learning - Sentiment Analysis

Using an embedding layer, map each input sentence to an emoji and output the original text together with the emoji.

Exercise code 1: build the emojifier with a baseline model (Emojify-V1: averaged word vectors + softmax)

# 1 Imports
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt
X_train, Y_train = read_csv('data/train_emoji.csv') # m=127
X_test, Y_test = read_csv('data/tesss.csv') # m=56
maxLen = len(max(X_train, key=len).split())
# 1.1 Preview an example
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))
"""I am proud of your achievements ?"""
# 1.2 Preprocessing: convert Y to one-hot labels of shape (m, 5)
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
# 1.3 Preview the GloVe data
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') # 400,001 words
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])
"""the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos"""

# 2 Implement the model
# 2.1 Convert a sentence into an averaged word vector
def sentence_to_avg(sentence, word_to_vec_map):
    """Look up the GloVe representation of each word in the sentence, sum them,
    and divide by the sentence length to get the sentence's feature vector."""
    words = sentence.lower().split()
    avg = np.zeros((50,))
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    return avg
# 2.2 Build the Emojify-V1 model (softmax over averaged word vectors)
def model(X, Y, word_to_vec_map, learning_rate=0.01, num_iterations=400):
    """
    Arguments:
    X -- input sentences, shape (m, 1)
    Y -- labels, shape (m, 1)
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    np.random.seed(1)
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes
    n_h = 50                                # dimensions of the GloVe vectors
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    Y_oh = convert_to_one_hot(Y, C=n_y)
    # Optimization loop
    for t in range(num_iterations):
        for i in range(m):
            avg = sentence_to_avg(X[i], word_to_vec_map)
            # Forward propagation
            z = np.dot(W, avg) + b
            a = softmax(z)
            # cost is only printed for monitoring; the gradients below use the one-hot label Y_oh[i]
            cost = -1 * np.multiply(Y[i], np.log(a))
            # Backward propagation and gradient descent update
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y, 1), avg.reshape(1, n_h))
            db = dz
            W = W - learning_rate * dW
            b = b - learning_rate * db
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)
    return pred, W, b
# 2.3 Start training
pred, W, b = model(X_train, Y_train, word_to_vec_map)
'''Epoch: 0 --- cost = [ 2.82117539  2.22537435  3.90409976  3.65077617  4.17192113]
Accuracy: 0.348484848485
Epoch: 100 --- cost = [  7.39085514   6.39666398   0.15943637   9.61056197  11.77782592]
Accuracy: 0.931818181818
Epoch: 200 --- cost = [  7.86956435   7.883712     0.08912738  11.25652113  13.75952996]
Accuracy: 0.954545454545
Epoch: 300 --- cost = [  8.06494045   8.67838712   0.06864535  12.0741376   14.92485916]
Accuracy: 0.969696969697'''
# 2.4 Evaluate the model
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)
'''Training set:
Accuracy: 0.977272727273
Test set:
Accuracy: 0.857142857143'''
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)
'''
Accuracy: 0.833333333333

i adore you ❤️
i love you ❤️
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😄'''

Amazing! Because adore has an embedding similar to love, the algorithm has generalized correctly even to a word it has never seen before. Words such as heart, dear, beloved or adore have embedding vectors similar to love, and so might work too.

What you should remember from this part:

  • Even with just 127 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power that word vectors give you.
  • Emojify-V1 will perform poorly on sentences such as “This movie is not good and not enjoyable” because it doesn’t understand combinations of words – it just averages all the words’ embedding vectors together, without paying attention to the ordering of words (as the quick check after this list shows). You will build a better algorithm in the next part.
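
A quick check, reusing the sentence_to_avg helper defined above, makes the second point concrete: reordering the words of a sentence does not change its averaged feature vector, so Emojify-V1 cannot distinguish a sentence from any permutation of itself.

# Word order is invisible to Emojify-V1: permuting the words gives the same average vector.
v1 = sentence_to_avg("not feeling happy", word_to_vec_map)
v2 = sentence_to_avg("happy not feeling", word_to_vec_map)
print(np.allclose(v1, v2))   # True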

Exercise code 2: build the emojifier with an LSTM (Emojify-V2)

# 1 导入
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

# 2 Input preprocessing: pad the sentences
def sentences_to_indices(X, word_to_index, max_len):
    """
    Convert the sentences in X into arrays of word indices, padded to max_len.
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary mapping each word to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    m = X.shape[0]
    X_indices = np.zeros((m, max_len))   # unused positions stay 0 (padding)
    for i in range(m):
        sentence_words = X[i].lower().split()
        j = 0
        for w in sentence_words:
            X_indices[i, j] = word_to_index[w]
            j = j + 1
    return X_indices

# 3 Pre-trained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # dimensionality of the GloVe word vectors (= 50)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    # Set each row "index" of the embedding matrix to the word vector of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]
    embedding_layer = Embedding(input_dim=vocab_len, output_dim=emb_dim, trainable=False)
    # Build the embedding layer; this is required before setting its weights. Do not modify the "None".
    embedding_layer.build((None,))
    # Set the weights of the embedding layer to the embedding matrix. The layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    return embedding_layer

# 4 Build the model
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)
    Returns:
    model -- a model instance in Keras
    """
    # Define sentence_indices as the input of the graph; shape input_shape, dtype 'int32' (it contains indices)
    sentence_indices = Input(input_shape, dtype='int32')
    # Create the embedding layer pretrained with GloVe vectors
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    # Propagate sentence_indices through the embedding layer to get the embeddings
    embeddings = embedding_layer(sentence_indices)
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state.
    # Here the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state.
    # This time the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation("softmax")(X)
    # Create a Model instance which converts sentence_indices into X
    model = Model(inputs=sentence_indices, outputs=X)
    return model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()
"""_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (None, 10)                0
_________________________________________________________________
embedding_3 (Embedding)      (None, 10, 50)            20000050
_________________________________________________________________
lstm_3 (LSTM)                (None, 10, 128)           91648
_________________________________________________________________
dropout_3 (Dropout)          (None, 10, 128)           0
_________________________________________________________________
lstm_4 (LSTM)                (None, 128)               131584
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 645
_________________________________________________________________
activation_2 (Activation)    (None, 5)                 0
=================================================================
Total params: 20,223,927
Trainable params: 223,877
Non-trainable params: 20,000,050
_________________________________________________________________"""

# 5 Training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
"""Epoch 50/50
132/132 [==============================] - 0s - loss: 0.0797 - acc: 0.9848     - ETA: 0s - loss: 0.0812 - acc: 0.984"""

# 6 Evaluation
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print()
print("Test accuracy = ", acc)
"""Test accuracy =  0.925000008515"""
# This code allows you to see the mislabelled examples
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if num != Y_test[i]:
        print('Expected emoji:' + label_to_emoji(Y_test[i]) + ' prediction: ' + X_test[i] + label_to_emoji(num).strip())
"""
Expected emoji:❤️ prediction: I love taking breaks  ?
Expected emoji:? prediction: she is a bully ?
Expected emoji:? prediction: she said yes   ?
Expected emoji:❤️ prediction: I love you to the stars and back  ?"""
# Change the sentence below to see your prediction. Make sure all the words are in the Glove embeddings.
x_test = np.array(['not feeling happy'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))
"""not feeling happy ?"""

What we can learn:

  • In Keras, every example in a mini-batch of X must have the same length before the batch can be vectorized, but sentences naturally vary in length. Therefore we pad them.
  • How to create a Keras embedding layer: keras.layers.Embedding(input_dim=vocab_len, output_dim=emb_dim) (see the sketch after this list)
    • step 1: convert each sentence in X (per mini-batch) into a list of word indices
    • step 2: pad each index list to the max length
    • step 3: feed the padded indices to the embedding layer; its embedding matrix E has shape (400001, 50), i.e. (vocab_len, emb_dim)
    • step 4: the layer outputs the corresponding embeddings, of shape (m, max_len, emb_dim)
  • If you have an NLP task where the training set is small, using word embeddings can help your algorithm significantly. Word embeddings allow your model to work on words in the test set that may not even have appeared in your training set.
  • Training sequence models in Keras (and in most other deep learning frameworks) requires a few important details:
    • To use mini-batches, the sequences need to be padded so that all the examples in a mini-batch have the same length.
    • An Embedding() layer can be initialized with pretrained values. These values can be either fixed or trained further on your dataset. If however your labeled dataset is small, it’s usually not worth trying to train a large pre-trained set of embeddings.
    • LSTM() has a flag called return_sequences to decide if you would like to return every hidden state or only the last one.
    • You can use Dropout() right after LSTM() to regularize your network.
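
To tie the list above together, here is a minimal sketch of the padding + pretrained-embedding workflow, reusing sentences_to_indices, pretrained_embedding_layer, word_to_index, word_to_vec_map and maxLen defined earlier; the three example sentences are made up for illustration.

import numpy as np

# Hypothetical mini-batch of raw sentences (all words assumed to be in the GloVe vocabulary)
X_batch = np.array(["funny lol", "lets play baseball", "food is ready for you"])

# step 1 + 2: convert each sentence to word indices and zero-pad to a fixed length
X_batch_indices = sentences_to_indices(X_batch, word_to_index, max_len=maxLen)
print(X_batch_indices.shape)   # (3, maxLen)

# step 3 + 4: the pretrained layer stores the (400001, 50) embedding matrix E; inside a
# model it maps an (m, maxLen) batch of indices to (m, maxLen, 50) embeddings
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
E = embedding_layer.get_weights()[0]
print(E.shape)                 # (400001, 50)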
