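Lecture 8 of CS230 is mostly Andrew Ng talking about how to read research papers, plus some career advice. All of it is fairly practical guidance.

On reading papers, Andrew Ng recommends starting with a reading list for the field, which may contain only a handful of papers. Read one of them closely, ideally a seminal one, then follow its references to other relevant papers, add those to the list, and keep reading. Newly read papers often change how you understand the ones you started with, so keep going back and forth over the list. Roughly speaking, after 5-20 papers you will probably have a basic grasp of the field and can build applications; after 50-100 papers you will probably understand it deeply enough to do research.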
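How to read a paper: take multiple passes
1. Read the title, abstract, and figures
2. Read the introduction, conclusion, and figures, and skim the rest (related work can be skipped)
3. Read the paper but skip the math derivations
4. Read the whole paper, skipping the parts that do not matter
After a quick read of a paper, you should be able to answer:
1. What were the authors trying to do?
2. What are the key elements of the approach?
3. What can you use yourself?
4. Which references do you want to follow up on?
If you want to understand the math behind an algorithm in depth: read the whole paper until you understand it, then set the results aside and re-derive them from scratch. Practising this way gives you a better feel for deriving algorithms in your own future research.
If you want to understand the code in depth:
1. Download the open-source code and run it
2. Re-implement the algorithm yourself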
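On careers in machine learning, Andrew Ng's advice is to become a T-shaped person: know something about many areas of AI, go deep in one of them, and work on substantial projects that you can show to others.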

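This week's Coursera material is mainly about word embeddings. I had planned to finish C5M3 this week as well, but since next week has no programming assignment, I moved the two C5M3 programming assignments to next week.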

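Review of key points

A word embedding is a way of turning words into vectors. Unlike one-hot encoding, it uses a featurized representation, so each word becomes a vector of a fixed dimension (the dimension can be chosen freely, whereas the length of a one-hot vector is tied to the vocabulary size).

The dimensions of an embedding describe properties of the word, and the number of dimensions is the size of the space used to describe it. With 300 dimensions, for example, a vocabulary of some ten thousand words is embedded into a 300-dimensional space (the programming assignment below uses 50-dimensional GloVe vectors). One-hot vectors live in a very high-dimensional space; the embedding maps them into a comparatively low-dimensional one. Each word gets its own distinct representation, its embedding vector, and these vectors together form the embedding matrix.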
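As a small illustration of the embedding matrix described above, the following sketch builds a toy matrix with NumPy and shows that multiplying it by a one-hot vector simply selects one column. The five-word vocabulary, the 4-dimensional size, and the random values are made up for the example; they are not from the course.

```python
import numpy as np

# Toy vocabulary and embedding size (hypothetical, chosen for illustration).
vocab = ["man", "woman", "king", "queen", "apple"]
word_to_index = {w: i for i, w in enumerate(vocab)}
emb_dim = 4

rng = np.random.default_rng(0)
# Embedding matrix E with one column per word, shape (emb_dim, vocab_size).
E = rng.standard_normal((emb_dim, len(vocab)))


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at position `index`."""
    o = np.zeros(size)
    o[index] = 1.0
    return o


def embed(word):
    """Look up the embedding of `word` by slicing out its column of E."""
    return E[:, word_to_index[word]]


o_king = one_hot(word_to_index["king"], len(vocab))
# The matrix product with a one-hot vector and the direct column lookup agree.
print(np.allclose(E @ o_king, embed("king")))
```

In practice the column lookup is what frameworks do, since the explicit matrix product wastes work on all the zero entries.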

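Similarity measure

Two words are usually related in some way, for example as antonyms or synonyms, and that relationship shows up in their embedding vectors. An analogy task has the form "A is to B as C is to D", where A, B, C and D are words, D is unknown, and D has to be inferred from the relationship between A and B. To measure such relationships between vectors we can use cosine similarity: $\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\|_2 \|v\|_2} = \cos(\theta)$ where $\theta$ is the angle between the vectors $u$ and $v$. Analogy reasoning assumes that the relationship between A and B resembles the one between C and D, so the difference of embeddings $e_A - e_B$ should be roughly parallel to $e_C - e_D$: the angle between them is close to 0 degrees and the cosine similarity is close to 1. To complete an analogy, search the whole vocabulary for the word D whose similarity is closest to 1.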
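The analogy search can be tried out on a handful of hand-made vectors. This is only a sketch: the 2-dimensional "embeddings" below are invented so that the first coordinate loosely plays the role of gender and the second of royalty, and `complete_analogy_toy` is a hypothetical helper that mirrors the search described above.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Invented 2-d vectors: [gender-like axis, royalty-like axis].
toy_vectors = {
    "man":   np.array([-1.0, 0.0]),
    "woman": np.array([1.0, 0.0]),
    "king":  np.array([-1.0, 1.0]),
    "queen": np.array([1.0, 1.0]),
    "apple": np.array([0.0, 0.1]),
}


def complete_analogy_toy(a, b, c, vectors):
    """Find the word d for which (v_b - v_a) is most parallel to (v_d - v_c)."""
    target = vectors[b] - vectors[a]
    best_word, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (a, b, c):
            continue  # the answer must be a different word
        sim = cosine_similarity(target, v - vectors[c])
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word, best_sim


word, sim = complete_analogy_toy("man", "woman", "king", toy_vectors)
print(word, round(sim, 3))  # prints: queen 1.0
```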

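Learning the embedding matrix

Suppose the embedding matrix has shape $300\times10000$. As with any other parameters, these 3,000,000 numbers are learned during training: the embedding matrix is initialized, the embedding vectors go through the forward pass to compute the current loss, backpropagation gives the gradient of every parameter, gradient descent updates the parameters at each iteration, and eventually a good embedding matrix is learned.

Deep learning frameworks usually package this learning process as a layer. In Keras, for example, the Embedding layer treats the embedding matrix as its trainable weights; you only pass the input dimension (the vocabulary size) and the embedding dimension, and training is handled efficiently.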

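Seen this way, the principle and the method are both simple. In practice, however, the output goes through a softmax whose denominator sums a term for every word in the vocabulary, so the cost grows linearly with the vocabulary size and becomes expensive for large vocabularies. The paper "Efficient Estimation of Word Representations in Vector Space" proposes two Word2Vec models for learning the embedding matrix.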
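To make the learning loop and the softmax cost concrete, here is a minimal NumPy sketch of one gradient step on a single embedding vector. Everything in it is an assumption for illustration: the toy sizes, the names `E` and `Theta`, the learning rate of 0.1, and the choice to update only the context word's column.

```python
import numpy as np

vocab_size, emb_dim = 10, 4  # toy sizes; the text uses 10000 and 300
rng = np.random.default_rng(0)
E = 0.1 * rng.standard_normal((emb_dim, vocab_size))      # embeddings, one column per word
Theta = 0.1 * rng.standard_normal((vocab_size, emb_dim))  # softmax weights, one row per word


def softmax_loss(E, Theta, c, t):
    """Cross-entropy loss of predicting target index t from context index c."""
    e_c = E[:, c]
    logits = Theta @ e_c           # one score per vocabulary word: O(vocab_size * emb_dim)
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # the denominator sums over the whole vocabulary
    return -np.log(probs[t]), probs


c, t = 3, 7  # arbitrary context and target indices
loss_before, probs = softmax_loss(E, Theta, c, t)

# Gradient of the loss with respect to e_c is Theta^T (probs - y), y being the one-hot target.
y = np.zeros(vocab_size)
y[t] = 1.0
grad_e_c = Theta.T @ (probs - y)

E[:, c] -= 0.1 * grad_e_c  # one gradient-descent step on the embedding of word c
loss_after, _ = softmax_loss(E, Theta, c, t)
print(loss_before, loss_after)
```

A full implementation would also update `Theta`; the point here is only that every step has to touch all `vocab_size` rows to form the denominator.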

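The Skip-Gram model sets up a supervised learning problem: given a context word $c$, predict a target word $t$ chosen at random within a window of a few words before or after it. Training minimizes the cross-entropy loss of the conditional probability $p(t \mid c)$, which yields a good embedding matrix. In effect the model predicts the surrounding sentence from a single word.
The CBOW model, the Continuous Bag-of-Words model, works the other way round: it predicts a word from its sentence, taking the context on both sides of a middle word and using those surrounding words to predict the middle word.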
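The difference between the two models is easiest to see in the training examples they generate. The sketch below uses only the standard library; the function names, the window size of 2, and the sample sentence are illustrative assumptions rather than anything taken from the paper's code.

```python
import random


def skipgram_pairs(tokens, window=2, seed=0):
    """For each context word, pick one random target within +/- `window` positions."""
    rng = random.Random(seed)
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens) - 1, i + window)
        candidates = [j for j in range(lo, hi + 1) if j != i]
        if not candidates:
            continue  # a one-word sentence has no neighbours
        pairs.append((context, tokens[rng.choice(candidates)]))
    return pairs


def cbow_examples(tokens, window=2):
    """For each middle word, collect the surrounding words used to predict it."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            examples.append((context, center))
    return examples


sentence = "i want a glass of orange juice".split()
print(skipgram_pairs(sentence))    # (context word, target word) pairs
print(cbow_examples(sentence)[3])  # (['want', 'a', 'of', 'orange'], 'glass')
```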

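For efficiency, one option is a hierarchical softmax classifier, which arranges the words in a tree, much like a decision tree, and reduces the cost to logarithmic in the vocabulary size. Another is negative sampling, which replaces the one large softmax with several binary logistic-regression problems to cut the training cost.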
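A rough sketch of the negative-sampling loss for one training example follows. It assumes one positive (context, target) pair plus `k` negatives drawn uniformly at random; the uniform sampling, the toy sizes, and all variable names are simplifications chosen for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def negative_sampling_loss(e_c, theta_pos, theta_negs):
    """Sum of k+1 binary logistic losses: one positive pair and k negative pairs."""
    pos = -np.log(sigmoid(theta_pos @ e_c))                    # true target should score high
    neg = -np.sum(np.log(sigmoid(-(theta_negs @ e_c))))        # sampled words should score low
    return pos + neg


rng = np.random.default_rng(0)
vocab_size, emb_dim, k = 1000, 8, 5
E = 0.1 * rng.standard_normal((vocab_size, emb_dim))      # context embeddings, one row per word here
Theta = 0.1 * rng.standard_normal((vocab_size, emb_dim))  # output vectors, one row per word

c, t = 12, 34  # arbitrary context and target indices
# Draw k negative word indices uniformly, excluding the true target.
candidates = np.delete(np.arange(vocab_size), t)
neg_ids = rng.choice(candidates, size=k, replace=False)

loss = negative_sampling_loss(E[c], Theta[t], Theta[neg_ids])
print(loss)
```

Each step now touches only `k + 1` output vectors instead of all `vocab_size` of them, which is where the saving comes from.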

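The GloVe algorithm (Global Vectors for Word Representation) only needs to minimize the objective $\sum_i\sum_j f(X_{ij})(\theta_i^Te_j+b_i+b_j'-\log X_{ij})^2$, where $X_{ij}$ counts how often words $i$ and $j$ appear together and $f$ is a weighting function. The embeddings learned this way predict well how often two words $i, j$ co-occur.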
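The objective can be evaluated directly with NumPy, as in the sketch below. The weighting function uses the form $f(x) = (x/x_{max})^{\alpha}$ capped at 1, with `x_max=100` and `alpha=0.75` as defaults; these values and all names here are assumptions for the example and do not come from the notes above. Entries with $X_{ij}=0$ get weight 0, so their $\log$ term is skipped.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f: grows as (x / x_max)^alpha and is capped at 1; f(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)


def glove_loss(X, Theta, E, b, b_prime):
    """Weighted squared error between theta_i . e_j + b_i + b'_j and log X_ij."""
    X = np.asarray(X, dtype=float)
    mask = X > 0
    log_X = np.zeros_like(X)
    log_X[mask] = np.log(X[mask])  # leave zero-count entries at 0; they are masked out below
    pred = Theta @ E.T + b[:, None] + b_prime[None, :]
    return float(np.sum(glove_weight(X) * mask * (pred - log_X) ** 2))


# Toy co-occurrence counts for a 3-word vocabulary (made-up numbers).
X = np.array([[0.0, 4.0, 1.0],
              [4.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
rng = np.random.default_rng(0)
Theta = 0.1 * rng.standard_normal((3, 2))  # one row theta_i per word
E = 0.1 * rng.standard_normal((3, 2))      # one row e_j per word
b = np.zeros(3)
b_prime = np.zeros(3)
print(glove_loss(X, Theta, E, b, b_prime))
```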

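Debiasing word embeddings

Embedding vectors should not carry unintended bias such as gender or racial discrimination, so the vectors need some post-processing. This generally takes three steps:

Identify the bias direction. For example, take pairs of gendered words such as man/woman or boy/girl and average the differences within the pairs to get a gender bias vector; the remaining directions can be regarded as unbiased.
Neutralize. For words that are not inherently gendered, subtract from the embedding its projection onto the bias vector from the first step, leaving a result with no component along the bias direction. Subtracting the projection onto the gender direction, for example, removes gender bias.
Equalize. For words that should differ only along the bias direction, such as actress and actor, make them agree in every direction other than the bias direction and differ only along the gender direction, so that both words of the pair are equidistant from the bias-free axis.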
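Written out as formulas, the neutralize and equalize steps look as follows. This is a transcription of what the `neutralize` and `equalize` functions in the assignment code below compute, with $g$ the bias direction, $e$ the embedding of a neutral word, and $e_{w1}, e_{w2}$ the embeddings of a gendered pair; the superscript and subscript names are chosen here for readability.

```latex
\begin{align*}
% Neutralize: remove the component of e along g
e^{\text{bias}} &= \frac{e \cdot g}{\|g\|_2^2}\, g \\
e^{\text{debiased}} &= e - e^{\text{bias}} \\[6pt]
% Equalize: make the pair symmetric around the bias-free axis
\mu &= \frac{e_{w1} + e_{w2}}{2} \\
\mu_B &= \frac{\mu \cdot g}{\|g\|_2^2}\, g, \qquad \mu_\perp = \mu - \mu_B \\
e_{w1B} &= \frac{e_{w1} \cdot g}{\|g\|_2^2}\, g, \qquad
e_{w2B} = \frac{e_{w2} \cdot g}{\|g\|_2^2}\, g \\
e_{w1B}^{\text{corrected}} &= \sqrt{\bigl|1 - \|\mu_\perp\|_2^2\bigr|}\;
  \frac{e_{w1B} - \mu_B}{\|(e_{w1} - \mu_\perp) - \mu_B\|_2} \\
e_{w2B}^{\text{corrected}} &= \sqrt{\bigl|1 - \|\mu_\perp\|_2^2\bigr|}\;
  \frac{e_{w2B} - \mu_B}{\|(e_{w2} - \mu_\perp) - \mu_B\|_2} \\
e_1 &= e_{w1B}^{\text{corrected}} + \mu_\perp, \qquad
e_2 = e_{w2B}^{\text{corrected}} + \mu_\perp
\end{align*}
```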

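Sentiment analysis of text

Put simply, you can average the embedding vectors of all the words in a sentence and feed the result to a softmax to get class probabilities for the sentiment. This ignores context, that is, word order, so sequence models such as RNNs and LSTMs have the advantage here.

The many-to-one LSTM architecture covered earlier applies in this case. In Keras, the LSTM argument return_sequences controls whether the layer returns all hidden states or only the last one.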
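The effect of such a flag can be imitated in a few lines of NumPy. Note that the sketch below is a plain tanh RNN, not an LSTM and not the Keras implementation; `simple_rnn` and its weight names are hypothetical and only show the difference in output shape between the two settings.

```python
import numpy as np


def simple_rnn(x, Wx, Wh, b, return_sequences=False):
    """Run a tanh RNN over x of shape (T, n_x).

    Returns all hidden states, shape (T, n_h), if return_sequences is True,
    otherwise only the last hidden state, shape (n_h,).
    """
    h = np.zeros(Wh.shape[0])
    states = []
    for t in range(x.shape[0]):
        h = np.tanh(Wx @ x[t] + Wh @ h + b)
        states.append(h)
    states = np.stack(states)
    return states if return_sequences else states[-1]


rng = np.random.default_rng(0)
T, n_x, n_h = 6, 50, 8  # six words, 50-d embeddings, 8 hidden units (toy sizes)
x = rng.standard_normal((T, n_x))
Wx = 0.1 * rng.standard_normal((n_h, n_x))
Wh = 0.1 * rng.standard_normal((n_h, n_h))
b = np.zeros(n_h)

all_states = simple_rnn(x, Wx, Wh, b, return_sequences=True)
last_state = simple_rnn(x, Wx, Wh, b, return_sequences=False)
print(all_states.shape, last_state.shape)  # (6, 8) (8,)
```

For a many-to-one classifier only the last state is passed on to the output layer, while a stacked recurrent layer needs the full sequence from the layer beneath it.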

Assignment code

1. Operations on Word Vectors - Debiasing

import numpy as np
from w2v_utils import *

# Load the word vectors: 50-dimensional GloVe vectors, 400,000 words in total
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')
# word_to_vec_map['arom']
# len(word_to_vec_map['arom'])

# Cosine similarity
def cosine_similarity(u, v):
    """
    Cosine similarity reflects the degree of similarity between u and v

    Arguments:
        u -- a word vector of shape (n,)
        v -- a word vector of shape (n,)

    Returns:
        cosine_similarity -- the cosine similarity between u and v defined by the formula above.
    """
    cosine_similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return cosine_similarity

# Test
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
ball = word_to_vec_map["ball"]
crocodile = word_to_vec_map["crocodile"]
france = word_to_vec_map["france"]
italy = word_to_vec_map["italy"]
paris = word_to_vec_map["paris"]
rome = word_to_vec_map["rome"]

print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))
print("cosine_similarity(ball, crocodile) = ",cosine_similarity(ball, crocodile))
print("cosine_similarity(france - paris, rome - italy) = ",cosine_similarity(france - paris, rome - italy))

# Word analogy
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    """
    Performs the word analogy task as explained above: a is to b as c is to ____.

    Arguments:
        word_a -- a word, string
        word_b -- a word, string
        word_c -- a word, string
        word_to_vec_map -- dictionary that maps words to their corresponding vectors.

    Returns:
        best_word -- the word such that v_b - v_a is close to v_best_word - v_c, as measured by cosine similarity
    """
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    v_a, v_b, v_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]

    words = word_to_vec_map.keys()
    best_word = None
    best_similarity = -100

    for w in words:
        # The answer must differ from the three input words
        if w not in [word_a, word_b, word_c]:
            similarity = cosine_similarity(v_b - v_a, word_to_vec_map[w] - v_c)
            if similarity > best_similarity:
                best_word = w
                best_similarity = similarity

    return best_word

# Test
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print ('{} -> {} :: {} -> {}'.format(*triad, complete_analogy(*triad, word_to_vec_map)))

# Remove bias from the word vectors
g = word_to_vec_map['woman'] - word_to_vec_map['man']
print(g)

# Female names have positive similarity with g, male names negative
print ('List of names and their similarities with constructed vector:')
# Male and female first names
name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin']
for w in name_list:
    print (w, cosine_similarity(word_to_vec_map[w], g))

# Some similarities that reveal bias
print('Other words and their similarities:')
word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist', 'technology',  'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer']
for w in word_list:
    print (w, cosine_similarity(word_to_vec_map[w], g))

# Subtract the projection onto the gender direction to remove gender bias
def neutralize(word, g, word_to_vec_map):
    """
    Removes the bias of "word" by projecting it on the space orthogonal to the bias axis.
    This function ensures that gender neutral words are zero in the gender subspace.

    Arguments:
        word -- string indicating the word to debias
        g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender)
        word_to_vec_map -- dictionary mapping words to their corresponding vectors.

    Returns:
        e_debiased -- neutralized word vector representation of the input "word"
    """
    e = word_to_vec_map[word]
    # Projection of e onto the bias axis g
    e_biascomponent = np.dot(e, g) / np.linalg.norm(g)**2 * g
    e_debiased = e - e_biascomponent
    return e_debiased

# Test
e = "receptionist"
print("cosine similarity between " + e + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map["receptionist"], g))

e_debiased = neutralize("receptionist", g, word_to_vec_map)
print("cosine similarity between " + e + " and g, after neutralizing: ", cosine_similarity(e_debiased, g))

# Equalization for gender-specific words
def equalize(pair, bias_axis, word_to_vec_map):
    """
    Debias gender specific words by following the equalize method described in the figure above.

    Arguments:
        pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor")
        bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender
        word_to_vec_map -- dictionary mapping words to their corresponding vectors

    Returns
        e_1 -- word vector corresponding to the first word
        e_2 -- word vector corresponding to the second word
    """
    w1, w2 = pair
    e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2]

    # Compute the equalized word vectors following the formulas
    mu = (e_w1 + e_w2) / 2
    mu_B = np.dot(mu, bias_axis) / np.linalg.norm(bias_axis)**2 * bias_axis
    mu_orth = mu - mu_B

    e_w1B = np.dot(e_w1, bias_axis) / np.linalg.norm(bias_axis)**2 * bias_axis
    e_w2B = np.dot(e_w2, bias_axis) / np.linalg.norm(bias_axis)**2 * bias_axis

    corrected_e_w1B = np.sqrt(abs(1 - np.linalg.norm(mu_orth)**2)) * (e_w1B - mu_B) / np.linalg.norm(e_w1 - mu_orth - mu_B)
    corrected_e_w2B = np.sqrt(abs(1 - np.linalg.norm(mu_orth)**2)) * (e_w2B - mu_B) / np.linalg.norm(e_w2 - mu_orth - mu_B)

    e1 = corrected_e_w1B + mu_orth
    e2 = corrected_e_w2B + mu_orth

    return e1, e2

# Test
print("cosine similarities before equalizing:")
print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], g))
print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], g))
print()
e1, e2 = equalize(("man", "woman"), g, word_to_vec_map)
print("cosine similarities after equalizing:")
print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, g))
print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, g))

2. Emojify

import numpy as np
import pandas as pd  # used by pd.crosstab further down
from emo_utils import *
import emoji
import matplotlib.pyplot as plt

# Load the dataset
X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/test.csv')

maxLen = len(max(X_train, key=len).split())
# Print one training example
index = 5
print(X_train[index], label_to_emoji(Y_train[index]))

'''====== Emojifier-V1 ======='''
# Convert the labels to one-hot vectors
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
# Inspect one example
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])

# Use 50-dimensional GloVe embeddings to turn sentences into word vectors
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')
# Inspect the mappings
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])

# Convert a sentence to word vectors and average them
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.

    Arguments:
        sentence -- string, one training example from X
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation

    Returns:
        avg -- average vector encoding information about the sentence, numpy-array of shape (50,)
    """
    words = sentence.lower().split()
    avg = np.zeros((50,))
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    return avg

# Test
avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)

# Build the model
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.

    Arguments:
        X -- input data, numpy array of sentences as strings, of shape (m, 1)
        Y -- labels, numpy array of integers between 0 and 4, numpy-array of shape (m, 1)
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
        learning_rate -- learning_rate for the stochastic gradient descent algorithm
        num_iterations -- number of iterations

    Returns:
        pred -- vector of predictions, numpy-array of shape (m, 1)
        W -- weight matrix of the softmax layer, of shape (n_y, n_h)
        b -- bias of the softmax layer, of shape (n_y,)
    """
    n_y = 5
    n_h = 50
    m = X.shape[0]

    # Initialize the parameters
    np.random.seed(1)
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))

    # Convert Y to one-hot encoding
    Y_oh = convert_to_one_hot(Y, C=n_y)

    # Iterate
    for t in range(num_iterations):
        for i in range(m):
            # Forward pass for one example
            avg_i = sentence_to_avg(X[i], word_to_vec_map)
            z_i = np.dot(W, avg_i) + b
            a_i = softmax(z_i)
            cost_i = - np.sum(Y_oh[i] * np.log(a_i))

            # Compute the gradients
            dz_i = a_i - Y_oh[i]
            dW = np.dot(dz_i.reshape(n_y, 1), avg_i.reshape(1, n_h))
            db = dz_i

            # Update the parameters
            W -= learning_rate * dW
            b -= learning_rate * db

        # Print progress
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost_i))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b

# Run on the training set
pred, W, b = model(X_train, Y_train, word_to_vec_map)
# print(pred)

# Compare performance on the training set and the test set
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

# Check sentences that do not appear in the training set
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "you are not happy"])
Y_my_labels = np.array([[0], [0], [2], [1], [4], [3]])

pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)

# Print the confusion matrix on the test set to see where the model goes wrong
print(Y_test.shape)
print('           '+ label_to_emoji(0)+ '    ' + label_to_emoji(1) + '    ' +  label_to_emoji(2)+ '    ' + label_to_emoji(3)+'   ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)

'''====== Emojifier-V2 ======='''
# Use an LSTM
import numpy as np
np.random.seed(0)
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, Dropout, LSTM, Activation
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
np.random.seed(1)
from tensorflow.keras.initializers import glorot_uniform

# Convert sentences to zero-padded lists of indices so they can be fed to the embedding layer
# Padding is needed so that all sequences in a mini-batch have the same length
def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4).

    Arguments:
        X -- array of sentences (strings), of shape (m, 1)
        word_to_index -- a dictionary containing each word mapped to its index
        max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this.

    Returns:
        X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    m = X.shape[0]
    # Initialize X_indices with zeros (zero doubles as the padding index)
    X_indices = np.zeros((m, max_len))
    for i in range(m):
        words = X[i].lower().split()
        for j in range(min(len(words), max_len)):
            X_indices[i, j] = word_to_index[words[j]]
    return X_indices

# Test
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1, word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)

# Build the pretrained embedding layer
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.

    Arguments:
        word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
        embedding_layer -- pretrained layer Keras instance
    """
    vocab_len = len(word_to_index) + 1
    emb_dim = 50

    # Initialize the embedding matrix
    emb_matrix = np.zeros((vocab_len, emb_dim))

    # Copy the vectors from the word_to_vec_map dictionary into emb_matrix
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Create the embedding layer and make it non-trainable
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)
    embedding_layer.build((None,))

    # Set the layer weights to emb_matrix
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer

# Test
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])

# Define the model
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.

    Arguments:
        input_shape -- shape of the input, usually (max_len,)
        word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
        word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
        model -- a model instance in Keras
    """
    # Input layer
    sentence_indices = Input(input_shape, dtype='int32')

    # Get the embedding layer
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Pass the indices through the embedding layer
    embeddings = embedding_layer(sentence_indices)

    # LSTM with a 128-dimensional hidden state that returns the whole sequence:
    # return_sequences=True returns every hidden state, False returns only the last one
    X = LSTM(128, return_sequences=True)(embeddings)
    # Dropout
    X = Dropout(0.5)(X)
    # LSTM with a 128-dimensional hidden state that returns a single hidden state
    X = LSTM(128, return_sequences=False)(X)
    # Dropout
    X = Dropout(0.5)(X)
    # Fully connected layer followed by the softmax activation
    X = Dense(5)(X)
    Outputs = Activation('softmax')(X)

    # Create the model
    model = Model(inputs=sentence_indices, outputs=Outputs)

    return model

# Build the model
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()

# Compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])

# Train
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
model.fit(X_train_indices, Y_oh_train, epochs=50, batch_size=32, shuffle=True)

# Evaluate on the test set
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
loss, acc = model.evaluate(X_test_indices, Y_oh_test)
print("Test accuracy = ", acc)
# 0.8214286

# Show the misclassified examples from the test set
C = 5
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())

# Try a sentence of your own and see which emoji is predicted
x_test = np.array(["I like delicious food"])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))

