(1)Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000 dimensional, so as to capture the full range of variation and meaning in those words.
[A]True
[B]False

Answer: B
Explanation: Note the difference from a one-hot representation: the embedding dimension (typically a few hundred) is chosen to be much smaller than the vocabulary size.
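A minimal NumPy sketch of the difference (the 300-dimensional embedding size here is a hypothetical choice for illustration, not part of the question):

```python
import numpy as np

vocab_size = 10000   # words in the vocabulary
embed_dim = 300      # hypothetical embedding size: typically 50-1000, far smaller than vocab_size

# Embedding matrix: one dense row per word (random values stand in for learned ones).
E = np.random.randn(vocab_size, embed_dim) * 0.01

# A one-hot representation of word 1234 is 10000-dimensional and almost all zeros...
one_hot = np.zeros(vocab_size)
one_hot[1234] = 1.0

# ...but its embedding is only 300-dimensional.
embedding = E[1234]
print(one_hot.shape, embedding.shape)   # (10000,) (300,)
```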

(2)What is t-SNE?
[A]A linear transformation that allows us to solve analogies on word vectors.
[B]A non-linear dimensionality reduction technique.
[C]A supervised learning algorithm for learning word embeddings.
[D]An open-source sequence modeling library.

Answer: B
Explanation: t-SNE is a non-linear dimensionality reduction algorithm, often used to visualize high-dimensional word embeddings in 2D or 3D.
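For example, a sketch using scikit-learn's TSNE (assuming scikit-learn is available; the embedding matrix here is random just to show the shapes involved):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in embedding matrix: 10000 words x 300 dimensions (random, for illustration only).
E = np.random.randn(10000, 300)

# t-SNE non-linearly maps the 300-D vectors down to 2-D so they can be plotted.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
E_2d = tsne.fit_transform(E[:500])   # a 500-word subset keeps this quick

print(E_2d.shape)   # (500, 2)
```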

(3)Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.

x (input text)                  y (happy?)
I’m feeling wonderful today!    1
I’m bummed my cat is ill        0
Really enjoying this!           1

Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label y=1.
[A]True
[B]False

Answer: A
Explanation: Words with positive sentiment (e.g. "ecstatic", "wonderful") have similar embedding vectors in the pre-trained embedding, so the model can generalize to words that never appear in the small labeled training set.

(4)Which of these equations do you think should hold for a good word embedding? (Check all that apply)
[A]$e_{boy} - e_{girl} \approx e_{brother} - e_{sister}$
[B]$e_{boy} - e_{girl} \approx e_{sister} - e_{brother}$
[C]$e_{boy} - e_{brother} \approx e_{girl} - e_{sister}$
[D]$e_{boy} - e_{brother} \approx e_{sister} - e_{girl}$

Answer: A, C
Explanation: $e_{boy}-e_{girl}$ and $e_{brother}-e_{sister}$ both isolate the same gender direction (A), and $e_{boy}-e_{brother}$ and $e_{girl}-e_{sister}$ both isolate the same direction as well (C); options B and D flip the sign on one side.
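A small self-contained sketch of why A and C hold while B does not, using toy vectors built so that a single "gender" offset separates boy/girl and brother/sister (synthetic vectors, not real embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors: one shared "gender" offset separates boy/girl and brother/sister.
rng = np.random.default_rng(0)
gender = rng.normal(size=50)
e = {}
e["girl"] = rng.normal(size=50)
e["boy"] = e["girl"] + gender
e["sister"] = rng.normal(size=50)
e["brother"] = e["sister"] + gender

# Option A: boy - girl ≈ brother - sister  -> cosine similarity ≈ 1
print(cosine_similarity(e["boy"] - e["girl"], e["brother"] - e["sister"]))
# Option C: boy - brother ≈ girl - sister  -> cosine similarity ≈ 1
print(cosine_similarity(e["boy"] - e["brother"], e["girl"] - e["sister"]))
# Option B: boy - girl ≈ sister - brother  -> cosine similarity ≈ -1, so it should NOT hold
print(cosine_similarity(e["boy"] - e["girl"], e["sister"] - e["brother"]))
```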

(5)Let $E$ be an embedding matrix, and let $o_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call $E^T * o_{1234}$ in Python?
[A]It is computationally wasteful.
[B]The correct formula is $E^T * e_{1234}$.
[C]This doesn't handle unknown words (<UNK>).
[D]None of the above: Calling the Python snippet as described above is fine.

Answer: A
Explanation: The one-hot vector is high-dimensional and almost entirely zeros, so multiplying $E^T$ by $o_{1234}$ does a lot of wasted work; in practice we simply index into $E$ to look up the word's embedding.
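A sketch of the point, assuming each row of $E$ is one word's embedding: the matrix-vector product and a plain row lookup give the same vector, but the lookup does far less work:

```python
import numpy as np

vocab_size, embed_dim = 10000, 500
E = np.random.randn(vocab_size, embed_dim)   # assume each row of E is one word's embedding

o_1234 = np.zeros(vocab_size)
o_1234[1234] = 1.0

# Mathematically fine but wasteful: a full matrix-vector product against a vector
# that is almost entirely zeros (~ vocab_size * embed_dim multiplications).
emb_slow = E.T @ o_1234

# Equivalent and cheap: just read row 1234 of E (~ embed_dim memory reads).
emb_fast = E[1234]

print(np.allclose(emb_slow, emb_fast))   # True
```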

(6)When learning word embeddings, we create an artificial task of estimating $P(target \mid context)$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.
[A]True
[B]False

Answer: A
Explanation: The prediction task is only a pretext (self-supervised) task built from raw text; its purpose is not to be solved well, but to force the model to learn useful embeddings as a by-product.

(7)In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.
[A]$c$ is the one word that comes immediately before $t$.
[B]$c$ is the sequence of all the words in the sentence before $t$.
[C]$c$ is a sequence of several words immediately before $t$.
[D]$c$ and $t$ are chosen to be nearby words.

Answer: D
Explanation: In the skip-gram version of word2vec, the target word $t$ is sampled from a window (for example $\pm 5$ to $\pm 10$ words) around the context word $c$, so $c$ and $t$ are simply nearby words.
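A rough sketch of this sampling (a simplification of word2vec: no subsampling of frequent words and no negative sampling):

```python
import random

def sample_context_target_pairs(tokens, window=5):
    """For each context word c, sample one target t from within +/- `window`
    positions of c (simplified skip-gram sampling)."""
    pairs = []
    for i, c in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        candidates = [j for j in range(lo, hi) if j != i]
        pairs.append((c, tokens[random.choice(candidates)]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
print(sample_context_target_pairs(sentence, window=2))
```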

(8)Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:
$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$
Which of these statements are correct? Check all that apply.
[A]$\theta_t$ and $e_c$ are both 500 dimensional vectors.
[B]$\theta_t$ and $e_c$ are both 10000 dimensional vectors.
[C]$\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.
[D]After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.

Answer: A, C
Explanation: The embedding size is 500, so both $\theta_t$ and $e_c$ are 500-dimensional vectors, and both are learned with an optimizer such as Adam or gradient descent.
Option D is somewhat debatable; see, for example:
Why does word2vec use 2 representations for each word?
Word2Vec哪个矩阵是词向量? (Which Word2Vec matrix is the word vectors?)
word2Vec的CBOW,SKIP-gram为什么有2组词向量? (Why do CBOW and skip-gram have two sets of word vectors?)
In my view, both the $\theta$ vectors and the $e$ vectors can serve as word vectors; they just represent the words in different ways and capture different features, so their numerical values differ. "Different representations" is like describing the same circle by its radius being 1 or by its area being $\pi$, or like writing the same vector in a different basis. "Different features" means the two vectors may capture different aspects of the same word: for "juice", for example, one might capture that it is a liquid while the other captures that it is made from fruit.
If anything here is wrong, corrections are welcome.
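A direct NumPy implementation of this softmax, mainly to make the shapes in options A and C concrete (parameters are initialized randomly here, and a real implementation would avoid the full 10000-way softmax, e.g. via negative sampling):

```python
import numpy as np

vocab_size, embed_dim = 10000, 500
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.01, size=(vocab_size, embed_dim))   # one theta_t per target word
E = rng.normal(scale=0.01, size=(vocab_size, embed_dim))       # one e_c per context word

def p_target_given_context(t, c):
    """Softmax P(t|c) = exp(theta_t . e_c) / sum_{t'} exp(theta_{t'} . e_c)."""
    e_c = E[c]                      # 500-dimensional, as in option A
    logits = theta @ e_c            # one score per vocabulary word
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return probs[t]

print(p_target_given_context(t=42, c=1234))   # roughly 1/10000 for random parameters
```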

(9)Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:
$$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f\left( X_{ij} \right) \left( \theta_i^T e_j + b_i + b_j' - \log X_{ij} \right)^2$$
Which of these statements are correct? Check all that apply.
[A]$\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
[B]$\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
[C]$X_{ij}$ is the number of times word i appears in the context of word j.
[D]The weighting function $f(\cdot)$ must satisfy $f(0)=0$.

Answer: B, C, D
Explanation: The parameters are initialized randomly rather than to zero, $X_{ij}$ counts co-occurrences of words $i$ and $j$, and $f(0)=0$ is required so that the $\log X_{ij}$ term never has to be evaluated for pairs that never co-occur.
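A sketch of the objective in NumPy (the weighting function uses the common $x_{max}=100$, $\alpha=0.75$ choice from the GloVe paper; the tiny co-occurrence matrix is synthetic):

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function; f(0) = 0, so log(X_ij) is never needed for
    word pairs that never co-occur (option D)."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0) * (x > 0)

def glove_loss(theta, e, b, b_prime, X):
    """Sum over i, j with X_ij > 0 of f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):
        inner = theta[i] @ e[j] + b[i] + b_prime[j] - np.log(X[i, j])
        loss += glove_weight(X[i, j]) * inner ** 2
    return loss

# Tiny synthetic example; parameters are initialized randomly, as in option B.
rng = np.random.default_rng(0)
V, d = 20, 5
X = rng.integers(0, 4, size=(V, V)).astype(float)   # toy co-occurrence counts
theta, e = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_prime = rng.normal(size=V), rng.normal(size=V)
print(glove_loss(theta, e, b, b_prime, X))
```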

(10)You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?
[A]$m_1 \gg m_2$
[B]$m_1 \ll m_2$

Answer: A
Explanation: As with other transfer learning, pre-trained embeddings help most when the source corpus ($m_1$ words) is much larger than the labeled dataset for the target task ($m_2$ words).
