(1)Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000-dimensional, so as to capture the full range of variation and meaning in those words.


(2)What is t-SNE?
[A]A linear transformation that allows us to solve analogies on word vectors.
[B]A non-linear dimensionality reduction technique.
[C]A supervised learning algorithm for learning word embeddings.
[D]An open-source sequence modeling library.


(3)Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.

x (input text)                  y (happy?)
I’m feeling wonderful today!    1
I’m bummed my cat is ill        0
Really enjoying this!           1

Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label y=1.


(4)Which of these equations do you think should hold for a good word embedding?(Check all that apply)
[A]$e_{boy}-e_{girl} \approx e_{brother}-e_{sister}$
[B]$e_{boy}-e_{girl} \approx e_{sister}-e_{brother}$
[C]$e_{boy}-e_{brother} \approx e_{girl}-e_{sister}$
[D]$e_{boy}-e_{brother} \approx e_{sister}-e_{girl}$
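
To make the vector arithmetic in these options concrete, here is a minimal sketch using hand-picked 2-D vectors. The values in `emb` and the helper names `sub` and `cosine` are assumptions made for illustration; real embeddings are learned and only approximately follow this pattern.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not learned):
# axis 0 roughly encodes gender, axis 1 roughly encodes "sibling".
emb = {
    "boy":     [1.0, 0.0],
    "girl":    [-1.0, 0.0],
    "brother": [1.0, 1.0],
    "sister":  [-1.0, 1.0],
}

def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

gender_1 = sub(emb["boy"], emb["girl"])          # boy - girl
gender_2 = sub(emb["brother"], emb["sister"])    # brother - sister
flipped = sub(emb["sister"], emb["brother"])     # sister - brother

same_direction = cosine(gender_1, gender_2)      # differences point the same way
opposite_direction = cosine(gender_1, flipped)   # differences point opposite ways
sibling_match = cosine(sub(emb["boy"], emb["brother"]),
                       sub(emb["girl"], emb["sister"]))
```

In this toy setup, a difference vector matches another one only when both sides subtract along the same relationship in the same order; swapping the order on one side flips the sign.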


(5)Let $E$ be an embedding matrix, and let $o_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don’t we call $E^T * o_{1234}$ in Python?
[A]It is computationally wasteful.
[B]The correct formula is $E^T * e_{1234}$
[C]This doesn’t handle unknown words (<UNK>)
[D]None of the above: Calling the Python snippet as described above is fine.

Explanation: the one-hot vector is high-dimensional and almost all of its entries are 0, so multiplying $E$ by $o_{1234}$ is very inefficient.
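
The inefficiency can be seen in a small sketch. It assumes a hypothetical 5-word vocabulary with 3-dimensional embeddings, and it assumes $E$ is stored with one row per word, so that $E^T o$ picks out a row. Both routes return the same vector, but the product touches every entry of $E$ while the lookup reads a single row.

```python
# Hypothetical tiny setup: 5 words, 3-dimensional embeddings.
# Assumption: E has one row per word, so E[i] is the embedding of word i.
E = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]
word_index = 3

# One-hot route: build o, then compute E^T o as a full matrix-vector product.
one_hot = [1.0 if i == word_index else 0.0 for i in range(len(E))]
via_matmul = [
    sum(E[i][d] * one_hot[i] for i in range(len(E)))
    for d in range(len(E[0]))
]

# Lookup route: read the row directly, with no multiplications at all.
via_lookup = E[word_index]
```

With a 10000-word vocabulary the product would perform 10000 multiplications per embedding dimension, nearly all of them by zero, which is why frameworks typically provide a dedicated embedding lookup instead.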

(6)When learning word embeddings, we create an artificial task of estimating $P(target \mid context)$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.


(7)In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.
[A]$c$ is the one word that comes immediately before $t$.
[B]$c$ is the sequence of all the words in the sentence before $t$.
[C]$c$ is a sequence of several words immediately before $t$.
[D]$c$ and $t$ are chosen to be nearby words.
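
As a rough illustration of how training pairs can be drawn in the skip-gram variant of word2vec, the sketch below enumerates every (context, target) pair inside a fixed window. The function name `nearby_pairs`, the window size, and the example sentence are assumptions made for this sketch; practical implementations usually sample pairs at random rather than listing all of them.

```python
def nearby_pairs(tokens, window=2):
    """Return every (context, target) pair whose positions differ by at most `window`."""
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not paired with itself
                pairs.append((context, tokens[j]))
    return pairs

sentence = "I want a glass of orange juice".split()
pairs = nearby_pairs(sentence, window=1)
```

Note that the target may come before or after the context word; only the distance between them is constrained.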


(8)Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:
$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$
Which of these statements are correct? Check all that apply.
[A]$\theta_t$ and $e_c$ are both 500 dimensional vectors.
[B]$\theta_t$ and $e_c$ are both 10000 dimensional vectors.
[C]$\theta_t$ and $e_c$ are both trained with an optimization algorithm such as Adam or gradient descent.
[D]After training, we should expect $\theta_t$ to be very close to $e_c$ when $t$ and $c$ are the same word.
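
The softmax above can be sketched in a few lines. The sizes here are toy values assumed for illustration (a 4-word vocabulary and 3-dimensional vectors instead of 10000 and 500), and the numbers in `theta` and `e_c` are made up rather than trained. The function name `softmax_over_vocab` is likewise hypothetical.

```python
import math

def softmax_over_vocab(theta, e_c):
    """Return P(t | c) for every target t, given output vectors theta[t] and context embedding e_c."""
    # The dot product only makes sense if each theta[t] has the same length as e_c.
    scores = [sum(a * b for a, b in zip(theta_t, e_c)) for theta_t in theta]
    shift = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)    # the denominator sums over the whole vocabulary
    return [x / total for x in exps]

# Made-up parameters: one theta vector per vocabulary word, one context embedding.
theta = [
    [0.2, -0.1, 0.4],
    [0.0, 0.3, -0.2],
    [0.5, 0.5, 0.1],
    [-0.3, 0.2, 0.0],
]
e_c = [0.1, 0.7, -0.4]

probs = softmax_over_vocab(theta, e_c)
```

The denominator requires one dot product per vocabulary word, which is the cost that techniques such as hierarchical softmax and negative sampling are designed to reduce.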

Why does word2vec use 2 representations for each word?

(9)Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:
$$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j' - \log X_{ij} \right)^2$$
Which of these statements are correct? Check all that apply.
[A]$\theta_i$ and $e_j$ should be initialized to 0 at the beginning of training.
[B]$\theta_i$ and $e_j$ should be initialized randomly at the beginning of training.
[C]$X_{ij}$ is the number of times word i appears in the context of word j.
[D]The weighting function $f(.)$ must satisfy $f(0)=0$.
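
A minimal sketch of evaluating this objective on a toy co-occurrence matrix follows. The function names `weight` and `glove_loss` and all numeric values are assumptions for illustration. The particular form of `weight` (a power of $x / x_{max}$ capped at 1, with $x_{max} = 100$ and exponent 0.75) follows the choice reported in the GloVe paper, and other functions with $f(0) = 0$ would also fit the formula.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): 0 at x = 0, increasing with x, capped at 1."""
    if x <= 0:
        return 0.0
    return min(1.0, (x / x_max) ** alpha)

def glove_loss(X, theta, e, b, b_prime):
    """Sum of f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2 over all i, j."""
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            f = weight(X[i][j])
            if f == 0.0:
                continue  # X_ij = 0 contributes nothing, so log(0) is never evaluated
            dot = sum(a * c for a, c in zip(theta[i], e[j]))
            diff = dot + b[i] + b_prime[j] - math.log(X[i][j])
            total += f * diff * diff
    return total

# Made-up toy problem: 2 words, 2-dimensional vectors, zero biases.
X = [[0, 2], [2, 0]]
theta = [[0.1, 0.2], [0.3, 0.4]]
e = [[0.5, 0.6], [0.7, 0.8]]
b = [0.0, 0.0]
b_prime = [0.0, 0.0]

loss = glove_loss(X, theta, e, b, b_prime)
```

Because the weight vanishes whenever $X_{ij} = 0$, the undefined $\log 0$ term never enters the sum.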


(10)You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?

