One-hot representation

  • assigns a unique index to each word → a high-dimensional sparse representation
  • cannot capture the semantic relatedness among words (in one-hot representation, cat is as far from dog as it is from bed; see the sketch after this list)
  • inflexible in dealing with new words in real-world scenarios
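A minimal numpy sketch of this limitation (the three-word vocabulary is made up for illustration): every pair of distinct one-hot vectors has zero cosine similarity, so cat is no closer to dog than to bed.

```python
import numpy as np

# Toy vocabulary; each word gets a unique index (hypothetical example).
vocab = {"cat": 0, "dog": 1, "bed": 2}

def one_hot(word, vocab):
    """Return the sparse one-hot vector for a word."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Both similarities are 0: one-hot vectors treat cat/dog and cat/bed identically.
print(cosine(one_hot("cat", vocab), one_hot("dog", vocab)))  # 0.0
print(cosine(one_hot("cat", vocab), one_hot("bed", vocab)))  # 0.0
```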

Distributed representation

  • Representation learning aims to learn informative representations of objects from raw data automatically. Distributed representation has proven more efficient because its low dimensionality avoids the sparsity issue.
  • Deep learning is a typical approach for representation learning.

Development of representation learning in NLP
  • N-gram Model: predicts the next item in a sequence based on its previous n-1 items; a probabilistic language model.
  • Bag-of-Words: disregards the order of the words in the document; each word that appears in the document corresponds to a unique, nonzero dimension, and a score (e.g., the number of occurrences) is computed for each word to indicate its weight.
  • TF-IDF: extends BoW by taking the importance of different words into consideration rather than treating all words equally (see the sketch after this list).
  • Neural Probabilistic Language Model (NPLM): first assigns a distributed vector to each word, then uses a neural network to predict the next word; examples include feed-forward, recurrent, and LSTM-based neural network language models.
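A minimal sketch of the bag-of-words and TF-IDF rows above, assuming a recent scikit-learn; the two toy documents are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]  # toy corpus

# Bag-of-words: each column is a word, each entry is an occurrence count.
bow = CountVectorizer()
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())

# TF-IDF: frequent but uninformative words (e.g. "the") get lower weights.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray().round(2))
```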

Word embeddings:

Word2Vec, GloVe, fastText

Inspired by NPLM, many methods emerged that embed words into distributed representations. In the NLP pipeline, word embeddings map discrete words into informative low-dimensional vectors.
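A minimal sketch of training static word embeddings with Word2Vec, assuming the gensim library (4.x API) and a toy tokenized corpus; real embeddings are trained on far larger text.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus (illustrative only).
sentences = [
    ["the", "cat", "chases", "the", "mouse"],
    ["the", "dog", "chases", "the", "cat"],
    ["the", "dog", "sleeps", "on", "the", "bed"],
]

# Skip-gram Word2Vec: each word is mapped to a dense 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["cat"].shape)              # (50,)
print(model.wv.similarity("cat", "dog"))  # non-trivial similarity, unlike one-hot
```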

Pre-trained Language Models (PLM):

ELMo, BERT

  • take complicated context in text into consideration
  • compute dynamic representations for words based on their context, which is especially useful for words with multiple meanings
  • follow the pre-training and fine-tuning pipeline
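A minimal sketch of context-dependent representations, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint: the same surface word "bank" receives different vectors in different sentences.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Return the contextual vector of `word`'s token in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = word_vector("he sat on the bank of the river", "bank")
v2 = word_vector("she deposited money in the bank", "bank")
# The two "bank" vectors differ, reflecting their different contexts.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```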

The Pre-trained language model family

Applications

Neural Relation Extraction

  • Sentence-Level NRE: a basic form of sentence-level NRE consists of three components: (a) an input encoder that gives a representation for each input word (word embeddings, position embeddings, part-of-speech (POS) tag embeddings, WordNet hypernym embeddings); (b) a sentence encoder that computes either a single vector or a sequence of vectors to represent the original sentence; (c) a relation classifier that calculates the conditional probability distribution over all relations (see the sketch after this list).

  • Bag-Level NRE: utilizing information from multiple sentences (bag-level) rather than a single sentence (sentence-level) to decide if a relation holds between two entities. A basic form of bag-level NRE consists of four components: (a) an input encoder similar to sentence-level NRE, (b) a sentence encoder similar to sentence-level NRE, (c) a bag encoder which computes a vector representing all related sentences in a bag, and (d) a relation classifier similar to sentence-level NRE which takes bag vectors as input instead of sentence vectors.
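A minimal PyTorch sketch of the three sentence-level components above (an input encoder with word and position embeddings, a CNN sentence encoder, and a softmax relation classifier); all names and sizes are illustrative assumptions, not the exact architecture described in the book.

```python
import torch
import torch.nn as nn

class SentenceNRE(nn.Module):
    """Toy sentence-level relation extractor: (a) input encoder, (b) sentence encoder, (c) classifier."""

    def __init__(self, vocab_size, n_relations, max_len=100, word_dim=50, pos_dim=5, hidden=230):
        super().__init__()
        # (a) Input encoder: word embeddings plus position embeddings relative to the two entities.
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos1_emb = nn.Embedding(2 * max_len, pos_dim)
        self.pos2_emb = nn.Embedding(2 * max_len, pos_dim)
        # (b) Sentence encoder: a 1-D CNN with max pooling over the sequence.
        self.conv = nn.Conv1d(word_dim + 2 * pos_dim, hidden, kernel_size=3, padding=1)
        # (c) Relation classifier: conditional probability distribution over all relations.
        self.classifier = nn.Linear(hidden, n_relations)

    def forward(self, words, pos1, pos2):
        x = torch.cat([self.word_emb(words), self.pos1_emb(pos1), self.pos2_emb(pos2)], dim=-1)
        h = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, hidden, seq_len)
        s = h.max(dim=2).values                       # max pooling -> sentence vector
        return torch.log_softmax(self.classifier(s), dim=-1)

# Usage with random toy inputs: 2 sentences of 20 tokens, 5 candidate relations.
model = SentenceNRE(vocab_size=1000, n_relations=5)
words = torch.randint(0, 1000, (2, 20))
pos = torch.randint(0, 200, (2, 20))
print(model(words, pos, pos).shape)  # torch.Size([2, 5])
```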

Topic Model

  • Topic modeling algorithms do not require any prior annotations or labeling of the documents.
  • A topic model is a generative model: every word in a document is generated by first choosing a topic with some probability and then choosing a word from that topic with some probability.
  • Given a document, LDA infers its topic distribution. In LDA, a document is generated as follows:

For each document in the collection, we generate the words in a two-stage process:

1. Randomly choose a distribution over topics.

2. For each word in the document,

• Randomly choose a topic from the distribution over topics in step #1.

• Randomly choose a word from the corresponding distribution over the vocabulary.

Diagram of the LDA generative process
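A minimal numpy simulation of the two-stage generative process above; the two topics, their word distributions, and the Dirichlet prior are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "cell", "ball", "team", "game"]
# Each topic is a distribution over the vocabulary (illustrative values).
topics = np.array([
    [0.4, 0.3, 0.25, 0.02, 0.02, 0.01],   # "genetics" topic
    [0.01, 0.02, 0.02, 0.3, 0.3, 0.35],   # "sports" topic
])

def generate_document(n_words, alpha=0.5):
    # Step 1: randomly choose a distribution over topics for this document.
    theta = rng.dirichlet([alpha] * len(topics))
    words = []
    for _ in range(n_words):
        # Step 2a: choose a topic from the document's topic distribution.
        z = rng.choice(len(topics), p=theta)
        # Step 2b: choose a word from that topic's distribution over the vocabulary.
        words.append(rng.choice(vocab, p=topics[z]))
    return theta, words

theta, doc = generate_document(10)
print(theta.round(2), doc)
```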

Assumptions of the LDA

  • One assumption that LDA makes is the bag-of-words assumption that the order of the words in the document does not matter.
  • Another assumption is that the order of documents does not matter. This may be unrealistic when analyzing long-running collections that span years or centuries; in such collections, we may want to assume that the topics change over time. One approach to this problem is the dynamic topic model, a model that respects the ordering of the documents and gives a richer posterior topical structure than LDA.
  • The third assumption about LDA is that the number of topics is assumed known and fixed.

Other

Key points

  • To build an effective machine learning system, we first transform useful information on raw data into internal representations such as feature vectors.
  • Conventional machine learning systems adopt careful feature engineering as preprocessing to build feature representations from raw data.
  • The distributional hypothesis that linguistic objects with similar distributions have similar meanings is the basis for distributed word representation learning.
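A minimal sketch of the distributional hypothesis in action: representing words by their sentence-level co-occurrence counts makes words that appear in similar contexts (cat, dog) more similar than words that do not (cat, car). The toy corpus is made up for illustration.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]

# Count co-occurrences of word pairs within the same sentence.
counts = defaultdict(lambda: defaultdict(int))
words = sorted({w for s in corpus for w in s.split()})
for sent in corpus:
    for w1, w2 in combinations(sent.split(), 2):
        counts[w1][w2] += 1
        counts[w2][w1] += 1

# Each word's vector is its row of co-occurrence counts.
vectors = {w: np.array([counts[w][c] for c in words]) for w in words}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

# "cat" and "dog" share contexts ("the", "drinks"), so they are more similar than "cat" and "car".
print(cosine(vectors["cat"], vectors["dog"]), cosine(vectors["cat"], vectors["car"]))
```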

Chapter 6: Sememe Knowledge Representation

  • For example, the meaning of man can be considered as the combination of the sememes human, male, and adult.
  • WordNet is a large lexical database for the English language; HowNet annotates words in both Chinese and English with sememes.

An example of a word annotated with sememes in HowNet
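A minimal sketch of the compositional idea behind sememes: a word vector is assembled from the vectors of its sememes. The sememe inventory, the annotations, and the averaging scheme are illustrative assumptions, not HowNet's actual data or any model from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sememe inventory with random embeddings (illustrative only).
sememes = ["human", "male", "female", "adult", "child"]
sememe_emb = {s: rng.normal(size=8) for s in sememes}

# Toy sememe annotations in the spirit of HowNet.
annotations = {
    "man":   ["human", "male", "adult"],
    "woman": ["human", "female", "adult"],
    "boy":   ["human", "male", "child"],
}

def word_from_sememes(word):
    """Compose a word vector as the average of its sememe embeddings."""
    return np.mean([sememe_emb[s] for s in annotations[word]], axis=0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# "man" shares two of its three sememes with both "woman" and "boy".
print(cosine(word_from_sememes("man"), word_from_sememes("woman")))
print(cosine(word_from_sememes("man"), word_from_sememes("boy")))
```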
