

Chinese Natural Language Understanding (NLU) in Dialogue Systems, Parts 1 & 2: Task Introduction and Chinese vs English in NLP


When I first read about natural language understanding, I was confused: how does it differ from natural language processing? The purpose of natural language processing is to understand human language, so from that perspective every natural language processing task falls within the scope of natural language understanding. To some extent, the two terms look equivalent.


On further investigation, it became clear that people generally use "natural language understanding" to refer to a specific set of tasks. In this series, natural language understanding means the understanding tasks that arise in dialogue systems.


This series of articles will cover:

  1. Task Introduction (← this one)

  2. Chinese vs English in NLP (← this one)

  3. Academic Methods

  4. Industry Methods

  5. Challenges (Competitions) for Chinese Dialogue Systems

  6. Resources


Who is this article for? Whether you are a beginner in natural language processing, an expert who has worked in the field for many years, or a teacher looking for ways to explain these tasks to others, I hope you will find something useful in this series. It can serve as a quick overview of the general ideas behind different approaches and previous studies.

1. Task Introduction


We start with a brief description of the task. The exact tasks are described in more detail in the following articles. In the simplest terms, we want AI models to understand what we say and react accordingly.

In the example below, we give a smart speaker two commands: play today's news, and raise the volume.

The smart speaker completes both tasks: it plays the news and turns up the volume.


These two commands illustrate the natural language understanding task. We expect the model to extract the following key pieces of information:

  • Our intention is to play the news

    • Beijing (where the news is from)

    • Today (when the news is from)

  • Our intention is to adjust the volume: increase
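The information listed above can be written down as a small data structure. The following is only a minimal sketch: the class name `NLUResult`, the intent labels and the slot names are illustrative assumptions, not part of any particular dialogue system, and the two results are written by hand rather than predicted by a model.

```python
from dataclasses import dataclass, field


@dataclass
class NLUResult:
    """One parsed utterance: a single intent plus its slot values."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hand-written targets for the two commands in the example.
# A real NLU model would be trained to predict structures like these.
play_news = NLUResult(intent="play_news",
                      slots={"location": "Beijing", "date": "today"})
raise_volume = NLUResult(intent="adjust_volume",
                         slots={"direction": "increase"})


def describe(result):
    """Render a result as a short readable string, slots in sorted order."""
    slot_text = ", ".join(f"{k}={v}" for k, v in sorted(result.slots.items()))
    return f"{result.intent}({slot_text})"


print(describe(play_news))
print(describe(raise_volume))
```

Separating the intent from its slots mirrors the two levels of the list: the top-level bullets are intents, and the nested bullets are the details attached to them.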


In practice, such applications are gradually becoming part of our lives:

  • Voice assistants such as Siri and Cortana

  • Self-service ordering, booking and reservation

  • Smart homes (voice control of lights, TV, curtains, etc.)

2. Chinese vs English in NLP

Characters and Words


Both English and Chinese have the concept of a character, and in both languages characters combine to form words. The difference is that written English marks word boundaries clearly with spaces, whereas written Chinese has no such boundary markers.


Therefore, in some scenarios and methods, Chinese word segmentation is one step of the pre-processing procedure. In English, this step is not required. (Note: the diagram on the left does not mean that English also requires word segmentation. It is simply an easy-to-follow example that shows non-Chinese-speaking readers the core idea of word segmentation.)


However, Chinese word segmentation is not as easy as one might think. Different segmentations of the same sentence can give it very different meanings, so a wrong segmentation may lead an AI model to misunderstand the sentence completely.


For example, here are two possible word segmentations of the same sentence:

1) 看我 头像 牛不? Look at my avatar. Isn't it awesome?

2) 看我头 像 牛 不? Look at my head. Does it look like a cow?
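To make the ambiguity concrete, here is a minimal sketch that enumerates every way to split the sentence using a tiny dictionary. The vocabulary `TOY_VOCAB` is an assumption built only from the segments shown in the two readings above. Real segmenters use large dictionaries and statistical or neural models to choose among candidates, which this sketch does not attempt.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into consecutive words from `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


# Toy vocabulary (assumption): just the segments from the two readings.
TOY_VOCAB = {"看我", "头像", "牛不", "看我头", "像", "牛", "不"}

candidates = segmentations("看我头像牛不", TOY_VOCAB)
for words in candidates:
    print(" / ".join(words))
```

Even with seven dictionary entries, the six-character sentence has several valid splits, including both readings above. A dictionary alone cannot say which one the speaker meant.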

General Steps for Chinese Natural Language Processing


Methods for Chinese NLP can generally be divided into the following three categories:

  • Chinese text → pre-processing step (e.g. Chinese word segmentation) → word vector/embedding → input to the model

  • Chinese text → character vector/embedding → input to the model

  • Chinese text → pre-processing step (e.g. Chinese word segmentation) → pre-processing results (e.g. word embedding) + character embedding → input to the model
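The three routes differ mainly in what is turned into ids before the embedding lookup. The sketch below shows this for the command "play today's news" (播放今天的新闻). It assumes the sentence has already been segmented by hand, and it stops at integer ids: the embedding tables and the model itself are left out. The helper `build_index` is a hypothetical name used only for illustration.

```python
def build_index(tokens):
    """Map each distinct token to an integer id, in first-seen order."""
    index = {}
    for tok in tokens:
        index.setdefault(tok, len(index))
    return index


# Assumption: the segmentation is given by hand, not produced by a segmenter.
words = ["播放", "今天", "的", "新闻"]
chars = [ch for w in words for ch in w]

word_index = build_index(words)
char_index = build_index(chars)

# Route 1: word-level ids (each id would index a word embedding table).
word_ids = [word_index[w] for w in words]

# Route 2: character-level ids (no segmentation step needed).
char_ids = [char_index[ch] for ch in chars]

# Route 3: every character is paired with the word that contains it, so a
# model could combine the character embedding with the word embedding.
combined_ids = [(char_index[ch], word_index[w]) for w in words for ch in w]

print(word_ids)
print(char_ids)
print(combined_ids)
```

The third route keeps the sequence at character length while still exposing the word boundaries, which is one common way of combining the two sources of information.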


Note that no method for Chinese natural language processing is inherently good or bad, only suitable or unsuitable for a given task. Combining more and more features does not necessarily make an AI model better suited to your needs, and the same method may perform differently in different situations.


What other possible features could be used to improve model performance?


Chinese characters are culturally rich, and they offer many more features that a model can exploit. Here we discuss only the commonly used ones and try not to go into too much detail about the culture behind Chinese characters. If this section does not interest you, feel free to skip it. Before you do, keep one point in mind: many features of Chinese characters can help an AI model better understand the meaning of a whole sentence. These features include radicals, pinyin (the romanisation of Chinese), pronunciation, glyphs and so on.

Chinese Radicals


For readers who do not know much about Chinese, here is a brief explanation of radicals. A Chinese character can be made up of several parts (you can think of each character as a miniature painting composed of different parts), and a radical is one of those parts. Generally speaking, once we identify a character's radical, we can often guess what the character's meaning relates to. This is why some studies have found that incorporating radical features improves a model's ability to understand Chinese.


Take the following two figures as examples.

  • In the picture on the left, the radical in the middle is associated with eyes. Surrounding it is a ring of characters that all contain this radical and whose meanings are closely related to eyes.

  • In the picture on the right, the character "木" has a meaning related to trees and wood. It is followed by the two characters "林" (woods) and "森" (forest). As more and more "木"s appear in "林" and "森", the meaning grows from a single tree to a grove and then a forest. (Note: the images are taken from the internet.)
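As a rough illustration of how radicals can become model features, the sketch below maps each character to its radical with a small lookup table. The table `TOY_RADICALS` is a hand-written assumption covering only a few characters related to the two figures. A real system would use a complete character-to-radical resource, and the function name `radical_features` is hypothetical.

```python
# Toy table (assumption): a handful of characters and their radicals.
TOY_RADICALS = {
    "眼": "目", "睛": "目", "盯": "目",  # eye-related characters
    "木": "木", "林": "木", "森": "木",  # tree-related characters
}


def radical_features(text, table=TOY_RADICALS, unknown="<unk>"):
    """Return one radical per character, or `unknown` if it is not in the table."""
    return [table.get(ch, unknown) for ch in text]


print(radical_features("眼睛"))
print(radical_features("森林"))
```

Characters that share a radical receive the same feature value, which gives a model a hint that their meanings may be related even if one of them is rare in the training data.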

Pinyin (Romanisation of Chinese)


English uses phonetic symbols to indicate how a word is pronounced. Pinyin, the romanisation of Chinese, plays a very similar role. A character's pinyin has two parts: the letters and the tone (flat, rising, falling-rising, falling or neutral). This article does not explain tones in depth. To follow the rest of this section, it is enough to keep one thing in mind: the same letters can carry different tones in different situations, and a different tone indicates a very different meaning.


We have just seen that the same letters can take different tones in different characters. What drives learners of Chinese crazy is that a single character's tone and pronunciation are not fixed either. The same character can express different meanings, and how it is pronounced depends on the context.
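Both points can be shown with a few well-known cases. The sketch below writes pinyin in the common numbered style (letters followed by a tone digit, so "ma3" is "ma" with the falling-rising tone). The two small tables are hand-written assumptions for illustration, and `split_tone` and `reading_of` are hypothetical helper names.

```python
def split_tone(syllable):
    """Split a numbered-pinyin syllable such as 'ma3' into ('ma', 3)."""
    return syllable[:-1], int(syllable[-1])


# Same letters, different tones, different characters and meanings.
SAME_LETTERS = {"妈": "ma1", "麻": "ma2", "马": "ma3", "骂": "ma4"}

# One character, different readings depending on the word it appears in.
WORD_READINGS = {
    "银行": ["yin2", "hang2"],  # bank
    "行走": ["xing2", "zou3"],  # to walk
}


def reading_of(char, word):
    """Look up how `char` is read inside `word`, using the toy table."""
    return WORD_READINGS[word][word.index(char)]


for char, syllable in SAME_LETTERS.items():
    print(char, split_tone(syllable))

print(reading_of("行", "银行"))
print(reading_of("行", "行走"))
```

Because the reading of "行" can only be decided from the surrounding word, pinyin features usually have to be computed after (or together with) word segmentation rather than character by character.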

Glyphs


The meaning of some Chinese characters can also be guessed from their glyphs. Below we present three examples:

  • Leftmost: three real-world objects (sun, mountain, elephant)

  • Middle: the evolution of the written character

  • Rightmost: the Chinese characters used today

Summary


As we can see from the above, a Chinese character carries more meaning than its surface form suggests. If an AI model can uncover these implicit features, its ability to understand Chinese may improve. Similar approaches have in fact been adopted for English, for example by extracting word roots and affixes.


The table below briefly summarises which features have been used for different Chinese natural language processing tasks. It is not an exhaustive list, but it gives a general idea of which Chinese-specific features can be used in which tasks. We hope it brings some inspiration for your current work.

Next


In the following articles, we will start to look at several previous academic methods for Chinese natural language understanding!


Please feel free to use this material, or any part of it, for non-commercial purposes, and please cite this article if you can. Please contact me via the public WeChat account or by other means before using any pictures or text (except original content taken from publications or other authors' articles) in any form (including but not limited to translations or screenshots) for any business or commercial purpose (including but not limited to using the pictures or my text in slides for online or offline courses run by companies).
