Context-Free Grammars and Regular Grammars

Have you ever noticed that, when you are writing code in a text editor like VS Code, it recognizes things like unmatched braces? And that it sometimes warns you, with an irritating red highlight, about incorrect syntax you have written?

If not, think about it now. The editor is, after all, just a piece of code. How would you write code for such a task? What would the underlying logic be?

These are the kinds of questions you will face if you ever have to write a compiler for a programming language. Writing a compiler is not an easy task. It is a bulky job that demands a significant amount of time and effort.

In this article, we are not going to talk about how to build compilers. But we will talk about a concept that is a core component of the compiler: Context Free Grammars.

Introduction

All the questions we asked earlier point to a problem that is central to compiler design, called Syntax Analysis. As the name suggests, the challenge is to analyze the syntax of a program and decide whether it is correct. This is where Context Free Grammars come in. A Context Free Grammar is a set of rules that defines a language.
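
The unmatched-brace warning from the start of this article is a small case of syntax analysis. Balanced braces can be described by the context-free grammar S --> { S } S | ε, where ε is the empty string. Below is a minimal recursive-descent recognizer for that grammar, written as a sketch: isBalanced is a hypothetical name, and it ignores every character except braces (unlike a real editor, it does not skip braces inside strings or comments).

```javascript
// Recognizer for the grammar  S --> "{" S "}" S | ε
// Sketch only: braces are the only tokens; all other characters are skipped.
function isBalanced(text) {
  // Keep only the braces; they form the token stream the grammar describes.
  const tokens = [...text].filter((c) => c === "{" || c === "}");
  let pos = 0;

  // Parses one S. The trailing S of "{ S } S" is handled by the loop,
  // and stopping at "}" or at the end of input is the ε alternative.
  function parseS() {
    while (tokens[pos] === "{") {
      pos++; // consume "{"
      if (!parseS()) return false; // the inner S
      if (tokens[pos] !== "}") return false; // a "{" that is never closed
      pos++; // consume "}"
    }
    return true;
  }

  // The whole input must be a single S with nothing left over.
  return parseS() && pos === tokens.length;
}

console.log(isBalanced("function f() { if (x) { y(); } }")); // true
console.log(isBalanced("{ { }")); // false: a "{" is never closed
console.log(isBalanced("{ } }")); // false: a "}" has no matching "{"
```

Turning the tail recursion on the trailing S into a loop keeps the recognizer short while still following the grammar rule by rule.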

Here, I would like to draw a distinction between Context Free Grammars and grammars for natural languages like English.

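Context Free Grammars, or CFGs, define formal languages. Formal languages work strictly under their defined rules. In a CFG, each rule rewrites a single symbol (a non-terminal, defined below) in the same way wherever it appears, regardless of the symbols around it, so the structure of a sentence is not influenced by its context. That is where the name context free comes from.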
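
The "context free" property can be checked mechanically: a grammar is context free when the left-hand side of every rule is exactly one non-terminal. A rule such as the noun --> the Monkey, which rewrites noun only when "the" comes before it, breaks that condition; grammars that need such rules are called context-sensitive. The sketch below encodes rules as hypothetical { lhs, rhs } objects, a representation chosen only for illustration.

```javascript
// Hypothetical representation for illustration: a rule rewrites the
// sequence of symbols `lhs` into the sequence of symbols `rhs`.
function isContextFree(rules, nonTerminals) {
  // Context free: every left-hand side is exactly one non-terminal.
  return rules.every(
    (rule) => rule.lhs.length === 1 && nonTerminals.has(rule.lhs[0])
  );
}

const nonTerminals = new Set(["S", "nounPhrase", "verbPhrase", "adj", "noun", "verb"]);

// Two rules from this article's example grammar.
const cfRules = [
  { lhs: ["S"], rhs: ["nounPhrase", "verbPhrase"] },
  { lhs: ["noun"], rhs: ["Monkey"] },
];

// A made-up rule that rewrites "noun" only when "the" comes before it.
const contextSensitiveRules = [
  { lhs: ["the", "noun"], rhs: ["the", "Monkey"] },
];

console.log(isContextFree(cfRules, nonTerminals)); // true
console.log(isContextFree(contextSensitiveRules, nonTerminals)); // false
```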

Languages such as English fall under the category of informal (natural) languages, since they are affected by context. They also have many other features that a CFG cannot describe.

Even though CFGs cannot capture context in natural languages, they can still define the syntax and structure of sentences in these languages. In fact, that is why CFGs were introduced in the first place.

In this article, we will attempt to generate English sentences using CFGs. We will learn how to describe sentence structure and write rules for it. To do this, we will use a JavaScript library called Tracery, which generates sentences based on the rules we define for our grammar.

Before we dive into the code and start writing the rules for the grammar, let's just discuss some basic terms that we will use in our CFG.

Terminals: These are the symbols that make up the actual content of the final sentence. They can be words or letters, depending on which of the two is used as the basic building block of a sentence.

In our case we will use words as the basic building blocks of our sentences. So our terminals will include words such as "to", "from", "the", "car", "spaceship", "kittens" and so on.

Non-terminals: These are also called variables. Each non-terminal acts as a sub-language within the language defined by the grammar. Non-terminals are placeholders that are eventually replaced by terminals, and we can use them to generate different patterns of terminal symbols.

In our case, we will use non-terminals to generate noun phrases, verb phrases, nouns, adjectives, verbs, and so on.

Start Symbol: A start symbol is a special non-terminal from which every sentence of the grammar is derived; generation always begins by expanding it.
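
Formally, these pieces are usually bundled together with the rules into a four-tuple. In the standard textbook definition, V is the set of non-terminals, Σ the set of terminals, R the set of rules, and S the start symbol:

```latex
G = (V, \Sigma, R, S), \qquad
V \cap \Sigma = \emptyset, \qquad
R \subseteq V \times (V \cup \Sigma)^{*}, \qquad
S \in V
```

Each rule in R rewrites one non-terminal into a possibly empty string of terminals and non-terminals. In the example below, Σ is the set of terminals T.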

Now that we know the terminology let's start learning about the grammatical rules.

When writing grammar rules, we start by defining the set of terminals and a start symbol. As we learned before, the start symbol is a non-terminal, which means it belongs to the set of non-terminals.

T: {"Monkey", "banana", "ate", "the"}
S: start symbol

And the rules are:

S --> nounPhrase verbPhrase
nounPhrase --> adj nounPhrase | adj noun
verbPhrase --> verb nounPhrase
adj --> the
noun --> Monkey | banana
verb --> ate

The above grammatical rules may seem somewhat cryptic at first. But if we look carefully, we can see the pattern these rules generate.

A better way to think about the above rules is to visualise them as a tree. In that tree, S is the root, and nounPhrase and verbPhrase are its children. We can expand nounPhrase and verbPhrase in the same way. The leaf nodes of the tree are terminals, because that is where each derivation ends.

In the image above, we can see that S (a non-terminal) derives two non-terminals, NP (nounPhrase) and VP (verbPhrase). NP, in turn, derives two non-terminals, Adj and Noun.

If you look at the grammar, NP could instead have expanded to Adj and nounPhrase. When generating text, these choices are made randomly.
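
The random choices can be made concrete with a short sketch of a leftmost derivation: at each step it rewrites the leftmost non-terminal using one of its alternatives. The grammar is the one above, encoded as a plain JavaScript object (a hypothetical encoding, with adj as the name of the rule that produces "the"); leftmostDerivation is not part of Tracery. Passing a fixed choose function instead of the random default makes the output reproducible.

```javascript
// The example grammar encoded as plain data (a hypothetical encoding):
// each non-terminal maps to its alternatives, and each alternative is a
// list of symbols. Any symbol that is not a key is a terminal.
const grammar = {
  S: [["nounPhrase", "verbPhrase"]],
  nounPhrase: [["adj", "nounPhrase"], ["adj", "noun"]],
  verbPhrase: [["verb", "nounPhrase"]],
  adj: [["the"]],
  noun: [["Monkey"], ["banana"]],
  verb: [["ate"]],
};

const isNonTerminal = (symbol) =>
  Object.prototype.hasOwnProperty.call(grammar, symbol);

// Rewrites the leftmost non-terminal at every step and records each
// intermediate string. `choose(n)` returns the index of the alternative
// to use out of n; by default the choice is random.
function leftmostDerivation(start, choose = (n) => Math.floor(Math.random() * n)) {
  let current = [start];
  const steps = [current.join(" ")];
  for (;;) {
    const i = current.findIndex(isNonTerminal);
    if (i === -1) break; // only terminals are left: the sentence is done
    const alternatives = grammar[current[i]];
    const chosen = alternatives[choose(alternatives.length)];
    current = [...current.slice(0, i), ...chosen, ...current.slice(i + 1)];
    steps.push(current.join(" "));
  }
  return steps;
}

// A different random derivation (and sentence) on each run.
console.log(leftmostDerivation("S").join("\n=> "));
```

Because nounPhrase can call itself, a run may repeat "the" several times before reaching a noun, which is exactly what the rules allow.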

Finally, the leaf nodes hold the terminals, which are shown in bold. If you read them from left to right, you can see that they form a sentence.

This tree is usually called a parse tree. We can build a parse tree for any other sentence generated by this grammar in the same way.
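
A parse tree can be built directly while expanding the grammar. The sketch below encodes the example grammar as a plain object (a hypothetical encoding) and, instead of choosing randomly, follows a fixed list of alternative indices (picks, also made up for illustration) so the result is reproducible. buildTree, leaves and printTree are illustrative helpers, not library functions.

```javascript
// The example grammar as plain data (hypothetical encoding): each
// non-terminal maps to a list of alternatives, each a list of symbols.
const grammar = {
  S: [["nounPhrase", "verbPhrase"]],
  nounPhrase: [["adj", "nounPhrase"], ["adj", "noun"]],
  verbPhrase: [["verb", "nounPhrase"]],
  adj: [["the"]],
  noun: [["Monkey"], ["banana"]],
  verb: [["ate"]],
};

// Builds a parse tree. Every node is { symbol, children }; terminals are
// leaves with no children. `choose(n)` picks one of n alternatives.
function buildTree(symbol, choose) {
  if (!Object.prototype.hasOwnProperty.call(grammar, symbol)) {
    return { symbol, children: [] }; // a terminal becomes a leaf
  }
  const alternatives = grammar[symbol];
  const chosen = alternatives[choose(alternatives.length)];
  return { symbol, children: chosen.map((child) => buildTree(child, choose)) };
}

// Reading the leaves from left to right gives the generated sentence.
function leaves(node) {
  return node.children.length === 0 ? [node.symbol] : node.children.flatMap(leaves);
}

// Prints one node per line, indented two spaces per level of depth.
function printTree(node, depth = 0) {
  console.log("  ".repeat(depth) + node.symbol);
  for (const child of node.children) printTree(child, depth + 1);
}

// Fixed choices, consumed in top-down, left-to-right order: they select
// nounPhrase --> adj noun twice, and noun --> Monkey, then noun --> banana.
const picks = [0, 1, 0, 0, 0, 0, 1, 0, 1];
let next = 0;
const tree = buildTree("S", () => picks[next++]);

printTree(tree);
console.log(leaves(tree).join(" ")); // the Monkey ate the banana
```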

Now let's move on to the code. As I mentioned earlier, we will use a JavaScript library called Tracery to generate text with CFGs. We will also write some HTML and CSS for the front end.

The Code

Let's start by getting the Tracery library. You can clone the library from GitHub here. I have also left a link to galaxykate's GitHub repository at the end of the article.

Before we use the library, we have to import it. We can do this in an HTML file like this:

<html>
  <head>
    <script src="tracery-master/js/vendor/jquery-1.11.2.min.js"></script>
    <script src="tracery-master/tracery.js"></script>
    <script src="tracery-master/js/grammars.js"></script>
    <script src='app.js'></script>
  </head>
</html>

I have added the cloned Tracery files as scripts in my HTML code. We also have to add jQuery, because Tracery depends on jQuery. Finally, I have added app.js, which is the file where I will write the rules for the grammar.

Once that is done, create the JavaScript file (app.js) where we will define our grammar rules.

var rules = {
  "start": ["#NP# #VP#."],
  "NP": ["#Det# #N#", "#Det# #N# that #VP#", "#Det# #Adj# #N#"],
  "VP": ["#Vtrans# #NP#", "#Vintr#"],
  "Det": ["The", "This", "That"],
  "N": ["John Keating", "Bob Harris", "Bruce Wayne", "John Constantine", "Tony Stark", "John Wick", "Sherlock Holmes", "King Leonidas"],
  "Adj": ["cool", "lazy", "amazed", "sweet"],
  "Vtrans": ["computes", "examines", "helps", "prefers", "sends", "plays with", "messes up with"],
  "Vintr": ["coughs", "daydreams", "whines", "slobbers", "appears", "disappears", "exists", "cries", "laughs"]
};

Here you will notice that the syntax for defining rules is not much different from the way we wrote our grammar earlier. There are only minor differences. Inside an expansion, a non-terminal is referenced by wrapping its name in hash symbols, as in #NP#. Instead of separating the different derivations with the "|" symbol, we put each derivation in its own element of an array. And instead of arrows, the colons of the JavaScript object separate each non-terminal from its derivations.
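
Because the two notations differ only in these details, the translation is mechanical. The sketch below uses a hypothetical helper, toTraceryRules (not part of Tracery), that converts rules written in the arrow notation into an object of the same shape as the rules object. It assumes one rule per line, symbols separated by spaces, and that the non-terminals are exactly the names appearing on the left of an arrow.

```javascript
// Converts lines like "noun --> Monkey | banana" into a Tracery-style
// object: each key maps to an array of expansion strings, and references
// to other rules are wrapped in hash symbols. Hypothetical helper.
function toTraceryRules(lines) {
  // Split every "lhs --> alt1 | alt2" line into its name and a list of
  // alternatives, each alternative being a list of space-separated symbols.
  const entries = lines.map((line) => {
    const [lhs, rhs] = line.split("-->").map((part) => part.trim());
    const alternatives = rhs.split("|").map((alt) => alt.trim().split(/\s+/));
    return [lhs, alternatives];
  });

  // Non-terminals are exactly the names that appear on a left-hand side.
  const nonTerminals = new Set(entries.map(([lhs]) => lhs));

  // Wrap each non-terminal reference in hash symbols and join every
  // alternative back into a single string.
  const result = {};
  for (const [lhs, alternatives] of entries) {
    result[lhs] = alternatives.map((symbols) =>
      symbols.map((s) => (nonTerminals.has(s) ? `#${s}#` : s)).join(" ")
    );
  }
  return result;
}

const rules = toTraceryRules([
  "S --> nounPhrase verbPhrase",
  "nounPhrase --> adj nounPhrase | adj noun",
  "verbPhrase --> verb nounPhrase",
  "adj --> the",
  "noun --> Monkey | banana",
  "verb --> ate",
]);

console.log(JSON.stringify(rules.nounPhrase)); // ["#adj# #nounPhrase#","#adj# #noun#"]
```

The start rule is named S here rather than start, so with Tracery you would expand it with flatten('#S#').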

This new grammar is a little more complicated than the one we defined earlier. It includes more categories, such as determiners (Det), transitive verbs (Vtrans), and intransitive verbs (Vintr). We do this to make the generated text look more natural.

Now let's call Tracery's createGrammar function to create the grammar we just defined.

let grammar = tracery.createGrammar(rules);

This function takes the rules object and builds a grammar based on those rules. After creating the grammar, we want to generate some end result from it. To do that, we will use the grammar's flatten method, passing it the start symbol wrapped in hash symbols.

let expansion = grammar.flatten('#start#');
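
Conceptually, expanding "#start#" means repeatedly replacing each #symbol# reference with one of its expansions until only plain text is left. The toy version below illustrates that idea; flattenSketch is a hypothetical function, not Tracery's implementation, and it ignores Tracery features such as modifiers. The optional random parameter is an addition of this sketch that makes results reproducible, and the rules are a trimmed copy of the article's grammar.

```javascript
// Toy stand-in for grammar.flatten, for illustration only (not Tracery's
// code). Each #symbol# is replaced by one of its expansions, picked with
// `random`, and the picked text is expanded again until no references remain.
function flattenSketch(rules, text, random = Math.random) {
  return text.replace(/#([^#]+)#/g, (match, symbol) => {
    const options = rules[symbol];
    if (!Array.isArray(options)) {
      throw new Error(`No rule named "${symbol}"`);
    }
    const choice = options[Math.floor(random() * options.length)];
    return flattenSketch(rules, choice, random); // expand nested references
  });
}

// A trimmed copy of the article's rules (fewer names, adjectives and verbs).
const rules = {
  start: ["#NP# #VP#."],
  NP: ["#Det# #N#", "#Det# #N# that #VP#", "#Det# #Adj# #N#"],
  VP: ["#Vtrans# #NP#", "#Vintr#"],
  Det: ["The", "This", "That"],
  N: ["Tony Stark", "Sherlock Holmes"],
  Adj: ["cool", "lazy"],
  Vtrans: ["helps", "examines"],
  Vintr: ["laughs", "daydreams"],
};

console.log(flattenSketch(rules, "#start#")); // a random sentence
// Always taking the first option of every rule:
console.log(flattenSketch(rules, "#start#", () => 0)); // The Tony Stark helps The Tony Stark.
```

Expansion stops because, on average, an NP produces fewer than one nested VP (through "that #VP#") and a VP fewer than one nested NP.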

It will generate a random sentence based on the rules we defined earlier. But let's not stop there; let's also build a user interface for it. There isn't much to do for that part: we just need a button and some basic styles for the interface.

In the same HTML file where we added the libraries, we will now add some elements.

<html>
  <head>
    <title>Weird Sentences</title>
    <link rel="stylesheet" href="style.css"/>
    <link href="https://fonts.googleapis.com/css?family=UnifrakturMaguntia&display=swap" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css?family=Harmattan&display=swap" rel="stylesheet">
    <script src="tracery-master/js/vendor/jquery-1.11.2.min.js"></script>
    <script src="tracery-master/tracery.js"></script>
    <script src="tracery-master/js/grammars.js"></script>
    <script src='app.js'></script>
  </head>
  <body>
    <h1 id="h1">Weird Sentences</h1>
    <button id="generate" onclick="generate()">Give me a Sentence!</button>
    <div id="sentences"></div>
  </body>
</html>

And finally we will add some styles to it.

body {
  text-align: center;
  margin: 0;
  font-family: 'Harmattan', sans-serif;
}
#h1 {
  font-family: 'UnifrakturMaguntia', cursive;
  font-size: 4em;
  background-color: rgb(37, 146, 235);
  color: white;
  padding: .5em;
  box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
}
#generate {
  font-family: 'Harmattan', sans-serif;
  font-size: 2em;
  font-weight: bold;
  padding: .5em;
  margin: .5em;
  box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
  background-color: rgb(255, 0, 64);
  color: white;
  border: none;
  border-radius: 2px;
  outline: none;
}
#sentences p {
  box-shadow: 1px 1px 1px 1px rgb(206, 204, 204);
  margin: 2em;
  margin-left: 15em;
  margin-right: 15em;
  padding: 2em;
  border-radius: 2px;
  font-size: 1.5em;
}

We also have to add some more JavaScript to update the interface.

// Every sentence generated so far, oldest first.
let sentences = [];

// Called by the button's onclick handler.
function generate() {
  // Uses the rules object defined earlier in app.js.
  let grammar = tracery.createGrammar(rules);
  let expansion = grammar.flatten('#start#');
  sentences.push(expansion);
  printSentences(sentences);
}

// Shows all sentences in the #sentences div, newest first.
function printSentences(sentences) {
  let textBox = document.getElementById("sentences");
  textBox.innerHTML = "";
  for (let i = sentences.length - 1; i >= 0; i--) {
    textBox.innerHTML += "<p>" + sentences[i] + "</p>";
  }
}

Once you have finished writing the code, open your HTML file in a browser. It should look something like this.

Every time you click the red button, it will generate a sentence. Some of these sentences might not make any sense. This is because, as I said earlier, CFGs cannot describe context and some other features that natural languages possess. A CFG only defines the syntax and structure of the sentences.

You can check out the live version of this here.

Conclusion

If you have made it this far, I really appreciate your persistence. This might be a new concept for some of you, while others might have learned about it in their college courses. Either way, Context Free Grammars have interesting applications that range from Computer Science to Linguistics.

I have tried my best to present the main ideas of CFGs here, but there is a lot more to learn about them. Here are links to some great resources:

  • Context Free Grammars by Daniel Shiffman

  • Context Free Grammars Examples by Fullstack Academy

  • Tracery by Galaxykate

Source: https://www.freecodecamp.org/news/context-free-grammar/
