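Preface

This article grew out of a group assignment to design a calculator (GitHub link: https://github.com/Skywuuuu/LexAndYACC), in which we used Yacc and Lex to build a simple calculator. The article is written in English but is easy to follow (running it through translation software should work fine; the Chinese version has not been proofread for now, out of laziness). Thanks to my teammates for their help, listed in no particular order: dzj, lxz, lyhan, skywuuu. The article is fairly long. The Introduction can be skipped. The Preparation part covers how to install Flex and Bison on Windows and Ubuntu. After that come the features we designed (Demonstration of functions) and the code explanation (Code Explanation). The difficulties we met and the limitations of our compiler can also be skipped. The last part introduces ANTLR. Enjoy: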

Introduction

To deepen our understanding of how a compiler works and to learn how to design one in practice, we built a calculator as a project. We will discuss how to develop this simple compiler using two basic tools, Lex (Lexical Analyzer Generator) and Yacc (Yet Another Compiler-Compiler), on Ubuntu (different operating systems support different versions of the software). Specifically, the implementation of the calculator and how to use it correctly are explained step by step, as in a manual. In the preparation stage, users learn how to download and install the required packages on Windows and Ubuntu and how to compile and run the source files. The demonstration of calculator functions introduces the three output formats supported by our calculator and its basic and advanced operations. The code explanation covers six source files in detail, followed by the difficulties we met. Then some limitations of our calculator are mentioned. Lastly, we introduce a more advanced generator, ANTLR, and discuss why it is better.

Preparation

To construct a calculator, flex and GNU bison are necessary and powerful tools to use.

Download and Installation

Windows

Flex

If the operating system of your computer is Windows, you should download and install flex. The steps are as below:

  1. Click this link and you will see this page.

  2. Click Setup (with the red underline in the picture). The browser will automatically jump to another page and the download will begin in five seconds. After that, you will get an executable file (a file ending with .exe). You can find it under the default download directory (C:\Users\Your_name\Downloads).

  3. Double-click the file and the installation begins. You will see the window below.

  4. Press the Next button to continue until you see the window below. Just use the default path and click Next. It is a good idea to copy the installation path for later use.

  5. Choose Full installation, which means all of the components are ticked, and click Next.

  6. Click Next until the flex setup completes, then click Finish to exit the installer.

  7. Finally, you need to set the environment variable.

    1. Right-click This PC and choose Properties.
    2. Choose Advanced system settings.
    3. Click Environment variables.
    4. Under System variables, click New…
    5. Give the variable a name and paste the path; for example, my path is D:\GnuWin32\bin. Remember to include the bin directory.
    6. Finally, click OK.
  8. Test whether flex was installed successfully.
    1. Click the Windows (Start) button.
    2. Type cmd, the abbreviation of command, and open the command window.
    3. Type flex -V to check the version of flex. If it shows your version, the installation succeeded.

Bison

In this part, we show how to download and install bison.

  1. Click this link and you will see this page.

  2. Click Setup (with the red underline). The browser will automatically jump to another page and the download will begin. After that, you will get an executable file (a file ending with .exe). You can find it under the directory C:\Users\Your_name\Downloads.

Steps 3 to 8 are the same as for flex, so we skip them here.

Ubuntu

Under Ubuntu, the download and installation are quite simple. Install the flex and bison packages from the terminal, for example with sudo apt-get install flex bison.

If Ubuntu asks you to choose yes or no, just type Y and press Enter to continue.

How to compile all of the files

  • Compile the lex file

    • flex LA.l
  • Compile the yacc file
    • yacc -dtv SA.y
  • Compile the C files with g++ to get the object files. (We use g++ because the source files rely on some C++ features.)
    • g++ -c c_file.c
    • g++ -c lex.yy.c
    • g++ -c y.tab.c
  • Link all object files to generate an executable file
    • g++ -o a.out y.tab.o lex.yy.o c_file.o
  • For convenience, we wrote a shell script that includes all of the commands above to compile the files and generate an executable file. The shell script file is demonstrated below:

For different output formats, we just need to run the corresponding shell script to compile the source files. (Interpreter: makefile_i.sh, Compiler: makefile_c.sh, Graph: makefile_g.sh)

Demonstration of functions

Three types of outputs

Before demonstrating the functions of this calculator, we introduce three different versions of output: the interpreter version, the compiler version, and the syntax tree version.

  • Interpreter version

All of the output shown in the function demonstration part comes from the interpreter version. The interpreter executes the input code line by line; therefore, we can see the output of the print function on the terminal.

  • Compiler version

The compiler version outputs the intermediate code corresponding to the input code.

  • Syntax tree / Graph version

The last type of output is the syntax tree, or graph. It shows the syntax tree of the input code.

Print

The calculator can output the value of a variable through “print”.

Input:

Output:

Notice that if we print a variable that has not been assigned a value, the output is 0.

Input:

Output:

Basic arithmetic operations

  • Addition

Input:

Output:

  • Subtraction

Input:

Output:

  • Multiplication

Input:

Output:

  • Division

Input:

Output:

Other operations

  • fac

Function fac is the abbreviation of factorial: users can use it to evaluate a mathematical expression like 5! and get the answer. However, if a negative or non-integer number is given, an error will occur.

Input:

Output:

  • B2D

Function B2D converts a binary number to a decimal number; for example, it converts 10000B to 16. Remember to add a capital B at the end of the binary string.

Input:

Output:

  • sin

Function sin computes the sine of its argument. We keep four decimal places.

Input:

Output:

Input:

Output:

  • cos

Input:

Output:

Input:

Output:

  • pow

Function pow is the abbreviation of calculating the power of a number. You can directly use ‘^’ symbol to get the power of a number.

Input:

Output:

Input:

Output:

Logic Judgement

  • if-else

This calculator accepts if-else statements. Conditions can use >, <, !=, ==, >=, and <=. The if-else statement supports two formats. The first has braces enclosing the statements. For example:

Input:

Output:

It also accepts statements without braces, but then only one statement can be written inside each of the if and else branches.

Input:

Output:

Iteration

  • while

Input:

Output:

Code Explanation

In this section, we explain six source files: LA.l, SA.y, calc3.h, interpreter.c, compiler.c, and graph.c. Notice that when we add a new operator, not only the lex and yacc files but also interpreter.c, compiler.c, and graph.c must be changed. Because of the limited length of the article, we explain the code structure based on interpreter.c and the way to add a new operator based on compiler.c and graph.c, since the logic of these three files is similar.

LA.l

The basic technology behind Lex is the FSA (finite-state automaton), which consists of one start state and one or more accepting states. Lex implements the FSA through regular expressions: each pattern is matched against the input as the FSA would match it, and on a match Lex returns a token or executes the action set for that pattern. The LA.l file in our project consists of three parts, definitions, rules, and subroutines, which are separated by %%.

Definitions

The definitions part includes the C or C++ header files. Contents between %{ and %} are copied verbatim into the generated .c file. Therefore, we can conveniently use built-in functions and variables.

Rules

Variables and functions

In the rules part, we use the following default variables and functions of the lex file.

  • yytext is the pointer to the matched string.
  • yylval is the semantic value associated with the token. It is passed to the yacc file for use in the next operation; thus, whenever a later operation needs the content of the matched string, yylval is needed. Specifically, in our file, yylval.s holds string data and yylval.iValue holds floating-point numbers, because floating-point numbers suit most of the operations (integers are simply stored as xxx.000000).
  • yyerror is used to output error information when the input doesn’t match the rules.
  • atof() and atoi() are used to convert the input text to float and int values.

Regular Expressions and Actions

  • Decimal Number

Since some operations need both the token and its value, we keep the value in yylval and pass it to the yacc file as the semantic value. The two ‘?’ in [0-9]+(.[0-9]+)?([eE][0-9]+)? make the pattern more robust: because ‘?’ marks the fractional part and the exponent as optional, the pattern handles both integer and floating-point input.
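To see what the two optional groups accept, the pattern can be tried outside lex. The sketch below uses C++ std::regex rather than flex, and it assumes the dot in the pattern is meant as a literal decimal point (written \. here). The helper name isDecimalLiteral is hypothetical and does not come from the project.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the whole string matches the
// decimal-number pattern. Assumption: the dot is escaped so that it only
// matches a literal '.', not an arbitrary character.
bool isDecimalLiteral(const std::string& text) {
    static const std::regex pattern(R"([0-9]+(\.[0-9]+)?([eE][0-9]+)?)");
    return std::regex_match(text, pattern);
}
```

With this pattern, inputs such as 42, 3.14, and 1e5 are accepted, while 1. and .5 are rejected because the digits on both sides of the point are mandatory once the point appears.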

  • Relation Operators

  • Symbols

We only need to pass the operators’ names to the yacc file so that they can be combined with different expressions. So we just return their names as tokens using ‘*’.

  • Binary Number

We first use malloc() to allocate memory for the string yylval.s, and then use strcpy(), which is declared in <string.h>, to copy the string content from yytext to yylval.s. The whole bit string can thus be passed to the yacc file as the semantic value of the token ‘BIN’.

  • Mathematical Functions

We just return the tokens, and the yacc file executes the corresponding functions we wrote according to the token it receives. These three commands are handled like the basic operators ‘+’, ‘-’, ‘*’, ‘/’; the only difference is that we wrote the details of these three functions ourselves.

  • Keywords

  • Identifiers

Identifiers start with ‘_’ or a letter and may contain only letters, ‘_’, and digits. The token and its semantic value are passed to the SA.y file.

  • Ignore Whitespaces

  • Error Reporting Function

When tokens don’t satisfy the rules, we call the yyerror function to output the error information.

Subroutines

The yywrap() function is placed in this part. It is used to determine whether the program has read all of the input. When it is called at the end of the last input, it returns 1 so that the scanner stops; otherwise it returns 0.

SA.y

Yacc is a program that generates the source code of a syntax analyzer from the grammar rules we define. The layout of a yacc file is similar to that of a lex file: it is divided into three parts, definitions, rules, and subroutines.

Definitions

This part contains header files and function prototypes. Besides, the variable sym is defined as a C++ map, so that a variable name can be a string of arbitrary length rather than just one letter.

  • Keywords Definition

Each keyword begins with a “%” and is followed by a token.

The %union part assigns different data types to different symbols: iValue is a float variable, sIndex is a char variable, nPtr is a pointer that can point to any type of variable, and s is a string variable.

For %token, %left, %right, and %nonassoc it is optional to include an angle-bracketed tag naming a member of the %union part, while %type must include one.

In our file, the %token part declares the terminals in the grammar of the calculator: FLOAT, VARIABLE, and BIN, each declared with a tag, and WHILE, IF, PRINT, SIN, B2D, FAC.

%left, %right, and %nonassoc declare the associativity of tokens. For instance, the “+ - * /” tokens are left-associative and are thus declared with %left.

%type identifies nonterminals.

Rules

In this part, the source code resembles a BNF grammar. Each grammar rule consists of a production on the left and a semantic action on the right, enclosed in {}.

In the parsing stage, we can use the values of the symbols stored on the value stack: $$ is the attribute value of the nonterminal on the left side, and $i is the attribute value of the i-th symbol on the right side. These values are passed to the opr() function together with the matched operator and the number of operands. Besides, con(), str(), and id() are called to create a node pointer holding the value with the matching type.

Subroutines

The details of the functions are shown here. con(), str(), and id() return a pointer to a node holding the given type and value, and opr() returns a pointer to a node storing the operator and its operands.
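As an illustration of how such node-building helpers can work, here is a minimal C++ sketch. It is not the project's code: the Node layout, the NodeKind names, and freeTree() are assumptions made for the example, and only con(), id(), and opr() mirror names used in the text. The variadic opr() takes the operator, the number of operands, and then the operand pointers, which matches the description of passing "the matched operators and the quantities of the operands".

```cpp
#include <cstdarg>
#include <string>
#include <vector>

// Simplified node layout (an assumption, not the struct from calc3.h).
enum NodeKind { kindCon, kindId, kindOpr };

struct Node {
    NodeKind kind;
    float value;             // used when kind == kindCon
    std::string name;        // used when kind == kindId
    int oper;                // used when kind == kindOpr
    std::vector<Node*> ops;  // operands, used when kind == kindOpr
};

// Builds a leaf node holding a constant.
Node* con(float value) {
    return new Node{kindCon, value, "", 0, {}};
}

// Builds a leaf node holding an identifier name.
Node* id(const std::string& name) {
    return new Node{kindId, 0.0f, name, 0, {}};
}

// Builds an operator node from 'nops' operand pointers passed as varargs.
Node* opr(int oper, int nops, ...) {
    Node* node = new Node{kindOpr, 0.0f, "", oper, {}};
    va_list args;
    va_start(args, nops);
    for (int i = 0; i < nops; ++i) {
        node->ops.push_back(va_arg(args, Node*));
    }
    va_end(args);
    return node;
}

// Hypothetical helper that releases a whole tree.
void freeTree(Node* node) {
    if (node == nullptr) return;
    for (Node* child : node->ops) freeTree(child);
    delete node;
}
```

Under these assumptions, a grammar action for an addition rule could build its result with a call shaped like opr('+', 2, left, right), where left and right are the nodes of the two sub-expressions.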

calc3.h

  • nodeEnum

The keyword “enum” is the abbreviation for enumeration. The default value of the first enumeration member is 0, and each subsequent member is one greater than the previous member, which means typeCon=0, typeId=1, typeOpr=2, and typeStr=3. nodeEnum is the name of the enumeration type.

  • conNodeType

This is the definition of a constant value. We use a floating-point number rather than an integer because it can also represent fractional values.

  • strNodeType

The “strNodeType” is used to hold the constant string, for example, the binary string 1000B will be stored by strNodeType.

  • idNodeType

The name of an identifier is stored in this node.

  • oprNodeType

An operator has a symbol and a number of operands. That’s why we define “oper” to store the ASCII code of the operator and “nops” to store the number of operands. The extra struct “nodeTypeTag” can be used at runtime to hold any kind of node we define.

  • nodeType

This struct can be regarded as a collection of all the types introduced above.
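To make the pieces above concrete, here is a hedged sketch of how such a header could be laid out. The type names follow the text, but the field names other than oper and nops, and the use of a pointer-to-pointer for the operand array, are assumptions for illustration; the real calc3.h may differ.

```cpp
// Enumerators without explicit values count up from 0.
typedef enum { typeCon, typeId, typeOpr, typeStr } nodeEnum;

static_assert(typeCon == 0 && typeId == 1 && typeOpr == 2 && typeStr == 3,
              "default enumerator values increase by one");

typedef struct { float value; } conNodeType;       // constant
typedef struct { const char* name; } idNodeType;   // identifier (field name assumed)
typedef struct { const char* text; } strNodeType;  // string such as 1000B (field name assumed)

struct nodeTypeTag;  // forward declaration so operator nodes can point to nodes

typedef struct {
    int oper;                 // operator code
    int nops;                 // number of operands
    struct nodeTypeTag** op;  // operand array (layout assumed)
} oprNodeType;

// One node is exactly one of the kinds above; 'type' says which one.
typedef struct nodeTypeTag {
    nodeEnum type;
    union {
        conNodeType con;
        idNodeType id;
        strNodeType str;
        oprNodeType opr;
    };
} nodeType;
```

The union keeps every node the same size regardless of its kind, and code that walks the tree reads the type field first to decide which member is valid.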

  • bool operator()(char const *a, char const *b) const

This is a comparison function passed to map so that map can compare char* keys correctly.

  • sym

The variable “sym” is the symbol table: it stores the names of identifiers during the execution of the program. We can get the value of an identifier by its name, which is what the map data structure provides. We add the keyword extern in front of the map sym so that sym can be used as a global variable in other files.
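The reason a map keyed by char* needs its own comparison function is that, by default, pointers are ordered by address, so two separate copies of the same name would count as different keys. The following sketch shows the idea; the names CStringLess, symbols, and lookup are hypothetical, and the float value type is an assumption.

```cpp
#include <cstring>
#include <map>

// Orders C strings by their content instead of by pointer value.
struct CStringLess {
    bool operator()(char const* a, char const* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical stand-in for the symbol table: identifier name -> value.
std::map<char const*, float, CStringLess> symbols;

// Returns the stored value, or 0 for a name that was never assigned.
float lookup(char const* name) {
    auto it = symbols.find(name);
    return it == symbols.end() ? 0.0f : it->second;
}
```

With this comparator, looking up a name through a different buffer that holds the same characters finds the same entry. Note that the map stores only the pointer, so the key's memory has to stay alive for as long as the entry exists.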

interpreter.c

The purpose of interpreter is to get the numerical answer of the input. It contains three functions. They are ex(), BtoD(), factorial(), respectively.

float ex(nodeType *p)

Function ex() carries out the main execution of this file. It is recursive, and the switch-case structure inside it lets the program recognize different types of nodes and perform the corresponding work, such as assignment, calculation, iteration, logic judgement, and output.

Here is the pseudocode of ex():
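As a rough illustration of the recursive switch-case structure, here is a minimal C++ sketch of a tree-walking evaluator. It is not the project's ex(): the Node layout, the OP_* codes, and the string-keyed symbol table are simplifying assumptions, and only a handful of operators are covered.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the node types (names are assumptions).
enum Kind { kindCon, kindId, kindOpr };

// Hypothetical operator codes; in a real project yacc generates token values.
enum { OP_WHILE = 256, OP_IF, OP_PRINT, OP_SEQ };

struct Node {
    Kind kind;
    float value;             // constant value, used when kind == kindCon
    std::string name;        // identifier name, used when kind == kindId
    int oper;                // operator code, used when kind == kindOpr
    std::vector<Node*> ops;  // operands, used when kind == kindOpr
};

std::map<std::string, float> sym;  // symbol table: name -> value

float ex(const Node* p) {
    if (p == nullptr) return 0.0f;
    switch (p->kind) {
    case kindCon:
        return p->value;
    case kindId:
        return sym[p->name];  // a name that was never assigned reads as 0
    case kindOpr:
        switch (p->oper) {
        case OP_WHILE:
            while (ex(p->ops[0]) != 0.0f) ex(p->ops[1]);
            return 0.0f;
        case OP_IF:
            if (ex(p->ops[0]) != 0.0f) ex(p->ops[1]);
            else if (p->ops.size() > 2) ex(p->ops[2]);
            return 0.0f;
        case OP_PRINT:
            std::printf("%f\n", ex(p->ops[0]));
            return 0.0f;
        case OP_SEQ:  // two statements in a row
            ex(p->ops[0]);
            return ex(p->ops[1]);
        case '=': {
            float value = ex(p->ops[1]);
            sym[p->ops[0]->name] = value;
            return value;
        }
        case '+': return ex(p->ops[0]) + ex(p->ops[1]);
        case '-': return ex(p->ops[0]) - ex(p->ops[1]);
        case '*': return ex(p->ops[0]) * ex(p->ops[1]);
        case '/': return ex(p->ops[0]) / ex(p->ops[1]);
        case '<': return ex(p->ops[0]) < ex(p->ops[1]);
        case '>': return ex(p->ops[0]) > ex(p->ops[1]);
        }
    }
    return 0.0f;
}
```

Each operator case evaluates its operands by calling ex() again, so nested expressions and loops fall out of the recursion without any extra bookkeeping.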

float BtoD (char* bin)

Function BtoD is in charge of converting a binary string into a decimal number. It keeps calculating until it encounters a capital B, which marks the end of the binary string. There are two reasons why we use “char*” rather than an int type. On the one hand, “char*” can hold a very long digit sequence while “int” cannot. On the other hand, we need a marker to tell binary strings apart from other numbers.
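The conversion described above can be sketched in a few lines. This is an illustrative re-implementation under assumptions, not the project's code: it takes a const char* so that string literals can be passed, it also stops at the end of the string as a safety net, and it assumes the input contains only the characters 0 and 1 before the B.

```cpp
// Sketch of the BtoD idea: read binary digits left to right until 'B'.
float BtoD(const char* bin) {
    float value = 0.0f;
    for (const char* p = bin; *p != '\0' && *p != 'B'; ++p) {
        // Shift the running value one binary place and add the new digit.
        value = value * 2.0f + static_cast<float>(*p - '0');
    }
    return value;
}
```

For the example used earlier in the article, BtoD("10000B") walks through 1, 2, 4, 8, 16 and returns 16.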

float factorial (float fac)

Function factorial calculates the factorial of zero and of positive integers (here we use floating-point numbers to represent integers). Therefore, the function checks whether the variable “fac” is a non-negative, integer-valued floating-point number. If it is, the factorial is calculated; otherwise, the corresponding error information is returned.
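A minimal sketch of that check and the calculation follows. How the project actually reports the error is not shown in the text, so throwing std::invalid_argument here is an assumption made for the example.

```cpp
#include <cmath>
#include <stdexcept>

// Sketch of a factorial on integer-valued floats.
float factorial(float fac) {
    // Reject negative input and anything with a fractional part.
    if (fac < 0.0f || std::floor(fac) != fac) {
        throw std::invalid_argument("factorial expects a non-negative whole number");
    }
    float result = 1.0f;
    // Stop early once the result overflows to infinity.
    for (float k = 2.0f; k <= fac && std::isfinite(result); k += 1.0f) {
        result *= k;
    }
    return result;
}
```

Because the result is a float, values stay exact only for small arguments; larger ones are rounded and eventually overflow.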

graph.c

This file outputs the syntax tree of the given code. It contains three main parts: the ex() function, the exNode() function, and a group of drawing functions. When a new operator is added to the calculator, we just need to modify the exNode() function and leave the remaining work to the drawing functions.

Adding new operators:

For example, when the sin() function is added to the calculator, we just need to add the following code inside the switch structure under the case of p->type == typeOpr:

Adding new data types:

In this calculator, we also added a string data type, which is a new type of node. Therefore, we should first add a string node to the calc3.h file like the other types of node. Assuming its enum name defined in calc3.h is typeStr, we should then add the following code inside the first switch structure, which tells the exNode() function what to do when it encounters a string type node:

Also, we should add typeStr to the following if statement, which is used to draw the leaves of the parse tree:

compiler.c

Adding new data types:

When a new type of data is added to the calculator, we also need to update the compiler.c file. For example, we added a string data type to the compiler and declared it in the calc3.h file with the enum name typeStr. We just need to add a new case like the following picture to the first switch structure:

Adding new operators:

Another case is that a new operator is added. For example, when the factorial operator is added, we should add the following code inside the switch structure under the typeOpr case:

It first runs the ex() function recursively on the operands and then prints fac on the terminal.
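The "operands first, operator name afterwards" order can be illustrated with a small sketch. It is not the project's compiler.c: the Node layout, the FAC value, the emit() name, and the push/add/mul mnemonics are all assumptions, and the instructions are collected in a vector instead of being printed so that the order is easy to inspect.

```cpp
#include <string>
#include <vector>

// Simplified node layout (an assumption, not the struct from calc3.h).
enum Kind { kindCon, kindId, kindOpr };

const int FAC = 258;  // hypothetical token code; yacc assigns the real value

struct Node {
    Kind kind;
    float value;             // used when kind == kindCon
    std::string name;        // used when kind == kindId
    int oper;                // used when kind == kindOpr
    std::vector<Node*> ops;  // operands, used when kind == kindOpr
};

// Appends stack-machine style instructions for the tree rooted at p.
void emit(const Node* p, std::vector<std::string>& out) {
    if (p == nullptr) return;
    switch (p->kind) {
    case kindCon:
        out.push_back("push " + std::to_string(p->value));
        break;
    case kindId:
        out.push_back("push " + p->name);
        break;
    case kindOpr:
        // Generate code for every operand first ...
        for (const Node* operand : p->ops) emit(operand, out);
        // ... then the instruction for the operator itself.
        switch (p->oper) {
        case FAC: out.push_back("fac"); break;
        case '+': out.push_back("add"); break;
        case '*': out.push_back("mul"); break;
        }
        break;
    }
}
```

For a tree representing the factorial of a variable x, this produces the operand instruction followed by fac, which is the order a stack machine needs.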

Difficulty

During the development of our calculator, we faced the following difficulties, and we solved them in the end.

SA.y file receives the wrong lexeme

We have a function B2D(), which turns a binary number into a decimal one. The lexer matches the regular expression “[01]+B”, that is, a series of 0s and 1s ending with “B”, and then returns a token named BIN, whose type is char*. However, we found that the lexeme passed to the syntax analyzer was not what we expected. The wrong lexeme contains the semi-colon(
