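LLVM Study Notes (2)

2. LLVM Backend Description

2.1. Type Description

To describe the value types (sizes) that registers can hold, as well as the types (sizes) of operands, TableGen provides a series of type definitions in ValueTypes.td, all of which derive from ValueType: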

class ValueType<int size, int value> {
  string Namespace = "MVT";
  int Size = size;
  int Value = value;
}

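Size is the size of the type in bits, and Value is the identifying number of the type (Value must agree with the numbering of the SimpleValueType enum in class MVT in MachineValueType.h). Built on this ValueType base class, TableGen uses the following types (these are the machine value types the code generator works with, onto which LLVM IR types are mapped):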

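OtherVT: "other" value	i1: 1-bit boolean value	i8: 8-bit integer value	i16: 16-bit integer value
i32: 32-bit integer value	i64: 64-bit integer value	i128: 128-bit integer value	f16: 16-bit floating-point value
f32: 32-bit floating-point value	f64: 64-bit floating-point value	f80: 80-bit floating-point value	f128: 128-bit floating-point value
ppcf128: PPC 128-bit floating-point value	v2i1: 2✕i1 vector	v4i1: 4✕i1 vector
v8i1: 8✕i1 vector	v16i1: 16✕i1 vector	v32i1: 32✕i1 vector	v64i1: 64✕i1 vector
v1i8: 1✕i8 vector	v2i8: 2✕i8 vector	v4i8: 4✕i8 vector	v8i8: 8✕i8 vector
v16i8: 16✕i8 vector	v32i8: 32✕i8 vector	v64i8: 64✕i8 vector	v1i16: 1✕i16 vector
v2i16: 2✕i16 vector	v4i16: 4✕i16 vector	v8i16: 8✕i16 vector	v16i16: 16✕i16 vector
v32i16: 32✕i16 vector	v1i32: 1✕i32 vector	v2i32: 2✕i32 vector	v4i32: 4✕i32 vector
v8i32: 8✕i32 vector	v16i32: 16✕i32 vector	v1i64: 1✕i64 vector	v2i64: 2✕i64 vector
v4i64: 4✕i64 vector	v8i64: 8✕i64 vector	v16i64: 16✕i64 vector	v1i128: 1✕i128 vector
v2f16: 2✕f16 vector	v4f16: 4✕f16 vector	v8f16: 8✕f16 vector	v1f32: 1✕f32 vector
v2f32: 2✕f32 vector	v4f32: 4✕f32 vector	v8f32: 8✕f32 vector	v16f32: 16✕f32 vector
v1f64: 1✕f64 vector	v2f64: 2✕f64 vector	v4f64: 4✕f64 vector	v8f64: 8✕f64 vector
x86mmx: MMX value	FlagVT:	isVoid: produces no value	untyped: untyped value
iPTRAny: current pointer size, mapped to any address space	vAny: vector of any size	fAny: floating-point value of any format
iAny: integer value of any bit width	iPTR: current pointer size	Any: value of any size and any type	MetadataVT: metadata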

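Here vXiY and vXfY name vectors of X elements of type iY or fY. Many of them are the vector types supported by X86's SSE and AVX instruction sets.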
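The naming scheme of the vector types can be made concrete with a small sketch. The Python below is only an illustration, not anything TableGen does internally: the helper name parse_vector_type and the SCALAR_BITS table are assumptions introduced here, and the element widths are the ones listed in the table above.

```python
import re

# Bit widths of the scalar element types named in the table above.
SCALAR_BITS = {
    "i1": 1, "i8": 8, "i16": 16, "i32": 32, "i64": 64, "i128": 128,
    "f16": 16, "f32": 32, "f64": 64,
}

# Hypothetical pattern: names of the form v<count><element>, e.g. v4i32.
_VECTOR_NAME = re.compile(r"v(\d+)([if]\d+)")


def parse_vector_type(name):
    """Split a vXiY / vXfY name into (element count, element type, total bits)."""
    match = _VECTOR_NAME.fullmatch(name)
    if match is None or match.group(2) not in SCALAR_BITS:
        raise ValueError(f"not a vector type name: {name!r}")
    count = int(match.group(1))
    element = match.group(2)
    return count, element, count * SCALAR_BITS[element]


if __name__ == "__main__":
    for name in ("v4i32", "v8f16", "v2i64"):
        print(name, parse_vector_type(name))
```

The total bit count computed this way plays the role of the Size argument of ValueType; the Value argument cannot be derived from the name and has to come from the SimpleValueType enum.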

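2.2. Instruction Description

Target.td (the file that describes the target-independent interfaces) contains a definition for instructions, Instruction. It is best seen as a container that gathers every aspect of an instruction's description in one place. OutOperandList and InOperandList are the output and input operands respectively; InOperandList is used when matching instruction patterns, and OutOperandList supplies the operands of the MachineSDNode object produced by selection. AsmString is the assembly code in string form, Pattern states which DAG fragments in the SelectionDAG match this instruction, Itinerary can describe the steps the instruction goes through when it executes in the processor, and SchedRW describes which CPU resources the instruction occupies.

Note that describing an instruction does not require every field.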

//===----------------------------------------------------------------------===//
// Instruction set description - These classes correspond to the C++ classes in
// the Target/TargetInstrInfo.h file.
//
class Instruction {
  string Namespace = "";

  dag OutOperandList;       // An dag containing the MI def operand list.
  dag InOperandList;        // An dag containing the MI use operand list.
  string AsmString = "";    // The .s format to print the instruction with.

  // Pattern - Set to the DAG pattern for this instruction, if we know of one,
  // otherwise, uninitialized.
  list<dag> Pattern;

  // The follow state will eventually be inferred automatically from the
  // instruction pattern.

  list<Register> Uses = []; // Default to using no non-operand registers
  list<Register> Defs = []; // Default to modifying no non-operand registers

  // Predicates - List of predicates which will be turned into isel matching
  // code.
  list<Predicate> Predicates = [];

  // Size - Size of encoded instruction, or zero if the size cannot be determined
  // from the opcode.
  int Size = 0;

  // DecoderNamespace - The "namespace" in which this instruction exists, on
  // targets like ARM which multiple ISA namespaces exist.
  string DecoderNamespace = "";

  // Code size, for instruction selection.
  // FIXME: What does this actually mean?
  int CodeSize = 0;

  // Added complexity passed onto matching pattern.
  int AddedComplexity = 0;

  // These bits capture information about the high-level semantics of the
  // instruction.
  bit isReturn = 0;         // Is this instruction a return instruction?
  bit isBranch = 0;         // Is this instruction a branch instruction?
  bit isIndirectBranch = 0; // Is this instruction an indirect branch?
  bit isCompare = 0;        // Is this instruction a comparison instruction?
  bit isMoveImm = 0;        // Is this instruction a move immediate instruction?
  bit isBitcast = 0;        // Is this instruction a bitcast instruction?
  bit isSelect = 0;         // Is this instruction a select instruction?
  bit isBarrier = 0;        // Can control flow fall through this instruction?
  bit isCall = 0;           // Is this instruction a call instruction?
  bit canFoldAsLoad = 0;    // Can this be folded as a simple memory operand?
  bit mayLoad = ?;          // Is it possible for this inst to read memory?
  bit mayStore = ?;         // Is it possible for this inst to write memory?
  bit isConvertibleToThreeAddress = 0; // Can this 2-addr instruction promote?
  bit isCommutable = 0;     // Is this 3 operand instruction commutable?
  bit isTerminator = 0;     // Is this part of the terminator for a basic block?
  bit isReMaterializable = 0; // Is this instruction re-materializable?
  bit isPredicable = 0;     // Is this instruction predicable?
  bit hasDelaySlot = 0;     // Does this instruction have an delay slot?
  bit usesCustomInserter = 0; // Pseudo instr needing special help.
  bit hasPostISelHook = 0;  // To be *adjusted* after isel by target hook.
  bit hasCtrlDep = 0;       // Does this instruction r/w ctrl-flow chains?
  bit isNotDuplicable = 0;  // Is it unsafe to duplicate this instruction?
  bit isConvergent = 0;     // Is this instruction convergent?
  bit isAsCheapAsAMove = 0; // As cheap (or cheaper) than a move instruction.
  bit hasExtraSrcRegAllocReq = 0; // Sources have special regalloc requirement?
  bit hasExtraDefRegAllocReq = 0; // Defs have special regalloc requirement?
  bit isRegSequence = 0;    // Is this instruction a kind of reg sequence?
                            // If so, make sure to override
                            // TargetInstrInfo::getRegSequenceLikeInputs.
  bit isPseudo = 0;         // Is this instruction a pseudo-instruction?
                            // If so, won't have encoding information for
                            // the [MC]CodeEmitter stuff.
  bit isExtractSubreg = 0;  // Is this instruction a kind of extract subreg?
                            // If so, make sure to override
                            // TargetInstrInfo::getExtractSubregLikeInputs.
  bit isInsertSubreg = 0;   // Is this instruction a kind of insert subreg?
                            // If so, make sure to override
                            // TargetInstrInfo::getInsertSubregLikeInputs.

  // Side effect flags - When set, the flags have these meanings:
  //
  // hasSideEffects - The instruction has side effects that are not
  // captured by any operands of the instruction or other flags.
  //
  bit hasSideEffects = ?;

  // Is this instruction a "real" instruction (with a distinct machine
  // encoding), or is it a pseudo instruction used for codegen modeling
  // purposes.
  // FIXME: For now this is distinct from isPseudo, above, as code-gen-only
  // instructions can (and often do) still have encoding information
  // associated with them. Once we've migrated all of them over to true
  // pseudo-instructions that are lowered to real instructions prior to
  // the printer/emitter, we can remove this attribute and just use isPseudo.
  //
  // The intended use is:
  // isPseudo: Does not have encoding information and should be expanded,
  //   at the latest, during lowering to MCInst.
  //
  // isCodeGenOnly: Does have encoding information and can go through to the
  //   CodeEmitter unchanged, but duplicates a canonical instruction
  //   definition's encoding and should be ignored when constructing the
  //   assembler match tables.
  bit isCodeGenOnly = 0;

  // Is this instruction a pseudo instruction for use by the assembler parser.
  bit isAsmParserOnly = 0;

  InstrItinClass Itinerary = NoItinerary; // Execution steps used for scheduling.

  // Scheduling information from TargetSchedule.td.
  list<SchedReadWrite> SchedRW;

  string Constraints = "";  // OperandConstraint, e.g. $src = $dst.

  /// DisableEncoding - List of operand names (e.g. "$op1,$op2") that should not
  /// be encoded into the output machineinstr.
  string DisableEncoding = "";

  string PostEncoderMethod = "";
  string DecoderMethod = "";

  /// Target-specific flags. This becomes the TSFlags field in TargetInstrDesc.
  bits<64> TSFlags = 0;

  ///@name Assembler Parser Support
  ///@{

  string AsmMatchConverter = "";

  /// TwoOperandAliasConstraint - Enable TableGen to auto-generate a
  /// two-operand matcher inst-alias for a three operand instruction.
  /// For example, the arm instruction "add r3, r3, r5" can be written
  /// as "add r3, r5". The constraint is of the same form as a tied-operand
  /// constraint. For example, "$Rn = $Rd".
  string TwoOperandAliasConstraint = "";

  ///@}

  /// UseNamedOperandTable - If set, the operand indices of this instruction
  /// can be queried via the getNamedOperandIdx() function which is generated
  /// by TableGen.
  bit UseNamedOperandTable = 0;
}
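Because the class gives most fields a default, a concrete instruction only has to spell out the fields that differ. The following Python sketch mimics that idea with a plain dictionary. It is an analogy rather than TableGen's implementation: define_instruction, the "Toy" namespace and the RET record are hypothetical, only a handful of the fields from the listing are included, and None stands in for the uninitialized value "?".

```python
# Sentinel standing in for TableGen's "?" (uninitialized) value.
UNSET = None

# A small subset of the defaults from the Instruction class listed above.
INSTRUCTION_DEFAULTS = {
    "Namespace": "",
    "AsmString": "",
    "Uses": [],
    "Defs": [],
    "Size": 0,
    "isReturn": 0,
    "isBranch": 0,
    "isTerminator": 0,
    "mayLoad": UNSET,
    "mayStore": UNSET,
    "hasSideEffects": UNSET,
}


def define_instruction(**overrides):
    """Return a record holding the defaults plus the fields a definition sets."""
    unknown = set(overrides) - set(INSTRUCTION_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    # Copy list defaults so that records never share mutable state.
    record = {name: (list(value) if isinstance(value, list) else value)
              for name, value in INSTRUCTION_DEFAULTS.items()}
    record.update(overrides)
    return record


# Hypothetical return instruction: only four fields are spelled out.
RET = define_instruction(Namespace="Toy", AsmString="ret",
                         isReturn=1, isTerminator=1)
```

Every field that RET does not mention keeps its default, and mayLoad, mayStore and hasSideEffects stay unset until something assigns or infers them.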

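TableGen handles a class declaration (and likewise def, defm and multiclass declarations) by creating a Record object and storing it in the parser's Records container. The difference is that the Record objects created by def and defm declarations can be referenced (the Records container holds two sub-containers, one for class declarations and one for def declarations). For the Instruction declaration above, TableGen does not generate a corresponding C++ class declaration; the corresponding C++ class is TargetInstrInfo, which is written by hand and declared in Target/TargetInstrInfo.h.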
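The two sub-containers can be pictured with a minimal Python model. This is a sketch under assumptions, not LLVM's C++ code: the Record and Records classes below and their method names are invented for illustration, and the stored field values are reduced to a single Size entry.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One parsed declaration: a name plus its field values."""
    name: str
    fields: dict = field(default_factory=dict)


@dataclass
class Records:
    """Hypothetical stand-in for the parser's Records container."""
    classes: dict = field(default_factory=dict)  # records created by class
    defs: dict = field(default_factory=dict)     # records created by def/defm

    def add_class(self, record):
        self.classes[record.name] = record

    def add_def(self, record):
        self.defs[record.name] = record


records = Records()
records.add_class(Record("Instruction", {"Size": 0}))
records.add_def(Record("i32", {"Size": 32}))
```

Looking up "Instruction" succeeds only in records.classes and looking up "i32" succeeds only in records.defs, which is the separation the paragraph above describes.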

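v7.0 added these fields:

bit hasNoSchedulingInfo = 0;

Indicates whether the instruction is not expected to be queried for scheduling latencies. If so, it needs no scheduling information, even under a complete scheduling model.

bit hasCompleteDecoder = 1;

Whether the instruction's decoder method can fully determine if the instruction is valid.

string AsmVariantName = "";

The name of the assembler variant used for this instruction. If specified, the instruction appears in the MatchTable only for this variant. If not specified, the assembler variant is determined from AsmString.

bit FastISelShouldIgnore = 0;

Whether FastISel should ignore this instruction. Some ISAs have several instructions that map to the same ISD opcode, value-type operands and instruction-selection predicates. FastISel cannot handle that situation, but SelectionDAG can.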

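2.2.1. TableGen's Built-in Types

In the Instruction declaration above, dag, bit, string, int and list can all be regarded as reserved words of the TD language; they may appear only inside declarations such as class, multiclass, def, defm and foreach. They are TD's built-in types.

2.2.1.1. Dag

dag is a rather special type: it represents a DAG (directed acyclic (sub)graph) structure in the program's intermediate-representation tree. Its instances are therefore recursive constructs, with this syntax: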

"(" DagArg [DagArgList] ")"

DagArgList ::= DagArg ("," DagArg)*

DagArg ::= Value [":" TokVarName] | TokVarName

The DagArg in the first rule is called the operator of the dag. The Value in the third rule may itself be a dag. Here is a real example from LLVM:

(set VR128:$dst, (v2i64 (scalar_to_vector (i64 (bitconvert (x86mmx VR64:$src))))))

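This dag value has 6 levels of nesting. The operator of the first level is "set", and it has two values. In the first value, "VR128:$dst", "VR128" is the value part and "$dst" is the symbolic name of that value (a symbolic name must begin with "$"), which stands for the value in context. The second value is itself a dag whose operator is "v2i64"; the operand of "v2i64" is again a dag whose operator is "scalar_to_vector", whose operand is a dag with "i64" as its operator, and so on.

This dag value describes a conversion: a 64-bit scalar source operand held in an MMX register is first converted to a 64-bit integer, then to a 2✕i64 vector, and finally stored into a 128-bit destination register.

The operator of a dag is one of three things: a simple def (for example "outs", "ins" and "set", which have special meaning to TableGen); a definition derived from SDNode, describing an operation (for example "scalar_to_vector" and "bitconvert" above); or a definition derived from ValueType, describing the type of a value (for example "v2i64", "i64" and "x86mmx" above). "VR128" and "VR64" in the example are register classes used as operand values, not operators.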
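To check the nesting count by hand, the grammar above can be turned into a short recursive-descent parser. The Python below is a sketch, not TableGen's own parser: the function names and the dictionary layout of the tree are assumptions made for this illustration, values are restricted to plain identifiers and nested dags, and well-formed input is assumed.

```python
import re

# Tokens: parentheses, commas, colons, $names and plain identifiers.
_TOKEN = re.compile(r"\s*([(),:]|\$?[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Split dag source text into a flat list of tokens."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_arg(tokens, pos):
    """Parse Value [":" $name] | $name, where Value may be a nested dag."""
    if tokens[pos] == "(":
        value, pos = parse_dag(tokens, pos)
    else:
        value, pos = tokens[pos], pos + 1
    if pos < len(tokens) and tokens[pos] == ":":
        return {"value": value, "name": tokens[pos + 1]}, pos + 2
    return {"value": value, "name": None}, pos


def parse_dag(tokens, pos=0):
    """Parse "(" operator args ")" and return (node, next position).

    Assumes well-formed input; truncated text raises IndexError.
    """
    if tokens[pos] != "(":
        raise ValueError("a dag value must start with '('")
    operator, pos = parse_arg(tokens, pos + 1)
    args = []
    while tokens[pos] != ")":
        if args:  # arguments after the first are separated by commas
            if tokens[pos] != ",":
                raise ValueError("expected ',' between arguments")
            pos += 1
        arg, pos = parse_arg(tokens, pos)
        args.append(arg)
    return {"op": operator, "args": args}, pos + 1


def dag_depth(node):
    """Nesting depth: one level for this dag plus its deepest nested dag."""
    nested = [a["value"] for a in node["args"] if isinstance(a["value"], dict)]
    return 1 + max((dag_depth(child) for child in nested), default=0)


EXAMPLE = ("(set VR128:$dst, (v2i64 (scalar_to_vector "
           "(i64 (bitconvert (x86mmx VR64:$src))))))")
TREE, END = parse_dag(tokenize(EXAMPLE))
```

For the example, TREE["op"]["value"] is "set", the first argument carries the name "$dst", and dag_depth(TREE) gives the 6 levels counted in the text.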

2.2.1.2. List

As the name suggests, this represents a list. A list value has this form:

"[" ValueList "]" ["<" Type ">"]

ValueList ::= [ValueListNE]

ValueListNE ::= Value ("," Value)*

A list value may be empty, i.e. "[]". Here is a real example from LLVM:

[llvm_ptr_ty, llvm_ptr_ty]

Note that in the TD language the "[{…}]" construct is not a list value; it denotes an embedded code fragment.
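The distinction between the two bracketed forms can be sketched in a few lines of Python. The function read_bracket_value is hypothetical and deliberately limited: it handles only flat lists of simple names and ignores the optional "<" Type ">" suffix from the grammar.

```python
def read_bracket_value(text):
    """Classify a bracketed TD value and return (kind, payload)."""
    text = text.strip()
    if text.startswith("[{") and text.endswith("}]"):
        # Code fragment: the payload is the raw text between [{ and }].
        return "code", text[2:-2]
    if text.startswith("[") and text.endswith("]"):
        body = text[1:-1].strip()
        # Only flat lists of simple names are handled in this sketch.
        items = [item.strip() for item in body.split(",")] if body else []
        return "list", items
    raise ValueError(f"not a bracketed value: {text!r}")
```

Called on the example above it returns a two-element list, while the same brackets wrapped around braces yield the fragment text unchanged.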

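2.2.1.3. String

A string is essentially a C++ string literal.

2.2.1.4. Bit, int

bit represents a single bit, while int is a 64-bit integer.

2.2.1.5. Bits

bits represents a fixed number of bits; its length must be given as a parameter, as in "bits<64>" above.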
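As a rough picture of what a fixed-width bits value holds, the Python sketch below converts between an integer and a list of single bits. The helper names to_bits and from_bits are assumptions for illustration, and storing the least significant bit first is a choice made by this sketch.

```python
def to_bits(value, width):
    """Return the bits of value as a list, least significant bit first.

    Raises ValueError when value does not fit in `width` bits.
    """
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in bits<{width}>")
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild the integer from a least-significant-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


# A 64-entry list of zeros, matching the default "bits<64> TSFlags = 0".
TS_FLAGS = to_bits(0, 64)
```

A value that needs more bits than the declared width is rejected, which mirrors the fact that the width is part of the type.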
