
  • Advantages of high-level languages over assembly language
  • Compiler optimization options


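  • Higher development productivity. The IDE and the compiler flag your errors, and thanks to compiler optimization, the execution-speed penalty of a high-level language is small.
  • Lower probability of errors
  • Portability across platforms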

cc is short for "C compiler".


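-Og keeps the structure of the machine code close to that of the source code and avoids heavy code transformations, so it is typically used for teaching. In practice, higher optimization levels such as -O1 or -O2 are generally used.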

At any given time, only limited subranges of virtual addresses are considered valid. For example, x86-64 virtual addresses are represented by 64-bit words. In current implementations of these machines, the upper 16 bits must be set to zero, and so an address can potentially specify a byte over a range of 2^48 bytes, or 256 terabytes. The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.

$ gcc -Og -S mstore.c
$ gcc -Og -c mstore.c
$ objdump -d mstore.o

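The link step requires a main function. A .c file that contains only non-main functions can still be compiled to .s assembly and to .o object code. The linked executable is much larger than the .o file, because it contains not just the machine code for the procedures we provided but also code used to start and terminate the program as well as to interact with the operating system.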

Differences between the disassembly of the linked executable and that of the .o file:
1. The code has been shifted to a different range of addresses.
2. Function calls have been matched with the locations of the executable code for those functions (that is, each call instruction now specifies the address of the called function).
3. nop instructions have been inserted to grow the code for the function to 16 bytes, enabling a better placement of the next block of code in terms of memory system performance.





Recall that when performing a cast that involves both a size change and a change of “signedness” in C, the operation should change the size first (Section 2.2.6).
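
A minimal sketch of this rule, using fixed-width types so the sizes are explicit. The function names are made up for illustration. The first function relies on the conversion C performs by default; the second spells out the opposite order to show that it gives a different result for negative inputs.

```c
#include <stdint.h>

/* 16-bit signed to 32-bit unsigned. The result is the same as changing the
   size first (sign extension to 32 bits) and the signedness second. */
uint32_t widen_then_unsign(int16_t sx) {
    return (uint32_t) sx;              /* same as (uint32_t)(int32_t) sx */
}

/* The opposite order, written out explicitly: signedness first, then size.
   This zero-extends, so negative inputs give a different answer. */
uint32_t unsign_then_widen(int16_t sx) {
    return (uint32_t)(uint16_t) sx;
}
```

For sx = -1 the first function yields 0xFFFFFFFF and the second yields 0x0000FFFF.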

Since the stack is contained in the same memory as the program code and other forms of program data, programs can access arbitrary positions within the stack using the standard memory addressing methods. (So this is a stack with random access; compare with the STL stack, which exposes only the top element.)

In addition, LEA can be used to compactly describe common arithmetic operations.
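
As a rough illustration, the C function below (the name scale is only an example) computes an expression made of additions and small constant multiples. A compiler may translate such code into a few leaq instructions instead of separate shifts, multiplies, and adds, although the actual instruction selection depends on the compiler and its flags.

```c
/* Computes x + 4*y + 12*z. A compiler can build this from leaq forms of
   base + index*scale, e.g. x + 4*y in one step, 3*z in another, and then
   (x + 4*y) + 4*(3*z). */
long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}
```

Compiling with gcc -Og -S and reading the .s output shows what a given compiler actually chose.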

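With a variable shift amount in register %cl, only the low-order m bits of %cl determine the shift for a w-bit operand, where 2^m = w. The higher-order bits are ignored. So, for example, when register %cl has hexadecimal value 0xFF, then instruction salb would shift by 7, while salw would shift by 15, sall would shift by 31, and salq would shift by 63.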
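
The effective shift amount can be modeled in C as a mask of the count. This is a sketch of the arithmetic only; effective_shift is a hypothetical helper, not something the hardware or the compiler exposes.

```c
/* Shift amount actually used for an operand that is width_bits wide
   (8, 16, 32, or 64). Only the low-order log2(width_bits) bits of the
   count matter, which equals count mod width_bits. */
unsigned effective_shift(unsigned char cl, unsigned width_bits) {
    return cl & (width_bits - 1);
}
```

With cl = 0xFF this gives 7, 15, 31, and 63 for widths 8, 16, 32, and 64.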

We see that most of the instructions shown in Figure 3.10 can be used for either unsigned or two’s-complement arithmetic. This is one of the features that makes two’s-complement arithmetic the preferred way to implement signed integer arithmetic.
Unsigned and signed operations do use different versions of the right shift, division, and multiplication instructions, and different combinations of condition codes.

%rdx and %rax are combined to form a 128-bit oct word for multiplication and division:
Multiplying two 64-bit signed or unsigned integers can yield a product that requires 128 bits to represent.
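
A sketch of the full 128-bit product in C. It assumes a compiler that provides unsigned __int128 (a GCC/Clang extension on 64-bit targets, not standard C); the function names are illustrative.

```c
#include <stdint.h>

/* unsigned __int128 is a compiler extension, assumed available here. */
typedef unsigned __int128 uint128_t;

/* High 64 bits of the 128-bit product: the half that mulq leaves in %rdx. */
uint64_t umul_hi(uint64_t x, uint64_t y) {
    return (uint64_t) (((uint128_t) x * y) >> 64);
}

/* Low 64 bits of the 128-bit product: the half that mulq leaves in %rax. */
uint64_t umul_lo(uint64_t x, uint64_t y) {
    return (uint64_t) ((uint128_t) x * y);
}
```

For example, (2^64 - 1) * 2 = 2^65 - 2, so the high half is 1 and the low half is 2^64 - 2.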

In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

The label in a jmp label instruction (a direct jump) is resolved when the .o file is generated:
In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.
A jump can also be indirect, with the target read from a register or a memory location (jmp *reg or jmp *mem).
Conditional jumps can only be direct.

Explanation of rep ret in the assembly code:
AMD recommends using the combination of rep followed by ret to avoid making the ret instruction the destination of a conditional jump instruction. According to AMD, their processors cannot properly predict the destination of a ret instruction when it is reached from a jump instruction. The rep instruction serves as a form of no-operation here, and so inserting it as the jump destination does not change behavior of the code, except to make it faster on AMD processors.

With conditional moves, the flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full. (P243)
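
A small sketch of code written in the conditional-move style: both outcomes are computed first and one is selected afterwards. Whether a compiler actually emits a conditional move (such as cmovge) for this function depends on the compiler and the optimization level.

```c
/* Absolute difference of x and y. Both candidate results are computed
   unconditionally; the final selection does not require a branch. */
long absdiff(long x, long y) {
    long if_less = y - x;      /* result when x < y  */
    long if_ge   = x - y;      /* result when x >= y */
    return (x >= y) ? if_ge : if_less;
}
```

This form is only safe when evaluating both candidates has no side effects and cannot fault.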


Switch statements are particularly useful when dealing with tests where there can be a large number of possible outcomes. Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table.
The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases.
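
The idea can be imitated by hand in C with an array of function pointers indexed by the case value. This is an analogy, not what the compiler literally emits (a compiler's table holds code addresses inside one function); the handlers and names are hypothetical.

```c
/* Hypothetical handlers, one per case value 0..3. */
long case_add(long x)    { return x + 10; }
long case_double(long x) { return x * 2; }
long case_negate(long x) { return -x; }
long case_square(long x) { return x * x; }

typedef long (*handler_t)(long);

/* Indexed by the case value, like the table of code addresses a compiler
   builds for a dense switch. */
static const handler_t jump_table[] = {
    case_add, case_double, case_negate, case_square
};

/* One bounds check plus one indexed indirect call, regardless of how many
   entries the table has. Out-of-range values take the default. */
long dispatch(unsigned long n, long x) {
    if (n >= sizeof jump_table / sizeof jump_table[0])
        return 0;                      /* default case */
    return jump_table[n](x);
}
```

Adding more cases grows the table but not the number of steps needed to reach a handler.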

As P calls Q, control and data information are added to the end of the stack.

Many procedures have six or fewer arguments, and so all of their parameters can be passed in registers.

The call instruction pushes an address A onto the stack and sets the PC to the beginning of Q. A is the return address: the address of the instruction immediately following the call. The counterpart instruction ret pops an address A off the stack and sets the PC to A.




For data type T and integer constant N, consider a declaration of the form T A[N];
Let us denote the starting location as x_A. The declaration has two effects. First, it allocates a contiguous region of L · N bytes in memory, where L is the size (in bytes) of data type T. Second, it introduces an identifier A that can be used as a pointer to the beginning of the array. The value of this pointer will be x_A.
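
Both effects can be checked from C. The sketch below picks T = long and N = 8 arbitrarily; the helper names are illustrative.

```c
#include <stddef.h>

enum { N = 8 };
static long A[N];                      /* T A[N] with T = long */

/* Total size of the array: L * N bytes, where L = sizeof(long). */
size_t array_bytes(void) {
    return sizeof A;
}

/* Byte distance from the start of the array to element i: L * i,
   so element i lives at address x_A + L * i. */
ptrdiff_t element_offset(size_t i) {
    return (char *) &A[i] - (char *) A;
}
```

For instance, element_offset(3) equals 3 * sizeof(long).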


The struct data type constructor is the closest thing C provides to the objects of C++ and Java. The objects of C++ and Java are more elaborate than structures in C, in that they also associate a set of methods with an object that can be invoked to perform computation.

The selection of the different fields of a structure is handled completely at compile time. The machine code contains no information about the field declarations or the names of the fields.
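
To make this concrete, the sketch below reads a field "by hand" as base address plus a constant byte offset, which is roughly what the generated machine code does. struct rec and read_j are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical structure for illustration. */
struct rec {
    int i;
    int j;
    int a[2];
    int *p;
};

/* Reads r->j without naming the field in the access itself: the offset
   is a compile-time constant obtained from offsetof. */
int read_j(const struct rec *r) {
    const char *base = (const char *) r;
    return *(const int *) (base + offsetof(struct rec, j));
}
```

The field name j appears only in the offsetof expression, which the compiler folds to a number.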

Unions can be useful in several contexts. However, they can also lead to nasty bugs, since they bypass the safety provided by the C type system. One application is when we know in advance that the use of two different fields in a data structure will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
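
A minimal sketch of the space saving, assuming a value that is either an integer id or a floating-point weight but never both. The type and function names are invented for the example.

```c
#include <stddef.h>

/* Both fields present at once. */
struct both_s {
    long id;
    double weight;
};

/* Only one field is meaningful at a time; the fields share storage. */
union either_u {
    long id;
    double weight;
};

/* Bytes saved per object by using the union instead of the struct. */
size_t bytes_saved(void) {
    return sizeof(struct both_s) - sizeof(union either_u);
}
```

The union gives no indication of which field is currently valid, so real code usually pairs it with a separate tag field.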


Casting from one type of pointer to another changes its type but not its value. Pointers can also point to functions. The value of a function pointer is the address of the first instruction in the machine-code representation of the function.
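
Both points can be sketched in a few lines of C; triple, apply, and cast_keeps_address are illustrative names.

```c
#include <stdint.h>

int triple(int x) { return 3 * x; }

/* Calls a function through a pointer. fp holds the address at which the
   code of the target function begins. */
int apply(int (*fp)(int), int x) {
    return fp(x);
}

/* Casting long * to char * keeps the same address; only the type, and
   therefore the scaling of pointer arithmetic, changes. Returns 1 if the
   two pointers hold the same address. */
int cast_keeps_address(void) {
    long v = 0;
    long *lp = &v;
    char *cp = (char *) lp;
    return (uintptr_t) lp == (uintptr_t) cp;
}
```

After the cast, cp + 1 advances by 1 byte while lp + 1 advances by sizeof(long) bytes.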


Thus, even if many machines are running identical code, they would all be using different stack addresses. This is implemented by allocating a random amount of space between 0 and n bytes on the stack at the start of a program, for example, by using the allocation function alloca, which allocates space for a specified number of bytes on the stack.
If we set up a 256-byte nop sled, then the randomization over n = 2^23 can be cracked by enumerating 2^23 / 2^8 = 2^15 = 32,768 starting addresses, which is entirely feasible for a determined attacker. For the 64-bit case, trying to enumerate 2^24 = 16,777,216 is a bit more daunting. We can see that stack randomization and other aspects of ASLR can increase the effort required to successfully attack a system, and therefore greatly reduce the rate at which a virus or worm can spread, but it cannot provide a complete safeguard.

Buffer canary:
Stack protection does a good job of preventing a buffer overflow attack from corrupting state stored on the program stack. It incurs only a small performance penalty, especially because gcc only inserts it when there is a local buffer of type char in the function.

Some types of programs require the ability to dynamically generate and execute code. For example, “just-in-time” compilation techniques dynamically generate code for programs written in interpreted languages, such as Java, to improve execution performance. Whether or not the run-time system can restrict the executable code to just that part generated by the compiler in creating the original program depends on the language and the operating system.

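For a variable-size stack frame (an array whose size in brackets is a variable), %rbp is needed as a frame pointer to locate the fixed-size local variables. (If they were addressed relative to %rsp, the offsets would depend on the variable in the brackets, whereas offsets from %rbp are guaranteed to be constant.)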
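
A sketch of a function with a variable-size frame, assuming a compiler that supports C99 variable-length arrays (gcc does). The array p has n elements, so its size is known only at run time, while i is a fixed-size local whose address is taken and which therefore lives on the stack.

```c
/* p is a variable-length array of n pointers (n must be at least 1 and
   idx must be in 0..n-1). i is a fixed-size local that must be
   addressable, so it needs a stack slot with a known offset. */
long vframe(long n, long idx, long *q) {
    long i;
    long *p[n];
    p[0] = &i;
    for (i = 1; i < n; i++)
        p[i] = q;
    return *p[idx];
}
```

Compiling this with gcc -Og -S typically shows %rbp being set up as a frame pointer, though the exact code depends on the compiler version.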

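Single instruction, multiple data, or SIMD (P322)
The media registers are called MM registers (MMX); the extended versions are the XMM (SSE) and YMM (AVX) registers, which are used to hold floating-point data.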

Consider operations like y = a + r, where y and a are vectors and r is a real scalar: the operation adds the scalar r to every element of a.
C++, on the other hand, as well as other higher-level languages, supports operations on user-defined types, which are by definition not scalar, or on other types that have no immediate support from hardware. (Built-in types are generally scalar; user-defined types are generally compound.)
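
In plain C, the operation y = a + r has to be written as an explicit loop over scalars. The sketch below uses a hypothetical function name; a vectorizing compiler may turn such a loop into SIMD instructions, but that is not guaranteed.

```c
#include <stddef.h>

/* y[i] = a[i] + r for each of the n elements: the scalar r is added to
   every element of the vector a. */
void add_scalar(double *y, const double *a, double r, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a[i] + r;
}
```

Each iteration is an ordinary scalar addition on a built-in type.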

The code optimization guidelines recommend that 32-bit memory data satisfy a 4-byte alignment and that 64-bit data satisfy an 8-byte alignment.
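
Alignment can be checked from C by looking at the address modulo the required boundary. The sketch below assumes a C11 compiler (for _Alignas); the helper names are made up for the example.

```c
#include <stdint.h>

/* Returns 1 if addr is a multiple of k, that is, k-byte aligned. */
int is_aligned(uintptr_t addr, uintptr_t k) {
    return addr % k == 0;
}

/* Requests 8-byte alignment for a 64-bit value explicitly and checks
   that the address the compiler chose satisfies it. */
int double_is_8_aligned(void) {
    _Alignas(8) double d = 0.0;
    return is_aligned((uintptr_t) &d, 8);
}
```

On x86-64 the compiler already places a double on an 8-byte boundary, so the _Alignas here only makes the requirement explicit.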

Up to eight floating-point arguments can be passed in XMM registers %xmm0–%xmm7. These registers are used in the order the arguments are listed. Additional floating-point arguments can be passed on the stack.

A function that returns a floating-point value does so in register %xmm0.

All XMM registers are caller saved. The callee may overwrite any of these registers without first saving it.
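
A sketch of how these rules apply to a function with mixed argument types, assuming the System V x86-64 calling convention used on Linux. The register assignments in the comments follow that convention; mixed_args is a hypothetical function.

```c
/* Integer and floating-point arguments are numbered independently:
     a -> %xmm0 (first floating-point argument)
     x -> %edi  (first integer argument)
     b -> %xmm1 (second floating-point argument)
     y -> %rsi  (second integer argument)
   The double result is returned in %xmm0. */
double mixed_args(double a, int x, float b, long y) {
    return a * x + b / y;
}
```

Because all XMM registers are caller saved, a caller that still needs a value held in %xmm0 or %xmm1 after the call has to save it first.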





