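CSAPP: Arch Lab Solution Report
Preparation
Download the handout from the official site and unpack it.
Run tar xvf archlab-handout.tar
to extract the archive. It contains README, Makefile, sim.tar, archlab.ps, archlab.pdf, and simguide.pdf.
You may then run into the following problems.
If you get a "cannot locate" error, the package mirror is at fault. Search online for a working mirror (for example the Aliyun one) and replace every URL in /etc/apt/sources.list with it.
After switching mirrors, remember to run sudo apt-get update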
Error:
/usr/bin/ld: cannot find -lfl
Fix:
sudo apt-get install flex

Error:
/usr/bin/ld: cannot find -ltk
/usr/bin/ld: cannot find -ltcl
Fix:
sudo apt-get install tk8.5
sudo apt-get install tcl8.5

Also edit the Makefile of your own lab files. The modified part looks like this:

# Comment this out if you don't have Tcl/Tk on your system
#GUIMODE=-DHAS_GUI

# Modify the following line so that gcc can find the libtcl.so and
# libtk.so libraries on your system. You may need to use the -L option
# to tell gcc which directory to look in. Comment this out if you
# don't have Tcl/Tk.
# (changed to the following)
TKLIBS=-L/usr/lib -ltk8.5 -ltcl8.5

# Modify the following line so that gcc can find the tcl.h and tk.h
# header files on your system. Comment this out if you don't have
# Tcl/Tk.
TKINC=-isystem /usr/include/tcl8.5

Finally run make clean; make again and it should build. If the same problem shows up later, apply the same fix. One of the later parts also requires deleting the line containing GUI from the Makefile.
TESTA
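Write Y86 assembly by hand. The functions to implement are in example.c. I wanted to be lazy: disassemble the compiled code and convert the disassembly to Y86. The disassembled code turned out to be even more trouble, so I wrote it by hand after all.
I modeled it on the large worked example in Chapter 4 of the book.
Create a new file: vim sum_list.ys
All three programs leave their result in %rax. If %rax does not change, the code has a bug. In all three cases %rax should be 0xcba.
The commands to assemble and run are as follows (substitute the name of your own .ys file):
unix > ./yas sum_list.ys
unix > ./yis sum_list.yo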
# sum_list.ys example.c
# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
ele1:
    .quad 0x00a
    .quad ele2
ele2:
    .quad 0x0b0
    .quad ele3
ele3:
    .quad 0xc00
    .quad 0

main:
    irmovq ele1,%rdi
    call sum_list
    ret

sum_list:
    xorq %rax,%rax          # rax = 0
    jmp test
loop:
    mrmovq (%rdi),%r10
    addq %r10,%rax
    mrmovq 8(%rdi),%rdi
test:
    andq %rdi,%rdi
    jne loop
    ret

# Stack starts here and grows to lower addresses
    .pos 0x100
stack:
For the recursive version, write the recursion directly: save the value on the stack, then recurse.
# sum_list.ys example.c (recursive version)
# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
ele1:
    .quad 0x00a
    .quad ele2
ele2:
    .quad 0x0b0
    .quad ele3
ele3:
    .quad 0xc00
    .quad 0

main:
    irmovq ele1,%rdi
    call sum_list
    ret

sum_list:
    xorq %rax,%rax          # rax = 0
    andq %rdi,%rdi
    je return
    mrmovq (%rdi),%r10      # long val = ls->val
    pushq %r10
    mrmovq 8(%rdi),%rdi
    call sum_list
    popq %rbx
    addq %rbx,%rax
    ret
return:
    ret

# Stack starts here and grows to lower addresses
    .pos 0x1000
stack:
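As a quick cross-check of the expected 0xcba result, here is a minimal Python sketch of the same three-node list, with an iterative and a recursive sum. The `Node` class and the function names are illustrative stand-ins for the C code in example.c, not part of the lab files.

```python
class Node:
    """One list element: a value and a link to the next element (hypothetical helper)."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

def sum_list(node):
    """Iterative sum, mirroring the loop/test structure of the first .ys program."""
    total = 0
    while node is not None:
        total += node.val
        node = node.next
    return total

def rsum_list(node):
    """Recursive sum: remember val, recurse on the rest, then add val back."""
    if node is None:
        return 0
    val = node.val
    return val + rsum_list(node.next)

# Same data as ele1 -> ele2 -> ele3 in the assembly listings
ele3 = Node(0xc00)
ele2 = Node(0x0b0, ele3)
ele1 = Node(0x00a, ele2)

print(hex(sum_list(ele1)), hex(rsum_list(ele1)))  # 0xcba 0xcba
```

If yis reports a different %rax for either program, comparing against this model one node at a time is an easy way to find where the assembly diverges.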
# copy_block: copy src to dest and return the XOR checksum in %rax
# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

    .align 8
# Source block
src:
    .quad 0x00a
    .quad 0x0b0
    .quad 0xc00
# Destination block
dest:
    .quad 0x111
    .quad 0x222
    .quad 0x333

main:
    xorq %rax,%rax          # long result = 0
    irmovq src,%rdi
    irmovq dest,%rsi
    irmovq $1,%r9
    irmovq $3,%r8
    irmovq $8,%r11
    andq %r8,%r8
    jmp test
loop:
    mrmovq (%rdi),%rcx
    addq %r11,%rdi
    rmmovq %rcx,(%rsi)
    addq %r11,%rsi
    xorq %rcx,%rax
    subq %r9,%r8
test:
    jne loop
    ret

# Stack starts here and grows to lower addresses
    .pos 0x100
stack:
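The copy program can be checked the same way. The sketch below is a Python stand-in for copy_block (the function and variable names are illustrative): it copies the block and XORs the values together, so with the data above the checksum should again come out as 0xcba.

```python
def copy_block(src, dest, length):
    """Copy `length` words from src to dest and return the XOR of the copied words."""
    result = 0
    for i in range(length):
        val = src[i]
        dest[i] = val
        result ^= val
    return result

# Same data as the src and dest blocks in the assembly listing
src = [0x00a, 0x0b0, 0xc00]
dest = [0x111, 0x222, 0x333]
checksum = copy_block(src, dest, 3)

print(hex(checksum), [hex(v) for v in dest])  # 0xcba ['0xa', '0xb0', '0xc00']
```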
TESTB
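Following Chapter 4's description of how an instruction is processed stage by stage, combine the stage tables for OPq and irmovq.
The resulting stages for iaddq are:

Stage        iaddq V, rB
Fetch        icode:ifun <-- M1[PC]
             rA:rB <-- M1[PC+1]
             valC <-- M8[PC+2]
             valP <-- PC+10
Decode       valB <-- R[rB]
Execute      valE <-- valB + valC
             set CC
Memory       (none)
Write back   R[rB] <-- valE
PC update    PC <-- valP

We add "IIADDQ" to sim/seq/seq-full.hcl. For each stage, use the book's description to decide which signals the new instruction has to join.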
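To make the stage table concrete, here is a small Python sketch of what one iaddq does to the register file, the condition codes, and the PC. It is an illustrative model only: the register-file dictionary, the `iaddq` function, and its return format are assumptions made for this example, not simulator code.

```python
MASK = (1 << 64) - 1  # Y86-64 registers are 64 bits wide

def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK
    return x - (1 << 64) if x & (1 << 63) else x

def iaddq(regs, pc, rB, valC):
    """Model `iaddq V, rB`: returns (condition codes, next PC)."""
    valP = pc + 10                      # fetch: 1 byte icode:ifun, 1 byte rA:rB, 8 bytes valC
    valB = regs[rB]                     # decode
    valE = (valB + valC) & MASK         # execute
    a, b, e = to_signed(valB), to_signed(valC), to_signed(valE)
    cc = {
        "ZF": e == 0,
        "SF": e < 0,
        # signed overflow: operands share a sign and the result's sign differs
        "OF": (a < 0) == (b < 0) and (e < 0) != (a < 0),
    }
    regs[rB] = valE                     # write back
    return cc, valP                     # PC update uses valP

regs = {"%rdx": 6}
cc, next_pc = iaddq(regs, 0x20, "%rdx", -6)
print(regs["%rdx"], cc, hex(next_pc))
```

The `iaddq $-6,%rdx` followed by `jl`/`jge` pattern used later in Part C relies on exactly these condition codes.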
#/* $begin seq-all-hcl */
####################################################################
# HCL Description of Control for Single Cycle Y86-64 Processor SEQ #
# Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2010 #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
# C Include's. Don't alter these #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'word_t gen_pc(){return 0;}'
quote 'int main(int argc, char *argv[])'
quote ' {plusmode=0;return sim_main(argc,argv);}'

####################################################################
# Declarations. Do not change/remove/delete any of these #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 'I_NOP'
wordsig IHALT 'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ 'I_ALU'
wordsig IJXX 'I_JMP'
wordsig ICALL 'I_CALL'
wordsig IRET 'I_RET'
wordsig IPUSHQ 'I_PUSHQ'
wordsig IPOPQ 'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ 'I_IADDQ'

##### Symbolic representations of Y86-64 function codes #####
wordsig FNONE 'F_NONE' # Default function code

##### Symbolic representation of Y86-64 Registers referenced explicitly #####
wordsig RRSP 'REG_RSP' # Stack Pointer
wordsig RNONE 'REG_NONE' # Special value indicating "no register"

##### ALU Functions referenced explicitly #####
wordsig ALUADD 'A_ADD' # ALU should add its arguments

##### Possible instruction status values #####
wordsig SAOK 'STAT_AOK' # Normal execution
wordsig SADR 'STAT_ADR' # Invalid memory address
wordsig SINS 'STAT_INS' # Invalid instruction
wordsig SHLT 'STAT_HLT' # Halt instruction encountered

##### Signals that can be referenced by control logic ####################

##### Fetch stage inputs #####
wordsig pc 'pc' # Program counter
##### Fetch stage computations #####
wordsig imem_icode 'imem_icode' # icode field from instruction memory
wordsig imem_ifun 'imem_ifun' # ifun field from instruction memory
wordsig icode 'icode' # Instruction control code
wordsig ifun 'ifun' # Instruction function
wordsig rA 'ra' # rA field from instruction
wordsig rB 'rb' # rB field from instruction
wordsig valC 'valc' # Constant from instruction
wordsig valP 'valp' # Address of following instruction
boolsig imem_error 'imem_error' # Error signal from instruction memory
boolsig instr_valid 'instr_valid' # Is fetched instruction valid?

##### Decode stage computations #####
wordsig valA 'vala' # Value from register A port
wordsig valB 'valb' # Value from register B port

##### Execute stage computations #####
wordsig valE 'vale' # Value computed by ALU
boolsig Cnd 'cond' # Branch test

##### Memory stage computations #####
wordsig valM 'valm' # Value read from memory
boolsig dmem_error 'dmem_error' # Error signal from data memory

####################################################################
# Control Signal Definitions. #
####################################################################

################ Fetch Stage ###################################

# Determine instruction code
word icode = [
    imem_error: INOP;
    1: imem_icode; # Default: get from instruction memory
];

# Determine instruction function
word ifun = [
    imem_error: FNONE;
    1: imem_ifun; # Default: get from instruction memory
];

bool instr_valid = icode in
    { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
      IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };

# Does fetched instruction require a regid byte?
bool need_regids =
    icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
               IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };

################ Decode Stage ###################################

## What register should be used as the A source?
word srcA = [
    icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : rA;
    icode in { IPOPQ, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word srcB = [
    icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the E destination?
word dstE = [
    icode in { IRRMOVQ } && Cnd : rB;
    icode in { IIRMOVQ, IOPQ, IIADDQ } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't write any register
];

## What register should be used as the M destination?
word dstM = [
    icode in { IMRMOVQ, IPOPQ } : rA;
    1 : RNONE; # Don't write any register
];

################ Execute Stage ###################################

## Select input A to ALU
word aluA = [
    icode in { IRRMOVQ, IOPQ } : valA;
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : valC;
    icode in { ICALL, IPUSHQ } : -8;
    icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
    icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
               IPUSHQ, IRET, IPOPQ, IIADDQ } : valB;
    icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
    icode == IOPQ : ifun;
    1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = icode in { IOPQ, IIADDQ };

################ Memory Stage ###################################

## Set read control signal
bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };

## Select memory address
word mem_addr = [
    icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
    icode in { IPOPQ, IRET } : valA;
    # Other instructions don't need address
];

## Select memory input data
word mem_data = [
    # Value from register
    icode in { IRMMOVQ, IPUSHQ } : valA;
    # Return PC
    icode == ICALL : valP;
    # Default: Don't write anything
];

## Determine instruction status
word Stat = [
    imem_error || dmem_error : SADR;
    !instr_valid: SINS;
    icode == IHALT : SHLT;
    1 : SAOK;
];

################ Program Counter Update ############################

## What address should instruction be fetched at
word new_pc = [
    # Call. Use instruction constant
    icode == ICALL : valC;
    # Taken branch. Use instruction constant
    icode == IJXX && Cnd : valC;
    # Completion of RET instruction. Use value from stack
    icode == IRET : valM;
    # Default: Use incremented PC
    1 : valP;
];
#/* $end seq-all-hcl */
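One way to sanity-check the fetch-stage changes is to see how need_regids and need_valC determine the instruction length, and therefore valP. The sketch below restates those two HCL sets in Python; the lowercase instruction names and the `instr_length` helper are illustrative, not part of the simulator.

```python
# Instruction classes that carry a register-specifier byte (mirrors need_regids)
NEED_REGIDS = {"rrmovq", "opq", "pushq", "popq",
               "irmovq", "rmmovq", "mrmovq", "iaddq"}

# Instruction classes that carry an 8-byte constant (mirrors need_valC)
NEED_VALC = {"irmovq", "rmmovq", "mrmovq", "jxx", "call", "iaddq"}

def instr_length(name):
    """Bytes occupied by one instruction: icode:ifun byte, optional regid byte, optional constant."""
    length = 1
    if name in NEED_REGIDS:
        length += 1
    if name in NEED_VALC:
        length += 8
    return length

for name in ("iaddq", "irmovq", "opq", "jxx", "ret"):
    print(name, instr_length(name))
```

With iaddq in both sets its length is 10 bytes, which matches valP <-- PC+10 in the stage table. Leaving it out of either set would make the simulator fetch the next instruction from the wrong address.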
TESTC
This last part left me a bit speechless. First, carry the iaddq instruction from above over into this part's HCL by modifying pipe-full.hcl.
#/* $begin pipe-all-hcl */
####################################################################
# HCL Description of Control for Pipelined Y86-64 Processor #
# Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2014 #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
# C Include's. Don't alter these #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "pipeline.h"'
quote '#include "stages.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'

####################################################################
# Declarations. Do not change/remove/delete any of these #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP 'I_NOP'
wordsig IHALT 'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ 'I_ALU'
wordsig IJXX 'I_JMP'
wordsig ICALL 'I_CALL'
wordsig IRET 'I_RET'
wordsig IPUSHQ 'I_PUSHQ'
wordsig IPOPQ 'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ 'I_IADDQ'

##### Symbolic representations of Y86-64 function codes #####
wordsig FNONE 'F_NONE' # Default function code

##### Symbolic representation of Y86-64 Registers referenced #####
wordsig RRSP 'REG_RSP' # Stack Pointer
wordsig RNONE 'REG_NONE' # Special value indicating "no register"

##### ALU Functions referenced explicitly ##########################
wordsig ALUADD 'A_ADD' # ALU should add its arguments

##### Possible instruction status values #####
wordsig SBUB 'STAT_BUB' # Bubble in stage
wordsig SAOK 'STAT_AOK' # Normal execution
wordsig SADR 'STAT_ADR' # Invalid memory address
wordsig SINS 'STAT_INS' # Invalid instruction
wordsig SHLT 'STAT_HLT' # Halt instruction encountered

##### Signals that can be referenced by control logic ##############

##### Pipeline Register F ##########################################

wordsig F_predPC 'pc_curr->pc' # Predicted value of PC

##### Intermediate Values in Fetch Stage ###########################

wordsig imem_icode 'imem_icode' # icode field from instruction memory
wordsig imem_ifun 'imem_ifun' # ifun field from instruction memory
wordsig f_icode 'if_id_next->icode' # (Possibly modified) instruction code
wordsig f_ifun 'if_id_next->ifun' # Fetched instruction function
wordsig f_valC 'if_id_next->valc' # Constant data of fetched instruction
wordsig f_valP 'if_id_next->valp' # Address of following instruction
boolsig imem_error 'imem_error' # Error signal from instruction memory
boolsig instr_valid 'instr_valid' # Is fetched instruction valid?

##### Pipeline Register D ##########################################
wordsig D_icode 'if_id_curr->icode' # Instruction code
wordsig D_rA 'if_id_curr->ra' # rA field from instruction
wordsig D_rB 'if_id_curr->rb' # rB field from instruction
wordsig D_valP 'if_id_curr->valp' # Incremented PC

##### Intermediate Values in Decode Stage #########################

wordsig d_srcA 'id_ex_next->srca' # srcA from decoded instruction
wordsig d_srcB 'id_ex_next->srcb' # srcB from decoded instruction
wordsig d_rvalA 'd_regvala' # valA read from register file
wordsig d_rvalB 'd_regvalb' # valB read from register file

##### Pipeline Register E ##########################################
wordsig E_icode 'id_ex_curr->icode' # Instruction code
wordsig E_ifun 'id_ex_curr->ifun' # Instruction function
wordsig E_valC 'id_ex_curr->valc' # Constant data
wordsig E_srcA 'id_ex_curr->srca' # Source A register ID
wordsig E_valA 'id_ex_curr->vala' # Source A value
wordsig E_srcB 'id_ex_curr->srcb' # Source B register ID
wordsig E_valB 'id_ex_curr->valb' # Source B value
wordsig E_dstE 'id_ex_curr->deste' # Destination E register ID
wordsig E_dstM 'id_ex_curr->destm' # Destination M register ID

##### Intermediate Values in Execute Stage #########################
wordsig e_valE 'ex_mem_next->vale' # valE generated by ALU
boolsig e_Cnd 'ex_mem_next->takebranch' # Does condition hold?
wordsig e_dstE 'ex_mem_next->deste' # dstE (possibly modified to be RNONE)

##### Pipeline Register M #########################
wordsig M_stat 'ex_mem_curr->status' # Instruction status
wordsig M_icode 'ex_mem_curr->icode' # Instruction code
wordsig M_ifun 'ex_mem_curr->ifun' # Instruction function
wordsig M_valA 'ex_mem_curr->vala' # Source A value
wordsig M_dstE 'ex_mem_curr->deste' # Destination E register ID
wordsig M_valE 'ex_mem_curr->vale' # ALU E value
wordsig M_dstM 'ex_mem_curr->destm' # Destination M register ID
boolsig M_Cnd 'ex_mem_curr->takebranch' # Condition flag
boolsig dmem_error 'dmem_error' # Error signal from data memory

##### Intermediate Values in Memory Stage ##########################
wordsig m_valM 'mem_wb_next->valm' # valM generated by memory
wordsig m_stat 'mem_wb_next->status' # stat (possibly modified to be SADR)

##### Pipeline Register W ##########################################
wordsig W_stat 'mem_wb_curr->status' # Instruction status
wordsig W_icode 'mem_wb_curr->icode' # Instruction code
wordsig W_dstE 'mem_wb_curr->deste' # Destination E register ID
wordsig W_valE 'mem_wb_curr->vale' # ALU E value
wordsig W_dstM 'mem_wb_curr->destm' # Destination M register ID
wordsig W_valM 'mem_wb_curr->valm' # Memory M value

####################################################################
# Control Signal Definitions. #
####################################################################

################ Fetch Stage ###################################

## What address should instruction be fetched at
word f_pc = [
    # Mispredicted branch. Fetch at incremented PC
    M_icode == IJXX && !M_Cnd : M_valA;
    # Completion of RET instruction
    W_icode == IRET : W_valM;
    # Default: Use predicted value of PC
    1 : F_predPC;
];

## Determine icode of fetched instruction
word f_icode = [
    imem_error : INOP;
    1: imem_icode;
];

# Determine ifun
word f_ifun = [
    imem_error : FNONE;
    1: imem_ifun;
];

# Is instruction valid?
bool instr_valid = f_icode in
    { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
      IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };

# Determine status code for fetched instruction
word f_stat = [
    imem_error: SADR;
    !instr_valid : SINS;
    f_icode == IHALT : SHLT;
    1 : SAOK;
];

# Does fetched instruction require a regid byte?
bool need_regids =
    f_icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
                 IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
    f_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };

# Predict next value of PC
word f_predPC = [
    f_icode in { IJXX, ICALL } : f_valC;
    1 : f_valP;
];

################ Decode Stage ######################################

## What register should be used as the A source?
word d_srcA = [
    D_icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : D_rA;
    D_icode in { IPOPQ, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word d_srcB = [
    D_icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ } : D_rB;
    D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the E destination?
word d_dstE = [
    D_icode in { IRRMOVQ, IIRMOVQ, IOPQ, IIADDQ } : D_rB;
    D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't write any register
];

## What register should be used as the M destination?
word d_dstM = [
    D_icode in { IMRMOVQ, IPOPQ } : D_rA;
    1 : RNONE; # Don't write any register
];

## What should be the A value?
## Forward into decode stage for valA
word d_valA = [
    D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
    d_srcA == e_dstE : e_valE; # Forward valE from execute
    d_srcA == M_dstM : m_valM; # Forward valM from memory
    d_srcA == M_dstE : M_valE; # Forward valE from memory
    d_srcA == W_dstM : W_valM; # Forward valM from write back
    d_srcA == W_dstE : W_valE; # Forward valE from write back
    1 : d_rvalA; # Use value read from register file
];

word d_valB = [
    d_srcB == e_dstE : e_valE; # Forward valE from execute
    d_srcB == M_dstM : m_valM; # Forward valM from memory
    d_srcB == M_dstE : M_valE; # Forward valE from memory
    d_srcB == W_dstM : W_valM; # Forward valM from write back
    d_srcB == W_dstE : W_valE; # Forward valE from write back
    1 : d_rvalB; # Use value read from register file
];

################ Execute Stage #####################################

## Select input A to ALU
word aluA = [
    E_icode in { IRRMOVQ, IOPQ } : E_valA;
    E_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : E_valC;
    E_icode in { ICALL, IPUSHQ } : -8;
    E_icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
    E_icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
                 IPUSHQ, IRET, IPOPQ, IIADDQ } : E_valB;
    E_icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
    E_icode == IOPQ : E_ifun;
    1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = E_icode in { IIADDQ, IOPQ } &&
    # State changes only during normal operation
    !m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT };

## Generate valA in execute stage
word e_valA = E_valA; # Pass valA through stage

## Set dstE to RNONE in event of not-taken conditional move
word e_dstE = [
    E_icode == IRRMOVQ && !e_Cnd : RNONE;
    1 : E_dstE;
];

################ Memory Stage ######################################

## Select memory address
word mem_addr = [
    M_icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : M_valE;
    M_icode in { IPOPQ, IRET } : M_valA;
    # Other instructions don't need address
];

## Set read control signal
bool mem_read = M_icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = M_icode in { IRMMOVQ, IPUSHQ, ICALL };

#/* $begin pipe-m_stat-hcl */
## Update the status
word m_stat = [
    dmem_error : SADR;
    1 : M_stat;
];
#/* $end pipe-m_stat-hcl */

## Set E port register ID
word w_dstE = W_dstE;

## Set E port value
word w_valE = W_valE;

## Set M port register ID
word w_dstM = W_dstM;

## Set M port value
word w_valM = W_valM;

## Update processor status
word Stat = [
    W_stat == SBUB : SAOK;
    1 : W_stat;
];

################ Pipeline Register Control #########################

# Should I stall or inject a bubble into Pipeline Register F?
# At most one of these can be true.
bool F_bubble = 0;
bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
    E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register D?
# At most one of these can be true.
bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
    E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Stalling at fetch while ret passes through pipeline
    # but not condition for a load/use hazard
    !(E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB }) &&
    IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register E?
# At most one of these can be true.
bool E_stall = 0;
bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
    E_dstM in { d_srcA, d_srcB };

# Should I stall or inject a bubble into Pipeline Register M?
# At most one of these can be true.
bool M_stall = 0;
# Start injecting bubbles as soon as exception passes through memory stage
bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT };

# Should I stall or inject a bubble into Pipeline Register W?
bool W_stall = W_stat in { SADR, SINS, SHLT };
bool W_bubble = 0;
#/* $end pipe-all-hcl */
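Build and test:
make VERSION=full
./correctness.pl    # checks whether the results are correct
./benchmark.pl      # reports the score
I started with 6-way loop unrolling and replaced the conditional jumps with conditional moves.
After testing, I was rewarded with 0 points,
because the conditional-move version needs more instructions.
The 0-point code: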
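For context on the scores quoted in the rest of this part, the sketch below restates the Part C scoring rule as I understand it from the lab handout (archlab.pdf): 0 points above a CPE of 10.5, 60 points below 7.5, and a linear scale in between. Treat the thresholds as an assumption and check them against your own copy of the handout; the function names are made up for this example.

```python
def part_c_score(cpe):
    """Score out of 60 for an average CPE, assuming the handout's thresholds (10.5 and 7.5)."""
    if cpe > 10.5:
        return 0.0
    if cpe < 7.5:
        return 60.0
    return 20.0 * (10.5 - cpe)

def cpe_for_score(score):
    """Inverse of the linear part: which average CPE gives this score (valid for 0 < score < 60)."""
    return 10.5 - score / 20.0

for s in (40.0, 47.3, 48.0):
    print(s, "points <-> average CPE of about", round(cpe_for_score(s), 3))
```

Under this rule a score of 0 does not necessarily mean the code is wrong. A correct but slow ncopy whose average CPE exceeds 10.5 also gets 0.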
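A rough per-element cycle count shows why the conditional-move form can lose here. The sketch below is a back-of-the-envelope model, not a measurement. It assumes the PIPE behavior described in the book (conditional jumps are predicted taken, and a misprediction costs two bubble cycles) and compares the four-instruction cmov sequence (rrmovq, iaddq, andq, cmovle) with the jump sequence (andq, jle, and an iaddq that runs only for positive values). The function names and the probability parameter are illustrative.

```python
MISPREDICT_PENALTY = 2  # bubble cycles after a mispredicted jump in PIPE (assumed from the book)

def jump_cycles(p_positive):
    """Expected cycles per element for: andq; jle skip; iaddq $1,%rax.

    jle is predicted taken. It is taken when the value is <= 0, so a positive
    value is a misprediction and also executes the iaddq.
    """
    return 2 + p_positive * (MISPREDICT_PENALTY + 1)

def cmov_cycles():
    """Cycles per element for: rrmovq; iaddq; andq; cmovle (always four instructions)."""
    return 4

for p in (0.0, 0.5, 1.0):
    print(f"p_positive={p}: jump={jump_cycles(p)}, cmov={cmov_cycles()}")
```

In this model the jump version is cheaper whenever fewer than two thirds of the elements are positive. The count leaves out the extra `jmp` and per-element `iaddq $8,%rsi` instructions in the listing that follows, which add further cycles on top.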
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src, %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
    # Loop header
    xorq %rax,%rax          # count = 0;
Loop:
    iaddq $-6,%rdx
    jl Remain               # fewer than 6 elements left: go to the special case; otherwise run the unrolled body
    iaddq $6,%rdx           # restore the length; it is subtracted again at the end
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r8,%r8
    cmovle %r13,%rax
    rmmovq %r8,(%rsi)
    iaddq $8,%rsi
    jmp S2
S2:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r9,%r9
    cmovle %r13,%rax
    rmmovq %r9,(%rsi)
    iaddq $8,%rsi
    jmp S3
S3:
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r10,%r10
    cmovle %r13,%rax
    rmmovq %r10,(%rsi)
    iaddq $8,%rsi
    jmp S4
S4:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r11,%r11
    cmovle %r13,%rax
    rmmovq %r11,(%rsi)
    iaddq $8,%rsi
    jmp S5
S5:
    mrmovq 32(%rdi),%r12
    mrmovq 40(%rdi),%r14
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r12,%r12
    cmovle %r13,%rax
    rmmovq %r12,(%rsi)
    iaddq $8,%rsi
    jmp S6
S6:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r14,%r14
    cmovle %r13,%rax
    rmmovq %r14,(%rsi)
    iaddq $8,%rsi
    iaddq $-6,%rdx
    iaddq $48,%rdi
    jmp Loop
#####################################################################
Solveremain:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rrmovq %rax,%r13
    iaddq $1,%rax           # conditional move
    andq %r8,%r8
    cmovle %r13,%rax
    rmmovq %r8,(%rsi)
    iaddq $8,%rsi
    jmp Solver1
Solver1:
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r9,%r9
    cmovle %r13,%rax
    rmmovq %r9,(%rsi)
    iaddq $8,%rsi
    jmp Solver2
Solver2:
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r10,%r10
    cmovle %r13,%rax
    rmmovq %r10,(%rsi)
    iaddq $8,%rsi
    jmp Solver3
Solver3:
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r11,%r11
    cmovle %r13,%rax
    rmmovq %r11,(%rsi)
    iaddq $8,%rsi
    jmp Solver4
Solver4:
    mrmovq 32(%rdi),%r12
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r12,%r12
    cmovle %r13,%rax
    rmmovq %r12,(%rsi)
    iaddq $8,%rsi
    jmp Done
Remain:
    iaddq $5,%rdx           # negative here means len was 0; otherwise %rdx now holds an index 0..4
    jl Done
    jmp Solveremain         # jump to the code that handles the remaining elements
Done:
    ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
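Then I went out for a meal, came back, and read some other people's blogs, which gave me an idea: unroll the loop 6 ways directly and keep running it while at least 6 elements remain. In the >= 6 part, just use plain conditional jumps. For the < 6 part, split the cases in half and handle each half directly. The < 6 part was still not handled well enough, so this only scored roughly 40 points.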
##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
    xorq %rax,%rax
    jmp StartLoop1
Loop6:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    mrmovq 32(%rdi),%r12
    mrmovq 40(%rdi),%r13
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L61
    iaddq $1,%rax
L61:
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L62
    iaddq $1,%rax
L62:
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L63
    iaddq $1,%rax
L63:
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L64
    iaddq $1,%rax
L64:
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L65
    iaddq $1,%rax
L65:
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle L66
    iaddq $1,%rax
L66:
    iaddq $48,%rdi
    iaddq $48,%rsi
StartLoop1:
    iaddq $-6,%rdx
    jge Loop6
    iaddq $6,%rdx
    jmp StartLoop2
Loop2:
    iaddq $3,%rdx
    iaddq $-1,%rdx
    jl Done
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L21
    iaddq $1,%rax
L21:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L22
    iaddq $1,%rax
L22:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle Done
    iaddq $1,%rax
    jmp Done
Loop3:
    iaddq $-1,%rdx
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L31
    iaddq $1,%rax
L31:
    iaddq $-1,%rdx
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L32
    iaddq $1,%rax
L32:
    iaddq $-1,%rdx
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L33
    iaddq $1,%rax
L33:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L34
    iaddq $1,%rax
L34:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L35
    iaddq $1,%rax
L35:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle Done
    iaddq $1,%rax
    jmp Done
StartLoop2:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    mrmovq 16(%rdi),%r10
    iaddq $-3,%rdx
    jle Loop2
    iaddq $3,%rdx
    mrmovq 24(%rdi),%r11
    mrmovq 32(%rdi),%r12
    jmp Loop3
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
    ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
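The control structure of this version (a 6-way unrolled main loop plus a remainder path) is easier to see in a high-level model. The Python sketch below is illustrative only: `ncopy_simple` plays the role of the baseline ncopy, `ncopy_unrolled` mirrors the unrolled structure, and both names are made up for this example. It does not model cycle counts, only that the two strategies produce the same copy and the same count of positive elements.

```python
def ncopy_simple(src):
    """Baseline: copy every element and count the positive ones."""
    dst = []
    count = 0
    for v in src:
        dst.append(v)
        if v > 0:
            count += 1
    return dst, count

def ncopy_unrolled(src, ways=6):
    """Same result, structured like the unrolled assembly."""
    n = len(src)
    dst = [0] * n
    count = 0
    i = 0
    # Main loop: runs while at least `ways` elements remain (Loop6 in the listing).
    while n - i >= ways:
        for k in range(ways):       # written out six times by hand in assembly
            v = src[i + k]
            dst[i + k] = v
            if v > 0:
                count += 1
        i += ways                   # one pointer update per six elements
    # Remainder: fewer than `ways` elements left (the Loop2/Loop3 paths).
    while i < n:
        v = src[i]
        dst[i] = v
        if v > 0:
            count += 1
        i += 1
    return dst, count

data = [3, -1, 0, 7, 7, -2, 5, 9]
print(ncopy_unrolled(data))
```

The gain in the assembly comes from paying the loop test and the pointer increments once per six elements instead of once per element. The remainder path is where most of the tuning effort in this write-up goes.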
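I went back and studied more blogs.
According to CSAPP section 4.5.8, the pipeline penalties to optimize for are:
- Load/use hazard: between an instruction that reads a value from memory and an instruction that uses that value, the pipeline must stall for one cycle.
- Mispredicted branch: by the time the branch logic finds that the branch should not have been taken, several instructions from the branch target are already in the pipeline. They must be cancelled, and fetching resumes at the instruction after the jump. This can be addressed by redesigning the hardware's prediction logic, or by writing code that plays along with the processor's prediction.
There is also loop unrolling and increased parallelism from CSAPP Chapter 5. (Personally I think these are the main things left to optimize in this code.)
So if rmmovq is placed directly below mrmovq (%rdi),%r8, there is a load/use hazard. Putting another useful instruction between them removes it:
mrmovq (%rdi),%r8
mrmovq 8(%rdi),%r9
rmmovq %r8,(%rsi)
For the < 6 remainder, unroll 2 ways. That scored 47.3.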
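The effect of this reordering can be counted with a tiny model of the load/use rule: a one-cycle stall occurs when an instruction that loads a register from memory (mrmovq or popq) is immediately followed by an instruction that reads that register. The sketch below is a simplification for illustration; the tuple encoding of instructions and the `load_use_stalls` helper are made up for this example.

```python
def load_use_stalls(program):
    """Count load/use stalls in a straight-line instruction sequence.

    Each instruction is (opcode, registers_read, register_loaded_from_memory).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        opcode, _, loaded = prev
        if opcode in ("mrmovq", "popq") and loaded in cur[1]:
            stalls += 1
    return stalls

# rmmovq directly after the load of %r8: one stall
adjacent = [
    ("mrmovq", ["%rdi"], "%r8"),
    ("rmmovq", ["%r8", "%rsi"], None),
    ("mrmovq", ["%rdi"], "%r9"),
]

# Second load moved in between: no stall
reordered = [
    ("mrmovq", ["%rdi"], "%r8"),
    ("mrmovq", ["%rdi"], "%r9"),
    ("rmmovq", ["%r8", "%rsi"], None),
]

print(load_use_stalls(adjacent), load_use_stalls(reordered))  # 1 0
```

Both orderings execute the same three instructions. Only the second avoids the bubble, which is why the later listing interleaves each load with the store of the previous element.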
##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
    xorq %rax,%rax
    jmp Start1
Loop6:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L61
    iaddq $1,%rax
L61:
    mrmovq 16(%rdi),%r10
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L62
    iaddq $1,%rax
L62:
    mrmovq 24(%rdi),%r11
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L63
    iaddq $1,%rax
L63:
    mrmovq 32(%rdi),%r12
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L64
    iaddq $1,%rax
L64:
    mrmovq 40(%rdi),%r13
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L65
    iaddq $1,%rax
L65:
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle L66
    iaddq $1,%rax
L66:
    iaddq $48,%rdi
    iaddq $48,%rsi
Start1:
    iaddq $-6,%rdx
    jge Loop6
    iaddq $6,%rdx
    jmp Start2
Loop2:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L21
    iaddq $1,%rax
L21:
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L22
    iaddq $1,%rax
L22:
    iaddq $16,%rdi
    iaddq $16,%rsi
Start2:
    iaddq $-2,%rdx          # 2-way unrolled loop
    jge Loop2
    mrmovq (%rdi),%r8
    iaddq $1,%rdx
    jne Done
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle Done
    iaddq $1,%rax
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
    ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */
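Afterword
The Zhihu article says that unrolling its code 6 ways can get above 50 points. In my own test, that code scored 48 with 4-way unrolling.
There is also a 2016 article that reports 60 points with 4-way unrolling, which puzzles me. I suspect the test data was weaker back then. I copied that code over and fixed some of the assembly errors, but it still would not assemble.
That's it for now.
Reference article 1
Reference article 2
Reference article 3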