Preparation

Download the handout from the official website and extract it.

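Load the tar file and extract it with tar xvf archlab-handout.tar. It contains README, Makefile, sim.tar, archlab.ps, archlab.pdf, and simguide.pdf. The simulators live in sim.tar: extract it and build them with tar xvf sim.tar; cd sim; make clean; make.

While building, you may run into the following problems: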

If apt-get reports that it cannot locate a package, the problem is your package mirror. Search online for the Aliyun mirror list and replace all the URLs in /etc/apt/sources.list with it. After switching, remember to run sudo apt-get update.

/usr/bin/ld: cannot find -lfl
Fix: sudo apt-get install flex

/usr/bin/ld: cannot find -ltk
/usr/bin/ld: cannot find -ltcl
Fix: sudo apt-get install tk8.5 and sudo apt-get install tcl8.5, and also modify the Makefile of your lab files as follows:

# Comment this out if you don't have Tcl/Tk on your system
#GUIMODE=-DHAS_GUI

# Modify the following line so that gcc can find the libtcl.so and
# libtk.so libraries on your system. You may need to use the -L option
# to tell gcc which directory to look in. Comment this out if you
# don't have Tcl/Tk.
# (changed to this)
TKLIBS=-L/usr/lib -ltk8.5 -ltcl8.5

# Modify the following line so that gcc can find the tcl.h and tk.h
# header files on your system. Comment this out if you don't have
# Tcl/Tk.
TKINC=-isystem /usr/include/tcl8.5

Finally, run make clean; make again and the build should succeed. If the same problem shows up later, apply the same fix. Note that a later part requires deleting the GUI-related line from the Makefile.

TESTA

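Write the Y86 assembly by hand. The functions to implement are in example.c. I originally wanted to be lazy: compile example.c, disassemble it, and convert the disassembly to Y86. The disassembled code turned out to be even more cumbersome, so I wrote it by hand after all.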

I modeled my code on the large example program in Chapter 4 of the book.

Create a new file yourself, e.g. vim sum_list.ys.

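In all three programs the result ends up in %rax; if %rax never changes, the code has a bug. In every case the final value of %rax should be 0xcba.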

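The commands to assemble and run a program are below (yas and yis are built in sim/misc; the example file is named A-sum.ys, so substitute your own, e.g. sum_list.ys):
unix > ./yas A-sum.ys
unix > ./yis A-sum.yo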

# sum_list.ys  (sum_list from example.c)
# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
ele1:
    .quad 0x00a
    .quad ele2
ele2:
    .quad 0x0b0
    .quad ele3
ele3:
    .quad 0xc00
    .quad 0

main:
    irmovq ele1,%rdi
    call sum_list
    ret

sum_list:
    xorq %rax,%rax          # rax = 0
    jmp test
loop:
    mrmovq (%rdi),%r10      # val += ls->val
    addq %r10,%rax
    mrmovq 8(%rdi),%rdi     # ls = ls->next
test:
    andq %rdi,%rdi
    jne loop
    ret

# Stack starts here and grows to lower addresses
    .pos 0x100
stack:

For rsum_list, write the recursion directly: save the current value on the stack, then recurse.

# rsum_list.ys  (rsum_list from example.c)
# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
ele1:
    .quad 0x00a
    .quad ele2
ele2:
    .quad 0x0b0
    .quad ele3
ele3:
    .quad 0xc00
    .quad 0

main:
    irmovq ele1,%rdi
    call rsum_list
    ret

rsum_list:
    xorq %rax,%rax          # rax = 0
    andq %rdi,%rdi
    je return               # if (!ls) return 0
    mrmovq (%rdi),%r10      # long val = ls->val
    pushq %r10              # keep val on the stack across the call
    mrmovq 8(%rdi),%rdi     # ls = ls->next
    call rsum_list          # rest = rsum_list(ls->next)
    popq %rbx
    addq %rbx,%rax          # return val + rest
    ret
return:
    ret

# Stack starts here and grows to lower addresses
    .pos 0x1000
stack:
The third program, copy_block.ys (copy_block from example.c):

# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

    .align 8
# Source block
src:
    .quad 0x00a
    .quad 0x0b0
    .quad 0xc00
# Destination block
dest:
    .quad 0x111
    .quad 0x222
    .quad 0x333

main:
    xorq %rax,%rax          # long result = 0
    irmovq src,%rdi
    irmovq dest,%rsi
    irmovq $1,%r9
    irmovq $3,%r8           # len = 3
    irmovq $8,%r11
    andq %r8,%r8
    jmp test
loop:
    mrmovq (%rdi),%rcx      # long val = *src++
    addq %r11,%rdi
    rmmovq %rcx,(%rsi)      # *dest++ = val
    addq %r11,%rsi
    xorq %rcx,%rax          # result ^= val
    subq %r9,%r8            # len--
test:
    jne loop
    ret

# Stack starts here and grows to lower addresses
    .pos 0x100
stack:

TESTB

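Following Chapter 4's stage-by-stage description of instruction processing, derive the steps for iaddq by combining the tables for OPq and irmovq.

The resulting computation for iaddq is: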

Stage        iaddq V, rB
Fetch        icode:ifun <-- M1[PC]
             rA:rB <-- M1[PC+1]
             valC <-- M8[PC+2]
             valP <-- PC+10
Decode       valB <-- R[rB]
Execute      valE <-- valB + valC
             Set CC
Memory       (none)
Write back   R[rB] <-- valE
PC update    PC <-- valP

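Then add IIADDQ to the appropriate signals in sim/seq/seq-full.hcl. Here you need the book's material to decide, for each stage in turn, which signals iaddq should take part in: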

#/* $begin seq-all-hcl */
####################################################################
#  HCL Description of Control for Single Cycle Y86-64 Processor SEQ   #
#  Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2010       #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
#    C Include's.  Don't alter these                               #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'word_t gen_pc(){return 0;}'
quote 'int main(int argc, char *argv[])'
quote '  {plusmode=0;return sim_main(argc,argv);}'

####################################################################
#    Declarations.  Do not change/remove/delete any of these       #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP    'I_NOP'
wordsig IHALT   'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ    'I_ALU'
wordsig IJXX    'I_JMP'
wordsig ICALL   'I_CALL'
wordsig IRET    'I_RET'
wordsig IPUSHQ  'I_PUSHQ'
wordsig IPOPQ   'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ  'I_IADDQ'

##### Symbolic representations of Y86-64 function codes                  #####
wordsig FNONE    'F_NONE'        # Default function code

##### Symbolic representation of Y86-64 Registers referenced explicitly #####
wordsig RRSP     'REG_RSP'        # Stack Pointer
wordsig RNONE    'REG_NONE'       # Special value indicating "no register"

##### ALU Functions referenced explicitly                            #####
wordsig ALUADD  'A_ADD'       # ALU should add its arguments

##### Possible instruction status values                             #####
wordsig SAOK    'STAT_AOK'    # Normal execution
wordsig SADR    'STAT_ADR'    # Invalid memory address
wordsig SINS    'STAT_INS'    # Invalid instruction
wordsig SHLT    'STAT_HLT'    # Halt instruction encountered

##### Signals that can be referenced by control logic ####################

##### Fetch stage inputs        #####
wordsig pc 'pc'               # Program counter
##### Fetch stage computations      #####
wordsig imem_icode 'imem_icode'       # icode field from instruction memory
wordsig imem_ifun  'imem_ifun'        # ifun field from instruction memory
wordsig icode     'icode'     # Instruction control code
wordsig ifun      'ifun'      # Instruction function
wordsig rA    'ra'            # rA field from instruction
wordsig rB    'rb'            # rB field from instruction
wordsig valC      'valc'      # Constant from instruction
wordsig valP      'valp'      # Address of following instruction
boolsig imem_error 'imem_error'       # Error signal from instruction memory
boolsig instr_valid 'instr_valid' # Is fetched instruction valid?

##### Decode stage computations      #####
wordsig valA    'vala'            # Value from register A port
wordsig valB    'valb'            # Value from register B port

##### Execute stage computations    #####
wordsig valE    'vale'            # Value computed by ALU
boolsig Cnd 'cond'            # Branch test

##### Memory stage computations        #####
wordsig valM    'valm'            # Value read from memory
boolsig dmem_error 'dmem_error'       # Error signal from data memory

####################################################################
#    Control Signal Definitions.                                   #
####################################################################

################ Fetch Stage     ###################################

# Determine instruction code
word icode = [
    imem_error: INOP;
    1: imem_icode;      # Default: get from instruction memory
];

# Determine instruction function
word ifun = [
    imem_error: FNONE;
    1: imem_ifun;       # Default: get from instruction memory
];

bool instr_valid = icode in
    { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
      IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };

# Does fetched instruction require a regid byte?
bool need_regids =
    icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
               IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };

################ Decode Stage    ###################################

## What register should be used as the A source?
word srcA = [
    icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ  } : rA;
    icode in { IPOPQ, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word srcB = [
    icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ  } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't need register
];

## What register should be used as the E destination?
word dstE = [
    icode in { IRRMOVQ } && Cnd : rB;
    icode in { IIRMOVQ, IOPQ, IIADDQ } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't write any register
];

## What register should be used as the M destination?
word dstM = [
    icode in { IMRMOVQ, IPOPQ } : rA;
    1 : RNONE;  # Don't write any register
];

################ Execute Stage   ###################################

## Select input A to ALU
word aluA = [
    icode in { IRRMOVQ, IOPQ } : valA;
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : valC;
    icode in { ICALL, IPUSHQ } : -8;
    icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
    icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
               IPUSHQ, IRET, IPOPQ, IIADDQ } : valB;
    icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
    icode == IOPQ : ifun;
    1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = icode in { IOPQ, IIADDQ };

################ Memory Stage    ###################################

## Set read control signal
bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };

## Select memory address
word mem_addr = [
    icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
    icode in { IPOPQ, IRET } : valA;
    # Other instructions don't need address
];

## Select memory input data
word mem_data = [
    # Value from register
    icode in { IRMMOVQ, IPUSHQ } : valA;
    # Return PC
    icode == ICALL : valP;
    # Default: Don't write anything
];

## Determine instruction status
word Stat = [
    imem_error || dmem_error : SADR;
    !instr_valid: SINS;
    icode == IHALT : SHLT;
    1 : SAOK;
];

################ Program Counter Update ############################

## What address should instruction be fetched at

word new_pc = [
    # Call.  Use instruction constant
    icode == ICALL : valC;
    # Taken branch.  Use instruction constant
    icode == IJXX && Cnd : valC;
    # Completion of RET instruction.  Use value from stack
    icode == IRET : valM;
    # Default: Use incremented PC
    1 : valP;
];
#/* $end seq-all-hcl */

TESTC

This last part left me a bit speechless. First, add the iaddq instruction from above to this part's HCL by modifying sim/pipe/pipe-full.hcl:

#/* $begin pipe-all-hcl */
####################################################################
#    HCL Description of Control for Pipelined Y86-64 Processor     #
#    Copyright (C) Randal E. Bryant, David R. O'Hallaron, 2014     #
####################################################################

## Your task is to implement the iaddq instruction
## The file contains a declaration of the icodes
## for iaddq (IIADDQ)
## Your job is to add the rest of the logic to make it work

####################################################################
#    C Include's.  Don't alter these                               #
####################################################################

quote '#include <stdio.h>'
quote '#include "isa.h"'
quote '#include "pipeline.h"'
quote '#include "stages.h"'
quote '#include "sim.h"'
quote 'int sim_main(int argc, char *argv[]);'
quote 'int main(int argc, char *argv[]){return sim_main(argc,argv);}'

####################################################################
#    Declarations.  Do not change/remove/delete any of these       #
####################################################################

##### Symbolic representation of Y86-64 Instruction Codes #############
wordsig INOP    'I_NOP'
wordsig IHALT   'I_HALT'
wordsig IRRMOVQ 'I_RRMOVQ'
wordsig IIRMOVQ 'I_IRMOVQ'
wordsig IRMMOVQ 'I_RMMOVQ'
wordsig IMRMOVQ 'I_MRMOVQ'
wordsig IOPQ    'I_ALU'
wordsig IJXX    'I_JMP'
wordsig ICALL   'I_CALL'
wordsig IRET    'I_RET'
wordsig IPUSHQ  'I_PUSHQ'
wordsig IPOPQ   'I_POPQ'
# Instruction code for iaddq instruction
wordsig IIADDQ  'I_IADDQ'

##### Symbolic representations of Y86-64 function codes            #####
wordsig FNONE    'F_NONE'        # Default function code

##### Symbolic representation of Y86-64 Registers referenced      #####
wordsig RRSP     'REG_RSP'             # Stack Pointer
wordsig RNONE    'REG_NONE'            # Special value indicating "no register"

##### ALU Functions referenced explicitly ##########################
wordsig ALUADD  'A_ADD'            # ALU should add its arguments

##### Possible instruction status values                       #####
wordsig SBUB    'STAT_BUB'    # Bubble in stage
wordsig SAOK    'STAT_AOK'    # Normal execution
wordsig SADR    'STAT_ADR'    # Invalid memory address
wordsig SINS    'STAT_INS'    # Invalid instruction
wordsig SHLT    'STAT_HLT'    # Halt instruction encountered

##### Signals that can be referenced by control logic ##############

##### Pipeline Register F ##########################################

wordsig F_predPC 'pc_curr->pc'        # Predicted value of PC

##### Intermediate Values in Fetch Stage ###########################

wordsig imem_icode  'imem_icode'      # icode field from instruction memory
wordsig imem_ifun   'imem_ifun'       # ifun  field from instruction memory
wordsig f_icode 'if_id_next->icode'  # (Possibly modified) instruction code
wordsig f_ifun  'if_id_next->ifun'   # Fetched instruction function
wordsig f_valC  'if_id_next->valc'   # Constant data of fetched instruction
wordsig f_valP  'if_id_next->valp'   # Address of following instruction
boolsig imem_error 'imem_error'        # Error signal from instruction memory
boolsig instr_valid 'instr_valid'    # Is fetched instruction valid?

##### Pipeline Register D ##########################################
wordsig D_icode 'if_id_curr->icode'   # Instruction code
wordsig D_rA 'if_id_curr->ra'       # rA field from instruction
wordsig D_rB 'if_id_curr->rb'       # rB field from instruction
wordsig D_valP 'if_id_curr->valp'     # Incremented PC

##### Intermediate Values in Decode Stage  #########################

wordsig d_srcA    'id_ex_next->srca'  # srcA from decoded instruction
wordsig d_srcB   'id_ex_next->srcb'  # srcB from decoded instruction
wordsig d_rvalA 'd_regvala'        # valA read from register file
wordsig d_rvalB 'd_regvalb'        # valB read from register file

##### Pipeline Register E ##########################################
wordsig E_icode 'id_ex_curr->icode'   # Instruction code
wordsig E_ifun  'id_ex_curr->ifun'    # Instruction function
wordsig E_valC  'id_ex_curr->valc'    # Constant data
wordsig E_srcA  'id_ex_curr->srca'    # Source A register ID
wordsig E_valA  'id_ex_curr->vala'    # Source A value
wordsig E_srcB  'id_ex_curr->srcb'    # Source B register ID
wordsig E_valB  'id_ex_curr->valb'    # Source B value
wordsig E_dstE 'id_ex_curr->deste'    # Destination E register ID
wordsig E_dstM 'id_ex_curr->destm'    # Destination M register ID

##### Intermediate Values in Execute Stage #########################
wordsig e_valE 'ex_mem_next->vale' # valE generated by ALU
boolsig e_Cnd 'ex_mem_next->takebranch' # Does condition hold?
wordsig e_dstE 'ex_mem_next->deste'      # dstE (possibly modified to be RNONE)

##### Pipeline Register M                  #########################
wordsig M_stat 'ex_mem_curr->status'     # Instruction status
wordsig M_icode 'ex_mem_curr->icode'   # Instruction code
wordsig M_ifun  'ex_mem_curr->ifun'    # Instruction function
wordsig M_valA  'ex_mem_curr->vala'      # Source A value
wordsig M_dstE 'ex_mem_curr->deste'    # Destination E register ID
wordsig M_valE  'ex_mem_curr->vale'      # ALU E value
wordsig M_dstM 'ex_mem_curr->destm'    # Destination M register ID
boolsig M_Cnd 'ex_mem_curr->takebranch'    # Condition flag
boolsig dmem_error 'dmem_error'           # Error signal from data memory

##### Intermediate Values in Memory Stage ##########################
wordsig m_valM 'mem_wb_next->valm' # valM generated by memory
wordsig m_stat 'mem_wb_next->status'   # stat (possibly modified to be SADR)

##### Pipeline Register W ##########################################
wordsig W_stat 'mem_wb_curr->status'     # Instruction status
wordsig W_icode 'mem_wb_curr->icode'   # Instruction code
wordsig W_dstE 'mem_wb_curr->deste'    # Destination E register ID
wordsig W_valE  'mem_wb_curr->vale'      # ALU E value
wordsig W_dstM 'mem_wb_curr->destm'    # Destination M register ID
wordsig W_valM  'mem_wb_curr->valm'    # Memory M value

####################################################################
#    Control Signal Definitions.                                   #
####################################################################

################ Fetch Stage     ###################################

## What address should instruction be fetched at
word f_pc = [
    # Mispredicted branch.  Fetch at incremented PC
    M_icode == IJXX && !M_Cnd : M_valA;
    # Completion of RET instruction
    W_icode == IRET : W_valM;
    # Default: Use predicted value of PC
    1 : F_predPC;
];

## Determine icode of fetched instruction
word f_icode = [
    imem_error : INOP;
    1: imem_icode;
];

# Determine ifun
word f_ifun = [
    imem_error : FNONE;
    1: imem_ifun;
];

# Is instruction valid?
bool instr_valid = f_icode in
    { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
      IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };

# Determine status code for fetched instruction
word f_stat = [
    imem_error: SADR;
    !instr_valid : SINS;
    f_icode == IHALT : SHLT;
    1 : SAOK;
];

# Does fetched instruction require a regid byte?
bool need_regids =
    f_icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
                 IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
    f_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };

# Predict next value of PC
word f_predPC = [
    f_icode in { IJXX, ICALL } : f_valC;
    1 : f_valP;
];

################ Decode Stage ######################################

## What register should be used as the A source?
word d_srcA = [
    D_icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ  } : D_rA;
    D_icode in { IPOPQ, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

## What register should be used as the B source?
word d_srcB = [
    D_icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ  } : D_rB;
    D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't need register
];

## What register should be used as the E destination?
word d_dstE = [
    D_icode in { IRRMOVQ, IIRMOVQ, IOPQ, IIADDQ } : D_rB;
    D_icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't write any register
];

## What register should be used as the M destination?
word d_dstM = [
    D_icode in { IMRMOVQ, IPOPQ } : D_rA;
    1 : RNONE;  # Don't write any register
];

## What should be the A value?
## Forward into decode stage for valA
word d_valA = [
    D_icode in { ICALL, IJXX } : D_valP; # Use incremented PC
    d_srcA == e_dstE : e_valE;    # Forward valE from execute
    d_srcA == M_dstM : m_valM;    # Forward valM from memory
    d_srcA == M_dstE : M_valE;    # Forward valE from memory
    d_srcA == W_dstM : W_valM;    # Forward valM from write back
    d_srcA == W_dstE : W_valE;    # Forward valE from write back
    1 : d_rvalA;  # Use value read from register file
];

word d_valB = [
    d_srcB == e_dstE : e_valE;    # Forward valE from execute
    d_srcB == M_dstM : m_valM;    # Forward valM from memory
    d_srcB == M_dstE : M_valE;    # Forward valE from memory
    d_srcB == W_dstM : W_valM;    # Forward valM from write back
    d_srcB == W_dstE : W_valE;    # Forward valE from write back
    1 : d_rvalB;  # Use value read from register file
];

################ Execute Stage #####################################

## Select input A to ALU
word aluA = [
    E_icode in { IRRMOVQ, IOPQ } : E_valA;
    E_icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : E_valC;
    E_icode in { ICALL, IPUSHQ } : -8;
    E_icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
    E_icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
                 IPUSHQ, IRET, IPOPQ, IIADDQ } : E_valB;
    E_icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];

## Set the ALU function
word alufun = [
    E_icode == IOPQ : E_ifun;
    1 : ALUADD;
];

## Should the condition codes be updated?
bool set_cc = E_icode in { IIADDQ, IOPQ } &&
    # State changes only during normal operation
    !m_stat in { SADR, SINS, SHLT } && !W_stat in { SADR, SINS, SHLT };

## Generate valA in execute stage
word e_valA = E_valA;    # Pass valA through stage

## Set dstE to RNONE in event of not-taken conditional move
word e_dstE = [
    E_icode == IRRMOVQ && !e_Cnd : RNONE;
    1 : E_dstE;
];

################ Memory Stage ######################################

## Select memory address
word mem_addr = [
    M_icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : M_valE;
    M_icode in { IPOPQ, IRET } : M_valA;
    # Other instructions don't need address
];

## Set read control signal
bool mem_read = M_icode in { IMRMOVQ, IPOPQ, IRET };

## Set write control signal
bool mem_write = M_icode in { IRMMOVQ, IPUSHQ, ICALL };

#/* $begin pipe-m_stat-hcl */
## Update the status
word m_stat = [
    dmem_error : SADR;
    1 : M_stat;
];
#/* $end pipe-m_stat-hcl */

## Set E port register ID
word w_dstE = W_dstE;

## Set E port value
word w_valE = W_valE;

## Set M port register ID
word w_dstM = W_dstM;

## Set M port value
word w_valM = W_valM;

## Update processor status
word Stat = [
    W_stat == SBUB : SAOK;
    1 : W_stat;
];

################ Pipeline Register Control #########################

# Should I stall or inject a bubble into Pipeline Register F?
# At most one of these can be true.
bool F_bubble = 0;
bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
     E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register D?
# At most one of these can be true.
bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
     E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Stalling at fetch while ret passes through pipeline
    # but not condition for a load/use hazard
    !(E_icode in { IMRMOVQ, IPOPQ } && E_dstM in { d_srcA, d_srcB }) &&
      IRET in { D_icode, E_icode, M_icode };

# Should I stall or inject a bubble into Pipeline Register E?
# At most one of these can be true.
bool E_stall = 0;
bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Conditions for a load/use hazard
    E_icode in { IMRMOVQ, IPOPQ } &&
     E_dstM in { d_srcA, d_srcB};

# Should I stall or inject a bubble into Pipeline Register M?
# At most one of these can be true.
bool M_stall = 0;
# Start injecting bubbles as soon as exception passes through memory stage
bool M_bubble = m_stat in { SADR, SINS, SHLT } || W_stat in { SADR, SINS, SHLT };

# Should I stall or inject a bubble into Pipeline Register W?
bool W_stall = W_stat in { SADR, SINS, SHLT };
bool W_bubble = 0;
#/* $end pipe-all-hcl */

To build and test (from the sim/pipe directory):

make VERSION=full
./correctness.pl   # check that the results are correct
./benchmark.pl     # compute the score

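I started by unrolling the loop six ways and replacing the conditional jumps with conditional moves.

After testing, I was rewarded with a score of 0, because the conditional-move version needs more instructions per element (rrmovq, iaddq, andq, cmovle instead of andq, jle, iaddq).

The 0-point code: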

#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src, %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
# Loop header
    xorq %rax,%rax          # count = 0;
Loop:
    iaddq $-6,%rdx
    jl Remain               # fewer than 6 left: handle the tail; otherwise run the loop body
    iaddq $6,%rdx           # restore len; it is decremented by 6 at the end of the body
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r8,%r8
    cmovle %r13,%rax
    rmmovq %r8,(%rsi)
    iaddq $8,%rsi
    jmp S2
S2:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r9,%r9
    cmovle %r13,%rax
    rmmovq %r9,(%rsi)
    iaddq $8,%rsi
    jmp S3
S3:
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r10,%r10
    cmovle %r13,%rax
    rmmovq %r10,(%rsi)
    iaddq $8,%rsi
    jmp S4
S4:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r11,%r11
    cmovle %r13,%rax
    rmmovq %r11,(%rsi)
    iaddq $8,%rsi
    jmp S5
S5:
    mrmovq 32(%rdi),%r12
    mrmovq 40(%rdi),%r14
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r12,%r12
    cmovle %r13,%rax
    rmmovq %r12,(%rsi)
    iaddq $8,%rsi
    jmp S6
S6:
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r14,%r14
    cmovle %r13,%rax
    rmmovq %r14,(%rsi)
    iaddq $8,%rsi
    iaddq $-6,%rdx
    iaddq $48,%rdi
    jmp Loop
#####################################################################
Solveremain:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rrmovq %rax,%r13
    iaddq $1,%rax           # conditional move
    andq %r8,%r8
    cmovle %r13,%rax
    rmmovq %r8,(%rsi)
    iaddq $8,%rsi
    jmp Solver1
Solver1:
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r9,%r9
    cmovle %r13,%rax
    rmmovq %r9,(%rsi)
    iaddq $8,%rsi
    jmp Solver2
Solver2:
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r10,%r10
    cmovle %r13,%rax
    rmmovq %r10,(%rsi)
    iaddq $8,%rsi
    jmp Solver3
Solver3:
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r11,%r11
    cmovle %r13,%rax
    rmmovq %r11,(%rsi)
    iaddq $8,%rsi
    jmp Solver4
Solver4:
    mrmovq 32(%rdi),%r12
    iaddq $-1,%rdx
    jl Done
    rrmovq %rax,%r13
    iaddq $1,%rax
    andq %r12,%r12
    cmovle %r13,%rax
    rmmovq %r12,(%rsi)
    iaddq $8,%rsi
    jmp Done
Remain:
    iaddq $5,%rdx           # negative means len was 0; otherwise %rdx = remaining - 1 (0..4)
    jl Done
    jmp Solveremain         # jump to the tail-handling code
Done:
    ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

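Then I went out for a meal, came back, and read other people's blogs, which gave me an idea: unroll six ways directly and keep running that loop while at least 6 elements remain, using plain conditional jumps (an if per element) instead of conditional moves. The remaining fewer-than-6 elements are split in half (at most 3, or 4 to 5) and handled separately. That tail handling was not efficient enough, and this version only scored 40 points.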

##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
    xorq %rax,%rax
    jmp StartLoop1
Loop6:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    mrmovq 16(%rdi),%r10
    mrmovq 24(%rdi),%r11
    mrmovq 32(%rdi),%r12
    mrmovq 40(%rdi),%r13
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L61
    iaddq $1,%rax
L61:
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L62
    iaddq $1,%rax
L62:
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L63
    iaddq $1,%rax
L63:
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L64
    iaddq $1,%rax
L64:
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L65
    iaddq $1,%rax
L65:
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle L66
    iaddq $1,%rax
L66:
    iaddq $48,%rdi
    iaddq $48,%rsi
StartLoop1:
    iaddq $-6,%rdx
    jge Loop6
    iaddq $6,%rdx
    jmp StartLoop2
Loop2:
    iaddq $3,%rdx
    iaddq $-1,%rdx
    jl Done
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L21
    iaddq $1,%rax
L21:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L22
    iaddq $1,%rax
L22:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle Done
    iaddq $1,%rax
    jmp Done
Loop3:
    iaddq $-1,%rdx
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L31
    iaddq $1,%rax
L31:
    iaddq $-1,%rdx
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L32
    iaddq $1,%rax
L32:
    iaddq $-1,%rdx
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L33
    iaddq $1,%rax
L33:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L34
    iaddq $1,%rax
L34:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L35
    iaddq $1,%rax
L35:
    iaddq $-1,%rdx
    jl Done
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle Done
    iaddq $1,%rax
    jmp Done
StartLoop2:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    mrmovq 16(%rdi),%r10
    iaddq $-3,%rdx
    jle Loop2
    iaddq $3,%rdx
    mrmovq 24(%rdi),%r11
    mrmovq 32(%rdi),%r12
    jmp Loop3
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

Then I went on to study more of other people's blogs.

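From CS:APP Section 4.5.8, the pipeline hazards worth optimizing around are:

  • Load/use hazard: between an instruction that reads a value from memory and an instruction that uses that value, the pipeline must stall for one cycle.
  • Mispredicted branch: by the time the branch logic discovers that the branch should not have been taken, several instructions at the branch target have already entered the pipeline. They must be canceled, and fetching restarts at the instruction following the jump. This can be addressed either by re-architecting the hardware to change the processor's prediction logic, or by writing code that fits the processor's prediction logic.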

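There are also Chapter 5's loop unrolling and increased parallelism. (Personally, I think these two points are about all that is left to optimize in this code.)

So, if the rmmovq is placed directly after mrmovq (%rdi),%r8, there is a load/use hazard. It is enough to put some other useful instruction in between, such as the next load:

    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rmmovq %r8,(%rsi)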

For the remaining fewer-than-6 elements, unroll the loop two ways. Result: 47.3 points.

##################################################################
# You can modify this portion
#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src , %rsi = dst, %rdx = len
ncopy:
##################################################################
# You can modify this portion
    xorq %rax,%rax
    jmp Start1
Loop6:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L61
    iaddq $1,%rax
L61:
    mrmovq 16(%rdi),%r10
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L62
    iaddq $1,%rax
L62:
    mrmovq 24(%rdi),%r11
    rmmovq %r10,16(%rsi)
    andq %r10,%r10
    jle L63
    iaddq $1,%rax
L63:
    mrmovq 32(%rdi),%r12
    rmmovq %r11,24(%rsi)
    andq %r11,%r11
    jle L64
    iaddq $1,%rax
L64:
    mrmovq 40(%rdi),%r13
    rmmovq %r12,32(%rsi)
    andq %r12,%r12
    jle L65
    iaddq $1,%rax
L65:
    rmmovq %r13,40(%rsi)
    andq %r13,%r13
    jle L66
    iaddq $1,%rax
L66:
    iaddq $48,%rdi
    iaddq $48,%rsi
Start1:
    iaddq $-6,%rdx
    jge Loop6
    iaddq $6,%rdx
    jmp Start2
Loop2:
    mrmovq (%rdi),%r8
    mrmovq 8(%rdi),%r9
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle L21
    iaddq $1,%rax
L21:
    rmmovq %r9,8(%rsi)
    andq %r9,%r9
    jle L22
    iaddq $1,%rax
L22:
    iaddq $16,%rdi
    iaddq $16,%rsi
Start2:
    iaddq $-2,%rdx          # two-way loop
    jge Loop2
    mrmovq (%rdi),%r8
    iaddq $1,%rdx
    jne Done
    rmmovq %r8,(%rsi)
    andq %r8,%r8
    jle Done
    iaddq $1,%rax
##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

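Afterword

A Zhihu article claims that its code, unrolled six ways, scores above 50 points. In my own test, that code with four-way unrolling scored 48.

However, a 2016 article reports 60 points with four-way unrolling, which puzzles me; I suspect its test data was weaker. I copied that code and fixed some of its assembly errors, but it still would not assemble.

That's it for now.

Reference article 1

Reference article 2

Reference article 3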
