linux内核启动分析(一)
文章目录
- 1.HEAD
- 1.preserve_boot_args
- 1.1 __inval_dcache_area
- 2.el2_setup
- 3. set_cpu_boot_mode_flag
- 4. __create_page_tables
- 4.1map_memory
- 5. __cpu_setup
- 6. __primary_switch
- 6.1 __enable_mmu
- 6.2 __primary_switched
最近工作中经常使用飞腾E2000的开发版,也遇到一些启动问题,所以追踪了一下linux内核启动流程。麒麟的代码我们看不了,但是我们可以直接看飞腾的开源内核代码,点击这里可以跳到gitee,我们使用的是5.10分支。
看这个文章需要的前置汇编知识点:
- .quad表示定义一个4字节的变量。.long表示定义一个8字节的变量。
- SYM_CODE_START表示定义一个函数。定义了之后可以通过bl或者b跳转到这个函数。
有一些指令不知道什么回事,可以看下一篇文章:看linux内核启动流程需要的汇编指令解释。
飞腾E2000的开发版可以使用uboot和UEFI,无论是uboot还是UEFI加载linux内核,并且启动linux内核都是从arch/arm64/kernel/head.S文件的_head这里开始运行的。下面我们来开始分析吧:
1.HEAD
__HEAD
_head:/** DO NOT MODIFY. Image header expected by Linux boot-loaders.*/
#ifdef CONFIG_EFI/** This add instruction has no meaningful effect except that* its opcode forms the magic "MZ" signature required by UEFI.*/add x13, x18, #0x16b primary_entry
#elseb primary_entry // branch to kernel start, magic.long 0 // reserved
#endif.quad 0 // Image load offset from start of RAM, little-endianle64sym _kernel_size_le // Effective size of kernel image, little-endianle64sym _kernel_flags_le // Informative flags, little-endian.quad 0 // reserved.quad 0 // reserved.quad 0 // reserved.ascii ARM64_IMAGE_MAGIC // Magic number
#ifdef CONFIG_EFI.long pe_header - _head // Offset to the PE header.pe_header:__EFI_PE_HEADER
#else.long 0 // reserved
#endif
在_head里面只跑一个函数,就是primary_entry:
SYM_CODE_START(primary_entry)bl preserve_boot_args //保留引导加载程序中传递的参数到boot_args中bl el2_setup // 判断目前是EL1还是EL2,//如果是EL1就简单了,配置sctlr_el1寄存器就好了。//如果是EL2就复杂了,需要配置sctlr_el2寄存器,配置内存,hcr,gicadrp x23, __PHYS_OFFSETand x23, x23, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0bl set_cpu_boot_mode_flag //把其他cpu都配置成跟cpu0同样的特权等级bl __create_page_tables //创建页表/** The following calls CPU setup code, see arch/arm64/mm/proc.S for* details.* On return, the CPU will be ready for the MMU to be turned on and* the TCR will have been set.*/bl __cpu_setup // 初始化处理器以打开MMU。b __primary_switch //设置TTBR0和TTBR1,使能MMU,将kernel image重定位,跳转到__primary_switchedSYM_CODE_END(primary_entry)
primary_entry主要执行了以下几个步骤:
- 调用函数preserve_boot_args保留引导加载程序中传递的参数到boot_args中
- 调用函数el2_setup判断目前是EL1还是EL2,如果是EL1就简单了,配置sctlr_el1寄存器就好了;如果是EL2就复杂了,需要配置sctlr_el2寄存器,配置内存,hcr,gic。
- 调用函数set_cpu_boot_mode_flag把其他cpu都配置成跟cpu0同样的特权等级
- 调用函数__create_page_tables创建页表
- 调用函数__cpu_setup初始化处理器以打开MMU
- 调用函数__primary_switch设置TTBR0和TTBR1,使能MMU,将kernel image重定位,跳转到__primary_switched。
1.preserve_boot_args
SYM_CODE_START_LOCAL(preserve_boot_args)mov x21, x0 // x21=FDT,将FDT的地址暂存在x21寄存器中,释放出x0以便后续做临时变量使用adr_l x0, boot_args // x0保存了boot_args变量的地址stp x21, x1, [x0] // 保存x0和x1的值到boot_args[0]和boot_args[1]stp x2, x3, [x0, #16] // 保存x2和x3的值到boot_args[2]和boot_args[3]dmb sy // needed before dc ivac with// MMU offmov x1, #0x20 // 4 x 8 bytesb __inval_dcache_area // 让[boot_args,boot_args+#0x20]的内存数据缓存失效
SYM_CODE_END(preserve_boot_args)
1.1 __inval_dcache_area
//__inval_dcache_area(kaddr, size)
SYM_FUNC_START_PI(__inval_dcache_area)/* FALLTHROUGH *//** __dma_inv_area(start, size)* - start - virtual start address of region* - size - size in question*/add x1, x1, x0 //X1存放kaddr+sizedcache_line_size x2, x3sub x3, x2, #1tst x1, x3 // end cache line aligned?bic x1, x1, x3b.eq 1fdc civac, x1 // clean & invalidate D / U line
1: tst x0, x3 // start cache line aligned?bic x0, x0, x3b.eq 2fdc civac, x0 // clean & invalidate D / U lineb 3f
2: dc ivac, x0 // invalidate D / U line
3: add x0, x0, x2cmp x0, x1b.lo 2bdsb syret
SYM_FUNC_END_PI(__inval_dcache_area)
2.el2_setup
/** If we're fortunate enough to boot at EL2, ensure that the world is* sane before dropping to EL1.** Returns either BOOT_CPU_MODE_EL1 or BOOT_CPU_MODE_EL2 in w0 if* booted in EL1 or EL2 respectively.*/
SYM_FUNC_START(el2_setup)msr SPsel, #1 //往SPsel写1,说明使用SP_EL0mrs x0, CurrentEL //获取当前特权等级cmp x0, #CurrentEL_EL2 //看看是不是特权等级是否为EL2b.eq 1f //如果是,就跳转到1fmov_q x0, (SCTLR_EL1_RES1 | ENDIAN_SET_EL1)//msr sctlr_el1, x0 //配置EL1的系统控制寄存器mov w0, #BOOT_CPU_MODE_EL1 // 返回值存在w0寄存器中isb //内存屏障ret //返回//这里说明当前等级是EL2
1: mov_q x0, (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)msr sctlr_el2, x0 //配置EL2的系统控制寄存器#ifdef CONFIG_ARM64_VHE/** Check for VHE being present. For the rest of the EL2 setup,* x2 being non-zero indicates that we do have VHE, and that the* kernel is intended to run at EL2.*/mrs x2, id_aa64mmfr1_el1 //配置内存模式寄存器ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4 //把虚拟机扩展支持位提取出来
#elsemov x2, xzr
#endif/* Hyp configuration. *///Hypervisor配置寄存器mov_q x0, HCR_HOST_NVHE_FLAGS //访问到EL2的指令转发到未定义指令cbz x2, set_hcr //x2为0(不支持虚拟机扩展,也就是传统分裂模式)则跳转到set_hcrmov_q x0, HCR_HOST_VHE_FLAGS //设置中断路由到EL2、启动EL2设施、
set_hcr://虚拟机扩展模式msr hcr_el2, x0 //写入hcr_el2isb //内存屏障/** Allow Non-secure EL1 and EL0 to access physical timer and counter.* This is not necessary for VHE, since the host kernel runs in EL2,* and EL0 accesses are configured in the later stage of boot process.* Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout* as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined* to access CNTHCTL_EL2. This allows the kernel designed to run at EL1* to transparently mess with the EL0 bits via CNTKCTL_EL1 access in* EL2.*/cbnz x2, 1f //x2为0(不支持虚拟机扩展)则跳转到1fmrs x0, cnthctl_el2 //读取Hypervisor控制的计数寄存器orr x0, x0, #3 // Enable EL1 physical timersmsr cnthctl_el2, x0
1:msr cntvoff_el2, xzr // 物理计数器和虚拟计数器一致,不偏移#ifdef CONFIG_ARM_GIC_V3/* GICv3 system register access */mrs x0, id_aa64pfr0_el1 //读取处理器特性寄存器ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4cbz x0, 3f //如果不是gic3或者4.0。跳转到3f//说明gic版本为3.0或者4.0mrs_s x0, SYS_ICC_SRE_EL2 //读取中断控制器启用寄存器orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1msr_s SYS_ICC_SRE_EL2, x0isb // Make sure SRE is now setmrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,tbz x0, #0, 3f // and check that it sticksmsr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults3:
#endif/* Populate ID registers. *///填充虚拟机ID寄存器mrs x0, midr_el1mrs x1, mpidr_el1msr vpidr_el2, x0 //虚拟化处理器ID寄存器msr vmpidr_el2, x1 //虚拟化多处理器ID寄存器#ifdef CONFIG_COMPATmsr hstr_el2, xzr // Disable CP15 traps to EL2
#endif/* EL2 debug */ mrs x1, id_aa64dfr0_el1 //读取AArch64调试特性寄存器sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4cmp x0, #1b.lt 4f // Skip if no PMU presentmrs x0, pmcr_el0 //读取性能监视器控制寄存器ubfx x0, x0, #11, #5 //允许EL2访问性能监视器控制寄存器
4:csel x3, xzr, x0, lt // all PMU counters from EL1/* Statistical profiling */ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4cbz x0, 7f // Skip if SPE not presentcbnz x2, 6f // VHE?mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,找不到该寄存器and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)cbnz x4, 5f // then permit sampling of physicalmov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \1 << SYS_PMSCR_EL2_PA_SHIFT)msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
5:mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)orr x3, x3, x1 // If we don't have VHE, thenb 7f // use EL1&0 translation.
6: // For VHE, use EL2 translationorr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
7:msr mdcr_el2, x3 // Configure debug traps/* LORegions */mrs x1, id_aa64mmfr1_el1 //AArch64内存模型特征寄存器ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4cbz x0, 1fmsr_s SYS_LORC_EL1, xzr
1:/* Stage-2 translation */msr vttbr_el2, xzr //虚拟化转换表基寄存器cbz x2, install_el2_stubmov w0, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2isbretSYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
3. set_cpu_boot_mode_flag
SYM_FUNC_START_LOCAL(set_cpu_boot_mode_flag)adr_l x1, __boot_cpu_mode //把__boot_cpu_mode地址赋值给x1cmp w0, #BOOT_CPU_MODE_EL2 //如果当前cpu处于EL2b.ne 1f //跳转到1add x1, x1, #4 //当前cpu在EL1,使用__boot_cpu_mode[1]//当前cpu在EL2,使用__boot_cpu_mode[0]
1: str w0, [x1] //将w0写入__boot_cpu_modedmb sydc ivac, x1 // Invalidate potentially stale cache lineret
SYM_FUNC_END(set_cpu_boot_mode_flag)
set_cpu_boot_mode_flag主要是根据cpu当前的特权等级,把w0寄存器,也就是当前模式记录在__boot_cpu_mode中。
4. __create_page_tables
SYM_FUNC_START_LOCAL(__create_page_tables)mov x28, lr //lr是连接寄存器/** Invalidate the init page tables to avoid potential dirty cache lines* being evicted. Other page tables are allocated in rodata as part of* the kernel image, and thus are clean to the PoC per the boot* protocol.*/adrp x0, init_pg_dir //获取内核init页表的基地址adrp x1, init_pg_end //获取内核init页表的基地址sub x1, x1, x0bl __inval_dcache_area //清除Dcache/** Clear the init page tables.*///把init_pg_dir到init_pg_end这段内存清零//也就是把内核页表清零adrp x0, init_pg_diradrp x1, init_pg_endsub x1, x1, x0
1: stp xzr, xzr, [x0], #16 //把0写入以x0为地址的内存中,然后x0自增16stp xzr, xzr, [x0], #16stp xzr, xzr, [x0], #16stp xzr, xzr, [x0], #16subs x1, x1, #64b.ne 1bmov x7, SWAPPER_MM_MMUFLAGS/** Create the identity mapping.*///创建恒等映射,也就是虚拟地址和物理地址相同adrp x0, idmap_pg_dir //恒等映射的页全局目录的起始地址adrp x3, __idmap_text_start // 恒等映射代码节的起始地址#ifdef CONFIG_ARM64_VA_BITS_52 //不支持,不用看mrs_s x6, SYS_ID_AA64MMFR2_EL1and x6, x6, #(0xf << ID_AA64MMFR2_LVA_SHIFT)mov x5, #52cbnz x6, 1f
#endifmov x5, #VA_BITS_MIN //虚拟地址位数
1:adr_l x6, vabits_actual //获取PC到vabits_actual的相对偏移地址str x5, [x6] //定位PC的虚拟地址dmb sydc ivac, x6 // 使x6所在的dcache失效/** VA_BITS may be too small to allow for an ID mapping to be created* that covers system RAM if that is located sufficiently high in the* physical address space. So for the ID map, use an extended virtual* range in that case, and configure an additional translation level* if needed.** Calculate the maximum allowed value for TCR_EL1.T0SZ so that the* entire ID map region can be mapped. As T0SZ == (64 - #bits used),* this number conveniently equals the number of leading zeroes in* the physical address of __idmap_text_end.*///T0SZ决定了输出的物理地址位数,这里查看其是否足够覆盖物理地址adrp x5, __idmap_text_end //获取__idmap_text_end的页基地址clz x5, x5 //计算x5高位0的个数cmp x5, TCR_T0SZ(VA_BITS_MIN) // default T0SZ small enough?b.ge 1f // .. then skip VA range extensionadr_l x6, idmap_t0sz //计算idmap_t0sz的页内偏移str x5, [x6] //把x5的数据写入以x6为地址的内存中dmb sydc ivac, x6 // Invalidate potentially stale cache line#if (VA_BITS < 48)
#define EXTRA_SHIFT (PGDIR_SHIFT + PAGE_SHIFT - 3)
#define EXTRA_PTRS (1 << (PHYS_MASK_SHIFT - EXTRA_SHIFT))/** If VA_BITS < 48, we have to configure an additional table level.* First, we have to verify our assumption that the current value of* VA_BITS was chosen such that all translation levels are fully* utilised, and that lowering T0SZ will always result in an additional* translation level to be configured.*/
#if VA_BITS != EXTRA_SHIFT
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endifmov x4, EXTRA_PTRScreate_table_entry x0, x3, EXTRA_SHIFT, x4, x5, x6 //配置额外的页表
#else/** If VA_BITS == 48, we don't have to configure an additional* translation level, but the top-level table has more entries.*/mov x4, #1 << (PHYS_MASK_SHIFT - PGDIR_SHIFT)str_l x4, idmap_ptrs_per_pgd, x5
#endif
1:ldr_l x4, idmap_ptrs_per_pgd //取idmap_ptrs_per_pgd的页内偏移到x4中mov x5, x3 // __pa(__idmap_text_start)adr_l x6, __idmap_text_end // __pa(__idmap_text_end)//为指定的虚拟地址范围映射内存map_memory x0, x1, x3, x6, x7, x3, x4, x10, x11, x12, x13, x14 //映射,写入页表/** Map the kernel image (starting with PHYS_OFFSET).*///内核镜像映射adrp x0, init_pg_dir //页表基地址mov_q x5, KIMAGE_VADDR // 代码段的虚拟地址add x5, x5, x23 // add KASLR displacementmov x4, PTRS_PER_PGD //PGD表项的数量adrp x6, _end // 代码段的物理地址末端adrp x3, _text // 代码段的物理地址起始位置sub x6, x6, x3 // 代码段长度add x6, x6, x5 // 代码段虚拟地址末端map_memory x0, x1, x5, x6, x7, x3, x4, x10, x11, x12, x13, x14 //创建内核镜像的映射关系/** Since the page tables have been populated with non-cacheable* accesses (MMU disabled), invalidate those tables again to* remove any speculatively loaded cache lines.*/dmb syadrp x0, idmap_pg_diradrp x1, idmap_pg_endsub x1, x1, x0bl __inval_dcache_area //使dcache失效adrp x0, init_pg_diradrp x1, init_pg_endsub x1, x1, x0bl __inval_dcache_area //使dcache失效ret x28 //返回
SYM_FUNC_END(__create_page_tables)
__create_page_tables主要执行了一下几个步骤:
- mov x28, lr保存返回的地址
- 清除init页表的dcache
- 循环使用stp把init_pg_dir到init_pg_end这段内存写0
- 创建恒等映射,使得虚拟地址和物理地址相同
- 创建内核镜像的映射
- 使这两个页表的dcache失效
注意:
恒等映射将idmap_pg_dir页表对应的物理空间为__idmap_text_start 到__idmap_text_end,也就是代码段的范围。粗粒度内核页表将 init_pg_dir 地址保存到ttbr1_el1 ;init_pg_dir页表对应的物理空间为_text 到_end,也就是内核镜像代码段。这两个页表后面会在paging_init之后丢弃。
4.1map_memory
我们看看map_memory是怎么创建填写也页表的:
.macro map_memory, tbl, rtbl, vstart, vend, flags, phys, pgds, istart, iend, tmp, count, svsub \vend, \vend, #1 //虚拟地址减一add \rtbl, \tbl, #PAGE_SIZE //第一级页表项的地址,是页全局基地址的下一页mov \sv, \rtblmov \count, #0//compute_indices是用来计算vstart和vend对应的 pgtable level的index的,两者之差保存在count中compute_indices \vstart, \vend, #PGDIR_SHIFT, \pgds, \istart, \iend, \count//populate_entries最终建立指向下一级的映射或者last level映射populate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmpmov \tbl, \svmov \sv, \rtbl#if SWAPPER_PGTABLE_LEVELS > 3compute_indices \vstart, \vend, #PUD_SHIFT, #PTRS_PER_PUD, \istart, \iend, \countpopulate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmpmov \tbl, \svmov \sv, \rtbl
#endif#if SWAPPER_PGTABLE_LEVELS > 2compute_indices \vstart, \vend, #SWAPPER_TABLE_SHIFT, #PTRS_PER_PMD, \istart, \iend, \countpopulate_entries \tbl, \rtbl, \istart, \iend, #PMD_TYPE_TABLE, #PAGE_SIZE, \tmpmov \tbl, \sv
#endifcompute_indices \vstart, \vend, #SWAPPER_BLOCK_SHIFT, #PTRS_PER_PTE, \istart, \iend, \countbic \count, \phys, #SWAPPER_BLOCK_SIZE - 1populate_entries \tbl, \count, \istart, \iend, \flags, #SWAPPER_BLOCK_SIZE, \tmp.endm
其中主要函数有两个:
- compute_indices:它是用来计算 vstart 和 vend 对应的 pgtable level 的 index 的,两者之差保存在 count 中;
- populate_entries:最终建立指向下一级的映射或者 last level 映射。
5. __cpu_setup
SYM_FUNC_START(__cpu_setup)tlbi vmalle1 // 使本地TLB失效dsb nshmov x1, #3 << 20 //x1=0x300000msr cpacr_el1, x1 // 使能EL1和EL0执行 FP/ASIMD指令mov x1, #1 << 12 // Reset mdscr_el1 and disablemsr mdscr_el1, x1 //对AArch64 DCC寄存器的L0访问被捕获isb // Unmask debug exceptions now,enable_dbg // since this is per-cpureset_pmuserenr_el0 x1 // Disable PMU access from EL0reset_amuserenr_el0 x1 // Disable AMU access from EL0/** Memory region attributes*/mov_q x5, MAIR_EL1_SET //设置nGnRnE等内存属性
#ifdef CONFIG_ARM64_MTE //如果使能内存标签扩展支持/** Update MAIR_EL1, GCR_EL1 and TFSR*_EL1 if MTE is supported* (ID_AA64PFR1_EL1[11:8] > 1).*/mrs x10, ID_AA64PFR1_EL1ubfx x10, x10, #ID_AA64PFR1_MTE_SHIFT, #4cmp x10, #ID_AA64PFR1_MTEb.lt 1f/* Normal Tagged memory type at the corresponding MAIR index */mov x10, #MAIR_ATTR_NORMAL_TAGGEDbfi x5, x10, #(8 * MT_NORMAL_TAGGED), #8/* initialize GCR_EL1: all non-zero tags excluded by default */mov x10, #(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK)msr_s SYS_GCR_EL1, x10/** If GCR_EL1.RRND=1 is implemented the same way as RRND=0, then* RGSR_EL1.SEED must be non-zero for IRG to produce* pseudorandom numbers. As RGSR_EL1 is UNKNOWN out of reset, we* must initialize it.*/mrs x10, CNTVCT_EL0ands x10, x10, #SYS_RGSR_EL1_SEED_MASKcsinc x10, x10, xzr, nelsl x10, x10, #SYS_RGSR_EL1_SEED_SHIFTmsr_s SYS_RGSR_EL1, x10/* clear any pending tag check faults in TFSR*_EL1 */msr_s SYS_TFSR_EL1, xzrmsr_s SYS_TFSRE0_EL1, xzr
1:
#endifmsr mair_el1, x5 //对内存的8个区域写入属性/** Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for* both user and kernel.*///准备TCRmov_q x10, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \TCR_TBI0 | TCR_A1 | TCR_KASAN_FLAGStcr_clear_errata_bits x10, x9, x5 //清除该CPU上触发勘误表的TCR位。#ifdef CONFIG_ARM64_VA_BITS_52ldr_l x9, vabits_actualsub x9, xzr, x9add x9, x9, #64tcr_set_t1sz x10, x9
#elseldr_l x9, idmap_t0sz //读取idmap_t0sz
#endiftcr_set_t0sz x10, x9 //跟新t0sz,这样我们就可以加载ID映射/** Set the IPS bits in TCR_EL1.*/tcr_compute_pa_size x10, #TCR_IPS_SHIFT, x5, x6 //设置TCR.IPS到最高支持
#ifdef CONFIG_ARM64_HW_AFDBM //如果支持Access和Dirty页面标志的硬件更新/** Enable hardware update of the Access Flags bit.* Hardware dirty bit management is enabled later,* via capabilities.*/mrs x9, ID_AA64MMFR1_EL1and x9, x9, #0xfcbz x9, 1f //如果cpu允许硬件访问标志更新功能orr x10, x10, #TCR_HA // 设置硬件访问标志更新功能
1:
#endif /* CONFIG_ARM64_HW_AFDBM */msr tcr_el1, x10 //写入tcr_el1/** Prepare SCTLR*/mov_q x0, SCTLR_EL1_SETret // return to head.S
SYM_FUNC_END(__cpu_setup)
__cpu_setup执行步骤如下:
- tlbi vmalle1 使本地TLB失效
- 使能EL1和EL0执行 FP/ASIMD指令
- 允许AArch64 DCC寄存器的L0访问被捕获
- 禁止从EL0访问PMU和AMU
- 给内存的8个region设置上DEVICE_nGnRnE,DEVICE_nGnRE,DEVICE_GRE,NORMAL_NC,NORMAL,NORMAL_WT,NORMAL这8个属性。
- 清除该CPU上触发勘误表的TCR位
- 跟新t0sz,这样我们就可以加载ID映射
- 设置硬件访问标志更新功能
6. __primary_switch
SYM_FUNC_START_LOCAL(__primary_switch)
#ifdef CONFIG_RANDOMIZE_BASEmov x19, x0 // 保留新的SCTLR_EL1值mrs x20, sctlr_el1 // 保留旧的SCTLR EL1值
#endifadrp x1, init_pg_dir //获取init_pg_dir的页表基地址bl __enable_mmu //开启mmu
#ifdef CONFIG_RELOCATABLE
#ifdef CONFIG_RELRmov x24, #0 // no RELR displacement yet
#endifbl __relocate_kernel
#ifdef CONFIG_RANDOMIZE_BASE //我们没开,不看ldr x8, =__primary_switched //把__primary_switched的内容放入x8adrp x0, __PHYS_OFFSET //获取内核代码段的页表基地址blr x8 //跳转到__primary_switched运行,返回的时候返回下一个指令/** If we return here, we have a KASLR displacement in x23 which we need* to take into account by discarding the current kernel mapping and* creating a new one.*/pre_disable_mmu_workaroundmsr sctlr_el1, x20 // disable the MMUisbbl __create_page_tables // recreate kernel mappingtlbi vmalle1 // Remove any stale TLB entriesdsb nshisbmsr sctlr_el1, x19 // re-enable the MMUisbic iallu // flush instructions fetcheddsb nsh // via old mappingisbbl __relocate_kernel
#endif
#endifldr x8, =__primary_switched //把__primary_switched的内容放入x8adrp x0, __PHYS_OFFSET //获取内核代码段的页表基地址br x8 //跳转到__primary_switched,并且不返回
SYM_FUNC_END(__primary_switch)
__primary_switch执行步骤如下:
- 获取init_pg_dir的页表基地址
- 调用函数__enable_mmu开启mmu
- 调用函数__primary_switched,并且不再返回
6.1 __enable_mmu
SYM_FUNC_START(__enable_mmu)mrs x2, ID_AA64MMFR0_EL1 //读取内存模型特征寄存器ubfx x2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4 //提取28到31这4位cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN //如果支持4k页b.lt __no_granule_support //卡死cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX //如果不支持4k页b.gt __no_granule_support //卡死//只有4KB粒度支持52位输入输出地址update_early_cpu_boot_status 0, x2, x3 //启动中的CPU更新失败状态adrp x2, idmap_pg_dir //读取内核页全局目录页表到x2phys_to_ttbr x1, x1phys_to_ttbr x2, x2msr ttbr0_el1, x2 //内核页全局目录页表写入ttbr0_el1offset_ttbr1 x1, x3msr ttbr1_el1, x1 //内核镜像的init目录页表写入ttbr1_el1isbmsr sctlr_el1, x0 //写入sctlr_el1寄存器isb/** Invalidate the local I-cache so that any instructions fetched* speculatively from the PoC are discarded, since they may have* been dynamically patched at the PoU.*/ic iallu //icache失效dsb nsh //内存屏障isbret
SYM_FUNC_END(__enable_mmu)
__enable_mmu执行步骤如下:
- 读取内存模型特征寄存器,判断是否支持我们内核设置的页大小,现在我们内核设置的页大小是4k,根据读取内存模型特征寄存器的值判断这个cpu是否支持4k页
- 启动中的CPU更新失败状态
- 设置ttbr0_el1和ttbr1_el1寄存器
- icache失效和内存屏障
6.2 __primary_switched
SYM_FUNC_START_LOCAL(__primary_switched)adrp x4, init_thread_union //init_thread_union地址保存在x4中,它存放了init进程结构体add sp, x4, #THREAD_SIZE //设置sp指针为init_thread_union偏移THREAD_SIZEadr_l x5, init_task //init_task地址保存在x5msr sp_el0, x5 //保存当前进程描述符到sp_el0,使用用户态的堆栈,说明是用户态程序#ifdef CONFIG_ARM64_PTR_AUTH__ptrauth_keys_init_cpu x5, x6, x7, x8
#endifadr_l x8, vectors // 读取vectors的地址msr vbar_el1, x8 // 设置异常向量表isbstp xzr, x30, [sp, #-16]! //把将xzr和保存在x30中的链接地址入栈mov x29, sp //将栈指针保存到x29#ifdef CONFIG_SHADOW_CALL_STACKadr_l scs_sp, init_shadow_call_stack // Set shadow call stack
#endifstr_l x21, __fdt_pointer, x5 //将FDT地址保存到__fdt_pointer变量ldr_l x4, kimage_vaddr // Save the offset betweensub x4, x4, x0 // the kernel virtual andstr_l x4, kimage_voffset, x5 //将kimage的虚拟地址和物理地址的偏移保存到kimage_voffset中// Clear BSSadr_l x0, __bss_startmov x1, xzradr_l x2, __bss_stopsub x2, x2, x0bl __pi_memset //清理bss段数据dsb ishst // Make zero page visible to PTW#ifdef CONFIG_KASANbl kasan_early_init
#endif
#ifdef CONFIG_RANDOMIZE_BASEtst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?b.ne 0fmov x0, x21 // pass FDT address in x0bl kaslr_early_init // parse FDT for KASLR optionscbz x0, 0f // KASLR disabled? just proceedorr x23, x23, x0 // record KASLR offsetldp x29, x30, [sp], #16 // we must enable KASLR, returnret // to __primary_switch()
0:
#endifadd sp, sp, #16 //sp加一mov x29, #0 mov x30, #0b start_kernel //跳转到start_kernel
SYM_FUNC_END(__primary_switched)
__primary_switched主要执行了一下步骤:
- 初始化init_task的结构体和堆栈
- 设置异常向量表
- 将FDT地址保存到__fdt_pointer变量
- 将kimage的虚拟地址和物理地址的偏移保存到kimage_voffset中
- 清理bss段数据
- 跳转到start_kernel
到这里head.S的启动就看完了。
linux内核启动分析(一)相关推荐
- linux内核启动分析 三,Linux内核分析 实验三:跟踪分析Linux内核的启动过程
贺邦 + 原创作品转载请注明出处 + <Linux内核分析>MOOC课程 http://mooc.study.163.com/course/USTC-1000029000 一. 实验过程 ...
- 通过从代码层面分析Linux内核启动来探知操作系统的启动过程
通过从代码层面分析Linux内核启动来探知操作系统的启动过程 前言说明 本篇为网易云课堂Linux内核分析课程的第三周作业,我将围绕Linux 3.18的内核中的start_kernel到init进程 ...
- 【内核】linux内核启动流程详细分析【转】
转自:http://www.cnblogs.com/lcw/p/3337937.html Linux内核启动流程 arch/arm/kernel/head-armv.S 该文件是内核最先执行的一个文件 ...
- 【内核】linux内核启动流程详细分析
Linux内核启动流程 arch/arm/kernel/head-armv.S 该文件是内核最先执行的一个文件,包括内核入口ENTRY(stext)到start_kernel间的初始化代码, 主要作用 ...
- Linux内核启动流程分析(一)【转】
转自:http://blog.chinaunix.net/uid-25909619-id-3380535.html 很久以前分析的,一直在电脑的一个角落,今天发现贴出来和大家分享下.由于是word直接 ...
- 低温linux内核启动readl,Linux内核启动流程分析(一)
很久以前分析的,一直在电脑的一个角落,今天发现贴出来和大家分享下.由于是word直接粘过来的有点乱,敬请谅解! S3C2410 Linux 2.6.35.7启动分析(第一阶段) 1.依据arch/ar ...
- linux内核参数分析,linux内核启动第一阶段分析
linux内核启动第一阶段分析 http://blog.csdn.net/aaronychen/article/details/2838341 本文的很多内容是参考了网上某位大侠的文章写的<&l ...
- Linux内核分析实验3——分析linux内核启动过程
本文大量内容引用自孟宁老师在<LINUX操作系统分析>课程中的内容 <Linux内核分析>MOOC课程 http://www.xuetangx.com/courses/cour ...
- uboot启动linux内核流程分析(三)
uboot bootz命令流程图 Uboot启动linux内核是使用bootz命令,bootz是如何启动linux内核?uboot的生命周期是怎么终止的?linux是如何启动? 启动linux内核的时 ...
最新文章
- 支付宝现使用故障 官方:不影响用户资金和信息安全
- 悟透JavaScript (强烈推荐)
- Lunar New Year and Cross Counting
- 等我敲完这行代码,就和你离婚!
- JavaScriptjQuery.函数
- BigDecimal.divide方法
- 【轨迹跟踪】基于matlab无人机轨迹跟踪【含Matlab源码 1152期】
- python太阳代码_用86行Python代码模拟太阳系
- Python:对压缩包进行解压操作
- An Introduction to Be-trees and Write Optimization 学习笔记
- Elasticsearch - Fuzzy query
- ERP : 经济批量
- HashMap 为什么是2倍扩容
- 基于JAVA面相高校学生的图书共享平台计算机毕业设计源码+数据库+lw文档+系统+部署
- 好用的翻译插件-一键自动翻译插件软件
- 【转载】一句话实现五彩LINUX
- php Carbon 中文文档
- 上海无印良品地理空间分布特征与选址策略可视化研究
- 树莓派python学习篇 (二)红外避障传感器
- Windows 屏幕气泡代码
热门文章
- 背包算法java算法
- HTML5期末大作业:苏苏网站设计——仿苏苏超市(1页) 大学生简单个人静态HTML网页设计作品 DIV布局个人介绍网页模板代码 DW学生个人网站制作成品下载
- 荣耀6手机升级android版本,华为荣耀6手机升级开发版本退回到稳定版的方法
- 摄影的误区以及摄影的本质
- 多台主机使用DNS服务器实现网络负载均衡
- js实现单击事件和双击事件
- python excel截图保存_python 对excel进行截图
- JavaSE_第1章 Java概述
- android 自定义progressdialog,Android 自定义progressDialog实现
- Android开发之实现Android原生定位