TLB

To speed up virtual-to-physical address translation, CPUs generally integrate a TLB hardware unit. By caching virtual-to-physical mappings directly in a hardware unit, the TLB avoids the multiple memory accesses the MMU would otherwise incur walking the multi-level page tables for every translation. Essentially, the TLB is a special kind of cache. During address translation, if the TLB already holds the mapping for a virtual address, the physical address is obtained straight from the TLB, skipping the repeated memory references of a table walk.

    When paging is enabled, every memory access has its virtual address automatically translated into a physical address using the page-translation hierarchy. Translation-lookaside buffers (TLBs), also known as page-translation caches, nearly eliminate the performance penalty associated with page translation. TLBs are special on-chip caches that hold the most-recently used virtual-to-physical address translations. Each memory reference (instruction and data) is checked by the TLB. If the translation is present in the TLB, it is immediately provided to the processor, thus avoiding external memory references for accessing page tables.
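
To make the mechanism concrete, here is a minimal, hypothetical software model of a direct-mapped TLB lookup. Everything in it (the names tlb_entry and tlb_lookup, the 64-entry size, the direct-mapped organization) is illustrative only, not how the hardware is actually organized:

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64          /* illustrative size */
#define PAGE_SHIFT  12          /* 4 KiB pages */

struct tlb_entry {
        bool     valid;
        uint64_t vpn;           /* virtual page number (tag) */
        uint64_t pfn;           /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/*
 * On a hit, translate vaddr directly; on a miss, the hardware would
 * walk the page tables and refill the entry.
 */
static bool tlb_lookup(uint64_t vaddr, uint64_t *paddr)
{
        uint64_t vpn = vaddr >> PAGE_SHIFT;
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

        if (e->valid && e->vpn == vpn) {
                *paddr = (e->pfn << PAGE_SHIFT) |
                         (vaddr & ((1UL << PAGE_SHIFT) - 1));
                return true;    /* TLB hit: no page-table walk needed */
        }
        return false;           /* TLB miss: fall back to the MMU walk */
}

A real TLB is set-associative and refilled by hardware on a miss; the point of the sketch is only the hit path, which skips the page-table walk entirely.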

The TLB hardware works roughly as shown in the figure below (figure courtesy of smcdef). Its detailed hardware design is not covered here; instead, the focus is on how the TLB is operated, combining the kernel code with the hardware manual:

Because the TLB is essentially a cache, and a cache is fast but small and expensive, the TLB cannot be very large; its strength comes from locality. Locality means that once a memory address has been accessed, nearby addresses are very likely to be referenced in the near future. In addition, since physical memory is managed in page-aligned units, all virtual addresses within one page map through the same page translation, and the multi-level translation scheme likewise keeps the amount of mapping state that must be maintained small.

TLBs take advantage of the principle of locality. That is, if a memory address is referenced, it is likely that nearby memory addresses will be referenced in the near future. In the context of paging, the proximity of memory addresses required for locality can be broad—it is equal to the page size. Thus, it is possible for a large number of addresses to be translated by a small number of page translations. This high degree of locality means that almost all translations are performed using the on-chip TLBs.
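
A quick calculation shows how far this goes: with 4 KiB pages, a single TLB entry covers 4096 consecutive virtual addresses, so even a modest 64-entry TLB maps 64 × 4 KiB = 256 KiB of working set without a single page-table walk.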

TLB operations

Take AMD64 as an example. After the CPU powers up and address translation is enabled, and after every process switch that reloads CR3 (see "linux那些事之 page translation(硬件篇)" for details), the TLB may load address translations on its own. Once a translation has been cached, the TLB has no way of noticing if the kernel later modifies that mapping in the page tables; an access to that address could then hit the stale cached translation and reach the wrong physical memory. Because the TLB is generally not programmable, meaning software cannot control which translations the cache holds, the only way for the kernel to get rid of a stale entry is to invalidate either that one translation or the whole TLB. The next access then misses in the TLB, and the translation is re-read from the in-memory page tables and re-cached.

System software is responsible for managing the TLBs when updates are made to the linear-to-physical mapping of addresses. A change to any paging data-structure entry is not automatically reflected in the TLB, and hardware snooping of TLBs during memory-reference cycles is not performed. Software must invalidate the TLB entry of a modified translation-table entry so that the change is reflected in subsequent address translations.

Neither the kernel nor any other software can decide which mappings the TLB loads; the TLB can only be refreshed indirectly, by invalidating a single TLB entry or the entire TLB.
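
A minimal sketch of this rule, using the generic kernel helpers set_pte_at() and flush_tlb_page(); the update_mapping wrapper is hypothetical, and the vma/ptep arguments are assumed to be prepared by the caller:

/*
 * Hypothetical helper: after changing a page-table entry, the stale
 * cached translation must be invalidated before the new mapping is
 * reliably used.
 */
static void update_mapping(struct vm_area_struct *vma, unsigned long addr,
                           pte_t *ptep, pte_t new_pte)
{
        set_pte_at(vma->vm_mm, addr, ptep, new_pte);  /* update the in-memory PTE */
        flush_tlb_page(vma, addr);                    /* drop the stale TLB entry */
}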

Taking AMD 64-bit CPUs as the reference, the TLB can be invalidated explicitly or implicitly. The explicit methods consist of several instructions that differ mainly in scope; they are described below alongside the kernel code and the hardware manual.

Explicit invalidation methods

AMD64 provides several dedicated instructions for invalidating the TLB.

INVLPG instruction

INVLPG invalidates the TLB entry corresponding to a given address. The instruction is valid regardless of whether that address is currently cached: if a matching entry is in the TLB, it is invalidated; if not, the instruction effectively does nothing to the TLB. It works whether or not the translation is marked global. The manual describes the instruction and its format as follows:

mem8: a byte memory operand; the instruction invalidates the TLB entry for the page containing that byte. Note that the operand is a virtual address, interpreted in the address space currently selected by CR3, i.e. the running process, so the instruction must not be used in the middle of a context switch, when CR3 has not yet been set up. The kernel uses this instruction in native_flush_tlb_one_user():


/* Flush one page in the user mapping */
STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr)
{
        u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);

        asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
        ... ...
}

This invalidates the TLB entry for addr in the currently running process, so the translation will be reloaded on the next access.

INVLPGA

This instruction can be regarded as a variant of INVLPG: it additionally takes an ASID, so the translation of an address belonging to a specific address space can be invalidated in the TLB.

The rAX register holds the virtual address to invalidate and ECX holds the ASID. The instruction mainly serves SVM virtualization; in normal non-virtualized operation it is rarely needed, because after a CR3 switch the previous ASID's entries are already treated as invalid. The invlpga() helper invalidates addr within the given asid:

static inline void invlpga(unsigned long addr, u32 asid)
{
        asm volatile(__ex("invlpga %1, %0") : : "c"(asid), "a"(addr));
}

INVLPGB

Another variant of INVLPG, which can invalidate either a single entry or a whole range of TLB entries. The manual describes its format as follows:

Through the implicitly used rAX and EDX registers, a range of address mappings can be invalidated in the TLB in one go, i.e. multiple entries at once, which improves efficiency. The instruction is somewhat more involved to use, since the rAX and EDX registers must be combined, and the kernel code (as of this writing) does not use it.

INVPCID

The INVPCID instruction invalidates entries by PCID, for example all entries belonging to one PCID in a single operation. The manual describes it as follows:

The instruction supports several modes:

  • When reg32/64 is 0, an exact (PCID, VA) match is used: only the given VA within the given PCID is invalidated, equivalent to INVLPG.
  • When reg32/64 is 1, all TLB entries belonging to the given PCID are invalidated.
  • When reg32/64 is 2, all TLB entries are invalidated, including global pages.
  • When reg32/64 is 3, all TLB entries are invalidated, excluding global pages.

The instruction format is as follows:

On a 32-bit system the 32-bit register form is used; on a 64-bit system, the 64-bit register form.

mem128 is a 128-bit descriptor in memory, laid out as follows:

Bits 0-11 hold the PCID, bits 64-127 hold the virtual address, and bits 12-63 are reserved.

The kernel's wrappers for this instruction live in arch/x86/include/asm/invpcid.h and encapsulate the functionality described above:

static inline void __invpcid(unsigned long pcid, unsigned long addr,
                             unsigned long type)
{
        struct { u64 d[2]; } desc = { { pcid, addr } };

        /*
         * The memory clobber is because the whole point is to invalidate
         * stale TLB entries and, especially if we're flushing global
         * mappings, we don't want the compiler to reorder any subsequent
         * memory accesses before the TLB flush.
         */
        asm volatile("invpcid %[desc], %[type]"
                     :: [desc] "m" (desc), [type] "r" (type) : "memory");
}

Parameters:

  • pcid: the PCID to invalidate.
  • addr: the address to invalidate; pcid and addr are packed into a 128-bit descriptor built from two u64 values, struct { u64 d[2]; } desc.
  • type: the reg32/64 mode described above. The supported types are defined as follows:
#define INVPCID_TYPE_INDIV_ADDR         0
#define INVPCID_TYPE_SINGLE_CTXT        1
#define INVPCID_TYPE_ALL_INCL_GLOBAL    2
#define INVPCID_TYPE_ALL_NON_GLOBAL     3

For convenience, the kernel wraps the four modes into four helper functions:

static inline void invpcid_flush_one(unsigned long pcid, unsigned long addr)
{
        __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR);
}

/* Flush all mappings for a given PCID, not including globals. */
static inline void invpcid_flush_single_context(unsigned long pcid)
{
        __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT);
}

/* Flush all mappings, including globals, for all PCIDs. */
static inline void invpcid_flush_all(void)
{
        __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL);
}

/* Flush all mappings for all PCIDs except globals. */
static inline void invpcid_flush_all_nonglobals(void)
{
        __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
}

CR3 register

Writing the CR3 register invalidates all of the PCID's TLB entries except global ones (similar to INVPCID with type 3). The kernel's native_flush_tlb_local() is implemented this way:

/* Flush the entire current user mapping */
STATIC_NOPV void native_flush_tlb_local(void)
{
        /*
         * Preemption or interrupts must be disabled to protect the access
         * to the per CPU variable and to prevent being preempted between
         * read_cr3() and write_cr3().
         */
        WARN_ON_ONCE(preemptible());

        invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));

        /* If current->mm == NULL then the read_cr3() "borrows" an mm */
        native_write_cr3(__native_read_cr3());
}
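
The invalidate_user_asid() call above marks the user half of the ASID (used by page-table isolation) as needing a flush. For reference, it is defined in the same file roughly as follows (quoted from arch/x86/mm/tlb.c; the exact form may differ slightly across kernel versions):

static void invalidate_user_asid(u16 asid)
{
        /* There is no user ASID if address space separation is off */
        if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
                return;

        /*
         * We only have a single ASID if PCID is off and the CR3
         * write will have flushed it.
         */
        if (!cpu_feature_enabled(X86_FEATURE_PCID))
                return;

        if (!static_cpu_has(X86_FEATURE_PTI))
                return;

        __set_bit(kern_pcid(asid),
                  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
}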

VMRUN

When the VMRUN instruction executes, the TLB_CONTROL field in the VMCB can request a TLB flush. This is mainly used in virtualization scenarios and is not described further here.

These are the explicit ways of invalidating the TLB.

Implicit invalidation methods

Modifying certain register bits implicitly invalidates the entire TLB, including global pages:

  • Modifying CR0.PG (the paging-enable bit that turns address translation on).
  • Modifying CR4.PAE (physical address extension), CR4.PSE (page size extension) or CR4.PGE (page global enable); any of these invalidates the entire TLB.
  • Entering SMM due to an SMI interrupt.
  • Executing the RSM instruction.
  • Updating an MTRR (memory type range register) with WRMSR.
  • CPU initialization.
  • External changes to the A20 address-bit masking.
  • Writing certain model-specific registers with WRMSR.
  • Using MOV to set CR4.PKE from 0 to 1.
  • Using MOV to clear CR4.PCIDE from 1 to 0.

Cases that require no TLB flush

On AMD64 CPUs, some page-table updates do not require an explicit TLB flush: for example, removing the supervisor restriction, removing the read-only restriction, relaxing the no-execute restriction, or changing an entry from invalid to valid. The hardware picks up these relaxing updates automatically, because the stale, more restrictive TLB entry simply causes a page fault, upon which the processor re-walks the page tables and sees the relaxed permissions. However, if only the pointer to the next-level table in an entry is changed, without touching the permission bits, the TLB cannot detect the update, and a stale entry may remain in the TLB.
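
As a hedged illustration using generic kernel helpers (the wrapper functions are hypothetical, pte_mkwrite() is used in its older single-argument form, and the mm/vma/ptep arguments are assumed to be prepared by the caller):

/*
 * Relaxing a restriction (RO -> RW): on AMD64 no explicit flush is
 * strictly required; the stale read-only TLB entry faults, and the
 * re-walk picks up the new permission.
 */
static void make_writable(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
        set_pte_at(mm, addr, ptep, pte_mkwrite(*ptep));
}

/*
 * Changing where the entry points: the TLB cannot detect this, so the
 * stale translation must be flushed explicitly.
 */
static void remap_page(struct vm_area_struct *vma, unsigned long addr,
                       pte_t *ptep, unsigned long new_pfn, pgprot_t prot)
{
        set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(new_pfn, prot));
        flush_tlb_page(vma, addr);      /* mandatory here */
}

Note that portable kernel code typically flushes in both cases anyway, since other architectures do not guarantee this relaxing behavior.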

TLB-related APIs

The kernel provides several APIs for flushing the TLB; on x86 they live in arch/x86/mm/tlb.c.

__get_current_cr3_fast

This helper obtains the current CR3 value quickly:

unsigned long __get_current_cr3_fast(void)
{
        unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
                                      this_cpu_read(cpu_tlbstate.loaded_mm_asid));

        /* For now, be very restrictive about when this can be called. */
        VM_WARN_ON(in_nmi() || preemptible());

        VM_BUG_ON(cr3 != __read_cr3());

        return cr3;
}
  • build_cr3: each CPU keeps a per-CPU cpu_tlbstate structure recording the current TLB-related state; its loaded_mm field is the mm of the currently running process. The CR3 value can thus be reconstructed from cpu_tlbstate instead of being read from the CR3 register, which is expensive (see the sketch of build_cr3 below).
  • With VM_BUG_ON enabled, the value reconstructed from cpu_tlbstate is compared against the value actually read from the CR3 register; a mismatch is reported as a bug.
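
For reference, build_cr3() (from the same x86 mm code; the exact form varies slightly by kernel version) essentially composes the CR3 value from the page-table root and the ASID:

static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
{
        if (static_cpu_has(X86_FEATURE_PCID)) {
                return __sme_pa(pgd) | kern_pcid(asid);
        } else {
                VM_WARN_ON_ONCE(asid != 0);
                return __sme_pa(pgd);
        }
}

kern_pcid() maps the kernel's internal ASID number to the hardware PCID value placed in the low bits of CR3.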

__flush_tlb_all

__flush_tlb_all() flushes the entire TLB:

void __flush_tlb_all(void)
{
        /*
         * This is to catch users with enabled preemption and the PGE feature
         * and don't trigger the warning in __native_flush_tlb().
         */
        VM_WARN_ON_ONCE(preemptible());

        if (boot_cpu_has(X86_FEATURE_PGE)) {
                __flush_tlb_global();
        } else {
                /*
                 * !PGE -> !PCID (setup_pcid()), thus every flush is total.
                 */
                flush_tlb_local();
        }
}
  • If global pages are enabled (PGE), __flush_tlb_global() is called; it flushes the entire TLB, including global pages, by toggling CR4.PGE off and on (or via INVPCID, see below).
  • If global pages are not enabled, flush_tlb_local() is called, which flushes only the current process's TLB entries.

__flush_tlb_global

If global pages are enabled, __flush_tlb_global() flushes the entire TLB by disabling and re-enabling CR4.PGE:

# define __flush_tlb_global      native_flush_tlb_global

which ultimately calls native_flush_tlb_global():


/* Flush everything */
STATIC_NOPV void native_flush_tlb_global(void)
{
        unsigned long cr4, flags;

        if (static_cpu_has(X86_FEATURE_INVPCID)) {
                /*
                 * Using INVPCID is considerably faster than a pair of writes
                 * to CR4 sandwiched inside an IRQ flag save/restore.
                 *
                 * Note, this works with CR4.PCIDE=0 or 1.
                 */
                invpcid_flush_all();    /* with X86_FEATURE_INVPCID, flush via invpcid */
                return;
        }

        /*
         * Read-modify-write to CR4 - protect it from preemption and
         * from interrupts. (Use the raw variant because this code can
         * be called from deep inside debugging code.)
         */
        raw_local_irq_save(flags);              /* disable interrupts */

        cr4 = this_cpu_read(cpu_tlbstate.cr4);
        /* toggle PGE */
        native_write_cr4(cr4 ^ X86_CR4_PGE);    /* clear PGE */
        /* write old PGE again and flush TLBs */
        native_write_cr4(cr4);                  /* set PGE again */

        raw_local_irq_restore(flags);           /* re-enable interrupts */
}

leave_mm

When a CPU leaves a process's address space (for example when the process is torn down or the CPU goes idle), the TLB mappings belonging to that process need to be dropped:

void leave_mm(int cpu)
{
        struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);

        /*
         * It's plausible that we're in lazy TLB mode while our mm is init_mm.
         * If so, our callers still expect us to flush the TLB, but there
         * aren't any user TLB entries in init_mm to worry about.
         *
         * This needs to happen before any other sanity checks due to
         * intel_idle's shenanigans.
         */
        if (loaded_mm == &init_mm)
                return;

        /* Warn if we're not lazy. */
        WARN_ON(!this_cpu_read(cpu_tlbstate.is_lazy));

        switch_mm(NULL, &init_mm, NULL);
}

Calling switch_mm() with prev set to NULL and next set to init_mm switches the CPU to init_mm, retiring the old process's TLB entries.

void switch_mm(struct mm_struct *prev, struct mm_struct *next,
               struct task_struct *tsk)
{
        unsigned long flags;

        local_irq_save(flags);
        switch_mm_irqs_off(prev, next, tsk);
        local_irq_restore(flags);
}

switch_mm() switches CR3 to the next mm's page tables and ASID, invalidating the previous process's stale TLB entries as needed. The core handler is switch_mm_irqs_off():


void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
                        struct task_struct *tsk)
{
        struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
        u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
        bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
        unsigned cpu = smp_processor_id();
        u64 next_tlb_gen;
        bool need_flush;
        u16 new_asid;

        /*
         * NB: The scheduler will call us with prev == next when switching
         * from lazy TLB mode to normal mode if active_mm isn't changing.
         * When this happens, we don't assume that CR3 (and hence
         * cpu_tlbstate.loaded_mm) matches next.
         *
         * NB: leave_mm() calls us with prev == NULL and tsk == NULL.
         */

        /* We don't want flush_tlb_func_* to run concurrently with us. */
        if (IS_ENABLED(CONFIG_PROVE_LOCKING))
                WARN_ON_ONCE(!irqs_disabled());

        /*
         * Verify that CR3 is what we think it is.  This will catch
         * hypothetical buggy code that directly switches to swapper_pg_dir
         * without going through leave_mm() / switch_mm_irqs_off() or that
         * does something like write_cr3(read_cr3_pa()).
         *
         * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3()
         * isn't free.
         */
#ifdef CONFIG_DEBUG_VM
        if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) {
                /*
                 * If we were to BUG here, we'd be very likely to kill
                 * the system so hard that we don't see the call trace.
                 * Try to recover instead by ignoring the error and doing
                 * a global flush to minimize the chance of corruption.
                 *
                 * (This is far from being a fully correct recovery.
                 *  Architecturally, the CPU could prefetch something
                 *  back into an incorrect ASID slot and leave it there
                 *  to cause trouble down the road.  It's better than
                 *  nothing, though.)
                 */
                __flush_tlb_all();
        }
#endif
        this_cpu_write(cpu_tlbstate.is_lazy, false);

        /*
         * The membarrier system call requires a full memory barrier and
         * core serialization before returning to user-space, after
         * storing to rq->curr. Writing to CR3 provides that full
         * memory barrier and core serializing instruction.
         */
        if (real_prev == next) {
                VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
                           next->context.ctx_id);

                /*
                 * Even in lazy TLB mode, the CPU should stay set in the
                 * mm_cpumask. The TLB shootdown code can figure out from
                 * cpu_tlbstate.is_lazy whether or not to send an IPI.
                 */
                if (WARN_ON_ONCE(real_prev != &init_mm &&
                                 !cpumask_test_cpu(cpu, mm_cpumask(next))))
                        cpumask_set_cpu(cpu, mm_cpumask(next));

                /*
                 * If the CPU is not in lazy TLB mode, we are just switching
                 * from one thread in a process to another thread in the same
                 * process. No TLB flush required.
                 */
                if (!was_lazy)
                        return;

                /*
                 * Read the tlb_gen to check whether a flush is needed.
                 * If the TLB is up to date, just use it.
                 * The barrier synchronizes with the tlb_gen increment in
                 * the TLB shootdown code.
                 */
                smp_mb();
                next_tlb_gen = atomic64_read(&next->context.tlb_gen);
                if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
                                next_tlb_gen)
                        return;

                /*
                 * TLB contents went out of date while we were in lazy
                 * mode. Fall through to the TLB switching code below.
                 */
                new_asid = prev_asid;
                need_flush = true;
        } else {
                /*
                 * Avoid user/user BTB poisoning by flushing the branch
                 * predictor when switching between processes. This stops
                 * one process from doing Spectre-v2 attacks on another.
                 */
                cond_ibpb(tsk);

                if (IS_ENABLED(CONFIG_VMAP_STACK)) {
                        /*
                         * If our current stack is in vmalloc space and isn't
                         * mapped in the new pgd, we'll double-fault.  Forcibly
                         * map it.
                         */
                        sync_current_stack_to_mm(next);
                }

                /*
                 * Stop remote flushes for the previous mm.
                 * Skip kernel threads; we never send init_mm TLB flushing IPIs,
                 * but the bitmap manipulation can cause cache line contention.
                 */
                if (real_prev != &init_mm) {
                        VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
                                                mm_cpumask(real_prev)));
                        cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
                }

                /*
                 * Start remote flushes and then read tlb_gen.
                 */
                if (next != &init_mm)
                        cpumask_set_cpu(cpu, mm_cpumask(next));
                next_tlb_gen = atomic64_read(&next->context.tlb_gen);

                choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);

                /* Let nmi_uaccess_okay() know that we're changing CR3. */
                this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
                barrier();
        }

        if (need_flush) {
                this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
                this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
                load_new_mm_cr3(next->pgd, new_asid, true);

                /*
                 * NB: This gets called via leave_mm() in the idle path
                 * where RCU functions differently.  Tracing normally
                 * uses RCU, so we need to use the _rcuidle variant.
                 *
                 * (There is no good reason for this.  The idle code should
                 *  be rearranged to call this before rcu_idle_enter().)
                 */
                trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
        } else {
                /* The new ASID is already up to date. */
                load_new_mm_cr3(next->pgd, new_asid, false);

                /* See above wrt _rcuidle. */
                trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
        }

        /* Make sure we write CR3 before loaded_mm. */
        barrier();

        this_cpu_write(cpu_tlbstate.loaded_mm, next);
        this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);

        if (next != real_prev) {
                cr4_update_pce_mm(next);
                switch_ldt(real_prev, next);
        }
}
  • prev is the mm currently loaded (note that leave_mm() passes prev == NULL; the function actually trusts the per-CPU cpu_tlbstate.loaded_mm rather than prev), and next is the mm being switched to.
  • If real_prev equals next, the CPU stays in the same address space. If it was not in lazy TLB mode, this is just a thread switch within one process and no flush is needed. If it was in lazy mode, next->context.tlb_gen is compared against the cached generation; only when the TLB went stale during lazy mode is need_flush set to true.
  • If real_prev and next differ, the CPU is switching to a different process; choose_new_asid() picks an ASID slot and decides whether need_flush must be set (see the sketch below).
  • load_new_mm_cr3() finally writes the new CR3 value, with or without a flush according to need_flush.
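
choose_new_asid() is worth a look, since it decides whether a previously used ASID slot's cached entries can be reused. Quoted from arch/x86/mm/tlb.c with short comments added (details vary slightly by kernel version):

static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
                            u16 *new_asid, bool *need_flush)
{
        u16 asid;

        /* Without PCID there is only ASID 0, and every switch flushes. */
        if (!static_cpu_has(X86_FEATURE_PCID)) {
                *new_asid = 0;
                *need_flush = true;
                return;
        }

        if (this_cpu_read(cpu_tlbstate.invalidate_other))
                clear_asid_other();

        /* Reuse a slot whose ctx_id matches this mm; flush only if its
         * tlb_gen fell behind the mm's current generation. */
        for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
                if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) !=
                    next->context.ctx_id)
                        continue;

                *new_asid = asid;
                *need_flush = (this_cpu_read(cpu_tlbstate.ctxs[asid].tlb_gen) <
                               next_tlb_gen);
                return;
        }

        /* No slot owned by this mm: allocate the next one round-robin
         * and force a flush of whatever it held before. */
        *new_asid = this_cpu_add_return(cpu_tlbstate.next_asid, 1) - 1;
        if (*new_asid >= TLB_NR_DYN_ASIDS) {
                *new_asid = 0;
                this_cpu_write(cpu_tlbstate.next_asid, 1);
        }
        *need_flush = true;
}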

flush_tlb_kernel_range

Flushes TLB entries for virtual addresses belonging to the kernel's address space:


void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
        /* Balance as user space task's flush, a bit conservative */
        if (end == TLB_FLUSH_ALL ||
            (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
                on_each_cpu(do_flush_tlb_all, NULL, 1);
        } else {
                struct flush_tlb_info *info;

                preempt_disable();
                info = get_flush_tlb_info(NULL, start, end, 0, false, 0);

                on_each_cpu(do_kernel_range_flush, info, 1);

                put_flush_tlb_info();
                preempt_enable();
        }
}

This function is called when kernel-space mappings are created or torn down through interfaces such as vmalloc or vfree: once the corresponding page tables have been updated, the stale TLB entries must be flushed. The interface can cover a whole kernel-space range in one call:

  • unsigned long start: start address of the kernel-space range to flush
  • unsigned long end: end address of the kernel-space range to flush
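
When the range exceeds tlb_single_page_flush_ceiling pages (or end == TLB_FLUSH_ALL), do_flush_tlb_all() runs on every CPU and simply performs a full flush; quoted from the same file (exact form may vary slightly across kernel versions):

static void do_flush_tlb_all(void *info)
{
        count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
        __flush_tlb_all();
}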

For smaller ranges, do_kernel_range_flush() is run on each CPU instead, flushing the range page by page:

static void do_kernel_range_flush(void *info)
{
        struct flush_tlb_info *f = info;
        unsigned long addr;

        /* flush range by one by one 'invlpg' */
        for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
                flush_tlb_one_kernel(addr);
}

flush_tlb_one_kernel

flush_tlb_one_kernel() flushes a single kernel address:

/* Flush one page in the kernel mapping */
void flush_tlb_one_kernel(unsigned long addr)
{
        count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE);

        /*
         * If PTI is off, then __flush_tlb_one_user() is just INVLPG or its
         * paravirt equivalent.  Even with PCID, this is sufficient: we only
         * use PCID if we also use global PTEs for the kernel mapping, and
         * INVLPG flushes global translations across all address spaces.
         *
         * If PTI is on, then the kernel is mapped with non-global PTEs, and
         * __flush_tlb_one_user() will flush the given address for the current
         * kernel address space and for its usermode counterpart, but it does
         * not flush it for other address spaces.
         */
        flush_tlb_one_user(addr);

        if (!static_cpu_has(X86_FEATURE_PTI))
                return;

        /*
         * See above.  We need to propagate the flush to all other address
         * spaces.  In principle, we only need to propagate it to kernelmode
         * address spaces, but the extra bookkeeping we would need is not
         * worth it.
         */
        this_cpu_write(cpu_tlbstate.invalidate_other, true);
}

cpu_tlbstate

To speed up the software side of TLB handling, each CPU on x86 keeps its own tlb_state structure recording the current TLB state and some key data, which avoids the overhead of frequently touching hardware registers:

__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
        .loaded_mm = &init_mm,
        .next_asid = 1,
        .cr4 = ~0UL,    /* fail hard if we screw up cr4 shadow initialization */
};

cpu_tlbstate is of type struct tlb_state, defined in arch/x86/include/asm/tlbflush.h:


struct tlb_state {
        /*
         * cpu_tlbstate.loaded_mm should match CR3 whenever interrupts
         * are on.  This means that it may not match current->active_mm,
         * which will contain the previous user mm when we're in lazy TLB
         * mode even if we've already switched back to swapper_pg_dir.
         *
         * During switch_mm_irqs_off(), loaded_mm will be set to
         * LOADED_MM_SWITCHING during the brief interrupts-off window
         * when CR3 and loaded_mm would otherwise be inconsistent.  This
         * is for nmi_uaccess_okay()'s benefit.
         */
        struct mm_struct *loaded_mm;

#define LOADED_MM_SWITCHING ((struct mm_struct *)1UL)

        /* Last user mm for optimizing IBPB */
        union {
                struct mm_struct        *last_user_mm;
                unsigned long           last_user_mm_ibpb;
        };

        u16 loaded_mm_asid;
        u16 next_asid;

        /*
         * We can be in one of several states:
         *
         *  - Actively using an mm.  Our CPU's bit will be set in
         *    mm_cpumask(loaded_mm) and is_lazy == false;
         *
         *  - Not using a real mm.  loaded_mm == &init_mm.  Our CPU's bit
         *    will not be set in mm_cpumask(&init_mm) and is_lazy == false.
         *
         *  - Lazily using a real mm.  loaded_mm != &init_mm, our bit
         *    is set in mm_cpumask(loaded_mm), but is_lazy == true.
         *    We're heuristically guessing that the CR3 load we
         *    skipped more than makes up for the overhead added by
         *    lazy mode.
         */
        bool is_lazy;

        /*
         * If set we changed the page tables in such a way that we
         * needed an invalidation of all contexts (aka. PCIDs / ASIDs).
         * This tells us to go invalidate all the non-loaded ctxs[]
         * on the next context switch.
         *
         * The current ctx was kept up-to-date as it ran and does not
         * need to be invalidated.
         */
        bool invalidate_other;

        /*
         * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate
         * the corresponding user PCID needs a flush next time we
         * switch to it; see SWITCH_TO_USER_CR3.
         */
        unsigned short user_pcid_flush_mask;

        /*
         * Access to this CR4 shadow and to H/W CR4 is protected by
         * disabling interrupts when modifying either one.
         */
        unsigned long cr4;

        /*
         * This is a list of all contexts that might exist in the TLB.
         * There is one per ASID that we use, and the ASID (what the
         * CPU calls PCID) is the index into ctxts.
         *
         * For each context, ctx_id indicates which mm the TLB's user
         * entries came from.  As an invariant, the TLB will never
         * contain entries that are out-of-date as when that mm reached
         * the tlb_gen in the list.
         *
         * To be clear, this means that it's legal for the TLB code to
         * flush the TLB without updating tlb_gen.  This can happen
         * (for now, at least) due to paravirt remote flushes.
         *
         * NB: context 0 is a bit special, since it's also used by
         * various bits of init code.  This is fine -- code that
         * isn't aware of PCID will end up harmlessly flushing
         * context 0.
         */
        struct tlb_context ctxs[TLB_NR_DYN_ASIDS];
};
  • struct mm_struct *loaded_mm: the mm of the process whose translations the hardware is currently using.
  • u16 loaded_mm_asid: the ASID of that process.
  • u16 next_asid: the next ASID slot to hand out when a new context needs one.
  • bool is_lazy: whether the CPU is in lazy TLB mode.
  • bool invalidate_other: when set, the page tables were changed in a way that requires the other (non-loaded) contexts' TLB entries to be invalidated at the next context switch.
  • unsigned short user_pcid_flush_mask: records which user PCIDs need a flush at the next CR3 switch.
  • unsigned long cr4: a shadow copy of the CR4 register.
  • struct tlb_context ctxs[TLB_NR_DYN_ASIDS]: all contexts (PCIDs) that might have entries in the TLB; the element type is shown below.
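
For completeness, struct tlb_context is a small pair from the same header; the comments here are added for description:

struct tlb_context {
        u64 ctx_id;     /* which mm these TLB entries came from */
        u64 tlb_gen;    /* generation the entries are current up to */
};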

References

AMD64 Architecture Programmer's Manual, Volume 2: System Programming
