The kernel wraps the common operations on the vm_area_struct data structure in a set of helper functions to simplify later work. Most of the VMA operations are implemented in mm/mmap.c.

find_vma()

find_vma() is used throughout the kernel to look up whether a given virtual address has already been allocated. Its source code is as follows:


/* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
	struct rb_node *rb_node;
	struct vm_area_struct *vma;

	/* Check the cache first. */
	vma = vmacache_find(mm, addr);
	if (likely(vma))
		return vma;

	rb_node = mm->mm_rb.rb_node;

	while (rb_node) {
		struct vm_area_struct *tmp;

		tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);

		if (tmp->vm_end > addr) {
			vma = tmp;
			if (tmp->vm_start <= addr)
				break;
			rb_node = rb_node->rb_left;
		} else
			rb_node = rb_node->rb_right;
	}

	if (vma)
		vmacache_update(addr, vma);
	return vma;
}

Parameters:

  • struct mm_struct *mm: the memory descriptor of the process whose address space is being queried.
  • unsigned long addr: the address to look up.

Main flow of the function:

  • vmacache_find(mm, addr): first check the per-task vmacache for a VMA covering this address; see the earlier article《linux内核那些事之用户空间管理》for details on the vmacache.
  • If the vmacache misses, walk the mm_rb red-black tree that holds all the VMAs managed by the mm, looking for the suitable VMA (defined as the VMA nearest to addr such that addr is guaranteed to be less than vma->vm_end).
  • If a VMA is found, refresh the cache with vmacache_update(addr, vma) to speed up subsequent lookups.

The VMA returned by find_vma() falls into one of two cases:

  • The queried addr lies inside the VMA, i.e. within [vma->vm_start, vma->vm_end).
  • The queried addr lies outside the VMA, i.e. addr is less than vma->vm_start.

The returned VMA's vm_end is guaranteed to be greater than addr. The VMA is not guaranteed to contain addr, but it is guaranteed to be the already-allocated region nearest to addr from above.

find_vma_prev

find_vma_prev is similar to find_vma(): besides the VMA it finds, it also returns the VMA's predecessor node in *pprev:


/*
 * Same as find_vma, but also return a pointer to the previous VMA in *pprev.
 */
struct vm_area_struct *
find_vma_prev(struct mm_struct *mm, unsigned long addr,
	      struct vm_area_struct **pprev)
{
	struct vm_area_struct *vma;

	vma = find_vma(mm, addr);
	if (vma) {
		*pprev = vma->vm_prev;
	} else {
		struct rb_node *rb_node = rb_last(&mm->mm_rb);

		*pprev = rb_node ? rb_entry(rb_node, struct vm_area_struct, vm_rb) : NULL;
	}
	return vma;
}

Main processing flow:

  • Call find_vma() to look up a VMA.
  • If the VMA is not NULL, return its predecessor via *pprev = vma->vm_prev.
  • If the VMA is NULL, return the last VMA of the process via rb_last(&mm->mm_rb); if that rb_node is NULL, the process address space has not allocated any VMA yet.

find_vma_links()

find_vma_links() is mostly used in the process of creating a new VMA. Besides pprev, it also returns rb_link and rb_parent, i.e. a suitable position in the red-black tree at which the new VMA can be inserted:


static int find_vma_links(struct mm_struct *mm, unsigned long addr,
			  unsigned long end, struct vm_area_struct **pprev,
			  struct rb_node ***rb_link, struct rb_node **rb_parent)
{
	struct rb_node **__rb_link, *__rb_parent, *rb_prev;

	__rb_link = &mm->mm_rb.rb_node;
	rb_prev = __rb_parent = NULL;

	while (*__rb_link) {
		struct vm_area_struct *vma_tmp;

		__rb_parent = *__rb_link;
		vma_tmp = rb_entry(__rb_parent, struct vm_area_struct, vm_rb);

		if (vma_tmp->vm_end > addr) {
			/* Fail if an existing vma overlaps the area */
			if (vma_tmp->vm_start < end)
				return -ENOMEM;
			__rb_link = &__rb_parent->rb_left;
		} else {
			rb_prev = __rb_parent;
			__rb_link = &__rb_parent->rb_right;
		}
	}

	*pprev = NULL;
	if (rb_prev)
		*pprev = rb_entry(rb_prev, struct vm_area_struct, vm_rb);
	*rb_link = __rb_link;
	*rb_parent = __rb_parent;
	return 0;
}

The function walks the rb tree downwards until it reaches an empty child slot:

  • During the walk, if addr >= vma->vm_end, set rb_prev = __rb_parent and __rb_link = &__rb_parent->rb_right, descending into the right child.
  • If addr < vma->vm_end and end > vma->vm_start, the new range overlaps an existing VMA, so return the error -ENOMEM.
  • If addr < vma->vm_end and end <= vma->vm_start, descend into the left child.
  • Repeat until an empty child slot is reached.
  • Finally set *rb_link = __rb_link and *rb_parent = __rb_parent, and return *pprev.

vm_area_cachep

Every time a new vm_area_struct is allocated, it is taken from vm_area_cachep, a slab cache that manages all allocated vm_area_struct objects. For example, vm_area_alloc() allocates a new VMA from vm_area_cachep:

struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
{
	struct vm_area_struct *vma;

	vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
	if (vma)
		vma_init(vma, mm);
	return vma;
}

vm_area_cachep is based on the slab allocator (to be described in detail later), which manages sub-page-sized allocations on top of the buddy allocator. The cache is created when proc_caches_init() runs during initialization:


void __init proc_caches_init(void)
{
	... ...
	mm_cachep = kmem_cache_create_usercopy("mm_struct",
			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
			offsetof(struct mm_struct, saved_auxv),
			sizeof_field(struct mm_struct, saved_auxv),
			NULL);
	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
	mmap_init();
	nsproxy_cache_init();
}

The allocation statistics of the vm_area_struct cache can be inspected with the command cat /proc/slabinfo:

vma_init

The VMA initialization function:

static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
{
	static const struct vm_operations_struct dummy_vm_ops = {};

	memset(vma, 0, sizeof(*vma));
	vma->vm_mm = mm;
	vma->vm_ops = &dummy_vm_ops;
	INIT_LIST_HEAD(&vma->anon_vma_chain);
}

Note the vm_ops setting here: at this point every callback inside vm_ops is empty, i.e. the system-default VMA operations are used.

Anonymous VMAs

In many situations, including plain memory allocation and mmap() calls that create a new anonymous mapping, the resulting VMA belongs to anonymous memory. vma_set_anonymous() marks a VMA as anonymous:

static inline void vma_set_anonymous(struct vm_area_struct *vma)
{
	vma->vm_ops = NULL;
}

As can be seen, when a VMA is anonymous, its vm_ops is NULL.

static inline bool vma_is_anonymous(struct vm_area_struct *vma)
{
	return !vma->vm_ops;
}

vma_is_anonymous() checks whether a VMA is anonymous: if vm_ops is NULL, it is anonymous.

vma_is_foreign

Checks whether a VMA belongs to the current process's address space:

static inline bool vma_is_foreign(struct vm_area_struct *vma)
{
	if (!current->mm)
		return true;

	if (current->mm != vma->vm_mm)
		return true;

	return false;
}

It returns true if the VMA does not belong to the currently running process's address space, false otherwise.

insert_vm_struct()

Inserts a newly created VMA into the process's mm red-black tree:


/* Insert vm structure into process list sorted by address
 * and into the inode's i_mmap tree.  If vm_file is non-NULL
 * then i_mmap_rwsem is taken here.
 */
int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
{
	struct vm_area_struct *prev;
	struct rb_node **rb_link, *rb_parent;

	if (find_vma_links(mm, vma->vm_start, vma->vm_end,
			   &prev, &rb_link, &rb_parent))
		return -ENOMEM;
	if ((vma->vm_flags & VM_ACCOUNT) &&
	     security_vm_enough_memory_mm(mm, vma_pages(vma)))
		return -ENOMEM;

	/*
	 * The vm_pgoff of a purely anonymous vma should be irrelevant
	 * until its first write fault, when page's anon_vma and index
	 * are set.  But now set the vm_pgoff it will almost certainly
	 * end up with (unless mremap moves it elsewhere before that
	 * first wfault), so /proc/pid/maps tells a consistent story.
	 *
	 * By setting it to reflect the virtual start address of the
	 * vma, merges and splits can happen in a seamless way, just
	 * using the existing file pgoff checks and manipulations.
	 * Similarly in do_mmap_pgoff and in do_brk.
	 */
	if (vma_is_anonymous(vma)) {
		BUG_ON(vma->anon_vma);
		vma->vm_pgoff = vma->vm_start >> PAGE_SHIFT;
	}

	vma_link(mm, vma, prev, rb_link, rb_parent);
	return 0;
}
  • Call find_vma_links() to find the position in the red-black tree where the VMA should be inserted.
  • If vm_flags has VM_ACCOUNT set and security_vm_enough_memory_mm() reports insufficient memory, return failure; otherwise continue.
  • vma_is_anonymous(): if this is an anonymous mapping, set vma->vm_pgoff to vma->vm_start >> PAGE_SHIFT, i.e. the offset becomes the virtual page frame number of the VMA's start address.
  • vma_link() links the VMA into the corresponding nodes.

vma_link

Inserts the VMA into the corresponding nodes. Since VMAs are kept both in a doubly linked list and in a red-black tree, the handling is somewhat involved:


static void vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
			struct vm_area_struct *prev, struct rb_node **rb_link,
			struct rb_node *rb_parent)
{
	struct address_space *mapping = NULL;

	if (vma->vm_file) {
		mapping = vma->vm_file->f_mapping;
		i_mmap_lock_write(mapping);
	}

	__vma_link(mm, vma, prev, rb_link, rb_parent);
	__vma_link_file(vma);

	if (mapping)
		i_mmap_unlock_write(mapping);

	mm->map_count++;
	validate_mm(mm);
}
  • For a file mapping, take the file's mapping space: mapping = vma->vm_file->f_mapping.
  • __vma_link: insert the VMA into the linked list and the red-black tree.
  • __vma_link_file: for a file mapping, also add the VMA to the i_mmap tree of the file's address_space.
  • mm->map_count++: increment map_count in mm.
  • validate_mm: follow-up consistency checking on the VMAs.

__vma_link

__vma_link links the newly created VMA into the rb tree and the doubly linked list:

static void
__vma_link(struct mm_struct *mm, struct vm_area_struct *vma,
	struct vm_area_struct *prev, struct rb_node **rb_link,
	struct rb_node *rb_parent)
{
	__vma_link_list(mm, vma, prev);
	__vma_link_rb(mm, vma, rb_link, rb_parent);
}

vma_merge

Although a VMA only describes virtual space, repeated allocation and freeing still produces fragmentation over time. To counter this, the kernel keeps trying to merge a VMA with its neighbours. Another benefit of merging is that it reduces the number of VMA structures and thus memory consumption. Therefore, before allocating a new VMA, the kernel first checks, based on the requested addr and size, whether the new range can be merged with the preceding or following VMA; whenever a merge is possible, one VMA structure is saved.

struct vm_area_struct *vma_merge(struct mm_struct *mm,
			struct vm_area_struct *prev, unsigned long addr,
			unsigned long end, unsigned long vm_flags,
			struct anon_vma *anon_vma, struct file *file,
			pgoff_t pgoff, struct mempolicy *policy,
			struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
{
	pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
	struct vm_area_struct *area, *next;
	int err;

	/*
	 * We later require that vma->vm_flags == vm_flags,
	 * so this tests vma->vm_flags & VM_SPECIAL, too.
	 */
	if (vm_flags & VM_SPECIAL)
		return NULL;

	if (prev)
		next = prev->vm_next;
	else
		next = mm->mmap;
	area = next;
	if (area && area->vm_end == end)		/* cases 6, 7, 8 */
		next = next->vm_next;

	/* verify some invariant that must be enforced by the caller */
	VM_WARN_ON(prev && addr <= prev->vm_start);
	VM_WARN_ON(area && end > area->vm_end);
	VM_WARN_ON(addr >= end);

	/*
	 * Can it merge with the predecessor?
	 */
	if (prev && prev->vm_end == addr &&
			mpol_equal(vma_policy(prev), policy) &&
			can_vma_merge_after(prev, vm_flags,
					    anon_vma, file, pgoff,
					    vm_userfaultfd_ctx)) {
		/*
		 * OK, it can.  Can we now merge in the successor as well?
		 */
		if (next && end == next->vm_start &&
				mpol_equal(policy, vma_policy(next)) &&
				can_vma_merge_before(next, vm_flags,
						     anon_vma, file,
						     pgoff+pglen,
						     vm_userfaultfd_ctx) &&
				is_mergeable_anon_vma(prev->anon_vma,
						      next->anon_vma, NULL)) {
							/* cases 1, 6 */
			err = __vma_adjust(prev, prev->vm_start,
					 next->vm_end, prev->vm_pgoff, NULL,
					 prev);
		} else					/* cases 2, 5, 7 */
			err = __vma_adjust(prev, prev->vm_start,
					 end, prev->vm_pgoff, NULL, prev);
		if (err)
			return NULL;
		khugepaged_enter_vma_merge(prev, vm_flags);
		return prev;
	}

	/*
	 * Can this new request be merged in front of next?
	 */
	if (next && end == next->vm_start &&
			mpol_equal(policy, vma_policy(next)) &&
			can_vma_merge_before(next, vm_flags,
					     anon_vma, file, pgoff+pglen,
					     vm_userfaultfd_ctx)) {
		if (prev && addr < prev->vm_end)	/* case 4 */
			err = __vma_adjust(prev, prev->vm_start,
					 addr, prev->vm_pgoff, NULL, next);
		else {					/* cases 3, 8 */
			err = __vma_adjust(area, addr, next->vm_end,
					 next->vm_pgoff - pglen, NULL, next);
			/*
			 * In case 3 area is already equal to next and
			 * this is a noop, but in case 8 "area" has
			 * been removed and next was expanded over it.
			 */
			area = next;
		}
		if (err)
			return NULL;
		khugepaged_enter_vma_merge(area, vm_flags);
		return area;
	}

	return NULL;
}

The function takes quite a few parameters:

  • mm: the process address space the range belongs to.
  • prev: the node immediately before the position where the new node would be inserted.
  • addr: start address of the newly requested range.
  • end: end address of the newly requested range.
  • vm_flags: vm_flags of the newly requested range.
  • anon_vma: anonymous mapping, mainly used for reverse mapping (to be covered in detail later).
  • file: file mapping.
  • pgoff: offset within the file mapping.
  • policy: memory policy.
  • vm_userfaultfd_ctx: userfaultfd context.

The overall logic is fairly simple: decide whether the range can be merged with the preceding and/or following VMA.

Conditions for merging with the preceding VMA:

  • the preceding VMA's vm_end equals addr
  • the VMA policies are identical
  • the vm_flags do not conflict with those of the preceding VMA

Conditions for merging with the following VMA:

  • the following VMA's vm_start equals end
  • the VMA policies are identical
  • the vm_flags do not conflict with those of the following VMA

If the new range's vm_flags contain VM_SPECIAL, no merging is attempted and the function returns immediately:

#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

So a newly requested range can be merged in the following situations:

  • The new VMA's start address adjoins the end of the prev node and the merge conditions with prev are met, but it cannot merge with the next node.
  • The new VMA's end address adjoins the start of the next node and the merge conditions with next are met, but it cannot merge with the prev node.
  • The new VMA exactly bridges prev and next and the merge conditions are met on both sides, in which case prev, the new range and next are merged into one large VMA.

If merging is possible, __vma_adjust() is called to perform the adjustment.

__vma_adjust()

__vma_adjust() performs the actual merge operation:


/*
 * We cannot adjust vm_start, vm_end, vm_pgoff fields of a vma that
 * is already present in an i_mmap tree without adjusting the tree.
 * The following helper function should be used when such adjustments
 * are necessary.  The "insert" vma (if any) is to be inserted
 * before we drop the necessary locks.
 */
int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
	struct vm_area_struct *expand)
{
	struct mm_struct *mm = vma->vm_mm;
	struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
	struct address_space *mapping = NULL;
	struct rb_root_cached *root = NULL;
	struct anon_vma *anon_vma = NULL;
	struct file *file = vma->vm_file;
	bool start_changed = false, end_changed = false;
	long adjust_next = 0;
	int remove_next = 0;

	if (next && !insert) {
		struct vm_area_struct *exporter = NULL, *importer = NULL;

		if (end >= next->vm_end) {
			/*
			 * vma expands, overlapping all the next, and
			 * perhaps the one after too (mprotect case 6).
			 * The only other cases that gets here are
			 * case 1, case 7 and case 8.
			 */
			if (next == expand) {
				/*
				 * The only case where we don't expand "vma"
				 * and we expand "next" instead is case 8.
				 */
				VM_WARN_ON(end != next->vm_end);
				/*
				 * remove_next == 3 means we're
				 * removing "vma" and that to do so we
				 * swapped "vma" and "next".
				 */
				remove_next = 3;
				VM_WARN_ON(file != next->vm_file);
				swap(vma, next);
			} else {
				VM_WARN_ON(expand != vma);
				/*
				 * case 1, 6, 7, remove_next == 2 is case 6,
				 * remove_next == 1 is case 1 or 7.
				 */
				remove_next = 1 + (end > next->vm_end);
				VM_WARN_ON(remove_next == 2 &&
					   end != next->vm_next->vm_end);
				/* trim end to next, for case 6 first pass */
				end = next->vm_end;
			}

			exporter = next;
			importer = vma;

			/*
			 * If next doesn't have anon_vma, import from vma after
			 * next, if the vma overlaps with it.
			 */
			if (remove_next == 2 && !next->anon_vma)
				exporter = next->vm_next;

		} else if (end > next->vm_start) {
			/*
			 * vma expands, overlapping part of the next:
			 * mprotect case 5 shifting the boundary up.
			 */
			adjust_next = (end - next->vm_start) >> PAGE_SHIFT;
			exporter = next;
			importer = vma;
			VM_WARN_ON(expand != importer);
		} else if (end < vma->vm_end) {
			/*
			 * vma shrinks, and !insert tells it's not
			 * split_vma inserting another: so it must be
			 * mprotect case 4 shifting the boundary down.
			 */
			adjust_next = -((vma->vm_end - end) >> PAGE_SHIFT);
			exporter = vma;
			importer = next;
			VM_WARN_ON(expand != importer);
		}

		/*
		 * Easily overlooked: when mprotect shifts the boundary,
		 * make sure the expanding vma has anon_vma set if the
		 * shrinking vma had, to cover any anon pages imported.
		 */
		if (exporter && exporter->anon_vma && !importer->anon_vma) {
			int error;

			importer->anon_vma = exporter->anon_vma;
			error = anon_vma_clone(importer, exporter);
			if (error)
				return error;
		}
	}
again:
	vma_adjust_trans_huge(orig_vma, start, end, adjust_next);

	if (file) {
		mapping = file->f_mapping;
		root = &mapping->i_mmap;
		uprobe_munmap(vma, vma->vm_start, vma->vm_end);

		if (adjust_next)
			uprobe_munmap(next, next->vm_start, next->vm_end);

		i_mmap_lock_write(mapping);
		if (insert) {
			/*
			 * Put into interval tree now, so instantiated pages
			 * are visible to arm/parisc __flush_dcache_page
			 * throughout; but we cannot insert into address
			 * space until vma start or end is updated.
			 */
			__vma_link_file(insert);
		}
	}

	anon_vma = vma->anon_vma;
	if (!anon_vma && adjust_next)
		anon_vma = next->anon_vma;
	if (anon_vma) {
		VM_WARN_ON(adjust_next && next->anon_vma &&
			   anon_vma != next->anon_vma);
		anon_vma_lock_write(anon_vma);
		anon_vma_interval_tree_pre_update_vma(vma);
		if (adjust_next)
			anon_vma_interval_tree_pre_update_vma(next);
	}

	if (root) {
		flush_dcache_mmap_lock(mapping);
		vma_interval_tree_remove(vma, root);
		if (adjust_next)
			vma_interval_tree_remove(next, root);
	}

	if (start != vma->vm_start) {
		vma->vm_start = start;
		start_changed = true;
	}
	if (end != vma->vm_end) {
		vma->vm_end = end;
		end_changed = true;
	}
	vma->vm_pgoff = pgoff;
	if (adjust_next) {
		next->vm_start += adjust_next << PAGE_SHIFT;
		next->vm_pgoff += adjust_next;
	}

	if (root) {
		if (adjust_next)
			vma_interval_tree_insert(next, root);
		vma_interval_tree_insert(vma, root);
		flush_dcache_mmap_unlock(mapping);
	}

	if (remove_next) {
		/*
		 * vma_merge has merged next into vma, and needs
		 * us to remove next before dropping the locks.
		 */
		if (remove_next != 3)
			__vma_unlink_common(mm, next, next);
		else
			/*
			 * vma is not before next if they've been
			 * swapped.
			 *
			 * pre-swap() next->vm_start was reduced so
			 * tell validate_mm_rb to ignore pre-swap()
			 * "next" (which is stored in post-swap()
			 * "vma").
			 */
			__vma_unlink_common(mm, next, vma);
		if (file)
			__remove_shared_vm_struct(next, file, mapping);
	} else if (insert) {
		/*
		 * split_vma has split insert from vma, and needs
		 * us to insert it before dropping the locks
		 * (it may either follow vma or precede it).
		 */
		__insert_vm_struct(mm, insert);
	} else {
		if (start_changed)
			vma_gap_update(vma);
		if (end_changed) {
			if (!next)
				mm->highest_vm_end = vm_end_gap(vma);
			else if (!adjust_next)
				vma_gap_update(next);
		}
	}

	if (anon_vma) {
		anon_vma_interval_tree_post_update_vma(vma);
		if (adjust_next)
			anon_vma_interval_tree_post_update_vma(next);
		anon_vma_unlock_write(anon_vma);
	}
	if (mapping)
		i_mmap_unlock_write(mapping);

	if (root) {
		uprobe_mmap(vma);

		if (adjust_next)
			uprobe_mmap(next);
	}

	if (remove_next) {
		if (file) {
			uprobe_munmap(next, next->vm_start, next->vm_end);
			fput(file);
		}
		if (next->anon_vma)
			anon_vma_merge(vma, next);
		mm->map_count--;
		mpol_put(vma_policy(next));
		vm_area_free(next);
		/*
		 * In mprotect's case 6 (see comments on vma_merge),
		 * we must remove another next too. It would clutter
		 * up the code too much to do both in one go.
		 */
		if (remove_next != 3) {
			/*
			 * If "next" was removed and vma->vm_end was
			 * expanded (up) over it, in turn
			 * "next->vm_prev->vm_end" changed and the
			 * "vma->vm_next" gap must be updated.
			 */
			next = vma->vm_next;
		} else {
			/*
			 * For the scope of the comment "next" and
			 * "vma" considered pre-swap(): if "vma" was
			 * removed, next->vm_start was expanded (down)
			 * over it and the "next" gap must be updated.
			 * Because of the swap() the post-swap() "vma"
			 * actually points to pre-swap() "next"
			 * (post-swap() "next" as opposed is now a
			 * dangling pointer).
			 */
			next = vma;
		}
		if (remove_next == 2) {
			remove_next = 1;
			end = next->vm_end;
			goto again;
		}
		else if (next)
			vma_gap_update(next);
		else {
			/*
			 * If remove_next == 2 we obviously can't
			 * reach this path.
			 *
			 * If remove_next == 3 we can't reach this
			 * path because pre-swap() next is always not
			 * NULL. pre-swap() "next" is not being
			 * removed and its next->vm_end is not altered
			 * (and furthermore "end" already matches
			 * next->vm_end in remove_next == 3).
			 *
			 * We reach this only in the remove_next == 1
			 * case if the "next" vma that was removed was
			 * the highest vma of the mm. However in such
			 * case next->vm_end == "end" and the extended
			 * "vma" has vma->vm_end == next->vm_end so
			 * mm->highest_vm_end doesn't need any update
			 * in remove_next == 1 case.
			 */
			VM_WARN_ON(mm->highest_vm_end != vm_end_gap(vma));
		}
	}
	if (insert && file)
		uprobe_mmap(insert);

	validate_mm(mm);

	return 0;
}

Parameters:

  • vma: the VMA to be merged/adjusted.
  • start: start address of the merged VMA.
  • end: end address of the merged range.
  • pgoff: for a file mapping, the offset within the file.
  • insert: the node to be inserted, if any.
  • expand: the node the merge expands into. If the merge only involves the preceding or the following VMA, expand equals vma; if two VMAs are merged, expand points to the VMA that will be expanded.

remove_vma_list()

This function removes a VMA together with all the VMAs following it from a process address space, i.e. it deletes the VMA and the entire list after it:

static void remove_vma_list(struct mm_struct *mm, struct vm_area_struct *vma)
{
	unsigned long nr_accounted = 0;

	/* Update high watermark before we lower total_vm */
	update_hiwater_vm(mm);
	do {
		long nrpages = vma_pages(vma);

		if (vma->vm_flags & VM_ACCOUNT)
			nr_accounted += nrpages;
		vm_stat_account(mm, vma->vm_flags, -nrpages);
		vma = remove_vma(vma);
	} while (vma);
	vm_unacct_memory(nr_accounted);
	validate_mm(mm);
}

Main steps:

  • Update hiwater_vm: if the process's current total_vm exceeds hiwater_vm, set mm->hiwater_vm = mm->total_vm.
  • Then loop over vma and every VMA reached through vma->vm_next: compute the number of pages nrpages in the VMA; if vma->vm_flags has VM_ACCOUNT set, accumulate nr_accounted += nrpages.
  • vm_stat_account: update the page statistics; removing a VMA frees the corresponding space, so mm->exec_vm, mm->stack_vm and mm->data_vm are updated according to the VMA type.
  • remove_vma: remove the VMA and return the next VMA it pointed to.

remove_vma()

Removes a single VMA:

static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
{
	struct vm_area_struct *next = vma->vm_next;

	might_sleep();
	if (vma->vm_ops && vma->vm_ops->close)
		vma->vm_ops->close(vma);
	if (vma->vm_file)
		fput(vma->vm_file);
	mpol_put(vma_policy(vma));
	vm_area_free(vma);
	return next;
}
  • If vm_ops and vm_ops->close are not NULL, call vm_ops->close() for the corresponding cleanup.
  • If the VMA is a file mapping, drop a reference on the mapped file.
  • vm_area_free: release the memory occupied by the VMA.

vm_area_free

Releases the memory occupied by the VMA back to vm_area_cachep for later reuse:

void vm_area_free(struct vm_area_struct *vma)
{
	kmem_cache_free(vm_area_cachep, vma);
}

vm_area_dup

Allocates a new VMA and copies the contents of orig into it; compared with vm_area_alloc(), it adds the copying step:

struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
{
	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);

	if (new) {
		*new = *orig;
		INIT_LIST_HEAD(&new->anon_vma_chain);
		new->vm_next = new->vm_prev = NULL;
	}
	return new;
}
  • Allocate a new VMA from vm_area_cachep.
  • Copy the contents of orig into the new VMA, initialize its anon_vma_chain as an empty list, and set new->vm_next = new->vm_prev = NULL.

find_vma_intersection

Looks up a VMA based on the range [start_addr, end_addr). Unlike find_vma(), if the returned VMA is not NULL, it is guaranteed to intersect [start_addr, end_addr):

static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm,
		unsigned long start_addr, unsigned long end_addr)
{
	struct vm_area_struct *vma = find_vma(mm, start_addr);

	if (vma && end_addr <= vma->vm_start)
		vma = NULL;
	return vma;
}
  • Call find_vma() with start_addr to find the VMA nearest to start_addr.
  • Check whether that VMA intersects [start_addr, end_addr); if it does, return the VMA, otherwise return NULL.

count_vma_pages_range()

Computes how many pages the already-mapped virtual regions cover within the range [addr, end) of a process address space:

static unsigned long count_vma_pages_range(struct mm_struct *mm,
		unsigned long addr, unsigned long end)
{
	unsigned long nr_pages = 0;
	struct vm_area_struct *vma;

	/* Find first overlaping mapping */
	vma = find_vma_intersection(mm, addr, end);
	if (!vma)
		return 0;

	nr_pages = (min(end, vma->vm_end) -
		max(addr, vma->vm_start)) >> PAGE_SHIFT;

	/* Iterate over the rest of the overlaps */
	for (vma = vma->vm_next; vma; vma = vma->vm_next) {
		unsigned long overlap_len;

		if (vma->vm_start > end)
			break;

		overlap_len = min(end, vma->vm_end) - vma->vm_start;
		nr_pages += overlap_len >> PAGE_SHIFT;
	}

	return nr_pages;
}
  • Call find_vma_intersection() to find the first already-mapped VMA within [addr, end).
  • Compute the number of pages that first VMA contributes.
  • Then walk vm_next to check whether the following VMAs also fall within [addr, end); for each one that does, add its overlapping page count, continuing until all VMAs inside [addr, end) have been visited and the total page count is computed.

find_extend_vma()

find_extend_vma is an extension of find_vma(); its interface is:

struct vm_area_struct *
find_extend_vma(struct mm_struct *mm, unsigned long addr)

Besides looking up the VMA for addr in the user address space, if the found VMA has addr < vma->vm_start, i.e. addr does not belong to the VMA, the function extends the VMA so that it covers addr. There are three implementations, selected by the kernel configuration:

  • With a no-MMU kernel configuration, find_extend_vma is equivalent to find_vma and never extends anything.
  • With the default kernel configuration, where the stack grows downwards, the VMA returned by find_vma() is extended so that vma->vm_start equals addr and the VMA covers the address.
  • With CONFIG_STACK_GROWSUP, where the stack grows upwards, the VMA that gets extended is vma->vm_prev: the end address of the previous VMA is extended up to addr.

With the default kernel configuration, find_extend_vma looks like this:

struct vm_area_struct *
find_extend_vma(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	unsigned long start;

	addr &= PAGE_MASK;
	vma = find_vma(mm, addr);
	if (!vma)
		return NULL;
	if (vma->vm_start <= addr)
		return vma;
	if (!(vma->vm_flags & VM_GROWSDOWN))
		return NULL;
	/* don't alter vm_start if the coredump is running */
	if (!mmget_still_valid(mm))
		return NULL;
	start = vma->vm_start;
	if (expand_stack(vma, addr))
		return NULL;
	if (vma->vm_flags & VM_LOCKED)
		populate_vma_page_range(vma, addr, start, NULL);
	return vma;
}
  • If no VMA is found, return NULL directly: if there is no VMA near addr at all, there is nothing to extend.
  • vma->vm_start <= addr means the found VMA already covers addr, so no extension is needed and it is returned as-is.
  • If vma->vm_flags does not have VM_GROWSDOWN set, this growth direction is not supported, so return NULL.
  • mmget_still_valid: verify that mm is still valid (vm_start must not be altered while a coredump is running).
  • expand_stack() is the core routine that extends the VMA.
  • If vm_flags has VM_LOCKED set, the extended range must be locked into memory, i.e. physical memory is allocated for the newly extended addresses via populate_vma_page_range().

expand_stack()

expand_stack() extends the VMA; the direction depends on how the stack grows. By default the stack grows downwards, so vma->vm_start is moved down to cover addr:

int expand_stack(struct vm_area_struct *vma, unsigned long address)
{
	return expand_downwards(vma, address);
}

In the grow-down case the work is done by expand_downwards().

expand_downwards()

The processing flow of this function is as follows:


/*
 * vma is the first one with address < vma->vm_start.  Have to extend vma.
 */
int expand_downwards(struct vm_area_struct *vma,
				   unsigned long address)
{
	struct mm_struct *mm = vma->vm_mm;
	struct vm_area_struct *prev;
	int error = 0;

	address &= PAGE_MASK;
	if (address < mmap_min_addr)
		return -EPERM;

	/* Enforce stack_guard_gap */
	prev = vma->vm_prev;
	/* Check that both stack segments have the same anon_vma? */
	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
			vma_is_accessible(prev)) {
		if (address - prev->vm_end < stack_guard_gap)
			return -ENOMEM;
	}

	/* We must make sure the anon_vma is allocated. */
	if (unlikely(anon_vma_prepare(vma)))
		return -ENOMEM;

	/*
	 * vma->vm_start/vm_end cannot change under us because the caller
	 * is required to hold the mmap_lock in read mode.  We need the
	 * anon_vma lock to serialize against concurrent expand_stacks.
	 */
	anon_vma_lock_write(vma->anon_vma);

	/* Somebody else might have raced and expanded it already */
	if (address < vma->vm_start) {
		unsigned long size, grow;

		size = vma->vm_end - address;
		grow = (vma->vm_start - address) >> PAGE_SHIFT;

		error = -ENOMEM;
		if (grow <= vma->vm_pgoff) {
			error = acct_stack_growth(vma, size, grow);
			if (!error) {
				/*
				 * vma_gap_update() doesn't support concurrent
				 * updates, but we only hold a shared mmap_lock
				 * lock here, so we need to protect against
				 * concurrent vma expansions.
				 * anon_vma_lock_write() doesn't help here, as
				 * we don't guarantee that all growable vmas
				 * in a mm share the same root anon vma.
				 * So, we reuse mm->page_table_lock to guard
				 * against concurrent vma expansions.
				 */
				spin_lock(&mm->page_table_lock);
				if (vma->vm_flags & VM_LOCKED)
					mm->locked_vm += grow;
				vm_stat_account(mm, vma->vm_flags, grow);
				anon_vma_interval_tree_pre_update_vma(vma);
				vma->vm_start = address;
				vma->vm_pgoff -= grow;
				anon_vma_interval_tree_post_update_vma(vma);
				vma_gap_update(vma);
				spin_unlock(&mm->page_table_lock);

				perf_event_mmap(vma);
			}
		}
	}
	anon_vma_unlock_write(vma->anon_vma);
	khugepaged_enter_vma_merge(vma, vma->vm_flags);
	validate_mm(mm);
	return error;
}

The main steps are:

  • Preprocess the target address: page-align it and check its validity, including various address-range checks (mmap_min_addr, the stack guard gap against the previous VMA) to guard against out-of-bounds growth.
  • address < vma->vm_start means that, even after alignment, the VMA still needs to be extended: compute the growth size, set vma->vm_start = address, and update the related fields in the VMA.
  • If the VMA is eligible for huge pages, khugepaged_enter_vma_merge() performs further handling.
  • validate_mm: a no-op unless CONFIG_DEBUG_VM_RB is enabled; it is mainly used for debugging VMAs.
