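This post continues <Linux Kernel Matters: buddy (anti-fragment mechanism) (4)>. When a zone does not have enough free memory of the requested migrate type, the fallback mechanism kicks in: it searches the fallbacks array for a suitable other migrate type and steals pages from it. The core function that carries out the steal is steal_suitable_fallback.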

steal_suitable_fallback

steal_suitable_fallback is declared as follows:

static void steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int alloc_flags, int start_type, bool whole_block)

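Purpose:

  • This is the core function that carries out the steal. It decides whether stealing pages should also change the migrate type of the pageblock they come from. When the order is large enough, the whole pageblock is taken over in one go and its migratetype is changed. When only part of the pageblock can be stolen and too few of its pages are free or compatible, the pageblock migratetype is left unchanged, which means the pageblock is now mixed: part of it is used by a different migratetype.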

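Parameters:

  • struct zone *zone: the zone that the requested pages belong to.
  • struct page *page: the free physical page at which the steal starts.
  • unsigned int alloc_flags: the alloc flags used by the allocation.
  • int start_type: the migrate type requested by the allocation.
  • bool whole_block: whether the whole pageblock may be stolen.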

steal_suitable_fallback flow

steal_suitable_fallback proceeds as follows:
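The early branches of this flow can be sketched as a small user-space model. This is only an illustration under stated assumptions: the enum steal_action, the function steal_flow, and the constants MODEL_PAGEBLOCK_ORDER (9, a common value with 4 KiB pages) and MODEL_MAX_ORDER are hypothetical names, not kernel APIs. The model only reports which branch the kernel function would take; it touches no free list and skips the watermark boost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pageblock_order and MAX_ORDER (configuration dependent). */
#define MODEL_PAGEBLOCK_ORDER 9u
#define MODEL_MAX_ORDER 11u

/* Hypothetical labels for the branches of steal_suitable_fallback. */
enum steal_action {
    STEAL_SINGLE_PAGE,    /* goto single_page, pageblock type untouched */
    STEAL_TAKE_OWNERSHIP, /* change_pageblock_range(), then single_page */
    STEAL_EVALUATE_BLOCK  /* move_freepages_block() and the half-block test */
};

enum steal_action steal_flow(bool old_block_is_highatomic,
                             unsigned int current_order,
                             bool whole_block)
{
    /* A free buddy block always has an order below MAX_ORDER. */
    assert(current_order < MODEL_MAX_ORDER);

    /* Highatomic pageblocks are never re-typed. */
    if (old_block_is_highatomic)
        return STEAL_SINGLE_PAGE;

    /* A buddy block covering one or more pageblocks is taken over directly. */
    if (current_order >= MODEL_PAGEBLOCK_ORDER)
        return STEAL_TAKE_OWNERSHIP;

    /* The kernel boosts the zone watermark here; the model omits that side effect. */

    /* The caller did not allow stealing from the whole block. */
    if (!whole_block)
        return STEAL_SINGLE_PAGE;

    return STEAL_EVALUATE_BLOCK;
}
```

For example, steal_flow(false, 3, true) yields STEAL_EVALUATE_BLOCK, while the same call with whole_block set to false yields STEAL_SINGLE_PAGE.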

steal_suitable_fallback source

The analysis below follows the steal_suitable_fallback source:


/*
 * This function implements actual steal behaviour. If order is large enough,
 * we can steal whole pageblock. If not, we first move freepages in this
 * pageblock to our migratetype and determine how many already-allocated pages
 * are there in the pageblock with a compatible migratetype. If at least half
 * of pages are free or compatible, we can change migratetype of the pageblock
 * itself, so pages freed in the future will be put on the correct free list.
 */
static void steal_suitable_fallback(struct zone *zone, struct page *page,
		unsigned int alloc_flags, int start_type, bool whole_block)
{
	unsigned int current_order = page_order(page);
	int free_pages, movable_pages, alike_pages;
	int old_block_type;

	old_block_type = get_pageblock_migratetype(page);

	/*
	 * This can happen due to races and we want to prevent broken
	 * highatomic accounting.
	 */
	if (is_migrate_highatomic(old_block_type))
		goto single_page;

	/* Take ownership for orders >= pageblock_order */
	if (current_order >= pageblock_order) {
		change_pageblock_range(page, current_order, start_type);
		goto single_page;
	}

	/*
	 * Boost watermarks to increase reclaim pressure to reduce the
	 * likelihood of future fallbacks. Wake kswapd now as the node
	 * may be balanced overall and kswapd will not wake naturally.
	 */
	boost_watermark(zone);
	if (alloc_flags & ALLOC_KSWAPD)
		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);

	/* We are not allowed to try stealing from the whole block */
	if (!whole_block)
		goto single_page;

	free_pages = move_freepages_block(zone, page, start_type,
						&movable_pages);
	/*
	 * Determine how many pages are compatible with our allocation.
	 * For movable allocation, it's the number of movable pages which
	 * we just obtained. For other types it's a bit more tricky.
	 */
	if (start_type == MIGRATE_MOVABLE) {
		alike_pages = movable_pages;
	} else {
		/*
		 * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
		 * to MOVABLE pageblock, consider all non-movable pages as
		 * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
		 * vice versa, be conservative since we can't distinguish the
		 * exact migratetype of non-movable pages.
		 */
		if (old_block_type == MIGRATE_MOVABLE)
			alike_pages = pageblock_nr_pages
						- (free_pages + movable_pages);
		else
			alike_pages = 0;
	}

	/* moving whole block can fail due to zone boundary conditions */
	if (!free_pages)
		goto single_page;

	/*
	 * If a sufficient number of pages in the block are either free or of
	 * comparable migratability as our allocation, claim the whole block.
	 */
	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
			page_group_by_mobility_disabled)
		set_pageblock_migratetype(page, start_type);

	return;

single_page:
	move_to_free_list(page, zone, current_order, start_type);
}
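  • Look up the migrate type of the pageblock that contains the page to steal (stealing is also described here as moving the page).
  • If the pageblock's migrate type is MIGRATE_HIGHATOMIC, which can happen because of races, the pageblock migrate type is left untouched so that highatomic accounting is not broken. move_to_free_list is called directly to move the page to the free list of start_type.
  • If current_order of the page to steal is greater than or equal to pageblock_order, the page covers at least one whole pageblock. change_pageblock_range is called to change the migrate type of the covered pageblocks, and move_to_free_list then moves the page to the matching free list.
  • If neither case applies, further checks decide whether the pageblock migrate type can be changed.
  • First, boost_watermark raises the zone's watermark boost, which sets how aggressively kswapd reclaims memory and makes future fallbacks less likely.
  • If alloc_flags contains ALLOC_KSWAPD, ZONE_BOOSTED_WATERMARK is set in zone->flags. The fallback happened because memory of the requested type ran short, so kswapd is woken early to reclaim and tidy up free physical memory ahead of time.

How kswapd is woken: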


/*
 * This function implements actual steal behaviour. If order is large enough,
 * we can steal whole pageblock. If not, we first move freepages in this
 * pageblock to our migratetype and determine how many already-allocated pages
 * are there in the pageblock with a compatible migratetype. If at least half
 * of pages are free or compatible, we can change migratetype of the pageblock
 * itself, so pages freed in the future will be put on the correct free list.
 */
static void steal_suitable_fallback(struct zone *zone, struct page *page,
		unsigned int alloc_flags, int start_type, bool whole_block)
{
	unsigned int current_order = page_order(page);
	int free_pages, movable_pages, alike_pages;
	int old_block_type;

	old_block_type = get_pageblock_migratetype(page);
	... ...
	/*
	 * Boost watermarks to increase reclaim pressure to reduce the
	 * likelihood of future fallbacks. Wake kswapd now as the node
	 * may be balanced overall and kswapd will not wake naturally.
	 */
	boost_watermark(zone);
	if (alloc_flags & ALLOC_KSWAPD)
		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
	... ...
}
  • steal_suitable_fallback sets the ZONE_BOOSTED_WATERMARK bit in zone->flags. Once the allocation has completed:

/*
 * Allocate a page from the given zone. Use pcplists for order-0 allocations.
 */
static inline
struct page *rmqueue(struct zone *preferred_zone,
			struct zone *zone, unsigned int order,
			gfp_t gfp_flags, unsigned int alloc_flags,
			int migratetype)
{
	... ...
out:
	/* Separate test+clear to avoid unnecessary atomics */
	if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
	}
	... ...
}
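  • rmqueue checks whether ZONE_BOOSTED_WATERMARK is set in zone->flags. If it is, rmqueue clears the bit and calls wakeup_kswapd to wake kswapd, which then reclaims memory and does related work such as compaction.
  • If whole_block is false, the migrate type of the whole pageblock may not be changed: only the page itself is moved, and the pageblock type stays as it is.
  • move_freepages_block: moves the free pages of the pageblock to the free lists of start_type. If the number of free pages moved plus alike_pages is at least half of the pageblock, that is >= (1 << (pageblock_order - 1)), the steal is treated as covering the whole pageblock and the pageblock migrate type is changed.
  • move_to_free_list: moves the page from the free list of its old migrate type to the free list of the new migrate type, so that the new migrate type has enough memory to satisfy the current allocation.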
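The ZONE_BOOSTED_WATERMARK handshake between steal_suitable_fallback and rmqueue described above can be illustrated with a minimal single-threaded model. Everything here is an assumption made for illustration: struct model_zone, model_steal, model_rmqueue_exit, and the bit number are hypothetical, plain bit operations stand in for the kernel's atomic set_bit/test_bit/clear_bit, and a counter stands in for wakeup_kswapd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number; the kernel's value comes from its zone flags enum. */
#define MODEL_ZONE_BOOSTED_WATERMARK 0

struct model_zone {
    unsigned long flags;  /* models zone->flags */
    int kswapd_wakeups;   /* counts how often the model "woke kswapd" */
};

/* Models the tail of the boost step in steal_suitable_fallback. */
void model_steal(struct model_zone *zone, bool alloc_kswapd)
{
    assert(zone != NULL);
    if (alloc_kswapd)
        zone->flags |= 1UL << MODEL_ZONE_BOOSTED_WATERMARK;
}

/* Models the "out:" label in rmqueue: test first, clear only if set. */
void model_rmqueue_exit(struct model_zone *zone)
{
    assert(zone != NULL);
    if (zone->flags & (1UL << MODEL_ZONE_BOOSTED_WATERMARK)) {
        zone->flags &= ~(1UL << MODEL_ZONE_BOOSTED_WATERMARK);
        zone->kswapd_wakeups++;  /* stands in for wakeup_kswapd() */
    }
}
```

In this model, several steals before one rmqueue exit still produce a single wakeup, because the flag is a single bit that is cleared when it is consumed.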

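Rules for changing the pageblock migrate type

From the steal_suitable_fallback flow, the rules for when the pageblock migrate type may be changed are:

  • If the page's order is greater than or equal to pageblock_order, the pageblock migrate type may be changed.
  • If the page's order is below pageblock_order and whole_block is false, the pageblock migrate type may not be changed.
  • If the page's order is below pageblock_order and whole_block is true, the free pages of the pageblock are moved first (free_pages) and the number of compatible allocated pages (alike_pages) is determined. For a MIGRATE_MOVABLE allocation, alike_pages is the number of movable pages in the pageblock. For other allocations stealing from a MIGRATE_MOVABLE pageblock, alike_pages is pageblock_nr_pages - (free_pages + movable_pages). Otherwise alike_pages is 0.
  • Provided free_pages is nonzero, the pageblock migrate type may then be changed when free_pages + alike_pages >= (1 << (pageblock_order - 1)), that is, at least half of the pageblock, or when page_group_by_mobility_disabled is set.
  • In all other cases, including MIGRATE_HIGHATOMIC pageblocks, the pageblock migrate type may not be changed.

The pageblock migrate type is changed with set_pageblock_migratetype().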
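The half-pageblock test can be worked through with a small user-space sketch. It assumes pageblock_order is 9 (512 pages per pageblock), which is common but configuration dependent; the enum model_migratetype and the functions model_alike_pages and model_claims_block are hypothetical names that mirror the arithmetic in steal_suitable_fallback, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values; the real ones depend on the kernel configuration. */
#define MODEL_PAGEBLOCK_ORDER 9
#define MODEL_PAGEBLOCK_NR_PAGES (1 << MODEL_PAGEBLOCK_ORDER) /* 512 */

/* Hypothetical mirror of the three main migrate types. */
enum model_migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/* Number of already-allocated pages treated as compatible with start_type. */
int model_alike_pages(enum model_migratetype start_type,
                      enum model_migratetype old_block_type,
                      int free_pages, int movable_pages)
{
    assert(free_pages >= 0 && movable_pages >= 0);
    assert(free_pages + movable_pages <= MODEL_PAGEBLOCK_NR_PAGES);

    if (start_type == MT_MOVABLE)
        return movable_pages;
    /* Non-movable allocation stealing from a movable block:
     * every allocated page that is not movable counts as compatible. */
    if (old_block_type == MT_MOVABLE)
        return MODEL_PAGEBLOCK_NR_PAGES - (free_pages + movable_pages);
    /* UNMOVABLE <-> RECLAIMABLE: be conservative. */
    return 0;
}

/* True when the model would call set_pageblock_migratetype(). */
bool model_claims_block(enum model_migratetype start_type,
                        enum model_migratetype old_block_type,
                        int free_pages, int movable_pages,
                        bool mobility_disabled)
{
    int alike;

    if (free_pages == 0)
        return false; /* the kernel takes the single_page path here */

    alike = model_alike_pages(start_type, old_block_type,
                              free_pages, movable_pages);
    return free_pages + alike >= (1 << (MODEL_PAGEBLOCK_ORDER - 1)) ||
           mobility_disabled;
}
```

With these assumptions the threshold is 256 pages. An UNMOVABLE allocation that finds 100 free and 300 movable pages in a MOVABLE pageblock gets alike_pages = 112, so 212 pages do not reach the threshold; with 100 free and 200 movable pages, alike_pages = 212 and the block is claimed.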

Moving pages between free lists

Pages are moved with move_to_free_list, which removes the page from its current free list and adds it to the free list of the new migrate type:

/* Used for pages which are on another list */
static inline void move_to_free_list(struct page *page, struct zone *zone,
				     unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_move(&page->lru, &area->free_list[migratetype]);
}
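To make the list manipulation concrete, here is a user-space sketch of the same idea with a minimal circular doubly linked list. All names (list_node, list_move_model, model_zone, model_page, model_move_to_free_list, and the MODEL_ constants) are hypothetical and only imitate the shape of the kernel structures; the kernel's own list_move likewise unlinks the entry and re-adds it at the head of the target list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating struct list_head. */
struct list_node { struct list_node *prev, *next; };

void list_init(struct list_node *head) { head->prev = head->next = head; }

void list_add_head(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n from whatever list it is on, then add it at the head of 'head'. */
void list_move_model(struct list_node *n, struct list_node *head)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_add_head(n, head);
}

int list_len(const struct list_node *head)
{
    int len = 0;
    for (const struct list_node *p = head->next; p != head; p = p->next)
        len++;
    return len;
}

/* Assumed sizes for the model only. */
#define MODEL_MIGRATE_TYPES 3
#define MODEL_MAX_ORDER 11

struct model_free_area { struct list_node free_list[MODEL_MIGRATE_TYPES]; };
struct model_zone { struct model_free_area free_area[MODEL_MAX_ORDER]; };
struct model_page { struct list_node lru; };

void model_zone_init(struct model_zone *zone)
{
    for (int o = 0; o < MODEL_MAX_ORDER; o++)
        for (int t = 0; t < MODEL_MIGRATE_TYPES; t++)
            list_init(&zone->free_area[o].free_list[t]);
}

/* Same shape as move_to_free_list: pick the per-order area, move the entry. */
void model_move_to_free_list(struct model_page *page, struct model_zone *zone,
                             unsigned int order, int migratetype)
{
    assert(order < MODEL_MAX_ORDER);
    assert(migratetype >= 0 && migratetype < MODEL_MIGRATE_TYPES);
    list_move_model(&page->lru, &zone->free_area[order].free_list[migratetype]);
}
```

Note that only the list membership changes: the order stays the same, so the page moves between two free lists of the same free_area.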

page_group_by_mobility_disabled

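Grouping by pageblock migrate type can be switched on and off through page_group_by_mobility_disabled. While zones are initialized during boot, the kernel decides this from the amount of physical memory actually present: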


/*
 * unless system_state == SYSTEM_BOOTING.
 *
 * __ref due to call of __init annotated helper build_all_zonelists_init
 * [protected by SYSTEM_BOOTING].
 */
void __ref build_all_zonelists(pg_data_t *pgdat)
{
	... ...
	/*
	 * Disable grouping by mobility if the number of pages in the
	 * system is too low to allow the mechanism to work. It would be
	 * more accurate, but expensive to check per-zone. This check is
	 * made on memory-hotadd so a system can start with mobility
	 * disabled and enable it later
	 */
	if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
		page_group_by_mobility_disabled = 1;
	else
		page_group_by_mobility_disabled = 0;

	pr_info("Built %u zonelists, mobility grouping %s.  Total pages: %ld\n",
		nr_online_nodes,
		page_group_by_mobility_disabled ? "off" : "on",
		vm_total_pages);
#ifdef CONFIG_NUMA
	pr_info("Policy zone: %s\n", zone_names[policy_zone]);
#endif
}
  • When total memory is less than pageblock_nr_pages * MIGRATE_TYPES physical pages, grouping by migrate type is turned off.
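As a worked example of this threshold, the comparison can be written as a one-line helper. The function name model_mobility_grouping_disabled is hypothetical, and the figures used below (512 pages per pageblock, 6 migrate types, 4 KiB pages) are assumptions that vary with architecture and kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the comparison in build_all_zonelists(); all inputs are parameters
 * because pageblock_nr_pages and MIGRATE_TYPES are configuration dependent. */
bool model_mobility_grouping_disabled(unsigned long total_pages,
                                      unsigned long pageblock_nr_pages,
                                      unsigned long migrate_types)
{
    assert(pageblock_nr_pages > 0 && migrate_types > 0);
    return total_pages < pageblock_nr_pages * migrate_types;
}
```

Under the assumed figures the threshold is 512 * 6 = 3072 pages, or 12 MiB with 4 KiB pages, so only very small systems would run with mobility grouping off.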

move_freepages_block

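move_freepages_block() is used when the pageblock is allowed to be taken over as a whole: it moves the free pages of that pageblock to the free lists of the target migrate type: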


int move_freepages_block(struct zone *zone, struct page *page,
				int migratetype, int *num_movable)
{
	unsigned long start_pfn, end_pfn;
	struct page *start_page, *end_page;

	if (num_movable)
		*num_movable = 0;

	start_pfn = page_to_pfn(page);
	start_pfn = start_pfn & ~(pageblock_nr_pages-1);
	start_page = pfn_to_page(start_pfn);
	end_page = start_page + pageblock_nr_pages - 1;
	end_pfn = start_pfn + pageblock_nr_pages - 1;

	/* Do not cross zone boundaries */
	if (!zone_spans_pfn(zone, start_pfn))
		start_page = page;
	if (!zone_spans_pfn(zone, end_pfn))
		return 0;

	return move_freepages(zone, start_page, end_page, migratetype,
								num_movable);
}
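  • start_pfn = page_to_pfn(page): get the pfn of the page to move.
  • start_pfn = start_pfn & ~(pageblock_nr_pages-1): align the page frame number down to the pageblock boundary.
  • start_page = pfn_to_page(start_pfn): the first physical page of the pageblock after alignment.
  • end_page = start_page + pageblock_nr_pages - 1: the last physical page of the pageblock.
  • end_pfn = start_pfn + pageblock_nr_pages - 1: the last pfn of the pageblock.
  • start_pfn and end_pfn are each checked against the zone: if start_pfn lies outside the zone, the walk starts at page instead; if end_pfn lies outside the zone, the function returns 0 without moving anything.
  • move_freepages: moves the pages in the given range in one batch.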
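The alignment and boundary checks above can be tried out in user space with plain integers. This is a sketch under assumptions: struct model_zone_span, struct model_block_range, model_zone_spans_pfn, and model_block_range_for are hypothetical names, and the zone is reduced to a start pfn plus a page count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of a zone to the pfn range it spans. */
struct model_zone_span {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

struct model_block_range {
    bool valid;              /* false models the early "return 0" */
    unsigned long start_pfn; /* first pfn that would be walked */
    unsigned long end_pfn;   /* last pfn that would be walked */
};

bool model_zone_spans_pfn(const struct model_zone_span *z, unsigned long pfn)
{
    return pfn >= z->start_pfn && pfn < z->start_pfn + z->spanned_pages;
}

struct model_block_range model_block_range_for(const struct model_zone_span *z,
                                               unsigned long pfn,
                                               unsigned long pageblock_nr_pages)
{
    struct model_block_range r = { false, 0, 0 };
    unsigned long aligned, end;

    /* The mask trick only works for a power-of-two block size. */
    assert(pageblock_nr_pages != 0 &&
           (pageblock_nr_pages & (pageblock_nr_pages - 1)) == 0);

    aligned = pfn & ~(pageblock_nr_pages - 1); /* round down to block start */
    end = aligned + pageblock_nr_pages - 1;    /* last pfn of the block */

    /* Do not cross zone boundaries. */
    if (!model_zone_spans_pfn(z, end))
        return r;

    r.valid = true;
    r.start_pfn = model_zone_spans_pfn(z, aligned) ? aligned : pfn;
    r.end_pfn = end;
    return r;
}
```

For instance, with 512-page pageblocks, pfn 0x1234 is rounded down to 0x1200 and the block ends at 0x13ff; if the zone only begins at 0x1100, a page at 0x1150 keeps 0x1150 as its start because the aligned pfn 0x1000 lies outside the zone.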

move_freepages

move_freepages moves the free pages in the given range to the free lists of the given migrate type:


/*
 * Move the free pages in a range to the free lists of the requested type.
 * Note that start_page and end_pages are not aligned on a pageblock
 * boundary. If alignment is required, use move_freepages_block()
 */
static int move_freepages(struct zone *zone,
			  struct page *start_page, struct page *end_page,
			  int migratetype, int *num_movable)
{
	struct page *page;
	unsigned int order;
	int pages_moved = 0;

	for (page = start_page; page <= end_page;) {
		if (!pfn_valid_within(page_to_pfn(page))) {
			page++;
			continue;
		}

		if (!PageBuddy(page)) {
			/*
			 * We assume that pages that could be isolated for
			 * migration are movable. But we don't actually try
			 * isolating, as that would be expensive.
			 */
			if (num_movable &&
					(PageLRU(page) || __PageMovable(page)))
				(*num_movable)++;

			page++;
			continue;
		}

		/* Make sure we are not inadvertently changing nodes */
		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
		VM_BUG_ON_PAGE(page_zone(page) != zone, page);

		order = page_order(page);
		move_to_free_list(page, zone, order, migratetype);
		page += 1 << order;
		pages_moved += 1 << order;
	}

	return pages_moved;
}
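  • Walks the pages in the given range, starting from the pageblock-aligned start page.
  • Pages that are not free (not PageBuddy) are skipped; those that are on the LRU or marked movable are counted in num_movable.
  • move_to_free_list: moves each free buddy block found in the range to the free list of the target migrate type.
  • The loop then advances by 1 << order to the first page after that buddy block.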
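The walk can be imitated over a plain array to see how the counts come out. This is a simplified sketch, not the kernel routine: struct model_page and model_move_freepages are hypothetical, a flag replaces PageBuddy(), another flag replaces the PageLRU() || __PageMovable() test, the pfn_valid_within and VM_BUG_ON checks are omitted, and "moving" a block just records the new migrate type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
    bool buddy_head;    /* models PageBuddy(): first page of a free buddy block */
    unsigned int order; /* meaningful only when buddy_head is set */
    bool movable;       /* models PageLRU() || __PageMovable() on allocated pages */
    int migratetype;    /* free list the block sits on; meaningful for buddy heads */
};

/* Walk pages[start..end] inclusive. Returns the number of free pages moved.
 * As in the kernel, *num_movable is expected to be zeroed by the caller. */
int model_move_freepages(struct model_page *pages, size_t start, size_t end,
                         int migratetype, int *num_movable)
{
    int pages_moved = 0;
    size_t i = start;

    assert(pages != NULL && start <= end);

    while (i <= end) {
        if (!pages[i].buddy_head) {
            /* Allocated page: only count it if it looks movable. */
            if (num_movable && pages[i].movable)
                (*num_movable)++;
            i++;
            continue;
        }

        /* Free buddy block: "move" it, then skip over all of its pages. */
        pages[i].migratetype = migratetype;
        pages_moved += 1 << pages[i].order;
        i += (size_t)1 << pages[i].order;
    }

    return pages_moved;
}
```

With a 16-page range that holds a free order-2 block, two movable allocated pages, one unmovable allocated page, a free order-0 page, and a free order-3 block, the model reports 4 + 1 + 8 = 13 pages moved and 2 movable pages.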

can_steal_fallback

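can_steal_fallback decides, from the order and the migrate type, whether a steal may try to take over the whole pageblock. If it allows this, steal_suitable_fallback can move the free pages of the whole pageblock and, when enough of the block is free or compatible, change the pageblock migrate type: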


/*
 * When we are falling back to another migratetype during allocation, try to
 * steal extra free pages from the same pageblocks to satisfy further
 * allocations, instead of polluting multiple pageblocks.
 *
 * If we are stealing a relatively large buddy page, it is likely there will
 * be more free pages in the pageblock, so try to steal them all. For
 * reclaimable and unmovable allocations, we steal regardless of page size,
 * as fragmentation caused by those allocations polluting movable pageblocks
 * is worse than movable allocations stealing from unmovable and reclaimable
 * pageblocks.
 */
static bool can_steal_fallback(unsigned int order, int start_mt)
{
	/*
	 * Leaving this order check is intended, although there is
	 * relaxed order check in next check. The reason is that
	 * we can actually steal whole pageblock if this condition met,
	 * but, below check doesn't guarantee it and that is just heuristic
	 * so could be changed anytime.
	 */
	if (order >= pageblock_order)
		return true;

	if (order >= pageblock_order / 2 ||
		start_mt == MIGRATE_RECLAIMABLE ||
		start_mt == MIGRATE_UNMOVABLE ||
		page_group_by_mobility_disabled)
		return true;

	return false;
}
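  • order >= pageblock_order: the request needs at least one whole pageblock, so taking over the whole pageblock is allowed.
  • order >= pageblock_order / 2: taking over the whole pageblock is also allowed. As the source comment notes, this is only a heuristic.
  • start_mt == MIGRATE_RECLAIMABLE: for a reclaimable allocation, taking over the whole pageblock is allowed regardless of order.
  • start_mt == MIGRATE_UNMOVABLE: for an unmovable allocation, taking over the whole pageblock is likewise allowed regardless of order.
  • page_group_by_mobility_disabled: when grouping by mobility is off, taking over the whole pageblock is always allowed.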
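The conditions above can be exercised in isolation with a user-space copy of the predicate. The names model_can_steal_fallback and enum model_migratetype are hypothetical, and pageblock_order and page_group_by_mobility_disabled are passed in as parameters here because in the kernel they are globals or configuration constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three main migrate types. */
enum model_migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

bool model_can_steal_fallback(unsigned int order,
                              enum model_migratetype start_mt,
                              unsigned int pageblock_order,
                              bool mobility_disabled)
{
    assert(pageblock_order > 0);

    /* A request this large can really take a whole pageblock. */
    if (order >= pageblock_order)
        return true;

    /* Heuristic cases: fairly large requests, non-movable requests,
     * or mobility grouping switched off. */
    if (order >= pageblock_order / 2 ||
        start_mt == MT_RECLAIMABLE ||
        start_mt == MT_UNMOVABLE ||
        mobility_disabled)
        return true;

    return false;
}
```

Assuming pageblock_order is 9, integer division gives a cutoff of order 4: a MOVABLE request of order 4 may steal the whole block, a MOVABLE request of order 3 may not, and an UNMOVABLE request may do so even at order 0.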
