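I have recently been reading several open-source C/C++ libraries. Each of them documents its own optimizations for memory allocation, and they also touch on byte alignment of allocated memory and on memory paging.

I had always known that allocations are byte-aligned without really understanding why, and I rarely thought about the performance cost involved. In a system built for high concurrency, low latency, and maximal resource utilization, however, this is something that needs attention, so I took this opportunity to get a rough understanding of it.

Let's start with the chunk definitions in glibc's malloc.c: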

/*
  -----------------------  Chunk representations -----------------------
*/


/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};


/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size field.

*/

/*
  ---------- Size and alignment checks and conversions ----------
*/

/* conversion from malloc headers to user pointers, and back */

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */

#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

/* Check if m has acceptable alignment */

#define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p) \
  ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem (p)) \
   & MALLOC_ALIGN_MASK)


/*
   Check if a request is so large that it would wrap around zero when
   padded and aligned. To simplify some other code, the bound is made
   low enough so that adding MINSIZE will also not wrap around zero.
 */

#define REQUEST_OUT_OF_RANGE(req)                                 \
  ((unsigned long) (req) >=                                                   \
   (unsigned long) (INTERNAL_SIZE_T) (-2 * MINSIZE))

/* pad request bytes into a usable size -- internal version */

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/*  Same, except also perform argument check */

#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                                           \
      __set_errno (ENOMEM);                                                   \
      return 0;                                                               \
    }                                                                         \
  (sz) = request2size (req);


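There are many macros here; we only look at the most important ones. request2size performs the alignment: it pads a requested size up to an aligned chunk size. MINSIZE is the smallest chunk that malloc will hand out, which is 16 bytes on 32-bit systems and 32 bytes on 64-bit systems. MALLOC_ALIGNMENT is the alignment in bytes. Assuming the default of 2 * SIZE_SZ, and since size_t is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, MALLOC_ALIGNMENT is 8 and 16 respectively.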

In practice, the alignment parameter (MALLOC_ALIGNMENT) must satisfy two conditions:

1. It must be a power of two.

2. It must be a multiple of sizeof(void *).

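request2size adds SIZE_SZ bytes of header overhead to the request and rounds the result up to a multiple of MALLOC_ALIGNMENT, with MINSIZE as the lower bound. So on a 64-bit system a request of 1 to 24 bytes consumes 32 bytes, while a request of 25 bytes consumes 48 bytes. On a 32-bit system a request of 1 to 12 bytes consumes 16 bytes, and a request of 13 bytes consumes 24 bytes.

Here is someone else's tutorial on implementing a simple malloc: http://blog.codinglabs.org/articles/a-malloc-tutorial.html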
