xerces-c++ memory management strategy & why it consumes so much memory

  • Structure of this article
  • 1) A strange new expression
  • 2) xerces-c++ memory management strategy
  • 3) Why does xerces-c++ consume so much memory?
  • 4) demo

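Structure of this article

xerces-c++ is an XML parser. Before discussing its memory management strategy, I first need to explain a strange use of new. After that I describe the memory management strategy itself, and finally show how it ends up consuming so much memory.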

1) A strange new expression

DOMAttr *DOMDocumentImpl::createAttribute(const XMLCh *nam)
{
    if(!nam || !isXMLName(nam))
        throw DOMException(DOMException::INVALID_CHARACTER_ERR,0, getMemoryManager());
    return new (this, DOMMemoryManager::ATTR_OBJECT) DOMAttrImpl(this,nam);
}

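Placement new usually takes the form new (address) (type) initializer, where address is a pointer to an already existing block of memory. The code above, however, has two arguments right after new. So I looked it up in the reference documentation, and this usage really does exist: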

new expression
Creates and initializes objects with dynamic storage duration, that is, objects whose lifetime is not necessarily limited by the scope in which they were created.

Syntax
::(optional) new ( type ) initializer(optional) (1)
::(optional) new new-type initializer(optional) (2)
::(optional) new (placement-params) ( type ) initializer(optional) (3)
::(optional) new (placement-params) new-type initializer(optional) (4)

The documentation also has an example that closely resembles the xerces code:
new(2,f) T; // calls operator new(sizeof(T), 2, f)
Both amount to the following two steps:

  1. Call the overloaded operator new to allocate memory:
inline void * operator new(size_t amt, DOMDocumentImpl *doc, DOMMemoryManager::NodeObjectType type)
{
    void *p = doc->allocate(amt, type);
    return p;
}
  2. Call the constructor to initialize the memory allocated above:
DOMAttrImpl::DOMAttrImpl(DOMDocument *ownerDoc, const XMLCh *aName)
    : fNode(ownerDoc), fParent (ownerDoc), fSchemaType(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(aName);
    fNode.isSpecified(true);
}

So although it looks odd, it is not much different from ordinary placement new.
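To make the two steps concrete, here is a minimal sketch of a placement new with two extra arguments. Arena, NodeKind, Attr and makeAttr are hypothetical stand-ins invented for illustration (they are not xerces types); only the shape of the operator new overload mirrors the xerces code quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for DOMMemoryManager::NodeObjectType.
enum class NodeKind { Attr, Element };

// Hypothetical stand-in for DOMDocumentImpl: a tiny fixed-size arena.
class Arena {
public:
    void* allocate(std::size_t amount, NodeKind kind) {
        // Round the size up so the next object stays suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        amount = (amount + align - 1) / align * align;
        assert(used_ + amount <= sizeof(buffer_));
        void* p = buffer_ + used_;
        used_ += amount;
        lastKind_ = kind;
        return p;
    }
    std::size_t used() const { return used_; }
    NodeKind lastKind() const { return lastKind_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_ = 0;
    NodeKind lastKind_ = NodeKind::Element;
};

// The arguments in the parentheses after `new` arrive here, after the size.
inline void* operator new(std::size_t amt, Arena* arena, NodeKind kind) {
    return arena->allocate(amt, kind);
}

// Matching placement delete; the compiler calls it only if the
// constructor throws. The arena reclaims everything at once, so it is a no-op.
inline void operator delete(void*, Arena*, NodeKind) noexcept {}

struct Attr {
    explicit Attr(int v) : value(v) {}
    int value;
};

inline Attr* makeAttr(Arena& arena, int v) {
    // Step 1: operator new(sizeof(Attr), &arena, NodeKind::Attr)
    // Step 2: Attr::Attr(v) runs on the returned memory.
    return new (&arena, NodeKind::Attr) Attr(v);
}
```

Calling makeAttr(arena, 42) returns a pointer into the arena's buffer, and the object must not be passed to an ordinary delete, since its storage belongs to the arena.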

2) xerces-c++ memory management strategy

The overloaded operator new above calls DOMDocumentImpl::allocate(amt, type), which in turn calls:

void* DOMDocumentImpl::allocate(XMLSize_t amount)
{
    // Align the request size so that suballocated blocks
    //    beyond this one will be maintained at the same alignment.
    amount = XMLPlatformUtils::alignPointerForNewBlockAllocation(amount);

    // If the request is for a largish block, hand it off to the system
    //   allocator.  The block still must be linked into the list of
    //   allocated blocks so that it will be deleted when the time comes.
    if (amount > kMaxSubAllocationSize)
    {
        //    The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Try to allocate the block
        void* newBlock;
        newBlock = fMemoryManager->allocate(sizeOfHeader + amount);

        //  Link it into the list beyond current block, as current block
        //  is still being subdivided. If there is no current block
        //   then track that we have no bytes to further divide.
        if (fCurrentBlock)
        {
            *(void **)newBlock = *(void **)fCurrentBlock;
            *(void **)fCurrentBlock = newBlock;
        }
        else
        {
            *(void **)newBlock = 0;
            fCurrentBlock = newBlock;
            fFreePtr = 0;
            fFreeBytesRemaining = 0;
        }

        void *retPtr = (char*)newBlock + sizeOfHeader;
        return retPtr;
    }

    //   It's a normal (sub-allocatable) request.

    // Are we out of room in our current block?
    if (amount > fFreeBytesRemaining)
    {
        // Request doesn't fit in the current block.
        // The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Get a new block from the system allocator.
        void* newBlock;
        newBlock = fMemoryManager->allocate(fHeapAllocSize);

        *(void **)newBlock = fCurrentBlock;
        fCurrentBlock = newBlock;
        fFreePtr = (char *)newBlock + sizeOfHeader;
        fFreeBytesRemaining = fHeapAllocSize - sizeOfHeader;

        if(fHeapAllocSize<kMaxHeapAllocSize)
            fHeapAllocSize*=2;
    }

    // Subdivide the request off current block
    void *retPtr = fFreePtr;
    fFreePtr += amount;
    fFreeBytesRemaining -= amount;

    return retPtr;
}

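This is the core allocation code. The logic is not complicated and can be summarized as follows:

  1. If the request is larger than kMaxSubAllocationSize (0x0100), it is handed straight to the underlying memory manager (fMemoryManager->allocate) as a block of its own. That block is still linked into the block list so it can be freed later.
  2. Otherwise, if the most recently allocated big block has at least as many free bytes as requested, the request is carved out of that remainder.
  3. If the remainder is too small, a new big block of fHeapAllocSize bytes is allocated. fHeapAllocSize doubles after each such allocation until it reaches kMaxHeapAllocSize.
  4. The big blocks form a linked list. fCurrentBlock is the head pointer, and fFreeBytesRemaining is the number of unused bytes left in the current big block.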
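The four points above can be sketched as a small standalone class. This is a simplified illustration, not the xerces implementation: BlockHeap and its members are hypothetical names, std::malloc/std::free stand in for the configurable fMemoryManager, and the initial and maximum block sizes (0x4000 and 0x80000) are assumptions, since only kMaxSubAllocationSize (0x0100) is given in the text.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

class BlockHeap {
public:
    BlockHeap() = default;
    BlockHeap(const BlockHeap&) = delete;
    BlockHeap& operator=(const BlockHeap&) = delete;
    ~BlockHeap() { deleteHeap(); }

    void* allocate(std::size_t amount) {
        amount = alignUp(amount);
        if (amount > kMaxSubAllocationSize) {
            // Point 1: a large request gets a block of its own, linked in
            // behind the current block, which keeps being subdivided.
            void* block = rawAlloc(kHeader + amount);
            if (current_) {
                next(block) = next(current_);
                next(current_) = block;
            } else {
                next(block) = nullptr;
                current_ = block;
                freePtr_ = nullptr;
                freeBytes_ = 0;
            }
            return static_cast<char*>(block) + kHeader;
        }
        if (amount > freeBytes_) {
            // Point 3: not enough room left, so start a new big block
            // and double the size used for the one after it.
            void* block = rawAlloc(heapAllocSize_);
            next(block) = current_;
            current_ = block;
            freePtr_ = static_cast<char*>(block) + kHeader;
            freeBytes_ = heapAllocSize_ - kHeader;
            if (heapAllocSize_ < kMaxHeapAllocSize) heapAllocSize_ *= 2;
        }
        // Point 2: carve the request off the current block.
        void* p = freePtr_;
        freePtr_ += amount;
        freeBytes_ -= amount;
        return p;
    }

    // Point 4: walk the linked list of blocks and free each one.
    void deleteHeap() {
        while (current_) {
            void* following = next(current_);
            std::free(current_);
            current_ = following;
        }
        freePtr_ = nullptr;
        freeBytes_ = 0;
    }

    std::size_t blockCount() const {
        std::size_t n = 0;
        for (void* b = current_; b; b = next(b)) ++n;
        return n;
    }
    std::size_t freeBytes() const { return freeBytes_; }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kHeader = kAlign;  // room for the link pointer
    static constexpr std::size_t kMaxSubAllocationSize = 0x0100;
    static constexpr std::size_t kInitialHeapAllocSize = 0x4000;  // assumed
    static constexpr std::size_t kMaxHeapAllocSize = 0x80000;     // assumed

    static std::size_t alignUp(std::size_t n) {
        return (n + kAlign - 1) / kAlign * kAlign;
    }
    // The first bytes of every block hold the pointer to the next block.
    static void*& next(void* block) { return *static_cast<void**>(block); }
    static void* rawAlloc(std::size_t n) {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }

    void* current_ = nullptr;
    char* freePtr_ = nullptr;
    std::size_t freeBytes_ = 0;
    std::size_t heapAllocSize_ = kInitialHeapAllocSize;
};
```

Note that there is no per-allocation header and no way to free a single allocation: the only bookkeeping is one link pointer per big block, which is what makes allocation a simple pointer bump.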

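DOMDocumentImpl is the public-facing interface. To create a node you must go through one of its createXXX functions (the factory pattern?), such as createAttribute or createElement, and all of these create functions end up in allocate. In other words, every Attribute/Element instance lives in one of the big blocks on the linked list. This strategy reminds me of similar code in "Effective C++".

These nodes are never freed individually along the way. They are all released together when the whole Document is destroyed. See the following code: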

DOMDocumentImpl::~DOMDocumentImpl()
{
    ...
    //  Delete the heap for this document.  This unceremoniously yanks the storage
    //      out from under all of the nodes in the document.  Destructors are NOT called.
    this->deleteHeap();
}

void DOMDocumentImpl::deleteHeap()
{
    while (fCurrentBlock != 0)
    {
        void *nextBlock = *(void **)fCurrentBlock;
        fMemoryManager->deallocate(fCurrentBlock);
        fCurrentBlock = nextBlock;
    }
}
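The "Destructors are NOT called" remark in the comment above is worth seeing in isolation. The following sketch uses a hypothetical Tracked type and plain std::malloc/std::free (not xerces code) to show that releasing a block wholesale never runs the destructors of the objects built inside it, so such objects must not own resources that only their destructor would release.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type that counts how many instances are "alive".
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Constructs n objects inside one malloc'd block, frees the block in one go,
// and returns how many destructors ran as a result of that free.
int destructorsRunOnBulkFree(int n) {
    if (n <= 0) return 0;
    void* block = std::malloc(static_cast<std::size_t>(n) * sizeof(Tracked));
    if (!block) throw std::bad_alloc();
    for (int i = 0; i < n; ++i) {
        // Ordinary placement new: construct in already allocated storage.
        new (static_cast<char*>(block) + i * sizeof(Tracked)) Tracked();
    }
    const int before = Tracked::alive;
    std::free(block);  // the storage is gone; ~Tracked() was never invoked
    return before - Tracked::alive;
}
```

The function returns 0 for any n, and Tracked::alive stays at n afterwards, because nothing ever decremented it.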

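3) Why does xerces-c++ consume so much memory?

The root cause is that the data structures describing nodes, attributes and so on are too large. Picture a heavy truck (each node or attribute) hauling a single bag of rice.
Since every node and attribute instance is allocated through allocate, let's make it print how many bytes it allocates for which class and see how much each truck weighs on its own.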

void * DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)
{
    static std::map<int, std::string> maps = {
        {DOMMemoryManager::NodeObjectType::ATTR_OBJECT , "ATTR_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ATTR_NS_OBJECT , "ATTR_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::CDATA_SECTION_OBJECT , "CDATA_SECTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::COMMENT_OBJECT , "COMMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_FRAGMENT_OBJECT , "DOCUMENT_FRAGMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_TYPE_OBJECT , "DOCUMENT_TYPE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_OBJECT , "ELEMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_NS_OBJECT , "ELEMENT_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_OBJECT , "ENTITY_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_REFERENCE_OBJECT , "ENTITY_REFERENCE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::NOTATION_OBJECT , "NOTATION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::PROCESSING_INSTRUCTION_OBJECT , "PROCESSING_INSTRUCTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::TEXT_OBJECT , "TEXT_OBJECT"}
    };
    std::cout<<"New for "<<maps[type]<<" size=0x"<<std::hex<<amount<<std::endl;

    if (!fRecycleNodePtr)
        return allocate(amount);

    DOMNodePtr* ptr = fRecycleNodePtr->operator[](type);
    if (!ptr || ptr->empty())
        return allocate(amount);

    return (void*) ptr->pop();
}

A simple XML file and the resulting log:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<TopNode>
    <SectionOfDataA>
        <TestData>MEMORY COST 1 1</TestData>
        <TestData>MEMORY COST 2 1</TestData>
    </SectionOfDataA>
</TopNode>

New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38

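So a DOMElementImpl is 0x68 (104) bytes and a DOMTextImpl is 0x38 (56) bytes. Think how many characters of string data that would hold! The log lists 4 element nodes and 7 text nodes; the text nodes beyond the two data strings are presumably the whitespace between tags.
Opening the implementation headers shows how many members these classes carry, and that is before counting the base classes. The excerpt below includes the per-document heap variables (fCurrentBlock, fFreePtr, fFreeBytesRemaining) used by DOMDocumentImpl::allocate, so it appears to come from DOMDocumentImpl.hpp rather than DOMElementImpl.hpp: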

    // New data introduced in DOM Level 3
    const XMLCh*          fInputEncoding;
    const XMLCh*          fXmlEncoding;
    bool                  fXmlStandalone;
    const XMLCh*          fXmlVersion;
    const XMLCh*          fDocumentURI;
    DOMConfiguration*     fDOMConfiguration;

    XMLStringPool         fUserDataTableKeys;
    RefHash2KeysTableOf<DOMUserDataRecord, PtrHasher>* fUserDataTable;

    // Per-Document heap Variables.
    //   The heap consists of one or more biggish blocks which are
    //   sub-allocated for individual allocations of nodes, strings, etc.
    //   The big blocks form a linked list, allowing them to be located for deletion.
    //
    //   There is no provision for deleting suballocated blocks, other than
    //     deleting the entire heap when the document is deleted.
    //
    //   There is no header on individual sub-allocated blocks.
    //   The header on big blocks consists only of a single back pointer to
    //    the previously allocated big block (our linked list of big blocks)
    //
    //
    //   revisit - this heap should be encapsulated into its own
    //                  class, rather than hanging naked on Document.
    //
    void*                 fCurrentBlock;
    char*                 fFreePtr;
    XMLSize_t             fFreeBytesRemaining,
                          fHeapAllocSize;

    // To recycle the DOMNode pointer
    RefArrayOf<DOMNodePtr>* fRecycleNodePtr;

    // To recycle DOMBuffer pointer
    RefStackOf<DOMBuffer>* fRecycleBufferPtr;

    // Pool of DOMNodeList for getElementsByTagName
    DOMDeepNodeListPool<DOMDeepNodeListImpl>* fNodeListPool;

    // Other data
    DOMDocumentType*      fDocType;
    DOMElement*           fDocElement;

    DOMStringPoolEntry**  fNameTable;
    XMLSize_t             fNameTableSize;

    DOMNormalizer*        fNormalizer;
    Ranges*               fRanges;
    NodeIterators*        fNodeIterators;
    MemoryManager*        fMemoryManager;   // configurable memory manager
    DOMImplementation*    fDOMImplementation;

    int                   fChanges;
    bool                  errorChecking;    // Bypass error checking.

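However, logging only in DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type) does not give the full picture. It counts only the allocations of Node instances, as the second parameter, type, suggests. Let's take a closer look at the DOMElementImpl constructor: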

DOMElementImpl::DOMElementImpl(DOMDocument *ownerDoc, const XMLCh *eName)
    : fNode(ownerDoc), fParent(ownerDoc), fAttributes(0), fDefaultAttributes(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(eName);           //(1)
    setupDefaultAttributes();
    if (!fDefaultAttributes) {
        fDefaultAttributes = new (docImpl) DOMAttrMapImpl(this); //(2)
        fAttributes = new (docImpl) DOMAttrMapImpl(this);   //(3)
    }
    else {
        fAttributes = new (docImpl) DOMAttrMapImpl(this, fDefaultAttributes);
    }
}

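(1) allocates memory for the node's name string, such as "TopNode" or "SectionOfDataA".
(2)(3) are presumably related to the node's attributes; I have not verified this.
docImpl->getPooledString adds de-duplication (via hashing) on top of the big-block allocation. I won't go into the details here.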
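The de-duplication idea can be illustrated with a minimal string-interning sketch. This is not how xerces implements getPooledString (xerces keeps its own name table and takes the storage from the document heap); here a std::unordered_set is swapped in for brevity, and StringPool is a hypothetical class name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical pool: equal strings are stored once and share one pointer.
class StringPool {
public:
    // Returns a pointer to the pooled copy of s, inserting it if it is new.
    // Pointers stay valid for the lifetime of the pool because
    // unordered_set never moves its elements when it rehashes.
    const char16_t* getPooledString(const std::u16string& s) {
        return pool_.insert(s).first->c_str();
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::u16string> pool_;
};
```

With a pool like this, the two TestData elements in the sample XML would share a single copy of the name "TestData" instead of storing it twice.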

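Finally, to expose every single allocation, just add some logging inside DOMDocumentImpl::allocate(XMLSize_t amount). The output is shown below (values in hex), and it makes clear how much memory is consumed.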

New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3f10 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ea8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e88 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e68 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3e48 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e10 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3df0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3dc8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3d60 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d30 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d10 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3cf0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3cb8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3c98 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3c68 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3c00 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3be0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3bc0 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3ba0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3b68 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3b48 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3b08 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ad0 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3ab0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3a80 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3a18 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39f8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=39d8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39a0 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3980 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3940 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3908 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=38e8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=38c0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3888 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3868 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3840 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3800 this=000000000067FEF8
subdivide from old block with amount=18 old fFreeBytesRemainin=37c0 this=000000000067FEF8
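The log above can be totalled with a little arithmetic. The sketch below only uses numbers that appear in the two logs (the first and last fFreeBytesRemaining values, the last amount, and the 4 element and 7 text node allocations); LogSummary and summarizeLog are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

struct LogSummary {
    std::size_t total;       // all bytes carved out during the log
    std::size_t nodeBytes;   // bytes taken by the node objects themselves
    std::size_t otherBytes;  // everything else (name strings, attribute maps, ...)
};

LogSummary summarizeLog() {
    // Free bytes before the first logged allocation and before the last one.
    const std::size_t firstRemaining = 0x3f10;
    const std::size_t lastRemaining = 0x37c0;
    const std::size_t lastAmount = 0x18;

    // Bytes consumed = free bytes at the start minus free bytes at the end.
    const std::size_t total = firstRemaining - (lastRemaining - lastAmount);

    // 4 ELEMENT_OBJECT allocations of 0x68 and 7 TEXT_OBJECT allocations of 0x38.
    const std::size_t nodeBytes = 4 * 0x68 + 7 * 0x38;

    return {total, nodeBytes, total - nodeBytes};
}
```

This gives 0x768 = 1896 bytes in total, of which 808 bytes are the node objects and 1088 bytes are the additional allocations made around them, all for an XML file whose actual data is two 15-character strings.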

4) demo

I extracted the memory management code from xerces-c++ so it can be used for demos or study. It can be downloaded from the link below.
