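Structure of this article

xerces-c++ is an XML parser. Before getting to its memory management strategy, I first need to cover an odd-looking use of new. After that I walk through the memory management strategy itself, and finally show how it ends up consuming so much memory.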

1) A strange new statement

DOMAttr *DOMDocumentImpl::createAttribute(const XMLCh *nam)
{
    if(!nam || !isXMLName(nam))
        throw DOMException(DOMException::INVALID_CHARACTER_ERR,0, getMemoryManager());
    return new (this, DOMMemoryManager::ATTR_OBJECT) DOMAttrImpl(this,nam);
}

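Placement new usually takes the form new (address) (type) initializer, where address is a pointer to an already existing block of memory. The code above, however, has two arguments right after new. So I looked it up in the reference documentation (cppreference), and this form really does exist: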

new expression
Creates and initializes objects with dynamic storage duration, that is, objects whose lifetime is not necessarily limited by the scope in which they were created.

Syntax
::(optional) new ( type ) initializer(optional) (1)
::(optional) new new-type initializer(optional) (2)
::(optional) new (placement-params) ( type ) initializer(optional) (3)
::(optional) new (placement-params) new-type initializer(optional) (4)

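The documentation also has an example that looks a lot like the xerces code:
new(2,f) T; // calls operator new(sizeof(T), 2, f)
Both amount to the following two steps:

Call the overloaded operator new to allocate the memory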

inline void * operator new(size_t amt, DOMDocumentImpl *doc, DOMMemoryManager::NodeObjectType type)
{
    void *p = doc->allocate(amt, type);
    return p;
}

Call the constructor to initialize the memory allocated above

DOMAttrImpl::DOMAttrImpl(DOMDocument *ownerDoc, const XMLCh *aName): fNode(ownerDoc), fParent (ownerDoc), fSchemaType(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(aName);
    fNode.isSpecified(true);
}

So although it looks odd, it is really not much different from ordinary placement new.
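The two steps can be reproduced outside xerces with a few lines. The following is a minimal sketch, not the xerces code: Arena, Tag, Attr and attrLivesInArena are hypothetical names standing in for DOMDocumentImpl, NodeObjectType and DOMAttrImpl, and the arena is just a fixed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-ins for DOMDocumentImpl and NodeObjectType.
enum class Tag { Attr, Element };

struct Arena {
    alignas(std::max_align_t) unsigned char buffer[256];
    std::size_t used = 0;
    Tag lastTag = Tag::Attr;

    void* allocate(std::size_t amount, Tag tag) {
        // Round up so the next object stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        amount = (amount + a - 1) / a * a;
        assert(used + amount <= sizeof(buffer));
        void* p = buffer + used;
        used += amount;
        lastTag = tag;
        return p;
    }
};

// Step 1: the allocation function selected by new (arena, tag) T(...).
inline void* operator new(std::size_t amt, Arena* arena, Tag tag) {
    return arena->allocate(amt, tag);
}

// Matching placement delete; only called if the constructor throws.
inline void operator delete(void*, Arena*, Tag) noexcept {}

struct Attr {
    int id;
    explicit Attr(int i) : id(i) {}  // Step 2: runs on the memory from step 1.
};

// Returns true if the object was constructed inside the arena's buffer.
inline bool attrLivesInArena() {
    Arena arena;
    Attr* a = new (&arena, Tag::Attr) Attr(42);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(a);
    return a->id == 42 && p == arena.buffer && arena.used >= sizeof(Attr);
}
```

Calling attrLivesInArena() from main shows that the extra arguments after new are simply forwarded to operator new after the size; the constructor call itself is unchanged.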

2) The xerces-c++ memory management strategy

As the overloaded operator new above shows, it calls DOMDocumentImpl::allocate(amt, type), which in turn ends up calling:

void* DOMDocumentImpl::allocate(XMLSize_t amount)
{
    // Align the request size so that suballocated blocks
    //    beyond this one will be maintained at the same alignment.
    amount = XMLPlatformUtils::alignPointerForNewBlockAllocation(amount);

    // If the request is for a largish block, hand it off to the system
    //   allocator.  The block still must be linked into the list of
    //   allocated blocks so that it will be deleted when the time comes.
    if (amount > kMaxSubAllocationSize)
    {
        //    The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Try to allocate the block
        void* newBlock;
        newBlock = fMemoryManager->allocate(sizeOfHeader + amount);

        //  Link it into the list beyond current block, as current block
        //  is still being subdivided. If there is no current block
        //   then track that we have no bytes to further divide.
        if (fCurrentBlock)
        {
            *(void **)newBlock = *(void **)fCurrentBlock;
            *(void **)fCurrentBlock = newBlock;
        }
        else
        {
            *(void **)newBlock = 0;
            fCurrentBlock = newBlock;
            fFreePtr = 0;
            fFreeBytesRemaining = 0;
        }

        void *retPtr = (char*)newBlock + sizeOfHeader;
        return retPtr;
    }

    //   It's a normal (sub-allocatable) request.
    // Are we out of room in our current block?
    if (amount > fFreeBytesRemaining)
    {
        // Request doesn't fit in the current block.
        // The size of the header we add to our raw blocks
        XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));

        // Get a new block from the system allocator.
        void* newBlock;
        newBlock = fMemoryManager->allocate(fHeapAllocSize);

        *(void **)newBlock = fCurrentBlock;
        fCurrentBlock = newBlock;
        fFreePtr = (char *)newBlock + sizeOfHeader;
        fFreeBytesRemaining = fHeapAllocSize - sizeOfHeader;

        if(fHeapAllocSize<kMaxHeapAllocSize)
            fHeapAllocSize*=2;
    }

    // Subdivide the request off current block
    void *retPtr = fFreePtr;
    fFreePtr += amount;
    fFreeBytesRemaining -= amount;
    return retPtr;
}

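This is the core of the memory allocation code. The logic is not complicated and can be summarized as follows:

If the requested size is greater than kMaxSubAllocationSize (0x0100), the request is handed straight to the system allocator (via fMemoryManager->allocate) and gets a block of its own, which is still linked into the block list.
Otherwise, if the most recently allocated large block still has at least as many free bytes as requested, the request is served from that remainder.
If the remainder is not enough, a new large block of size fHeapAllocSize is allocated (fHeapAllocSize then doubles each time until it reaches kMaxHeapAllocSize).
These large blocks form a linked list: fCurrentBlock is the head pointer, and fFreeBytesRemaining is the number of unused bytes left in the current large block.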
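The points above can be condensed into a small stand-alone allocator. This is a simplified sketch under my own assumptions, not the xerces implementation: BlockArena and its constants are hypothetical, the block size is fixed instead of doubling, and std::malloc stands in for fMemoryManager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified sketch; names and constants are illustrative only.
class BlockArena {
public:
    static constexpr std::size_t kAlign       = alignof(std::max_align_t);
    static constexpr std::size_t kHeader      = kAlign;   // holds the "next" pointer
    static constexpr std::size_t kMaxSubAlloc = 0x100;    // large-request threshold
    static constexpr std::size_t kBlockSize   = 0x4000;   // fixed block size (assumed)
    static_assert(kHeader >= sizeof(void*), "header must hold a pointer");

    BlockArena() = default;
    BlockArena(const BlockArena&) = delete;
    BlockArena& operator=(const BlockArena&) = delete;

    // Frees every block by walking the list; no per-object cleanup.
    ~BlockArena() {
        while (current_) {
            void* next = *static_cast<void**>(current_);
            std::free(current_);
            current_ = next;
        }
    }

    void* allocate(std::size_t amount) {
        amount = (amount + kAlign - 1) / kAlign * kAlign;
        if (amount > kMaxSubAlloc) {
            // Large request: dedicated block, linked behind the current one
            // so the current block can keep being subdivided.
            void* block = std::malloc(kHeader + amount);
            assert(block);
            if (current_) {
                *static_cast<void**>(block) = *static_cast<void**>(current_);
                *static_cast<void**>(current_) = block;
            } else {
                *static_cast<void**>(block) = nullptr;
                current_ = block;
                freePtr_ = nullptr;
                freeBytes_ = 0;
            }
            ++blocks_;
            return static_cast<char*>(block) + kHeader;
        }
        if (amount > freeBytes_) {
            // Current block exhausted: push a fresh block at the list head.
            void* block = std::malloc(kBlockSize);
            assert(block);
            *static_cast<void**>(block) = current_;
            current_ = block;
            freePtr_ = static_cast<char*>(block) + kHeader;
            freeBytes_ = kBlockSize - kHeader;
            ++blocks_;
        }
        // Bump-allocate from the current block.
        void* p = freePtr_;
        freePtr_ += amount;
        freeBytes_ -= amount;
        return p;
    }

    std::size_t blockCount() const { return blocks_; }

private:
    void* current_ = nullptr;
    char* freePtr_ = nullptr;
    std::size_t freeBytes_ = 0;
    std::size_t blocks_ = 0;
};

// Two small requests share one block; the large one gets its own.
inline std::size_t blocksAfterThreeRequests() {
    BlockArena arena;
    arena.allocate(0x68);    // opens block 1
    arena.allocate(0x38);    // carved from block 1
    arena.allocate(0x1000);  // above kMaxSubAlloc: dedicated block 2
    return arena.blockCount();
}
```

blocksAfterThreeRequests() returns 2: the first two requests are carved out of the same block, and only the oversized request triggers a second call to the underlying allocator.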

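DOMDocumentImpl is the public-facing interface: to create a node you must go through one of its createXXX functions (a factory pattern?), such as createAttribute or createElement, and all of these create functions end up in allocate. In other words, every Attribute/Element instance lives in one of the large blocks on the list. This strategy reminds me of similar code in Effective C++.

These nodes are never freed individually along the way. They are all released together when the whole Document is destroyed. See the following code: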

DOMDocumentImpl::~DOMDocumentImpl()
{
    ...
    //  Delete the heap for this document.  This uncerimoniously yanks the storage
    //      out from under all of the nodes in the document.  Destructors are NOT called.
    this->deleteHeap();
}

void DOMDocumentImpl::deleteHeap()
{
    while (fCurrentBlock != 0)
    {
        void *nextBlock = *(void **)fCurrentBlock;
        fMemoryManager->deallocate(fCurrentBlock);
        fCurrentBlock = nextBlock;
    }
}
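The "Destructors are NOT called" remark is worth seeing in isolation. Here is a small sketch, with a hypothetical Tracked type and a plain ::operator new buffer in place of the document heap, showing that releasing the storage wholesale runs no destructor at all.

```cpp
#include <cassert>
#include <new>

// Hypothetical node type that counts how often its destructor runs.
struct Tracked {
    static int destroyed;
    int value;
    explicit Tracked(int v) : value(v) {}
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Returns how many destructors ran when the storage was dropped in bulk.
inline int destructorsRunOnBulkFree() {
    Tracked::destroyed = 0;
    void* block = ::operator new(sizeof(Tracked) * 4);
    Tracked* nodes = static_cast<Tracked*>(block);
    for (int i = 0; i < 4; ++i)
        new (nodes + i) Tracked(i);   // construct objects inside the block
    assert(nodes[3].value == 3);
    ::operator delete(block);         // storage released, ~Tracked() never invoked
    return Tracked::destroyed;
}
```

destructorsRunOnBulkFree() returns 0. That is only acceptable when the objects own nothing outside the block, which is presumably why the node classes take their strings and attribute maps from the same document heap.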

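3) Why does xerces-c++ consume so much memory?

The root cause is that the data structures describing nodes, attributes and so on are too large. Picture a heavy truck (each node or attribute) hauling a single bag of rice.
Since memory for instances of all node and attribute classes is allocated through allocate, let's make it print how many bytes it allocates for which class, and see how much each truck weighs on its own.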

void * DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)
{
    static std::map<int, std::string> maps = {
        {DOMMemoryManager::NodeObjectType::ATTR_OBJECT , "ATTR_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ATTR_NS_OBJECT , "ATTR_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::CDATA_SECTION_OBJECT , "CDATA_SECTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::COMMENT_OBJECT , "COMMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_FRAGMENT_OBJECT , "DOCUMENT_FRAGMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::DOCUMENT_TYPE_OBJECT , "DOCUMENT_TYPE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_OBJECT , "ELEMENT_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ELEMENT_NS_OBJECT , "ELEMENT_NS_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_OBJECT , "ENTITY_OBJECT"},
        {DOMMemoryManager::NodeObjectType::ENTITY_REFERENCE_OBJECT , "ENTITY_REFERENCE_OBJECT"},
        {DOMMemoryManager::NodeObjectType::NOTATION_OBJECT , "NOTATION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::PROCESSING_INSTRUCTION_OBJECT , "PROCESSING_INSTRUCTION_OBJECT"},
        {DOMMemoryManager::NodeObjectType::TEXT_OBJECT , "TEXT_OBJECT"}
    };
    std::cout<<"New for "<<maps[type]<<" size=0x"<<std::hex<<amount<<std::endl;

    if (!fRecycleNodePtr)
        return allocate(amount);

    DOMNodePtr* ptr = fRecycleNodePtr->operator[](type);
    if (!ptr || ptr->empty())
        return allocate(amount);

    return (void*) ptr->pop();
}

A simple XML file and the resulting log:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<TopNode>
    <SectionOfDataA>
        <TestData>MEMORY COST 1 1</TestData>
        <TestData>MEMORY COST 2 1</TestData>
    </SectionOfDataA>
</TopNode>

New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38

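So a DOMElementImpl is 0x68 (104) bytes and a DOMTextImpl is 0x38 (56) bytes. That is room for a lot of string data!
Opening the headers shows how many members these classes carry, and that does not even count the base classes. The excerpt below is the member list of DOMDocumentImpl, which also holds the per-document heap variables used by allocate: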

    // New data introduced in DOM Level 3
    const XMLCh*          fInputEncoding;
    const XMLCh*          fXmlEncoding;
    bool                  fXmlStandalone;
    const XMLCh*          fXmlVersion;
    const XMLCh*          fDocumentURI;
    DOMConfiguration*     fDOMConfiguration;

    XMLStringPool         fUserDataTableKeys;
    RefHash2KeysTableOf<DOMUserDataRecord, PtrHasher>* fUserDataTable;

    // Per-Document heap Variables.
    //   The heap consists of one or more biggish blocks which are
    //   sub-allocated for individual allocations of nodes, strings, etc.
    //   The big blocks form a linked list, allowing them to be located for deletion.
    //
    //   There is no provision for deleting suballocated blocks, other than
    //     deleting the entire heap when the document is deleted.
    //
    //   There is no header on individual sub-allocated blocks.
    //   The header on big blocks consists only of a single back pointer to
    //    the previously allocated big block (our linked list of big blocks)
    //
    //
    //   revisit - this heap should be encapsulated into its own
    //                  class, rather than hanging naked on Document.
    //
    void*                 fCurrentBlock;
    char*                 fFreePtr;
    XMLSize_t             fFreeBytesRemaining,
                          fHeapAllocSize;

    // To recycle the DOMNode pointer
    RefArrayOf<DOMNodePtr>* fRecycleNodePtr;

    // To recycle DOMBuffer pointer
    RefStackOf<DOMBuffer>* fRecycleBufferPtr;

    // Pool of DOMNodeList for getElementsByTagName
    DOMDeepNodeListPool<DOMDeepNodeListImpl>* fNodeListPool;

    // Other data
    DOMDocumentType*      fDocType;
    DOMElement*           fDocElement;

    DOMStringPoolEntry**  fNameTable;
    XMLSize_t             fNameTableSize;

    DOMNormalizer*        fNormalizer;
    Ranges*               fRanges;
    NodeIterators*        fNodeIterators;
    MemoryManager*        fMemoryManager;   // configurable memory manager
    DOMImplementation*    fDOMImplementation;

    int                   fChanges;
    bool                  errorChecking;    // Bypass error checking.

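However, logging only in DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type) does not give the full picture: it only counts allocations of Node instances, as the second parameter type suggests. Let's take a closer look at the DOMElementImpl constructor: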

DOMElementImpl::DOMElementImpl(DOMDocument *ownerDoc, const XMLCh *eName): fNode(ownerDoc), fParent(ownerDoc), fAttributes(0), fDefaultAttributes(0)
{
    DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
    fName = docImpl->getPooledString(eName);           //(1)
    setupDefaultAttributes();
    if (!fDefaultAttributes) {
        fDefaultAttributes = new (docImpl) DOMAttrMapImpl(this); //(2)
        fAttributes = new (docImpl) DOMAttrMapImpl(this);   //(3)
    }
    else {
        fAttributes = new (docImpl) DOMAttrMapImpl(this, fDefaultAttributes);
    }
}

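(1) allocates memory for the node's name string, e.g. "TopNode" or "SectionOfDataA".
(2) (3) are presumably related to the node's attributes; I have not verified this.
docImpl->getPooledString adds a de-duplication scheme (hash-based) on top of the large blocks; I won't go into the details here.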
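The idea behind a pooled string can still be shown in a few lines. The sketch below is not how xerces implements getPooledString (it keeps its own name table inside the document heap); StringPool and intern are hypothetical names, and std::unordered_set supplies the hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a per-document string pool.
class StringPool {
public:
    // Returns a pointer to the pooled copy; equal strings share one copy.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    // Node-based container: element addresses stay valid across rehashing.
    std::unordered_set<std::string> pool_;
};

// Two elements with the same tag name end up pointing at the same storage.
inline bool sameNameSharesStorage() {
    StringPool pool;
    const std::string* a = pool.intern("TestData");
    const std::string* b = pool.intern("TestData");
    const std::string* c = pool.intern("TopNode");
    assert(*a == "TestData");
    return a == b && a != c && pool.size() == 2;
}
```

With pooling, a document containing thousands of TestData elements stores that name once, so the per-node cost is dominated by the node objects rather than by their names.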

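Finally, to catch every memory allocation, just add some logging inside DOMDocumentImpl::allocate(XMLSize_t amount). The output is shown below, and it makes clear how much memory gets used.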

New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3f10 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ea8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e88 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e68 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3e48 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e10 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3df0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3dc8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3d60 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d30 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d10 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3cf0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3cb8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3c98 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3c68 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3c00 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3be0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3bc0 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3ba0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3b68 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3b48 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3b08 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ad0 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3ab0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3a80 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3a18 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39f8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=39d8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39a0 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3980 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3940 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3908 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=38e8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=38c0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3888 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3868 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3840 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3800 this=000000000067FEF8
subdivide from old block with amount=18 old fFreeBytesRemainin=37c0 this=000000000067FEF8
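To turn the log above into a single number, the consumed bytes can be read off from the first and last fFreeBytesRemainin values. A small sketch of the arithmetic follows; the constants are copied from the log, and the function names are my own.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from the log above (hexadecimal).
constexpr std::size_t kFirstRemaining = 0x3f10;  // free bytes before the first request
constexpr std::size_t kLastRemaining  = 0x37c0;  // free bytes before the last request
constexpr std::size_t kLastAmount     = 0x18;    // size of the last request

// Bytes carved out of the block over the whole log.
constexpr std::size_t totalBytes() {
    return kFirstRemaining - kLastRemaining + kLastAmount;
}

// Bytes spent on the node objects alone: 4 elements and 7 text nodes.
constexpr std::size_t nodeBytes() {
    return 4 * 0x68 + 7 * 0x38;
}

static_assert(totalBytes() == 1896, "0x768 bytes in total");
static_assert(nodeBytes() == 808, "0x328 bytes for node objects");
```

So the logged requests add up to 1896 bytes for a document whose two text payloads are 15 characters each. Only 808 of those bytes are the node objects; the rest are the extra allocations made while each node is being set up.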

4) demo

I extracted the memory management code from xerces-c++ so it can be used for demos or study; see the link below to download it.
