An Introduction to XML Parsing, with a Simple Xerces-C++ Example

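XML is a metalanguage defined by the World Wide Web Consortium (W3C). It has become a general-purpose data interchange format, and its independence from platform, programming language, and operating system greatly simplifies data integration and exchange. The parsing approaches are the same in every programming language; only the syntax of the implementation differs.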

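XML itself is only a format for encoding data as plain text. To use the data encoded in an XML file, it must first be parsed out of that text, so a parser is needed that can recognize the information in an XML document, interpret the document, and extract its data. Depending on how the data needs to be extracted, several parsing approaches exist, each with its own strengths, weaknesses, and suitable use cases. Choosing an appropriate XML parsing technique can noticeably improve the overall performance of an application.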

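All XML processing starts with parsing. Whether you use XSLT or Java, the first step is to read the XML file, decode its structure, and retrieve information. That is parsing: the process of converting the unstructured character sequence that represents an XML document into structured components that conform to XML syntax.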
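To make "turning a character sequence into structured components" concrete, here is a minimal sketch in standard C++. It is not how Xerces-C++ or any real parser works: the `Token` type and the `tokenize` function are hypothetical names invented for this illustration, and the code assumes a tiny XML subset with no attributes, comments, CDATA sections, or entities.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch (not part of any XML library).
enum class TokenKind { StartTag, EndTag, Text };

struct Token {
    TokenKind kind;
    std::string value;  // tag name, or the character data
};

// Split a tiny XML subset into tokens. Attributes, comments, CDATA,
// entities, and well-formedness checks are deliberately left out.
std::vector<Token> tokenize(const std::string& xml) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop
            const bool isEnd = (close > pos + 1 && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            tokens.push_back({isEnd ? TokenKind::EndTag : TokenKind::StartTag,
                              xml.substr(nameStart, close - nameStart)});
            pos = close + 1;
        } else {
            // Everything up to the next '<' is character data.
            const std::size_t next = xml.find('<', pos);
            const std::size_t end = (next == std::string::npos) ? xml.size() : next;
            tokens.push_back({TokenKind::Text, xml.substr(pos, end - pos)});
            pos = end;
        }
    }
    return tokens;
}
```

Calling `tokenize("<product>Xerces-C</product>")` yields a start tag, a text token, and an end tag. A real parser additionally checks that tags are properly nested and handles encodings, which is why a library is normally used instead of hand-written code like this.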

There are two basic approaches to parsing XML: SAX (Simple API for XML) and DOM (Document Object Model).

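SAX parses a document as a stream of events. Its advantages resemble those of streaming media: processing can start immediately instead of waiting for all the data to be read, and because the application only examines data as it is read, it does not need to keep the document in memory. This is a major advantage for large documents. The application does not even have to parse the whole document; it can stop as soon as some condition is met. SAX is also generally much faster than the alternative, DOM. A SAX parser uses an event-based model: while parsing an XML document it fires a series of events, and when it encounters a given tag it invokes a callback method to report that the specified tag has been found. SAX's memory requirements are usually low because the developer decides which tags to handle, and this scalability shows best when only part of the data in a document is needed. On the other hand, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at once.

Advantages: (1) processing can start immediately, without waiting for all the data to be read; (2) data is examined only as it is read and does not need to be kept in memory; (3) parsing can stop as soon as some condition is met, without reading the whole document; (4) efficiency and performance are high, and documents larger than system memory can be parsed.

Disadvantages: (1) the application itself is responsible for the tag-handling logic (for example, maintaining parent/child relationships), so the more complex the document, the more complex the program; (2) navigation is one-way, so the parser cannot locate a position in the document hierarchy, accessing different parts of the same document at once is difficult, and XPath is not supported.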
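The event-based model can be sketched without any XML library. The following is an illustration only, not the SAX API and not Xerces-C++ code: `EventHandlers`, `parseEvents`, and `firstText` are hypothetical names, and the scanner assumes a tiny XML subset without attributes, comments, or entities. It shows three of the points above: callbacks fire while the input is being read, nothing is stored by the parser, and a callback can stop parsing early.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical handler set for this sketch; real SAX interfaces differ.
// Each callback returns false to ask the parser to stop early.
struct EventHandlers {
    std::function<bool(const std::string&)> onStartElement;
    std::function<bool(const std::string&)> onEndElement;
    std::function<bool(const std::string&)> onCharacters;
};

// Scan the input once, left to right, firing one callback per event.
// The parser keeps no tree; it returns the number of events delivered.
std::size_t parseEvents(const std::string& xml, const EventHandlers& h) {
    std::size_t events = 0;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        bool keepGoing = true;
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop
            const bool isEnd = (close > pos + 1 && xml[pos + 1] == '/');
            const std::size_t nameStart = pos + (isEnd ? 2 : 1);
            const std::string name = xml.substr(nameStart, close - nameStart);
            const auto& cb = isEnd ? h.onEndElement : h.onStartElement;
            if (cb) { keepGoing = cb(name); ++events; }
            pos = close + 1;
        } else {
            const std::size_t next = xml.find('<', pos);
            const std::size_t end = (next == std::string::npos) ? xml.size() : next;
            if (h.onCharacters) {
                keepGoing = h.onCharacters(xml.substr(pos, end - pos));
                ++events;
            }
            pos = end;
        }
        if (!keepGoing) break;  // a handler asked to stop
    }
    return events;
}

// Usage: return the text of the first <tag> element and stop right there.
// Note that the handlers must track the current element themselves.
std::string firstText(const std::string& xml, const std::string& tag) {
    std::string current, found;
    EventHandlers h;
    h.onStartElement = [&](const std::string& n) -> bool { current = n; return true; };
    h.onEndElement = [&](const std::string&) -> bool { current.clear(); return true; };
    h.onCharacters = [&](const std::string& t) -> bool {
        if (current != tag) return true;
        found = t;
        return false;  // condition met: no need to read the rest
    };
    parseEvents(xml, h);
    return found;
}
```

The `current` variable in `firstText` is the kind of bookkeeping the disadvantages list refers to: the parser reports events in document order and forgets them, so any notion of context has to live in the application.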

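DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets the developer search the tree for specific information. Analyzing the structure usually requires loading the whole document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is described as tree-based or object-based.

Advantages: (1) the application can change both the data and the structure; (2) access is bidirectional: the tree can be navigated up and down at any time, and any part of the data can be read and manipulated.

Disadvantage: the whole XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.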
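As a rough picture of what "tree-based" means, the sketch below builds a small in-memory tree in standard C++ and walks it in both directions. It is an assumption-laden illustration, not the W3C DOM interfaces and not the Xerces-C++ classes: `Node`, `append`, and `countElements` are hypothetical names, and attributes and separate text nodes are omitted.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for this sketch; real DOM APIs are much richer.
struct Node {
    std::string name;                             // element name
    std::string text;                             // character data, if any
    Node* parent = nullptr;                       // upward link
    std::vector<std::unique_ptr<Node>> children;  // downward links (owned)

    // Append a child element and return a pointer to it.
    Node* append(const std::string& childName, const std::string& childText = "") {
        std::unique_ptr<Node> child(new Node);
        child->name = childName;
        child->text = childText;
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Count this element and all of its descendants.
std::size_t countElements(const Node& node) {
    std::size_t total = 1;
    for (const auto& child : node.children) total += countElements(*child);
    return total;
}
```

With a root named `company` and children `product`, `category`, and `developedBy`, a caller can go from `product` up to its parent and back down to a sibling, or change `product`'s text in place. The cost is that every node stays in memory for as long as the tree exists.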

XML parsing libraries for C/C++ include:

(1) Expat: http://www.libexpat.org/

(2) die-xml: https://code.google.com/p/die-xml/

(3) Xerces-C++: http://xerces.apache.org/xerces-c/index.html

(4) TinyXml: http://www.grinninglizard.com/tinyxml/

Building and using Xerces-C++:

1. Download the xerces-c-3.1.1.zip source archive from http://xerces.apache.org/xerces-c/download.cgi#verify and extract it.

2. Open xerces-all.sln in the xerces-c-3.1.1\projects\Win32\VC10\xerces-all directory with VS2010.

3. Choose the appropriate entries under Solution Configurations and Solution Platforms, then right-click Solution 'xerces-all' and choose Rebuild Solution. The corresponding dynamic and static libraries are generated under /Build/Win32/VC10. Static Debug/xerces-c_static_3D.lib and Static Release/xerces-c_static_3.lib are used for the test here.

4. Add a new TestXerces project to the 'xerces-all' solution. Select the project and, for both Debug and Release, set the following project properties: (1) Configuration Properties --> Character Set: Use Unicode Character Set; (2) C/C++ --> General --> Additional Include Directories: ../../../../../src. Under C/C++ --> Preprocessor, add:

_CRT_SECURE_NO_DEPRECATE
_WINDOWS
XERCES_STATIC_LIBRARY
XERCES_BUILDING_LIBRARY
XERCES_USE_TRANSCODER_WINDOWS
XERCES_USE_MSGLOADER_INMEMORY
XERCES_USE_NETACCESSOR_WINSOCK
XERCES_USE_FILEMGR_WINDOWS
XERCES_USE_MUTEXMGR_WINDOWS
XERCES_PATH_DELIMITER_BACKSLASH
HAVE_STRICMP
HAVE_STRNICMP
HAVE_LIMITS_H
HAVE_SYS_TIMEB_H
HAVE_FTIME
HAVE_WCSUPR
HAVE_WCSLWR
HAVE_WCSICMP
HAVE_WCSNICMP

stdafx.h:

#pragma once

#include "targetver.h"
#include <stdio.h>

#include "xercesc/util/PlatformUtils.hpp"
#include "xercesc/util/XMLString.hpp"
#include "xercesc/dom/DOM.hpp"
#include "xercesc/util/OutOfMemoryException.hpp"
#include "xercesc/util/TransService.hpp"
#include "xercesc/parsers/SAXParser.hpp"
#include "xercesc/sax/HandlerBase.hpp"
#include "xercesc/framework/XMLFormatter.hpp"

stdafx.cpp:

#include "stdafx.h"

// TODO: reference any additional headers you need in STDAFX.H
// and not in this file

// Link against the static Xerces-C++ library built in step 3.
#ifdef _DEBUG
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Debug/xerces-c_static_3D.lib")
#else
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Release/xerces-c_static_3.lib")
#endif

TestXerces.cpp:

#include "stdafx.h"
#include <iostream>

using namespace std;
XERCES_CPP_NAMESPACE_USE

class XStr
{
public:
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    XStr(const char* const toTranscode)
    {
        // Call the private transcoding method
        fUnicodeForm = XMLString::transcode(toTranscode);
    }

    ~XStr()
    {
        XMLString::release(&fUnicodeForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const XMLCh* unicodeForm() const
    {
        return fUnicodeForm;
    }

private:
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fUnicodeForm
    //      This is the Unicode XMLCh format of the string.
    // -----------------------------------------------------------------------
    XMLCh* fUnicodeForm;
};

#define X(str) XStr(str).unicodeForm()

/*
* This sample illustrates how you can create a DOM tree in memory.
* It then prints the count of elements in the tree.
*/
int CreateDOMDocument()
{
    // Initialize the XML4C2 system.
    try {
        XMLPlatformUtils::Initialize();
    } catch (const XMLException& toCatch) {
        char* pMsg = XMLString::transcode(toCatch.getMessage());
        XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
                                  << "  Exception message:" << pMsg;
        XMLString::release(&pMsg);
        return 1;
    }

    // Watch for special case help request
    int errorCode = 0;
    /*
    {
        XERCES_STD_QUALIFIER cout << "\nUsage:\n"
            "    CreateDOMDocument\n\n"
            "This program creates a new DOM document from scratch in memory.\n"
            "It then prints the count of elements in the tree.\n"
            << XERCES_STD_QUALIFIER endl;
        errorCode = 1;
    }
    */
    if (errorCode) {
        XMLPlatformUtils::Terminate();
        return errorCode;
    }

    {
        //  Nest entire test in an inner block.
        //  The tree we create below is the same that the XercesDOMParser would
        //  have created, except that no whitespace text nodes would be created.

        // <company>
        //     <product>Xerces-C</product>
        //     <category idea='great'>XML Parsing Tools</category>
        //     <developedBy>Apache Software Foundation</developedBy>
        // </company>

        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(X("Core"));

        if (impl != NULL) {
            try {
                DOMDocument* doc = impl->createDocument(
                    0,             // root element namespace URI.
                    X("company"),  // root element name
                    0);            // document type object (DTD).

                DOMElement* rootElem = doc->getDocumentElement();

                DOMElement* prodElem = doc->createElement(X("product"));
                rootElem->appendChild(prodElem);

                DOMText* prodDataVal = doc->createTextNode(X("Xerces-C"));
                prodElem->appendChild(prodDataVal);

                DOMElement* catElem = doc->createElement(X("category"));
                rootElem->appendChild(catElem);

                catElem->setAttribute(X("idea"), X("great"));

                DOMText* catDataVal = doc->createTextNode(X("XML Parsing Tools"));
                catElem->appendChild(catDataVal);

                DOMElement* devByElem = doc->createElement(X("developedBy"));
                rootElem->appendChild(devByElem);

                DOMText* devByDataVal = doc->createTextNode(X("Apache Software Foundation"));
                devByElem->appendChild(devByDataVal);

                //
                // Now count the number of elements in the above DOM tree.
                //
                const XMLSize_t elementCount = doc->getElementsByTagName(X("*"))->getLength();
                XERCES_STD_QUALIFIER cout << "The tree just created contains: " << elementCount
                                          << " elements." << XERCES_STD_QUALIFIER endl;

                // Release the document; this also frees the nodes it owns.
                doc->release();
            } catch (const OutOfMemoryException&) {
                XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
                errorCode = 5;
            } catch (const DOMException& e) {
                XERCES_STD_QUALIFIER cerr << "DOMException code is:  " << e.code << XERCES_STD_QUALIFIER endl;
                errorCode = 2;
            } catch (...) {
                XERCES_STD_QUALIFIER cerr << "An error occurred creating the document" << XERCES_STD_QUALIFIER endl;
                errorCode = 3;
            }
        } else {
            // (impl == NULL)
            XERCES_STD_QUALIFIER cerr << "Requested implementation is not supported" << XERCES_STD_QUALIFIER endl;
            errorCode = 4;
        }
    }

    XMLPlatformUtils::Terminate();
    return errorCode;
}

// ---------------------------------------------------------------------------
//  This is a simple class that lets us do easy (though not terribly efficient)
//  transcoding of XMLCh data to local code page for display.
// ---------------------------------------------------------------------------
class StrX
{
public:
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    StrX(const XMLCh* const toTranscode)
    {
        // Call the private transcoding method
        fLocalForm = XMLString::transcode(toTranscode);
    }

    ~StrX()
    {
        XMLString::release(&fLocalForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const char* localForm() const
    {
        return fLocalForm;
    }

private:
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fLocalForm
    //      This is the local code page form of the string.
    // -----------------------------------------------------------------------
    char* fLocalForm;
};

inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream& target, const StrX& toDump)
{
    target << toDump.localForm();
    return target;
}

int SAXPrint()
{
    // ---------------------------------------------------------------------------
    //  Local data
    //
    //  doNamespaces
    //      Indicates whether namespace processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  doSchema
    //      Indicates whether schema processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  schemaFullChecking
    //      Indicates whether full schema constraint checking should be enabled or not.
    //      Defaults to disabled.
    //
    //  encodingName
    //      The encoding we are to output in. If not set on the command line,
    //      then it is defaulted to LATIN1.
    //
    //  xmlFile
    //      The path to the file to parse. Set via command line.
    //
    //  valScheme
    //      Indicates what validation scheme to use. It defaults to 'auto', but
    //      can be set via the -v= command.
    // ---------------------------------------------------------------------------
    static bool                     doNamespaces        = false;
    static bool                     doSchema            = false;
    static bool                     schemaFullChecking  = false;
    static const char*              encodingName        = "LATIN1";
    static XMLFormatter::UnRepFlags unRepFlags          = XMLFormatter::UnRep_CharRef;
    static const char*              xmlFile             = 0;  // const: it points at a string literal
    static SAXParser::ValSchemes    valScheme           = SAXParser::Val_Auto;

    // Initialize the XML4C2 system
    try {
        XMLPlatformUtils::Initialize();
    } catch (const XMLException& toCatch) {
        XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
                                  << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
        return 1;
    }

    xmlFile = "../../../../../samples/data/personal-schema.xml";
    int errorCount = 0;

    //
    //  Create a SAX parser object. Then, according to what we were told on
    //  the command line, set it to validate or not.
    //
    SAXParser* parser = new SAXParser;
    parser->setValidationScheme(valScheme);
    parser->setDoNamespaces(doNamespaces);
    parser->setDoSchema(doSchema);
    parser->setHandleMultipleImports(true);
    parser->setValidationSchemaFullChecking(schemaFullChecking);

    //
    //  Create the handler object and install it as the document and error
    //  handler for the parser. Then parse the file and catch any exceptions
    //  that propagate out.
    //
    int errorCode = 0;
    try {
        //SAXPrintHandlers handler(encodingName, unRepFlags);
        //parser->setDocumentHandler(&handler);
        //parser->setErrorHandler(&handler);
        parser->parse(xmlFile);
        errorCount = parser->getErrorCount();
    } catch (const OutOfMemoryException&) {
        XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
        errorCode = 5;
    } catch (const XMLException& toCatch) {
        XERCES_STD_QUALIFIER cerr << "\nAn error occurred\n  Error: "
                                  << StrX(toCatch.getMessage())
                                  << "\n" << XERCES_STD_QUALIFIER endl;
        errorCode = 4;
    }

    //
    //  Delete the parser itself.  Must be done prior to calling Terminate, below.
    //  This happens before the errorCode check so that the parser is not
    //  leaked when parsing fails.
    //
    delete parser;

    // And call the termination method
    XMLPlatformUtils::Terminate();

    if (errorCode)
        return errorCode;

    return (errorCount > 0) ? 4 : 0;
}

int main(int argc, char* argv[])
{
    CreateDOMDocument();
    SAXPrint();

    cout << "ok!" << endl;
    return 0;
}
