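Parsing XML Documents with Python

Parsing XML in Python mainly relies on the XML library that ships with Python (xml.etree.ElementTree); the third-party lxml library is the next most common choice.

XML structure: let's start with an XML document that is relatively simple but exercises most of the features.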

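<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
  <title>dive into mark</title>
  <subtitle>currently between addictions</subtitle>
  <id>tag:diveintomark.org,2001-07-29:/</id>
  <updated>2009-03-27T21:56:07Z</updated>
  <link rel='alternate' type='text/html' href='http://diveintomark.org/'/>
  <entry>
    <author>
      <name>Mark</name>
      <uri>http://diveintomark.org/</uri>
    </author>
    <title>Dive into history, 2009 edition</title>
    <link
      href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>
    <id>tag:diveintomark.org,2009-03-27:/archives/20090327172042</id>
    <updated>2009-03-27T21:56:07Z</updated>
    <published>2009-03-27T17:20:42Z</published>
    <!-- category elements not shown -->
    <summary>Putting an entire chapter on one page sounds
      bloated, but consider this — my longest chapter so far
      would be 75 printed pages, and it loads in under 5 seconds…
      On dialup.</summary>
  </entry>
  <entry>
    <author>
      <name>Mark</name>
      <uri>http://diveintomark.org/</uri>
    </author>
    <title>Accessibility is a harsh mistress</title>
    <link
      href='http://diveintomark.org/archives/2009/03/21/accessibility-is-a-harsh-mistress'/>
    <id>tag:diveintomark.org,2009-03-21:/archives/20090321200928</id>
    <updated>2009-03-22T01:05:37Z</updated>
    <published>2009-03-21T20:09:28Z</published>
    <!-- category elements not shown -->
    <summary>The accessibility orthodoxy does not permit people to
      question the value of features that are rarely useful and rarely used.</summary>
  </entry>
  <entry>
    <author>
      <name>Mark</name>
    </author>
    <title>A gentle introduction to video encoding, part 1: container formats</title>
    <link
      href='http://diveintomark.org/archives/2008/12/18/give-part-1-container-formats'/>
    <id>tag:diveintomark.org,2008-12-18:/archives/20081218155422</id>
    <updated>2009-01-11T19:39:22Z</updated>
    <published>2008-12-18T15:54:22Z</published>
    <!-- category elements not shown -->
    <summary>These notes will eventually become part of a
      tech talk on video encoding.</summary>
  </entry>
</feed>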

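Let's take a quick look at the structure of this XML.

# The namespace http://www.w3.org/2005/Atom is defined on the root element
<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>

# This link element has no text, but it carries attributes
<link href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>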

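First, there is a single global root element, feed.

Under the root element are the child elements title, subtitle, id, updated, link, and entry.

Under each entry element are the child elements author, title, link, id, updated, published, category, and summary (call them grandchildren).

Under the author element are the child elements name and uri (great-grandchildren, if you like).

The structure is quite clear.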
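
The nesting described above can be printed directly from a parsed document. The following is a minimal sketch, not part of the original walkthrough: SAMPLE is a cut-down stand-in for the feed, and local_name and outline are hypothetical helper names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the feed (an assumption for this sketch).
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <title>dive into mark</title>
  <entry>
    <author>
      <name>Mark</name>
    </author>
    <title>Dive into history, 2009 edition</title>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts before a tag name."""
    return tag.rsplit('}', 1)[-1]


def outline(element, depth=0):
    """Return one indented line per element, visiting children depth-first."""
    lines = ['  ' * depth + local_name(element.tag)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(SAMPLE))
print('\n'.join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one generation: children of feed, then grandchildren under entry, then great-grandchildren under author.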

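Next, we use Python to extract, step by step, the text content between an element's opening and closing tags, as well as the attributes inside the element.

The main methods used are:

tree = etree.parse()    parse the XML

root = tree.getroot()   get the root element

root.tag                the name of the root element

root.attrib             the element's attributes

root.findall()          find elements

The code follows, with comments and results written inline.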

import xml.etree.ElementTree as etree  # import xml.etree.ElementTree

tree = etree.parse('feed.xml')  # parse the XML
root = tree.getroot()
print(root)

# Elements behave like lists
print(root.tag)
# {http://www.w3.org/2005/Atom}feed
# ElementTree writes XML element names as {namespace}localname

for child in root:
    print(child)
# Only the direct children are shown; children of children are not traversed

# Attributes are dictionaries
print(root.attrib)
# {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}

# Note that the link element under feed has attributes
print(root[4].attrib)
# {'href': 'http://diveintomark.org/', 'type': 'text/html', 'rel': 'alternate'}
# (key order may differ)

print(root[3].attrib)
# {} an empty dictionary, because the updated element has no attributes

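# Finding elements
entrylist = root.findall('{http://www.w3.org/2005/Atom}entry')
print(entrylist)
# A list of three entry elements (memory addresses vary per run):
# [<Element '{http://www.w3.org/2005/Atom}entry' at 0x...>,
#  <Element '{http://www.w3.org/2005/Atom}entry' at 0x...>,
#  <Element '{http://www.w3.org/2005/Atom}entry' at 0x...>]

print(root.findall('{http://www.w3.org/2005/Atom}author'))
# [] an empty list, because author is not a direct child of feed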

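# Finding sub-elements
entries = tree.findall('{http://www.w3.org/2005/Atom}entry')  # first find the entry elements
title = entries[0].find('{http://www.w3.org/2005/Atom}title')  # then find the title element
print(title.text)
# Dive into history, 2009 edition

# Prefixing the path with './/' searches all elements, including children and grandchildren.
# (Newer ElementTree versions warn about a bare leading '//', so './/' is used here.)
all_links = tree.findall('.//{http://www.w3.org/2005/Atom}link')
print(all_links)
# a list of every link element in the document, at any depth

print(all_links[0].attrib)  # the attribute dictionary of this link
# {'href': 'http://diveintomark.org/',
#  'type': 'text/html',
#  'rel': 'alternate'}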
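
Writing the full {namespace}localname form in every path gets repetitive. As a hedged sketch (not from the original walkthrough): find() and findall() accept an optional mapping from prefix to namespace URI, and iter() visits every descendant, which covers the case where a plain loop over an element stops at the direct children. SAMPLE is a cut-down stand-in for the feed, and the prefix 'atom' is an arbitrary label chosen for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the feed (an assumption for this sketch).
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <link rel='alternate' href='http://diveintomark.org/'/>
  <entry>
    <author><name>Mark</name></author>
    <title>Dive into history, 2009 edition</title>
    <link href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>
  </entry>
</feed>"""

# Maps a prefix of our choosing to the namespace URI used in the document.
NS = {'atom': 'http://www.w3.org/2005/Atom'}

root = ET.fromstring(SAMPLE)

# 'atom:title' is expanded to '{http://www.w3.org/2005/Atom}title' via NS.
first_title = root.find('atom:entry/atom:title', NS).text

# './/' searches the whole subtree, not just the direct children.
deep_links = root.findall('.//atom:link', NS)

# iter() walks all descendants; it takes the fully qualified tag name.
author_names = [name.text for name in root.iter('{http://www.w3.org/2005/Atom}name')]

print(first_title)
print(len(deep_links))
print(author_names)
```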

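Those are the basic methods for parsing and searching XML documents with the XML library. Now let's put them to use in a practical example.

Back to parsing WeChat XML: WeChat POSTs the user's message to your server, roughly in the following form.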

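<xml>
  <ToUserName><![CDATA[toUser]]></ToUserName>
  <FromUserName><![CDATA[fromUser]]></FromUserName>
  <CreateTime>1348831860</CreateTime>
  <MsgType><![CDATA[text]]></MsgType>
  <Content><![CDATA[this is a test]]></Content>
  <MsgId>1234567890123456</MsgId>
</xml>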

Now let's use the methods introduced above to get the 'this is a test' string from the Content element.

import xml.etree.ElementTree as etree

weixinxml = etree.parse('weixinpost.xml')
wroot = weixinxml.getroot()
print(wroot.tag)

for child in wroot:
    print(child.tag)

content = wroot.find('Content')  # find() returns None when there is no such child
if content is not None:
    print(content.text)
else:
    print('Nothing found')

With these few simple steps, you can pull out the content you want.
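
Since the message arrives as the body of a POST request rather than as a file on disk, it may be more convenient to parse it straight from a string. The following is a minimal sketch under that assumption: parse_message is a hypothetical helper name, and sample_body is an abbreviated stand-in for a real request body, not the full message format.

```python
import xml.etree.ElementTree as ET


def parse_message(body):
    """Return a dict mapping each direct child tag of the root to its text."""
    root = ET.fromstring(body)
    return {child.tag: child.text for child in root}


# Abbreviated stand-in for a POST body (an assumption for this sketch).
sample_body = (
    "<xml>"
    "<CreateTime>1348831860</CreateTime>"
    "<Content><![CDATA[this is a test]]></Content>"
    "<MsgId>1234567890123456</MsgId>"
    "</xml>"
)

message = parse_message(sample_body)
# dict.get() plays the same role as the 'is not None' check on find().
print(message.get('Content', 'Nothing found'))
```

The parser unwraps CDATA sections, so the text of Content comes back as a plain string.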
