EPUB is an open e-book format, and EPUB books are widely available for download. Amazon's Kindle e-readers have traditionally not opened EPUB files natively.

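An EPUB file is itself a ZIP archive. Besides the book's content files (XHTML pages, images, stylesheets), it contains `META-INF/container.xml`, an XML file that gives the path of the OPF package document. The OPF file in turn holds the book's metadata (title, author, language and so on) as Dublin Core (`dc:`) elements. The script below uses Python's lxml library to follow this chain and extract the metadata: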

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import zipfile

from lxml import etree


def get_epub_info(fname):
    ns = {
        'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
        'pkg': 'http://www.idpf.org/2007/opf',
        'dc': 'http://purl.org/dc/elements/1.1/',
    }

    # An .epub file is a ZIP archive; open it for reading
    with zipfile.ZipFile(fname) as zf:
        # META-INF/container.xml points to the OPF package document
        txt = zf.read('META-INF/container.xml')
        tree = etree.fromstring(txt)
        cfname = tree.xpath('n:rootfiles/n:rootfile/@full-path', namespaces=ns)[0]

        # Read the OPF file, which contains the <metadata> block
        cf = zf.read(cfname)

    tree = etree.fromstring(cf)
    p = tree.xpath('/pkg:package/pkg:metadata', namespaces=ns)[0]

    # Repackage the data: first value of each Dublin Core element, None if absent
    res = {}
    for s in ['title', 'language', 'creator', 'date', 'identifier',
              'publisher', 'subject', 'description']:
        values = p.xpath('dc:%s/text()' % s, namespaces=ns)
        res[s] = values[0] if values else None

    # dc:identifier can repeat; in the sample book the second one is the ISBN:
    # p.xpath('dc:identifier/text()', namespaces=ns)[1]

    return res


if __name__ == "__main__":
    print(get_epub_info('source/epubsample.epub'))
```

Running the script on the sample book prints:


{'publisher': 'Shoes and Ships and Sealing Wax Ltd', 'description': 'SUMMARY:\nThis unique \'15 books in 1\' edition of L. Frank Baum\'s original "Oz" series contains the following complete works: "The Wonderful Wizard of Oz," "The Marvelous Land of Oz," "Ozma of Oz," "Dorothy and the Wizard in Oz," "The Road to Oz," "The Emerald City of Oz," "The Patchwork Girl Of Oz," "Little Wizard Stories of Oz," "Tik-Tok of Oz," "The Scarecrow Of Oz," "Rinkitink In Oz," "The Lost Princess Of Oz," "The Tin Woodman Of Oz," "The Magic of Oz," and "Glinda Of Oz." For over a hundred years, L. Frank Baum\'s classic fairy stories about the land of Oz have been delighting children and parents alike. Now, for the first time, the entire Oz series is available in this single, great-value, edition!', 'language': 'UND', 'creator': 'L. Frank Baum', 'title': 'The Wonderful Wizard of Oz', 'date': '2010-01-22T00:08:46', 'identifier': 'd1d2e9d3-2d97-44b9-924a-c59416e85df7', 'subject': 'Science fiction'}
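The script above relies on lxml. The same three-step lookup (container.xml → OPF path → `<metadata>`) can also be done with only the standard library's `xml.etree.ElementTree`, as in the minimal sketch below. So that it runs without a real book, it first builds a tiny EPUB in memory. The names `get_epub_info_stdlib` and `build_sample_epub`, the `OEBPS/content.opf` path, and the reduced metadata are all hypothetical, chosen only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefixes, same URIs as in the lxml version
NS = {
    'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
    'pkg': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}
FIELDS = ['title', 'language', 'creator', 'date', 'identifier',
          'publisher', 'subject', 'description']


def get_epub_info_stdlib(fileobj):
    """Return the first value of each Dublin Core field (None if absent)."""
    with zipfile.ZipFile(fileobj) as zf:
        # container.xml names the OPF package document via full-path
        container = ET.fromstring(zf.read('META-INF/container.xml'))
        rootfile = container.find('n:rootfiles/n:rootfile', NS)
        package = ET.fromstring(zf.read(rootfile.get('full-path')))
    metadata = package.find('pkg:metadata', NS)
    # findtext returns None when the element is missing
    return {f: metadata.findtext('dc:' + f, namespaces=NS) for f in FIELDS}


def build_sample_epub():
    """Build a tiny, hypothetical EPUB in memory for testing."""
    container_xml = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    opf = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>The Wonderful Wizard of Oz</dc:title>'
        '<dc:creator>L. Frank Baum</dc:creator>'
        '<dc:subject>Science fiction</dc:subject>'
        '</metadata></package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        # The mimetype entry goes first and is stored uncompressed
        zf.writestr('mimetype', 'application/epub+zip',
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr('META-INF/container.xml', container_xml)
        zf.writestr('OEBPS/content.opf', opf)
    buf.seek(0)
    return buf


if __name__ == '__main__':
    info = get_epub_info_stdlib(build_sample_epub())
    print(info['title'])     # The Wonderful Wizard of Oz
    print(info['language'])  # None: the sample has no dc:language
```

Because `findtext` returns `None` for a missing element, absent fields such as `language` come back as `None` rather than raising an error.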

XML example (element text only; markup omitted):

```
<?xml version="1.0" encoding="UTF-8"?>
The Wonderful Wizard of Oz
calibre (0.6.34) [http://calibre-ebook.com]
L. Frank Baum
Shoes and Ships and Sealing Wax Ltd
Science fiction
Science Fiction & Fantasy
Juvenile Fiction
Fantasy & Magic
Fantasy fiction
Classic fiction (Children's
Ages 9-12 Fiction
Young Adult Fiction
Action & Adventure
Children's Books
& Magic
Fairy tales
Children's stories
Wizard of Oz (Fictitious character)
folk tales
Juvenile Fiction : General
magical tales & traditional stories
Oz (Imaginary place)
Juvenile Fiction : Fantasy & Magic
This unique '15 books in 1' edition of L. Frank Baum's original "Oz" series contains the following complete works: "The Wonderful Wizard of Oz," "The Marvelous Land of Oz," "Ozma of Oz," "Dorothy and the Wizard in Oz," "The Road to Oz," "The Emerald City of Oz," "The Patchwork Girl Of Oz," "Little Wizard Stories of Oz," "Tik-Tok of Oz," "The Scarecrow Of Oz," "Rinkitink In Oz," "The Lost Princess Of Oz," "The Tin Woodman Of Oz," "The Magic of Oz," and "Glinda Of Oz." For over a hundred years, L. Frank Baum's classic fairy stories about the land of Oz have been delighting children and parents alike. Now, for the first time, the entire Oz series is available in this single, great-value, edition!
```
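The script keeps only the first item (`[0]`) of each XPath result. As a result, repeated elements such as the subject headings in the example above collapse to the first one (`Science fiction`). If you want every value, the minimal sketch below groups all `dc:*` children of `<metadata>` into lists. It works on the OPF bytes directly, such as the bytes returned by `zipfile.ZipFile.read`. The function name `all_metadata` and the trimmed `SAMPLE_OPF` are hypothetical; the sample uses three of the subject values listed above.

```python
import xml.etree.ElementTree as ET

DC = 'http://purl.org/dc/elements/1.1/'
OPF = 'http://www.idpf.org/2007/opf'


def all_metadata(opf_bytes):
    """Map each dc:* element name to the list of all its text values."""
    package = ET.fromstring(opf_bytes)
    metadata = package.find('{%s}metadata' % OPF)
    result = {}
    for elem in metadata:
        # Keep only Dublin Core elements, skipping <meta> and the like
        if elem.tag.startswith('{%s}' % DC):
            name = elem.tag[len(DC) + 2:]  # strip the "{uri}" prefix
            result.setdefault(name, []).append((elem.text or '').strip())
    return result


# Hypothetical, trimmed OPF document for illustration
SAMPLE_OPF = b'''<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>The Wonderful Wizard of Oz</dc:title>
    <dc:subject>Science fiction</dc:subject>
    <dc:subject>Juvenile Fiction</dc:subject>
    <dc:subject>Fairy tales</dc:subject>
  </metadata>
</package>'''

if __name__ == '__main__':
    meta = all_metadata(SAMPLE_OPF)
    print(meta['subject'])  # ['Science fiction', 'Juvenile Fiction', 'Fairy tales']
```

Returning lists for every field keeps the shape uniform, so callers never need to know in advance which elements may repeat.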



