EPUB is an open e-book format that anyone can download and read. EPUB files are generally not compatible with e-readers such as the Amazon Kindle.

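An EPUB file is itself a ZIP archive. Inside it, META-INF/container.xml records the path of an XML package file that describes the contents of the archive, including the book's metadata. The script below uses Python's lxml library to parse that XML file and extract the metadata: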

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import zipfile

from lxml import etree


def get_epub_info(fname):
    """Return a dict of Dublin Core metadata read from an EPUB file."""
    ns = {
        'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
        'pkg': 'http://www.idpf.org/2007/opf',
        'dc': 'http://purl.org/dc/elements/1.1/'
    }

    # The .epub file is a ZIP archive; open it for reading.
    with zipfile.ZipFile(fname) as _zip:
        # container.xml records the path of the contents metafile.
        txt = _zip.read('META-INF/container.xml')
        tree = etree.fromstring(txt)
        cfname = tree.xpath('n:rootfiles/n:rootfile/@full-path', namespaces=ns)[0]

        # Read the contents metafile while the archive is still open.
        cf = _zip.read(cfname)

    # Grab the metadata block from the contents metafile.
    tree = etree.fromstring(cf)
    p = tree.xpath('/pkg:package/pkg:metadata', namespaces=ns)[0]

    # Repackage the data, keeping only the first value of each field.
    res = {}
    for s in ['title', 'language', 'creator', 'date', 'identifier',
              'publisher', 'subject', 'description']:
        values = p.xpath('dc:%s/text()' % s, namespaces=ns)
        # A field may be absent; store None instead of raising IndexError.
        res[s] = values[0] if values else None

    # In the sample book the second identifier is the ISBN:
    # p.xpath('dc:identifier/text()', namespaces=ns)[1]
    return res


if __name__ == "__main__":
    print(get_epub_info('source/epubsample.epub'))

Output (key order may vary with the Python version):

{'publisher': 'Shoes and Ships and Sealing Wax Ltd', 'description': 'SUMMARY:\nThis unique \'15 books in 1\' edition of L. Frank Baum\'s original "Oz" series contains the following complete works: "The Wonderful Wizard of Oz," "The Marvelous Land of Oz," "Ozma of Oz," "Dorothy and the Wizard in Oz," "The Road to Oz," "The Emerald City of Oz," "The Patchwork Girl Of Oz," "Little Wizard Stories of Oz," "Tik-Tok of Oz," "The Scarecrow Of Oz," "Rinkitink In Oz," "The Lost Princess Of Oz," "The Tin Woodman Of Oz," "The Magic of Oz," and "Glinda Of Oz." For over a hundred years, L. Frank Baum\'s classic fairy stories about the land of Oz have been delighting children and parents alike. Now, for the first time, the entire Oz series is available in this single, great-value, edition!', 'language': 'UND', 'creator': 'L. Frank Baum', 'title': 'The Wonderful Wizard of Oz', 'date': '2010-01-22T00:08:46', 'identifier': 'd1d2e9d3-2d97-44b9-924a-c59416e85df7', 'subject': 'Science fiction'}
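The script keeps only the first value of each field, yet the sample book carries two identifiers and many subjects. The following is a minimal sketch of collecting every value instead. It swaps lxml for the standard library's xml.etree.ElementTree so that it runs without third-party packages; the function name `collect_dc_fields` and the abridged `SAMPLE_OPF` document are assumptions made for illustration, not part of the original script.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the script above.
NS = {
    'pkg': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

# Hypothetical, heavily abridged package file used only for this sketch.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>The Wonderful Wizard of Oz</dc:title>
    <dc:identifier>d1d2e9d3-2d97-44b9-924a-c59416e85df7</dc:identifier>
    <dc:identifier>9780954840143</dc:identifier>
    <dc:subject>Science fiction</dc:subject>
    <dc:subject>Fantasy</dc:subject>
  </metadata>
</package>
"""


def collect_dc_fields(opf_bytes,
                      fields=('title', 'identifier', 'subject', 'publisher')):
    """Map each requested Dublin Core field to a list of all its values."""
    root = ET.fromstring(opf_bytes)
    metadata = root.find('pkg:metadata', NS)
    if metadata is None:
        # No metadata block at all: every field is simply empty.
        return {name: [] for name in fields}
    result = {}
    for name in fields:
        elements = metadata.findall('dc:%s' % name, NS)
        # Skip elements that have no text content.
        result[name] = [el.text for el in elements if el.text]
    return result


if __name__ == "__main__":
    info = collect_dc_fields(SAMPLE_OPF)
    print(info['identifier'])
    print(info['subject'])
    print(info['publisher'])
```

Returning a list for every field, including an empty list for a missing one, lets callers treat single-valued and multi-valued fields the same way.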

Sample XML

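<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>UND</dc:language>
    <dc:title>The Wonderful Wizard of Oz</dc:title>
    <dc:date>2010-01-22T00:08:46</dc:date>
    <dc:contributor>calibre (0.6.34) [http://calibre-ebook.com]</dc:contributor>
    <dc:identifier>d1d2e9d3-2d97-44b9-924a-c59416e85df7</dc:identifier>
    <dc:creator>L. Frank Baum</dc:creator>
    <dc:publisher>Shoes and Ships and Sealing Wax Ltd</dc:publisher>
    <dc:identifier>9780954840143</dc:identifier>
    <dc:subject>Science fiction</dc:subject>
    <dc:subject>Fantasy</dc:subject>
    <dc:subject>Epic</dc:subject>
    <dc:subject>General</dc:subject>
    <dc:subject>Fiction</dc:subject>
    <dc:subject>Science Fiction &amp; Fantasy</dc:subject>
    <dc:subject>Magic</dc:subject>
    <dc:subject>Juvenile Fiction</dc:subject>
    <dc:subject>Fantasy &amp; Magic</dc:subject>
    <dc:subject>American</dc:subject>
    <dc:subject>Fantasy fiction</dc:subject>
    <dc:subject>Wizards</dc:subject>
    <dc:subject>Classics</dc:subject>
    <dc:subject>Anthologies</dc:subject>
    <dc:subject>Classic fiction (Children's</dc:subject>
    <dc:subject>YA)</dc:subject>
    <dc:subject>Ages 9-12 Fiction</dc:subject>
    <dc:subject>Young Adult Fiction</dc:subject>
    <dc:subject>Action &amp; Adventure</dc:subject>
    <dc:subject>Children's Books</dc:subject>
    <dc:subject>&amp; Magic</dc:subject>
    <dc:subject>Fairy tales</dc:subject>
    <dc:subject>Children's stories</dc:subject>
    <dc:subject>fables</dc:subject>
    <dc:subject>Wizard of Oz (Fictitious character)</dc:subject>
    <dc:subject>folk tales</dc:subject>
    <dc:subject>Juvenile Fiction : General</dc:subject>
    <dc:subject>magical tales &amp; traditional stories</dc:subject>
    <dc:subject>Oz (Imaginary place)</dc:subject>
    <dc:subject>Juvenile Fiction : Fantasy &amp; Magic</dc:subject>
    <dc:description>SUMMARY:
This unique '15 books in 1' edition of L. Frank Baum's original "Oz" series contains the following complete works: "The Wonderful Wizard of Oz," "The Marvelous Land of Oz," "Ozma of Oz," "Dorothy and the Wizard in Oz," "The Road to Oz," "The Emerald City of Oz," "The Patchwork Girl Of Oz," "Little Wizard Stories of Oz," "Tik-Tok of Oz," "The Scarecrow Of Oz," "Rinkitink In Oz," "The Lost Princess Of Oz," "The Tin Woodman Of Oz," "The Magic of Oz," and "Glinda Of Oz." For over a hundred years, L. Frank Baum's classic fairy stories about the land of Oz have been delighting children and parents alike. Now, for the first time, the entire Oz series is available in this single, great-value, edition!</dc:description>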

...
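The script reaches the package file shown above in two hops: it opens the archive, reads META-INF/container.xml, and follows the rootfile's full-path attribute. That lookup can be tried without a real book. The sketch below builds a tiny EPUB-like archive in memory and resolves the path with the standard library's xml.etree.ElementTree in place of lxml. The helper names `build_fake_epub` and `find_package_path`, and the path `content.opf`, are assumptions chosen for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Container namespace taken from the script above.
CONTAINER_NS = {'n': 'urn:oasis:names:tc:opendocument:xmlns:container'}

# Hypothetical minimal container.xml; 'content.opf' is an assumed path.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_fake_epub():
    """Return an in-memory ZIP archive laid out like a bare-bones EPUB."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('META-INF/container.xml', CONTAINER_XML)
        zf.writestr('content.opf',
                    '<package xmlns="http://www.idpf.org/2007/opf"/>')
    buf.seek(0)
    return buf


def find_package_path(epub_file):
    """Return the archive path of the package file named in container.xml."""
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = root.find('n:rootfiles/n:rootfile', CONTAINER_NS)
    if rootfile is None:
        raise ValueError('container.xml has no rootfile entry')
    return rootfile.get('full-path')


if __name__ == "__main__":
    print(find_package_path(build_fake_epub()))
```

Because zipfile.ZipFile accepts either a filename or a file-like object, `find_package_path` can be pointed at a real .epub path as well as at the in-memory archive.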

A simple python script to unpack/parse epub books so they can be read on the command line.
