Python Standard Library

Translated by: the Python Jianghu group

Compiled 10/06/07 20:10:08

0.1. About This Book
0.2. Code Conventions
0.3. About the Examples
0.4. How to Contact Us

Core Modules
- 1.1. Introduction
- 1.2. The __builtin__ module
- 1.3. The exceptions module
- 1.4. The os module
- 1.5. The os.path module
- 1.6. The stat module
- 1.7. The string module
- 1.8. The re module
- 1.9. The math module
- 1.10. The cmath module
- 1.11. The operator module
- 1.12. The copy module
- 1.13. The sys module
- 1.14. The atexit module
- 1.15. The time module
- 1.16. The types module
- 1.17. The gc module
More Standard Modules
- 2.1. Overview
- 2.2. The fileinput module
- 2.3. The shutil module
- 2.4. The tempfile module
- 2.5. The StringIO module
- 2.6. The cStringIO module
- 2.7. The mmap module
- 2.8. The UserDict module
- 2.9. The UserList module
- 2.10. The UserString module
- 2.11. The traceback module
- 2.12. The errno module
- 2.13. The getopt module
- 2.14. The getpass module
- 2.15. The glob module
- 2.16. The fnmatch module
- 2.17. The random module
- 2.18. The whrandom module
- 2.19. The md5 module
- 2.20. The sha module
- 2.21. The crypt module
- 2.22. The rotor module
- 2.23. The zlib module
- 2.24. The code module
Threads and Processes
- 3.1. Overview
- 3.2. The threading module
- 3.3. The Queue module
- 3.4. The thread module
- 3.5. The commands module
- 3.6. The pipes module
- 3.7. The popen2 module
- 3.8. The signal module
Data Representation
- 4.1. Overview
- 4.2. The array module
- 4.3. The struct module
- 4.4. The xdrlib module
- 4.5. The marshal module
- 4.6. The pickle module
- 4.7. The cPickle module
- 4.8. The copy_reg module
- 4.9. The pprint module
- 4.10. The repr module
- 4.11. The base64 module
- 4.12. The binhex module
- 4.13. The quopri module
- 4.14. The uu module
- 4.15. The binascii module
File Formats
- 5.1. Overview
- 5.2. The xmllib module
- 5.3. The xml.parsers.expat module
- 5.4. The sgmllib module
- 5.5. The htmllib module
- 5.6. The htmlentitydefs module
- 5.7. The formatter module
- 5.8. The ConfigParser module
- 5.9. The netrc module
- 5.10. The shlex module
- 5.11. The zipfile module
- 5.12. The gzip module
Mail and News Message Processing
- 6.1. Overview
- 6.2. The rfc822 module
- 6.3. The mimetools module
- 6.4. The MimeWriter module
- 6.5. The mailbox module
- 6.6. The mailcap module
- 6.7. The mimetypes module
- 6.8. The packmail module
- 6.9. The mimify module
- 6.10. The multifile module
Network Protocols
- 7.1. Overview
- 7.2. The socket module
- 7.3. The select module
- 7.4. The asyncore module
- 7.5. The asynchat module
- 7.6. The urllib module
- 7.7. The urlparse module
- 7.8. The Cookie module
- 7.9. The robotparser module
- 7.10. The ftplib module
- 7.11. The gopherlib module
- 7.12. The httplib module
- 7.13. The poplib module
- 7.14. The imaplib module
- 7.15. The smtplib module
- 7.16. The telnetlib module
- 7.17. The nntplib module
- 7.18. The SocketServer module
- 7.19. The BaseHTTPServer module
- 7.20. The SimpleHTTPServer module
- 7.21. The CGIHTTPServer module
- 7.22. The cgi module
- 7.23. The webbrowser module
Internationalization
- 8.1. The locale module
- 8.2. The unicodedata module
- 8.3. The ucnhash module
Multimedia Modules
- 9.1. Overview
- 9.2. The imghdr module
- 9.3. The sndhdr module
- 9.4. The whatsound module
- 9.5. The aifc module
- 9.6. The sunau module
- 9.7. The sunaudio module
- 9.8. The wave module
- 9.9. The audiodev module
- 9.10. The winsound module
Data Storage
- 10.1. Overview
- 10.2. The anydbm module
- 10.3. The whichdb module
- 10.4. The shelve module
- 10.5. The dbhash module
- 10.6. The dbm module
- 10.7. The dumbdbm module
- 10.8. The gdbm module
Tools and Utilities
- 11.1. The dis module
- 11.2. The pdb module
- 11.3. The bdb module
- 11.4. The profile module
- 11.5. The pstats module
- 11.6. The tabnanny module
Platform-Specific Modules
- 12.1. Overview
- 12.2. The fcntl module
- 12.3. The pwd module
- 12.4. The grp module
- 12.5. The nis module
- 12.6. The curses module
- 12.7. The termios module
- 12.8. The tty module
- 12.9. The resource module
- 12.10. The syslog module
- 12.11. The msvcrt module
- 12.12. The nt module
- 12.13. The _winreg module
- 12.14. The posix module
Implementation Support Modules
- 13.1. The dospath module
- 13.2. The macpath module
- 13.3. The ntpath module
- 13.4. The posixpath module
- 13.5. The strop module
- 13.6. The imp module
- 13.7. The new module
- 13.8. The pre module
- 13.9. The sre module
- 13.10. The py_compile module
- 13.11. The compileall module
- 13.12. The ihooks module
- 13.13. The linecache module
- 13.14. The macurl2path module
- 13.15. The nturl2path module
- 13.16. The tokenize module
- 13.17. The keyword module
- 13.18. The parser module
- 13.19. The symbol module
- 13.20. The token module
Other Modules
- 14.1. Overview
- 14.2. The pyclbr module
- 14.3. The filecmp module
- 14.4. The cmd module
- 14.5. The rexec module
- 14.6. The Bastion module
- 14.7. The readline module
- 14.8. The rlcompleter module
- 14.9. The statvfs module
- 14.10. The calendar module
- 14.11. The sched module
- 14.12. The statcache module
- 14.13. The grep module
- 14.14. The dircache module
- 14.15. The dircmp module
- 14.16. The cmp module
- 14.17. The cmpcache module
- 14.18. The util module
- 14.19. The soundex module
- 14.20. The timing module
- 14.21. The posixfile module
- 14.22. The bisect module
- 14.23. The knee module
- 14.24. The tzparse module
- 14.25. The regex module
- 14.26. The regsub module
- 14.27. The reconvert module
- 14.28. The regex_syntax module
- 14.29. The find module
New Modules Since Python 2.0
Afterword

<blockquote>
"We'd like to pretend that 'Fredrik' is a role, but even hundreds of volunteers
couldn't possibly keep up. No, 'Fredrik' is the result of crossing an http server
with a spam filter with an emacs whatsit and some other stuff besides."
</blockquote><blockquote>
-Gordon McMillan, June 1998
</blockquote>

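Python 2.0 comes with an extensive standard library of more than 200 modules. This book briefly describes each module and provides at least one example showing how to use it. In total, the book contains 360 examples.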

0.1. About This Book

"Those people who have nothing better to do than post on the Internet all day long are rarely the ones who have the most insights."

<blockquote>
- Jakob Nielsen, December 1998
</blockquote>

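I stumbled across Python some five years ago, and since then I've spent a lot of time answering questions in the comp.lang.python newsgroup. Maybe someone had found a module that was exactly what they wanted, but couldn't figure out how to use it. Maybe someone had picked the wrong module for the task. Or maybe someone was simply tired of reinventing the wheel. Most of the time, a short example turned out to be more helpful than a pointer to the reference manual.

This book is a distillation of more than 3,000 newsgroup messages, plus plenty of new scripts written to cover every corner of the standard library.

I've tried hard to make each script easy to understand and easy to reuse. I've intentionally kept the comments short; if you want more background, see the reference manual that comes with each Python distribution. The important part of this book is the example code.

Comments, suggestions, and bug reports are welcome; please send them to fredrik@pythonware.com. I read all the mail I receive, as time permits, but my replies may not be prompt.

For updates to this book and other information, see http://www.pythonware.com/people/fredrik/librarybook.htm

Why no Tkinter?

This book covers the entire standard library, except the (optional) Tkinter user-interface library. There are several reasons for this, mostly related to time, space in the book, and the fact that I'm writing another book on Tkinter.

For information about that book, see http://www.pythonware.com/people/fredrik/tkinterbook.htm. (Translator's note: don't bother, this link is another 404.)

Production Details

This book was written in DocBook SGML, using a variety of tools, including Secret Labs' PythonWorks, Excosoft Documentor, James Clark's Jade DSSSL processor, Norm Walsh's DocBook stylesheets, and of course, a few Python scripts.

Thanks to the people who helped proofread the book: Tim Peters, Guido van Rossum, David Ascher, Mark Lutz, and Rael Dornfest, as well as the PythonWare crew: Matthew Ellis, Håkan Karlsson, and Rune Uhlin.

Thanks to Lenny Muellner, who helped turn the SGML files into the book you see before you, and to Christien Shangraw, who collected the code files into the companion CD (available at http://examples.oreilly.com/pythonsl; translator's note: surprisingly, this link is not a 404).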

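0.2. Code Conventions

This book uses the following typographical conventions:

Italic

Used for filenames and commands. Also used for defining terms.

Constant width, e.g. Python

Used for code, and for the names of methods, modules, operators, functions, statements, attributes, and so on.

Constant width bold

Used for the output of code examples.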

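0.3. About the Examples

Unless otherwise noted, all examples run under Python 1.5.2 and Python 2.0. Whether they also run under Python 2.4/2.5 is up to those of us working on the translation to find out.

Except for scripts that rely on platform-specific modules, all examples run as is on Windows, Solaris, and Linux.

All code is copyrighted. Of course, you're free to use these modules in your own programs; just don't forget where you got (or learned) them from.

Most example files are named after the module they use, followed by "-example-" and a unique serial number. Note that some examples don't appear in numerical order; this is to match an earlier edition of this book, (the eff-bot guide to) The Standard Python Library.

You can find the contents of the book's companion CD online (see http://examples.oreilly.com/pythonsl). For more information and updates, see http://www.pythonware.com/people/fredrik/librarybook.htm. (Translator's note: sigh, another 404. Don't bother visiting it.)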

0.4. How to Contact Us

Python Jianghu QQ group: 43680167

Feather (proofreader) QQ: 85660100

1. Core Modules

"Since the functions in the C runtime library are not part of the Win32 API, we believe the number of applications that will be affected by this bug to be very limited."

<blockquote>
- Microsoft, January 1999
</blockquote>

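1.1. Introduction

Python's standard library covers a wide range of modules, from modules that are as much a part of the language as its built-in types and statements, to obscure modules that are probably useful to only a handful of programs.

This chapter describes a number of fundamental standard library modules. Any larger Python program is likely to use most of them, either directly or indirectly.

1.1.1. Built-in Functions and Exceptions

The following two modules are more important than all other modules combined: the __builtin__ module, which defines the built-in functions (such as len, int, and range), and the exceptions module, which defines all built-in exceptions.

Python imports both modules when it starts up, which makes their contents available to every program.

1.1.2. Operating System Interface Modules

Python has a number of modules modeled on the POSIX standard API and the standard C library. They provide platform-independent interfaces to the underlying operating system.

Modules in this group include the os module, which provides file and process operations; the os.path module, which offers a platform-independent way to take apart file names (directory name, base name, extension, and so on); and the time/datetime modules, which handle dates and times.

[!Feather's note: datetime is a module added in Py2.3 that provides enhanced date and time handling.]

Stretching the definition a bit, the networking and threading modules could also be placed in this group. However, Python does not implement them on every platform and version.

1.1.3. Type Support Modules

The standard library contains several modules that support operations on the built-in types. The string module implements commonly used string operations. The math module provides mathematical operations and constants (pi and e are examples of such constants), and the cmath module provides the same functionality for complex numbers.

1.1.4. Regular Expressions

The re module provides regular expression support for Python. Regular expressions are string patterns, written in a special syntax, that can be used to match strings or to find specific substrings.

1.1.5. Language Support Modules

The sys module gives you access to various interpreter variables, such as the module search path and the interpreter version. The operator module provides functions that do the same thing as the built-in operators. The copy module lets you copy objects. Finally, the gc module, new in Python 2.0, gives you control over the garbage collector.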

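1.2. The __builtin__ module

This module contains the built-in functions used by Python. You usually don't need to import it yourself; Python does that for you.

1.2.1. Calling a Function with Arguments from a Tuple or Dictionary

Python lets you build function argument lists on the fly. Just put all the arguments in a tuple and call the function through the built-in apply function, as shown in Example 1-1.

1.2.1.1. Example 1-1. Using the apply function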

File: builtin-apply-example-1.py

def function(a, b):
    print a, b

apply(function, ("whither", "canada?"))
apply(function, (1, 2 + 3))

whither canada?
1 5

To pass keyword arguments to a function, you can pass a dictionary as the third argument to apply, as shown in Example 1-2.

1.2.1.2. Example 1-2. Using the apply function to pass keyword arguments

File: builtin-apply-example-2.py

def function(a, b):
    print a, b

apply(function, ("crunchy", "frog"))
apply(function, ("crunchy",), {"b": "frog"})
apply(function, (), {"a": "crunchy", "b": "frog"})

crunchy frog
crunchy frog
crunchy frog

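A common use for apply is passing constructor arguments from a subclass on to its base class, especially when the constructor takes many arguments. See Example 1-3.

1.2.1.3. Example 1-3. Using the apply function to call base class constructors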

File: builtin-apply-example-3.py

class Rectangle:
    def __init__(self, color="white", width=10, height=10):
        print "create a", color, self, "sized", width, "x", height

class RoundedRectangle(Rectangle):
    def __init__(self, **kw):
        apply(Rectangle.__init__, (self,), kw)

rect = Rectangle(color="green", height=100, width=100)
rect = RoundedRectangle(color="blue", height=20)

create a green <Rectangle instance at 8c8260> sized 100 x 100
create a blue <RoundedRectangle instance at 8c84c0> sized 10 x 20

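Python 2.0 provides another way to do the same thing. Just use an ordinary function call, with * marking the tuple and ** marking the dictionary.

The following two statements are equivalent: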

result = function(*args, **kwargs)
result = apply(function, args, kwargs)
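For comparison, here is a minimal sketch in present-day Python 3, where apply no longer exists and the star syntax is the only spelling. The function and class names mirror Examples 1-1 to 1-3; super() is the Python 3 way to reach the base class constructor, swapped in for the explicit Rectangle.__init__ call:

```python
# Python 3 sketch: apply() was removed, so unpack with * and ** instead.

def function(a, b):
    return (a, b)

args = ("crunchy",)
kwargs = {"b": "frog"}

# equivalent to the old apply(function, args, kwargs)
result = function(*args, **kwargs)
print(result)  # ('crunchy', 'frog')


class Rectangle:
    def __init__(self, color="white", width=10, height=10):
        self.color, self.width, self.height = color, width, height


class RoundedRectangle(Rectangle):
    def __init__(self, **kw):
        # forward all keyword arguments to the base class constructor
        super().__init__(**kw)


rect = RoundedRectangle(color="blue", height=20)
print(rect.color, rect.width, rect.height)  # blue 10 20
```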

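1.2.2. Loading and Reloading Modules

If you've written a Python program of any size, you know that the import statement is used to import external modules (you can also use the from-import form). What you might not know is that import actually does its work by calling a built-in function named __import__.

The trick is that you can call this function yourself. This is handy when you only have the module name as a string. Example 1-4 shows this usage, dynamically importing all modules whose names end with "-plugin".

1.2.2.1. Example 1-4. Using the __import__ function to load modules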

File: builtin-import-example-1.py

import glob, os

modules = []

for module_file in glob.glob("*-plugin.py"):
    try:
        module_name, ext = os.path.splitext(os.path.basename(module_file))
        module = __import__(module_name)
        modules.append(module)
    except ImportError:
        pass # ignore broken modules

# say hello to all modules
for module in modules:
    module.hello()

example-plugin says hello

Note that the plug-in module names contain hyphens ("-"). This means that you cannot import them with an ordinary import statement, since Python identifiers cannot contain "-".

Example 1-5 shows the plug-in used in Example 1-4.

1.2.2.2. Example 1-5. A plug-in

File: example-plugin.py

def hello():
    print "example-plugin says hello"

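Example 1-6 shows how to get a function object when you have the module name and the function name as strings.

1.2.2.3. Example 1-6. Using the __import__ function to get a named function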

File: builtin-import-example-2.py

def getfunctionbyname(module_name, function_name):
    module = __import__(module_name)
    return getattr(module, function_name)

print repr(getfunctionbyname("dumbdbm", "open"))

<function open at 794fa0>
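In Python 3, importlib.import_module is the documented way to import a module whose name is only known at run time. The sketch below is written for Python 3 (it uses os.path, since dumbdbm no longer exists under that name there) and also shows one difference worth knowing: for a dotted name, __import__ returns the top-level package, while import_module returns the submodule itself.

```python
# Python 3 sketch: import a module by name and fetch a function from it.
import importlib
import os


def getfunctionbyname(module_name, function_name):
    # import_module accepts dotted names and returns the named submodule
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


join = getfunctionbyname("os.path", "join")
print(join is os.path.join)  # True

# for a dotted name, __import__ returns the top-level package instead
print(__import__("os.path") is os)                    # True
print(importlib.import_module("os.path") is os.path)  # True
```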

You can also use this function to implement lazy module loading. In Example 1-7, the string module is imported only when it is first used.

1.2.2.4. Example 1-7. Using the __import__ function to implement lazy import

File: builtin-import-example-3.py

class LazyImport:
    def __init__(self, module_name):
        self.module_name = module_name
        self.module = None
    def __getattr__(self, name):
        if self.module is None:
            self.module = __import__(self.module_name)
        return getattr(self.module, name)

string = LazyImport("string")

print string.lowercase

abcdefghijklmnopqrstuvwxyz

Python also provides basic support for reloading modules that are already loaded. Example 1-8 loads the hello.py file three times.

1.2.2.5. Example 1-8. Using the reload function

File: builtin-reload-example-1.py

import hello
reload(hello)
reload(hello)

hello again, and welcome to the show
hello again, and welcome to the show
hello again, and welcome to the show

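reload takes the module object itself as its argument, not a name string. It uses the name stored in the module object rather than the name of the variable you pass in, so it can still find the original module even if you've bound it to a different variable.

Note that when you reload a module, it is recompiled, and the new module replaces the old one in the module dictionary. However, instances that were already created from classes in the old module still use the old class (they are not updated).

Likewise, references to module contents created directly with from-import are not updated either.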
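The caveats above can be checked directly. The following sketch is written for Python 3, where reload lives in importlib; the module name reload_demo is hypothetical, and the module is written to a temporary directory so the example is self-contained. Bytecode caching is switched off so that the rewritten source is always recompiled.

```python
# Python 3 sketch: what importlib.reload updates, and what it leaves alone.
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # no .pyc files, so the new source is always compiled

tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
path = os.path.join(tmpdir, "reload_demo.py")  # hypothetical module


def write_module(version):
    with open(path, "w") as f:
        f.write("VERSION = %d\n" % version)
        f.write("class Thing:\n    version = %d\n" % version)


write_module(1)
importlib.invalidate_caches()      # make sure the finder sees the new file
reload_demo = importlib.import_module("reload_demo")
VERSION = reload_demo.VERSION      # what "from reload_demo import VERSION" does
old_thing = reload_demo.Thing()    # instance of the version-1 class

write_module(2)
same = importlib.reload(reload_demo)

print(same is reload_demo)         # True: the module object is updated in place
print(reload_demo.VERSION)         # 2: module attributes are rebound
print(VERSION)                     # 1: the copied reference is not updated
print(old_thing.version)           # 1: the old instance keeps its old class
print(isinstance(old_thing, reload_demo.Thing))  # False: Thing is a new class
```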

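1.2.3. Looking in Namespaces

The dir function returns a list of all members of a given module, class, instance, or other type. It's probably most useful in the interactive Python interpreter, but it can come in handy elsewhere too. Example 1-9 shows the dir function in use.

1.2.3.1. Example 1-9. Using the dir function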

File: builtin-dir-example-1.py

def dump(value):
    print value, "=>", dir(value)

import sys

dump(0)
dump(1.0)
dump(0.0j) # complex number
dump([]) # list
dump({}) # dictionary
dump("string")
dump(len) # function
dump(sys) # module

0 => []
1.0 => []
0j => ['conjugate', 'imag', 'real']
[] => ['append', 'count', 'extend', 'index', 'insert',
    'pop', 'remove', 'reverse', 'sort']
{} => ['clear', 'copy', 'get', 'has_key', 'items',
    'keys', 'update', 'values']
string => []
<built-in function len> => ['__doc__', '__name__', '__self__']
<module 'sys' (built-in)> => ['__doc__', '__name__',
    '__stderr__', '__stdin__', '__stdout__', 'argv',
    'builtin_module_names', 'copyright', 'dllhandle',
    'exc_info', 'exc_type', 'exec_prefix', 'executable',
...

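The getmembers function defined in Example 1-10 returns all class-level attributes and methods defined by a given class and its base classes.

1.2.3.2. Example 1-10. Using the dir function to find all members of a class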

File: builtin-dir-example-2.py

class A:
    def a(self):
        pass
    def b(self):
        pass

class B(A):
    def c(self):
        pass
    def d(self):
        pass

def getmembers(klass, members=None):
    # get a list of all class members, ordered by class
    if members is None:
        members = []
    for k in klass.__bases__:
        getmembers(k, members)
    for m in dir(klass):
        if m not in members:
            members.append(m)
    return members

print getmembers(A)
print getmembers(B)
print getmembers(IOError)

['__doc__', '__module__', 'a', 'b']
['__doc__', '__module__', 'a', 'b', 'c', 'd']
['__doc__', '__getitem__', '__init__', '__module__', '__str__']

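The getmembers function returns an ordered list. The earlier a name appears in the list, the higher up in the class hierarchy it is defined. If order doesn't matter, you can use a dictionary instead of a list.

[!Feather's note: dictionaries are unordered, while lists and tuples are ordered; ordered dictionaries have been discussed online.]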
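As the note says, dictionaries in Python 1.5.2/2.0 are unordered. Since Python 3.7, however, dict preserves insertion order, so a dictionary can de-duplicate names without losing the ordering. A minimal Python 3 sketch of that variant; it walks the class's __mro__ from the root down and uses vars() instead of dir(), so that each name is recorded at the class that defines it:

```python
# Python 3 sketch: ordered, de-duplicated class members using a dict.
class A:
    def a(self):
        pass

    def b(self):
        pass


class B(A):
    def c(self):
        pass

    def d(self):
        pass


def getmembers(klass):
    members = {}
    # __mro__ lists klass first and object last; walk it in reverse so
    # names from the top of the hierarchy are inserted first
    for k in reversed(klass.__mro__):
        for name in vars(k):  # only names defined directly on k
            members.setdefault(name, None)
    return list(members)


public = [name for name in getmembers(B) if not name.startswith("_")]
print(public)  # ['a', 'b', 'c', 'd']
```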

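The vars function is similar, but it returns a dictionary containing the current value of each member. If you call vars without an argument, it returns a dictionary of what's visible in the current local namespace (the same as the locals() function), as shown in Example 1-11.

1.2.3.3. Example 1-11. Using the vars function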

File: builtin-vars-example-1.py

book = "library2"
pages = 250
scripts = 350

print "the %(book)s book contains more than %(scripts)s scripts" % vars()

the library2 book contains more than 350 scripts

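1.2.4. Checking an Object's Type

Python is a dynamically typed language, which means that a given variable name may be bound to values of different types on different occasions. In the following example, the same function is called with an integer, a floating-point number, and a string: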

def function(value):
    print value

function(1)
function(1.0)
function("one")

The type function (shown in Example 1-12) lets you check what type a variable has. It returns a type descriptor, which is a distinct object for each type provided by the Python interpreter.

1.2.4.1. Example 1-12. Using the type function

File: builtin-type-example-1.py

def dump(value):
    print type(value), value

dump(1)
dump(1.0)
dump("one")

<type 'int'> 1
<type 'float'> 1.0
<type 'string'> one

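Each type has a single corresponding type object, which means that you can use the is operator (object identity) to test types, as shown in Example 1-13.

1.2.4.2. Example 1-13. Using the type function on file names and file objects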

File: builtin-type-example-2.py

def load(file):
    # a string is a file name; anything else is assumed to be a file object
    if type(file) is type(""):
        file = open(file, "rb")
    return file.read()

print len(load("samples/sample.jpg")), "bytes"
print len(load(open("samples/sample.jpg", "rb"))), "bytes"

4672 bytes
4672 bytes

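The callable function, shown in Example 1-14, checks whether an object can be called (either directly or via apply). It returns true for functions, methods, lambda expressions, classes, and instances of classes that implement the __call__ method.

1.2.4.3. Example 1-14. Using the callable function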

File: builtin-callable-example-1.py

def dump(function):
    if callable(function):
        print function, "is callable"
    else:
        print function, "is *not* callable"

class A:
    def method(self, value):
        return value

class B(A):
    def __call__(self, value):
        return value

a = A()
b = B()

dump(0) # simple objects
dump("string")
dump(callable)
dump(dump) # function

dump(A) # classes
dump(B)
dump(B.method)

dump(a) # instances
dump(b)
dump(b.method)

0 is *not* callable
string is *not* callable
<built-in function callable> is callable
<function dump at 8ca320> is callable
A is callable
B is callable
<unbound method A.method> is callable
<A instance at 8caa10> is *not* callable
<B instance at 8cab00> is callable
<method A.method of B instance at 8cab00> is callable

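Note that the class objects (A and B) are both callable; calling them creates new objects (class instances). However, instances of class A are not callable, since that class doesn't implement the __call__ method.

The operator module contains functions that check whether an object belongs to one of the built-in number, sequence, or mapping (dictionary) types. However, since it's easy to create a class that implements, for example, the basic sequence methods, it's usually a bad idea to use explicit type tests on such objects.

Things get more complicated when it comes to classes and instances. Python doesn't treat classes as types in their own right; instead, all classes belong to a special class type, and all class instances belong to a special instance type.

This means that you cannot use type to test whether an instance belongs to a given class; all instances have the same type! To solve this, you can use the isinstance function, which checks whether an object is an instance of a given class (or of a subclass of it). Example 1-15 shows the isinstance function in use.

1.2.4.4. Example 1-15. Using the isinstance function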

File: builtin-isinstance-example-1.py

class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if isinstance(object, A):
        print "A",
    if isinstance(object, B):
        print "B",
    if isinstance(object, C):
        print "C",
    if isinstance(object, D):
        print "D",
    print

a = A()
b = B()
c = C()
d = D()

dump(a)
dump(b)
dump(c)
dump(d)
dump(0)
dump("string")

<A instance at 8ca6d0> => A
<B instance at 8ca750> => B
<C instance at 8ca780> => A C
<D instance at 8ca7b0> => A B D
0 =>
string =>

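The issubclass function is similar, but it checks whether a class object is the same as a given class, or is a subclass of it, as shown in Example 1-16.

Note that while isinstance accepts any kind of object, issubclass raises a TypeError exception when given something that is not a class object.

1.2.4.5. Example 1-16. Using the issubclass function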

File: builtin-issubclass-example-1.py

class A:
    pass

class B:
    pass

class C(A):
    pass

class D(A, B):
    pass

def dump(object):
    print object, "=>",
    if issubclass(object, A):
        print "A",
    if issubclass(object, B):
        print "B",
    if issubclass(object, C):
        print "C",
    if issubclass(object, D):
        print "D",
    print

dump(A)
dump(B)
dump(C)
dump(D)
dump(0)
dump("string")

A => A
B => B
C => A C
D => A B D
0 =>
Traceback (innermost last):
  File "builtin-issubclass-example-1.py", line 29, in ?
  File "builtin-issubclass-example-1.py", line 15, in dump
TypeError: arguments must be classes
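In later versions (and in Python 3), both functions also accept a tuple of classes and return true if any of them matches, while issubclass still rejects a non-class first argument. A short Python 3 sketch reusing the class layout from the examples above; the exact TypeError message varies between versions, so the code doesn't depend on it:

```python
# Python 3 sketch: tuple arguments, and issubclass's TypeError.
class A:
    pass


class B:
    pass


class D(A, B):
    pass


d = D()
print(isinstance(d, (A, B)))   # True: matches if any class in the tuple matches
print(isinstance(0, (A, B)))   # False: isinstance accepts any object
print(issubclass(D, (A, B)))   # True

try:
    issubclass(0, A)           # 0 is not a class
except TypeError as exc:
    print("TypeError:", exc)
```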

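1.2.5. Evaluating Python Expressions

Python provides several ways to interact with the interpreter from within a program. For example, the eval function evaluates a string as a Python expression. You can pass it a literal, a simple expression, or an expression that uses built-in functions, as shown in Example 1-17.

1.2.5.1. Example 1-17. Using the eval function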

File: builtin-eval-example-1.py

def dump(expression):
    result = eval(expression)
    print expression, "=>", result, type(result)

dump("1")
dump("1.0")
dump("'string'")
dump("1.0 + 2.0")
dump("'*' * 10")
dump("len('world')")

1 => 1 <type 'int'>
1.0 => 1.0 <type 'float'>
'string' => string <type 'string'>
1.0 + 2.0 => 3.0 <type 'float'>
'*' * 10 => ********** <type 'string'>
len('world') => 5 <type 'int'>

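If you aren't sure where the string comes from, using eval can get you into trouble. For example, someone could use the __import__ function to load the os module and then remove files from your disk (as shown in Example 1-18).

1.2.5.2. Example 1-18. Using the eval function to execute arbitrary commands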

File: builtin-eval-example-2.py

print eval("__import__('os').getcwd()")
print eval("__import__('os').remove('file')")

/home/fredrik/librarybook
Traceback (innermost last):
 File "builtin-eval-example-2", line 2, in ?
 File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')

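Note that we got an os.error exception here, which means that Python actually tried to remove the file!

Luckily, this problem is easy to work around. You can pass a second argument to eval: a dictionary that defines the namespace in which the expression is evaluated. Let's try passing in an empty dictionary: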

>>> print eval("__import__('os').remove('file')", {})
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')

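Hmm.... we still end up with an os.error exception.

This is because Python checks the dictionary before evaluating the expression, and if it doesn't find a variable named __builtins__ in it (note the plural form), it adds one: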

>>> namespace = {}
>>> print eval("__import__('os').remove('file')", namespace)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0, in ?
os.error: (2, 'No such file or directory')
>>> namespace.keys()
['__builtins__']

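If you print the contents of the namespace variable, you'll find that it contains all of the built-in functions.

[!Feather's note: if I'm not mistaken, the __builtins__ entry added here is the current __builtins__.]

Notice that Python doesn't add the default when this variable already exists. That gives us the solution: add a __builtins__ item to the dictionary you pass in, as shown in Example 1-19.

1.2.5.3. Example 1-19. Safely using the eval function to evaluate expressions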

File: builtin-eval-example-3.py

print eval("__import__('os').getcwd()", {})
print eval("__import__('os').remove('file')", {"__builtins__": {}})

/home/fredrik/librarybook
Traceback (innermost last):
  File "builtin-eval-example-3.py", line 2, in ?
  File "<string>", line 0, in ?
NameError: __import__

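Even so, this doesn't protect you from attacks on CPU and memory resources (for example, evaluating something like eval("'*'*1000000*2*2*2*2*2*2*2*2*2") will probably exhaust your system's resources).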
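When the untrusted string is only supposed to contain literal data (numbers, strings, tuples, lists, dictionaries), a safer alternative that the text doesn't cover is ast.literal_eval, available since Python 2.6. The Python 3 sketch below contrasts it with the empty-__builtins__ trick from Example 1-19, which still raises NameError but should not be treated as a sandbox; neither approach limits CPU or memory use.

```python
# Python 3 sketch: literal_eval only accepts literal syntax.
import ast

data = ast.literal_eval("[1, 2.0, 'three', {'four': (5,)}]")
print(data)  # [1, 2.0, 'three', {'four': (5,)}]

try:
    ast.literal_eval("__import__('os').getcwd()")  # a function call, not a literal
except ValueError as exc:
    print("rejected by literal_eval:", exc)

# the empty-__builtins__ trick still raises NameError in Python 3,
# but it is not a real security boundary
try:
    eval("__import__('os').getcwd()", {"__builtins__": {}})
except NameError as exc:
    print("NameError:", exc)
```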

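1.2.6. Compiling and Executing Code

The eval function only works for simple expressions. To handle larger blocks of code, use the compile function and the exec statement (as shown in Example 1-20).

1.2.6.1. Example 1-20. Using the compile function to check syntax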

File: builtin-compile-example-1.py

NAME = "script.py"

BODY = """prnt 'owl-stretching time'"""

try:
    compile(BODY, NAME, "exec")
except SyntaxError, v:
    print "syntax error:", v, "in", NAME

# syntax error: invalid syntax in script.py

When successful, the compile function returns a code object, which you can execute with the exec statement, as shown in Example 1-21.

1.2.6.2. Example 1-21. Executing compiled code

File: builtin-compile-example-2.py

BODY = """print 'the ant, an introduction'"""

code = compile(BODY, "<script>", "exec")

print code

exec code

<code object ? at 8c6be0, file "<script>", line 0>
the ant, an introduction

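The class in Example 1-22 can be used to generate code on the fly. Use the write method to add statements, and the indent and dedent methods to control the indentation structure. The class takes care of the rest.

1.2.6.3. Example 1-22. A simple code generator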

File: builtin-compile-example-3.py

import sys, string

class CodeGeneratorBackend:
    "Simple code generator for Python"

    def begin(self, tab="\t"):
        self.code = []
        self.tab = tab
        self.level = 0

    def end(self):
        self.code.append("") # make sure there's a newline at the end
        return compile(string.join(self.code, "\n"), "<code>", "exec")

    def write(self, string):
        self.code.append(self.tab * self.level + string)

    def indent(self):
        self.level = self.level + 1
        # in 2.0 and later, this can be written as: self.level += 1

    def dedent(self):
        if self.level == 0:
            raise SyntaxError, "internal error in code generator"
        self.level = self.level - 1
        # or: self.level -= 1

## try it out!

c = CodeGeneratorBackend()
c.begin()
c.write("for i in range(5):")
c.indent()
c.write("print 'code generation made easy!'")
c.dedent()
exec c.end()

code generation made easy!
code generation made easy!
code generation made easy!
code generation made easy!
code generation made easy!
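For readers on Python 3, here is a sketch of the same generator adapted to that version: exec is a function there, str.join replaces string.join, and indentation defaults to four spaces. Executing the compiled code in an explicit namespace dictionary (an addition, not part of Example 1-22) makes its results easy to inspect:

```python
# Python 3 sketch of the code generator from Example 1-22.
class CodeGeneratorBackend:
    "Simple code generator for Python"

    def begin(self, tab="    "):
        self.code = []
        self.tab = tab
        self.level = 0

    def end(self):
        self.code.append("")  # make sure there's a newline at the end
        return compile("\n".join(self.code), "<code>", "exec")

    def write(self, line):
        self.code.append(self.tab * self.level + line)

    def indent(self):
        self.level += 1

    def dedent(self):
        if self.level == 0:
            raise SyntaxError("internal error in code generator")
        self.level -= 1


c = CodeGeneratorBackend()
c.begin()
c.write("squares = []")
c.write("for i in range(5):")
c.indent()
c.write("squares.append(i * i)")
c.dedent()

namespace = {}
exec(c.end(), namespace)      # run the generated code in its own namespace
print(namespace["squares"])   # [0, 1, 4, 9, 16]
```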

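Python also provides an execfile function, a shortcut for loading code from a file, compiling it, and executing it. Example 1-23 shows how to use this function, and how to emulate it with compile and exec.

1.2.6.4. Example 1-23. Using the execfile function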

File: builtin-execfile-example-1.py

execfile("hello.py")

def EXECFILE(filename, globals=None, locals=None):
    # same argument order as execfile: globals first, then locals
    exec compile(open(filename).read(), filename, "exec") in globals, locals

EXECFILE("hello.py")

hello again, and welcome to the show
hello again, and welcome to the show

The code in Example 1-24 is the hello.py file used in Example 1-23.

1.2.6.5. Example 1-24. The hello.py script

File: hello.py

print "hello again, and welcome to the show"
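Python 3 removed execfile. A minimal sketch of the usual replacement reads, compiles, and executes the file in a namespace dictionary; run_file and the temporary script are hypothetical names used only for illustration. Passing the real file name to compile keeps it in tracebacks.

```python
# Python 3 sketch: an execfile() replacement.
import os
import tempfile


def run_file(filename, namespace=None):
    if namespace is None:
        namespace = {"__name__": "__main__"}
    with open(filename, "rb") as f:
        code = compile(f.read(), filename, "exec")
    exec(code, namespace)
    return namespace


with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write('greeting = "hello again, and welcome to the show"\n')
    f.write("print(greeting)\n")
    script = f.name

namespace = run_file(script)   # prints the greeting
os.remove(script)
print(namespace["greeting"])   # names defined by the script are kept
```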

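1.2.7. Overloading Functions from the __builtin__ Module

Since Python checks the local and module namespaces before it looks among the built-in functions, you may sometimes need to refer to the __builtin__ module explicitly. For instance, the script in Example 1-25 overloads the built-in open function; to get at the original open function, the script has to name the module explicitly.

1.2.7.1. Example 1-25. Explicitly accessing functions in the __builtin__ module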

File: builtin-open-example-1.py

def open(filename, mode="rb"):
    import __builtin__
    file = __builtin__.open(filename, mode)
    if file.read(5) not in("GIF87", "GIF89"):
        raise IOError, "not a GIF file"
    file.seek(0)
    return file

fp = open("samples/sample.gif")
print len(fp.read()), "bytes"

fp = open("samples/sample.jpg")
print len(fp.read()), "bytes"

3565 bytes
Traceback (innermost last):
  File "builtin-open-example-1.py", line 12, in ?
  File "builtin-open-example-1.py", line 5, in open
IOError: not a GIF file

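[!Feather's note: Do you see what this open() function does? It checks whether a file is a GIF file; image formats like this usually start with a fixed signature. Also, file() is recommended over open() for opening files, although for now there is no difference between them.]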
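In Python 3 the __builtin__ module is named builtins. A sketch of Example 1-25 for that version, assuming the standard six-byte GIF signatures GIF87a and GIF89a (the original checks only the first five bytes). The sample files are created in a temporary directory and are not real images:

```python
# Python 3 sketch: shadow open() in this module, but keep using the real one.
import builtins
import os
import tempfile


def open(filename, mode="rb"):
    # this definition hides the built-in open in this module only
    file = builtins.open(filename, mode)
    if file.read(6) not in (b"GIF87a", b"GIF89a"):
        file.close()
        raise OSError("not a GIF file")  # IOError is an alias of OSError in Python 3
    file.seek(0)
    return file


tmpdir = tempfile.mkdtemp()
gif_path = os.path.join(tmpdir, "sample.gif")
txt_path = os.path.join(tmpdir, "sample.txt")
with builtins.open(gif_path, "wb") as f:
    f.write(b"GIF89a" + b"\0" * 10)  # just a signature, not a valid image
with builtins.open(txt_path, "wb") as f:
    f.write(b"hello, world")

with open(gif_path) as fp:
    print(len(fp.read()), "bytes")   # 16 bytes

try:
    open(txt_path)
except OSError as exc:
    print("OSError:", exc)           # OSError: not a GIF file
```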

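1.3. The exceptions module

The exceptions module provides the standard exception hierarchy. It's automatically imported when Python starts, and the exceptions are added to the __builtin__ module. In other words, you usually don't need to import this module yourself.

This is an ordinary Python module in 1.5.2, and a built-in module in 2.0 and later.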

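The following standard exceptions are defined by this module:

Exception is used as the base class for all exceptions. It's strongly recommended (but not required) that user-defined exceptions also derive from this class.

SystemExit(Exception) is raised by the sys.exit function. If it propagates to the top level without being caught by a try-except statement, the interpreter exits without printing a traceback.
StandardError(Exception) is used as the base class for all built-in exceptions (except SystemExit).
KeyboardInterrupt(StandardError) is raised when the user presses Control-C (or another interrupt key). Note that this may cause strange problems if you use "catch all" try-except statements.
ImportError(StandardError) is raised when Python fails to import a module.
EnvironmentError is used as the base class for exceptions that can be caused by the interpreter's environment (that is, they're usually not caused by bugs in the program).
IOError(EnvironmentError) is used to flag I/O-related errors.
OSError(EnvironmentError) is used to flag errors raised by the os module.
WindowsError(OSError) is used to flag Windows-specific errors from the os module.
NameError(StandardError) is raised when Python fails to find a global or local name.
UnboundLocalError(NameError) is raised when a local variable is used before it has been assigned a value. This exception is only available in 2.0 and later; earlier versions raise a plain NameError instead.
AttributeError(StandardError) is raised when Python fails to find (or assign to) an instance attribute, a method, a module function, or any other qualified name.
SyntaxError(StandardError) is raised when the compiler encounters a syntax error.
(2.0 and later) IndentationError(SyntaxError) is raised for invalid indentation. This exception is only used in 2.0 and later; earlier versions raise a SyntaxError instead.
(2.0 and later) TabError(IndentationError) may be raised when the -tt option is used to check for inconsistent indentation. This exception is only used in 2.0 and later; earlier versions raise a SyntaxError instead.
TypeError(StandardError) is raised when an operation cannot be applied to an object of the given type.
AssertionError(StandardError) is raised when an assert statement fails (that is, when the expression is false).
LookupError(StandardError) is used as the base class for exceptions raised when a sequence or dictionary doesn't contain a given index or key.
IndexError(LookupError) is raised by sequence objects when the given index doesn't exist.
KeyError(LookupError) is raised by dictionary objects when the given key doesn't exist.
ArithmeticError(StandardError) is used as the base class for math-related exceptions.
OverflowError(ArithmeticError) is raised when an operation overflows (for example, when an integer is too large to fit in the given type).
ZeroDivisionError(ArithmeticError) is raised when you try to divide a number by zero.
FloatingPointError(ArithmeticError) is raised when a floating-point operation fails.
ValueError(StandardError) is raised when an argument has the right type but an invalid value.
(2.0 and later) UnicodeError(ValueError) is raised for problems related to the Unicode string type. It is only used in 2.0 and later.
RuntimeError(StandardError) is raised for various run-time problems, including attempts to reach outside the sandbox in restricted mode, unexpected hardware problems, and so on.
NotImplementedError(RuntimeError) can be used to flag functions that haven't been implemented yet, or methods that should be overridden.
SystemError(StandardError) is raised for internal interpreter errors. The exception value contains more details (usually something cryptic, like "eval_code2: NULL globals"). The author of this book has never seen this exception in five years of Python programming. (Presumably he never tried raise SystemError.)
MemoryError(StandardError) is raised when the interpreter runs out of memory. Note that this only happens when the underlying memory allocation routines complain; on an older machine, the system will often sink into heavy swapping long before that happens.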
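The hierarchy can also be inspected from code through each class's __mro__. Note that Python 3 differs from the 1.5.2/2.0 list above: StandardError is gone, KeyboardInterrupt and SystemExit derive from BaseException rather than Exception (so a catch-all except Exception no longer swallows Control-C), and since Python 3.3 EnvironmentError and IOError are aliases of OSError. A small Python 3 sketch; ancestry is a hypothetical helper name:

```python
# Python 3 sketch: where does a built-in exception sit in the hierarchy?
def ancestry(exc_class):
    return [k.__name__ for k in exc_class.__mro__]


print(ancestry(ZeroDivisionError))
# ['ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']
print(ancestry(KeyError))
# ['KeyError', 'LookupError', 'Exception', 'BaseException', 'object']
print(IOError is OSError, EnvironmentError is OSError)  # True True
print(issubclass(KeyboardInterrupt, Exception))         # False
```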

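You can create your own exception classes. Just inherit from the built-in Exception class (or one of its appropriate subclasses), and override the __str__ method if needed. Example 1-26 shows how to use the exceptions module.

1.3.0.1. Example 1-26. Using the exceptions module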

File: exceptions-example-1.py

# python imports this module by itself, so the following
# line isn't really needed
# import exceptions

class HTTPError(Exception):
    # indicates an HTTP protocol error
    def __init__(self, url, errcode, errmsg):
        self.url = url
        self.errcode = errcode
        self.errmsg = errmsg
    def __str__(self):
        return (
            "<HTTPError for %s: %s %s>" %
            (self.url, self.errcode, self.errmsg)
            )

try:
    raise HTTPError("http://www.python.org/foo", 200, "Not Found")
except HTTPError, error:
    print "url", "=>", error.url
    print "errcode", "=>", error.errcode
    print "errmsg", "=>", error.errmsg
    raise # reraise exception

url => http://www.python.org/foo
errcode => 200
errmsg => Not Found
Traceback (innermost last):
  File "exceptions-example-1", line 16, in ?
HTTPError: <HTTPError for http://www.python.org/foo: 200 Not Found>

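1.4. The os module

Most of the functions in this module are implemented by platform-specific modules, such as posix and nt. The os module automatically loads the right implementation module when it is first imported.

1.4.1. Working with Files

The built-in open/file function is used to create, open, and edit files, as shown in Example 1-27. The os module provides additional functions for renaming and removing files.

1.4.1.1. Example 1-27. Using the os module to rename and remove files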

File: os-example-3.py

import os
import string

def replace(file, search_for, replace_with):
    # replace strings in a text file

    back = os.path.splitext(file)[0] + ".bak"
    temp = os.path.splitext(file)[0] + ".tmp"

    try:
        # remove old temp file, if any
        os.remove(temp)
    except os.error:
        pass

    fi = open(file)
    fo = open(temp, "w")

    for s in fi.readlines():
        fo.write(string.replace(s, search_for, replace_with))

    fi.close()
    fo.close()

    try:
        # remove old backup file, if any
        os.remove(back)
    except os.error:
        pass

    # rename original to backup...
    os.rename(file, back)

    # ...and temporary to original
    os.rename(temp, file)

## try it out!

file = "samples/sample.txt"

replace(file, "hello", "tjena")
replace(file, "tjena", "hello")
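A sketch of the same function for Python 3, assuming a text file. It uses os.replace (Python 3.3+), which overwrites an existing destination; os.rename on Windows refuses to do that, which is why Example 1-27 removes the old backup first. On POSIX systems os.replace is also atomic. The temporary sample file is only for illustration:

```python
# Python 3 sketch: search-and-replace with a backup, using os.replace.
import os
import tempfile


def replace(file, search_for, replace_with):
    base = os.path.splitext(file)[0]
    back = base + ".bak"
    temp = base + ".tmp"

    # write the edited text to a temporary file first
    with open(file) as fi, open(temp, "w") as fo:
        for line in fi:
            fo.write(line.replace(search_for, replace_with))

    os.replace(file, back)  # original -> backup (overwrites any old backup)
    os.replace(temp, file)  # temporary -> original


tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, "sample.txt")
with open(sample, "w") as f:
    f.write("hello world\n")

replace(sample, "hello", "tjena")
with open(sample) as f:
    print(f.read())  # tjena world
```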

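1.4.2. Working with Directories

The os module also contains a number of functions for working with directories.

The listdir function returns a list of all the file names in a given directory, including subdirectory names, as shown in Example 1-28. The current and parent directory markers used on Unix and Windows (. and ..) are not included in this list.

1.4.2.1. Example 1-28. Using the os module to list the files in a directory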

File: os-example-5.py

import os

for file in os.listdir("samples"):
    print file

sample.au
sample.jpg
sample.wav
...

The getcwd and chdir functions are used to get and change the current working directory, as shown in Example 1-29.

1.4.2.2. Example 1-29. Using the os module to change the working directory

File: os-example-4.py

import os

# where are we?
cwd = os.getcwd()
print "1", cwd

# go down
os.chdir("samples")
print "2", os.getcwd()

# go back up
os.chdir(os.pardir)
print "3", os.getcwd()

1 /ematter/librarybook
2 /ematter/librarybook/samples
3 /ematter/librarybook
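Because the working directory is process-wide state, it's easy to forget to change back after an error. A small Python 3 sketch of a context manager that always restores it; working_directory is a hypothetical helper name (Python 3.11 added a similar contextlib.chdir):

```python
# Python 3 sketch: temporarily change the working directory.
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(old)  # runs even if the with-block raises


start = os.getcwd()
target = os.path.realpath(tempfile.mkdtemp())

with working_directory(target):
    inside = os.getcwd()
    print("inside:", inside)

print("restored:", os.getcwd() == start)  # restored: True
```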

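The makedirs and removedirs functions are used to create and remove directory hierarchies, as shown in Example 1-30.

1.4.2.3. Example 1-30. Using the os module to create and remove several directory levels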

File: os-example-6.py

import os

os.makedirs("test/multiple/levels")

fp = open("test/multiple/levels/file", "w")
fp.write("inspector praline")
fp.close()

# remove the file
os.remove("test/multiple/levels/file")

# and all empty directories above it
os.removedirs("test/multiple/levels")

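Note that removedirs removes all empty directories along the given path, starting with the last directory in the path. The mkdir and rmdir functions, in contrast, can only handle a single directory level, as shown in Example 1-31.

1.4.2.4. Example 1-31. Using the os module to create and remove directories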

File: os-example-7.py

import os

os.mkdir("test")
os.rmdir("test")

os.rmdir("samples") # this will fail

Traceback (innermost last):
  File "os-example-7", line 6, in ?
OSError: [Errno 41] Directory not empty: 'samples'

If you need to remove a non-empty directory, you can use the rmtree function in the shutil module.
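A Python 3 sketch combining both points: os.makedirs with exist_ok=True (Python 3.2+) creates the whole tree without failing if it already exists, os.rmdir refuses a non-empty directory, and shutil.rmtree removes the tree including its files. All paths live under a temporary directory:

```python
# Python 3 sketch: create a directory tree, then remove it with rmtree.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
leaf = os.path.join(root, "test", "multiple", "levels")
os.makedirs(leaf, exist_ok=True)  # no error if it already exists
with open(os.path.join(leaf, "file"), "w") as fp:
    fp.write("inspector praline")

try:
    os.rmdir(os.path.join(root, "test"))  # fails: directory not empty
except OSError as exc:
    print("rmdir failed:", exc.strerror)

shutil.rmtree(os.path.join(root, "test"))  # removes files and subdirectories
print(os.listdir(root))  # []
```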

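1.4.3. Working with File Attributes

The stat function fetches information about an existing file, as shown in Example 1-32. It returns a tuple-like object (a stat_result object with 10 members): st_mode (permission mode), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (user ID of the owner), st_gid (group ID of the owner), st_size (file size, in bytes), st_atime (time of most recent access), st_mtime (time of most recent modification), and st_ctime (platform dependent: the time of the most recent metadata change on Unix, or the creation time on Windows). These items can also be accessed as attributes.

[!Feather's note: the original text describes a 9-element tuple. Also, the returned object is not actually a tuple but a struct-like object.]

1.4.3.1. Example 1-32. Using the os module to get information about a file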

File: os-example-1.py

import os
import time

file = "samples/sample.jpg"

def dump(st):
    mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
    print "- size:", size, "bytes"
    print "- owner:", uid, gid
    print "- created:", time.ctime(ctime)
    print "- last accessed:", time.ctime(atime)
    print "- last modified:", time.ctime(mtime)
    print "- mode:", oct(mode)
    print "- inode/dev:", ino, dev

## get stats for a filename

st = os.stat(file)

print "stat", file
dump(st)
print

## get stats for an open file

fp = open(file)

st = os.fstat(fp.fileno())

print "fstat", file
dump(st)

stat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 2

fstat samples/sample.jpg
- size: 4762 bytes
- owner: 0 0
- created: Tue Sep 07 22:45:58 1999
- last accessed: Sun Sep 19 00:00:00 1999
- last modified: Sun May 19 01:42:16 1996
- mode: 0100666
- inode/dev: 0 0

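Some of the returned members are meaningless on non-Unix platforms. For example, (st_ino, st_dev) uniquely identifies a file on Unix, but may contain arbitrary data on other platforms.

The stat module contains a number of constants and functions for working with the returned object. Some of them are used in the following code.

You can modify a file's permission mode and time fields with the chmod and utime functions, as shown in Example 1-33.

1.4.3.2. Example 1-33. Using the os module to change a file's permissions and timestamps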

File: os-example-2.py

import os
import stat, time

infile = "samples/sample.jpg"
outfile = "out.jpg"

# copy contents
fi = open(infile, "rb")
fo = open(outfile, "wb")

while 1:
    s = fi.read(10000)
    if not s:
        break
    fo.write(s)

fi.close()
fo.close()

# copy mode and timestamp
st = os.stat(infile)
os.chmod(outfile, stat.S_IMODE(st[stat.ST_MODE]))
os.utime(outfile, (st[stat.ST_ATIME], st[stat.ST_MTIME]))

print "original", "=>"
print "mode", oct(stat.S_IMODE(st[stat.ST_MODE]))
print "atime", time.ctime(st[stat.ST_ATIME])
print "mtime", time.ctime(st[stat.ST_MTIME])

print "copy", "=>"
st = os.stat(outfile)
print "mode", oct(stat.S_IMODE(st[stat.ST_MODE]))
print "atime", time.ctime(st[stat.ST_ATIME])
print "mtime", time.ctime(st[stat.ST_MTIME])

original =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
copy =>
mode 0666
atime Thu Oct 14 15:15:50 1999
mtime Mon Nov 13 15:42:36 1995
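In later versions the object returned by os.stat also exposes its fields as named attributes (st_mode, st_size, and so on), while index access with the stat.ST_* constants keeps working, and the stat module provides helpers for interpreting st_mode. A Python 3 sketch; stat.filemode requires Python 3.3+, and the exact permission string depends on your umask:

```python
# Python 3 sketch: reading st_mode with the stat module's helpers.
import os
import stat
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.txt")
with open(path, "w") as f:
    f.write("spam")

st = os.stat(path)
print(st.st_size)                              # 4
print(stat.S_ISREG(st.st_mode))                # True: a regular file
print(stat.S_ISDIR(os.stat(tmpdir).st_mode))   # True: a directory
print(stat.filemode(st.st_mode))               # e.g. -rw-r--r--
print(oct(stat.S_IMODE(st.st_mode)))           # permission bits only
print(st.st_mode == st[stat.ST_MODE])          # True: index access still works
```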

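1.4.4. Working with Processes

The system function runs a new command under the current process and waits for it to finish, as shown in Example 1-34.

1.4.4.1. Example 1-34. Using the os module to run an operating system command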

File: os-example-8.py

import os

if os.name == "nt":
    command = "dir"
else:
    command = "ls -l"

os.system(command)

-rwxrw-r--   1 effbot  effbot        76 Oct  9 14:17 README
-rwxrw-r--   1 effbot  effbot      1727 Oct  7 19:00 SimpleAsyncHTTP.py
-rwxrw-r--   1 effbot  effbot       314 Oct  7 20:29 aifc-example-1.py
-rwxrw-r--   1 effbot  effbot       259 Oct  7 20:38 anydbm-example-1.py
...

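The command is run via the operating system's standard shell, and the function returns the shell's exit status. Note that under Windows 95/98, the shell is usually command.com, whose exit status is always 0.

Since os.system passes the command on to the shell as is, it can be dangerous if you don't check the arguments carefully (consider running os.system("viewer %s" % file) with the file variable set to "sample.jpg; rm -rf $HOME"...). If you're not sure the arguments are safe, it's better to use exec or spawn instead (described below).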
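The safer route in present-day Python is the subprocess module (added in Python 2.4, after this book was written). Given an argument list and no shell, it passes shell metacharacters through literally. In the Python 3 sketch below, a tiny Python child process stands in for the hypothetical viewer and simply echoes its first argument:

```python
# Python 3 sketch: run a program with an argument list instead of a shell string.
import subprocess
import sys

file = "sample.jpg; rm -rf $HOME"  # hostile input from the text above

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", file],
    capture_output=True,  # Python 3.7+
    text=True,
    check=True,
)
print(result.stdout.strip())  # sample.jpg; rm -rf $HOME
```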

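The exec function replaces the current process with a new program (in other words, it "goes to" the new process rather than calling it). In Example 1-35, the string "goodbye" is never printed.

1.4.4.2. Example 1-35. Using the os module to start a new process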

File: os-exec-example-1.py

import os
import sys

program = "python"
arguments = ["hello.py"]

# execvp replaces the current process, so it never returns
os.execvp(program, (program,) +  tuple(arguments))
print "goodbye"

hello again, and welcome to the show

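Python provides a whole family of exec functions that behave slightly differently. Example 1-35 uses execvp. It searches for the program along the standard path, passes the contents of the second argument (a tuple) to that program as individual arguments, and runs it with the current set of environment variables. See the Python Library Reference for the other seven functions in this family.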

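Under Unix, you can call other programs from the current one by combining exec with two other functions, fork and wait, as shown in Example 1-36. The fork function makes a copy of the current process, and the wait function waits for a child process to finish.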

1.4.4.3. Example 1-36. Using the os module to run another program (Unix)

File: os-exec-example-2.py

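import os
import sys

def run(program, *args):
    pid = os.fork()
    if not pid:
        # child process: replace it with the new program
        os.execvp(program, (program,) +  args)
    # parent process: os.wait() returns a (pid, status) tuple
    return os.wait()[0]

run("python", "hello.py")

print "goodbye"

hello again, and welcome to the show
goodbye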

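The fork function returns 0 in the new process (the child, whose execution begins with the return from fork). In the original process (the parent), it returns a non-zero process identifier, which is the child's PID. In other words, "not pid" is true only when we're in the child process.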
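The following sketch (Python 3, Unix only) makes the return-value convention visible. The function name fork_and_wait is hypothetical. The child sees 0 and leaves immediately with os._exit. The parent receives the child's PID and collects its exit code with os.waitpid.

```python
import os

def fork_and_wait(exit_code):
    """Fork a child that exits with exit_code; return (child_pid, code)."""
    pid = os.fork()
    if pid == 0:
        # child: fork returned 0 here; leave at once without any cleanup
        os._exit(exit_code)
    # parent: fork returned the child's PID
    waited_pid, status = os.waitpid(pid, 0)
    assert waited_pid == pid
    return pid, os.WEXITSTATUS(status)

if hasattr(os, "fork"):   # fork is not available on Windows
    child_pid, code = fork_and_wait(7)
    print("child", child_pid, "exited with", code)
```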

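The fork and wait functions are not available on Windows, but you can use the spawn functions instead, as shown in Example 1-37. However, spawn doesn't search for the executable along the path, so you have to do that yourself.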

1.4.4.4. Example 1-37. Using the os module to run another program (Windows)

File: os-spawn-example-1.py

import os
import string

def run(program, *args):
    # find executable
    for path in string.split(os.environ["PATH"], os.pathsep):
        file = os.path.join(path, program) + ".exe"
        try:
            return os.spawnv(os.P_WAIT, file, (file,) + args)
        except os.error:
            pass
    raise os.error, "cannot find executable"

run("python", "hello.py")

print "goodbye"

hello again, and welcome to the show
goodbye

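You can also use spawn to run other programs in the background. Example 1-38 adds an optional mode argument to the run function. When it is set to os.P_NOWAIT, the script doesn't wait for the other program to finish. With the default value, os.P_WAIT, spawn waits until the new process has finished.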

Other flags include os.P_OVERLAY, which makes spawn behave like exec, and os.P_DETACH, which runs the new process in the background, detached from both the console and the keyboard.

1.4.4.5. Example 1-38. Using the os module to run a program in the background (Windows)

File: os-spawn-example-2.py

import os
import string

def run(program, *args, **kw):
    mode = kw.get("mode", os.P_WAIT)
    # find executable
    for path in string.split(os.environ["PATH"], os.pathsep):
        file = os.path.join(path, program) + ".exe"
        try:
            return os.spawnv(mode, file, (file,) + args)
        except os.error:
            pass
    raise os.error, "cannot find executable"

run("python", "hello.py", mode=os.P_NOWAIT)
print "goodbye"

goodbye
hello again, and welcome to the show
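In Python 3, os.spawnv exists on Unix as well as Windows, so you can try the two mode flags directly. This sketch uses the running interpreter (sys.executable) as the "other program" instead of searching PATH. The child command, which simply exits with status 5, is made up for the demo.

```python
import os
import sys

# the child is another Python interpreter that exits with status 5
child_args = [sys.executable, "-c", "import sys; sys.exit(5)"]

# P_WAIT: spawnv blocks and returns the child's exit code
code = os.spawnv(os.P_WAIT, sys.executable, child_args)
print("P_WAIT returned", code)

# P_NOWAIT: spawnv returns at once; on Unix the result is the child's PID
pid = os.spawnv(os.P_NOWAIT, sys.executable, child_args)
if os.name == "posix":
    _, status = os.waitpid(pid, 0)
    print("P_NOWAIT child exited with", os.WEXITSTATUS(status))
```

With P_NOWAIT the script is free to do other work before it calls waitpid, which is what lets "goodbye" appear first in the output above.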

Example 1-39 shows a spawn function that works on both Unix and Windows.

1.4.4.6. Example 1-39. Using either spawn or fork/exec to run another program

File: os-spawn-example-3.py

import os
import string

if os.name in ("nt", "dos"):
    exefile = ".exe"
else:
    exefile = ""

def spawn(program, *args):
    try:
        # possible 2.0 shortcut! (spawnvp takes the mode as its first argument)
        return os.spawnvp(os.P_WAIT, program, (program,) + args)
    except AttributeError:
        pass
    try:
        spawnv = os.spawnv
    except AttributeError:
        # assume it's unix
        pid = os.fork()
        if not pid:
            os.execvp(program, (program,) + args)
        return os.wait()[0]
    else:
        # got spawnv but no spawnvp: go look for an executable
        for path in string.split(os.environ["PATH"], os.pathsep):
            file = os.path.join(path, program) + exefile
            try:
                return spawnv(os.P_WAIT, file, (file,) + args)
            except os.error:
                pass
        raise IOError, "cannot find executable"

## try it out!

spawn("python", "hello.py")

print "goodbye"

hello again, and welcome to the show
goodbye

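Example 1-39 first tries to call spawnvp. If that function doesn't exist (some versions and platforms lack it), the code looks for a function named spawnv and searches the path itself. As a last resort, it falls back on fork and exec.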

1.4.5. Working with daemon processes

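On Unix, you can use fork to turn the current process into a background process (a "daemon"). Generally, you fork off a copy of the current process and then terminate the original process, as shown in Example 1-40.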

1.4.5.1. Example 1-40. Using the os module to run a script as a daemon (Unix)

File: os-example-14.py

import os
import time

pid = os.fork()
if pid:
    os._exit(0) # kill original

print "daemon started"
time.sleep(10)
print "daemon terminated"

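Creating a real daemon takes a bit more work. First, call setpgrp to make the process a "process group leader". Otherwise, signals sent to a process group that is, by that time, unrelated to the daemon may cause problems in your daemon: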

os.setpgrp()

It's also a good idea to clear the user mode mask (umask), so that files created by the daemon actually get the mode flags specified by the program:

os.umask(0)

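Then you should redirect the stdout/stderr files instead of just closing them. If you only close them, you may get unexpected exceptions the first time some of your code tries to write something to stdout or stderr.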

class NullDevice:
    def write(self, s):
        pass

sys.stdin.close()
sys.stdout = NullDevice()
sys.stderr = NullDevice()

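Python's print and C's printf/fprintf won't crash your program if the device they are connected to has been closed. A direct sys.stdout.write() call behaves differently: it happily raises an IOError exception when the application runs as a daemon, even though the same program works fine in the foreground.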
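Here is a Python 3 sketch of the NullDevice idea, with the swap wrapped in try/finally so that the real stream is always put back. The flush method is our own addition. In Python 3, code that writes to sys.stdout often calls flush() as well, and a sink without that method could then fail. A real daemon would keep the replacement in place; the restore here only keeps the demo self-contained.

```python
import sys

class NullDevice:
    """A file-like sink that silently discards everything written to it."""
    def write(self, s):
        return len(s)      # report the text as "written"
    def flush(self):
        pass               # accept flush() calls as well

saved = sys.stdout
sys.stdout = NullDevice()
try:
    print("this line goes nowhere")
finally:
    sys.stdout = saved     # always restore the real stream

print("stdout restored")
```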

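By the way, the _exit function used in the previous example terminates the current process immediately. This differs from sys.exit, which can be intercepted: if the caller catches the SystemExit exception, the program keeps running. Example 1-41 shows the difference.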

1.4.5.2. Example 1-41. Using the os module to exit the current process

File: os-example-9.py

import os
import sys

try:
    sys.exit(1)
except SystemExit, value:
    print "caught exit(%s)" % value

try:
    os._exit(2)
except SystemExit, value:
    print "caught exit(%s)" % value

print "bye!"

caught exit(1)
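The same contrast can be shown in Python 3 syntax (except ... as ...). The os._exit call runs in a forked child so that the demo script itself survives, which makes this part Unix only. Neither the except clause nor the finally clause in the child ever runs.

```python
import os
import sys

# sys.exit raises SystemExit, which an enclosing try statement can catch
try:
    sys.exit(1)
except SystemExit as exc:
    caught_code = exc.code
    print("caught exit(%s)" % caught_code)

# os._exit ends the process at once, skipping except/finally clauses
if hasattr(os, "fork"):
    pid = os.fork()
    if pid == 0:
        try:
            os._exit(2)
        except SystemExit:
            os._exit(99)   # never reached: nothing is raised
        finally:
            os._exit(98)   # never reached either
    _, status = os.waitpid(pid, 0)
    child_code = os.WEXITSTATUS(status)
    print("child exited with", child_code)
```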

1.5. The os.path module

The os.path module contains functions that deal with long filenames (pathnames) in various ways. To use it, import the os module and then access this module as os.path.

1.5.1. Working with filenames

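The os.path module contains a number of functions that handle long filenames in a platform-independent way. In other words, you won't have to deal with forward and backward slashes, colons, and so on. Let's look at the sample code in Example 1-42.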

1.5.1.1. Example 1-42. Using the os.path module to handle filenames

File: os-path-example-1.py

import os

filename = "my/little/pony"

print "using", os.name, "..."
print "split", "=>", os.path.split(filename)
print "splitext", "=>", os.path.splitext(filename)
print "dirname", "=>", os.path.dirname(filename)
print "basename", "=>", os.path.basename(filename)
print "join", "=>", os.path.join(os.path.dirname(filename),
                                 os.path.basename(filename))

using nt ...
split => ('my/little', 'pony')
splitext => ('my/little/pony', '')
dirname => my/little
basename => pony
join => my/little\pony

Note that split only splits off a single item, the last one (without the separator).
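os.path is really posixpath on Unix and ntpath on Windows. Both modules can be imported on any platform, which makes it easy to compare their rules side by side. The following is a small Python 3 sketch.

```python
import ntpath
import posixpath

filename = "my/little/pony"

# split only splits off the last component
print(posixpath.split(filename))       # ('my/little', 'pony')
print(posixpath.splitext("pony.jpg"))  # ('pony', '.jpg')

# ntpath accepts both separators but joins with a backslash
print(ntpath.split(r"my\little\pony"))   # ('my\\little', 'pony')
print(ntpath.join("my/little", "pony"))  # my/little\pony
```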

The os.path module also contains a number of functions that let you quickly find out what a filename represents, as shown in Example 1-43.

1.5.1.2. Example 1-43. Using the os.path module to check what a filename represents

File: os-path-example-2.py

import os

FILES = (
    os.curdir,
    "/",
    "file",
    "/file",
    "samples",
    "samples/sample.jpg",
    "directory/file",
    "../directory/file",
    "/directory/file"
    )

for file in FILES:
    print file, "=>",
    if os.path.exists(file):
        print "EXISTS",
    if os.path.isabs(file):
        print "ISABS",
    if os.path.isdir(file):
        print "ISDIR",
    if os.path.isfile(file):
        print "ISFILE",
    if os.path.islink(file):
        print "ISLINK",
    if os.path.ismount(file):
        print "ISMOUNT",
    print

. => EXISTS ISDIR
/ => EXISTS ISABS ISDIR ISMOUNT
file =>
/file => ISABS
samples => EXISTS ISDIR
samples/sample.jpg => EXISTS ISFILE
directory/file =>
../directory/file =>
/directory/file => ISABS

The expanduser function handles the user name shortcut (~) the same way most Unix shells do (this doesn't work properly under Windows), as shown in Example 1-44.

1.5.1.3. Example 1-44. Using the os.path module to insert the user's home directory into a filename

File: os-path-expanduser-example-1.py

import os

print os.path.expanduser("~/.pythonrc")

# /home/effbot/.pythonrc

The expandvars function replaces environment variables in a filename with their values, as shown in Example 1-45.

1.5.1.4. Example 1-45. Using os.path to replace environment variables in a filename

File: os-path-expandvars-example-1.py

import os

os.environ["USER"] = "user"

print os.path.expandvars("/home/$USER/config")
print os.path.expandvars("$USER/folders")

/home/user/config
user/folders
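The following sketch covers both functions in Python 3. It calls posixpath directly so that the results don't depend on the host platform. The HOME and USER values are made up for the demo. posixpath.expanduser uses $HOME when it is set. expandvars leaves references to unknown variables unchanged.

```python
import os
import posixpath

# hypothetical values, set only for this demo
os.environ["HOME"] = "/home/effbot"
os.environ["USER"] = "user"

# a leading "~" is replaced by $HOME
print(posixpath.expanduser("~/.pythonrc"))        # /home/effbot/.pythonrc

# $NAME and ${NAME} are both substituted; unknown names are left alone
print(posixpath.expandvars("/home/$USER/config")) # /home/user/config
print(posixpath.expandvars("${USER}/folders"))    # user/folders
print(posixpath.expandvars("$NO_SUCH_VAR_XYZ/x")) # $NO_SUCH_VAR_XYZ/x
```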

1.5.2. Traversing a file system

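The walk function helps you find all files in a directory tree, as shown in Example 1-46. It takes a directory name, a callback function, and a data object that is passed on to the callback. The callback is called once for each directory, with the data object, the directory name, and the list of names in that directory.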

1.5.2.1. Example 1-46. Using os.path to traverse a file system

File: os-path-walk-example-1.py

import os

def callback(arg, directory, files):
    for file in files:
        print os.path.join(directory, file), repr(arg)

os.path.walk(".", callback, "secret message")

./aifc-example-1.py 'secret message'
./anydbm-example-1.py 'secret message'
./array-example-1.py 'secret message'
...
./samples 'secret message'
./samples/sample.jpg 'secret message'
./samples/sample.txt 'secret message'
./samples/sample.zip 'secret message'
./samples/articles 'secret message'
./samples/articles/article-1.txt 'secret message'
./samples/articles/article-2.txt 'secret message'
...

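The walk function has a somewhat obscure interface (maybe it's just me, but I can never remember the order of the arguments). The index function in Example 1-47 returns a list of filenames instead, so you can process the files directly with a for-in loop.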

1.5.2.2. Example 1-47. Using os.listdir to traverse a file system

File: os-path-walk-example-2.py

import os

def index(directory):
    # like os.listdir, but traverses directory trees
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files

for file in index("."):
    print file

./aifc-example-1.py
./anydbm-example-1.py
./array-example-1.py
...
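os.path.walk no longer exists in Python 3. Its replacement, os.walk, is a generator that yields a (dirpath, dirnames, filenames) tuple for each directory. The following is a sketch of the index function rebuilt on top of it. The small directory tree is hypothetical and is created only so there is something to walk.

```python
import os
import tempfile

def index(directory):
    """Return full names of all files and directories below directory."""
    files = []
    # by default os.walk lists symlinks to directories but does not enter them
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            files.append(os.path.join(dirpath, name))
    return files

# build a tiny tree to walk
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "samples", "articles"))
for rel in ("a.py", os.path.join("samples", "b.txt"),
            os.path.join("samples", "articles", "c.txt")):
    open(os.path.join(root, rel), "w").close()

for name in sorted(index(root)):
    print(os.path.relpath(name, root))
```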

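If you don't want to list all files at once (for performance or memory reasons), Example 1-48 uses a different approach. Here, the DirectoryWalker class behaves like a sequence object and returns one file at a time.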

1.5.2.3. Example 1-48. Using DirectoryWalker to traverse a file system

File: os-path-walk-example-3.py

import os

class DirectoryWalker:
    # a forward iterator that traverses a directory tree

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

for file in DirectoryWalker("."):
    print file

./aifc-example-1.py
./anydbm-example-1.py
./array-example-1.py
...

Note that the DirectoryWalker class doesn't check the index passed to the __getitem__ method. This means that it won't do the right thing if you access the sequence members out of order.
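A generator function (available since Python 2.2) gives the same one-at-a-time behavior with less bookkeeping. Because it has no index at all, it can't be misused by accessing members out of order. The following is a Python 3 sketch; walk_files is a hypothetical name.

```python
import os
import tempfile

def walk_files(directory):
    """Yield full names below directory one at a time."""
    stack = [directory]
    while stack:
        directory = stack.pop()
        for name in os.listdir(directory):
            fullname = os.path.join(directory, name)
            yield fullname
            # descend into real directories, but not into symlinks
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "x.txt"), "w").close()

walker = walk_files(root)
print(next(walker))   # produced without listing the whole tree first
```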

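Finally, if you need file sizes or timestamps, Example 1-49 shows a version of the class that returns both the filename and the tuple returned by os.stat. This version saves one or two stat calls per file (os.path.isdir and os.path.islink call stat and lstat internally), and runs quite a bit faster on some platforms.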

1.5.2.4. Example 1-49. Using DirectoryStatWalker to traverse a file system

File: os-path-walk-example-4.py

import os, stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                # note: os.stat follows symbolic links, so S_ISLNK is never
                # true here; use os.lstat where available if that matters
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

for file, st in DirectoryStatWalker("."):
    print file, st[stat.ST_SIZE]

./aifc-example-1.py 336
./anydbm-example-1.py 244
./array-example-1.py 526
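In Python 3.5 and later, os.scandir returns directory entries that carry much of this information themselves. For example, entry.is_dir() can usually be answered without an extra system call. The following sketch is a scandir-based counterpart of DirectoryStatWalker. The name walk_with_stat and the 336-byte demo file are hypothetical.

```python
import os
import tempfile

def walk_with_stat(directory):
    """Yield (fullname, stat_result) pairs for everything below directory."""
    stack = [directory]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                # lstat-style result: symbolic links are not followed
                st = entry.stat(follow_symlinks=False)
                # is_dir usually needs no extra system call
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                yield entry.path, st

root = tempfile.mkdtemp()
with open(os.path.join(root, "data.bin"), "wb") as f:
    f.write(b"x" * 336)

for name, st in walk_with_stat(root):
    print(os.path.basename(name), st.st_size)   # data.bin 336
```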

1.6. The stat module

Example 1-50 shows basic use of the stat module, which contains constants and test functions for interpreting the values returned by os.stat.

1.6.0.1. Example 1-50. Using the stat Module

File: stat-example-1.py

import stat
import os, time

st = os.stat("samples/sample.txt")

print "mode", "=>", oct(stat.S_IMODE(st[stat.ST_MODE]))

print "type", "=>",
if stat.S_ISDIR(st[stat.ST_MODE]):
    print "DIRECTORY",
if stat.S_ISREG(st[stat.ST_MODE]):
    print "REGULAR",
if stat.S_ISLNK(st[stat.ST_MODE]):
    print "LINK",
print

print "size", "=>", st[stat.ST_SIZE]

print "last accessed", "=>", time.ctime(st[stat.ST_ATIME])
print "last modified", "=>", time.ctime(st[stat.ST_MTIME])
print "inode changed", "=>", time.ctime(st[stat.ST_CTIME])

mode => 0664
type => REGULAR
size => 305
last accessed => Sun Oct 10 22:12:30 1999
last modified => Sun Oct 10 18:39:37 1999
inode changed => Sun Oct 10 15:26:38 1999
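In Python 3.3 and later, stat.filemode turns a mode value into the familiar "ls -l" string. The following sketch applies it, together with S_IMODE and the S_IS* tests, to a scratch file. The 0o664 permission value mirrors the output above and is chosen for the demo.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o664)

mode = os.stat(path).st_mode
print(oct(stat.S_IMODE(mode)))             # permission bits only
print(stat.S_ISREG(mode), stat.S_ISDIR(mode))
print(stat.filemode(mode))                 # "-rw-rw-r--" on Unix
```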

1.7. The string module

The string module contains a number of functions for processing strings, as shown in Example 1-51.

1.7.0.1. Example 1-51. Using the string module

File: string-example-1.py

import string

text = "Monty Python's Flying Circus"

print "upper", "=>", string.upper(text)
print "lower", "=>", string.lower(text)
print "split", "=>", string.split(text)
print "join", "=>", string.join(string.split(text), "+")
print "replace", "=>", string.replace(text, "Python", "Java")
print "find", "=>", string.find(text, "Python"), string.find(text, "Java")
print "count", "=>", string.count(text, "n")

upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Java's Flying Circus
find => 6 -1
count => 3

In Python 1.5.2 and earlier, the string module uses functions from the strop module to implement its functionality.

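In Python 1.6 and later, most string operations are also available as string methods, as shown in Example 1-52. Many functions in the string module are simply wrappers that call the corresponding string method.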

1.7.0.2. Example 1-52. Using string methods instead of string module functions

File: string-example-2.py

text = "Monty Python's Flying Circus"

print "upper", "=>", text.upper()
print "lower", "=>", text.lower()
print "split", "=>", text.split()
print "join", "=>", "+".join(text.split())
print "replace", "=>", text.replace("Python", "Perl")
print "find", "=>", text.find("Python"), text.find("Perl")
print "count", "=>", text.count("n")

upper => MONTY PYTHON'S FLYING CIRCUS
lower => monty python's flying circus
split => ['Monty', "Python's", 'Flying', 'Circus']
join => Monty+Python's+Flying+Circus
replace => Monty Perl's Flying Circus
find => 6 -1
count => 3

Besides the string manipulation functions, the string module also contains a number of functions that convert strings to other types, as Example 1-53 demonstrates.

1.7.0.3. Example 1-53. Using the string module to convert strings to numbers

File: string-example-3.py

import string

print int("4711"),
print string.atoi("4711"),
print string.atoi("11147", 8), # octal
print string.atoi("1267", 16), # hexadecimal
print string.atoi("3mv", 36) # whatever...

print string.atoi("4711", 0),
print string.atoi("04711", 0),
print string.atoi("0x4711", 0)

print float("4711"),
print string.atof("1"),
print string.atof("1.23e5")

4711 4711 4711 4711 4711
4711 2505 18193
4711.0 1.0 123000.0

In most cases (especially if you're using 1.6 or later), you can use the int and float functions instead of their string module counterparts.

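The atoi function takes an optional second argument, which specifies the number base. If the base is 0, the function looks at the first few characters of the string to decide which base to use. If they are "0x", the base is 16 (hexadecimal); if the string starts with "0", the base is 8 (octal). The default base is 10 (decimal), which is used when you don't pass a second argument.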

In 1.6 and later, int accepts a second argument, just like atoi. Unlike the string module versions, int and float also accept Unicode strings.
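In Python 3, the string.atoi/atof functions are gone, and int and float do all of this work. One detail differs with base 0: Python 3 spells octal literals "0o", so a bare leading zero is rejected. A short sketch:

```python
# int() with an explicit base, as in the atoi examples above
print(int("11147", 8), int("1267", 16), int("3mv", 36))    # 4711 4711 4711

# base 0 inspects the prefix; Python 3 writes octal as "0o"
print(int("0x4711", 0), int("0o4711", 0), int("4711", 0))  # 18193 2505 4711

# a plain leading zero is not accepted with base 0 in Python 3
try:
    int("04711", 0)
    leading_zero_ok = True
except ValueError:
    leading_zero_ok = False
    print("04711 needs the 0o prefix")
```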

1.8. The re module

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

<blockquote>
- Jamie Zawinski, on comp.lang.emacs
</blockquote>

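The re module provides a set of powerful regular expression tools. They let you quickly check whether a given string matches a given pattern (using the match function) or contains such a pattern (using the search function). A regular expression is a string pattern written in a compact (and quite cryptic) syntax.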

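The match function attempts to match a pattern at the beginning of a string, as shown in Example 1-54. If the pattern matches anything at all (including an empty string, if the pattern allows that), match returns a match object; otherwise it returns None. You can use the group method of the match object to find out what matched.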

1.8.0.1. Example 1-54. Using the re module to match strings

File: re-example-1.py

import re

text = "The Attila the Hun Show"

# a single character
m = re.match(".", text)
if m: print repr("."), "=>", repr(m.group(0))

# any string of characters
m = re.match(".*", text)
if m: print repr(".*"), "=>", repr(m.group(0))

# a string of word characters (at least one)
m = re.match(r"\w+", text)
if m: print repr(r"\w+"), "=>", repr(m.group(0))

# a string of digits
m = re.match(r"\d+", text)
if m: print repr(r"\d+"), "=>", repr(m.group(0))

'.' => 'T'
'.*' => 'The Attila the Hun Show'
'\\w+' => 'The'
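Two details of match are worth seeing in isolation, shown here in a small Python 3 sketch. First, a failed match returns None, which is why the example tests "if m:" before calling group. Second, an empty match still produces a match object. Raw strings (r"...") keep the backslashes intact so that they reach the re module unchanged.

```python
import re

text = "The Attila the Hun Show"

m = re.match(r"\w+", text)
print(repr(m.group(0)))            # 'The'

# match only looks at the start of the string; failure gives None
print(re.match(r"\d+", text))      # None

# zero digits is still a match, so a match object comes back
m = re.match(r"\d*", text)
print(m is not None, repr(m.group(0)))   # True ''
```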

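You can use parentheses to mark regions in the pattern. If the pattern matched, the group method can extract the contents of these regions, as shown in Example 1-55. group(1) returns the contents of the first group, group(2) the contents of the second, and so on. If you pass several group numbers to group, it returns a tuple.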

1.8.0.2. Example 1-55. Using the re module to extract matching substrings

File: re-example-2.py

import re

text = "10/15/99"

m = re.match(r"(\d{2})/(\d{2})/(\d{2,4})", text)
if m:
    print m.group(1, 2, 3)

('10', '15', '99')

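The search function looks for the pattern anywhere inside the string, as shown in Example 1-56. It tries the pattern at every possible character position, starting from the left, and returns a match object as soon as it finds a match. If there is no match anywhere, it returns None.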

1.8.0.3. Example 1-56. Using the re module to search for substrings

File: re-example-3.py

import re

text = "Example 3: There is 1 date 10/25/95 in here!"

m = re.search(r"(\d{1,2})/(\d{1,2})/(\d{2,4})", text)

print m.group(1), m.group(2), m.group(3)

month, day, year = m.group(1, 2, 3)
print month, day, year

date = m.group(0)
print date

10 25 95
10 25 95
10/25/95
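The following Python 3 sketch puts match and search side by side on the same text. match fails because no date starts at position 0, while search scans forward to the leftmost match. groups() returns all groups at once, and start/end give the position of the match.

```python
import re

text = "Example 3: There is 1 date 10/25/95 in here!"
date = r"(\d{1,2})/(\d{1,2})/(\d{2,4})"

print(re.match(date, text))             # None: no date at position 0
m = re.search(date, text)               # scans from left to right
print(m.groups())                       # ('10', '25', '95')
print(m.start(), m.end(), m.group(0))   # 27 35 10/25/95
```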

The sub function, shown in Example 1-57, replaces pattern matches with another string.

1.8.0.4. Example 1-57. Using the re module to replace substrings

File: re-example-4.py

import re

text = "you're no fun anymore..."

# literal replace (string.replace is faster)
print re.sub("fun", "entertaining", text)

# collapse all non-letter sequences to a single dash
print re.sub(r"[^\w]+", "-", text)

# convert all words to beeps
print re.sub(r"\S+", "-BEEP-", text)

you're no entertaining anymore...
you-re-no-fun-anymore-
-BEEP- -BEEP- -BEEP- -BEEP-

You can also pass a callback function to sub to compute each replacement. Example 1-58 also shows how to precompile a pattern.

1.8.0.5. Example 1-58. Using the re module to replace substrings via a callback

File: re-example-5.py

import re
import string

text = "a line of text\\012another line of text\\012etc..."

def octal(match):
    # replace octal code with corresponding ASCII character
    return chr(string.atoi(match.group(1), 8))

octal_pattern = re.compile(r"\\(\d\d\d)")

print text
print octal_pattern.sub(octal, text)

a line of text\012another line of text\012etc...
a line of text
another line of text
etc...
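The same callback approach works in Python 3, where int(s, 8) takes over from string.atoi. The raw string below contains a literal backslash followed by three digits, just like the "\\012" sequences in the example.

```python
import re

text = r"a line of text\012another line of text\012etc..."

# the callback receives a match object and returns the replacement text
def octal(match):
    return chr(int(match.group(1), 8))

octal_pattern = re.compile(r"\\(\d\d\d)")
result = octal_pattern.sub(octal, text)
print(result.split("\n"))
# ['a line of text', 'another line of text', 'etc...']
```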

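If you don't compile a pattern, the re module compiles it and caches the compiled version for you, so you usually don't need to compile regular expressions in small scripts. In Python 1.5.2 the cache holds 20 patterns; in 2.0 it holds 100.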

Finally, Example 1-59 matches a string against a list of patterns. The patterns are combined into a single pattern, which is precompiled to save time.

1.8.0.6. Example 1-59. Using the re module to match one of many patterns

File: re-example-6.py

import re, string

def combined_pattern(patterns):
    # note: assumes the patterns contain no groups of their own
    p = re.compile(
        string.join(map(lambda x: "("+x+")", patterns), "|")
        )
    def fixup(v, m=p.match, r=range(0,len(patterns))):
        try:
            regs = m(v).regs
        except AttributeError:
            return None # no match, so m.regs will fail
        else:
            for i in r:
                if regs[i+1] != (-1, -1):
                    return i
    return fixup

## try it out!

patterns = [
    r"\d+",
    r"abc\d{2,4}",
    r"p\w+"
]

p = combined_pattern(patterns)

print p("129391")
print p("abc800")
print p("abc1600")
print p("python")
print p("perl")
print p("tcl")

0
1
1
2
2
None
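In Python 3, the undocumented regs attribute is not needed, because a match object's lastindex tells you which group matched last. In an alternation of groups, that is the alternative that matched. The following is a sketch under the same assumption as the original: the individual patterns contain no capturing groups of their own.

```python
import re

def combined_pattern(patterns):
    """Return a function giving the index of the first pattern that
    matches at the start of a string, or None.

    Assumes the patterns contain no capturing groups themselves,
    otherwise the group numbers would be shifted.
    """
    p = re.compile("|".join("(%s)" % x for x in patterns))
    def fixup(value):
        m = p.match(value)
        if m is None:
            return None
        # lastindex is the number of the group that matched
        return m.lastindex - 1
    return fixup

p = combined_pattern([r"\d+", r"abc\d{2,4}", r"p\w+"])
print([p(s) for s in ("129391", "abc800", "abc1600", "python", "perl", "tcl")])
# [0, 1, 1, 2, 2, None]
```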

1.9. The math module

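The math module implements a number of mathematical operations on floating point numbers. The functions are generally thin wrappers around the platform C library functions of the same name. As a result, results may differ slightly between platforms in normal cases, and sometimes differ considerably. Example 1-60 shows how to use the math module.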

1.9.0.1. Example 1-60. Using the math module

File: math-example-1.py

import math

print "e", "=>", math.e
print "pi", "=>", math.pi
print "hypot", "=>", math.hypot(3.0, 4.0)

# and many others...

e => 2.71828182846
pi => 3.14159265359
hypot => 5.0

See the Python Library Reference for a complete list of functions.
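Because floating point results can differ in the last bits (between platforms, or just from rounding), comparing them with == is fragile. math.isclose (Python 3.5 and later) compares with a relative tolerance instead. A small sketch:

```python
import math

print(math.hypot(3.0, 4.0))          # 5.0
print(math.sqrt(2.0) ** 2 == 2.0)    # False: the square is 2.0000000000000004
# compare floats with a tolerance instead of ==
print(math.isclose(math.sqrt(2.0) ** 2, 2.0))   # True
```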

1.10. The cmath module

The cmath module, shown in Example 1-61, contains a number of mathematical operations on complex numbers.

1.10.0.1. Example 1-61. Using the cmath module

File: cmath-example-1.py

import cmath

print "pi", "=>", cmath.pi
print "sqrt(-1)", "=>", cmath.sqrt(-1)

pi => 3.14159265359
sqrt(-1) => 1j

See the Python Library Reference for a complete list of functions.
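The split between the two modules is easiest to see on the same input. In the following Python 3 sketch, math.sqrt refuses a negative argument with ValueError, while cmath.sqrt returns a complex result. cmath.polar converts a complex number to (modulus, phase).

```python
import cmath
import math

print(cmath.sqrt(-1))          # 1j
try:
    math.sqrt(-1)              # the real-valued version refuses
    math_refused = False
except ValueError:
    math_refused = True
print("math.sqrt(-1) raised ValueError:", math_refused)

# polar coordinates of 1j: modulus 1.0, phase pi/2
r, phi = cmath.polar(1j)
print(r, math.isclose(phi, math.pi / 2))   # 1.0 True
```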

1.11. The operator module

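The operator module provides a "functional" interface to Python's standard operators. When you process data with functions like map and filter, its functions can replace some lambda constructs. They are also quite popular among programmers who like to write obscure code. Example 1-62 shows typical use of the operator module.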

1.11.0.1. Example 1-62. Using the operator module

File: operator-example-1.py

import operator

sequence = 1, 2, 4

print "add", "=>", reduce(operator.add, sequence)
print "sub", "=>", reduce(operator.sub, sequence)
print "mul", "=>", reduce(operator.mul, sequence)
print "concat", "=>", operator.concat("spam", "egg")
print "repeat", "=>", operator.repeat("spam", 5)
print "getitem", "=>", operator.getitem(sequence, 2)
print "indexOf", "=>", operator.indexOf(sequence, 2)
print "sequenceIncludes", "=>", operator.sequenceIncludes(sequence, 3)

add => 7
sub => -5
mul => 8
concat => spamegg
repeat => spamspamspamspamspam
getitem => 4
indexOf => 1
sequenceIncludes => 0
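In Python 3, reduce lives in functools, and operator.contains takes the place of the old sequenceIncludes. The following sketch repeats a few of the calls above and adds itemgetter, which builds a callable that is handy as a sort key.

```python
import functools
import operator

sequence = 1, 2, 4

print(functools.reduce(operator.add, sequence))   # 7
print(functools.reduce(operator.sub, sequence))   # -5, i.e. (1 - 2) - 4
print(operator.contains(sequence, 3))             # False
print(operator.indexOf(sequence, 2))              # 1

# itemgetter(1) is a callable equivalent to lambda x: x[1]
pairs = [("spam", 3), ("egg", 1)]
print(sorted(pairs, key=operator.itemgetter(1)))  # [('egg', 1), ('spam', 3)]
```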

Example 1-63 shows some operator functions that can be used to check object types.

1.11.0.2. Example 1-63. Using the operator module for type checking

File: operator-example-2.py

import operator
import UserList

def dump(data):
    print type(data), "=>",
    if operator.isCallable(data):
        print "CALLABLE",
    if operator.isMappingType(data):
        print "MAPPING",
    if operator.isNumberType(data):
        print "NUMBER",
    if operator.isSequenceType(data):
        print "SEQUENCE",
    print

dump(0)
dump("string")
dump("string"[0])
dump([1, 2, 3])
dump((1, 2, 3))
dump({"a": 1})
dump(len) # function
dump(UserList) # module
dump(UserList.UserList) # class
dump(UserList.UserList()) # instance

<type 'int'> => NUMBER
<type 'string'> => SEQUENCE
<type 'string'> => SEQUENCE
<type 'list'> => SEQUENCE
<type 'tuple'> => SEQUENCE
<type 'dictionary'> => MAPPING
<type 'builtin_function_or_method'> => CALLABLE
<type 'module'> =>
<type 'class'> => CALLABLE
<type 'instance'> => MAPPING NUMBER SEQUENCE

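Note that the operator module doesn't handle class instances sensibly: as the output shows, an instance is reported as a mapping, a number, and a sequence all at once. Be careful when you use the isNumberType, isMappingType, and isSequenceType functions, since they can easily make your code less flexible than it needs to be.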

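Also note that a member of a string sequence (a single character) is itself a sequence. So if you write a recursive function that uses isSequenceType to walk down an object tree, don't pass it an ordinary string (or any sequence that contains one); the recursion would never bottom out.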
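The isCallable/isMappingType/isSequenceType helpers were removed in Python 3. The builtin callable() and the abstract base classes in collections.abc and numbers cover the same ground. The following sketch pairs a rough counterpart of dump with a flatten function that treats strings as atoms, which avoids the endless recursion described above. Both function names are our own.

```python
import numbers
from collections.abc import Mapping, Sequence

def describe(data):
    """Rough Python 3 counterpart of dump() above."""
    kinds = []
    if callable(data):
        kinds.append("CALLABLE")
    if isinstance(data, Mapping):
        kinds.append("MAPPING")
    if isinstance(data, numbers.Number):
        kinds.append("NUMBER")
    if isinstance(data, Sequence):
        kinds.append("SEQUENCE")
    return kinds

def flatten(tree):
    """Flatten nested sequences, treating strings as atoms."""
    result = []
    for item in tree:
        # without the str check, "ab"[0] == "a" is itself a sequence
        # and the recursion would never end
        if isinstance(item, Sequence) and not isinstance(item, (str, bytes)):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

print(describe("string"), describe({"a": 1}), describe(len))
print(flatten([1, [2, "ab", (3, [4])]]))   # [1, 2, 'ab', 3, 4]
```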

1.12. The copy module

The copy module contains two functions for copying objects, as shown in Example 1-64.

copy(object) => object creates a "shallow" copy of the given object. Shallow here means that the object itself is copied, but if the object is a container, its members still refer to the original member objects.

1.12.0.1. Example 1-64. Using the copy module to copy objects

File: copy-example-1.py

import copy

a = [[1],[2],[3]]
b = copy.copy(a)

print "before", "=>"
print a
print b

# modify original
a[0][0] = 0
a[1] = None

print "after", "=>"
print a
print b

before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[0], [2], [3]]

You can also make shallow copies of lists with the [:] syntax (a full slice), and copy dictionaries with their copy method.
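The following Python 3 sketch shows that the three list idioms are equivalent: each creates a new outer list that shares the inner objects. The same holds for dict.copy.

```python
import copy

a = [[1], [2], [3]]
b = a[:]            # full slice: a new list holding the same inner lists
c = list(a)         # same effect
d = copy.copy(a)    # same effect

a[0][0] = 0         # mutate a shared inner list
print(b[0], c[0], d[0])        # [0] [0] [0]
print(b is a, b[0] is a[0])    # False True

settings = {"colors": ["red"]}
clone = settings.copy()        # shallow: the list object is shared
clone["colors"].append("blue")
print(settings["colors"])      # ['red', 'blue']
```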

In contrast, deepcopy(object) => object creates a "deep" copy of the given object, as shown in Example 1-65. If the object is a container, all its members are copied as well, recursively.

1.12.0.2. Example 1-65. Using the copy module to copy collections

File: copy-example-2.py

import copy

a = [[1],[2],[3]]
b = copy.deepcopy(a)

print "before", "=>"
print a
print b

# modify original
a[0][0] = 0
a[1] = None

print "after", "=>"
print a
print b

before =>
[[1], [2], [3]]
[[1], [2], [3]]
after =>
[[0], None, [3]]
[[1], [2], [3]]
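deepcopy also keeps track of objects it has already copied (its memo dictionary). As a result, shared members stay shared in the copy, and self-referencing structures can be copied without infinite recursion. A short Python 3 sketch:

```python
import copy

shared = [1]
a = [shared, shared]        # the same inner list, twice

b = copy.deepcopy(a)
print(b[0] is b[1])         # True: sharing is preserved in the copy
print(b[0] is shared)       # False: but it is a new list

# a list that contains itself
cyclic = []
cyclic.append(cyclic)
clone = copy.deepcopy(cyclic)
print(clone[0] is clone)    # True
```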

1.13. The sys module

The sys module provides a number of functions and variables for working with different parts of the Python runtime environment.

1.13.1. Working with command-line arguments

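The argv list contains the arguments that were passed to the script when the interpreter was started, as shown in Example 1-66. The first item is the name of the script itself.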

1.13.1.1. Example 1-66. Using the sys module to get script arguments

File: sys-argv-example-1.py

import sys

print "script name is", sys.argv[0]

if len(sys.argv) > 1:
    print "there are", len(sys.argv)-1, "arguments:"
    for arg in sys.argv[1:]:
        print arg
else:
    print "there are no arguments!"

script name is sys-argv-example-1.py
there are no arguments!

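If the script is read from standard input (as in "python < sys-argv-example-1.py"), the script name is set to an empty string. If you pass the program to Python as a string (using the -c option), the script name is set to "-c".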

1.13.2. Working with modules

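The path list contains the directory names in which Python looks for modules (Python source modules, compiled modules, or binary extensions). When you start Python, this list is initialized from a mixture of built-in rules, the contents of the PYTHONPATH environment variable, and the registry (on Windows). Since it's an ordinary list, you can also manipulate it from within your program, as Example 1-67 shows.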

1.13.2.1. Example 1-67. Using the sys module to manipulate the module search path

File: sys-path-example-1.py

import sys

print "path has", len(sys.path), "members"

# add the sample directory to the path
sys.path.insert(0, "samples")
import sample

# nuke the path
sys.path = []
import random # oops!

path has 7 members
this is the sample module!
Traceback (innermost last):
  File "sys-path-example-1.py", line 11, in ?
    import random # oops!
ImportError: No module named random
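A gentler Python 3 variant of the same idea: put a directory in front of sys.path only for as long as the import needs it, then remove it again instead of replacing the whole list. The directory and the module name sample_demo_mod are made up for the demo.

```python
import os
import sys
import tempfile

# write a throwaway module into a fresh directory
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "sample_demo_mod.py"), "w") as f:
    f.write('MESSAGE = "this is the sample module!"\n')

sys.path.insert(0, moddir)      # searched before everything else
try:
    import sample_demo_mod
finally:
    sys.path.remove(moddir)     # leave the path the way it was

print(sample_demo_mod.MESSAGE)
```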

The builtin_module_names list contains the names of all modules built into the Python interpreter. Example 1-68 shows how to use it.

1.13.2.2. Example 1-68. Using the sys module to find built-in modules

File: sys-builtin-module-names-example-1.py

import sys

def dump(module):
    print module, "=>",
    if module in sys.builtin_module_names:
        print "<BUILTIN>"
    else:
        module = __import__(module)
        print module.__file__

dump("os")
dump("sys")
dump("string")
dump("strop")
dump("zlib")

os => C:\python\lib\os.pyc
sys => <BUILTIN>
string => C:\python\lib\string.pyc
strop => <BUILTIN>
zlib => C:\python\zlib.pyd

The modules dictionary contains all loaded modules. The import statement checks this dictionary before it actually loads anything from disk.

As you can see from Example 1-69, Python has already loaded quite a few modules before it starts running your script.

1.13.2.3. Example 1-69. Using the sys module to find imported modules

File: sys-modules-example-1.py

import sys

print sys.modules.keys()

['os.path', 'os', 'exceptions', '__main__', 'ntpath', 'strop', 'nt',
'sys', '__builtin__', 'site', 'signal', 'UserDict', 'string', 'stat']
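Because import consults sys.modules first, an entry placed there by hand is returned by a later import without any file being searched for. The following Python 3 sketch demonstrates this; the module name fake_config is made up for the demo.

```python
import sys
import types

# register a module object by hand
fake = types.ModuleType("fake_config")
fake.DEBUG = True
sys.modules["fake_config"] = fake

# import finds the entry in sys.modules and never touches the disk
import fake_config
print(fake_config is fake, fake_config.DEBUG)   # True True

print("sys" in sys.modules, "sys" in sys.builtin_module_names)  # True True
```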

1.13.3. Working with reference counts

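The getrefcount function (shown in Example 1-70) returns the reference count of a given object, that is, the number of references to it. Python keeps track of this value, and when it drops to 0, the object is destroyed.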

1.13.3.1. Example 1-70. Using the sys module to find the reference count

File: sys-getrefcount-example-1.py

import sys

variable = 1234

print sys.getrefcount(0)
print sys.getrefcount(variable)
print sys.getrefcount(None)

50
3
192

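Note that this value is always larger than the actual number of references, because the function itself holds a reference to the object (its argument) while it determines the value.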

Checking the host platform

Example 1-71 shows the platform variable, which contains the name of the host platform.

1.13.3.2. Example 1-71. Using the sys module to find the current platform

File: sys-platform-example-1.py

import sys

## emulate "import os.path" (sort of)...

if sys.platform == "win32":
    import ntpath
    pathmodule = ntpath
elif sys.platform == "mac":
    import macpath
    pathmodule = macpath
else:
    # assume it's a posix platform
    import posixpath
    pathmodule = posixpath

print pathmodule

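Typical platform names are win32 for Windows 9X/NT and mac for Macintosh. On Unix systems, the platform name is usually derived from the output of the uname command (the system name plus the major release number), for example irix6, linux2, or sunos5 (Solaris).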

1.13.4. Tracing a program

The setprofile function lets you install a profiling function. It is called every time a function or method is called (explicitly or implicitly) or returns, and when an exception occurs. Let's look at Example 1-72.

1.13.4.1. Example 1-72. Using the sys module to install a profiling function

File: sys-setprofiler-example-1.py

import sys

def test(n):
    j = 0
    for i in range(n):
        j = j + i
    return n

def profiler(frame, event, arg):
    print event, frame.f_code.co_name, frame.f_lineno, "->", arg

# profiler is activated on the next call, return, or exception
sys.setprofile(profiler)

# profile this function call
test(1)

# disable profiler
sys.setprofile(None)

# don't profile this call
test(2)

call test 3 -> None
return test 7 -> 1

The profile module builds a complete profiler framework on top of this function.

The settrace function in Example 1-73 is similar, but the trace function is also called each time the interpreter moves on to a new line.

1.13.4.2. Example 1-73. Using the sys module to install a trace function

File: sys-settrace-example-1.py

import sys

def test(n):
    j = 0
    for i in range(n):
        j = j + i
    return n

def tracer(frame, event, arg):
    print event, frame.f_code.co_name, frame.f_lineno, "->", arg
    return tracer

# tracer is activated on the next call, return, or exception
sys.settrace(tracer)

# trace this function call
test(1)

# disable tracing
sys.settrace(None)

# don't trace this call
test(2)

call test 3 -> None
line test 3 -> None
line test 4 -> None
line test 5 -> None
line test 5 -> None
line test 6 -> None
line test 5 -> None
line test 7 -> None
return test 7 -> 1

The pdb module builds a complete debugger framework on top of the tracing facilities provided by this function.

1.13.5. Working with standard input and output

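The stdin, stdout, and stderr variables contain stream objects corresponding to the standard I/O streams. Use them directly when you need more control over the output than print gives you. You can also replace them, for example to redirect output and input to some other device or to process it in some nonstandard way, as shown in Example 1-74.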

1.13.5.1. Example 1-74. Using the sys module to redirect output

File: sys-stdout-example-1.py

import sys
import string

class Redirect:

    def __init__(self, stdout):
        self.stdout = stdout

    def write(self, s):
        self.stdout.write(string.lower(s))

# redirect standard output (including the print statement)
old_stdout = sys.stdout
sys.stdout = Redirect(sys.stdout)

print "HEJA SVERIGE",
print "FRISKT HUM\303\226R"

# restore standard output
sys.stdout = old_stdout

print "M\303\205\303\205\303\205\303\205L!"

heja sverige friskt hum\303\266r
M\303\205\303\205\303\205\303\205L!

All it takes to redirect output is an object that implements a write method.

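(Unless it's an instance of a C type, that is. Python uses an integer attribute called softspace to control spacing, and adds it to the object if it isn't there. You don't have to worry about this for Python objects, but if you redirect to a C type, make sure that type supports the softspace attribute.)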
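In Python 3, print no longer relies on softspace; it writes the separator and end strings itself through the stream's write method. The following sketch redirects into an io.StringIO so that the result can be inspected. The flush method is our own addition for code that flushes sys.stdout.

```python
import io
import sys

class Redirect:
    """Forward everything written to another stream, lowercased."""
    def __init__(self, stream):
        self.stream = stream
    def write(self, s):
        return self.stream.write(s.lower())
    def flush(self):
        self.stream.flush()

captured = io.StringIO()
old_stdout = sys.stdout
sys.stdout = Redirect(captured)
try:
    print("HEJA SVERIGE", end=" ")
    print("FRISKT HUM\u00d6R")
finally:
    sys.stdout = old_stdout

print(ascii(captured.getvalue()))   # 'heja sverige friskt hum\xf6r\n'
```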

1.13.6. Exiting the program

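When execution reaches the end of the main program, the interpreter exits automatically. If you need to exit earlier, you can call the sys.exit function. It takes an optional integer argument, which is returned to the calling program. Example 1-75 demonstrates this.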

1.13.6.1. Example 1-75. Using the sys module to exit the program

File: sys-exit-example-1.py

import sys

print "hello"

sys.exit(1)

print "there"

hello

Note that sys.exit doesn't exit at once. Instead, it raises a SystemExit exception. This means that you can trap calls to sys.exit in your main program, as Example 1-76 shows.

1.13.6.2. Example 1-76. Catching the sys.exit call

File: sys-exit-example-2.py

import sys

print "hello"

try:
    sys.exit(1)
except SystemExit:
    pass

print "there"

hello
there
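The exception also carries the argument passed to sys.exit, available as its code attribute. In Python 2.5 and later, SystemExit derives from BaseException rather than Exception, so a generic "except Exception" clause does not swallow it. A Python 3 sketch:

```python
import sys

def main():
    print("hello")
    sys.exit(1)
    print("there")            # never reached

try:
    main()
except SystemExit as exc:
    # the argument passed to sys.exit travels on the exception
    exit_code = exc.code
    print("main() asked to exit with", exit_code)

try:
    sys.exit(2)
except Exception:
    print("not reached: SystemExit is not an Exception subclass")
except BaseException as exc:
    print("caught via BaseException:", exc.code)
```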

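If you want to clean up after yourself before exiting (for example, remove temporary files), you can install an "exit handler", a function that is called automatically on the way out. This is shown in Example 1-77.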

1.13.6.3. Example 1-77. Catching the sys.exit call another way

File: sys-exitfunc-example-1.py

import sys

def exitfunc():
    print "world"

sys.exitfunc = exitfunc

print "hello"
sys.exit(1)
print "there" # never printed

hello
world

In Python 2.0 and later, you can use the atexit module to register more than one exit handler.

1.14. The atexit module

(New in 2.0) The atexit module lets you register one or more exit functions, which are called automatically just before the interpreter terminates.

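To register an exit function, call the register function, as shown in Example 1-78. You can also pass extra arguments, which are passed on to the exit function when it is called. As the output shows, the handlers run in the reverse order of registration.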

1.14.0.1. Example 1-78. Using the atexit module

File: atexit-example-1.py

import atexit

def exit(*args):
    print "exit", args

# register three exit handlers
atexit.register(exit)
atexit.register(exit, 1)
atexit.register(exit, "hello", "world")

exit ('hello', 'world')
exit (1,)
exit ()

This module is a simple wrapper around the sys.exitfunc hook.
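sys.exitfunc was removed in Python 3, and atexit is the supported way to register exit handlers. Since the handlers only run when an interpreter exits, the following sketch runs the Example 1-78 program in a child interpreter (via subprocess) and captures its output. This makes the reverse registration order visible.

```python
import subprocess
import sys

program = """
import atexit

def exit(*args):
    print("exit", args)

atexit.register(exit)
atexit.register(exit, 1)
atexit.register(exit, "hello", "world")
"""

result = subprocess.run([sys.executable, "-c", program],
                        capture_output=True, text=True)
print(result.stdout)
```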

1.15. The time module

The time module provides functions for dealing with dates and the time of day. It is a thin layer on top of the C runtime library.

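A given date and time can be represented either as a floating point value (the number of seconds since a reference time, usually January 1st, 1970; this is the Unix format) or as a time struct (a tuple-like object).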

1.15.1. Getting the current time

Example 1-79 shows how to use the time module to get the current time.

1.15.1.1. Example 1-79. Using the time module to get the current time

File: time-example-1.py

import time

now = time.time()

print now, "seconds since", time.gmtime(0)[:6]
print
print "or in other words:"
print "- local time:", time.localtime(now)
print "- utc:", time.gmtime(now)

937758359.77 seconds since (1970, 1, 1, 0, 0, 0)

or in other words:
- local time: (1999, 9, 19, 18, 25, 59, 6, 262, 1)
- utc: (1999, 9, 19, 16, 25, 59, 6, 262, 0)

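The tuple-like value returned by localtime and gmtime contains the year, month, day, hour, minute, second, day of the week, day of the year, and a daylight saving time flag. The year is always a four-digit number, the day of the week starts with 0 for Monday, and January 1st is day 1 of the year.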
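In Python 3 these values are struct_time objects. Each field can be read by name (tm_year, tm_wday, tm_yday, and so on) as well as by index. The following sketch uses the epoch itself, which makes the conventions easy to check: January 1st, 1970 was a Thursday.

```python
import time

t = time.gmtime(0)                        # the epoch, in UTC
print(t.tm_year, t.tm_mon, t.tm_mday)     # 1970 1 1
print(t.tm_wday)                          # 3: Thursday (Monday is 0)
print(t.tm_yday)                          # 1: January 1st is day 1
print(t[:6])                              # (1970, 1, 1, 0, 0, 0)
```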

1.15.2. Converting time values to strings

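You can use standard string formatting to convert a time tuple to a string, but the time module also provides a number of standard conversion functions, as shown in Example 1-80.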

1.15.2.1. Example 1-80. Using the time module to format dates and times

File: time-example-2.py

import time

now = time.localtime(time.time())

print time.asctime(now)
print time.strftime("%y/%m/%d %H:%M", now)
print time.strftime("%a %b %d", now)
print time.strftime("%c", now)
print time.strftime("%I %p", now)
print time.strftime("%Y-%m-%d %H:%M:%S %Z", now)

# do it by hand...
year, month, day, hour, minute, second, weekday, yearday, daylight = now
print "%04d-%02d-%02d" % (year, month, day)
print "%02d:%02d:%02d" % (hour, minute, second)
print ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")[weekday], yearday

Sun Oct 10 21:39:24 1999
99/10/10 21:39
Sun Oct 10
Sun Oct 10 21:39:24 1999
09 PM
1999-10-10 21:39:24 CEST
1999-10-10
21:39:24
SUN 283
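Formatting a fixed time value instead of the current time gives reproducible output, which is convenient for checking a format string. The following Python 3 sketch sticks to numeric directives, since names like %a and %b depend on the locale.

```python
import time

epoch = time.gmtime(0)
print(time.strftime("%Y-%m-%d %H:%M:%S", epoch))   # 1970-01-01 00:00:00
print(time.strftime("%j", epoch))                   # 001 (day of the year)

# the same date by hand, using the named fields
print("%04d-%02d-%02d" % (epoch.tm_year, epoch.tm_mon, epoch.tm_mday))
```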

1.15.3. Converting strings to time values

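On some platforms, the time module contains a strptime function, which does the opposite of strftime. Given a string and a pattern, it returns the corresponding time tuple, as shown in Example 1-81.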

1.15.3.1. Example 1-81. Using the time.strptime function to parse dates and times

File: time-example-6.py

import time

# make sure we have a strptime function!
try:
    strptime = time.strptime
except AttributeError:
    from strptime import strptime

print strptime("30 Nov 00", "%d %b %y")
print strptime("1 Jan 70 1:30pm", "%d %b %y %I:%M%p")

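time.strptime is only available if the platform's C library provides the corresponding function. For platforms without a standard implementation, Example 1-82 offers a partial implementation.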

1.15.3.2. Example 1-82. A strptime implementation

File: strptime.py

import reimport string

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug",          "Sep", "Oct", "Nov", "Dec"]

SPEC = {    # map formatting code to a regular expression fragment    "%a": "(?P<weekday>[a-z]+)",    "%A": "(?P<weekday>[a-z]+)",    "%b": "(?P<month>[a-z]+)",    "%B": "(?P<month>[a-z]+)",    "%C": "(?P<century>/d/d?)",    "%d": "(?P<day>/d/d?)",    "%D": "(?P<month>/d/d?)/(?P<day>/d/d?)/(?P<year>/d/d)",    "%e": "(?P<day>/d/d?)",    "%h": "(?P<month>[a-z]+)",    "%H": "(?P<hour>/d/d?)",    "%I": "(?P<hour12>/d/d?)",    "%j": "(?P<yearday>/d/d?/d?)",    "%m": "(?P<month>/d/d?)",    "%M": "(?P<minute>/d/d?)",    "%p": "(?P<ampm12>am|pm)",    "%R": "(?P<hour>/d/d?):(?P<minute>/d/d?)",    "%S": "(?P<second>/d/d?)",    "%T": "(?P<hour>/d/d?):(?P<minute>/d/d?):(?P<second>/d/d?)",    "%U": "(?P<week>/d/d)",    "%w": "(?P<weekday>/d)",    "%W": "(?P<weekday>/d/d)",    "%y": "(?P<year>/d/d)",    "%Y": "(?P<year>/d/d/d/d)",    "%%": "%"}

class TimeParser:    def _ _init_ _(self, format):        # convert strptime format string to regular expression        format = string.join(re.split("(?:/s|%t|%n)+", format))        pattern = []        try:            for spec in re.findall("%/w|%%|.", format):                if spec[0] == "%":                    spec = SPEC[spec]                pattern.append(spec)        except KeyError:            raise ValueError, "unknown specificer: %s" % spec        self.pattern = re.compile("(?i)" + string.join(pattern, ""))    def match(self, daytime):        # match time string        match = self.pattern.match(daytime)        if not match:            raise ValueError, "format mismatch"        get = match.groupdict().get        tm = [0] * 9        # extract date elements        y = get("year")        if y:            y = int(y)            if y < 68:                y = 2000 + y            elif y < 100:                y = 1900 + y            tm[0] = y        m = get("month")        if m:            if m in MONTHS:                m = MONTHS.index(m) + 1            tm[1] = int(m)        d = get("day")        if d: tm[2] = int(d)        # extract time elements        h = get("hour")        if h:            tm[3] = int(h)        else:            h = get("hour12")            if h:                h = int(h)                if string.lower(get("ampm12", "")) == "pm":                    h = h + 12                tm[3] = h        m = get("minute")        if m: tm[4] = int(m)        s = get("second")        if s: tm[5] = int(s)        # ignore weekday/yearday for now        return tuple(tm)

def strptime(string, format="%a %b %d %H:%M:%S %Y"):
    return TimeParser(format).match(string)

if __name__ == "__main__":
    # try it out
    import time
    print strptime("2000-12-20 01:02:03", "%Y-%m-%d %H:%M:%S")
    print strptime(time.ctime(time.time()))

(2000, 12, 20, 1, 2, 3, 0, 0, 0)
(2000, 11, 15, 12, 30, 45, 0, 0, 0)
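Later versions of Python provide time.strptime on all platforms, so a hand-written parser like this one is now mostly of historical interest. The following is a minimal Python 3 check that uses the same input as the example above. It also confirms that the default format accepts time.ctime output. The timestamp 10**9 is an arbitrary choice.

```python
import time

# parse with an explicit format; the result is a time.struct_time
tm = time.strptime("2000-12-20 01:02:03", "%Y-%m-%d %H:%M:%S")
print(tuple(tm)[:6])

# the default format is the one produced by time.ctime()
t = 10**9
parsed = time.strptime(time.ctime(t))
print(tuple(parsed)[:6] == tuple(time.localtime(t))[:6])
```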

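1.15.4. Converting Time Values

Converting a time tuple back to a time value is easy, at least for local time: just pass the time tuple to the mktime function, as shown in Example 1-83.

1.15.4.1. Example 1-83. Using the time module to convert a local time tuple to a time value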

File: time-example-3.py

import time

t0 = time.time()
tm = time.localtime(t0)

print tm

print t0
print time.mktime(tm)

(1999, 9, 9, 0, 11, 8, 3, 252, 1)
936828668.16
936828668.0

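However, the 1.5.2 standard library has no function that converts a UTC time tuple (Coordinated Universal Time, the successor to Greenwich Mean Time) back to a time value. Neither Python nor the underlying C libraries provide one. Example 1-84 provides a Python implementation of such a function, called timegm.

1.15.4.2. Example 1-84. Converting a UTC time tuple to a time value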

File: time-example-4.py

import time

def _d(y, m, d, days=(0,31,59,90,120,151,181,212,243,273,304,334,365)):
    # map a date to the number of days from a reference point
    return (((y - 1901)*1461)/4 + days[m-1] + d +
        ((m > 2 and not y % 4 and (y % 100 or not y % 400)) and 1))

def timegm(tm, epoch=_d(1970,1,1)):
    year, month, day, h, m, s = tm[:6]
    assert year >= 1970
    assert 1 <= month <= 12
    return (_d(year, month, day) - epoch)*86400 + h*3600 + m*60 + s

t0 = time.time()
tm = time.gmtime(t0)

print tm

print t0
print timegm(tm)

(1999, 9, 8, 22, 12, 12, 2, 251, 0)
936828732.48
936828732

Since version 1.6, the calendar module includes a similar function, calendar.timegm.
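To see that the day-counting trick in Example 1-84 agrees with calendar.timegm, here is a Python 3 port, checked against the library function. Two changes are needed for Python 3: `//` replaces the old integer `/`, and the leap-year test is written out explicitly. The name timegm_sketch is hypothetical, and the original's assertions are dropped. The formula assumes a leap year every four years, so it only holds for years 1901 through 2100.

```python
import calendar
import time

# days before the first of each month in a non-leap year
DAYS = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365)

def _d(y, m, d):
    # day number counted from a fixed reference point; the 1461/4 term
    # assumes a leap year every four years (valid for 1901 to 2100)
    leap = m > 2 and y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
    return (y - 1901) * 1461 // 4 + DAYS[m - 1] + d + int(leap)

EPOCH = _d(1970, 1, 1)

def timegm_sketch(tm):
    # UTC time tuple (year, month, day, hour, minute, second, ...) -> seconds
    year, month, day, h, m, s = tm[:6]
    return (_d(year, month, day) - EPOCH) * 86400 + h * 3600 + m * 60 + s

tm = time.gmtime(936828732)
print(timegm_sketch(tm), calendar.timegm(tm))
```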

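1.15.5. Timing

The time module can be used to measure the execution time of a Python program, as shown in Example 1-85. You can measure either "wall time" (real-world time) or "process time" (the amount of CPU time the process has consumed).

1.15.5.1. Example 1-85. Using the time module to benchmark an algorithm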

File: time-example-5.py

import time

def procedure():
    time.sleep(2.5)

# measure process time
t0 = time.clock()
procedure()
print time.clock() - t0, "seconds process time"

# measure wall time
t0 = time.time()
procedure()
print time.time() - t0, "seconds wall time"

0.0 seconds process time
2.50903499126 seconds wall time

Not all systems can measure true process time. On some systems (including Windows), the clock function usually measures wall time since the program was started.

Process time has limited precision. On many systems, it wraps around after just over 30 minutes.

See also the timing module (not available on Windows), which measures the wall time between two events.
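time.clock was deprecated in Python 3.3 and removed in 3.8. In Python 3, the two kinds of measurement are split between two functions: time.process_time reports CPU time and time.perf_counter reports wall time. The following is a minimal Python 3 sketch of Example 1-85. It uses a shorter sleep, and the helper name measure is hypothetical.

```python
import time

def procedure():
    time.sleep(0.25)  # sleeping uses almost no CPU time

def measure(func):
    # return (cpu_seconds, wall_seconds) for one call of func
    c0, w0 = time.process_time(), time.perf_counter()
    func()
    return time.process_time() - c0, time.perf_counter() - w0

cpu, wall = measure(procedure)
print("%.3f seconds process time" % cpu)
print("%.3f seconds wall time" % wall)
```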

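1.16. The types Module

The types module contains type objects for all object types defined by the standard interpreter, as shown in Example 1-86. All objects of the same type share a single type object, so you can use is to test whether an object has a given type.

1.16.0.1. Example 1-86. Using the types module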

File: types-example-1.py

import types

def check(object):
    print object,

    if type(object) is types.IntType:
        print "INTEGER",
    if type(object) is types.FloatType:
        print "FLOAT",
    if type(object) is types.StringType:
        print "STRING",
    if type(object) is types.ClassType:
        print "CLASS",
    if type(object) is types.InstanceType:
        print "INSTANCE",
    print

check(0)
check(0.0)
check("0")

class A:
    pass

class B:
    pass

check(A)
check(B)

a = A()
b = B()

check(a)
check(b)

0 INTEGER
0.0 FLOAT
0 STRING
A CLASS
B CLASS
<A instance at 796960> INSTANCE
<B instance at 796990> INSTANCE

Note that all classes have the same type, and so do all instances. To test which class a class or an instance belongs to, use the built-in issubclass and isinstance functions.
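In Python 3 (and for new-style classes in Python 2.2 and later), every class is its own type. This means type() can tell instances of different classes apart, while isinstance and issubclass also take inheritance into account. The following is a small Python 3 sketch. The classes A and B are illustrative; here B inherits from A, unlike in the example above.

```python
class A:
    pass

class B(A):
    pass

a, b = A(), B()

print(type(a) is A, type(b) is A)          # type() gives the exact class
print(isinstance(b, A))                    # isinstance() also accepts base classes
print(issubclass(B, A), issubclass(A, B))  # issubclass() checks the class hierarchy
```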

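The types module destroys the current exception state when it is first imported. In other words, don't import it (or any module that imports it) from within an exception handler.

1.17. The gc Module

(Optional, 2.0 and later) The gc module provides an interface to the built-in cyclic garbage collector.

Python uses reference counting to keep track of when to get rid of objects. As soon as the last reference to an object goes away, the object is destroyed.

Starting with version 2.0, Python also provides a cyclic garbage collector, which runs at regular intervals. This collector looks for data structures that point to themselves, and attempts to break the cycles, as shown in Example 1-87.

You can use the gc.collect function to force a full collection. This function returns the number of objects destroyed by the collector.

1.17.0.1. Example 1-87. Using the gc module to collect cyclic garbage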

File: gc-example-1.py

import gc

# create a simple object that links to itself
class Node:

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def addchild(self, node):
        node.parent = self
        self.children.append(node)

    def __repr__(self):
        return "<Node %s at %x>" % (repr(self.name), id(self))

# set up a self-referencing structure
root = Node("monty")

root.addchild(Node("eric"))
root.addchild(Node("john"))
root.addchild(Node("michael"))

# remove our only reference
del root

print gc.collect(), "unreachable objects"
print gc.collect(), "unreachable objects"

12 unreachable objects
0 unreachable objects

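If you're sure that your program doesn't create any self-referencing data structures, you can use the gc.disable function to disable collection. After calling this function, Python works exactly like 1.5.2 and earlier versions.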
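A weak reference makes it possible to observe the difference between reference counting and the cycle collector directly. The following Python 3 sketch disables automatic collection and deletes the last strong reference to the Node structure. It then shows that the cycle is still alive until gc.collect runs. The probe variable and the flag names are illustrative.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def addchild(self, node):
        node.parent = self
        self.children.append(node)

gc.disable()  # keep the automatic collector from running during the demo
try:
    root = Node("monty")
    for name in ("eric", "john", "michael"):
        root.addchild(Node(name))
    probe = weakref.ref(root)  # a weak reference does not keep root alive

    del root
    alive_after_del = probe() is not None  # the cycle survives reference counting
    found = gc.collect()                   # the cycle collector reclaims it
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, found > 0, alive_after_collect)
```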

2. More Standard Modules

"Now, imagine that your friend kept complaining that she didn't want to visit you since she found it too hard to climb up the drain pipe, and you kept telling her to use the friggin' stairs like everyone else..."

<blockquote>
- eff-bot, June 1998
</blockquote>

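2.1. Overview

This chapter describes a number of modules that are used in many Python programs. It's perfectly possible to write large Python programs without them, but using them can save you a lot of time.

2.1.1. Files and Streams

The fileinput module makes it easy to write different kinds of text filters. It provides a simple wrapper class that lets you loop over the contents of one or more text files with a plain for-in statement.

The StringIO module (and its faster variant, cStringIO) implements an in-memory file object. You can use StringIO objects in many places where Python expects an ordinary file object.

2.1.2. Type Wrappers

UserDict, UserList, and UserString are thin wrappers on top of the corresponding built-in types. Unlike the built-in types, these wrappers can be subclassed. This comes in handy when you need a class that behaves almost like a built-in type but has one or more extra methods.

2.1.3. Random Numbers

The random module provides a number of different random number generators. The whrandom module is similar, but it also lets you create multiple generator objects.

[!Feather note: whrandom was deprecated in version 2.1. Use random instead.]

2.1.4. Cryptographic Algorithms

The md5 and sha modules are used to calculate cryptographically strong message signatures (so-called "message digests").

The crypt module implements DES-style one-way encryption. It is only available on Unix systems.

The rotor module provides simple two-way encryption. If you're on version 2.4 or later, don't bother: the module is no longer included.

[!Feather note: it was deprecated in version 2.3 because its encryption is not secure.]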

2.2. The fileinput Module

The fileinput module allows you to loop over the contents of one or more text files, as shown in Example 2-1.

2.2.0.1. Example 2-1. Using the fileinput module to loop over a text file

File: fileinput-example-1.py

import fileinput
import sys

for line in fileinput.input("samples/sample.txt"):
    sys.stdout.write("-> ")
    sys.stdout.write(line)

-> We will perhaps eventually be writing only small
-> modules which are identified by name as they are
-> used to build larger ones, so that devices like
-> indentation, rather than delimiters, might become
-> feasible for expressing local structure in the
-> source language.
->      -- Donald E. Knuth, December 1974

You can also use the fileinput module to get metainformation about the current line, using the isfirstline, filename, and lineno functions, as shown in Example 2-2.

2.2.0.2. Example 2-2. Using the fileinput module to process multiple files

File: fileinput-example-2.py

import fileinput
import glob
import string, sys

for line in fileinput.input(glob.glob("samples/*.txt")):
    if fileinput.isfirstline(): # first in a file?
        sys.stderr.write("-- reading %s --\n" % fileinput.filename())
    sys.stdout.write(str(fileinput.lineno()) + " " + string.upper(line))

-- reading samples/sample.txt --
1 WE WILL PERHAPS EVENTUALLY BE WRITING ONLY SMALL
2 MODULES WHICH ARE IDENTIFIED BY NAME AS THEY ARE
3 USED TO BUILD LARGER ONES, SO THAT DEVICES LIKE
4 INDENTATION, RATHER THAN DELIMITERS, MIGHT BECOME
5 FEASIBLE FOR EXPRESSING LOCAL STRUCTURE IN THE
6 SOURCE LANGUAGE.
7    -- DONALD E. KNUTH, DECEMBER 1974

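Processing text files in place is also easy. Just call the input function with the inplace keyword argument set to 1, and the module takes care of the rest. While a file is being processed, standard output is redirected to it, so whatever you write to sys.stdout replaces the original contents. Example 2-3 demonstrates this.

2.2.0.3. Example 2-3. Using the fileinput module to convert CRLF to LF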

File: fileinput-example-3.py

import fileinput, sys

# with no file arguments, input() processes the files named on the command line
for line in fileinput.input(inplace=1):
    # convert Windows/DOS text files to Unix files
    if line[-2:] == "\r\n":
        line = line[:-2] + "\n"
    sys.stdout.write(line)
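In Python 3, files opened in text mode translate "\r\n" to "\n" on reading by default, so a CRLF check like the one above rarely triggers. The following Python 3 sketch demonstrates the in-place mechanism with an uppercase filter instead, applied to a scratch file created with tempfile. The helper name upcase_in_place is hypothetical.

```python
import fileinput
import os
import tempfile

def upcase_in_place(path):
    # with inplace=True, print() output is redirected into the file
    # being processed, replacing its original contents
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            print(line.upper(), end="")

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("spam\neggs\n")

upcase_in_place(path)
with open(path) as f:
    result = f.read()
os.remove(path)
print(repr(result))
```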

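2.3. The shutil Module

The shutil utility module contains some functions for copying files and directories. The copy function used in Example 2-4 copies a file in much the same way as the Unix cp command.

2.3.0.1. Example 2-4. Using the shutil module to copy files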

File: shutil-example-1.py

import shutil
import os

for file in os.listdir("."):
    if os.path.splitext(file)[1] == ".py":
        print file
        shutil.copy(file, os.path.join("backup", file))

aifc-example-1.py
anydbm-example-1.py
array-example-1.py
...

The copytree function copies an entire directory tree (same as cp -r), and rmtree removes an entire tree (same as rm -r), as shown in Example 2-5.

2.3.0.2. Example 2-5. Using the shutil module to copy and remove directory trees

File: shutil-example-2.py

import shutil
import os

SOURCE = "samples"
BACKUP = "samples-bak"

# create a backup directory
shutil.copytree(SOURCE, BACKUP)

print os.listdir(BACKUP)

# remove it
shutil.rmtree(BACKUP)

print os.listdir(BACKUP)

['sample.wav', 'sample.jpg', 'sample.au', 'sample.msg', 'sample.tgz',
...
Traceback (most recent call last):
 File "shutil-example-2.py", line 17, in ?
   print os.listdir(BACKUP)
os.error: No such file or directory
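The following Python 3 sketch runs Example 2-5 inside a scratch directory, so it does not depend on a samples folder existing. The file names are made up for the demo. Note that by default copytree refuses to copy into a target directory that already exists.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
source = os.path.join(base, "samples")
backup = os.path.join(base, "samples-bak")

os.mkdir(source)
for name in ("sample.txt", "sample.ini"):
    with open(os.path.join(source, name), "w") as f:
        f.write(name + "\n")

shutil.copytree(source, backup)   # like cp -r; the target must not exist yet
copied = sorted(os.listdir(backup))
shutil.rmtree(backup)             # like rm -r
print(copied, os.path.exists(backup))

shutil.rmtree(base)               # clean up the scratch directory
```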

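2.4. The tempfile Module

The tempfile module, shown in Example 2-6, allows you to quickly come up with unique names to use for temporary files.

2.4.0.1. Example 2-6. Using the tempfile module to create temporary files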

File: tempfile-example-1.py

import tempfile
import os

tempfile = tempfile.mktemp()

print "tempfile", "=>", tempfile

file = open(tempfile, "w+b")
file.write("*" * 1000)
file.seek(0)
print len(file.read()), "bytes"
file.close()

try:
    # must remove file when done
    os.remove(tempfile)
except OSError:
    pass

tempfile => C:\TEMP\~160-1
1000 bytes
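mktemp only returns a name, so another process could create a file with that name before you open it. The Python documentation marks mktemp as deprecated since 2.3 in favor of mkstemp. mkstemp creates and opens the file in one step and returns an OS-level file descriptor together with the name. The following is a Python 3 sketch of Example 2-6 using mkstemp.

```python
import os
import tempfile

fd, filename = tempfile.mkstemp()   # file is created and opened securely
try:
    with os.fdopen(fd, "w+b") as file:
        file.write(b"*" * 1000)
        file.seek(0)
        size = len(file.read())
    print(size, "bytes")
finally:
    os.remove(filename)             # mkstemp never deletes the file for you
```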

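The TemporaryFile function picks a suitable name and opens the file, as shown in Example 2-7. It also makes sure that the file is removed when it's closed. On Unix, you can remove an open file and have it disappear when the file is closed. On other platforms, this is done via a special wrapper class.

2.4.0.2. Example 2-7. Using the tempfile module to open temporary files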

File: tempfile-example-2.py

import tempfile

file = tempfile.TemporaryFile()

for i in range(100):
    file.write("*" * 100)

file.close() # removes the file!
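In Python 3, TemporaryFile also works as a context manager, so the file is closed (and therefore removed) when the with block ends. The following is a minimal sketch of Example 2-7. It opens the file in the default binary "w+b" mode, which is why it writes bytes.

```python
import tempfile

with tempfile.TemporaryFile() as tmp:   # opened in "w+b" mode by default
    for i in range(100):
        tmp.write(b"*" * 100)
    tmp.seek(0)
    size = len(tmp.read())
# leaving the with block closes the file, which removes it
print(size)
```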

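2.5. The StringIO Module

Example 2-8 demonstrates the StringIO module. It implements an in-memory file object (a "memory file"), which can be used in most places where a standard file object is required.

2.5.0.1. Example 2-8. Using the StringIO module to read from a memory file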

File: stringio-example-1.py

import StringIO

MESSAGE = "That man is depriving a village somewhere of a computer scientist."

file = StringIO.StringIO(MESSAGE)

print file.read()

That man is depriving a village somewhere of a computer scientist.

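The StringIO class implements all methods of the built-in file objects, plus a getvalue method that returns the internal string value. Example 2-9 demonstrates this method.

2.5.0.2. Example 2-9. Using the StringIO module to write to a memory file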

File: stringio-example-2.py

import StringIO

file = StringIO.StringIO()
file.write("This man is no ordinary man. ")
file.write("This is Mr. F. G. Superman.")

print file.getvalue()

This man is no ordinary man. This is Mr. F. G. Superman.

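StringIO can be used to capture output that the Python interpreter would otherwise write to the screen, by temporarily replacing sys.stdout, as shown in Example 2-10.

2.5.0.3. Example 2-10. Using the StringIO module to capture output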

File: stringio-example-3.py

import StringIO
import string, sys

stdout = sys.stdout

sys.stdout = file = StringIO.StringIO()

print """According to Gbaya folktales, trickery and guile
are the best ways to defeat the python, king of
snakes, which was hatched from a dragon at the
world's start. -- National Geographic, May 1997"""

sys.stdout = stdout

print string.upper(file.getvalue())

ACCORDING TO GBAYA FOLKTALES, TRICKERY AND GUILE
ARE THE BEST WAYS TO DEFEAT THE PYTHON, KING OF
SNAKES, WHICH WAS HATCHED FROM A DRAGON AT THE
WORLD'S START. -- NATIONAL GEOGRAPHIC, MAY 1997
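In Python 3, StringIO lives in the io module, and contextlib.redirect_stdout (3.4 and later) swaps sys.stdout and restores it afterwards, even if an exception occurs. The following is a minimal sketch of the same capture technique.

```python
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):   # sys.stdout is restored on exit
    print("According to Gbaya folktales, trickery and guile")
    print("are the best ways to defeat the python.")

print(buffer.getvalue().upper())
```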

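2.6. The cStringIO Module

cStringIO is an optional module that contains a faster implementation of the StringIO module. It works just like StringIO, but it cannot be subclassed. Example 2-11 shows how cStringIO is used; also see the previous section.

2.6.0.1. Example 2-11. Using the cStringIO module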

File: cstringio-example-1.py

import cStringIO

MESSAGE = "That man is depriving a village somewhere of a computer scientist."

file = cStringIO.StringIO(MESSAGE)

print file.read()

That man is depriving a village somewhere of a computer scientist.

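To make your code as fast as possible while still supporting older Python versions, you can fall back on the StringIO module when cStringIO is not available, as shown in Example 2-12.

2.6.0.2. Example 2-12. Falling back on the StringIO module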

File: cstringio-example-2.py

try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO

print StringIO

<module 'cStringIO' (built-in)>

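2.7. The mmap Module

(New in 2.0) The mmap module provides an interface to the operating system's memory-mapping functions, as shown in Example 2-13. The mapped region behaves much like a string object, but the data is read directly from the file.

2.7.0.1. Example 2-13. Using the mmap module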

File: mmap-example-1.py

import mmap
import os

filename = "samples/sample.txt"

file = open(filename, "r+")
size = os.path.getsize(filename)

data = mmap.mmap(file.fileno(), size)

# basics
print data
print len(data), size

# use slicing to read from the file
print repr(data[:10]), repr(data[:10])

# or use the standard file interface
print repr(data.read(10)), repr(data.read(10))

<mmap object at 008A2A10>
302 302
'We will pe' 'We will pe'
'We will pe' 'rhaps even'

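On Windows, the file must be opened for both reading and writing (`r+`, `w+`, or `a+`), or the mmap call will fail.

[!Feather note: In my own tests, a+ mode works fine; the original text only listed r+ and w+.]

Example 2-14 shows that memory-mapped regions can be used instead of ordinary strings in many places, including regular expressions and other string operations.

2.7.0.2. Example 2-14. Using string functions and regular expressions on a mapped region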

File: mmap-example-2.py

import mmap
import os, string, re

def mapfile(filename):
    file = open(filename, "r+")
    size = os.path.getsize(filename)
    return mmap.mmap(file.fileno(), size)

data = mapfile("samples/sample.txt")

# search
index = data.find("small")
print index, repr(data[index-5:index+15])

# regular expressions work too!
m = re.search("small", data)
print m.start(), m.group()

43 'only small\015\012modules '
43 small
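In Python 3, a mapped region behaves like a bytes object. That means searches and regular expressions must use bytes patterns, and slicing returns bytes. The following sketch writes its own scratch file instead of reading samples/sample.txt. It maps the file read-only with mmap.ACCESS_READ, which works on both Unix and Windows, so the r+ requirement mentioned above does not apply.

```python
import mmap
import os
import re
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"We will perhaps eventually be writing only small\nmodules")

with open(path, "rb") as f:
    # length 0 maps the whole file
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        index = data.find(b"small")
        snippet = data[index:index + 5]                  # slicing returns bytes
        found_at = re.search(rb"small", data).start()    # bytes patterns work too
os.remove(path)
print(index, snippet, found_at)
```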

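2.8. The UserDict Module

The UserDict module contains a dictionary class that can be subclassed (it's actually a Python wrapper around the built-in dictionary type).

Example 2-15 shows an enhanced dictionary class, which allows dictionaries to be "added" to each other with the + operator and can be initialized using keyword arguments.

2.8.0.1. Example 2-15. Using the UserDict module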

File: userdict-example-1.py

import UserDict

class FancyDict(UserDict.UserDict):

    def __init__(self, data = {}, **kw):
        UserDict.UserDict.__init__(self)
        self.update(data)
        self.update(kw)

    def __add__(self, other):
        dict = FancyDict(self.data)
        dict.update(other)
        return dict

a = FancyDict(a = 1)
b = FancyDict(b = 2)

print a + b

{'b': 2, 'a': 1}
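In Python 3, UserDict, UserList, and UserString live in the collections module. The following is a minimal Python 3 sketch of Example 2-15. It assumes collections.UserDict as the base class and uses None instead of a shared {} as the default argument.

```python
from collections import UserDict

class FancyDict(UserDict):

    def __init__(self, data=None, **kw):
        super().__init__()
        if data:
            self.update(data)
        self.update(kw)

    def __add__(self, other):
        # build a new dictionary; neither operand is modified
        result = FancyDict(self.data)
        result.update(other)
        return result

a = FancyDict(a=1)
b = FancyDict(b=2)
print(a + b)
```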

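2.9. The UserList Module

The UserList module contains a list class that can be subclassed (it's simply a Python wrapper around the built-in list type).

In Example 2-16, AutoList instances work just like ordinary lists, except that they also let you append an item by assigning to the index just past the end.

2.9.0.1. Example 2-16. Using the UserList module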

File: userlist-example-1.py

import UserList

class AutoList(UserList.UserList):

    def __setitem__(self, i, item):
        if i == len(self.data):
            self.data.append(item)
        else:
            self.data[i] = item

list = AutoList()

for i in range(10):
    list[i] = i

print list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
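The same class ports directly to Python 3 by subclassing collections.UserList. The following minimal sketch uses the variable name squares instead of list, so the built-in list type is not shadowed.

```python
from collections import UserList

class AutoList(UserList):
    # assigning to the index just past the end appends the item
    def __setitem__(self, i, item):
        if i == len(self.data):
            self.data.append(item)
        else:
            self.data[i] = item

squares = AutoList()
for i in range(5):
    squares[i] = i * i
print(squares)
```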

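2.10. The UserString Module

(New in 2.0) The UserString module contains two classes, UserString and MutableString. The former is a wrapper for the standard string type; the latter is a variant that allows you to modify the string in place (think of how lists work).

Note that MutableString is not very efficient: most operations are implemented using slicing and string concatenation. If performance is important, use lists of string fragments or the array module instead. Example 2-17 shows the UserString module.

2.10.0.1. Example 2-17. Using the UserString module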

File: userstring-example-1.py

import UserString

class MyString(UserString.MutableString):

    def append(self, s):
        self.data = self.data + s

    def insert(self, index, s):
        self.data = self.data[:index] + s + self.data[index:]

    def remove(self, s):
        self.data = self.data.replace(s, "")

file = open("samples/book.txt")
text = file.read()
file.close()

book = MyString(text)

for bird in ["gannet", "robin", "nuthatch"]:
    book.remove(bird)

print book

...
C: The one without the !
P: The one without the -!!! They've ALL got the !! It's a
Standard British Bird, the , it's in all the books!!!
...
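MutableString was removed in Python 3. As the note above suggests, when you build a string piece by piece it is usually cheaper to collect the fragments in a list and join them once at the end. The following Python 3 sketch shows that approach. The helper name build_report and the sample names are hypothetical.

```python
def build_report(names):
    # collect fragments in a list; join once instead of repeated concatenation
    parts = []
    for name in names:
        parts.append("The one without the ")
        parts.append(name)
        parts.append("!\n")
    return "".join(parts)

text = build_report(["gannet", "robin"])
# str.replace returns a new string; text itself is unchanged
censored = text.replace("gannet", "").replace("robin", "")
print(censored)
```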

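2.11. The traceback Module

The traceback module, shown in Example 2-18, allows you to print exception tracebacks inside your programs, much like the interpreter does when you don't catch an exception.

2.11.0.1. Example 2-18. Using the traceback module to print a traceback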

File: traceback-example-1.py

# note! importing the traceback module messes up the
# exception state, so you better do that here and not
# in the exception handler
import traceback

try:
    raise SyntaxError, "example"
except:
    traceback.print_exc()

Traceback (innermost last):
  File "traceback-example-1.py", line 7, in ?
SyntaxError: example

Example 2-19 uses the StringIO module to put the traceback in a string.

2.11.0.2. Example 2-19. Using the traceback module to copy a traceback to a string

File: traceback-example-2.py

import traceback
import StringIO

try:
    raise IOError, "an i/o error occurred"
except:
    fp = StringIO.StringIO()
    traceback.print_exc(file=fp)
    message = fp.getvalue()

    print "failure! the error was:", repr(message)

failure! the error was: 'Traceback (innermost last):\012  File "traceback-example-2.py", line 5, in ?\012IOError: an i/o error occurred\012'
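Since Python 2.4, traceback.format_exc returns the same text as a string, so the StringIO detour is no longer necessary, although it still works. The following Python 3 sketch compares the two approaches. In Python 3, IOError is just another name for OSError.

```python
import io
import traceback

try:
    raise OSError("an i/o error occurred")   # IOError is an alias of OSError in Python 3
except OSError:
    message = traceback.format_exc()         # the text print_exc() would write
    buffer = io.StringIO()
    traceback.print_exc(file=buffer)         # the StringIO route still works

print("failure! the error was:", repr(message))
```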

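You can use the extract_tb function to decode a traceback object into a list of entries containing the traceback information, as shown in Example 2-20.

2.11.0.3. Example 2-20. Using the traceback module to decode a traceback object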

File: traceback-example-3.py

import traceback
import sys

def function():
    raise IOError, "an i/o error occurred"

try:
    function()
except:
    info = sys.exc_info()
    for file, lineno, function, text in traceback.extract_tb(info[2]):
        print file, "line", lineno, "in", function
        print "=>", repr(text)
    print "** %s: %s" % info[:2]

traceback-example-3.py line 8 in ?
=> 'function()'
traceback-example-3.py line 5 in function
=> 'raise IOError, "an i/o error occurred"'
** exceptions.IOError: an i/o error occurred
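In Python 3, extract_tb returns a StackSummary made up of FrameSummary objects. These expose filename, lineno, name, and line as attributes. The following is a Python 3 sketch of Example 2-20 that uses those attributes. Reading the exception class name from info[0].__name__ is a choice made here, not something taken from the original.

```python
import sys
import traceback

def function():
    raise OSError("an i/o error occurred")

try:
    function()
except OSError:
    info = sys.exc_info()
    frames = traceback.extract_tb(info[2])   # one entry per stack level
    for frame in frames:
        print(frame.filename, "line", frame.lineno, "in", frame.name)
        print("=>", repr(frame.line))
    print("** %s: %s" % (info[0].__name__, info[1]))
```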

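2.12. The errno Module

The errno module defines a number of symbolic error codes, such as ENOENT ("no such directory entry") and EPERM ("permission denied"). It also provides a dictionary, errorcode, that maps the platform-specific numerical error codes to their symbolic names. Example 2-21 shows how to use errno.

In most cases, the IOError exception provides a 2-tuple with the numerical error code and an explanatory string. If you need to distinguish between different error codes, use the symbolic names where possible.

2.12.0.1. Example 2-21. Using the errno module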

File: errno-example-1.py

import errno

try:
    fp = open("no.such.file")
except IOError, (error, message):
    if error == errno.ENOENT:
        print "no such file"
    elif error == errno.EPERM:
        print "permission denied"
    else:
        print message

no such file

Example 2-22 is a bit contrived, but it shows how to use the errorcode dictionary to map a numerical error code to its symbolic name.

2.12.0.2. Example 2-22. Using the errorcode dictionary

File: errno-example-2.py

import errno

try:
    fp = open("no.such.file")
except IOError, (error, message):
    print error, repr(message)
    print errno.errorcode[error]

# 2 'No such file or directory'
# ENOENT
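In Python 3, IOError is an alias of OSError. Common error codes also get their own subclasses (for example FileNotFoundError for ENOENT, added in 3.3), so many errno comparisons can be written as except clauses instead. The following sketch assumes that no file named no.such.file exists in the current directory.

```python
import errno
import os

try:
    open("no.such.file")
except OSError as e:
    code = e.errno                    # numeric error code
    name = errno.errorcode[code]      # symbolic name for that code
    kind = type(e).__name__           # the specific OSError subclass raised
    print(code, repr(os.strerror(code)), name, kind)
```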

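2.13. The getopt Module

The getopt module contains functions to extract command-line options and arguments, and it can handle several option formats, as shown in Example 2-23.

The second argument specifies the allowed short options. A colon (:) after an option letter means that the option must take an additional argument.

2.13.0.1. Example 2-23. Using the getopt module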

File: getopt-example-1.py

import getopt
import sys

# simulate command-line invocation
sys.argv = ["myscript.py", "-l", "-d", "directory", "filename"]

# process options
opts, args = getopt.getopt(sys.argv[1:], "ld:")

long = 0
directory = None

for o, v in opts:
    if o == "-l":
        long = 1
    elif o == "-d":
        directory = v

print "long", "=", long
print "directory", "=", directory
print "arguments", "=", args

long = 1
directory = directory
arguments = ['filename']

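To make getopt look for long options, as in Example 2-24, pass a list of option descriptors as the third argument. If an option name ends with an equal sign (=), that option must take an additional argument.

2.13.0.2. Example 2-24. Using the getopt module to handle long options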

File: getopt-example-2.py

import getopt
import sys

# simulate command-line invocation
sys.argv = ["myscript.py", "--echo", "--printer", "lp01", "message"]

opts, args = getopt.getopt(sys.argv[1:], "ep:", ["echo", "printer="])

# process options
echo = 0
printer = None

for o, v in opts:
    if o in ("-e", "--echo"):
        echo = 1
    elif o in ("-p", "--printer"):
        printer = v

print "echo", "=", echo
print "printer", "=", printer
print "arguments", "=", args

echo = 1
printer = lp01
arguments = ['message']

[!Feather note: If this isn't clear, try it yourself (after removing the line that overwrites sys.argv):
myscript.py -e -p lp01 message
myscript.py --echo --printer=lp01 message]
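Because the examples overwrite sys.argv, an easier way to try several command lines is to pass the argument list to getopt directly. The getopt interface is unchanged in Python 3. The following sketch uses a hypothetical helper, parse, to show that the short and long spellings produce the same result.

```python
import getopt

def parse(argv):
    # "ep:" = short options -e and -p (the colon means -p takes a value);
    # "printer=" = long option --printer, which also takes a value
    opts, args = getopt.getopt(argv, "ep:", ["echo", "printer="])
    echo, printer = 0, None
    for o, v in opts:
        if o in ("-e", "--echo"):
            echo = 1
        elif o in ("-p", "--printer"):
            printer = v
    return echo, printer, args

print(parse(["-e", "-p", "lp01", "message"]))
print(parse(["--echo", "--printer=lp01", "message"]))
```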

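2.14. The getpass Module

The getpass module provides a platform-independent way to enter a password in a command-line program, as shown in Example 2-25.

getpass(prompt) prints the prompt string, switches off keyboard echo, and reads a password. If the prompt argument is omitted, it prints "Password: ".

getuser() gets the current username, if possible.

2.14.0.1. Example 2-25. Using the getpass module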

File: getpass-example-1.py

import getpass

usr = getpass.getuser()

pwd = getpass.getpass("enter password for user %s: " % usr)

print usr, pwd

enter password for user mulder:
mulder trustno1

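2.15. The glob Module

The glob module generates lists of files matching a given pattern, just like the Unix shell.

File patterns are similar to regular expressions, but simpler. An asterisk (*) matches zero or more characters, and a question mark (?) matches exactly one character. You can also use brackets to indicate character ranges, such as [0-9] for a single digit. All other characters match themselves.

glob(pattern) returns a list of all files matching a given pattern. Example 2-26 shows it in use.

2.15.0.1. Example 2-26. Using the glob module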

File: glob-example-1.py

import glob

for file in glob.glob("samples/*.jpg"):
    print file

samples/sample.jpg

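Note that glob returns full path names (including the directory part of the pattern), unlike the os.listdir function. Internally, glob uses the fnmatch module to do the actual pattern matching.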
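The following Python 3 sketch exercises the three wildcard forms described above. It works against a scratch directory filled with empty files, so the file names are made up for the demo. Because glob does not promise any particular order, the helper (hypothetically named names) sorts the results and strips the directory part.

```python
import glob
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()
for name in ("sample.jpg", "sample.txt", "sample1.jpg", "sample2.jpg"):
    open(os.path.join(folder, name), "w").close()

def names(pattern):
    # glob returns paths including the directory; keep only the file names
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(folder, pattern)))

star = names("*.jpg")             # * matches any run of characters
question = names("sample?.jpg")   # ? matches exactly one character
ranged = names("sample[0-1].*")   # [0-1] matches one character in the range
print(star, question, ranged)

shutil.rmtree(folder)
```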

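2.16. The fnmatch Module

The fnmatch module matches filenames against a pattern, as shown in Example 2-27.

The pattern syntax is the same as that used in Unix shells. An asterisk (*) matches zero or more characters, and a question mark (?) matches exactly one character. You can also use brackets to indicate character ranges, such as [0-9] for a single digit. All other characters match themselves.

2.16.0.1. Example 2-27. Using the fnmatch module to match files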

File: fnmatch-example-1.py

import fnmatch
import os

for file in os.listdir("samples"):
    if fnmatch.fnmatch(file, "*.jpg"):
        print file

sample.jpg

The translate function in Example 2-28 converts a file pattern to a regular expression.

2.16.0.2. Example 2-28. Using the fnmatch module to convert a pattern to a regular expression

File: fnmatch-example-2.py

import fnmatch
import os, re

pattern = fnmatch.translate("*.jpg")

for file in os.listdir("samples"):
    if re.match(pattern, file):
        print file

print "(pattern was %s)" % pattern

sample.jpg
(pattern was .*\.jpg$)

The glob and find modules use the fnmatch module internally.
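The following Python 3 sketch works on a plain list of names, so no directory is needed. It uses fnmatchcase, which is always case-sensitive, whereas fnmatch follows the operating system's case convention. fnmatch.filter applies a pattern to a whole list. The exact regular expression that translate produces differs between Python versions, so the sketch only checks what the expression matches.

```python
import fnmatch
import re

names = ["sample.jpg", "sample.txt", "SAMPLE.JPG"]

# fnmatchcase is always case-sensitive
jpgs = [n for n in names if fnmatch.fnmatchcase(n, "*.jpg")]

# filter applies a pattern to a whole list at once
texts = fnmatch.filter(names, "*.txt")

# translate turns a shell pattern into a regular expression string
regex = re.compile(fnmatch.translate("*.jpg"))
matched = [n for n in names if regex.match(n)]

print(jpgs, texts, matched)
```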

2.17. The random Module

"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."

<blockquote>
- John von Neumann, 1951
</blockquote>

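The random module contains a number of random number generators.

The basic random number generator (based on an algorithm by Wichmann and Hill, 1982) can be accessed in several ways, as shown in Example 2-29. Starting with version 2.3, the module uses the Mersenne Twister algorithm instead.

2.17.0.1. Example 2-29. Using the random module to get random numbers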

File: random-example-1.py

import random

for i in range(5):

    # random float: 0.0 <= number < 1.0
    print random.random(),

    # random float: 10 <= number < 20
    print random.uniform(10, 20),

    # random integer: 100 <= number <= 1000
    print random.randint(100, 1000),

    # random integer: even numbers in 100 <= number < 1000
    print random.randrange(100, 1000, 2)

0.946842713956 19.5910069381 709 172
0.573613195398 16.2758417025 407 120
0.363241598013 16.8079747714 916 580
0.602115173978 18.386796935 531 774
0.526767588533 18.0783794596 223 344

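Note that randint can return the upper limit, while the other functions always return values smaller than the upper limit. All of them can return the lower limit.

Example 2-30 shows the choice function, which picks a random item from a sequence. It can be used with lists, tuples, or any other sequence (provided it isn't empty, of course).

2.17.0.2. Example 2-30. Using the random module to choose random items from a sequence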

File: random-example-2.py

import random

# random choice from a list
for i in range(5):
    print random.choice([1, 2, 3, 5, 9])

2
3
1
9
1

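In 2.0 and later, the shuffle function can be used to shuffle the contents of a list in place (that is, to generate a random permutation of it). Example 2-31 also shows how to implement that function under older versions of Python.

2.17.0.3. Example 2-31. Using the random module to shuffle a deck of cards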

File: random-example-4.py

import random

try:
    # available in 2.0 and later
    shuffle = random.shuffle
except AttributeError:
    def shuffle(x):
        for i in xrange(len(x)-1, 0, -1):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = int(random.random() * (i+1))
            x[i], x[j] = x[j], x[i]

cards = range(52)

shuffle(cards)

myhand = cards[:5]

print myhand

[4, 8, 40, 12, 30]
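The fallback in Example 2-31 is the Fisher-Yates shuffle. The following is a Python 3 version of the same loop. It picks the swap position with randrange instead of scaling random(), which avoids float rounding. It also accepts a random.Random instance, so a seeded generator gives a repeatable shuffle. The rng parameter and the seed 42 are illustrative.

```python
import random

def shuffle(x, rng=random):
    # Fisher-Yates: walk backwards, swapping each slot with a randomly
    # chosen slot at or before it
    for i in range(len(x) - 1, 0, -1):
        j = rng.randrange(i + 1)     # 0 <= j <= i
        x[i], x[j] = x[j], x[i]

cards = list(range(52))
shuffle(cards, random.Random(42))
myhand = cards[:5]
print(myhand)
```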

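The random module also contains random generators with non-uniform distributions. Example 2-32 uses the gauss (Gaussian) function to generate random numbers with a Gaussian distribution.

2.17.0.4. Example 2-32. Using the random module to generate Gaussian random numbers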

File: random-example-3.py

import random

histogram = [0] * 20

# calculate histogram for gaussian
# noise, using average=5, stddev=1
for i in range(1000):
    i = int(random.gauss(5, 1) * 2)
    if 0 <= i < len(histogram): # ignore the rare far-out values
        histogram[i] = histogram[i] + 1

# print the histogram
m = max(histogram)
for v in histogram:
    print "*" * (v * 50 / m)

[Output: a text histogram, one row of asterisks per bin.]

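See the Python Library Reference for more information on the non-uniform random generators.

The random number generators provided in the standard library are pseudo-random generators. While this is good enough for many purposes, including simulations, numerical analysis, and games, it's definitely not good enough for cryptographic use.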
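For security-sensitive values, Python 3.6 and later provide the secrets module, which draws on the operating system's cryptographically strong random source. os.urandom offers the same source at a lower level. The following is a minimal sketch of the most common calls.

```python
import secrets

token = secrets.token_bytes(16)      # 16 random bytes
hex_token = secrets.token_hex(16)    # 16 random bytes as 32 hex digits
die = secrets.randbelow(6) + 1       # an integer in 1..6
print(len(token), len(hex_token), die)
```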

2.18. The whrandom Module

This module was deprecated back in 2.1 and has long since been removed. Use random instead.
- Feather

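Example 2-33 shows the whrandom module, which provides a pseudo-random number generator (based on an algorithm by Wichmann and Hill, 1982). Unless you need several generators that do not share internal state (for example, in a multithreaded application), use the functions in the random module instead.

2.18.0.1. Example 2-33. Using the whrandom module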

File: whrandom-example-1.py

import whrandom

# same as random
print whrandom.random()
print whrandom.choice([1, 2, 3, 5, 9])
print whrandom.uniform(10, 20)
print whrandom.randint(100, 1000)

0.113412062346
1
16.8778954689
799

Example 2-34 shows how to create multiple generators by creating instances of the whrandom class.

2.18.0.2. Example 2-34. Using the whrandom module to create multiple random generators

File: whrandom-example-2.py

import whrandom

# initialize all generators with the same seed
rand1 = whrandom.whrandom(4,7,11)
rand2 = whrandom.whrandom(4,7,11)
rand3 = whrandom.whrandom(4,7,11)

for i in range(5):
    print rand1.random(), rand2.random(), rand3.random()

0.123993532536 0.123993532536 0.123993532536
0.180951499518 0.180951499518 0.180951499518
0.291924111809 0.291924111809 0.291924111809
0.952048889363 0.952048889363 0.952048889363
0.969794283643 0.969794283643 0.969794283643
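In the random module, the random.Random class plays the role of whrandom instances. Each object has its own state, so generators created with the same seed produce the same sequence independently of each other. The following is a minimal Python 3 sketch of Example 2-34. The seed 4711 is arbitrary.

```python
import random

# each Random instance keeps its own state, like a whrandom object
rand1 = random.Random(4711)
rand2 = random.Random(4711)
rand3 = random.Random(4711)

rows = [(rand1.random(), rand2.random(), rand3.random()) for i in range(5)]
for row in rows:
    print(*row)
```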

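2.19. The md5 Module

The md5 (Message-Digest Algorithm 5) module is used to calculate message signatures (so-called "message digests").

The md5 algorithm calculates a strong 128-bit signature. This means that if two strings are different, it's highly likely that their md5 signatures are different as well. To put it another way, given an md5 digest, it's supposed to be nearly impossible to come up with another string that generates the same digest. Example 2-35 demonstrates the md5 module.

2.19.0.1. Example 2-35. Using the md5 module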

File: md5-example-1.py

import md5

hash = md5.new()
hash.update("spam, spam, and eggs")

print repr(hash.digest())

'L\005J\243\266\355\243u`\305r\203\267\020F\303'

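Note that the checksum is returned as a binary string. Example 2-36 shows how to get a hexadecimal or base64-encoded string instead.

2.19.0.2. Example 2-36. Using the md5 module to get a hexadecimal or base64-encoded md5 value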

File: md5-example-2.py

import md5
import string
import base64

hash = md5.new()
hash.update("spam, spam, and eggs")

value = hash.digest()
print hash.hexdigest()
# before 2.0, the above can be written as
# print string.join(map(lambda v: "%02x" % ord(v), value), "")

print base64.encodestring(value)

4c054aa3b6eda37560c57283b71046c3
TAVKo7bto3VgxXKDtxBGww==
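Since Python 2.5, the hashlib module provides MD5, SHA-1, and stronger hash functions, and the md5 and sha modules no longer exist in Python 3. Practical MD5 collisions have since been published, so MD5 should not be relied on where an attacker can choose the input. The following is a Python 3 sketch of Example 2-36 using hashlib. Note that it hashes bytes rather than str.

```python
import base64
import hashlib

h = hashlib.md5()
h.update(b"spam, spam, and eggs")   # hashlib works on bytes
value = h.digest()                  # 16 raw bytes

print(h.hexdigest())                # 32 hexadecimal digits
print(base64.b64encode(value).decode("ascii"))
```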

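Example 2-37 shows how you can use md5 checksums for challenge-response authentication (but see the note on random numbers later on).

2.19.0.3. Example 2-37. Using the md5 module for challenge-response authentication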

File: md5-example-3.py

import md5
import string, random

def getchallenge():
    # generate a 16-byte long random string.  (note that the built-
    # in pseudo-random generator uses a 24-bit seed, so this is not
    # as good as it may seem...)
    challenge = map(lambda i: chr(random.randint(0, 255)), range(16))
    return string.join(challenge, "")

def getresponse(password, challenge):
    # calculate combined digest for password and challenge
    m = md5.new()
    m.update(password)
    m.update(challenge)
    return m.digest()

#
# server/client communication

# 1. client connects.  server issues challenge.

print "client:", "connect"

challenge = getchallenge()

print "server:", repr(challenge)

# 2. client combines password and challenge, and calculates
# the response.

client_response = getresponse("trustno1", challenge)

print "client:", repr(client_response)

# 3. server does the same, and compares the result with the
# client response.  the result is a safe login in which the
# password is never sent across the communication channel.

server_response = getresponse("trustno1", challenge)

if server_response == client_response:
    print "server:", "login ok"

client: connect
server: '\334\352\227Z#\272\273\212KG\330\265\032>\311o'
client: "l'\305\240-x\245\237\035\225A\254\233\337\225\001"
server: login ok
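Plain concatenation works for this toy protocol, but a keyed construction such as HMAC is the usual way to combine a secret with a challenge today. The following Python 3 sketch runs the same exchange with three substitutions, none of which come from the original example: secrets.token_bytes generates the challenge, HMAC-SHA256 computes the response, and hmac.compare_digest does a constant-time comparison.

```python
import hashlib
import hmac
import secrets

def getchallenge():
    # 16 bytes from the OS's cryptographically strong generator
    return secrets.token_bytes(16)

def getresponse(password, challenge):
    # keyed digest of the challenge, using the password as the key
    return hmac.new(password, challenge, hashlib.sha256).digest()

challenge = getchallenge()                             # server -> client
client_response = getresponse(b"trustno1", challenge)  # client -> server
server_response = getresponse(b"trustno1", challenge)

# compare_digest does not leak where the two values differ via timing
login_ok = hmac.compare_digest(server_response, client_response)
print("server:", "login ok" if login_ok else "login failed")
```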

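Example 2-38 provides a variation of md5 that can be used to sign messages sent over a public network, so that the receiver can tell whether a message was modified in transit.

2.19.0.4. Example 2-38. Using the md5 module for data integrity checks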

File: md5-example-4.py

import md5
import array

class HMAC_MD5:
    # keyed md5 message authentication

    def __init__(self, key):
        if len(key) > 64:
            key = md5.new(key).digest()
        ipad = array.array("B", [0x36] * 64)
        opad = array.array("B", [0x5C] * 64)
        for i in range(len(key)):
            ipad[i] = ipad[i] ^ ord(key[i])
            opad[i] = opad[i] ^ ord(key[i])
        self.ipad = md5.md5(ipad.tostring())
        self.opad = md5.md5(opad.tostring())

    def digest(self, data):
        ipad = self.ipad.copy()
        opad = self.opad.copy()
        ipad.update(data)
        opad.update(ipad.digest())
        return opad.digest()

#
# simulate server end

key = "this should be a well-kept secret"
message = open("samples/sample.txt").read()

signature = HMAC_MD5(key).digest(message)

# (send message and signature across a public network)

#
# simulate client end

key = "this should be a well-kept secret"

client_signature = HMAC_MD5(key).digest(message)

if client_signature == signature:
    print "this is the original message:"
    print
    print message
else:
    print "someone has modified the message!!!"

The copy method takes a snapshot of the internal object state. This allows you to precompute partial digests, such as the padded key in Example 2-38.
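The following is a Python 3 port of the HMAC_MD5 class from Example 2-38, checked against the standard hmac module (available since 2.2). It keeps the original design of precomputing the two padded-key digest objects once and calling copy() for each message. Padding the key to 64 bytes with zero bytes is equivalent to XOR-ing only the key's own bytes as the original does. Real code should simply call hmac.new; the port is shown only to make the construction visible.

```python
import hashlib
import hmac

class HMAC_MD5:
    # keyed MD5 message authentication (the RFC 2104 construction)
    block_size = 64

    def __init__(self, key):
        if len(key) > self.block_size:
            key = hashlib.md5(key).digest()
        key = key.ljust(self.block_size, b"\0")   # pad the key with zero bytes
        # precompute both padded-key states once; digest() copies them
        self.ipad = hashlib.md5(bytes(b ^ 0x36 for b in key))
        self.opad = hashlib.md5(bytes(b ^ 0x5C for b in key))

    def digest(self, data):
        inner = self.ipad.copy()
        outer = self.opad.copy()
        inner.update(data)
        outer.update(inner.digest())
        return outer.digest()

key = b"this should be a well-kept secret"
message = b"We will perhaps eventually be writing only small modules"
signature = HMAC_MD5(key).digest(message)
print(signature == hmac.new(key, message, hashlib.md5).digest())
```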

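For details on this algorithm, see HMAC-MD5: Keyed-MD5 for Message Authentication ( http://www.research.ibm.com/security/draft-ietf-ipsec-hmac-md5-00.txt ) by Krawczyk, or other sources.

Don't forget that the built-in pseudo-random number generator isn't really good enough for encryption purposes. Be careful.

2.20. The sha Module

The sha module provides another way to calculate message digests, as shown in Example 2-39. It is similar to the md5 module, but generates 160-bit signatures instead.

2.20.0.1. Example 2-39. Using the sha module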

File: sha-example-1.py

import sha

hash = sha.new()
hash.update("spam, spam, and eggs")

print repr(hash.digest())
print hash.hexdigest()

'\321\333\003\026I\331\272-j\303\247\240\345\343Tvq\364\346\311'
d1db031649d9ba2d6ac3a7a0e5e3547671f4e6c9

See the md5 examples for more ways to use sha digests.

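2.21. The crypt Module

(Optional, Unix only) The crypt module implements one-way DES encryption. Unix systems use this encryption algorithm to store passwords, and this module is really only useful for generating or checking such passwords.

Example 2-40 shows how to encrypt a password by calling crypt.crypt with the password and a salt, which should consist of two random characters. You can then throw away the original password and store only the encrypted string.

2.21.0.1. Example 2-40. Using the crypt module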

File: crypt-example-1.py

import crypt

import random, string

def getsalt(chars = string.letters + string.digits):
    # generate a random 2-character 'salt'
    return random.choice(chars) + random.choice(chars)

print crypt.crypt("bananas", getsalt())

'py8UGrijma1j6'

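To verify a given password, encrypt it using the first two characters of the stored encrypted string as the salt. If the result matches the encrypted string, the password is valid. Example 2-41 uses the pwd module to fetch the encrypted password for a given user.

2.21.0.2. Example 2-41. Using the crypt module for authentication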

File: crypt-example-2.py

import pwd, crypt

def login(user, password):
    "Check if user would be able to log in using password"
    try:
        pw1 = pwd.getpwnam(user)[1]
        pw2 = crypt.crypt(password, pw1[:2])
        return pw1 == pw2
    except KeyError:
        return 0 # no such user

user = raw_input("username:")
password = raw_input("password:")

if login(user, password):
    print "welcome", user
else:
    print "login failed"

For other ways to implement authentication, see the section on the md5 module.
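The crypt module was deprecated in Python 3.11 and removed in 3.13. The following sketch applies the same idea as Examples 2-40 and 2-41 in modern Python: store a random salt plus a derived key, and verify by recomputing with the stored salt. It uses hashlib.pbkdf2_hmac (Python 3.4 and later) instead of DES crypt. The function names hash_password and check_password, the 16-byte salt, and the iteration count are all assumptions, not recommendations from the original text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    # returns (salt, key); store both and throw the password away
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def check_password(password, salt, key):
    # recompute with the stored salt, as crypt does with the first two characters
    return hmac.compare_digest(hash_password(password, salt)[1], key)

salt, key = hash_password("bananas")
print(check_password("bananas", salt, key), check_password("apples", salt, key))
```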

2.22. The rotor Module

This module was deprecated in 2.3 and removed in 2.4, because its encryption algorithm is not secure.
- Feather

(Optional) The rotor module implements a simple encryption algorithm, based on the WWII Enigma engine, as shown in Example 2-42.

2.22.0.1. Example 2-42. Using the rotor module

File: rotor-example-1.py

import rotor

SECRET_KEY = "spam"
MESSAGE = "the holy grail"

r = rotor.newrotor(SECRET_KEY)

encoded_message = r.encrypt(MESSAGE)
decoded_message = r.decrypt(encoded_message)

print "original:", repr(MESSAGE)
print "encoded message:", repr(encoded_message)
print "decoded message:", repr(decoded_message)

original: 'the holy grail'
encoded message: '\227\271\244\015\305sw\3340\337\252\237\340U'
decoded message: 'the holy grail'

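2.23. The zlib Module

(Optional) The zlib module provides support for "zlib" compression. (This compression method is also known as "deflate".)

Example 2-43 shows how to use the compress and decompress functions, which take string arguments.

2.23.0.1. Example 2-43. Using the zlib module to compress a string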

File: zlib-example-1.py

import zlib

MESSAGE = "life of brian"

compressed_message = zlib.compress(MESSAGE)
decompressed_message = zlib.decompress(compressed_message)

print "original:", repr(MESSAGE)
print "compressed message:", repr(compressed_message)
print "decompressed message:", repr(decompressed_message)

original: 'life of brian'
compressed message: 'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
decompressed message: 'life of brian'

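The compression ratio varies a lot depending on the contents of the file, as Example 2-44 shows. Data that is already compressed (such as the .gz, .tgz, and .jpg samples) barely shrinks or even grows slightly.

2.23.0.2. Example 2-44. Using the zlib module to compress a group of files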

File: zlib-example-2.py

import zlib
import glob

for file in glob.glob("samples/*"):

    indata = open(file, "rb").read()
    outdata = zlib.compress(indata, zlib.Z_BEST_COMPRESSION)

    print file, len(indata), "=>", len(outdata),
    print "%d%%" % (len(outdata) * 100 / len(indata))

samples/sample.au 1676 => 1109 66%
samples/sample.gz 42 => 51 121%
samples/sample.htm 186 => 135 72%
samples/sample.ini 246 => 190 77%
samples/sample.jpg 4762 => 4632 97%
samples/sample.msg 450 => 275 61%
samples/sample.sgm 430 => 321 74%
samples/sample.tar 10240 => 125 1%
samples/sample.tgz 155 => 159 102%
samples/sample.txt 302 => 220 72%
samples/sample.wav 13260 => 10992 82%

You can also compress or decompress data on the fly, as shown in Example 2-45.

2.23.0.3. Example 2-45. Using the zlib module to compress a stream

File: zlib-example-3.py

import zlib

encoder = zlib.compressobj()

data = encoder.compress("life")
data = data + encoder.compress(" of ")
data = data + encoder.compress("brian")
data = data + encoder.flush()

print repr(data)
print repr(zlib.decompress(data))

'x\234\313\311LKU\310OSH*\312L\314\003\000!\010\004\302'
'life of brian'
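Decompression can be streamed in the same way with a decompression object. The following Python 3 sketch compresses the message in pieces, as Example 2-45 does, and then feeds the compressed bytes to zlib.decompressobj in small chunks. The 4-byte chunk size is arbitrary and chosen only to show that data may arrive in any split.

```python
import zlib

encoder = zlib.compressobj()
data = encoder.compress(b"life")
data += encoder.compress(b" of ")
data += encoder.compress(b"brian")
data += encoder.flush()           # flush() emits whatever is still buffered

decoder = zlib.decompressobj()
pieces = []
for i in range(0, len(data), 4):  # feed the compressed stream in 4-byte chunks
    pieces.append(decoder.decompress(data[i:i + 4]))
pieces.append(decoder.flush())
print(repr(b"".join(pieces)))
```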

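Example 2-46 wraps the decoder object in a file-like class that implements some of the file object methods, which makes reading compressed files more convenient.

2.23.0.4. Example 2-46. Emulating a file object for compressed streams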

File: zlib-example-4.py

import zlib
import string, StringIO

class ZipInputStream:

    def __init__(self, file):
        self.file = file
        self.__rewind()

    def __rewind(self):
        self.zip = zlib.decompressobj()
        self.pos = 0 # position in zipped stream
        self.offset = 0 # position in unzipped stream
        self.data = ""

    def __fill(self, bytes):
        if self.zip:
            # read until we have enough bytes in the buffer
            while not bytes or len(self.data) < bytes:
                self.file.seek(self.pos)
                data = self.file.read(16384)
                if not data:
                    self.data = self.data + self.zip.flush()
                    self.zip = None # no more data
                    break
                self.pos = self.pos + len(data)
                self.data = self.data + self.zip.decompress(data)

    def seek(self, offset, whence=0):
        if whence == 0:
            position = offset
        elif whence == 1:
            position = self.offset + offset
        else:
            raise IOError, "Illegal argument"
        if position < self.offset:
            raise IOError, "Cannot seek backwards"

        # skip forward, in 16k blocks
        while position > self.offset:
            if not self.read(min(position - self.offset, 16384)):
                break

    def tell(self):
        return self.offset

    def read(self, bytes = 0):
        self.__fill(bytes)
        if bytes:
            data = self.data[:bytes]
            self.data = self.data[bytes:]
        else:
            data = self.data
            self.data = ""
        self.offset = self.offset + len(data)
        return data

    def readline(self):
        # make sure we have an entire line
        while self.zip and "\n" not in self.data:
            self.__fill(len(self.data) + 512)
        i = string.find(self.data, "\n") + 1
        if i <= 0:
            return self.read()
        return self.read(i)

    def readlines(self):
        lines = []
        while 1:
            s = self.readline()
            if not s:
                break
            lines.append(s)
        return lines

#
# try it out

data = open("samples/sample.txt").read()
data = zlib.compress(data)

file = ZipInputStream(StringIO.StringIO(data))
for line in file.readlines():
    print line[:-1]

We will perhaps eventually be writing only smallmodules which are identified by name as they areused to build larger ones, so that devices likeindentation, rather than delimiters, might becomefeasible for expressing local structure in thesource language.    -- Donald E. Knuth, December 1974

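2.24. The code module

The code module provides a number of functions that can be used to emulate the behavior of the standard interpreter's interactive mode.

compile_command works like the built-in compile function, but it first checks whether you have passed in a complete Python statement. It returns a code object for a complete statement and None for an incomplete one. It raises SyntaxError if the source is invalid.

In Example 2-47, we compile a program line by line. As soon as we have a complete statement, we execute the resulting code object. The program looks like this: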

a = (
  1,
  2,
  3
)
print a

Note that the tuple assignment cannot be compiled until we reach the second, closing parenthesis.

2.24.0.1. Example 2-47. Using the code module to compile statements

File: code-example-1.py

import code
import string

SCRIPT = [
    "a = (",
    "  1,",
    "  2,",
    "  3 ",
    ")",
    "print a"
]

script = ""

for line in SCRIPT:
    script = script + line + "\n"
    co = code.compile_command(script, "<stdin>", "exec")
    if co:
        # got a complete statement.  execute it!
        print "-"*40
        print script,
        print "-"*40
        exec co
        script = ""

----------------------------------------
a = (
  1,
  2,
  3 
)
----------------------------------------
----------------------------------------
print a
----------------------------------------
(1, 2, 3)

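The InteractiveConsole class implements an interactive console, much like the one you get when you start the Python interpreter in interactive mode.

The console can be either active or passive. An active console calls a function to get the next line. With a passive console, you call its push method whenever you have new data. By default, the console reads lines with the built-in raw_input function. To use a different input function, subclass InteractiveConsole and override its raw_input method. Example 2-48 shows how to use the code module to emulate the interactive interpreter.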
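A minimal sketch of the passive mode follows. It assumes Python 3, where print is a function. Instead of calling interact, it feeds lines to InteractiveConsole.push one at a time. push returns True while the statement is still incomplete, and runs the code as soon as the statement is complete. The helper name run_passively is hypothetical. The console's output is captured with contextlib.redirect_stdout so that it can be inspected.

```python
import code
import contextlib
import io

def run_passively(lines):
    """Feed source lines to a passive InteractiveConsole; return flags and output."""
    console = code.InteractiveConsole()
    more_flags = []
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        for line in lines:
            # push() returns True while the statement is still incomplete
            more_flags.append(console.push(line))
    return more_flags, captured.getvalue()

flags, output = run_passively(["a = (", "  1,", "  2,", "  3", ")", "print(a)"])
print(flags)   # [True, True, True, True, False, False]
print(output)  # (1, 2, 3)
```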

2.24.0.2. Example 2-48. Using the code module to emulate the interactive interpreter

File: code-example-2.py

import code

console = code.InteractiveConsole()
console.interact()

Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> a = (
...     1,
...     2,
...     3
... )
>>> print a
(1, 2, 3)

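The script in Example 2-49 defines a keyboard function. It lets you hand control over to an interactive interpreter at any point in your program. The console runs in a copy of the caller's namespace, so you can inspect the caller's variables.

2.24.0.3. Example 2-49. Using the code module for simple debugging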

File: code-example-3.py

def keyboard(banner=None):
    import code, sys

    # use exception trick to pick up the current frame
    try:
        raise None
    except:
        frame = sys.exc_info()[2].tb_frame.f_back

    # evaluate commands in current namespace
    namespace = frame.f_globals.copy()
    namespace.update(frame.f_locals)

    code.interact(banner=banner, local=namespace)

def func():
    print "START"
    a = 10
    keyboard()
    print "END"

func()

START
Python 1.5.2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
(InteractiveConsole)
>>> print a
10
>>> print keyboard
<function keyboard at 9032c8>
^Z
END
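In later Python versions, the frame trick can be written more directly with sys._getframe, which returns a caller's frame without raising an exception. The sketch below assumes Python 3. caller_namespace is a hypothetical helper name. sys._getframe is a CPython implementation detail rather than a guaranteed language feature.

```python
import code
import sys

def caller_namespace(depth=1):
    """Return a merged copy of the globals and locals `depth` frames up."""
    frame = sys._getframe(depth)      # depth 0 would be caller_namespace itself
    namespace = frame.f_globals.copy()
    namespace.update(frame.f_locals)
    return namespace

def keyboard(banner=None):
    # depth 2 skips keyboard() and reaches the function that called it
    code.interact(banner=banner, local=caller_namespace(2))

def func():
    a = 10
    return caller_namespace()         # depth 1: func's own namespace

print(func()["a"])   # 10
```

Calling keyboard() from inside a function opens a console that can see that function's variables. The end-of-file key (Ctrl-D on Unix, Ctrl-Z and Enter on Windows) closes the console and lets the program continue. Because the console works on a copy of the namespace, assignments made there do not change the caller's variables.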

3. Threads and Processes

"Well, since you last asked us to stop, this thread has moved from discussing languages suitable for professional programmers via accidental users to computer-phobic users. A few more iterations can make this thread really interesting..."

<blockquote>
- eff-bot, June 1996
</blockquote>

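3.1. Overview

This chapter describes the thread support modules provided with the standard Python interpreter. Note that thread support is optional, and may not be available in every Python interpreter.

This chapter also covers some modules that allow you to run external processes on Unix and Windows systems.

3.1.1. Threads

When you run a Python program, execution starts at the top of the main module and proceeds downward. Loops can be used to repeat portions of the program. Function and method calls transfer control temporarily to another part of the program.

With threads, your program can do several things at once. Each thread has its own flow of control, so one thread can read data from a file while another writes output to the screen.

To keep two threads from accessing the same internal data structures at the same time, Python uses a global interpreter lock. Only one thread can execute Python code at any given time. Python automatically switches to the next thread after a short period of time. It also switches when a thread performs a potentially slow operation, such as waiting for data to arrive on a socket or reading data from a file.

The global lock does not protect your own program's data, though. If several threads access the same data, that data can end up in an inconsistent state. Consider the following code: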

def getitem(key):
    item = cache.get(key)
    if item is None:
        # not in cache; create a new one
        item = create_new_item(key)
        cache[key] = item
    return item

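Suppose two threads call getitem with the same key at about the same time. Both may find the item missing from the cache, so create_new_item can end up being called twice with the same argument. In most cases this is harmless, but in some cases it can cause serious errors.

To avoid problems like this, you can use lock objects to synchronize threads. A lock can be held by only one thread at a time. You can therefore use a lock to make sure that only one thread is executing the code in getitem at any given moment.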
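The locking approach can be applied to getitem as in the sketch below. It assumes Python 3 (or Python 2.5 and later) for the with statement. create_new_item is a hypothetical stand-in that records its calls, so the example can check that ten concurrent threads create the item only once. Holding the lock while the item is created keeps the logic simple, but it serializes all cache lookups.

```python
import threading

cache = {}
cache_lock = threading.Lock()
calls = []   # records every call to the hypothetical constructor below

def create_new_item(key):
    # hypothetical stand-in for an expensive constructor
    calls.append(key)
    return key.upper()

def getitem(key):
    # only one thread at a time may check the cache and fill it in
    with cache_lock:
        item = cache.get(key)
        if item is None:
            # not in cache; create a new one
            item = create_new_item(key)
            cache[key] = item
        return item

workers = [threading.Thread(target=getitem, args=("spam",)) for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(calls)   # ['spam']: the item was created exactly once
```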

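3.1.2. Processes

On most modern operating systems, each program runs in its own process. You usually start a new program (process) by entering a command at the shell prompt or by selecting it from a menu. Python also allows you to start new programs from inside a Python script.

Most process-related functions are defined by the os module. See Section 1.4.4 for details.

3.2. The threading module

(Optional) The threading module provides a higher-level interface for threads, as shown in Example 3-1. It is modeled after Java's thread facilities. Like the lower-level thread module, it is only available if your interpreter was built with thread support.

To create a new thread, subclass the Thread class and define its run method. To use the new class, create one or more instances of it and call their start methods. Each instance's run method then executes in its own thread.

3.2.0.1. Example 3-1. Using the threading module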

File: threading-example-1.py

import threading
import time, random

class Counter:
    def __init__(self):
        self.lock = threading.Lock()
        self.value = 0

    def increment(self):
        self.lock.acquire() # critical section
        self.value = value = self.value + 1
        self.lock.release()
        return value

counter = Counter()

class Worker(threading.Thread):

    def run(self):
        for i in range(10):
            # pretend we're doing something that takes 10-100 ms
            value = counter.increment() # increment global counter
            time.sleep(random.randint(10, 100) / 1000.0)
            print self.getName(), "-- task", i, "finished", value

## try it

for i in range(10):
    Worker().start() # start a worker

Thread-1 -- task 0 finished 1
Thread-3 -- task 0 finished 3
Thread-7 -- task 0 finished 8
Thread-1 -- task 1 finished 7
Thread-4 -- task 0 Thread-5 -- task 0 finished 4
finished 5
Thread-8 -- task 0 Thread-6 -- task 0 finished 9
finished 6
...
Thread-6 -- task 9 finished 98
Thread-4 -- task 9 finished 99
Thread-9 -- task 9 finished 100

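Example 3-1 uses a Lock object to create a critical section inside the global Counter object. If you remove the acquire and release calls, the counter may well fail to reach 100. Two threads can read the same old value before either of them stores the incremented one.

3.3. The Queue module

The Queue module provides a thread-safe queue implementation, as shown in Example 3-2. It gives you a convenient way to pass Python objects safely between threads.

3.3.0.1. Example 3-2. Using the Queue module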

File: queue-example-1.py

import threading
import Queue
import time, random

WORKERS = 2

class Worker(threading.Thread):

    def __init__(self, queue):
        self.__queue = queue
        threading.Thread.__init__(self)

    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue

            # pretend we're doing something that takes 10-100 ms
            time.sleep(random.randint(10, 100) / 1000.0)

            print "task", item, "finished"

## try it

queue = Queue.Queue(0)

for i in range(WORKERS):
    Worker(queue).start() # start a worker

for i in range(10):
    queue.put(i)

for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers

task 1 finished
task 0 finished
task 3 finished
task 2 finished
task 4 finished
task 5 finished
task 7 finished
task 6 finished
task 9 finished
task 8 finished
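In Python 3 the module is named queue. The following sketch of the same worker pattern assumes Python 3. It uses plain functions instead of a Thread subclass and sends results back through a second queue. It also calls join so that the main thread waits for the workers. Squaring each item is a placeholder for real work.

```python
import queue
import threading

WORKERS = 2

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:           # end-of-queue marker
            break
        results.put(item * item)   # placeholder for real work

tasks = queue.Queue()              # no maxsize: unbounded
results = queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(WORKERS)]
for t in threads:
    t.start()

for i in range(10):
    tasks.put(i)
for _ in range(WORKERS):
    tasks.put(None)                # one marker per worker

for t in threads:
    t.join()                       # wait until every worker has stopped

squares = sorted(results.get() for _ in range(10))
print(squares)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```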

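Example 3-3 shows how to limit the size of the queue. If the producer threads (here, the main thread) fill up the queue, they block until the workers pop items off it.

3.3.0.2. Example 3-3. Using the Queue module with a maximum size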

File: queue-example-2.py

import threading
import Queue

import time, random

WORKERS = 2

class Worker(threading.Thread):

    def __init__(self, queue):
        self.__queue = queue
        threading.Thread.__init__(self)

    def run(self):
        while 1:
            item = self.__queue.get()
            if item is None:
                break # reached end of queue

            # pretend we're doing something that takes 10-100 ms
            time.sleep(random.randint(10, 100) / 1000.0)

            print "task", item, "finished"

## run with limited queue

queue = Queue.Queue(3)

for i in range(WORKERS):
    Worker(queue).start() # start a worker

for item in range(10):
    print "push", item
    queue.put(item)

for i in range(WORKERS):
    queue.put(None) # add end-of-queue markers

push 0
push 1
push 2
push 3
push 4
push 5
task 0 finished
push 6
task 1 finished
push 7
task 2 finished
push 8
task 3 finished
push 9
task 4 finished
task 6 finished
task 5 finished
task 7 finished
task 9 finished
task 8 finished
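A producer that must not block indefinitely can pass block=False or a timeout to put. Both raise Full when no slot becomes free. A short sketch, assuming Python 3's queue module:

```python
import queue

q = queue.Queue(3)          # room for at most three items
for item in range(3):
    q.put(item)

try:
    q.put(3, block=False)   # fail immediately instead of blocking
    accepted = True
except queue.Full:
    accepted = False

try:
    q.put(3, timeout=0.1)   # wait at most 0.1 seconds for a free slot
    accepted_later = True
except queue.Full:
    accepted_later = False

print(q.full(), accepted, accepted_later)   # True False False
```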

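You can modify the queue's behavior by subclassing the Queue class. Example 3-4 shows a simple priority queue. It expects tuples as items, and the first member of each tuple is the priority. The lower the value, the higher the priority.

3.3.0.3. Example 3-4. Using the Queue module to implement a priority queue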

File: queue-example-3.py

import Queue
import bisect

Empty = Queue.Empty

class PriorityQueue(Queue.Queue):
    "Thread-safe priority queue"

    def _put(self, item):
        # insert in order
        bisect.insort(self.queue, item)

## try it

queue = PriorityQueue(0)

# add items out of order
queue.put((20, "second"))
queue.put((10, "first"))
queue.put((30, "third"))

# print queue contents
try:
    while 1:
        print queue.get_nowait()
except Empty:
    pass

(10, 'first')
(20, 'second')
(30, 'third')
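Python 2.6 and later, including Python 3, ship a ready-made PriorityQueue class that keeps its items in a heap, so subclassing is no longer needed for this case. A sketch assuming Python 3:

```python
import queue

q = queue.PriorityQueue()
# add items out of order
q.put((20, "second"))
q.put((10, "first"))
q.put((30, "third"))

order = []
while True:
    try:
        order.append(q.get_nowait())   # lowest priority value first
    except queue.Empty:
        break
print(order)   # [(10, 'first'), (20, 'second'), (30, 'third')]
```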

Example 3-5 shows a simple stack implementation. A stack is last-in, first-out, rather than the queue's first-in, first-out.

3.3.0.4. Example 3-5. Using the Queue module to implement a stack

File: queue-example-4.py

import Queue

Empty = Queue.Empty

class Stack(Queue.Queue):
    "Thread-safe stack"

    def _put(self, item):
        # insert at the beginning of queue, not at the end
        self.queue.insert(0, item)

    # method aliases
    push = Queue.Queue.put
    pop = Queue.Queue.get
    pop_nowait = Queue.Queue.get_nowait

## try it

stack = Stack(0)

# push items on stack
stack.push("first")
stack.push("second")
stack.push("third")

# print stack contents
try:
    while 1:
        print stack.pop_nowait()
except Empty:
    pass

third
second
first
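Likewise, the LifoQueue class, also added in Python 2.6, provides a thread-safe stack. A sketch assuming Python 3:

```python
import queue

stack = queue.LifoQueue()
for item in ("first", "second", "third"):
    stack.put(item)

popped = []
while not stack.empty():
    popped.append(stack.get_nowait())   # most recently added item first
print(popped)   # ['third', 'second', 'first']
```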

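3.4. The thread module

(Optional) The thread module provides a low-level interface for threads, as shown in Example 3-6. It is only available if your interpreter was built with thread support. Unless you have special needs, use the higher-level interface in the threading module instead.

3.4.0.1. Example 3-6. Using the thread module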

File: thread-example-1.py

import thread
import time, random

def worker():
    for i in range(50):
        # pretend we're doing something that takes 10-100 ms
        time.sleep(random.randint(10, 100) / 1000.0)
        print thread.get_ident(), "-- task", i, "finished"

## try it out!

for i in range(2):
    thread.start_new_thread(worker, ())

time.sleep(1)

print "goodbye!"

311 -- task 0 finished
265 -- task 0 finished
265 -- task 1 finished
311 -- task 1 finished
...
265 -- task 17 finished
311 -- task 13 finished
265 -- task 18 finished
goodbye!

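Note that when the main program exits, all threads started with the thread module are killed along with it. The threading module does not have this problem. By default, the interpreter waits for threads created with threading to finish before it exits. You can change this by marking a thread as a daemon thread.

3.5. The commands module

(Unix only) The commands module contains a few functions that run external commands. Example 3-7 shows this module in action.

3.5.0.1. Example 3-7. Using the commands module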

File: commands-example-1.py

import commands

stat, output = commands.getstatusoutput("ls -lR")

print "status", "=>", stat
print "output", "=>", len(output), "bytes"

status => 0
output => 171046 bytes
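The commands module was removed in Python 3. subprocess.getstatusoutput offers the same call, and subprocess.run is the general-purpose replacement. A sketch assuming Python 3.7 or later on a Unix system that has an echo command:

```python
import subprocess

# getstatusoutput() runs the command through the shell, like commands did,
# and strips one trailing newline from the output
status, output = subprocess.getstatusoutput("echo hello")
print(status, repr(output))   # 0 'hello'

# run() avoids the shell when given an argument list
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
print(result.returncode, repr(result.stdout))   # 0 'hello\n'
```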

3.6. The pipes module

(Unix only) The pipes module implements "conversion pipelines". You can build a pipeline out of a number of external utilities and use it to process files, as shown in Example 3-8.

3.6.0.1. Example 3-8. Using the pipes module

File: pipes-example-1.py

import pipes

t = pipes.Template()

# create a pipeline
# the "--" kind means that the command reads from standard
# input and writes to standard output
t.append("sort", "--")
t.append("uniq", "--")

# filter some text
# an empty output filename means standard output
t.copy("samples/sample.txt", "")

Alan Jones (sensible party)
Kevin Phillips-Bong (slightly silly)
Tarquin Fin-tim-lin-bin-whin-bim-lin-bus-stop-F'tang-F'tang-Olé-Biscuitbarrel
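The pipes module was deprecated in Python 3.11 and removed in 3.13. pipes builds a shell command line and hands it to the shell, and a comparable effect can be sketched by passing the whole pipeline to the shell with subprocess.run. The sketch assumes Python 3.7 or later and Unix sort and uniq commands. sort_uniq is a hypothetical helper name.

```python
import subprocess

def sort_uniq(text):
    # hand the whole pipeline to /bin/sh, as pipes.Template does
    result = subprocess.run("sort | uniq", shell=True, input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout

print(sort_uniq("spam\neggs\nspam\n"), end="")   # eggs, then spam
```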

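3.7. The popen2 module

The popen2 module allows you to run an external command and access its stdin and stdout (and possibly also stderr) as separate streams.

In Python 1.5.2 and earlier, this module is only available on Unix. Since 2.0, the functions are also implemented on Windows. Example 3-9 shows how to use this module to sort strings.

3.7.0.1. Example 3-9. Using the popen2 module to sort strings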

File: popen2-example-1.py

import popen2, string

fin, fout = popen2.popen2("sort")

fout.write("foo\n")
fout.write("bar\n")
fout.close()

print fin.readline(),
print fin.readline(),
fin.close()

bar
foo
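popen2 was removed in Python 3, and subprocess.Popen replaces it. A sketch assuming Python 3.7 or later and a Unix sort command follows. It uses communicate instead of separate write and read calls, which avoids deadlocks when the child's output fills up the pipe buffer.

```python
import subprocess

proc = subprocess.Popen(["sort"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
# communicate() writes the input, closes stdin, and reads all output
out, _ = proc.communicate("foo\nbar\n")
print(out, end="")   # bar, then foo
```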

Example 3-10 shows how to use this module to control an existing interactive application.

3.7.0.2. Example 3-10. Using the popen2 module to control gnuchess

File: popen2-example-2.py

import popen2
import string

class Chess:
    "Interface class for chesstool-compatible programs"

    def __init__(self, engine = "gnuchessc"):
        self.fin, self.fout = popen2.popen2(engine)
        s = self.fin.readline()
        if s != "Chess\n":
            raise IOError, "incompatible chess program"

    def move(self, move):
        self.fout.write(move + "\n")
        self.fout.flush()
        my = self.fin.readline()
        # readline keeps the trailing newline, so strip it before comparing
        if string.strip(my) == "Illegal move":
            raise ValueError, "illegal move"
        his = self.fin.readline()
        return string.split(his)[2]

    def quit(self):
        self.fout.write("quit\n")
        self.fout.flush()

## play a few moves

g = Chess()

print g.move("a2a4")
print g.move("b2b3")

g.quit()

b8c6
e7e5

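3.8. The signal module

You can use the signal module to install your own signal handlers, as shown in Example 3-11. When the interpreter receives a signal, the corresponding handler is executed as soon as possible. Python handlers always run in the main thread, between bytecode instructions, rather than inside the low-level C signal handler.

3.8.0.1. Example 3-11. Using the signal module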

File: signal-example-1.py

import signal
import time

def handler(signo, frame):
    print "got signal", signo

signal.signal(signal.SIGALRM, handler)

# wake me up in two seconds
signal.alarm(2)

now = time.time()

time.sleep(200)

print "slept for", time.time() - now, "seconds"

got signal 14
slept for 1.99262607098 seconds
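This example behaves differently in Python 3.5 and later. Since PEP 475, time.sleep resumes after a signal handler returns, so the program would sleep for the full 200 seconds. The sketch below still wakes up early because the handler raises an exception, which interrupts the sleep. It assumes Python 3 on Unix, run from the main thread, since only the main thread may install handlers. Alarm is a hypothetical exception class name.

```python
import signal
import time

class Alarm(Exception):
    """Raised by the handler to break out of time.sleep()."""

def handler(signo, frame):
    # a handler that simply returns would let time.sleep() resume (PEP 475)
    raise Alarm(signo)

signal.signal(signal.SIGALRM, handler)
signal.alarm(1)                    # ask for SIGALRM in about one second

start = time.time()
got = None
try:
    time.sleep(200)
except Alarm as exc:
    got = exc.args[0]
elapsed = time.time() - start
print("got signal", got)
print("slept for %.1f seconds" % elapsed)
```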

4. Data Representation

"PALO ALTO, Calif. - Intel says its Pentium Pro and new Pentium II chips have a flaw that can cause computers to sometimes make mistakes but said the problems could be fixed easily with rewritten software."

<blockquote>
- Reuters telegram
</blockquote>

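4.1. Overview

This chapter describes a number of modules that convert between Python objects and other data representations. These modules are often used to read and write foreign file formats, and to store or transfer Python variables.

4.1.1. Binary Data

Python provides several support modules that help you decode and encode binary data formats. The struct module converts between binary data structures (such as C structs) and Python tuples. The array module wraps binary arrays of data (C arrays) in a Python sequence object.

4.1.2. Self-Describing Formats

To pass data between different Python programs, you can use the marshal or pickle modules.

The marshal module uses a simple self-describing format that supports most built-in data types, including code objects. Python itself uses this format to store compiled code (.pyc files).

The pickle module provides a more sophisticated format, which supports user-defined classes, self-referencing data structures, and more. pickle is written in Python and is relatively slow. There is also a cPickle module that implements the same functionality in C and is about as fast as marshal.

4.1.3. Output Formatting

A few modules provide enhanced formatting functions that complement the built-in repr function and the % string formatting operator.

The pprint module can print almost any Python data structure in a nicely laid-out, readable form.

The repr module provides a replacement for the built-in function of the same name. Unlike the built-in, it puts tight limits on its output. For example, it prints at most 30 characters of each string and shows only a few levels of a nested data structure.

4.1.4. Encoding Binary Data

Python supports most common binary encodings, such as base64, binhex (a Macintosh format), quoted printable, and uu encoding.

4.2. The array module

The array module implements an efficient array storage type. Arrays are similar to lists, but all items must be of the same primitive type. That type is specified when the array is created.

Examples 4-1 through 4-5 are simple demonstrations of the module. Example 4-1 creates an array object and then copies its internal buffer to a string with the tostring method.

4.2.0.1. Example 4-1. Using the array module to convert lists of integers to strings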

File: array-example-1.py

import array

a = array.array("B", range(16)) # unsigned char
b = array.array("h", range(16)) # signed short

print a
print repr(a.tostring())

print b
print repr(b.tostring())

array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
'\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017'

array('h', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
'\000\000\001\000\002\000\003\000\004\000\005\000\006\000\007\000\010\000\011\000\012\000\013\000\014\000\015\000\016\000\017\000'
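In Python 3, tostring and fromstring were renamed tobytes and frombytes, and the old names were removed in Python 3.9. A sketch of Example 4-1 under that assumption, using shorter arrays:

```python
import array

a = array.array("B", range(4))   # unsigned char
b = array.array("h", range(4))   # signed short

raw_a = a.tobytes()              # tostring() in older versions
raw_b = b.tobytes()
print(raw_a)                                # b'\x00\x01\x02\x03'
print(len(raw_b) == len(b) * b.itemsize)    # True
print(array.array("h", raw_b) == b)         # True: round trip
```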

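An array object can be treated like an ordinary list, as shown in Example 4-2. However, you cannot concatenate arrays of different types.

4.2.0.2. Example 4-2. Using arrays as ordinary sequences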

File: array-example-2.py

import array

a = array.array("B", [1, 2, 3])

a.append(4)

a = a + a

a = a[2:-2]

print a
print repr(a.tostring())
for i in a:
    print i,

array('B', [3, 4, 1, 2])
'\003\004\001\002'
3 4 1 2

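The module also provides an efficient way to turn raw binary data into a sequence of integers, or floating point values depending on the type code, as shown in Example 4-3.

4.2.0.3. Example 4-3. Using arrays to convert strings to lists of integers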

File: array-example-3.py

import array

a = array.array("i", "fish license") # signed integer

print a
print repr(a.tostring())
print a.tolist()

array('i', [1752394086, 1667853344, 1702063717])
'fish license'
[1752394086, 1667853344, 1702063717]
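The Python 3 equivalent of Example 4-3 calls frombytes on an empty array. The resulting integers depend on the platform's byte order, so the sketch checks the first item against int.from_bytes with sys.byteorder instead of hard-coding values. It assumes a platform where a C int divides 12 bytes evenly, which holds for the usual 4-byte int.

```python
import array
import sys

data = b"fish license"             # 12 bytes
a = array.array("i")               # C int, 4 bytes on common platforms
a.frombytes(data)                  # fromstring() in older versions
print(a.tolist())                  # the values depend on the byte order

# each item is the native-order reading of one itemsize-long slice
first = int.from_bytes(data[:a.itemsize], sys.byteorder, signed=True)
print(first == a[0])               # True
print(a.tobytes() == data)         # True
```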

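Finally, Example 4-4 shows how to use this module to determine the byte order (endianness) of the current platform.

4.2.0.4. Example 4-4. Using the array module to determine platform endianness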

File: array-example-4.py

import array

def little_endian():
    return ord(array.array("i",[1]).tostring()[0])

if little_endian():
    print "little-endian platform (intel, alpha)"
else:
    print "big-endian platform (motorola, sparc)"

big-endian platform (motorola, sparc)

Python 2.0 and later provide a sys.byteorder attribute, set to either "little" or "big", which makes this test simpler, as shown in Example 4-5.

4.2.0.5. Example 4-5. Using the sys.byteorder attribute to determine platform endianness (Python 2.0 and later)

File: sys-byteorder-example-1.py

import sys

# 2.0 and later
if sys.byteorder == "little":
    print "little-endian platform (intel, alpha)"
else:
    print "big-endian platform (motorola, sparc)"

big-endian platform (motorola, sparc)
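In Python 3, indexing a bytes object already yields an integer, so ord is no longer needed. A sketch of the same test, cross-checked against sys.byteorder:

```python
import array
import sys

def little_endian():
    # store the integer 1 and check whether its lowest byte comes first
    return array.array("i", [1]).tobytes()[0] == 1

print(little_endian() == (sys.byteorder == "little"))   # True
```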

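4.3. The struct module

The struct module converts between binary strings and Python tuples. The pack function takes a format string and one or more arguments, and returns a binary string built according to the format. The unpack function takes a format string and a binary string, and returns a tuple, as shown in Example 4-6.

4.3.0.1. Example 4-6. Using the struct module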

File: struct-example-1.py

import struct

# native byteorder
buffer = struct.pack("ihb", 1, 2, 3)
print repr(buffer)
print struct.unpack("ihb", buffer)

# data from a sequence, network byteorder
data = [1, 2, 3]
buffer = apply(struct.pack, ("!ihb",) + tuple(data))
print repr(buffer)
print struct.unpack("!ihb", buffer)

# in 2.0, the apply statement can also be written as:
# buffer = struct.pack("!ihb", *data)

'\001\000\000\000\002\000\003'
(1, 2, 3)
'\000\000\000\001\000\002\003'
(1, 2, 3)
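The first character of the format string controls byte order, size and alignment. "@" (the default) uses native byte order and alignment. "=", "<" and ">" use standard sizes without padding, and "!" means network (big-endian) order. A sketch assuming Python 3, where pack returns bytes and the * syntax replaces apply:

```python
import struct

record = (1, 2, 3)
little = struct.pack("<ihb", *record)    # explicit little-endian, standard sizes
network = struct.pack("!ihb", *record)   # network order, i.e. big-endian

print(little)                            # b'\x01\x00\x00\x00\x02\x00\x03'
print(network)                           # b'\x00\x00\x00\x01\x00\x02\x03'
print(struct.calcsize("!ihb"))           # 7: no alignment padding
print(struct.unpack("!ihb", network))    # (1, 2, 3)

# native mode (the default "@") may insert padding to align fields
print(struct.calcsize("bi") >= struct.calcsize("=bi"))   # True
```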

4.4. The xdrlib module

The xdrlib module converts between Python data types and Sun's external data representation (XDR), as shown in Example 4-7.

4.4.0.1. Example 4-7. Using the xdrlib module

File: xdrlib-example-1.py

import xdrlib

## create a packer and add some data to it

p = xdrlib.Packer()
p.pack_uint(1)
p.pack_string("spam")

data = p.get_buffer()

print "packed:", repr(data)

## create an unpacker and use it to decode the data

u = xdrlib.Unpacker(data)

print "unpacked:", u.unpack_uint(), repr(u.unpack_string())

u.done()

packed: '\000\000\000\001\000\000\000\004spam'
unpacked: 1 'spam'
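xdrlib was removed in Python 3.13. The two XDR encodings used in Example 4-7 are simple enough to sketch with struct. An unsigned integer is four big-endian bytes. A string is its length followed by the data, zero-padded to a multiple of four bytes. pack_uint and pack_string are hypothetical helpers named after the Packer methods. This is an illustration, not a complete XDR implementation.

```python
import struct

def pack_uint(n):
    # XDR unsigned int: 4 bytes, big-endian
    return struct.pack(">I", n)

def pack_string(data):
    # XDR string: length, then the bytes, zero-padded to a multiple of 4
    padding = b"\0" * (-len(data) % 4)
    return pack_uint(len(data)) + data + padding

packed = pack_uint(1) + pack_string(b"spam")
print(packed)   # b'\x00\x00\x00\x01\x00\x00\x00\x04spam'
```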

The XDR format is used by Sun's remote procedure call (RPC) protocol. Example 4-8 is incomplete, but it shows how to build an RPC request package.

4.4.0.2. Example 4-8. Using the xdrlib module to build an RPC call package

File: xdrlib-example-2.py

import xdrlib

# some constants (see the RPC specs for details)
RPC_CALL = 1
RPC_VERSION = 2

MY_PROGRAM_ID = 1234 # assigned by Sun
MY_VERSION_ID = 1000
MY_TIME_PROCEDURE_ID = 9999

AUTH_NULL = 0

transaction = 1

p = xdrlib.Packer()

# send a Sun RPC call package
p.pack_uint(transaction)
p.pack_enum(RPC_CALL)
p.pack_uint(RPC_VERSION)
p.pack_uint(MY_PROGRAM_ID)
p.pack_uint(MY_VERSION_ID)
p.pack_uint(MY_TIME_PROCEDURE_ID)
p.pack_enum(AUTH_NULL)
p.pack_uint(0)
p.pack_enum(AUTH_NULL)
p.pack_uint(0)

print repr(p.get_buffer())

'\000\000\000\001\000\000\000\001\000\000\000\002\000\000\004\322\000\000\003\350\000\000\'\017\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000'

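4.5. The marshal module

The marshal module is used to serialize data, that is, to convert data to and from strings so that it can be stored in a file or sent over a network, as shown in Example 4-9.

marshal uses a simple self-describing data format. For each data item, the marshalled string contains a type code, followed by one or more type-specific fields. Integers are stored in little-endian order. Strings are stored as a length field followed by the string's contents, which can include null bytes. Tuples are stored as an item count followed by the marshalled objects that make up the tuple.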
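You can see this layout directly by asking CPython for marshal format version 0, the oldest and simplest one. Newer versions add reference flags and more compact type codes. This sketch assumes CPython 3. The marshal format is an implementation detail and may change between releases.

```python
import marshal

as_int = marshal.dumps(1, 0)              # version 0 format
as_bytes = marshal.dumps(b"spam", 0)
as_tuple = marshal.dumps((1, b"spam"), 0)

print(as_int)     # b'i\x01\x00\x00\x00': type code, little-endian int
print(as_bytes)   # b's\x04\x00\x00\x00spam': type code, length, contents
print(as_tuple)   # b'(\x02\x00\x00\x00i\x01\x00\x00\x00s\x04\x00\x00\x00spam'
print(marshal.loads(as_tuple))            # (1, b'spam')
```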

4.5.0.1. Example 4-9. Using the marshal module to serialize data

File: marshal-example-1.py

import marshal

value = (
    "this is a string",
    [1, 2, 3, 4],
    ("more tuples", 1.0, 2.3, 4.5),
    "this is yet another string"
    )

data = marshal.dumps(value)

# intermediate format
print type(data), len(data)

print "-"*50
print repr(data)
print "-"*50

print marshal.loads(data)

<type 'string'> 118
--------------------------------------------------
'(\004\000\000\000s\020\000\000\000this is a string[\004\000\000\000i\001\000\000\000i\002\000\000\000i\003\000\000\000i\004\000\000\000(\004\000\000\000s\013\000\000\000more tuplesf\0031.0f\0032.3f\0034.5s\032\000\000\000this is yet another string'
--------------------------------------------------
('this is a string', [1, 2, 3, 4], ('more tuples', 1.0, 2.3, 4.5), 'this is yet another string')

The marshal module can also handle code objects (it is used to store precompiled Python modules), as shown in Example 4-10.

4.5.0.2. Example 4-10. Using the marshal module to serialize code

File: marshal-example-2.py

import marshal

script = """print 'hello'"""

code = compile(script, "<script>", "exec")

data = marshal.dumps(code)

# intermediate format
print type(data), len(data)

print "-"*50
print repr(data)
print "-"*50

exec marshal.loads(data)

<type 'string'> 81
--------------------------------------------------
'c\000\000\000\000\001\000\000\000s\017\000\000\000\177\000\000\177\002\000d\000\000GHd\001\000S(\002\000\000\000s\005\000\000\000helloN(\000\000\000\000(\000\000\000\000s\010\000\000\000<script>s\001\000\000\000?\002\000s\000\000\000\000'
--------------------------------------------------
hello
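The same round trip in Python 3 is shown below; there, marshal.dumps returns bytes and exec is a function. Marshalled code is tied to the interpreter version that produced it, so this technique only suits data that the same Python version will read back.

```python
import marshal

code = compile("result = 6 * 7", "<script>", "exec")
data = marshal.dumps(code)        # bytes; the format depends on the Python version
namespace = {}
exec(marshal.loads(data), namespace)
print(type(data).__name__, namespace["result"])   # bytes 42
```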

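4.6. The pickle module

Like marshal, the pickle module serializes data so that it can be stored in a file or sent over a network. It is somewhat slower than marshal, but it can handle class instances, shared elements, recursive data structures, and more.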
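A small Python 3 sketch of the shared-element and recursion handling mentioned above follows. pickle keeps track of the objects it has already written. A dictionary referenced twice is therefore restored as a single object, and a list that contains itself is restored with the cycle intact.

```python
import pickle

shared = {"name": "shared"}
pair = [shared, shared]        # the same dict referenced twice
loop = []
loop.append(loop)              # a list that contains itself

copy_pair, copy_loop = pickle.loads(pickle.dumps((pair, loop)))
print(copy_pair[0] is copy_pair[1])   # True: sharing is preserved
print(copy_loop[0] is copy_loop)      # True: the cycle is preserved
```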

4.6.0.1. Example 4-11. Using the pickle module

File: pickle-example-1.py

import pickle

value = (
    "this is a string",
    [1, 2, 3, 4],
    ("more tuples", 1.0, 2.3, 4.5),
    "this is yet another string"
    )

data = pickle.dumps(value)

# intermediate format
print type(data), len(data)

print "-"*50
print data
print "-"*50

print pickle.loads(data)

<type 'string'> 121
--------------------------------------------------
(S'this is a string'
p0
(lp1
I1
aI2
aI3
aI4
a(S'more tuples'
p2
F1.0
F2.3
F4.5
tp3
S'this is yet another string'
p4
tp5
.
--------------------------------------------------
('this is a string', [1, 2, 3, 4], ('more tuples', 1.0, 2.3, 4.5), 'this is yet another string')

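On the other hand, pickle cannot handle code objects (but see the copy_reg module for a way to fix this).

By default, pickle uses a text-based format. You can also use a binary format, in which numbers and binary strings are stored in a compact binary form. This usually results in smaller files, as Example 4-12 shows.

4.6.0.2. Example 4-12. Using the pickle module in binary mode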

File: pickle-example-2.py

import pickle
import math

value = (
    "this is a long string" * 100,
    [1.2345678, 2.3456789, 3.4567890] * 100
    )

# text mode
data = pickle.dumps(value)
print type(data), len(data), pickle.loads(data) == value

# binary mode
data = pickle.dumps(value, 1)
print type(data), len(data), pickle.loads(data) == value
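In Python 3, the second argument is called the protocol. pickle.HIGHEST_PROTOCOL selects the newest binary format, and protocol 0 is the original text format. A sketch comparing the two on the same data as Example 4-12:

```python
import pickle

value = (
    "this is a long string" * 100,
    [1.2345678, 2.3456789, 3.4567890] * 100,
)

text = pickle.dumps(value, 0)                        # protocol 0: text format
binary = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

print(len(text), len(binary))
print(len(binary) < len(text))                       # True
print(pickle.loads(text) == value and pickle.loads(binary) == value)   # True
```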

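4.7. The cPickle module

(Optional; note the uppercase P) The cPickle module contains a faster implementation of the pickle module, written in C. Example 4-13 shows how to use it when it is available and fall back to pickle otherwise.

4.7.0.1. Example 4-13. Using the cPickle module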

File: cpickle-example-1.py

try:
    import cPickle
    pickle = cPickle
except ImportError:
    import pickle

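4.8. The copy_reg module

You can use the copy_reg module to register your own extension types. The pickle and copy modules use this registry to find out how to handle non-standard types.

For example, the standard pickle implementation cannot deal with Python code objects, as the following example shows: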

File: copy-reg-example-1.py

import pickle

CODE = """print 'good evening'"""

code = compile(CODE, "<string>", "exec")

exec code
exec pickle.loads(pickle.dumps(code))

good evening
Traceback (innermost last):
...
pickle.PicklingError: can't pickle 'code' objects

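We can work around this by registering a code object handler. Such a handler consists of two parts. The pickler takes a code object and returns a tuple containing the unpickler and a tuple of arguments that use only simple data types. The unpickler does the reverse: it takes those arguments and rebuilds the object. Example 4-14 shows how.

4.8.0.1. Example 4-14. Using the copy_reg module to enable pickling of code objects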

File: copy-reg-example-2.py

import copy_reg
import pickle, marshal, types

## register a pickle handler for code objects

def code_unpickler(data):
    return marshal.loads(data)

def code_pickler(code):
    return code_unpickler, (marshal.dumps(code),)

copy_reg.pickle(types.CodeType, code_pickler, code_unpickler)

## try it out

CODE = """
print "suppose he's got a pointed stick"
"""

code = compile(CODE, "<string>", "exec")

exec code
exec pickle.loads(pickle.dumps(code))

suppose he's got a pointed stick
suppose he's got a pointed stick
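In Python 3, copy_reg is called copyreg. In the sketch below, which assumes Python 3, the reducer returns marshal.loads itself as the callable. pickle stores that function by reference, so no separate unpickler function is needed. As with Example 4-14, the receiving end must run a compatible Python version, because marshalled code is version-specific.

```python
import copyreg
import marshal
import pickle
import types

def pickle_code(code):
    # reduce a code object to (callable, args); unpickling calls
    # marshal.loads(data), which pickle stores by reference
    return marshal.loads, (marshal.dumps(code),)

copyreg.pickle(types.CodeType, pickle_code)

code = compile('greeting = "good evening"', "<string>", "exec")
namespace = {}
exec(pickle.loads(pickle.dumps(code)), namespace)
print(namespace["greeting"])   # good evening
```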

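If you're transferring pickled data across a network, make sure the custom unpickler is available at the receiving end as well.

Example 4-15 shows how you can pickle an open file object.

4.8.0.2. Example 4-15. Using the copy_reg module to enable pickling of file objects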

File: copy-reg-example-3.py

import copy_reg
import pickle, types
import StringIO

## register a pickle handler for file objects

def file_unpickler(position, data):
    file = StringIO.StringIO(data)
    file.seek(position)
    return file

def file_pickler(file):
    # save the current position and the entire contents
    position = file.tell()
    file.seek(0)
    data = file.read()
    file.seek(position)
    return file_unpickler, (position, data)

copy_reg.pickle(types.FileType, file_pickler, file_unpickler)

## try it out

file = open("samples/sample.txt", "rb")

print file.read(120),
print "<here>",
print pickle.loads(pickle.dumps(file)).read()

We will perhaps eventually be writing only small
modules, which are identified by name as they are
used to build larger <here> ones, so that devices like
indentation, rather than delimiters, might become
feasible for expressing local structure in the
source language.
     -- Donald E. Knuth, December 1974

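4.9. The pprint module

The pprint module (pretty printer) prints Python data structures in a neatly laid-out, readable form. You'll find it useful when you print data structures at the interactive prompt.

4.9.0.1. Example 4-16. Using the pprint module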

File: pprint-example-1.py

import pprint

data = (
    "this is a string", [1, 2, 3, 4], ("more tuples",
    1.0, 2.3, 4.5), "this is yet another string"
    )

pprint.pprint(data)

('this is a string',
 [1, 2, 3, 4],
 ('more tuples', 1.0, 2.3, 4.5),
 'this is yet another string')
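pprint also has a pformat function, which returns the formatted text instead of printing it. Its width argument (80 by default) controls when a structure is split across lines. A sketch assuming Python 3:

```python
import pprint

data = (
    "this is a string", [1, 2, 3, 4], ("more tuples",
    1.0, 2.3, 4.5), "this is yet another string"
    )

text = pprint.pformat(data, width=40)   # a string instead of direct output
print(text)
# ('this is a string',
#  [1, 2, 3, 4],
#  ('more tuples', 1.0, 2.3, 4.5),
#  'this is yet another string')
```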

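4.10. The repr module

The repr module provides an alternative version of the built-in repr function that limits most sizes, such as string lengths and recursion depth. Example 4-17 shows how to use the module.

4.10.0.1. Example 4-17. Using the repr module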

File: repr-example-1.py

# note: this overrides the built-in 'repr' function
from repr import repr

# an annoyingly recursive data structure
data = (
    "X" * 100000,
    )
data = [data]
data.append(data)

print repr(data)

[('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [('XXXXXXXXXXXX...XXXXXXXXXXXXX',), [(...), [...]]]]]]]
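In Python 3, the module is called reprlib. The sketch below assumes Python 3. It repeats the recursive structure from Example 4-17 and then creates a Repr instance with a tighter string limit (the value 10 is arbitrary).

```python
import reprlib

# the same annoyingly recursive structure as Example 4-17
data = ("X" * 100000,)
data = [data]
data.append(data)

short = reprlib.repr(data)   # strings cut to 30 characters, deep nesting cut off
print(short)
print(len(short))            # a couple of hundred characters, not 100,000+

# a Repr instance carries its own, adjustable limits
limited = reprlib.Repr()
limited.maxstring = 10       # illustrative limit
print(limited.repr("X" * 100))   # 'XX...XXX'
```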

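4.11. The base64 module

The base64 encoding scheme is used to convert arbitrary binary data to plain text. The encoder stores each group of three binary bytes (24 bits) as a group of four characters (6 bits each). It uses only characters from the following set:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

In addition, the = character is used to pad the end of the data stream.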
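The three-bytes-to-four-characters rule can be sketched by hand. Read 24 bits, split them into four 6-bit indexes into the alphabet above, and pad a short final group with =. This is an illustrative Python 3 sketch (b64 is a hypothetical name), checked against base64.b64encode. Real code should use the library function.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")

def b64(data):
    """Encode bytes to base64 by hand: 3 bytes -> 4 characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk + b"\0" * (3 - len(chunk)), "big")  # 24 bits
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # a short final group keeps len(chunk) + 1 characters, then "=" padding
        out.append("".join(chars[:len(chunk) + 1]).ljust(4, "="))
    return "".join(out)

print(b64(b"life of brian"))   # bGlmZSBvZiBicmlhbg==
print(b64(b"life of brian") == base64.b64encode(b"life of brian").decode())  # True
```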

Example 4-18 shows how to use the encode and decode functions on file objects.

4.11.0.1. Example 4-18. Using the base64 module to encode files

File: base64-example-1.py

import base64

MESSAGE = "life of brian"

file = open("out.txt", "w")
file.write(MESSAGE)
file.close()

base64.encode(open("out.txt"), open("out.b64", "w"))
base64.decode(open("out.b64"), open("out.txt", "w"))

print "original:", repr(MESSAGE)
print "encoded message:", repr(open("out.b64").read())
print "decoded message:", repr(open("out.txt").read())

original: 'life of brian'
encoded message: 'bGlmZSBvZiBicmlhbg==\012'
decoded message: 'life of brian'

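Example 4-19 shows how to use the encodestring and decodestring functions to convert between strings. They are implemented as wrappers around the encode and decode functions, with StringIO objects for input and output.

4.11.0.2. Example 4-19. Using the base64 module to encode strings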

File: base64-example-2.py

import base64

MESSAGE = "life of brian"

data = base64.encodestring(MESSAGE)

original_data = base64.decodestring(data)

print "original:", repr(MESSAGE)
print "encoded data:", repr(data)
print "decoded data:", repr(original_data)

original: 'life of brian'
encoded data: 'bGlmZSBvZiBicmlhbg==\012'
decoded data: 'life of brian'

Example 4-20 shows how to convert a user name and a password to an HTTP basic authentication string.

4.11.0.3. Example 4-20. Using the base64 module for basic authentication

File: base64-example-3.py

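import base64
import string

def getbasic(user, password):
    # basic authentication (according to HTTP)
    # encodestring appends a newline (and breaks long output into
    # lines), so remove all newlines before using it in a header
    return string.replace(base64.encodestring(user + ":" + password), "\n", "")

print getbasic("Aladdin", "open sesame")

QWxhZGRpbjpvcGVuIHNlc2FtZQ==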
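In Python 3, encodestring is gone: it was removed in 3.9, and its replacement encodebytes still inserts newlines. b64encode returns the encoded bytes without line breaks. The sketch below assumes Python 3. Encoding the credentials as UTF-8 is an assumption, and basic_auth_header is a hypothetical helper that builds the complete header value.

```python
import base64

def basic_auth_header(user, password):
    # encode the credentials as UTF-8 bytes (an assumption; see RFC 7617)
    credentials = (user + ":" + password).encode("utf-8")
    # b64encode never inserts line breaks, unlike encodebytes()
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_auth_header("Aladdin", "open sesame")
print(header)   # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```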

Finally, Example 4-21 shows a small utility that converts a GIF image to a Python script, for use with the Tkinter library.

4.11.0.4. Example 4-21. Using the base64 module to wrap GIF images for Tkinter

File: base64-example-4.py

import base64, sys

if not sys.argv[1:]:
    print "Usage: gif2tk.py giffile >pyfile"
    sys.exit(1)

data = open(sys.argv[1], "rb").read()

if data[:4] != "GIF8":
    print sys.argv[1], "is not a GIF file"
    sys.exit(1)

print '# generated from', sys.argv[1], 'by gif2tk.py'
print
print 'from Tkinter import PhotoImage'
print
print 'image = PhotoImage(data="""'
print base64.encodestring(data),
print '""")'

# generated from samples/sample.gif by gif2tk.py

from Tkinter import PhotoImage

image = PhotoImage(data="""
R0lGODlhoAB4APcAAAAAAIAAAACAAICAAAAAgIAAgACAgICAgAQEBIwEBIyMBJRUlISE/LRUBAQE
...
AjmQBFmQBnmQCJmQCrmQDNmQDvmQEBmREnkRAQEAOw==
""")

4.12. The binhex module

The binhex module converts to and from the Macintosh BinHex format, as shown in Example 4-22.

4.12.0.1. Example 4-22. Using the binhex module

File: binhex-example-1.py

import binhex
import sys

infile = "samples/sample.jpg"

binhex.binhex(infile, sys.stdout)

(This file must be converted with BinHex 4.0)

:#R0KEA"XC5jUF'F!2j!)!*!%%TS!N!4RdrrBrq!!%%T'58B!!3%!!!%!!3!!rpX!3`!)"JB("J8)"`F(#3N)#J`8$3`,#``C%K-2&"dD(aiG'K`F)#3Z*b!L,#-F(#Jh+5``-63d0"mR16di-M`Z-c3brpX!3`%*#3N-#``B$3dB-L%F)6+3-[r!!"%)!)!!J!-")J!#%3%$%3(ra!!I!!!""3'3"J#3#!%#!`3&"JF)#3S,rm3!Y4!!!J%$!`)%!`8&"!3!!!&p!3)$!!34"4)K-8%'%e&K"b*a&$+"ND%))d+a`495dI!N-f*bJJN

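The module has two functions, binhex and hexbin.

4.13. The quopri module

The quopri module implements quoted printable encoding, as defined by the MIME standard.

This encoding converts text messages that consist mostly of plain U.S. ASCII, such as messages written in most European languages, into messages that use only U.S. ASCII. This is useful with older mail transports and agents, which often cannot handle non-ASCII characters. See Example 4-23.

4.13.0.1. Example 4-23. Using the quopri module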

File: quopri-example-1.py

import quopri
import StringIO

# helpers (the quopri module only supports file-to-file conversion)

def encodestring(instring, tabs=0):
    outfile = StringIO.StringIO()
    quopri.encode(StringIO.StringIO(instring), outfile, tabs)
    return outfile.getvalue()

def decodestring(instring):
    outfile = StringIO.StringIO()
    quopri.decode(StringIO.StringIO(instring), outfile)
    return outfile.getvalue()

## try it out

MESSAGE = "å i åa ä e ö!"

encoded_message = encodestring(MESSAGE)
decoded_message = decodestring(encoded_message)

print "original:", MESSAGE
print "encoded message:", repr(encoded_message)
print "decoded message:", decoded_message

original: å i åa ä e ö!
encoded message: '=E5 i =E5a =E4 e =F6!\012'
decoded message: å i åa ä e ö!

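As Example 4-23 shows, non-U.S. characters are mapped to an equals sign (=) followed by two hexadecimal digits. The equals sign itself is encoded the same way ("=3D"), and so is whitespace at the end of a line ("=20"). All other characters are left unchanged. As long as you don't use too many unusual characters, the encoded string remains nearly as readable as the original.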
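Python 3's quopri module has encodestring and decodestring functions that work on bytes directly, so the StringIO helpers are not needed. The sketch below assumes Python 3. It also assumes the message is encoded as Latin-1, which is what the =E5-style codes in Example 4-23 correspond to.

```python
import quopri

message = "å i åa ä e ö!".encode("latin-1")   # the example's text as Latin-1 bytes
encoded = quopri.encodestring(message)
print(encoded)                                  # b'=E5 i =E5a =E4 e =F6!'
print(quopri.decodestring(encoded) == message)  # True
print(quopri.encodestring(b"1+1=2"))            # b'1+1=3D2': "=" itself is escaped
```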

(Europeans generally hate this encoding and strongly believe that certain U.S. programmers deserve to be slapped in the head with a huge great fish to the jolly music of Edward German....)

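4.14. The uu module

The uu encoding scheme is used to convert arbitrary binary data to plain text. The format is popular on Usenet newsgroups, but it is gradually being replaced by base64 encoding.

A uu encoder converts each group of three bytes (24 bits) to four printable characters (6 bits per character), using characters from chr(32) (space) to chr(95). Each encoded line also starts with a character that gives the number of data bytes on that line. uu encoding typically increases the data size by about 40%.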
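The arithmetic can be sketched in Python 3. The uu module itself was removed in Python 3.13, but binascii.b2a_uu still encodes a single line. Each line starts with chr(32 + number of bytes), and every 24-bit group then becomes four 6-bit values offset by 32. uu_line is a hypothetical helper, checked against binascii.b2a_uu.

```python
import binascii

def uu_line(chunk):
    """Sketch: encode up to 45 bytes as one uu line."""
    chars = [chr(32 + len(chunk))]                 # length character
    padded = chunk + b"\0" * (-len(chunk) % 3)     # pad to a multiple of 3 bytes
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")            # 24 bits
        for shift in (18, 12, 6, 0):
            chars.append(chr(32 + ((n >> shift) & 0x3F)))      # 6 bits each
    return "".join(chars) + "\n"

print(uu_line(b"Cat"), end="")                               # #0V%T
print(uu_line(b"Cat") == binascii.b2a_uu(b"Cat").decode())   # True
```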

An encoded data stream starts with a begin line, which contains the file permissions (in Unix octal format) and the filename. It ends with an end line:

begin 666 sample.jpg
M_]C_X  02D9)1@ ! 0   0 !  #_VP!#  @&!@<&!0@'!P<)'0@*#!0-# L+
...more lines like this...
end

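The uu module provides two functions: encode and decode.

encode(infile, outfile, filename) encodes data from the input file and writes it to the output file, as shown in Example 4-24. infile and outfile can be either filenames or file objects. The filename argument is written as the filename in the begin line.

4.14.0.1. Example 4-24. Using the uu module to encode a binary file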

File: uu-example-1.py

import uu
import os, sys

infile = "samples/sample.jpg"

uu.encode(infile, sys.stdout, os.path.basename(infile))

begin 666 sample.jpg
M_]C_X  02D9)1@ ! 0   0 !  #_VP!#  @&!@<&!0@'!P<)"0@*#!0-# L+
M#!D2$P/4'1H?'AT:'!P@)"XG("(L(QP<*#<I+# Q-#0T'R<Y/3@R/"XS-#+_
MVP!# 0D)"0P+#!@-#1@R(1PA,C(R,C(R,C(R,C(R,C(R,C(R,C(R,C(R
M,C(R,C(R,C(R,C(R,C(R,C(R,C+_P  1" "  ( # 2(  A$! Q$!_/0 
M'P   04! 0$! 0$           $" P0%!@<("0H+_/0 
M1   @$# P($ P4%

decode(infile, outfile) decodes uu-encoded data. Again, the arguments can be either filenames or file objects, as shown in Example 4-25.

4.14.0.2. Example 4-25. Using the uu module to decode a uu-encoded file

File: uu-example-2.py

import uu
import StringIO

infile = "samples/sample.uue"
outfile = "samples/sample.jpg"

## decode

fi = open(infile)
fo = StringIO.StringIO()

uu.decode(fi, fo)

## compare with original data file

data = open(outfile, "rb").read()

if fo.getvalue() == data:
    print len(data), "bytes ok"

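4.15. The binascii module

The binascii module contains support functions for a number of encodings, including base64, binhex, and uu, as shown in Example 4-26.

In 2.0 and newer, you can also use it to convert between binary data and hexadecimal strings.

4.15.0.1. Example 4-26. Using the binascii module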

File: binascii-example-1.py

import binascii

text = "hello, mrs teal"

data = binascii.b2a_base64(text)
text = binascii.a2b_base64(data)
print text, "<=>", repr(data)

data = binascii.b2a_uu(text)
text = binascii.a2b_uu(data)
print text, "<=>", repr(data)

data = binascii.b2a_hqx(text)
text = binascii.a2b_hqx(data)[0]
print text, "<=>", repr(data)

# 2.0 and newer
data = binascii.b2a_hex(text)
text = binascii.a2b_hex(data)
print text, "<=>", repr(data)

hello, mrs teal <=> 'aGVsbG8sIG1ycyB0ZWFs\012'
hello, mrs teal <=> '/:&5L;&\\L(&UR<R!T96%L\012'
hello, mrs teal <=> 'D\'9XE\'mX)\'ebFb"dC@&X'
hello, mrs teal <=> '68656c6c6f2c206d7273207465616c'
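Example 4-26 is Python 2 code. In Python 3, binascii works on bytes instead of str. In addition, b2a_hqx and a2b_hqx were removed in Python 3.11. The rough Python 3 counterpart below therefore covers only base64, uu, and hex. It is a sketch, not a port of the original example. Note that binascii.hexlify and binascii.unhexlify are aliases for b2a_hex and a2b_hex.

```python
import binascii

text = b"hello, mrs teal"

# (encoder, decoder) pairs that still exist in Python 3's binascii
pairs = {
    "base64": (binascii.b2a_base64, binascii.a2b_base64),
    "uu": (binascii.b2a_uu, binascii.a2b_uu),
    "hex": (binascii.b2a_hex, binascii.a2b_hex),
}

encoded = {}
for name, (encode, decode) in pairs.items():
    data = encode(text)
    assert decode(data) == text      # every round trip restores the input bytes
    encoded[name] = data
    print(text, "<=>", repr(data))
```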

5. File Formats

5.1. Overview

This chapter describes modules used to handle a number of different file formats.

5.1.1. Markup Languages

Python provides several modules for handling the Extensible Markup Language (XML) and the Hypertext Markup Language (HTML). Python also provides support for the Standard Generalized Markup Language (SGML).

All these formats share the same basic structure, since both HTML and XML are derived from SGML. Each document consists of start tags, end tags, text (also called character data), and entity references:

<document name="sample.xml">
    <header>This is a header</header>
    <body>This is the body text.  The text can contain
    plain text (&quot;character data&quot;), tags, and
    entities.
    </body>
</document>

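In this example, <document>, <header>, and <body> are start tags. Each start tag has a matching end tag, which is marked with a slash ("/"). Start tags can also contain one or more attributes, such as the name attribute in this example.

Everything between a start tag and its matching end tag is called an element. Here, the document element contains two elements, header and body.

&quot; is a character entity. Character entities represent reserved characters in text sections and are introduced with an ampersand (&). Here the entity stands for a quotation mark. Other common character entities are "&lt;" (<) and "&gt;" (>).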

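Although XML, HTML, and SGML use the same building blocks, there are some differences between them. In XML, every element must have both a start tag and an end tag, and all tags must be properly nested (the document must be well-formed). XML is also case-sensitive, so <document> and <Document> are two different element types.

HTML is much more flexible, and HTML parsers usually fill in missing tags automatically. For example, if you begin a new paragraph with a <P> tag without closing the previous one, the parser automatically adds a </P> end tag. Unlike XML, HTML is case-insensitive. There is another difference as well: XML lets you define any elements you need, while HTML uses a fixed set of elements defined by the HTML specification.

SGML is even more flexible. You can use your own declaration to define how the source text is converted into an element structure. A DTD (document type definition) can then be used to validate the structure and fill in missing tags. Technically, both HTML and XML are SGML applications. Each has its own SGML declaration, and HTML also has a standard DTD.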

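Python comes with several markup language parsers. Because SGML is the most flexible of these formats, a complete SGML parser would be a large undertaking, so Python's sgmllib is deliberately quite simple. It doesn't process DTDs, but you can subclass it to provide more sophisticated functionality.

Python's HTML support is built on top of the SGML parser. The htmllib parser delegates the actual output formatting to a formatter object. The formatter module contains a few standard formatters.

Python's XML support is more complex. Originally there was only xmllib, a parser similar to sgmllib. The more advanced (and optional) expat module was added later. The latest versions are preparing to deprecate xmllib in favor of the xml package, which serves as the XML toolkit.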

5.1.2. Configuration Files

The ConfigParser module reads simple configuration files, similar to INI files on Windows.
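In Python 3 this module is named configparser. The minimal sketch below reads an inline INI document with made-up [server] and [paths] sections. It also converts one of the values to an integer.

```python
import configparser

INI_TEXT = """
[server]
host = example.org
port = 8080

[paths]
root = /var/www
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)             # parse INI-style text from a string

host = config.get("server", "host")
port = config.getint("server", "port")   # converts "8080" to the int 8080
print(host, port, config.sections())     # example.org 8080 ['server', 'paths']
```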

The netrc module reads .netrc configuration files. The shlex module can be used to read configuration files that use a syntax similar to shell scripts.
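The short Python 3 sketch below illustrates the shell-like syntax that shlex understands. The configuration line is made up for the example. In the default POSIX mode, a quoted string becomes a single token. With comments=True, everything after # is dropped.

```python
import shlex

line = 'name = "fredrik lundh"  # owner of the file'
tokens = shlex.split(line, comments=True)
print(tokens)   # ['name', '=', 'fredrik lundh']

# Turn a "key = value" line into a dictionary entry
key, sep, value = tokens
assert sep == "="
settings = {key: value}
```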

5.1.3. Archive Formats

Python's standard library supports the GZIP and ZIP (2.0 and newer) formats. The gzip and zipfile modules, both built on the zlib module, handle these files.
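Below is a small Python 3 sketch of both modules. It works entirely in memory, so no sample files are needed. The member name sample.txt is an arbitrary choice.

```python
import gzip
import io
import zipfile

payload = b"hello, mrs teal\n" * 100

# gzip: compress and decompress a single byte stream in memory
packed = gzip.compress(payload)
assert gzip.decompress(packed) == payload

# zipfile: write one member into an in-memory archive, then read it back
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", payload)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    names = zf.namelist()
    restored = zf.read("sample.txt")

print(len(payload), len(packed), names)
```

gzip holds a single compressed stream. A ZIP file, by contrast, is an archive of named members, which is why zipfile needs a member name.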

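5.2. The xmllib module

xmllib is deprecated in current versions.

The xmllib module provides a simple XML parser that uses regular expressions to pull the XML data apart, as shown in Example 5-1. The parser performs only basic checks on the document. For example, it checks that there is only one top-level element and that all tags are balanced.

You feed XML data to the xmllib parser in pieces, for example as the data arrives over a network. The parser calls different methods when it encounters start tags, data sections, end tags, and entities.

If you're only interested in a few tags, you can define special start_tag and end_tag methods, where tag is the tag name. The start methods are called with the attributes of the corresponding tag, passed as a dictionary.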

5.2.0.1. Example 5-1. Using the xmllib module to extract information from an element

File: xmllib-example-1.py

import xmllib

class Parser(xmllib.XMLParser):
    # get quotation number

    def __init__(self, file=None):
        xmllib.XMLParser.__init__(self)
        if file:
            self.load(file)

    def load(self, file):
        while 1:
            s = file.read(512)
            if not s:
                break
            self.feed(s)
        self.close()

    def start_quotation(self, attrs):
        print "id =>", attrs.get("id")
        raise EOFError

try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass

id => 031
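xmllib is not part of Python 3. The sketch below is a hedged illustration of the start_tag naming convention, built on xml.parsers.expat. The hypothetical TagDispatcher class looks up start_<tag> and end_<tag> methods on itself and calls them from the expat callbacks. As with xmllib, the attributes arrive as a dictionary. The sample XML is made up, and this is not how xmllib itself was implemented.

```python
from xml.parsers import expat

class TagDispatcher:
    """Hypothetical base class: route expat events to start_<tag>/end_<tag>."""

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def feed(self, data, final=False):
        self._parser.Parse(data, final)    # data may arrive in pieces

    def _start(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)                  # attrs is a dict, as in xmllib

    def _end(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

class QuotationIds(TagDispatcher):
    def __init__(self):
        super().__init__()
        self.ids = []

    def start_quotation(self, attrs):
        self.ids.append(attrs.get("id"))

p = QuotationIds()
p.feed('<quotations><quotation id="031">text</quotation>')
p.feed('<quotation id="032"/></quotations>', final=True)
print(p.ids)   # ['031', '032']
```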

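Example 5-2 shows a simple (and incomplete) rendering engine. The parser maintains an element stack (__tags), which it passes to the renderer together with each text fragment. The renderer looks up the current tag hierarchy in its style cache. If the hierarchy isn't there, the renderer creates a new style description by combining the matching entries from the style sheet.

5.2.0.2. Example 5-2. Using the xmllib module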

File: xmllib-example-2.py

import xmllib
import string, sys

STYLESHEET = {
    # each element can contribute one or more style elements
    "quotation": {"style": "italic"},
    "lang": {"weight": "bold"},
    "name": {"weight": "medium"},
}

class Parser(xmllib.XMLParser):
    # a simple styling engine

    def __init__(self, renderer):
        xmllib.XMLParser.__init__(self)
        self.__data = []
        self.__tags = []
        self.__renderer = renderer

    def load(self, file):
        while 1:
            s = file.read(8192)
            if not s:
                break
            self.feed(s)
        self.close()

    def handle_data(self, data):
        self.__data.append(data)

    def unknown_starttag(self, tag, attrs):
        if self.__data:
            text = string.join(self.__data, "")
            self.__renderer.text(self.__tags, text)
        self.__tags.append(tag)
        self.__data = []

    def unknown_endtag(self, tag):
        self.__tags.pop()
        if self.__data:
            text = string.join(self.__data, "")
            self.__renderer.text(self.__tags, text)
        self.__data = []

class DumbRenderer:

    def __init__(self):
        self.cache = {}

    def text(self, tags, text):
        # render text in the style given by the tag stack
        tags = tuple(tags)
        style = self.cache.get(tags)
        if style is None:
            # figure out a combined style
            style = {}
            for tag in tags:
                s = STYLESHEET.get(tag)
                if s:
                    style.update(s)
            self.cache[tags] = style # update cache
        # write to standard output
        sys.stdout.write("%s =>\n" % style)
        sys.stdout.write("  " + repr(text) + "\n")

## try it out

r = DumbRenderer()
c = Parser(r)
c.load(open("samples/sample.xml"))

{'style': 'italic'} =>
  'I\'ve had a lot of developers come up to me and\012say,  "I haven\'t had this much fun in a long time. It sure  beats\012writing '
{'style': 'italic', 'weight': 'bold'} =>
  'Cobol'
{'style': 'italic'} =>
  '" -- '
{'style': 'italic', 'weight': 'medium'} =>
  'James Gosling'
{'style': 'italic'} =>
  ', on\012'
{'weight': 'bold'} =>
  'Java'
{'style': 'italic'} =>
  '.'

5.3. The xml.parsers.expat module

(Optional) The xml.parsers.expat module is an interface to James Clark's Expat XML parser. Example 5-3 demonstrates this full-featured and fast parser.

5.3.0.1. Example 5-3. Using the xml.parsers.expat module

File: xml-parsers-expat-example-1.py

from xml.parsers import expat

class Parser:

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references

    def start(self, tag, attrs):
        print "START", repr(tag), attrs

    def end(self, tag):
        print "END", repr(tag)

    def data(self, data):
        print "DATA", repr(data)

p = Parser()
p.feed("<tag>data</tag>")
p.close()

START u'tag' {}
DATA u'data'
END u'tag'

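Note that the parser returns Unicode strings, even if you pass it ordinary text. By default, the parser interprets the source text as UTF-8. To use another encoding, make sure the XML file contains an encoding declaration, as shown in Example 5-4.

5.3.0.2. Example 5-4. Using the xml.parsers.expat module to read ISO Latin-1 text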

File: xml-parsers-expat-example-2.py

from xml.parsers import expat

class Parser:

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data

    def feed(self, data):
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1) # end of data
        del self._parser # get rid of circular references

    def start(self, tag, attrs):
        print "START", repr(tag), attrs

    def end(self, tag):
        print "END", repr(tag)

    def data(self, data):
        print "DATA", repr(data)

p = Parser()
p.feed("""\
<?xml version='1.0' encoding='iso-8859-1'?>
<author>
<name>fredrik lundh</name>
<city>linköping</city>
</author>
""")
p.close()

START u'author' {}
DATA u'\012'
START u'name' {}
DATA u'fredrik lundh'
END u'name'
DATA u'\012'
START u'city' {}
DATA u'link\366ping'
END u'city'
DATA u'\012'
END u'author'
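In Python 3 the same parser accepts bytes. Its handlers receive ordinary str objects, because Python 3 strings are Unicode. The minimal sketch below feeds Latin-1 encoded bytes with an encoding declaration and collects the character data. The character data handler may be called more than once for a single text run, which is why the pieces are joined at the end.

```python
from xml.parsers import expat

# Latin-1 bytes: the o-umlaut is encoded as the single byte 0xF6
doc = ("<?xml version='1.0' encoding='iso-8859-1'?>"
       "<city>link\u00f6ping</city>").encode("iso-8859-1")

pieces = []
parser = expat.ParserCreate()
parser.CharacterDataHandler = pieces.append   # handlers receive str objects
parser.Parse(doc, True)

city = "".join(pieces)
print(repr(city))   # 'linköping'
```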

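5.4. The sgmllib module

The sgmllib module provides a basic SGML parser. It works much like the xmllib parser, but it is less restrictive (and less complete), as shown in Example 5-5.

As in xmllib, this parser calls internal methods when it encounters start tags, data sections, end tags, and entities. If you're only interested in a few tags, you can define special methods for them.

5.4.0.1. Example 5-5. Using the sgmllib module to extract the title element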

File: sgmllib-example-1.py

import sgmllib
import string

class FoundTitle(Exception):
    pass

class ExtractTitle(sgmllib.SGMLParser):

    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.title = self.data = None

    def handle_data(self, data):
        if self.data is not None:
            self.data.append(data)

    def start_title(self, attrs):
        self.data = []

    def end_title(self):
        self.title = string.join(self.data, "")
        raise FoundTitle # abort parsing!

def extract(file):
    # extract title from an HTML/SGML stream
    p = ExtractTitle()
    try:
        while 1:
            # read small chunks
            s = file.read(512)
            if not s:
                break
            p.feed(s)
        p.close()
    except FoundTitle:
        return p.title
    return None

## try it out

print "html", "=>", extract(open("samples/sample.htm"))
print "sgml", "=>", extract(open("samples/sample.sgm"))

html => A Title.
sgml => Quotations
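sgmllib was removed in Python 3. As a rough modern counterpart to Example 5-5 (not a port), html.parser.HTMLParser can collect the title text. It calls handle_starttag, handle_data, and handle_endtag, passing tag names in lowercase. Unlike the original, this sketch does not abort parsing once the title is found. The names TitleParser and extract_title are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Hypothetical helper: collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._chunks = None    # a list while inside <title>, otherwise None
        self.title = None

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased, so <TITLE> matches too
        if tag == "title" and self.title is None:
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._chunks is not None:
            self.title = "".join(self._chunks)
            self._chunks = None

def extract_title(html_text):
    p = TitleParser()
    p.feed(html_text)
    p.close()
    return p.title

print(extract_title("<html><head><TITLE>A Title.</TITLE></head><body></body></html>"))
```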

To handle all tags, override the unknown_starttag and unknown_endtag methods, as shown in Example 5-6.

5.4.0.2. Example 5-6. Using the sgmllib module to format an SGML document

File: sgmllib-example-2.py

import sgmllib
import cgi, sys

class PrettyPrinter(sgmllib.SGMLParser):
    # A simple SGML pretty printer

    def __init__(self):
        # initialize base class
        sgmllib.SGMLParser.__init__(self)
        self.flag = 0

    def newline(self):
        # force newline, if necessary
        if self.flag:
            sys.stdout.write("\n")
        self.flag = 0

    def unknown_starttag(self, tag, attrs):
        # called for each start tag

        # the attrs argument is a list of (attr, value)
        # tuples. convert it to a string.
        text = ""
        for attr, value in attrs:
            text = text + " %s='%s'" % (attr, cgi.escape(value))

        self.newline()
        sys.stdout.write("<%s%s>\n" % (tag, text))

    def handle_data(self, text):
        # called for each text section
        sys.stdout.write(text)
        self.flag = (text[-1:] != "\n")

    def handle_entityref(self, text):
        # called for each entity
        sys.stdout.write("&%s;" % text)

    def unknown_endtag(self, tag):
        # called for each end tag; write it back with its slash
        self.newline()
        sys.stdout.write("</%s>" % tag)

## try it out

file = open("samples/sample.sgm")

p = PrettyPrinter()
p.feed(file.read())
p.close()

<chapter>
<title>
Quotations
</title>
<epigraph>
<attribution>
eff-bot, June 1997
</attribution>
<para>
<quote>
Nobody expects the Spanish Inquisition! Amongst
our weaponry are such diverse elements as fear, surprise,
ruth
