一、模块介绍:

1、模块定义

用来从逻辑上组织python代码(变量、函数、类、逻辑:实现一个功能),本质上就是以 .py 结尾的 python 文件

分类:内置模块、开源模块、自定义模块

2、导入模块

本质:导入模块的本质就是把python文件解释一遍;导入包的本质就是把包目录下面的 __init__.py 文件运行一遍
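可以用一个极简的例子来验证"导入即执行"这一点(my_module.py 是这里假设新建的一个模块,名字随意):

#当前文件my_module.py(假设的示例模块)
print("my_module 被加载了")       #模块顶层代码,被导入时会执行一次

NAME = "lzl"
def hello():
    print("hello from my_module")

#当前文件test.py,与my_module.py同目录
import my_module                 #第一次导入:执行my_module的顶层代码,打印"my_module 被加载了"
import my_module                 #再次导入:模块已缓存在sys.modules中,不会重复执行
print(my_module.NAME)            #lzl
my_module.hello()                #hello from my_module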

同目录下模块的导入

#同级目录间import
import module_name              #直接导入模块
import module_name,module2_name     #导入多个模块     使用:模块名.函数名
from module_name import *           #导入模块中所有函数和变量等。。不建议使用
from module_name import m1,m2,m3        #只导入模块中函数m1,m2,m3  使用:直接使用m1,m2,m3即可
from module_name import m1 as m   #导入module_name模块中m1函数并且重新赋值给m  使用:直接输入m即可

不同目录下模块的导入

#不同目录之间import  当前文件main.py
#目录结构
# ├── Credit_card
# │
# ├── core
# │   ├── __init__.py
# │   └── main.py  # 当前文件
# ├── conf
# │   ├── __init__.py
# │   └── setting.py
# │   └── lzl.py

import sys,os
creditcard_path=os.path.dirname(os.path.dirname(os.path.abspath(__file__))) #当前目录的上上级目录绝对路径,即Creditcard目录
sys.path.insert(0,creditcard_path)    #把Creditcard目录加入到系统路径
print(sys.path)                      #打印系统环境路径
#['C:\\Users\\L\\PycharmProjects\\s14\\Day5\\Creditcard,.......]

#import settings.py                   #无法直接import
#ImportError: No module named 'settings'

from conf import settings           #from目录import模块
settings.set()                        #执行settings下的函数
#in the settings

不同目录下模块连环导入 

不同目录下多个模块之间相互导入。为什么要引入这个概念?虽然老师没讲,但这个很重要,当时做atm程序时在这里踩过一个很大的坑........

目录结构:

├── Credit_card
│
├── core
│   ├── __init__.py
│   └── main.py      # 当前文件
├── conf
│   ├── __init__.py
│   └── setting.py
│   └── lzl.py

目录结构

conf目录下的文件:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -Author-Lian

#当前文件lzl.py
def name():
    print("name is lzl")

lzl.py

#当前文件settings,调用lzl.py模块
import lzl               #导入模块lzl

def set():
    print("in the settings")
    lzl.name()           #运行lzl模块下的函数

set()                    #执行函数set
#in the settings
#name is lzl

settings.py

此时执行settings.py文件没有任何问题,就是同一目录下的模块之间的导入。关键来了,此刻core目录下的main.py导入模块settings会出现什么状况呢??!

core目录下的文件:

#不同目录之间连环import 当前文件main.py
import sys,os
creditcard_path=os.path.dirname(os.path.dirname(os.path.abspath(__file__))) #当前目录的上上级目录绝对路径,即Creditcard目录
sys.path.insert(0,creditcard_path)    #把Creditcard目录加入到系统路径

from conf import settings
settings.set()                        #执行settings下的函数
#    import lzl              #导入模块lzl
#ImportError: No module named 'lzl'

可以看到直接报错:ImportError: No module named 'lzl'。想想为什么会报错呢?!刚才已经说到了,导入模块的本质就是把模块里的内容执行一遍,当main.py导入settings模块时,也会把settings里的内容执行一遍,即执行import lzl;但是对于main.py来说,它所在的路径下不能直接import lzl,所以就出现了刚才的报错。那有什么办法可以解决?!

对conf目录下settings.py文件进行修改

#当前文件settings,调用lzl.py模块
from . import lzl        #通过相对路径导入模块lzl

def set():
    print("in the settings")
    lzl.name()           #运行lzl模块下的函数

set()                    #执行函数set
#in the settings
#name is lzl

settings.py

此时执行main.py文件

#不同目录之间连环import 当前文件main.py
import sys,os
creditcard_path=os.path.dirname(os.path.dirname(os.path.abspath(__file__))) #当前目录的上上级目录绝对路径,即Creditcard目录
sys.path.insert(0,creditcard_path)    #把Creditcard目录加入到系统路径

from conf import settings
settings.set()                        #执行settings下的函数
# in the settings
# name is lzl
# in the settings
# name is lzl

没有任何报错。我们只修改了settings中导入lzl模块的方式,结果就完全不同;此时的 from . import lzl 用的是相对导入(相对路径),这就是相对导入的优点所在
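顺带补充一下相对导入的几种常见写法(下面只是示意,目录沿用上面的例子;注意:在 Python 3 中,使用了相对导入的模块只能作为包的一部分被导入,不能再直接当脚本运行):

#当前文件conf/settings.py  相对导入写法示意
from . import lzl            #导入同一个包(同目录)下的lzl模块
from .lzl import name        #只导入lzl模块中的name函数
# from .. import core        #导入上一级包中的内容,前提是上一级目录也是一个包(有__init__.py),且本包是通过完整包路径被导入的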

不同目录多个模块相互导入,用相对路径

目录结构:

Day5
├── Credit_card
├── README.md
├── core
│   ├── __init__.py
│   └── main.py
├── conf
│   ├── __init__.py
│   └── setting.py
│   └── lzl.py

目录结构

conf目录下的文件:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -Author-Lian

#当前文件lzl.py  相对路径
def name():
    print("name is lzl")

lzl.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -Author-Lian

#当前文件settings,调用lzl.py模块  相对路径
from . import lzl        #通过相对路径导入模块lzl

def set():
    print("in the settings")
    lzl.name()           #运行lzl模块下的函数

set()                    #执行函数set
#in the settings
#name is lzl

settings.py

core目录下的文件:

#不同目录之间连环import 当前文件main.py  相对路径

from Day5.Credit_card.conf import settings
settings.set()                        #执行settings下的函数
# in the settings
# name is lzl
# in the settings
# name is lzl

lzl.py以及settings.py文件未变,main.py文件去掉了繁杂的sys.path添加过程,直接执行 from Day5.Credit_card.conf import settings,按从项目顶层目录写起的完整包路径来导入,更加简洁方便!(前提是项目根目录已经在 sys.path 中,例如 PyCharm 运行时会自动把项目根目录加进去)

二、内置模块

1、time和datetime模块

时间相关的操作,时间有三种表示方式:

  • 时间戳               1970年1月1日 00:00:00 之后经过的秒数,即:time.time()
  • 格式化的字符串    如 2014-11-11 11:11,即:time.strftime('%Y-%m-%d')
  • 结构化时间          一个元组,包含了年、月、日、时、分、秒、星期等,类型为 time.struct_time,即:time.localtime()

time模块:

#time模块
import time

print(time.time())              #时间戳
#1472037866.0750718

print(time.localtime())        #结构化时间
#time.struct_time(tm_year=2016, tm_mon=8, tm_mday=25, tm_hour=8, tm_min=44, tm_sec=46, tm_wday=3, tm_yday=238, tm_isdst=0)

print(time.strftime('%Y-%m-%d'))    #格式化的字符串
#2016-08-25
print(time.strftime('%Y-%m-%d',time.localtime()))
#2016-08-25

print(time.gmtime())            #结构化时间
#time.struct_time(tm_year=2016, tm_mon=8, tm_mday=25, tm_hour=3, tm_min=8, tm_sec=48, tm_wday=3, tm_yday=238, tm_isdst=0)

print(time.strptime('2014-11-11', '%Y-%m-%d'))  #结构化时间
#time.struct_time(tm_year=2014, tm_mon=11, tm_mday=11, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=315, tm_isdst=-1)

print(time.asctime())
#Thu Aug 25 11:15:10 2016
print(time.asctime(time.localtime()))
#Thu Aug 25 11:15:10 2016
print(time.ctime(time.time()))
#Thu Aug 25 11:15:10 2016

结构化时间:time.struct_time 共有 9 个字段:tm_year(年)、tm_mon(月)、tm_mday(日)、tm_hour(时)、tm_min(分)、tm_sec(秒)、tm_wday(星期,0~6,0 为周一)、tm_yday(一年中的第几天)、tm_isdst(是否夏令时)

时间戳、格式化字符串、结构化时间相互转换:
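下面用代码简单示意一下三种表示方式之间的转换(输出以运行时的时间为准):

#三种时间表示的相互转换
import time

t = time.time()                                   #时间戳
st = time.localtime(t)                            #时间戳 -> 结构化时间(本地时区)
s = time.strftime('%Y-%m-%d %H:%M:%S', st)        #结构化时间 -> 格式化字符串
st2 = time.strptime(s, '%Y-%m-%d %H:%M:%S')       #格式化字符串 -> 结构化时间
t2 = time.mktime(st2)                             #结构化时间 -> 时间戳
print(t, s, t2)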

datetime:

import datetime

print(datetime.date)    #表示日期的类。常用的属性有year, month, day
#<class 'datetime.date'>
print(datetime.time)    #表示时间的类。常用的属性有hour, minute, second, microsecond
#<class 'datetime.time'>
print(datetime.datetime)        #表示日期时间
#<class 'datetime.datetime'>
print(datetime.timedelta)       #表示时间间隔,即两个时间点之间的长度
#<class 'datetime.timedelta'>

print(datetime.datetime.now())
#2016-08-25 14:21:07.722285
print(datetime.datetime.now() - datetime.timedelta(days=5))
#2016-08-20 14:21:28.275460
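datetime 配合 timedelta 做时间运算非常方便,下面是几个常用操作的示意:

import datetime

now = datetime.datetime.now()
print(now + datetime.timedelta(hours=3))          #当前时间 +3 小时
print(now - datetime.timedelta(days=3))           #当前时间 -3 天
print(now.replace(minute=0, second=0))            #把分钟、秒替换成 0
print(now.strftime('%Y-%m-%d %H:%M:%S'))          #datetime -> 格式化字符串
print(datetime.datetime.strptime('2017-03-26 03:12', '%Y-%m-%d %H:%M'))  #字符串 -> datetime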

更多 >> https://zhuanlan.zhihu.com/p/23679915?utm_source=itdadao&utm_medium=referral

#time 时间比较
import time

str1 = '2017-03-26 3:12'
str2 = '2017-05-26 13:12'
date1 = time.strptime(str1, '%Y-%m-%d %H:%M')
date2 = time.strptime(str2, '%Y-%m-%d %H:%M')
if time.mktime(date1) <= time.time() <= time.mktime(date2):
    print('cccccccc')

#datetime 时间比较
import datetime

str1 = '2017-03-26 3:12'
str2 = '2017-05-26 13:12'
date1 = datetime.datetime.strptime(str1, '%Y-%m-%d %H:%M')
date2 = datetime.datetime.strptime(str2, '%Y-%m-%d %H:%M')
datenow = datetime.datetime.now()
if datenow < date1:
    print('dddddd')

时间比较

2、random模块

生成随机数:

#random随机数模块
import random

print(random.random())      #生成0到1的随机数
#0.7308387398872364

print(random.randint(1,3))  #生成1-3随机数
#3

print(random.randrange(1,3)) #生成1-2随机数,不包含3
#2

print(random.choice("hello"))  #随机选取字符串中的一个字符
#e

print(random.sample("hello",2))     #随机选取指定个数的字符
#['l', 'h']

items = [1,2,3,4,5,6,7]
random.shuffle(items)
print(items)
#[2, 3, 1, 6, 4, 7, 5]

验证码:

import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)
    if current != i:
        temp = chr(random.randint(65,90))    #随机一个大写字母
    else:
        temp = random.randint(0,9)           #随机一个数字
    checkcode += str(temp)
print(checkcode)
#51T6
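上面的写法通过位置是否相等来决定取字母还是数字,也可以换一种更直接的思路:先准备好候选字符集,再随机取若干位(下面只是一个等价的示意写法):

import random
import string

source = string.ascii_uppercase + string.digits          #候选字符:大写字母 + 数字
checkcode = ''.join(random.choice(source) for i in range(4))
print(checkcode)
#输出类似 R7K2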

3、os模块

用于提供系统级别的操作

#os模块
import os

os.getcwd() #获取当前工作目录,即当前python脚本工作的目录路径
os.chdir("dirname")  #改变当前脚本工作目录;相当于shell下cd
os.curdir  #返回当前目录: ('.')
os.pardir  #获取当前目录的父目录字符串名:('..')
os.makedirs('dirname1/dirname2')    #可生成多层递归目录
os.removedirs('dirname1')   # 若目录为空,则删除,并递归到上一级目录,如若也为空,则删除,依此类推
os.mkdir('dirname')   # 生成单级目录;相当于shell中mkdir dirname
os.rmdir('dirname')    #删除单级空目录,若目录不为空则无法删除,报错;相当于shell中rmdir dirname
os.listdir('dirname')    #列出指定目录下的所有文件和子目录,包括隐藏文件,并以列表方式打印
os.remove() # 删除一个文件
os.rename("oldname","newname") # 重命名文件/目录
os.stat('path/filename') # 获取文件/目录信息
os.sep    #输出操作系统特定的路径分隔符,win下为"\\",Linux下为"/"
os.linesep    #输出当前平台使用的行终止符,win下为"\r\n",Linux下为"\n"
os.pathsep    #输出用于分割文件路径的字符串
os.name    #输出字符串指示当前使用平台。win->'nt'; Linux->'posix'
os.system("bash command")  #运行shell命令,直接显示输出,返回值为命令的退出状态码
os.environ  #获取系统环境变量
os.path.abspath(path)  #返回path规范化的绝对路径
os.path.split(path)  #将path分割成目录和文件名二元组返回
os.path.dirname(path) # 返回path的目录。其实就是os.path.split(path)的第一个元素
os.path.basename(path) # 返回path最后的文件名。如果path以/或\结尾,那么就会返回空值。即os.path.split(path)的第二个元素
os.path.exists(path)  #如果path存在,返回True;如果path不存在,返回False
os.path.isabs(path)  #如果path是绝对路径,返回True
os.path.isfile(path)  #如果path是一个存在的文件,返回True。否则返回False
os.path.isdir(path)  #如果path是一个存在的目录,则返回True。否则返回False
os.path.join(path1[, path2[, ...]]) # 将多个路径组合后返回,第一个绝对路径之前的参数将被忽略
os.path.getatime(path)  #返回path所指向的文件或者目录的最后存取时间
os.path.getmtime(path)  #返回path所指向的文件或者目录的最后修改时间
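把上面几个常用函数串起来用一下(仅为示意,test1、test2 这些目录名是随便起的):

import os

base = os.getcwd()                                #当前工作目录
test_dir = os.path.join(base, 'test1', 'test2')   #拼接出多级目录路径
if not os.path.exists(test_dir):                  #不存在才创建
    os.makedirs(test_dir)                         #递归创建test1/test2
print(os.path.isdir(test_dir))                    #True
print(os.listdir(os.path.join(base, 'test1')))    #['test2']
os.removedirs(test_dir)                           #目录为空,逐级删除test2、test1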

4、sys模块

用于提供对解释器相关的操作

#sys模块
import sys

sys.argv           #命令行参数List,第一个元素是程序本身路径
sys.exit(n)        #退出程序,正常退出时exit(0)
sys.version       # 获取Python解释程序的版本信息
sys.maxsize        #最大的int值(Python 2 中为 sys.maxint)
sys.path           #返回模块的搜索路径,初始化时使用PYTHONPATH环境变量的值
sys.platform      #返回操作系统平台名称
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]
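sys.argv 在写命令行脚本时最常用,下面是一个简单示意(文件名 test_argv.py 是假设的):

#当前文件test_argv.py
import sys

print(sys.argv)              #完整的参数列表,第一个元素是脚本本身的路径
if len(sys.argv) < 2:
    print('Usage: python test_argv.py <name>')
    sys.exit(1)              #参数不够,非正常退出
print('hello', sys.argv[1])

#执行:python test_argv.py lzl
#['test_argv.py', 'lzl']
#hello lzl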

详情 >> http://www.cnblogs.com/lianzhilei/p/5724847.html

5、shutil模块

高级的 文件、文件夹、压缩包 处理模块

① shutil.copyfileobj(fsrc, fdst[, length]) 将文件内容拷贝到另一个文件中,可以只拷贝部分内容

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

shutil.copyfileobj

#shutil.copyfileobj 文件内容拷贝
import shutil

f1 = open("fsrc",encoding="utf-8")
f2 = open("fdst","w",encoding="utf-8")
shutil.copyfileobj(f1,f2)             #把文件f1里的内容拷贝到f2当中

② shutil.copyfile(src, dst) 拷贝文件内容

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

shutil.copyfile

#shutil.copyfile 文件拷贝
import shutil

shutil.copyfile("f1","f2")
#把文件f1里的内容拷贝到f2当中

③ shutil.copymode(src, dst) 仅拷贝权限。内容、组、用户均不变

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

shutil.copymode

④ shutil.copystat(src, dst) 拷贝状态的信息,包括:mode bits, atime, mtime, flags

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copystat

⑤ shutil.copy(src, dst) 拷贝文件和权限

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy

⑥ shutil.copy2(src, dst) 拷贝文件和状态信息

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)

shutil.copy2
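copymode、copystat、copy、copy2 的用法都和 copyfile 类似,直接传源文件和目标文件的路径即可(示意,假设 f1 已存在):

import shutil

shutil.copymode("f1", "f2")      #只把f1的权限拷贝给已存在的f2,内容不变
shutil.copystat("f1", "f2")      #把f1的mode bits、atime、mtime等状态信息拷贝给f2
shutil.copy("f1", "f3")          #拷贝内容和权限
shutil.copy2("f1", "f4")         #拷贝内容和状态信息,相当于cp -p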

⑦ shutil.copytree(src, dst, symlinks=False, ignore=None) 递归的去拷贝文件 拷贝多层目录

def ignore_patterns(*patterns):"""Function that can be used as copytree() ignore parameter.Patterns is a sequence of glob-style patternsthat are used to exclude files"""def_ignore_patterns(path, names):ignored_names=[]for pattern inpatterns:ignored_names.extend(fnmatch.filter(names, pattern))returnset(ignored_names)return_ignore_patternsdef copytree(src, dst, symlinks=False, ignore=None):"""Recursively copy a directory tree using copy2().The destination directory must not already exist.If exception(s) occur, an Error is raised with a list of reasons.If the optional symlinks flag is true, symbolic links in thesource tree result in symbolic links in the destination tree; ifit is false, the contents of the files pointed to by symboliclinks are copied.The optional ignore argument is a callable. If given, itis called with the `src` parameter, which is the directorybeing visited by copytree(), and `names` which is the list of`src` contents, as returned by os.listdir():callable(src, names) -> ignored_namesSince copytree() is called recursively, the callable will becalled once for each directory that is copied. It returns alist of names relative to the `src` directory that shouldnot be copied.XXX Consider this example code rather than the ultimate tool."""names=os.listdir(src)if ignore is notNone:ignored_names=ignore(src, names)else:ignored_names=set()os.makedirs(dst)errors=[]for name innames:if name inignored_names:continuesrcname=os.path.join(src, name)dstname=os.path.join(dst, name)try:if symlinks andos.path.islink(srcname):linkto=os.readlink(srcname)os.symlink(linkto, dstname)elifos.path.isdir(srcname):copytree(srcname, dstname, symlinks, ignore)else:#Will raise a SpecialFileError for unsupported file types
copy2(srcname, dstname)#catch the Error from the recursive copytree so that we can#continue with other filesexceptError, err:errors.extend(err.args[0])exceptEnvironmentError, why:errors.append((srcname, dstname, str(why)))try:copystat(src, dst)exceptOSError, why:if WindowsError is not None andisinstance(why, WindowsError):#Copying file access times may fail on Windowspasselse:errors.append((src, dst, str(why)))iferrors:raise Error, errors

shutil.copytree

⑧ shutil.rmtree(path[, ignore_errors[, onerror]]) 递归的去删除文件

def rmtree(path, ignore_errors=False, οnerrοr=None):"""Recursively delete a directory tree.If ignore_errors is set, errors are ignored; otherwise, if onerroris set, it is called to handle the error with arguments (func,path, exc_info) where func is os.listdir, os.remove, or os.rmdir;path is the argument to that function that caused it to fail; andexc_info is a tuple returned by sys.exc_info().  If ignore_errorsis false and onerror is None, an exception is raised."""ifignore_errors:def onerror(*args):passelif onerror isNone:def onerror(*args):raisetry:ifos.path.islink(path):#symlinks to directories are forbidden, see bug #1669raise OSError("Cannot call rmtree on a symbolic link")exceptOSError:onerror(os.path.islink, path, sys.exc_info())#can't continue even if onerror hook returnsreturnnames=[]try:names=os.listdir(path)exceptos.error, err:onerror(os.listdir, path, sys.exc_info())for name innames:fullname=os.path.join(path, name)try:mode=os.lstat(fullname).st_modeexceptos.error:mode=0ifstat.S_ISDIR(mode):rmtree(fullname, ignore_errors, onerror)else:try:os.remove(fullname)exceptos.error, err:onerror(os.remove, fullname, sys.exc_info())try:os.rmdir(path)exceptos.error:onerror(os.rmdir, path, sys.exc_info())

shutil.rmtree

⑨ shutil.move(src, dst) 递归的去移动文件

defmove(src, dst):"""Recursively move a file or directory to another location. This issimilar to the Unix "mv" command.If the destination is a directory or a symlink to a directory, the sourceis moved inside the directory. The destination path must not alreadyexist.If the destination already exists but is not a directory, it may beoverwritten depending on os.rename() semantics.If the destination is on our current filesystem, then rename() is used.Otherwise, src is copied to the destination and then removed.A lot more could be done here...  A look at a mv.c shows a lot ofthe issues this implementation glosses over."""real_dst=dstifos.path.isdir(dst):if_samefile(src, dst):#We might be on a case insensitive filesystem,#perform the rename anyway.
os.rename(src, dst)returnreal_dst=os.path.join(dst, _basename(src))ifos.path.exists(real_dst):raise Error, "Destination path '%s' already exists" %real_dsttry:os.rename(src, real_dst)exceptOSError:ifos.path.isdir(src):if_destinsrc(src, dst):raise Error, "Cannot move a directory '%s' into itself '%s'." %(src, dst)copytree(src, real_dst, symlinks=True)rmtree(src)else:copy2(src, real_dst)os.unlink(src)

shutil.move
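这三个目录级别的操作,用法示意如下(folder1、folder2 这些目录名是假设的):

import shutil

shutil.copytree("folder1", "folder2", ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))
                                  #递归拷贝folder1到folder2(folder2必须不存在),忽略.pyc和tmp开头的文件
shutil.rmtree("folder2")          #递归删除整个folder2目录
shutil.move("folder1", "folder3") #把folder1移动(重命名)为folder3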

⑩ shutil.make_archive(base_name, format,...) 创建压缩包并返回文件路径,例如:zip、tar

  • base_name: 压缩包的文件名,也可以是压缩包的路径。只是文件名时,则保存至当前目录,否则保存至指定路径,

        如:www                        =>保存至当前路径

        如:/Users/wupeiqi/www =>保存至/Users/wupeiqi/

  • format: 压缩包种类,“zip”, “tar”, “bztar”,“gztar”
  • root_dir: 要压缩的文件夹路径(默认当前目录)
  • owner: 用户,默认当前用户
  • group: 组,默认当前组
  • logger: 用于记录日志,通常是logging.Logger对象
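结合上面的参数,一个简单的使用示意如下(路径是假设的):

import shutil

#把 /data/www 目录打包成当前目录下的 www.zip
ret = shutil.make_archive("www", 'zip', root_dir='/data/www')
print(ret)                        #返回压缩包的完整路径

#把 /data/www 目录打包成 /tmp/www.tar.gz
shutil.make_archive("/tmp/www", 'gztar', root_dir='/data/www')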

def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,dry_run=0, owner=None, group=None, logger=None):"""Create an archive file (eg. zip or tar).'base_name' is the name of the file to create, minus any format-specificextension; 'format' is the archive format: one of "zip", "tar", "bztar"or "gztar".'root_dir' is a directory that will be the root directory of thearchive; ie. we typically chdir into 'root_dir' before creating thearchive.  'base_dir' is the directory where we start archiving from;ie. 'base_dir' will be the common prefix of all files anddirectories in the archive.  'root_dir' and 'base_dir' both defaultto the current directory.  Returns the name of the archive file.'owner' and 'group' are used when creating a tar archive. By default,uses the current owner and group."""save_cwd=os.getcwd()if root_dir is notNone:if logger is notNone:logger.debug("changing into '%s'", root_dir)base_name=os.path.abspath(base_name)if notdry_run:os.chdir(root_dir)if base_dir isNone:base_dir=os.curdirkwargs= {'dry_run': dry_run, 'logger': logger}try:format_info=_ARCHIVE_FORMATS[format]exceptKeyError:raise ValueError, "unknown archive format '%s'" %formatfunc=format_info[0]for arg, val in format_info[1]:kwargs[arg]=valif format != 'zip':kwargs['owner'] =ownerkwargs['group'] =grouptry:filename= func(base_name, base_dir, **kwargs)finally:if root_dir is notNone:if logger is notNone:logger.debug("changing back to '%s'", save_cwd)os.chdir(save_cwd)return filename

源码

shutil 对压缩包的处理是调用 ZipFile 和 TarFile 两个模块来进行的,详细:

import zipfile

#压缩
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

#解压
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

zipfile 压缩解压
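zipfile.ZipFile 也支持 with 语法,并且可以查看、单独解压压缩包里的文件,简单示意一下:

import zipfile

with zipfile.ZipFile('laxi.zip', 'r') as z:
    print(z.namelist())          #查看压缩包里的文件列表,如 ['a.log', 'data.data']
    z.extract('a.log')           #只解压其中的某一个文件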

import tarfile

#压缩
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

#解压
tar = tarfile.open('your.tar','r')
tar.extractall()            #可设置解压地址
tar.close()

tarfile 压缩解压

classZipFile(object):"""Class with methods to open, read, write, close, list zip files.z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)file: Either the path to the file, or a file-like object.If it is a path, the file will be opened and closed by ZipFile.mode: The mode can be either read "r", write "w" or append "a".compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).allowZip64: if True ZipFile will create files with ZIP64 extensions whenneeded, otherwise it will raise an exception when this wouldbe necessary."""fp= None                   #Set here since __del__ checks itdef __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):"""Open the ZIP file with mode read "r", write "w" or append "a"."""if mode not in ("r", "w", "a"):raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')if compression ==ZIP_STORED:passelif compression ==ZIP_DEFLATED:if notzlib:raiseRuntimeError,\"Compression requires the (missing) zlib module"else:raise RuntimeError, "That compression method is not supported"self._allowZip64=allowZip64self._didModify=Falseself.debug= 0  #Level of printing: 0 through 3self.NameToInfo = {}    #Find file info given nameself.filelist = []      #List of ZipInfo instances for archiveself.compression = compression  #Method of compressionself.mode = key = mode.replace('b', '')[0]self.pwd=Noneself._comment= ''#Check if we were passed a file-like objectifisinstance(file, basestring):self._filePassed=0self.filename=filemodeDict= {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}try:self.fp=open(file, modeDict[mode])exceptIOError:if mode == 'a':mode= key = 'w'self.fp=open(file, modeDict[mode])else:raiseelse:self._filePassed= 1self.fp=fileself.filename= getattr(file, 'name', None)try:if key == 'r':self._RealGetContents()elif key == 'w':#set the modified flag so central directory gets written#even if no files are added to the archiveself._didModify =Trueelif key == 'a':try:#See if file is a zip file
self._RealGetContents()#seek to start of directory and overwrite
self.fp.seek(self.start_dir, 0)exceptBadZipfile:#file is not a zip file, just appendself.fp.seek(0, 2)#set the modified flag so central directory gets written#even if no files are added to the archiveself._didModify =Trueelse:raise RuntimeError('Mode must be "r", "w" or "a"')except:fp=self.fpself.fp=Noneif notself._filePassed:fp.close()raisedef __enter__(self):returnselfdef __exit__(self, type, value, traceback):self.close()def_RealGetContents(self):"""Read in the table of contents for the ZIP file."""fp=self.fptry:endrec=_EndRecData(fp)exceptIOError:raise BadZipfile("File is not a zip file")if notendrec:raise BadZipfile, "File is not a zip file"if self.debug > 1:printendrecsize_cd= endrec[_ECD_SIZE]             #bytes in central directoryoffset_cd = endrec[_ECD_OFFSET]         #offset of central directoryself._comment = endrec[_ECD_COMMENT]    #archive comment#"concat" is zero, unless zip was concatenated to another fileconcat = endrec[_ECD_LOCATION] - size_cd -offset_cdif endrec[_ECD_SIGNATURE] ==stringEndArchive64:#If Zip64 extension structures are present, account for themconcat -= (sizeEndCentDir64 +sizeEndCentDir64Locator)if self.debug > 2:inferred= concat +offset_cdprint "given, inferred, offset", offset_cd, inferred, concat#self.start_dir:  Position of start of central directoryself.start_dir = offset_cd +concatfp.seek(self.start_dir, 0)data=fp.read(size_cd)fp=cStringIO.StringIO(data)total=0while total <size_cd:centdir=fp.read(sizeCentralDir)if len(centdir) !=sizeCentralDir:raise BadZipfile("Truncated central directory")centdir=struct.unpack(structCentralDir, centdir)if centdir[_CD_SIGNATURE] !=stringCentralDir:raise BadZipfile("Bad magic number for central directory")if self.debug > 2:printcentdirfilename=fp.read(centdir[_CD_FILENAME_LENGTH])#Create ZipInfo instance to store file informationx =ZipInfo(filename)x.extra=fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])x.comment=fp.read(centdir[_CD_COMMENT_LENGTH])x.header_offset=centdir[_CD_LOCAL_HEADER_OFFSET](x.create_version, x.create_system, x.extract_version, x.reserved,x.flag_bits, x.compress_type, t, d,x.CRC, x.compress_size, x.file_size)= centdir[1:12]x.volume, x.internal_attr, x.external_attr= centdir[15:18]#Convert date/time code to (year, month, day, hour, min, sec)x._raw_time =tx.date_time= ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,t>>11, (t>>5)&0x3F, (t&0x1F) * 2)x._decodeExtra()x.header_offset= x.header_offset +concatx.filename=x._decodeFilename()self.filelist.append(x)self.NameToInfo[x.filename]=x#update total bytes read from central directorytotal = (total + sizeCentralDir +centdir[_CD_FILENAME_LENGTH]+centdir[_CD_EXTRA_FIELD_LENGTH]+centdir[_CD_COMMENT_LENGTH])if self.debug > 2:print "total", totaldefnamelist(self):"""Return a list of file names in the archive."""l=[]for data inself.filelist:l.append(data.filename)returnldefinfolist(self):"""Return a list of class ZipInfo instances for files in thearchive."""returnself.filelistdefprintdir(self):"""Print a table of contents for the zip file."""print "%-46s %19s %12s" % ("File Name", "Modified", "Size")for zinfo inself.filelist:date= "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]print "%-46s %s %12d" %(zinfo.filename, date, zinfo.file_size)deftestzip(self):"""Read all the files and check the CRC."""chunk_size= 2 ** 20for zinfo inself.filelist:try:#Read by chunks, to avoid an OverflowError or a#MemoryError with very large embedded files.with self.open(zinfo.filename, "r") as f:while f.read(chunk_size):     #Check CRC-32passexceptBadZipfile:returnzinfo.filenamedefgetinfo(self, 
name):"""Return the instance of ZipInfo given 'name'."""info=self.NameToInfo.get(name)if info isNone:raiseKeyError('There is no item named %r in the archive' %name)returninfodefsetpassword(self, pwd):"""Set default password for encrypted files."""self.pwd=pwd@propertydefcomment(self):"""The comment text associated with the ZIP file."""returnself._comment@comment.setterdefcomment(self, comment):#check for valid comment lengthif len(comment) >ZIP_MAX_COMMENT:importwarningswarnings.warn('Archive comment is too long; truncating to %d bytes'% ZIP_MAX_COMMENT, stacklevel=2)comment=comment[:ZIP_MAX_COMMENT]self._comment=commentself._didModify=Truedef read(self, name, pwd=None):"""Return file bytes (as a string) for name."""return self.open(name, "r", pwd).read()def open(self, name, mode="r", pwd=None):"""Return file-like object for 'name'."""if mode not in ("r", "U", "rU"):raise RuntimeError, 'open() requires mode "r", "U", or "rU"'if notself.fp:raiseRuntimeError, \"Attempt to read ZIP archive that was already closed"#Only open a new file for instances where we were not#given a file object in the constructorifself._filePassed:zef_file=self.fpshould_close=Falseelse:zef_file= open(self.filename, 'rb')should_close=Truetry:#Make sure we have an info objectifisinstance(name, ZipInfo):#'name' is already an info objectzinfo =nameelse:#Get info object for namezinfo =self.getinfo(name)zef_file.seek(zinfo.header_offset, 0)#Skip the file header:fheader =zef_file.read(sizeFileHeader)if len(fheader) !=sizeFileHeader:raise BadZipfile("Truncated file header")fheader=struct.unpack(structFileHeader, fheader)if fheader[_FH_SIGNATURE] !=stringFileHeader:raise BadZipfile("Bad magic number for file header")fname=zef_file.read(fheader[_FH_FILENAME_LENGTH])iffheader[_FH_EXTRA_FIELD_LENGTH]:zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])if fname !=zinfo.orig_filename:raiseBadZipfile, \'File name in directory "%s" and header "%s" differ.' %(zinfo.orig_filename, fname)#check for encrypted flag & handle passwordis_encrypted = zinfo.flag_bits & 0x1zd=Noneifis_encrypted:if notpwd:pwd=self.pwdif notpwd:raise RuntimeError, "File %s is encrypted,"\"password required for extraction" %namezd=_ZipDecrypter(pwd)#The first 12 bytes in the cypher stream is an encryption header#used to strengthen the algorithm. The first 11 bytes are#completely random, while the 12th contains the MSB of the CRC,#or the MSB of the file time depending on the header type#and is used to check the correctness of the password.bytes = zef_file.read(12)h= map(zd, bytes[0:12])if zinfo.flag_bits & 0x8:#compare against the file type from extended local headerscheck_byte = (zinfo._raw_time >> 8) & 0xffelse:#compare against the CRC otherwisecheck_byte = (zinfo.CRC >> 24) & 0xffif ord(h[11]) !=check_byte:raise RuntimeError("Bad password for file", name)returnZipExtFile(zef_file, mode, zinfo, zd,close_fileobj=should_close)except:ifshould_close:zef_file.close()raisedef extract(self, member, path=None, pwd=None):"""Extract a member from the archive to the current working directory,using its full name. Its file information is extracted as accuratelyas possible. `member' may be a filename or a ZipInfo object. You canspecify a different directory using `path'."""if notisinstance(member, ZipInfo):member=self.getinfo(member)if path isNone:path=os.getcwd()returnself._extract_member(member, path, pwd)def extractall(self, path=None, members=None, pwd=None):"""Extract all members from the archive to the current workingdirectory. 
`path' specifies a different directory to extract to.`members' is optional and must be a subset of the list returnedby namelist()."""if members isNone:members=self.namelist()for zipinfo inmembers:self.extract(zipinfo, path, pwd)def_extract_member(self, member, targetpath, pwd):"""Extract the ZipInfo object 'member' to a physicalfile on the path targetpath."""#build the destination pathname, replacing#forward slashes to platform specific separators.arcname = member.filename.replace('/', os.path.sep)ifos.path.altsep:arcname=arcname.replace(os.path.altsep, os.path.sep)#interpret absolute pathname as relative, remove drive letter or#UNC path, redundant separators, "." and ".." components.arcname = os.path.splitdrive(arcname)[1]arcname= os.path.sep.join(x for x inarcname.split(os.path.sep)if x not in ('', os.path.curdir, os.path.pardir))if os.path.sep == '\\':#filter illegal characters on Windowsillegal = ':<>|"?*'ifisinstance(arcname, unicode):table= {ord(c): ord('_') for c inillegal}else:table= string.maketrans(illegal, '_' *len(illegal))arcname=arcname.translate(table)#remove trailing dotsarcname = (x.rstrip('.') for x inarcname.split(os.path.sep))arcname= os.path.sep.join(x for x in arcname ifx)targetpath=os.path.join(targetpath, arcname)targetpath=os.path.normpath(targetpath)#Create all upper directories if necessary.upperdirs =os.path.dirname(targetpath)if upperdirs and notos.path.exists(upperdirs):os.makedirs(upperdirs)if member.filename[-1] == '/':if notos.path.isdir(targetpath):os.mkdir(targetpath)returntargetpathwith self.open(member, pwd=pwd) as source, \file(targetpath,"wb") as target:shutil.copyfileobj(source, target)returntargetpathdef_writecheck(self, zinfo):"""Check for errors before writing a file to the archive."""if zinfo.filename inself.NameToInfo:importwarningswarnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)if self.mode not in ("w", "a"):raise RuntimeError, 'write() requires mode "w" or "a"'if notself.fp:raiseRuntimeError, \"Attempt to write ZIP archive that was already closed"if zinfo.compress_type == ZIP_DEFLATED and notzlib:raiseRuntimeError, \"Compression requires the (missing) zlib module"if zinfo.compress_type not in(ZIP_STORED, ZIP_DEFLATED):raiseRuntimeError, \"That compression method is not supported"if notself._allowZip64:requires_zip64=Noneif len(self.filelist) >=ZIP_FILECOUNT_LIMIT:requires_zip64= "Files count"elif zinfo.file_size >ZIP64_LIMIT:requires_zip64= "Filesize"elif zinfo.header_offset >ZIP64_LIMIT:requires_zip64= "Zipfile size"ifrequires_zip64:raise LargeZipFile(requires_zip64 +"would require ZIP64 extensions")def write(self, filename, arcname=None, compress_type=None):"""Put the bytes from filename into the archive under the namearcname."""if notself.fp:raiseRuntimeError("Attempt to write to ZIP archive that was already closed")st=os.stat(filename)isdir=stat.S_ISDIR(st.st_mode)mtime=time.localtime(st.st_mtime)date_time= mtime[0:6]#Create ZipInfo instance to store file informationif arcname isNone:arcname=filenamearcname= os.path.normpath(os.path.splitdrive(arcname)[1])while arcname[0] in(os.sep, os.altsep):arcname= arcname[1:]ifisdir:arcname+= '/'zinfo=ZipInfo(arcname, date_time)zinfo.external_attr= (st[0] & 0xFFFF) << 16L      #Unix attributesif compress_type isNone:zinfo.compress_type=self.compressionelse:zinfo.compress_type=compress_typezinfo.file_size=st.st_sizezinfo.flag_bits= 0x00zinfo.header_offset= self.fp.tell()    #Start of header bytes
self._writecheck(zinfo)self._didModify=Trueifisdir:zinfo.file_size=0zinfo.compress_size=0zinfo.CRC=0zinfo.external_attr|= 0x10  #MS-DOS directory flag
self.filelist.append(zinfo)self.NameToInfo[zinfo.filename]=zinfoself.fp.write(zinfo.FileHeader(False))returnwith open(filename,"rb") as fp:#Must overwrite CRC and sizes with correct data laterzinfo.CRC = CRC =0zinfo.compress_size= compress_size =0#Compressed size can be larger than uncompressed sizezip64 = self._allowZip64 and\zinfo.file_size* 1.05 >ZIP64_LIMITself.fp.write(zinfo.FileHeader(zip64))if zinfo.compress_type ==ZIP_DEFLATED:cmpr=zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,zlib.DEFLATED,-15)else:cmpr=Nonefile_size=0while 1:buf= fp.read(1024 * 8)if notbuf:breakfile_size= file_size +len(buf)CRC= crc32(buf, CRC) & 0xffffffffifcmpr:buf=cmpr.compress(buf)compress_size= compress_size +len(buf)self.fp.write(buf)ifcmpr:buf=cmpr.flush()compress_size= compress_size +len(buf)self.fp.write(buf)zinfo.compress_size=compress_sizeelse:zinfo.compress_size=file_sizezinfo.CRC=CRCzinfo.file_size=file_sizeif not zip64 andself._allowZip64:if file_size >ZIP64_LIMIT:raise RuntimeError('File size has increased during compressing')if compress_size >ZIP64_LIMIT:raise RuntimeError('Compressed size larger than uncompressed size')#Seek backwards and write file header (which will now include#correct CRC and file sizes)position = self.fp.tell()       #Preserve current position in file
self.fp.seek(zinfo.header_offset, 0)self.fp.write(zinfo.FileHeader(zip64))self.fp.seek(position, 0)self.filelist.append(zinfo)self.NameToInfo[zinfo.filename]=zinfodef writestr(self, zinfo_or_arcname, bytes, compress_type=None):"""Write a file into the archive.  The contents is the string'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance orthe name of the file in the archive."""if notisinstance(zinfo_or_arcname, ZipInfo):zinfo= ZipInfo(filename=zinfo_or_arcname,date_time=time.localtime(time.time())[:6])zinfo.compress_type=self.compressionif zinfo.filename[-1] == '/':zinfo.external_attr= 0o40775 << 16   #drwxrwxr-xzinfo.external_attr |= 0x10           #MS-DOS directory flagelse:zinfo.external_attr= 0o600 << 16     #?rw-------else:zinfo=zinfo_or_arcnameif notself.fp:raiseRuntimeError("Attempt to write to ZIP archive that was already closed")if compress_type is notNone:zinfo.compress_type=compress_typezinfo.file_size= len(bytes)            #Uncompressed sizezinfo.header_offset = self.fp.tell()    #Start of header bytes
self._writecheck(zinfo)self._didModify=Truezinfo.CRC= crc32(bytes) & 0xffffffff       #CRC-32 checksumif zinfo.compress_type ==ZIP_DEFLATED:co=zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,zlib.DEFLATED,-15)bytes= co.compress(bytes) +co.flush()zinfo.compress_size= len(bytes)    #Compressed sizeelse:zinfo.compress_size=zinfo.file_sizezip64= zinfo.file_size > ZIP64_LIMIT or\zinfo.compress_size>ZIP64_LIMITif zip64 and notself._allowZip64:raise LargeZipFile("Filesize would require ZIP64 extensions")self.fp.write(zinfo.FileHeader(zip64))self.fp.write(bytes)if zinfo.flag_bits & 0x08:#Write CRC and file sizes after the file datafmt = '<LQQ' if zip64 else '<LLL'self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,zinfo.file_size))self.fp.flush()self.filelist.append(zinfo)self.NameToInfo[zinfo.filename]=zinfodef __del__(self):"""Call the "close()" method in case the user forgot."""self.close()defclose(self):"""Close the file, and for mode "w" and "a" write the endingrecords."""if self.fp isNone:returntry:if self.mode in ("w", "a") and self._didModify: #write ending recordspos1 =self.fp.tell()for zinfo in self.filelist:         #write central directorydt =zinfo.date_timedosdate= (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]dostime= dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)extra=[]if zinfo.file_size >ZIP64_LIMIT \or zinfo.compress_size >ZIP64_LIMIT:extra.append(zinfo.file_size)extra.append(zinfo.compress_size)file_size= 0xffffffffcompress_size= 0xffffffffelse:file_size=zinfo.file_sizecompress_size=zinfo.compress_sizeif zinfo.header_offset >ZIP64_LIMIT:extra.append(zinfo.header_offset)header_offset= 0xffffffffLelse:header_offset=zinfo.header_offsetextra_data=zinfo.extraifextra:#Append a ZIP64 field to the extra'sextra_data =struct.pack('<HH' + 'Q'*len(extra),1, 8*len(extra), *extra) +extra_dataextract_version= max(45, zinfo.extract_version)create_version= max(45, zinfo.create_version)else:extract_version=zinfo.extract_versioncreate_version=zinfo.create_versiontry:filename, flag_bits=zinfo._encodeFilenameFlags()centdir=struct.pack(structCentralDir,stringCentralDir, create_version,zinfo.create_system, extract_version, zinfo.reserved,flag_bits, zinfo.compress_type, dostime, dosdate,zinfo.CRC, compress_size, file_size,len(filename), len(extra_data), len(zinfo.comment),0, zinfo.internal_attr, zinfo.external_attr,header_offset)exceptDeprecationWarning:print >>sys.stderr, (structCentralDir,stringCentralDir, create_version,zinfo.create_system, extract_version, zinfo.reserved,zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,zinfo.CRC, compress_size, file_size,len(zinfo.filename), len(extra_data), len(zinfo.comment),0, zinfo.internal_attr, zinfo.external_attr,header_offset)raiseself.fp.write(centdir)self.fp.write(filename)self.fp.write(extra_data)self.fp.write(zinfo.comment)pos2=self.fp.tell()#Write end-of-zip-archive recordcentDirCount =len(self.filelist)centDirSize= pos2 -pos1centDirOffset=pos1requires_zip64=Noneif centDirCount >ZIP_FILECOUNT_LIMIT:requires_zip64= "Files count"elif centDirOffset >ZIP64_LIMIT:requires_zip64= "Central directory offset"elif centDirSize >ZIP64_LIMIT:requires_zip64= "Central directory size"ifrequires_zip64:#Need to write the ZIP64 end-of-archive recordsif notself._allowZip64:raise LargeZipFile(requires_zip64 +"would require ZIP64 extensions")zip64endrec=struct.pack(structEndArchive64, stringEndArchive64,44, 45, 45, 0, 0, centDirCount, centDirCount,centDirSize, centDirOffset)self.fp.write(zip64endrec)zip64locrec=struct.pack(structEndArchive64Locator,stringEndArchive64Locator, 
0, pos2,1)self.fp.write(zip64locrec)centDirCount= min(centDirCount, 0xFFFF)centDirSize= min(centDirSize, 0xFFFFFFFF)centDirOffset= min(centDirOffset, 0xFFFFFFFF)endrec=struct.pack(structEndArchive, stringEndArchive,0, 0, centDirCount, centDirCount,centDirSize, centDirOffset, len(self._comment))self.fp.write(endrec)self.fp.write(self._comment)self.fp.flush()finally:fp=self.fpself.fp=Noneif notself._filePassed:fp.close()ZipFile

ZipFile 源码

classTarFile(object):"""The TarFile Class provides an interface to tar archives."""debug= 0                   #May be set from 0 (no msgs) to 3 (all msgs)
dereference= False         #If true, add content of linked file to the#tar file, else the link.
ignore_zeros= False        #If true, skips empty or invalid blocks and#continues processing.
errorlevel= 1              #If 0, fatal errors only appear in debug#messages (if debug >= 0). If > 0, errors#are passed to the caller as exceptions.
format= DEFAULT_FORMAT     #The format to use when creating an archive.
encoding= ENCODING         #Encoding for 8-bit character strings.
errors= None               #Error handler for unicode conversion.
tarinfo= TarInfo           #The default TarInfo class to use.
fileobject= ExFileObject   #The default ExFileObject class to use.def __init__(self, name=None, mode="r", fileobj=None, format=None,tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,errors=None, pax_headers=None, debug=None, errorlevel=None):"""Open an (uncompressed) tar archive `name'. `mode' is either 'r' toread from an existing archive, 'a' to append data to an existingfile or 'w' to create a new file overwriting an existing one. `mode'defaults to 'r'.If `fileobj' is given, it is used for reading or writing data. If itcan be determined, `mode' is overridden by `fileobj's mode.`fileobj' is not closed, when TarFile is closed."""modes= {"r": "rb", "a": "r+b", "w": "wb"}if mode not inmodes:raise ValueError("mode must be 'r', 'a' or 'w'")self.mode=modeself._mode=modes[mode]if notfileobj:if self.mode == "a" and notos.path.exists(name):#Create nonexistent files in append mode.self.mode = "w"self._mode= "wb"fileobj=bltn_open(name, self._mode)self._extfileobj=Falseelse:if name is None and hasattr(fileobj, "name"):name=fileobj.nameif hasattr(fileobj, "mode"):self._mode=fileobj.modeself._extfileobj=Trueself.name= os.path.abspath(name) if name elseNoneself.fileobj=fileobj#Init attributes.if format is notNone:self.format=formatif tarinfo is notNone:self.tarinfo=tarinfoif dereference is notNone:self.dereference=dereferenceif ignore_zeros is notNone:self.ignore_zeros=ignore_zerosif encoding is notNone:self.encoding=encodingif errors is notNone:self.errors=errorselif mode == "r":self.errors= "utf-8"else:self.errors= "strict"if pax_headers is not None and self.format ==PAX_FORMAT:self.pax_headers=pax_headerselse:self.pax_headers={}if debug is notNone:self.debug=debugif errorlevel is notNone:self.errorlevel=errorlevel#Init datastructures.self.closed =Falseself.members= []       #list of members as TarInfo objectsself._loaded = False    #flag if all members have been readself.offset =self.fileobj.tell()#current position in the archive fileself.inodes = {}        #dictionary caching the inodes of#archive members already addedtry:if self.mode == "r":self.firstmember=Noneself.firstmember=self.next()if self.mode == "a":#Move to the end of the archive,#before the first empty block.whileTrue:self.fileobj.seek(self.offset)try:tarinfo=self.tarinfo.fromtarfile(self)self.members.append(tarinfo)exceptEOFHeaderError:self.fileobj.seek(self.offset)breakexceptHeaderError, e:raiseReadError(str(e))if self.mode in "aw":self._loaded=Trueifself.pax_headers:buf=self.tarinfo.create_pax_global_header(self.pax_headers.copy())self.fileobj.write(buf)self.offset+=len(buf)except:if notself._extfileobj:self.fileobj.close()self.closed=Trueraisedef_getposix(self):return self.format ==USTAR_FORMATdef_setposix(self, value):importwarningswarnings.warn("use the format attribute instead", DeprecationWarning,2)ifvalue:self.format=USTAR_FORMATelse:self.format=GNU_FORMATposix=property(_getposix, _setposix)#--------------------------------------------------------------------------#Below are the classmethods which act as alternate constructors to the#TarFile class. The open() method is the only one that is needed for#public use; it is the "super"-constructor and is able to select an#adequate "sub"-constructor for a particular compression using the mapping#from OPEN_METH.#    #This concept allows one to subclass TarFile without losing the comfort of#the super-constructor. A sub-constructor is registered and made available#by adding it to the mapping in OPEN_METH.
@classmethoddef open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):"""Open a tar archive for reading, writing or appending. Returnan appropriate TarFile class.mode:'r' or 'r:*' open for reading with transparent compression'r:'         open for reading exclusively uncompressed'r:gz'       open for reading with gzip compression'r:bz2'      open for reading with bzip2 compression'a' or 'a:'  open for appending, creating the file if necessary'w' or 'w:'  open for writing without compression'w:gz'       open for writing with gzip compression'w:bz2'      open for writing with bzip2 compression'r|*'        open a stream of tar blocks with transparent compression'r|'         open an uncompressed stream of tar blocks for reading'r|gz'       open a gzip compressed stream of tar blocks'r|bz2'      open a bzip2 compressed stream of tar blocks'w|'         open an uncompressed stream for writing'w|gz'       open a gzip compressed stream for writing'w|bz2'      open a bzip2 compressed stream for writing"""if not name and notfileobj:raise ValueError("nothing to open")if mode in ("r", "r:*"):#Find out which *open() is appropriate for opening the file.for comptype incls.OPEN_METH:func=getattr(cls, cls.OPEN_METH[comptype])if fileobj is notNone:saved_pos=fileobj.tell()try:return func(name, "r", fileobj, **kwargs)except(ReadError, CompressionError), e:if fileobj is notNone:fileobj.seek(saved_pos)continueraise ReadError("file could not be opened successfully")elif ":" inmode:filemode, comptype= mode.split(":", 1)filemode= filemode or "r"comptype= comptype or "tar"#Select the *open() function according to#given compression.if comptype incls.OPEN_METH:func=getattr(cls, cls.OPEN_METH[comptype])else:raise CompressionError("unknown compression type %r" %comptype)return func(name, filemode, fileobj, **kwargs)elif "|" inmode:filemode, comptype= mode.split("|", 1)filemode= filemode or "r"comptype= comptype or "tar"if filemode not in ("r", "w"):raise ValueError("mode must be 'r' or 'w'")stream=_Stream(name, filemode, comptype, fileobj, bufsize)try:t= cls(name, filemode, stream, **kwargs)except:stream.close()raiset._extfileobj=Falsereturntelif mode in ("a", "w"):return cls.taropen(name, mode, fileobj, **kwargs)raise ValueError("undiscernible mode")@classmethoddef taropen(cls, name, mode="r", fileobj=None, **kwargs):"""Open uncompressed tar archive name for reading or writing."""if mode not in ("r", "a", "w"):raise ValueError("mode must be 'r', 'a' or 'w'")return cls(name, mode, fileobj, **kwargs)@classmethoddef gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):"""Open gzip compressed tar archive name for reading or writing.Appending is not allowed."""if mode not in ("r", "w"):raise ValueError("mode must be 'r' or 'w'")try:importgzipgzip.GzipFileexcept(ImportError, AttributeError):raise CompressionError("gzip module is not available")try:fileobj=gzip.GzipFile(name, mode, compresslevel, fileobj)exceptOSError:if fileobj is not None and mode == 'r':raise ReadError("not a gzip file")raisetry:t= cls.taropen(name, mode, fileobj, **kwargs)exceptIOError:fileobj.close()if mode == 'r':raise ReadError("not a gzip file")raiseexcept:fileobj.close()raiset._extfileobj=Falsereturnt@classmethoddef bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):"""Open bzip2 compressed tar archive name for reading or writing.Appending is not allowed."""if mode not in ("r", "w"):raise ValueError("mode must be 'r' or 'w'.")try:importbz2exceptImportError:raise CompressionError("bz2 module is 
not available")if fileobj is notNone:fileobj=_BZ2Proxy(fileobj, mode)else:fileobj= bz2.BZ2File(name, mode, compresslevel=compresslevel)try:t= cls.taropen(name, mode, fileobj, **kwargs)except(IOError, EOFError):fileobj.close()if mode == 'r':raise ReadError("not a bzip2 file")raiseexcept:fileobj.close()raiset._extfileobj=Falsereturnt#All *open() methods are registered here.OPEN_METH ={"tar": "taropen",   #uncompressed tar"gz":  "gzopen",    #gzip compressed tar"bz2": "bz2open"    #bzip2 compressed tar
}#--------------------------------------------------------------------------#The public methods which TarFile provides:defclose(self):"""Close the TarFile. In write-mode, two finishing zero blocks areappended to the archive."""ifself.closed:returnif self.mode in "aw":self.fileobj.write(NUL* (BLOCKSIZE * 2))self.offset+= (BLOCKSIZE * 2)#fill up the end with zero-blocks#(like option -b20 for tar does)blocks, remainder =divmod(self.offset, RECORDSIZE)if remainder >0:self.fileobj.write(NUL* (RECORDSIZE -remainder))if notself._extfileobj:self.fileobj.close()self.closed=Truedefgetmember(self, name):"""Return a TarInfo object for member `name'. If `name' can not befound in the archive, KeyError is raised. If a member occurs morethan once in the archive, its last occurrence is assumed to be themost up-to-date version."""tarinfo=self._getmember(name)if tarinfo isNone:raise KeyError("filename %r not found" %name)returntarinfodefgetmembers(self):"""Return the members of the archive as a list of TarInfo objects. Thelist has the same order as the members in the archive."""self._check()if not self._loaded:    #if we want to obtain a list ofself._load()        #all members, we first have to#scan the whole archive.returnself.membersdefgetnames(self):"""Return the members of the archive as a list of their names. It hasthe same order as the list returned by getmembers()."""return [tarinfo.name for tarinfo inself.getmembers()]def gettarinfo(self, name=None, arcname=None, fileobj=None):"""Create a TarInfo object for either the file `name' or the fileobject `fileobj' (using os.fstat on its file descriptor). You canmodify some of the TarInfo's attributes before you add it usingaddfile(). If given, `arcname' specifies an alternative name for thefile in the archive."""self._check("aw")#When fileobj is given, replace name by#fileobj's real name.if fileobj is notNone:name=fileobj.name#Building the name of the member in the archive.#Backward slashes are converted to forward slashes,#Absolute paths are turned to relative paths.if arcname isNone:arcname=namedrv, arcname=os.path.splitdrive(arcname)arcname= arcname.replace(os.sep, "/")arcname= arcname.lstrip("/")#Now, fill the TarInfo object with#information specific for the file.tarinfo =self.tarinfo()tarinfo.tarfile=self#Use os.stat or os.lstat, depending on platform#and if symlinks shall be resolved.if fileobj isNone:if hasattr(os, "lstat") and notself.dereference:statres=os.lstat(name)else:statres=os.stat(name)else:statres=os.fstat(fileobj.fileno())linkname= ""stmd=statres.st_modeifstat.S_ISREG(stmd):inode=(statres.st_ino, statres.st_dev)if not self.dereference and statres.st_nlink > 1 and\inodein self.inodes and arcname !=self.inodes[inode]:#Is it a hardlink to an already#archived file?type =LNKTYPElinkname=self.inodes[inode]else:#The inode is added only if its valid.#For win32 it is always 0.type =REGTYPEifinode[0]:self.inodes[inode]=arcnameelifstat.S_ISDIR(stmd):type=DIRTYPEelifstat.S_ISFIFO(stmd):type=FIFOTYPEelifstat.S_ISLNK(stmd):type=SYMTYPElinkname=os.readlink(name)elifstat.S_ISCHR(stmd):type=CHRTYPEelifstat.S_ISBLK(stmd):type=BLKTYPEelse:returnNone#Fill the TarInfo object with all#information we can get.tarinfo.name =arcnametarinfo.mode=stmdtarinfo.uid=statres.st_uidtarinfo.gid=statres.st_gidif type 
==REGTYPE:tarinfo.size=statres.st_sizeelse:tarinfo.size=0Ltarinfo.mtime=statres.st_mtimetarinfo.type=typetarinfo.linkname=linknameifpwd:try:tarinfo.uname=pwd.getpwuid(tarinfo.uid)[0]exceptKeyError:passifgrp:try:tarinfo.gname=grp.getgrgid(tarinfo.gid)[0]exceptKeyError:passif type in(CHRTYPE, BLKTYPE):if hasattr(os, "major") and hasattr(os, "minor"):tarinfo.devmajor=os.major(statres.st_rdev)tarinfo.devminor=os.minor(statres.st_rdev)returntarinfodef list(self, verbose=True):"""Print a table of contents to sys.stdout. If `verbose' is False, onlythe names of the members are printed. If it is True, an `ls -l'-likeoutput is produced."""self._check()for tarinfo inself:ifverbose:printfilemode(tarinfo.mode),print "%s/%s" % (tarinfo.uname ortarinfo.uid,tarinfo.gnameortarinfo.gid),if tarinfo.ischr() ortarinfo.isblk():print "%10s" % ("%d,%d"\%(tarinfo.devmajor, tarinfo.devminor)),else:print "%10d" %tarinfo.size,print "%d-%02d-%02d %02d:%02d:%02d"\% time.localtime(tarinfo.mtime)[:6],print tarinfo.name + ("/" if tarinfo.isdir() else ""),ifverbose:iftarinfo.issym():print "->", tarinfo.linkname,iftarinfo.islnk():print "link to", tarinfo.linkname,printdef add(self, name, arcname=None, recursive=True, exclude=None, filter=None):"""Add the file `name' to the archive. `name' may be any type of file(directory, fifo, symbolic link, etc.). If given, `arcname'specifies an alternative name for the file in the archive.Directories are added recursively by default. This can be avoided bysetting `recursive' to False. `exclude' is a function that shouldreturn True for each filename to be excluded. `filter' is a functionthat expects a TarInfo object argument and returns the changedTarInfo object, if it returns None the TarInfo object will beexcluded from the archive."""self._check("aw")if arcname isNone:arcname=name#Exclude pathnames.if exclude is notNone:importwarningswarnings.warn("use the filter argument instead",DeprecationWarning,2)ifexclude(name):self._dbg(2, "tarfile: Excluded %r" %name)return#Skip if somebody tries to archive the archive...if self.name is not None and os.path.abspath(name) ==self.name:self._dbg(2, "tarfile: Skipped %r" %name)returnself._dbg(1, name)#Create a TarInfo object from the file.tarinfo =self.gettarinfo(name, arcname)if tarinfo isNone:self._dbg(1, "tarfile: Unsupported type %r" %name)return#Change or exclude the TarInfo object.if filter is notNone:tarinfo=filter(tarinfo)if tarinfo isNone:self._dbg(2, "tarfile: Excluded %r" %name)return#Append the tar header and data to the archive.iftarinfo.isreg():with bltn_open(name,"rb") as f:self.addfile(tarinfo, f)eliftarinfo.isdir():self.addfile(tarinfo)ifrecursive:for f inos.listdir(name):self.add(os.path.join(name, f), os.path.join(arcname, f),recursive, exclude, filter)else:self.addfile(tarinfo)def addfile(self, tarinfo, fileobj=None):"""Add the TarInfo object `tarinfo' to the archive. 
If `fileobj' isgiven, tarinfo.size bytes are read from it and added to the archive.You can create TarInfo objects using gettarinfo().On Windows platforms, `fileobj' should always be opened with mode'rb' to avoid irritation about the file size."""self._check("aw")tarinfo=copy.copy(tarinfo)buf=tarinfo.tobuf(self.format, self.encoding, self.errors)self.fileobj.write(buf)self.offset+=len(buf)#If there's data to follow, append it.if fileobj is notNone:copyfileobj(fileobj, self.fileobj, tarinfo.size)blocks, remainder=divmod(tarinfo.size, BLOCKSIZE)if remainder >0:self.fileobj.write(NUL* (BLOCKSIZE -remainder))blocks+= 1self.offset+= blocks *BLOCKSIZEself.members.append(tarinfo)def extractall(self, path=".", members=None):"""Extract all members from the archive to the current workingdirectory and set owner, modification time and permissions ondirectories afterwards. `path' specifies a different directoryto extract to. `members' is optional and must be a subset of thelist returned by getmembers()."""directories=[]if members isNone:members=selffor tarinfo inmembers:iftarinfo.isdir():#Extract directories with a safe mode.
TarFile source code (reference): the TarFile class from the standard library's tarfile.py was pasted here, covering extractall(), extract(), extractfile(), _extract_member(), the makedir/makefile/makelink/chown/chmod/utime helpers, next(), and the internal helpers (_getmember, _load, _check, _find_link_target, __iter__, _dbg, __enter__, __exit__). See Lib/tarfile.py in the Python installation for the complete listing.

6、json and pickle modules

Files can only store bytes or strings, not other Python types, which is why the two serialization modules below are needed:

json: converts between strings and basic Python data types (dicts, lists, simple variables), turning data into a string form that every language understands

pickle: converts between Python-specific types and Python data types (including functions and classes), turning data into a byte form that only Python understands
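To make the difference concrete, here is a minimal sketch (an addition to these notes, not part of the course code): json gives a readable string that any language can parse, while pickle gives Python-only bytes and also accepts objects json rejects, such as a set.

import json, pickle

info = {"name": "lzl", "age": 18}

print(json.dumps(info))     # {"name": "lzl", "age": 18}  plain text, readable anywhere
print(pickle.dumps(info))   # b'\x80...'  Python-specific bytes (exact output depends on the pickle protocol)

# json refuses non-basic types such as a set; pickle accepts them
try:
    json.dumps({1, 2, 3})
except TypeError as e:
    print("json failed:", e)
print(len(pickle.dumps({1, 2, 3})), "bytes from pickle")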

① json module:

# json serialization and deserialization
import json

info = {                          # a dict
    "name": "lzl",
    "age": "18"
}

with open("test", "w") as f:
    f.write(json.dumps(info))     # serialize info with json and write it to the file "test"

with open("test", "r") as f:
    info = json.loads(f.read())   # read the string back and deserialize it

print(info["name"])
# lzl

② pickle module:

# pickle serialization and deserialization
import pickle                     # pickle supports all Python-specific types

def func():                       # a function
    info = {"name": "lzl", "age": "18"}
    print(info, type(info))

func()
# {'age': '18', 'name': 'lzl'} <class 'dict'>

with open("test", "wb") as f:
    f.write(pickle.dumps(func))   # write func to the file "test" with pickle; json would raise an error here

with open("test", "rb") as f:
    func_new = pickle.loads(f.read())

func_new()
# {'age': '18', 'name': 'lzl'} <class 'dict'>
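Worth knowing, although the course notes do not mention it: pickle stores a function by reference (its module and name), not by value, so unpickling only works if that name can still be resolved at load time. A minimal sketch with a hypothetical greet() function:

import pickle

def greet():
    print("hello")

data = pickle.dumps(greet)   # stores a reference such as __main__.greet, not the function body

del greet
try:
    pickle.loads(data)       # fails: the reference can no longer be resolved
except AttributeError as e:
    print("unpickling failed:", e)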

More on json and pickle: http://openskill.cn/article/472

 

7、shelve module

The shelve module wraps pickle internally. It is a simple key/value store that persists in-memory data to a file, and it can persist any Python data format that pickle supports.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -Author-Lian
import shelve

# store data as key/value pairs
s = shelve.open("shelve_test")    # open (or create) the shelf file
tpl = (1, 2, 3, 4)
lst = ['a', 'b', 'c', 'd']
info = {"name": "lzl", "age": 18}
s["tuple"] = tpl                  # persist a tuple
s["list"] = lst
s["info"] = info
s.close()

# fetch values by key
d = shelve.open("shelve_test")    # open the file again
print(d["tuple"])                 # read back
print(d.get("list"))
print(d.get("info"))
# (1, 2, 3, 4)
# ['a', 'b', 'c', 'd']
# {'name': 'lzl', 'age': 18}
d.close()

# loop over the keys
s = shelve.open("shelve_test")
for k in s.keys():                # iterate over the keys
    print(k)
# list
# tuple
# info
s.close()

# update the value of a key
s = shelve.open("shelve_test")
s.update({"list": [22, 33]})      # reassign, or equivalently s["list"] = [22, 33]
print(s["list"])
# [22, 33]
s.close()
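A gotcha the example above does not run into: mutating a stored value in place (for example appending to the saved list) is not written back automatically, because the shelf only notices assignments to keys. Reassigning the key works, or the shelf can be opened with writeback=True. A minimal sketch, assuming Python 3.4+ where a shelf can be used as a context manager:

import shelve

with shelve.open("shelve_test", writeback=True) as s:
    s["list"].append(44)      # with writeback=True this in-place change is cached in memory
                              # and written back to the file when the shelf closes

with shelve.open("shelve_test") as s:
    print(s["list"])          # the appended element is there, e.g. [22, 33, 44]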

 

8、xml module

XML is a protocol for exchanging data between different languages and programs, much like json, except that json is simpler to use. Back in the dark ages before json existed, XML was the only choice, and many traditional companies (the finance industry in particular) still expose interfaces that are mostly XML.

The XML format looks like this; the data structure is marked out by <> nodes:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

xml format

The XML protocol is supported in every major language; in Python it can be handled with the following module:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# walk the whole xml document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# walk only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)

Modifying and deleting content in an xml document

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")
tree.write("xmltest.xml")

# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')

Creating an xml document yourself

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)                                  # build a document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)                                              # print the generated xml
string = ET.tostring(new_xml)                                 # convert the xml object to a serialized representation

9、configparser module

Used to generate and modify common configuration files; in Python 3.x the module was renamed to configparser (it was ConfigParser in Python 2).

Here is a configuration format that a lot of software uses:

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

configuration file

How do you generate a file like this with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'          # mutates the parser
topsecret['ForwardX11'] = 'no'       # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Once it is written, it can also be read back:

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

configparser syntax for create, read, update and delete:

# sample i.cfg
[group1]
k1 = v1
k2 : v2

[group2]
k1 = v1

import configparser        # this module was called ConfigParser in Python 2

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## read ##########
# secs = config.sections()
# print(secs)

# options = config.options('group2')
# print(options)

# item_list = config.items('group2')
# print(item_list)

# val = config.get('group1', 'key')
# val = config.getint('group1', 'key')

# ########## modify ##########
# sec = config.remove_section('group1')
# config.write(open('i.cfg', "w"))

# sec = config.has_section('wupeiqi')
# sec = config.add_section('wupeiqi')
# config.write(open('i.cfg', "w"))

# config.set('group2', 'k1', '11111')
# config.write(open('i.cfg', "w"))

# config.remove_option('group2', 'age')
# config.write(open('i.cfg', "w"))
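To complement the commented reference above, here is a small added sketch (assuming the example.ini written earlier in this section) of the typed getters configparser provides: get(), getint(), getboolean() and their fallback parameter.

import configparser

config = configparser.ConfigParser()
config.read('example.ini')

port = config.getint('topsecret.server.com', 'Port')              # 50022 as an int
forward = config.getboolean('bitbucket.org', 'ForwardX11')        # inherited from [DEFAULT], so True
user = config.get('bitbucket.org', 'User', fallback='anonymous')  # fallback is used if the option is missing
print(port, forward, user)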

10、hashlib module

Used for hashing (message digest) operations; in 3.x it replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())

m.update(b"It's been a long time since last time we ...")
print(m.digest())          # the hash in binary form
print(len(m.hexdigest()))  # the hash in hexadecimal form

'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass
'''
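Note that repeated update() calls are cumulative: the digest after the two calls above equals the digest of the concatenated bytes. A quick check of that claim:

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")

n = hashlib.md5(b"HelloIt's me")       # one call over the concatenated data
print(m.hexdigest() == n.hexdigest())  # True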
import hashlib

# ######## md5 ########
h = hashlib.md5()
h.update(b'admin')          # update() needs bytes in Python 3
print(h.hexdigest())

# ######## sha1 ########
h = hashlib.sha1()
h.update(b'admin')
print(h.hexdigest())

# ######## sha256 ########
h = hashlib.sha256()
h.update(b'admin')
print(h.hexdigest())

# ######## sha384 ########
h = hashlib.sha384()
h.update(b'admin')
print(h.hexdigest())

# ######## sha512 ########
h = hashlib.sha512()
h.update(b'admin')
print(h.hexdigest())

Still not impressed? Python also has an hmac module, which takes a key and a message, processes them together internally and then hashes the result.

import hmac

h = hmac.new(b'wueiqi', digestmod='md5')   # the key must be bytes; Python 3.8+ also requires an explicit digestmod
h.update(b'hellowo')
print(h.hexdigest())

  

11、re module

The re module provides regular-expression operations for Python.

'.'     matches any character except \n by default; with the DOTALL flag it also matches newlines
'^'     matches the start of the string; with re.MULTILINE it also matches right after each newline, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'     matches the end of the string; re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() also matches
'*'     matches the preceding character 0 or more times, re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times, re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 0 or 1 times
'{m}'   matches the preceding character exactly m times
'{n,m}' matches the preceding character n to m times, re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches what is on either side of the |, re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)' group matching, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'[a-z]' matches any single character from a to z
'[^()]' matches any single character except ( and )
r' '    raw string: backslashes inside the quotes are kept literally, see ⑦ below for details
'\A'    matches only at the start of the string, so re.search("\Aabc", "alexabc") finds nothing
'\Z'    matches at the end of the string, like $
'\d'    matches a digit 0-9
'\D'    matches a non-digit
'\w'    matches a word character [A-Za-z0-9_]
'\W'    matches anything that is not a word character
'\s'    matches whitespace (space, \t, \n, \r), re.search("\s+", "ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group matching, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict("city")
        gives {'province': '3714', 'city': '81', 'birthday': '1993'}
re.IGNORECASE   ignore case, e.g. re.search('(\A|\s)red(\s+|$)', i, re.IGNORECASE)

Flags:

# flags (from the re module source)
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE  # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE          # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE        # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE    # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL          # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE        # ignore whitespace and comments
flags

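Since the list above only names the flags, here is a small added demo of the two that are most often confused, re.MULTILINE (anchors work per line) and re.DOTALL (the dot crosses newlines):

import re

text = "first line\nsecond line"

# ^ normally anchors only at the very start of the string
print(re.findall(r"^\w+", text))                   # ['first']
# with re.MULTILINE it anchors at the start of every line
print(re.findall(r"^\w+", text, re.MULTILINE))     # ['first', 'second']

# . normally stops at \n, so this finds nothing
print(re.search(r"first.+second", text))           # None
# with re.DOTALL the dot also matches the newline
print(re.search(r"first.+second", text, re.DOTALL).group())   # first line\nsecond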

① match

Matches the pattern against the string starting from the very first position:

# match
import re

obj = re.match('\d+', '123uua123sf')    # match one or more digits starting at the first character
print(obj)
# <_sre.SRE_Match object; span=(0, 3), match='123'>

if obj:                                 # only runs if something matched; a failed match returns None
    print(obj.group())                  # print the matched text
# 123

Matching an IP address:

import re

ip = '255.255.255.253'
result = re.match(r'^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.'
                  r'([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$', ip)
print(result)
# <_sre.SRE_Match object; span=(0, 15), match='255.255.255.253'>

② search

Matches the pattern anywhere in the string (not necessarily at the start) and returns the first match

# search
import re

obj = re.search('\d+', 'a123uu234asf')    # match one or more digits, starting wherever the first digit appears
print(obj)
# <_sre.SRE_Match object; span=(1, 4), match='123'>

if obj:                                   # only runs if something matched
    print(obj.group())                    # print the matched text
# 123

obj = re.search('\([^()]+\)', 'sdds(a1fwewe2(3uusfdsf2)34as)f')   # match the innermost (...) content
print(obj)
# <_sre.SRE_Match object; span=(13, 24), match='(3uusfdsf2)'>

if obj:
    print(obj.group())
# (3uusfdsf2)

③ The difference between group and groups

# the difference between group and groups
import  re
a = "123abc456"
b = re.search("([0-9]*)([a-z]*)([0-9]*)", a)
print(b)
#<_sre.SRE_Match object; span=(0, 9), match='123abc456'>
print(b.group())
#123abc456
print(b.group(0))
#123abc456
print(b.group(1))
#123
print(b.group(2))
#abc
print(b.group(3))
#456
print(b.groups())
#('123', 'abc', '456')
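group() also accepts group names when the pattern uses (?P<name>...), which reads better than positional indices. A small added sketch:

import re

b = re.search(r"(?P<num>[0-9]*)(?P<word>[a-z]*)", "123abc456")
print(b.group("num"))    # 123, same as b.group(1)
print(b.group("word"))   # abc, same as b.group(2)
print(b.groupdict())     # {'num': '123', 'word': 'abc'}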

④ findall

The two methods above each return a single match, i.e. only one hit in the string. To get every element of the string that satisfies the pattern, use findall; findall has no group() usage, it returns the results directly.

# findall
import re

obj = re.findall('\d+', 'a123uu234asf')   # match every run of digits
if obj:                                   # only runs if something matched
    print(obj)                            # the result is a list
# ['123', '234']
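One subtlety worth spelling out: when the pattern contains groups, findall returns the group contents (tuples when there is more than one group) instead of the whole match. A minimal illustration:

import re

print(re.findall(r'\d+', 'a123uu234asf'))        # ['123', '234']  no groups: whole matches
print(re.findall(r'(\d)(\d+)', 'a123uu234asf'))  # [('1', '23'), ('2', '34')]  two groups: one tuple per match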

⑤ sub

Replaces the matched text; the signature is sub(pattern, repl, string, count=0, flags=0)

# sub
import re

content = "123abc456"
new_content = re.sub('\d+', 'ABC', content)
print(new_content)
# ABCabcABC
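The count parameter in the signature above caps the number of replacements; a quick added example:

import re

content = "123abc456"
print(re.sub(r'\d+', 'ABC', content, count=1))   # ABCabc456, only the first match is replaced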

⑥ split

Splits the string wherever the pattern matches; the signature is split(pattern, string, maxsplit=0, flags=0)

# split
import re

content = "1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )"
new_content = re.split('\*', content)       # split on *, the result is a list
print(new_content)
# ['1 - 2 ', ' ((60-30+1', '(9-2', '5/3+7/3', '99/4', '2998+10', '568/14))-(-4', '3)/(16-3', '2) )']

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split('[\+\-\*\/]+', content)
# new_content = re.split('\*', content, 1)
print(new_content)
# ["'1 ", ' 2 ', ' ((60', '30', '1', '(9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14))',
#  '(', '4', '3)', '(16', '3', "2) )'"]

inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub('\s*', '', inpp)              # strip the whitespace
print(inpp)
new_content = re.split('\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)
print(new_content)
# ['1-2*((60-30+', '-40-5', '*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))']

⑦ Addendum: r'' and backslash escaping

fdfdsfds\fds
sfdsfds& @$

lzl.txt

First, be clear that when the program reads a \ character from a file, the string that ends up in the list shows the backslash as \\ in its repr.

import re, sys

li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        li.append(line)

print(li)       # note: a single backslash in the file shows up as a double backslash in the list repr
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']
print(li[0])    # printing the string itself still shows a single backslash
# fdfdsfds\fds

What the r prefix means: the backslash is not treated as an escape, it appears purely as a character:

import re, sys

li = []
with open('lzl.txt', 'r', encoding="utf-8") as file:
    for line in file:
        print(re.findall(r's\\f', line))     # first way to match the backslash
        # print(re.findall('\\\\', line))    # second way to match it
        li.append(line)

print(li)       # again: the single backslash in the file becomes a double backslash in the repr
# ['s\\f']
# []
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']

Addendum: the code below may leave you even more confused at first

import re

re.findall(r'\\', line)  # line is the loop variable from the block above; in a regex this is the only way to write it, r'\' is not valid
print(r'\\')             # same rule: r'\' is illegal, the backslashes must come in an even number
# \\        result
# to print just a single \ write it like this
print('\\')              # the literal still needs an even number of backslashes
# \         result

Summary: a single backslash \ in a file is read into the program as \\ (in the repr) and printed back out as a single \. To match that single backslash with a regex, use r'\\' (two backslashes in a raw string), or '\\\\' (four backslashes) without the r prefix.

⑧ The compile function

Description:

Python supports regular expressions through the re module. The usual workflow with re is: first compile the regex string into a Pattern object with re.compile(), then use that Pattern to process text and obtain a match result (a Match object), and finally pull the information you need out of the Match object and continue from there.

A simple example, finding all the English letters in a string:

import re
pattern = re.compile('[a-zA-Z]')
result = pattern.findall('as3SiOPdj#@23awe')
print(result)
# ['a', 's', 'S', 'i', 'O', 'P', 'd', 'j', 'a', 'w', 'e']

Matching an IP address (255.255.255.255):

import re

pattern = re.compile(r'^(([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$')
result = pattern.match('255.255.255.255')
print(result)
# <_sre.SRE_Match object; span=(0, 15), match='255.255.255.255'>


  

12、urllib module

Of all the networking libraries available, the most powerful are probably urllib (and urllib2 in Python 2). They let you access files over the network almost as if they were on the local machine: with a single function call, nearly anything reachable by a URL can be used as input to a program. Imagine combining this module with re: you can download web pages, extract information from them, generate reports automatically, and so on.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# -Author-Lian
import urllib.request

def getdata():
    url = "http://www.baidu.com"
    data = urllib.request.urlopen(url).read()
    data = data.decode("utf-8")
    print(data)

getdata()

The file-like object returned by urlopen supports the close, read, readline and readlines methods.
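As a small added sketch of that file-like interface (reusing the same example URL as above):

import urllib.request

resp = urllib.request.urlopen("http://www.baidu.com")
first_line = resp.readline()    # read one line of the response body (bytes)
rest = resp.readlines()         # read the remaining lines into a list
print(first_line[:60])
print(len(rest), "more lines")
resp.close()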

More:

logging module: http://www.cnblogs.com/lianzhilei/p/6016543.html

 

Reposted from: https://www.cnblogs.com/lianzhilei/p/5794402.html
