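Python Advanced Usage Handbook (Worth Bookmarking)

Making Your Program Run Faster

Some common, simple optimization strategies:

Selectively eliminate attribute access: every use of the dot operator (.) to access an attribute has a cost. Under the hood it triggers special methods such as __getattribute__() and __getattr__(), which often end in dictionary lookups.
Know where your variables live: local variables are generally faster to access than global ones. For frequently accessed names, you can gain speed by making them local wherever possible.
Avoid unnecessary abstraction: whenever you wrap code in an extra layer such as a decorator, property, or descriptor, the code gets slower.
Use the built-in containers: the built-in data types are usually much faster than hand-written equivalents.
Avoid creating unnecessary data structures or copies.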
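
The first two strategies can be sketched as follows. This is a minimal illustration, not a benchmark; the function names roots_global and roots_local are made up for the example, and any actual speedup depends on the interpreter version and workload.

```python
import math

def roots_global(values):
    # Looks up the global name `math` and its attribute `sqrt` on every iteration.
    result = []
    for v in values:
        result.append(math.sqrt(v))
    return result

def roots_local(values):
    # Bind the attribute lookups to local names once, outside the loop.
    sqrt = math.sqrt
    result = []
    append = result.append
    for v in values:
        append(sqrt(v))
    return result

data = [1.0, 4.0, 9.0]
print(roots_global(data) == roots_local(data))  # True
```

Both functions return the same list; the second one simply avoids repeated global and attribute lookups inside the loop.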

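I. Important Usage

1. BFS (breadth-first search):

Besides using a deque, you can also use a list as the queue. Either way, tracking visited nodes in a set rather than a list makes the membership checks faster.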

BFS宽度搜索模板_Rnan_prince的博客-CSDN博客
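
A minimal BFS sketch using a deque as the queue and a set for visited nodes. The graph is a hypothetical adjacency dict invented for this example; it is not taken from the linked template.

```python
from collections import deque

def bfs(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}        # set: O(1) average membership test
    queue = deque([start])   # deque: O(1) pops from the left
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```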

2. Converting a set to a tuple so that it can be serialized (a tuple is also hashable)

sets = set()
tuples = tuple(sets)

3. Efficient set operations

set.intersection(set1, set2 ... etc)

Python Set intersection() 方法 | 菜鸟教程

4. bisect: Python's module for keeping a list sorted

一个有趣的python排序模块：bisect - 毛志谦 - 博客园
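
A short sketch of the bisect functions most often used; the list scores is invented for the example.

```python
import bisect

scores = [10, 20, 30, 40]                # must already be sorted
print(bisect.bisect_left(scores, 30))    # 2: insertion point before an equal item
print(bisect.bisect_right(scores, 30))   # 3: insertion point after an equal item
bisect.insort(scores, 25)                # insert while keeping the list sorted
print(scores)                            # [10, 20, 25, 30, 40]
```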

5. Counting characters in a string

str.count(sub[, start[, end]])

Python count() 方法 | 菜鸟教程

The other option is collections.Counter(), which returns a dict subclass.

python-Counter计数函数_每天进步一点点-CSDN博客_counter方法
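
A small comparison of the two approaches, using a made-up sample string:

```python
from collections import Counter

text = "mississippi"
print(text.count("ss"))    # 2: non-overlapping occurrences of a substring
counts = Counter(text)     # per-character counts, a dict subclass
print(counts["s"])         # 4
print(counts["z"])         # 0: missing keys count as zero instead of raising KeyError
```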

6. Flattening a nested list into a one-dimensional list

array = [[3, 4, 5], [4, 22, 44, 6], [7, 8]]
res = []
for m in array:
    res.extend(m)
print(res)  # [3, 4, 5, 4, 22, 44, 6, 7, 8]

Using itertools.chain:

import itertools
print(list(itertools.chain.from_iterable(array)))
# [3, 4, 5, 4, 22, 44, 6, 7, 8]
# or
print(list(itertools.chain(*array)))
# [3, 4, 5, 4, 22, 44, 6, 7, 8]

Using operator:

import operator
from functools import reduce
array = [[3, 4, 5], [4, 22, 44, 6], [7, 8]]
print(reduce(operator.add, array))
# [3, 4, 5, 4, 22, 44, 6, 7, 8]

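7. Setting a default type for dictionary values

collections.defaultdict(set/list/int) removes the need for an if check:

from collections import defaultdict
d = defaultdict(list)
d['a'].append(1)
d['a'].append(1)
print(d)
# out: defaultdict(<class 'list'>, {'a': [1, 1]})
s = defaultdict(set)
s['a'].add(1)
s['a'].add(1)
s['a'].add(2)
print(s)
# out: defaultdict(<class 'set'>, {'a': {1, 2}})

Python中collections.defaultdict()使用 - 简书

Keeping a dictionary ordered: collections.OrderedDict

A dict created with OrderedDict strictly preserves the order in which keys were first inserted. Internally it maintains a doubly linked list that orders the keys by insertion, so an OrderedDict is more than twice the size of a regular dict.

Note: since CPython 3.6 the built-in dict also preserves insertion order, and Python 3.7 made this a language guarantee.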

8. Python zip(*) and itertools.zip_longest()

Python zip() 函数 | 菜鸟教程

In a function definition, a parameter prefixed with * accepts any number of positional arguments and stores them in a tuple. In a call, as below, * unpacks an iterable into separate positional arguments.

a = ["abchh", "abcsdf", "abdshf"]
for b in zip(*a):
    print(b)
>>>
('a', 'a', 'a')
('b', 'b', 'b')
('c', 'c', 'd')
('h', 's', 's')
('h', 'd', 'h')

itertools.zip_longest(v1, v2, fillvalue=0) uses the longest iterable to determine the length of the result, and fillvalue specifies the default used for missing values.

When the arguments have equal lengths, it behaves like zip.
When the lengths differ, zip stops at the shortest argument, while itertools.zip_longest() continues to the longest.

Python zip函数详解+和izip和zip_longest的比较辨析_THEAQING-CSDN博客

import itertools
list1 = ["A", "B", "C", "D", "E"]  # len = 5
list2 = ["a", "b", "c", "d", "e"]  # len = 5
list3 = [1, 2, 3, 4]  # len = 4
print(list(itertools.zip_longest(list1, list3)))
# [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', None)]
print(list(zip(list1, list3)))
# [('A', 1), ('B', 2), ('C', 3), ('D', 4)]

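9. Checking your Python version

from sys import version_info
if version_info[:2] != (2, 7):
    raise Exception('Please use Python 2.7 to complete this project')

Selecting the Python version with a shebang line (read, for example, by the Windows py launcher). The first line selects the python2 interpreter, the second python3:

#!python2
#!python3

10. Setting a maximum queue size:

self.history = deque(maxlen=2)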

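11. Checking whether an array contains negative numbers

symbols = np.concatenate(X)
if (symbols < 0).any():  # contains negative integers
    return False

12. Checking differences (continuing from above)

symbols.sort()
np.all(np.diff(symbols) <= 1)

First, what diff means: the discrete difference between consecutive elements.

import numpy as np
a = np.array([1, 6, 7, 8, 12])
diff_x1 = np.diff(a)
print(diff_x1)
# [5 1 1 4]
# [6-1, 7-6, 8-7, 12-8]

So the expression above checks whether every difference between consecutive elements of the sorted symbols is less than or equal to 1.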

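13. str.endswith

string.endswith(suffix[, start[, end]])
string[start:end].endswith(suffix)

string: the string being checked
suffix: the suffix to look for (a tuple of suffixes is also accepted; each is tried in turn)
start: optional position at which the check starts (counted from the left)
end: optional position at which the check ends (counted from the left)

If start and end are given, the check is limited to that range; otherwise the whole string is checked.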
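
A brief sketch of the three call forms; the file name is invented for the example.

```python
filename = "report.final.csv"
print(filename.endswith(".csv"))             # True
print(filename.endswith((".xls", ".csv")))   # True: any suffix in the tuple may match
print(filename.endswith("final", 0, 12))     # True: only filename[0:12] is checked
```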

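14. Python caching and functools.lru_cache

Python 3.2 introduced a very elegant caching mechanism: the lru_cache decorator in the functools module. It caches the result of a function or method so that later calls with the same arguments return the cached result directly. Its signature is:

@functools.lru_cache(maxsize=128, typed=False)

The lru_cache decorator caches the results of up to maxsize most recent calls, which improves efficiency and is especially useful for expensive functions. maxsize is the maximum number of cached results; if it is None, the cache is unbounded. The documentation of earlier Python versions notes that the LRU feature performs best when maxsize is a power of two. If typed=True (note that functools32 lacks this parameter), arguments of different types are cached separately, for example f(3) and f(3.0).

Prerequisites:

The same arguments must always produce the same result.
The function is slow and is called many times.

In essence: call arguments ==> return value.

Where it applies: anywhere on a single machine that you want to trade space for time, turning a computation into a fast lookup.

Note: many LeetCode problems that time out with plain DFS pass easily once the cache is added.

Example:

from functools import lru_cache
@lru_cache(None)
def add(x, y):
    print("calculating: %s + %s" % (x, y))
    return x + y
print(add(1, 2))
print(add(1, 2))
print(add(2, 3))

Output:

calculating: 1 + 2
3
3
calculating: 2 + 3
5

The output shows that the second call to add(1, 2) does not execute the function body; it returns the cached result directly.

Python-functools （reduce，偏函数partial，lru_cache） - JerryZao - 博客园

Drawbacks:

Arguments must be hashable (tuple, int, str, frozenset, and so on); list, dict, and set are not supported.
No cache expiration: keys cannot expire or be invalidated individually.
No per-key removal: cache_clear() can only empty the whole cache.
No distributed support: it is a single-process, in-memory cache.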
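
A minimal sketch of inspecting and emptying the cache with cache_info() and cache_clear(), both of which lru_cache attaches to the decorated function. The fib function is only an illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion; the cache makes it linear in n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                      # 832040
info = fib.cache_info()             # named tuple: hits, misses, maxsize, currsize
print(info.hits, info.misses, info.currsize)
fib.cache_clear()                   # empties the whole cache at once
print(fib.cache_info().currsize)    # 0
```

Because individual keys cannot be removed, code that needs expiry or selective invalidation usually needs a different cache.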

15. Counting runs of consecutive identical characters in a string

import itertools
res = [(k, len(list(g))) for k, g in itertools.groupby('TTFTTTFFFFTFFTT')]
print(res)  # [('T', 2), ('F', 1), ('T', 3), ('F', 4), ('T', 1), ('F', 2), ('T', 2)]

Python's built-in itertools module provides very useful functions for working with iterables.

python基础 - itertools工具_Rnan_prince的博客-CSDN博客

16. Splitting a string on multiple delimiters

python基础 - 正则表达式（re模块）_Rnan_prince的博客-CSDN博客
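
A minimal sketch of splitting on several delimiters with re.split. The sample line and the delimiter set (semicolon, comma, whitespace) are assumptions chosen for illustration.

```python
import re

line = "asdf fjdk; afed, fjek,asdf,   foo"
# Split on a semicolon, comma, or whitespace character, followed by any extra whitespace.
fields = re.split(r"[;,\s]\s*", line)
print(fields)  # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```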

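17. heapq.nsmallest and heapq.nlargest

First, a look at import heapq:

heapify: rearranges a list into a heap in place (this is not a full sort)
heappush: adds a value to the heap
heappop: removes and returns the smallest value
heappushpop: pushes a value, then pops and returns the smallest value (push first, pop second)
heapreplace: pops and returns the smallest value, then pushes the new value (pop first, push second)

heapq (a built-in Python module)

__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge', 'nlargest', 'nsmallest', 'heappushpop']

heapq.nlargest(n, iterable, key=None)

heapq.nsmallest(n, iterable, key=None)

n: number of items to find; iterable: any iterable; key: same as for sorted

Example: the k pairs drawn from nums1 and nums2 with the smallest sums

heapq.nsmallest(k, itertools.product(nums1, nums2), key=sum)

heapq.merge(list1, list2)

Merges list1 and list2 into a single sorted stream. The inputs must already be sorted; merge does not sort them, so an unordered input such as a set has to be sorted first.

from heapq import merge
list1 = [1, 2, 3, 4, 5, 12]
set1 = {2, 3, 9, 23, 54}
s = list(merge(list1, sorted(set1)))
print(s)  # [1, 2, 2, 3, 3, 4, 5, 9, 12, 23, 54]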

18. Comparing lists

list_x = [124, 32525, 2141, 354]
list_y = [114, 231, 341, 153]
print(list_x > list_y)  # True
print(list_x < list_y)  # False
print((list_x > list_y) - (list_x < list_y))  # 1
list_x = [124, 231, 341, 153]
list_y = [124, 231, 341, 153]
print((list_x > list_y) - (list_x < list_y))  # 0
list_x = [124, 231, 341, 153]
list_y = [134, 231, 341, 153]
print((list_x > list_y) - (list_x < list_y))  # -1

19. Combining max with map

versions = ["192.168.1.1", "192.168.1.2", "292.168.1.1", "192.178.1.1"]
res = max(versions, key=lambda x: list(map(int, x.split('.'))))
print(res) # 292.168.1.1

20. Rounding half up

在 Python 中如何正确地四舍五入？ - Tr0y's Blog

import decimal
decimal.getcontext().rounding = decimal.ROUND_HALF_UP
a = decimal.Decimal('2.135').quantize(decimal.Decimal('0.00'))
b = decimal.Decimal('2.145').quantize(decimal.Decimal('0.00'), rounding=decimal.ROUND_HALF_UP)
print(a, b)  # 2.14 2.15

21. Set operations:

Set x <==> ① + ②

Set y <==> ② + ③

Intersection x&y <==> ② x.intersection(y)

Union x|y <==> ① + ② + ③ x.union(y)

Difference x-y <==> ① x.difference(y)

Difference y-x <==> ③ y.difference(x)

Symmetric difference x^y == y^x <==> ① + ③ x.symmetric_difference(y) == y.symmetric_difference(x)
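
The regions ①, ②, and ③ above can be made concrete with two small sets; the values are chosen only for illustration.

```python
x = {1, 2}   # region ① is {1}, region ② is {2}
y = {2, 3}   # region ③ is {3}

print(x & y == x.intersection(y) == {2})                    # True
print(x | y == x.union(y) == {1, 2, 3})                     # True
print(x - y == x.difference(y) == {1})                      # True
print(y - x == y.difference(x) == {3})                      # True
print(x ^ y == y ^ x == x.symmetric_difference(y) == {1, 3})  # True
```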

22. Using .format

a = list("HELLO")
print("{0[0]}, {0[2]}".format(a))  # H L

a = {"c": "foo", "d": "bar"}
print("{c} {d}".format(**a))  # foo bar
a = "foo", "bar"
print("{0} {1}".format(*a))  # foo bar
a = ["foo", "bar"]
print("{0} {1}".format(*a))  # foo bar
a = {"foo", "bar"}
print("{0} {1}".format(*a))   # foo bar / bar foo (sets are unordered)

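22. *args and **kwargs

When * is used in a function definition, the arguments passed by position are collected into the variable carrying the * prefix:

def one(*args):
    print(args)  # 1
one()
# ()
one(1, 2, 3)
# (1, 2, 3)
def two(x, y, *args):  # 2
    print(x, y, args)
two('a', 'b', 'c')
# a b ('c',)

The first function, one, simply prints whatever positional arguments it receives. At #1 we just refer to the variable args inside the function; *args appears only in the definition, to say that positional arguments should be stored in args. Python also lets us name some parameters and capture all remaining positional arguments through args, as shown at #2.
The * operator can also be used when calling a function. There, a variable marked with * has its contents unpacked and passed as positional arguments. For example:

def add(x, y):
    return x + y
lst = [1, 2]
add(lst[0], lst[1])  # 1
# 3
add(*lst)  # 2
# 3

The code at #1 and #2 does exactly the same thing; what Python does for us at #2 could also be done by hand. So *args means either that extra arguments for a call are taken from an iterable, or that a function definition accepts any number of positional arguments.
** is slightly more involved: it stands for a dictionary of keyword arguments, and otherwise works much like *:

def foo(**kwargs):
    print(kwargs)
foo()
# {}
foo(x=1, y=2)
# {'x': 1, 'y': 2}

Note: the parameters arg, *args, and **kwargs must appear in a fixed order, (arg, *args, **kwargs); any other order is a syntax error.

dct = {'x': 1, 'y': 2}
def bar(x, y):
    return x + y
bar(**dct)
# 3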

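23. Python number conversions

ord(x) converts a character to its integer code point
hex(x) converts an integer to a hexadecimal string
oct(x) converts an integer to an octal string
bin(x) converts an integer to a binary string

num = "0011"
a = int(num, base=2)  # parse as binary
print(a)  # 3
num2 = "a"
b = int(num2, base=16)  # parse as hexadecimal
print(b)  # 10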

24. Checking whether one list is a subset of another

Using Counter:

from collections import Counter
print(not Counter([1, 2]) - Counter([1]))  # False
print(not Counter([1, 2]) - Counter([1, 2]))   # True
print(not Counter([1, 2, 2]) - Counter([1, 2]))   # False
print(not Counter([1, 2]) - Counter([1, 2, 2]))   # True

Using issubset:

set([1, 2, 2]).issubset([1, 2, 3])  # True

Also:

set(one).intersection(set(two)) == set(one)
set(one) & (set(two)) == set(one)

The set-based approaches share one drawback: they cannot handle subsets with repeated elements.
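
The Counter approach can be wrapped in a small helper that respects repeated elements. The name is_sublist is hypothetical and used only in this sketch.

```python
from collections import Counter

def is_sublist(small, big):
    """True if every element of small occurs in big at least as many times."""
    # Counter subtraction drops non-positive counts, so an empty result means "contained".
    return not (Counter(small) - Counter(big))

print(is_sublist([1, 2, 2], [1, 2, 3]))      # False: big has only one 2
print(is_sublist([1, 2, 2], [2, 1, 2, 3]))   # True
print(set([1, 2, 2]).issubset([1, 2, 3]))    # True: the set loses the duplicate
```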

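25. The os.walk() method

os.walk() walks a directory tree, either top-down or bottom-up, and yields the names of the files and directories it contains.

import os
for root, dirs, files in os.walk(".", topdown=False):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))

root is the path of the directory currently being visited
dirs is a list of the names of the directories directly inside it (not including those in subdirectories)
files is likewise a list of the names of the files directly inside it (not including those in subdirectories)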

Python os.walk() 方法 | 菜鸟教程

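26. Zero-padding an integer to four digits

(1) Padding a number with leading zeros

n = 123
n = "%04d" % n
print(n)  # 0123
print(type(n))  # str

(2) Padding a string with leading zeros

s = "123"
s = s.zfill(5)
print(s)  # 00123

27. numpy.log1p()

numpy.log1p(x) returns log(1 + x) and stays accurate even when x is close to zero, where computing numpy.log(1 + x) directly would lose precision.

28. np.sum(array, axis=0, 1, 2...) in NumPy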

Python基础 - Numpy库中的np.sum(array,axis=0,1,2...)_Rnan_prince的博客-CSDN博客

29. Importing data

python基础 - 导入数据（Numpy and Pandas）_Rnan_prince的博客-CSDN博客_numpy 导入数据

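30. urllib, urllib2, urllib3: usage and differences

Python 2.x offers these libraries: urllib, urllib2, urllib3, requests

Python 3.x offers these libraries: urllib, urllib3, requests

urllib3 and requests, available on both, are not part of the standard library. urllib3 provides thread-safe connection pooling and file-post support and has little to do with urllib or urllib2. requests bills itself as "HTTP for Humans" and is simpler and more convenient to use.
(1) Python 2.x:

Main differences between urllib and urllib2:

urllib2 accepts a Request object, which lets you set headers for a URL, change the user agent, set cookies, and so on; urllib accepts only a plain URL.
urllib provides some more basic helpers that urllib2 lacks, such as urlencode.

import urllib
from contextlib import closing
# In Python 2 the object returned by urllib.urlopen() is not a context manager, so wrap it in closing().
with closing(urllib.urlopen('https://mp.csdn.net/console/editor/html/104070046')) as f:
    print(f.read(300))

(2) Python 3.x:

Here urllib became a package split into several modules:

urllib.request opens and reads URLs,
urllib.error contains the exceptions raised by urllib.request,
urllib.parse parses URLs,
urllib.robotparser parses robots.txt files.

urllib.urlopen() from Python 2.x was removed; urllib2.urlopen() corresponds to urllib.request.urlopen() in Python 3.x.

import urllib.request
with urllib.request.urlopen('https://mp.csdn.net/console/editor/html/104070046') as f:
    print(f.read(300))

For other differences, see: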

python中urllib, urllib2,urllib3, httplib,httplib2, request的区别_permike的专栏-CSDN博客
Python urllib、urllib2、urllib3用法及区别 - onefine - 博客园

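31. __slots__ magic

In Python every class can have instance attributes. By default Python stores an object's instance attributes in a dict. This is very useful because it lets us set arbitrary new attributes at runtime. For small classes with a known set of attributes, however, it can become a bottleneck: the dict wastes a lot of memory.
Python cannot allocate a fixed amount of memory at object creation to hold all the attributes, so creating many objects (thousands upon thousands) consumes a lot of memory. There is a way around this: use __slots__ to tell Python not to use a dict and to allocate space only for a fixed set of attributes. Using __slots__:

class MyClass(object):
    __slots__ = ['name', 'identifier']
    def __init__(self, name, identifier):
        self.name = name
        self.identifier = identifier
        self.set_up()  # set_up() is assumed to be defined elsewhere in the class

__slots__ lightens the memory load; with this technique memory usage can drop by roughly 40% to 50%.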
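
A minimal sketch of the visible effects of __slots__: instances have no per-instance __dict__, and attributes outside the declared set cannot be assigned. The Point class is invented for this example.

```python
class Point:
    __slots__ = ("x", "y")  # only these attributes get storage

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(hasattr(p, "__dict__"))  # False: no per-instance dict is created

try:
    p.z = 3  # z is not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)
```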

32. namedtuple and enum

python基础 - namedtuple和enum_Rnan_prince的博客-CSDN博客

33. Generators and coroutines

python基础 - 生成器(Generators)和协程(Coroutine)_Rnan_prince的博客-CSDN博客

34. Generating random arrays

python基础 - 生成随机数组_Rnan_prince的博客-CSDN博客_python随机生成一个数组

35. Plotting with matplotlib and networkx

matplotlib: python基础 - matplotlib绘图总结_Rnan_prince的博客-CSDN博客
networkx: python基础 - networkx 绘图总结_Rnan_prince的博客-CSDN博客_networkx画图

36. Excel processing examples

python基础 - 处理excel实例_Rnan_prince的博客-CSDN博客

37. Reading and writing XML documents (with lxml)

Python基础 - 读写XML文档(lxml方式)--暂存待续_Rnan_prince的博客-CSDN博客

38. Decorators: a summary

Python基础 - 装饰器总结_Rnan_prince的博客-CSDN博客

39. Getting directory and file paths

Python基础 - 获取文件夹和文件的路径_Rnan_prince的博客-CSDN博客

40. enumerate also accepts an optional argument

my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear

41. Time conversion

(1) Local time versus UTC

import time
from datetime import datetime
time_at = time.time()
print(time.localtime(time_at))  # time.struct_time(tm_year=2020, tm_mon=6, tm_mday=20, tm_hour=20, tm_min=19, tm_sec=8, tm_wday=5, tm_yday=172, tm_isdst=0)
print(datetime.utcfromtimestamp(time_at).strftime('%Y-%m-%d %H:%M:%S'))  # 2020-06-20 12:19:08
print(datetime.fromtimestamp(time_at).strftime('%Y-%m-%d %H:%M:%S'))  # 2020-06-20 20:19:08

(2) Converting a timestamp to a date string

import time
import pandas as pd
from datetime import datetime
def time_to_datetime(time_at):
    return datetime.fromtimestamp(time_at).strftime('%Y-%m-%d %H:%M:%S')
print(type(time.time()))   # <class 'float'>
print(time.time())         # 1592655986.43079
str_time = time_to_datetime(time.time())
print(type(str_time))      # <class 'str'>
print(str_time)            # 2020-06-20 20:26:26
date_time = pd.to_datetime(str_time)
print(type(date_time))     # <class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(date_time)           # 2020-06-20 20:26:26

(3) Converting between UTC and the local time zone

UTC-本地时区转换(python)_Rnan_prince的博客-CSDN博客_python utc时间转换

43. Reading and splitting large files in Python

import pandas as pd
data = pd.read_table("data/ex1.csv", chunksize=10000, header=None, sep=',')
for chunk in data:
    print(chunk)

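44. re.match and re.search

re.match only matches at the beginning of the string, whereas re.search scans the whole string for the first match.

import re
a = "back.text"
b = "text.back"
pattern = "back"
if re.match(pattern, a):
    print(1)
if re.match(pattern, b):
    print(2)
if re.search(pattern, a):
    print(3)
if re.search(pattern, b):
    print(4)
# 1 3 4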

45. Printing a dictionary in key order

dict_1 = {"c": 45254, "a": 333, "b": 908}
for key in dict_1:
    print(key, dict_1[key])
>>>
c 45254
a 333
b 908
for key in sorted(dict_1):
    print(key, dict_1[key])
>>>
a 333
b 908
c 45254

46. Keeping n decimal places

Taking two decimal places as an example, with a = 21.2345:

1. round

print(round(a, 2))   # 21.23 float

2. %.nf

print('%.2f' % a)   # 21.23 str

3. '{:.2f}'.format()

print('{:.2f}'.format(a))   # 21.23 str

47. Thousands separators in string formatting

print("{:,}".format(99999999))  # 99,999,999

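48. Deleting all files in a directory

import shutil
shutil.rmtree(r'G:\test')

This removes the directory itself as well. To keep the directory, either write your own code that recursively deletes its contents, or still use this function and recreate the directory afterwards.

import os
import shutil
shutil.rmtree('name of the folder to empty')
os.mkdir('name of the folder to empty')

Other methods:

os.remove() deletes the file at the given path. If the path is a directory, OSError is raised.
os.rmdir() deletes the directory at the given path, but only if it is empty; otherwise OSError is raised.
os.removedirs() removes directories recursively. It works like rmdir(), except that once the leaf directory is removed successfully, removedirs() tries to remove each parent in turn until an error is raised (the error is essentially ignored, since it usually means the parent is not empty).
os.unlink() deletes a file; it raises an error if the path is a directory.

Recursively deleting directories and files (similar to the DOS command DeleteTree):

import os
# top is the root directory whose contents should be deleted
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))

To move a file from one folder to another and rename it at the same time, shutil also makes this easy:

shutil.move('source folder/source file name', 'target folder/target file name')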
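
For the case where the directory itself should be kept, here is a minimal sketch that deletes only its contents. The helper name clear_directory is hypothetical, and the demo runs on a throwaway temporary directory rather than a real path.

```python
import os
import shutil
import tempfile

def clear_directory(path):
    """Delete everything inside path but keep path itself."""
    with os.scandir(path) as it:
        entries = list(it)  # snapshot first, then delete
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)

# Demonstrate on a temporary directory with one file and one subdirectory.
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "sub"))
with open(os.path.join(demo, "sub", "a.txt"), "w") as f:
    f.write("x")
with open(os.path.join(demo, "b.txt"), "w") as f:
    f.write("y")

clear_directory(demo)
print(os.path.isdir(demo), os.listdir(demo))  # True []
shutil.rmtree(demo)  # clean up the demo directory
```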

49. Reshaping with mat (or array).reshape(c, -1)

Special usage: mat (or array).reshape(c, -1)
The object must be a matrix or an array to use .reshape(c, -1). It reshapes the matrix or array into c rows and d columns, where -1 tells NumPy to compute d automatically: d = total number of elements / c, and d must be an integer or an error is raised.

50. Reading and writing files with pickle

import pickle
in_data = [1, 3, 5, 7, 9]
output_file = open("test.pkl", 'wb')
pickle.dump(in_data, output_file)
output_file.close()
input_file = open("test.pkl", 'rb')
out_data = pickle.load(input_file)
input_file.close()
print(out_data)  # [1, 3, 5, 7, 9]
# Using the with context manager:
with open('test.pkl', 'rb') as input_file:
    pickled_data = pickle.load(input_file)
    print(pickled_data)

51. Checking the versions of Python and installed packages

如何查看python和安装包的版本_Rnan_prince的博客-CSDN博客_如何查看python安装的包版本

52. Installing and using pip

python基础 - pip 安装与升级_Rnan_prince的博客-CSDN博客

53. Using NumPy and Pandas

Numpy：python基础 - Numpy_Rnan_prince的博客-CSDN博客

Pandas：python基础 - Pandas_Rnan_prince的博客-CSDN博客

54. Python virtual environments and managing multiple versions

python 配置虚拟环境，多版本管理_Rnan_prince的博客-CSDN博客

55. Slicing with slice

items = [0, 1, 2, 3, 4, 5, 6]
print(items[2:4])
# Out[24]: [2, 3]
a = slice(2, 4)
print(items[a])
# Out[25]: [2, 3]

Slicing an iterator with itertools.islice

Ordinary slicing does not work on generators; use itertools.islice() instead.

import itertools
def count(n):
    while True:
        yield n
        n += 1
c = count(0)
print(c)
# Out[6]: <generator object count at 0x00000209CF554C00>
for x in itertools.islice(c, 10, 20):
    print(x)
# 10
# ...
# 19

56. Combining multiple mappings into one with ChainMap

The background: we have several dicts or mappings and want to combine them into a single mapping. You could merge them with update, but that creates a new data structure, so later changes to the original dicts are not reflected. For a view that stays in sync, use ChainMap (available in Python 3).

from collections import ChainMap
a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)
print(c)
# Out[5]: ChainMap({'x': 1, 'z': 3}, {'y': 2, 'z': 4})
print(c['x'])
print(c['y'])
print(c['z'])
c["z"] = 4
print(c)
# Out[12]: ChainMap({'x': 1, 'z': 4}, {'y': 2, 'z': 4})
c.pop('z')
print(c)
# Out[14]: ChainMap({'x': 1}, {'y': 2, 'z': 4})
del c["y"]
# ---------------------------------------------------------------------------
# KeyError: "Key not found in the first mapping: 'y'"

57. Filtering and cleaning text with str.translate

s = 'python\fis\tawesome\r\n'
print(s)
# Out[10]: 'python\x0cis\tawesome\r\n'
remap = {
    ord('\t'): '|',   # replace
    ord('\f'): '|',   # replace
    ord('\r'): None,  # delete
}
a = s.translate(remap)
print(a)
# Out[22]: 'python|is|awesome\n'

58. Fraction arithmetic with fractions.Fraction

from fractions import Fraction
a = Fraction(5, 4)
b = Fraction(7, 16)
c = a + b
print(c.numerator)
# Out[30]: 27
print(c.denominator)
# Out[31]: 16

69. Time arithmetic with datetime.timedelta

from datetime import timedelta
a = timedelta(days=2, hours=6)
b = timedelta(hours=4.5)
c = a + b
print(c.days)
# Out[36]: 2
print(c.seconds)
# Out[37]: 37800
print(c.seconds / 3600)
# Out[38]: 10.5
print(c.total_seconds() / 3600)
# Out[39]: 58.5

70. Delegating iteration with __iter__()

Suppose a custom container object internally holds a list, tuple, or other iterable, and we want the new container to support iteration. Usually all that is needed is an __iter__() method that delegates the iteration request to the container held inside the object.

class Person:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Person({!r})'.format(self._value)
    def __iter__(self):
        return iter(self._children)
person = Person(30)
person._children = ["zhangSan", "liSi", "wangErMaZi"]
print(person)
# Out[38]: Person(30)
for p in person:
    print(p)
# Out[39]: zhangSan
# Out[40]: liSi
# Out[41]: wangErMaZi

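71. Reverse iteration with reversed()

To iterate over a sequence in reverse, use the built-in reversed() function. You can also implement the __reversed__() method in your own class; the implementation is similar to that of __iter__().

a = [1, 2, 3, 4]
for x in reversed(a):
    print(x)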
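
A minimal sketch of a class that supports both directions by defining __iter__() and __reversed__(). The Countdown class is invented for this example.

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Forward iteration: start, start - 1, ..., 1
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __reversed__(self):
        # Reverse iteration: 1, 2, ..., start
        n = 1
        while n <= self.start:
            yield n
            n += 1

print(list(Countdown(3)))            # [3, 2, 1]
print(list(reversed(Countdown(3))))  # [1, 2, 3]
```

Defining __reversed__() lets reversed() work without first building the whole sequence in memory.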

72. Iterating over all possible combinations or permutations

itertools.permutations takes a collection of elements, rearranges them in every possible order, and returns the results as a sequence of tuples.
itertools.combinations ignores the order of the elements, and an element that has already been chosen is removed from the candidates. To lift that restriction, use combinations_with_replacement.

from itertools import permutations
items = ['a', 'b', 'c']
for p in permutations(items):
    print(p)
# ('a', 'b', 'c')
# ('a', 'c', 'b')
# ('b', 'a', 'c')
# ('b', 'c', 'a')
# ('c', 'a', 'b')
# ('c', 'b', 'a')
from itertools import combinations
for c in combinations(items, 2):
    print(c)
# ('a', 'b')
# ('a', 'c')
# ('b', 'c')
from itertools import combinations_with_replacement
for c in combinations_with_replacement(items, 2):
    print(c)
# ('a', 'a')
# ('a', 'b')
# ('a', 'c')
# ('b', 'b')
# ('b', 'c')
# ('c', 'c')

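73. Defining multiple constructors in a class

To define a class with more than one constructor, use class methods.

import time
class Date:
    # Primary constructor
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day
    # Alternate constructor
    @classmethod
    def today(cls):
        t = time.localtime()
        return cls(t.tm_year, t.tm_mon, t.tm_mday)
b = Date.today()
a = Date(2012, 12, 31)

A key feature of a class method is that it receives the class as its first argument (cls); the method uses that class to create and return the final instance.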

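74. Adding logging

The easiest way to add simple logging to a program is the logging module. The logging calls critical(), error(), warning(), info(), and debug() represent different severity levels, in descending order. The level argument of basicConfig() acts as a filter: every message below that level is ignored.

import logging
def main():
    logging.basicConfig(filename='app.log', level=logging.ERROR)
    hostname = 'www.python.org'
    item = 'spam'
    filename = 'data.csv'
    mode = 'r'
    logging.critical('Host %s unknown', hostname)
    logging.error("Couldn't find %r", item)
    logging.warning('Feature is deprecated')
    logging.info('Opening file %r, mode=%r', filename, mode)
    logging.debug('Got here')
if __name__ == '__main__':
    main()

Output in app.log (only the messages at level ERROR or above are written):

CRITICAL:root:Host www.python.org unknown
ERROR:root:Couldn't find 'spam'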

75. Python: coroutines and asynchronous I/O (asyncio)

python - 协程异步IO(asyncio)_Rnan_prince的博客-CSDN博客

76. Python: concurrency and multithreading

python - 并发和多线程_Rnan_prince的博客-CSDN博客

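77. Creating custom exceptions

Creating a custom exception is very simple: define it as a class that inherits from Exception (or from another existing exception type if that makes more sense). Custom exceptions should always inherit from the built-in Exception class, or from a locally defined base class that itself inherits from Exception. Although all exceptions also derive from BaseException, you should not use it as the base class for new exceptions. BaseException is reserved for system-exiting exceptions such as KeyboardInterrupt, which are not meant to be caught by ordinary handlers.

class NetworkError(Exception):
    pass
class HostnameError(NetworkError):
    pass
# when used
try:
    msg = s.recv()
except HostnameError as e:
    ...

If you define a new exception and override Exception's __init__() method, make sure to always call Exception.__init__() with all the arguments that were passed in.

class CustomError(Exception):
    def __init__(self, message, status):
        super().__init__(message, status)
        self.message = message
        self.status = status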

78. Python basics: using json

python基础 - json_Rnan_prince的博客-CSDN博客

79. Building a dict from two lists

keys = ['a', 'b', 'c']
values = [1, 2, 3]
dictionary = dict(zip(keys, values))
print(dictionary)
# {'a': 1, 'b': 2, 'c': 3}

80. Converting a list of strings to lowercase or uppercase

mylist = ['Mixed Case One', 'Mixed Case Two', 'Mixed Three']
# In Python 3, map() returns an iterator, so wrap it in list() to see the values.
print(list(map(lambda x: x.lower(), mylist)))
print(list(map(lambda x: x.upper(), mylist)))
print(list(map(str.lower, mylist)))
print(list(map(str.upper, mylist)))

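81. Replacing an infinite while loop with an iterator

Reading data:

def func(s):
    while True:
        data = read_data(s)
        if data is None:
            break
        process_data(data)

Your code uses a while loop to process data iteratively because it has to call a function or use a test condition that does not fit the usual iteration pattern. Can the loop be rewritten with an iterator?

A little-known feature of iter() is that it optionally accepts a callable and a sentinel (end) value. Used this way, it creates an iterator that calls the callable repeatedly until the return value equals the sentinel.

It can be written like this:

def func(s):
    for data in iter(lambda: read_data(s), None):
        process_data(data)

This approach works well for functions that are called repeatedly, such as those involving I/O. For example, to read data in chunks from a socket or a file, you normally have to call read() or recv() over and over, each time followed by an end-of-data test to decide whether to stop. The recipe here combines both in a single iter() call. The lambda is there to create a callable that takes no arguments while still supplying the size argument to recv() or read().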
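
A self-contained sketch of the chunked-read case, using an in-memory io.BytesIO stream in place of a real file or socket. CHUNK_SIZE and the sample bytes are assumptions for the demo, and functools.partial is used here instead of a lambda to bind the size argument.

```python
import io
from functools import partial

CHUNK_SIZE = 4  # hypothetical chunk size for the demo
stream = io.BytesIO(b"abcdefghij")

chunks = []
# stream.read(CHUNK_SIZE) returns b"" at end of stream, which serves as the sentinel.
for chunk in iter(partial(stream.read, CHUNK_SIZE), b""):
    chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```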

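82. Nested dictionaries for efficiency on large datasets

Three-level dictionary:

from collections import defaultdict
link = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

n-level dictionary (trie):

import collections
Trie = lambda: collections.defaultdict(Trie)
trie = Trie()  # the root of the trie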

83. Sorting a dictionary by key

dict_data = {"s": 1, "f": 2, "r": 3, "t": 4, "g": 5}
print(sorted(dict_data.items(), key=lambda item: item[0]))
# [('f', 2), ('g', 5), ('r', 3), ('s', 1), ('t', 4)]

84. Getting the timestamp of today's midnight

import datetime
import time
now_time = int(time.time())
day_time = now_time - (now_time - time.timezone) % 86400
print(datetime.datetime.fromtimestamp(day_time).strftime("%Y-%m-%d %H:%M:%S"))
# 2020-10-12 00:00:00

85. Merging dictionaries

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
# Classic one-liner; y takes precedence
z1 = {k: v for d in [x, y] for k, v in d.items()}
# Constructor; raises an error if a key of y is not a string; y takes precedence
z2 = dict(x, **y)
# Chained iterators; y takes precedence
import itertools
z3 = dict(itertools.chain(x.items(), y.items()))
# For duplicate keys, which value wins is arbitrary
z4 = dict(x.items() | y.items())

Merging several dicts

def merge_dicts(*dict_args):
    result = {}
    for item in dict_args:
        result.update(item)
    return result

86. Printing the stack trace

import traceback
traceback.print_exc()  # call inside an except block to print the current exception's traceback

87. Compiling .py files to .pyc files

py文件编译成pyc文件_Rnan_prince的博客-CSDN博客

88. Optimizing nested loops

import itertools
nums1 = [1, 2, 3, 4]
nums2 = [5, 6, 7, 8]
res1 = []
for n1, n2 in itertools.product(nums1, nums2):
    res1.append([n1, n2])
res2 = []
for n1 in nums1:
    for n2 in nums2:
        res2.append([n1, n2])
print(res1 == res2)

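89. Prefer generators over lists

## Not recommended
def my_range(n):
    i = 0
    result = []
    while i < n:
        result.append(fn(i))
        i += 1
    return result  # returns a list
## Recommended
def my_range(n):
    i = 0
    while i < n:
        yield fn(i)  # use a generator instead of a list
        i += 1

* Prefer generators over lists unless you need list-specific operations.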

90. Properties (property)

Not recommended

class Clock(object):
    def __init__(self):
        self.__hour = 1
    def set_hour(self, hour):
        if 0 <= hour <= 25:
            self.__hour = hour
        else:
            print("BadHourException")
    def get_hour(self):
        return self.__hour
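
The source shows only the getter/setter version. One possible property-based rewrite is sketched below; it is not the author's code. It assumes valid hours are 0 to 23 and raises ValueError instead of printing a message.

```python
class Clock:
    def __init__(self):
        self._hour = 1

    @property
    def hour(self):
        # Read access: clock.hour
        return self._hour

    @hour.setter
    def hour(self, value):
        # Write access: clock.hour = value (assumption: valid hours are 0..23)
        if not 0 <= value <= 23:
            raise ValueError("hour must be in 0..23")
        self._hour = value

clock = Clock()
clock.hour = 13
print(clock.hour)  # 13
```

Callers use plain attribute syntax, while validation still happens inside the setter.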

91. Using numpy.where and numpy.random.multinomial

numpy.where与numpy.random.multinomial用法_Rnan_prince的博客-CSDN博客

92. Iterating over an object's attributes in Python

https://blog.csdn.net/qq_19446965/article/details/120929992

93. Grouping a list with groupby

key is the current group key and group is an iterator over the items in the group defined by that key. In other words, the groupby iterator itself yields iterators.

from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
    print([thing for thing in group])  # This outputs the contents of things, broken up by line.
    print([thing for thing in group])  # This line prints an empty list

94. Python itertools: functions for efficient iteration

Python-itertools - 高效的迭代器的函数_Rnan_prince的博客-CSDN博客

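II. Easily Overlooked Python Issues

1. Converting a float with int()

When int() converts a float, it keeps only the integer part, for positive and negative numbers alike.

print(int(6.235))  # 6
print(int(-6.235))  # -6

Note: this is neither rounding up, nor rounding down, nor rounding to nearest; it truncates toward zero.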

2. Watch the return values of operations

a = print("python")
print(a)  # None
list_1 = [1, 2, 3]
list_2 = [4, 5, 6]
print(list_1.extend(list_2))  # None
print(list_1)  # [1, 2, 3, 4, 5, 6]
list_3 = [1, 6, 5, 8, 7, 9, 4, 1, 3]
new_list = list_3.sort()
print(new_list)  # None
print(list_3)  # [1, 1, 3, 4, 5, 6, 7, 8, 9]
list_4 = [1, 6, 5, 8, 7, 9, 4, 1, 3]
new_list = sorted(list_4)
print(new_list)  # [1, 1, 3, 4, 5, 6, 7, 8, 9]
print(list_4)  # unchanged: [1, 6, 5, 8, 7, 9, 4, 1, 3]

3. Join order (Spark cogroup, shown in Scala)

val x = sc.parallelize(List((1, "apple"), (2, "banana"), (3, "orange"), (4, "kiwi")), 2)
val y = sc.parallelize(List((5, "computer"), (1, "laptop"), (1, "desktop"), (4, "iPad")), 2)

Here x may contain keys that y does not have.

x.cogroup(y).collect()

The result shows that x has key 2 while y does not, so after cogroup the ArrayBuffer on the y side is empty.

res23: Array[(Int, (Iterable[String], Iterable[String]))] = Array((4,(ArrayBuffer(kiwi),ArrayBuffer(iPad))), (2,(ArrayBuffer(banana),ArrayBuffer())), (3,(ArrayBuffer(orange),ArrayBuffer())),(1,(ArrayBuffer(apple),ArrayBuffer(laptop, desktop))),(5,(ArrayBuffer(),ArrayBuffer(computer))))

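4. Division results differ between versions

Number handling

Results of print(3/2, -3/2, int(-3/2), float(-3)/2, int(float(-3)/2)) in Python 2 and Python 3:

	3/2	-3/2	int(-3/2)	float(-3)/2	int(float(-3)/2)
python2	1	-2	-2	-1.5	-1
python3	1.5	-1.5	-1	-1.5	-1

Why is 7/-3 equal to -3 in Python 2 (7//-3 in Python 3) but -2 in C?

Python's integer division rounds down (toward negative infinity), whereas C's integer division truncates toward zero.

The same goes for %: in Python 7 % -3 == -2, while in C 7 % -3 == 1.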
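
The two conventions can be compared side by side in Python 3. Here math.trunc and math.fmod stand in for C's behavior; that mapping is this sketch's illustration, not something the source spells out.

```python
import math

print(7 // -3)             # -3: floor division rounds toward negative infinity
print(7 % -3)              # -2: the remainder takes the sign of the divisor
print(math.trunc(7 / -3))  # -2: truncation toward zero, as C integer division does
print(math.fmod(7, -3))    # 1.0: the remainder takes the sign of the dividend, as in C
print(divmod(7, -3))       # (-3, -2)
```

In both conventions the identity a == (a / b rounded) * b + remainder holds; only the rounding direction differs.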

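5. String concatenation performance

你所不知道的Python | 字符串连接的秘密 - 知乎

(1) Using the plus operator

r = a + b

(2) Using the % operator

r = '%s%s' % (a, b)

(3) Using the format method

r = '{}{}'.format(a, b)

(4) Using an f-string

r = f'{a}{b}'

(5) Using str.join()

r = ''.join([a, b])

When concatenating a few strings

The plus operator is a sensible choice for both performance and readability. If readability matters more and you are on Python 3.6 or later, an f-string is also a very good choice; in the case below, for example, the f-string is clearly more readable than plus concatenation.

a = f'Name: {name} Age: {age} Gender: {gender}'
b = 'Name: ' + name + ' Age: ' + age + ' Gender: ' + gender

When concatenating many strings

join and f-strings both perform best; which to choose still depends on your Python version and readability needs, and an f-string joining many strings is not necessarily readable. Never use plus concatenation here, especially inside a for loop.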

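6. Python scopes

Python has four kinds of scope:

· L (Local): the local scope

· E (Enclosing): the scope of the enclosing function(s) outside a closure

· G (Global): the global (module) scope

· B (Built-in): the built-in scope (the module that holds the built-in functions)

Names are looked up in the order L –> E –> G –> B: if a name is not found locally, Python looks in the enclosing function scope (for example, for a closure), then in the global scope, and finally in the built-ins.

The global keyword lets a function or other local scope use a global variable. If the global is only read and never assigned, global is not required.

The nonlocal keyword lets a function or other scope use a variable from an enclosing (non-global) scope.

Modifying variables:

To modify a global variable inside a function, declare the name with the global keyword.

Python 2.x has no keyword for modifying an outer variable from within a closure; in 3.x, the nonlocal keyword does this.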
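
A minimal sketch of both keywords; the names counter, increment_global, and make_counter are invented for the example.

```python
counter = 0  # global (module-level) name

def increment_global():
    global counter      # without this, `counter += 1` raises UnboundLocalError
    counter += 1

def make_counter():
    count = 0           # lives in the enclosing scope of step()
    def step():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count
    return step

increment_global()
step = make_counter()
print(counter, step(), step())  # 1 1 2
```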

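7. Multiple Python versions side by side: how the system finds Python

解决Python多版本兼容性问题_咸鱼程序员的博客-CSDN博客

(1) Windows locates system programs through the PATH environment variable.
(2) When several Python versions coexist, the one that appears first in PATH is run.
(3) After installing several versions and adding them to PATH, open cmd and type python to check which version runs.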

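8. The __all__ attribute of a module

python中模块的__all__属性_快递小可的博客-CSDN博客_python中__all__

It restricts what a wildcard import brings in, for example:

from module import *

If the imported module defines __all__, only the attributes, functions, and classes listed in __all__ are imported.

If it does not, all public names in the module are imported (those that do not start with an underscore).

Note that names left out of __all__ can still be imported explicitly; only import * skips them.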

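9. Accessing class variables and instance variables

Accessing the name attribute of User object u (really the __name instance variable, after name mangling)

==》print(u._User__name)

Dynamically adding class variables to a class

Person.name = "aa"
print(person1.name)

Python lets you read a class variable through an instance, but you cannot modify it that way: assigning through the instance does not assign to the class variable, it defines a new instance variable.
Within a class, an instance variable and a class variable may share a name, but then the class variable can no longer be reached through the instance, because the instance variable takes precedence.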
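
The shadowing behavior can be seen in a short sketch; the Person class and its species attribute are made up for the example.

```python
class Person:
    species = "human"  # class variable shared by all instances

p1 = Person()
p2 = Person()

p1.species = "robot"   # creates an instance variable on p1 only
print(p1.species, p2.species, Person.species)  # robot human human

Person.species = "alien"  # rebinding on the class changes the shared value
print(p1.species, p2.species)  # robot alien
```

p1 keeps seeing its own instance variable, while p2 continues to read the class variable.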

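10. Function definition and call order in Python

A function called from inside another function does not have to be defined before the caller

def fun1(a, b):
    c = fun2(a, b)
    print(c)
def fun2(a, b):
    c = a + b
    return c

A call that actually executes, however, must come after the definition

def fun3(a, b):
    c = a + b
    print(c)
fun3(1, 2)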

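11. Using from __future__ import print_function

While reading code you may come across this statement:

from __future__ import print_function

I looked into it; here are notes on the common usage.

First, note that this statement is a Python 2 concept: relative to Python 2, Python 3 is the future. In other words, it lets you use Python 3's print function ahead of time in a Python 2 environment.

For example:

In a Python 2.x environment with the statements below, the second one passes the syntax check and the third one fails it

1 from __future__ import print_function
2 print('you are good')
3 print 'you are good'

So there is no need to be alarmed when you see this statement: it simply imports a feature of the next version into the current one.

12. __slots__ performance optimization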

python-__slots__性能优化_Rnan_prince的博客-CSDN博客

13. Python iterators are one-way: an iterator object can be consumed only once

from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus")]
for key, group in groupby(things, lambda x: x[0]):
    print([thing for thing in group])  # This outputs the contents of things, broken up by line.
    print([thing for thing in group])  # This line prints an empty list

References:

Python Cookbook 3rd Edition Documentation — python3-cookbook 3.0.0 文档 https://python-cookbook-3rd-edition.readthedocs.io/zh_CN/latest/index.html

3.9.1 Documentation https://docs.python.org/zh-cn/3/