Reading the Python Cookbook: Data Structures and Algorithms

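1. Unpacking a sequence into separate variables

Any sequence (or iterable) can be unpacked into separate variables with a simple assignment. The only requirement is that the number and structure of the variables match the sequence.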

data = ["Mike", 22, 73, (2017, 12, 28)]
name, age, score, (year, month, date) = data
print(name, age, score, year, month, date)

Mike 22 73 2017 12 28

When unpacking, you can discard a value by assigning it to a throwaway variable name such as _.

data = ["Mike", 22, 73, (2017, 12, 28)]
_, age, score, (_, _, date) = data
print(age, score, date)

22 73 28
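Unpacking is not limited to lists and tuples, and a mismatch in the number of elements is reported as an error rather than ignored. The following is a minimal sketch of both points; the sample values are made up for illustration.

```python
# Strings are iterable too, so they unpack character by character
a, b, c = "abc"
print(a, b, c)

# A count mismatch raises ValueError instead of silently dropping values
point = (1, 2, 3)
try:
    x, y = point
except ValueError as exc:
    error_message = str(exc)
    print("unpacking failed:", error_message)
```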

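2. Unpacking elements from an iterable of arbitrary length

With a star expression, the variable prefixed with * receives a list of n elements, where n can be 0 or arbitrarily large.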

record = ("Jack", 22, "15012345678", "18099883311")
name, age, *phone = record
print(name, age, phone)

Jack 22 ['15012345678', '18099883311']

Note: a single unpacking assignment may contain only one starred variable.
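The starred variable does not have to come last. As a small sketch (the sample data, including the passwd-style line, is assumed here for illustration), it can sit at the start or in the middle, and it is an empty list when nothing is left over:

```python
# The starred name may appear at the start, middle, or end of the targets
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)

# It always receives a list, which is empty when nothing is left over
name, age, *phones = ("Jack", 22)
print(phones)

# Splitting a delimited string and keeping only selected fields
line = "nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false"
uname, *_, homedir, shell = line.split(":")
print(uname, homedir, shell)
```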

3. Keeping the last N items

from collections import deque


def search(lines, pattern, history=5):
    # Keep at most `history` previous lines; older ones are dropped automatically
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)


if __name__ == "__main__":
    with open("D:/Test1.txt") as f:
        for line, prelines in search(f, "456", 5):
            for pline in prelines:
                print(pline, end="")
            print(line, end="")
            print("-"*20)

123
456
--------------------

The deque class in the collections module does this job well, and adding or removing items at either end of a deque takes O(1) time.
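To make the behaviour of maxlen concrete, here is a short sketch with made-up numbers. It shows a bounded deque dropping its oldest items, and an unbounded deque used as a queue that is open at both ends.

```python
from collections import deque

# A bounded deque silently discards the oldest item once it is full
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(recent)

# Without maxlen the deque is unbounded and works as a general queue
q = deque()
q.append(1)          # add on the right
q.appendleft(0)      # add on the left
q.append(2)
left = q.popleft()   # remove from the left
right = q.pop()      # remove from the right
print(left, right, q)
```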

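4. Finding the largest or smallest N items

① Finding the single largest or smallest item

Use the min() and max() functions. The time complexity is O(n), where n is the length of the sequence.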

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
maxnum = max(num)
minnum = min(num)
print(maxnum, "----", minnum)

42 ---- -4

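② N very small relative to the list length (e.g. N=2)

Use heapify() from the heapq library to rearrange the sequence into a heap, whose first element is always the smallest. Calling heappop() then pops that smallest element, and the next smallest takes its place at the front.

import heapq

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(num)
heapq.heapify(heap)
print(heap)
print("="*50)
print(heapq.heappop(heap))
print(heap)
print("="*50)
print(heapq.heappop(heap))
print(heap)
print("="*50)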

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]
==================================================
-4
[1, 2, 2, 23, 7, 8, 18, 23, 42, 37]
==================================================
1
[2, 2, 8, 23, 7, 37, 18, 23, 42]
==================================================

Each heappop() call takes O(log n), where n is the length of the sequence; building the heap with heapify() takes O(n).
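The pop-by-pop approach can be wrapped in a small helper. The function name n_smallest_by_popping is hypothetical and only serves this sketch; it copies the input so the original list stays unchanged.

```python
import heapq

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]


def n_smallest_by_popping(values, n):
    """Return the n smallest values by popping a heap n times."""
    heap = list(values)      # copy so the caller's list is untouched
    heapq.heapify(heap)      # linear in the number of values
    return [heapq.heappop(heap) for _ in range(min(n, len(heap)))]


smallest_two = n_smallest_by_popping(num, 2)
print(smallest_two)
```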

③ N small relative to the length of the sequence (e.g. N = 4)

Use the nlargest() and nsmallest() functions in the heapq module. Both accept a key parameter.

import heapq

data = [
    {"name": "Jack", "age": 21, "score": 99},
    {"name": "Ben", "age": 22, "score": 90},
    {"name": "Mark", "age": 20, "score": 72},
    {"name": "Cook", "age": 20, "score": 53},
    {"name": "Antony", "age": 23, "score": 94},
    {"name": "Chris", "age": 24, "score": 62},
    {"name": "Ken", "age": 22, "score": 81},
    {"name": "Jackie", "age": 20, "score": 85},
    {"name": "David", "age": 22, "score": 89},
    {"name": "Jackson", "age": 23, "score": 89},
    {"name": "Lucy", "age": 22, "score": 77},
]
scoreMax = heapq.nlargest(4, data, key=lambda s: s["score"])
scoreMin = heapq.nsmallest(4, data, key=lambda s: s["score"])
print(scoreMax)
print(scoreMin)

[{'name': 'Jack', 'age': 21, 'score': 99}, {'name': 'Antony', 'age': 23, 'score': 94}, {'name': 'Ben', 'age': 22, 'score': 90}, {'name': 'David', 'age': 22, 'score': 89}]
[{'name': 'Cook', 'age': 20, 'score': 53}, {'name': 'Chris', 'age': 24, 'score': 62}, {'name': 'Mark', 'age': 20, 'score': 72}, {'name': 'Lucy', 'age': 22, 'score': 77}]

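As the output shows, when values tie, the item that appears earlier in the input is selected first.

④ N close to the size of the sequence

Use sorted() and take a slice.

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
lst = sorted(num)
lstmin = lst[:8]        # the 8 smallest, in ascending order
print(lstmin)
lstrev = sorted(num, reverse=True)
lstmax = lstrev[:8]     # the 8 largest, in descending order
print(lstmax)

Or

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
lst = sorted(num)
lstmin = lst[:8]        # the 8 smallest
print(lstmin)
lstmax = lst[-8:]       # the 8 largest, in ascending order
print(lstmax)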

5. Implementing a priority queue

Use heappush() and heappop() from heapq (heap operations) to implement this.

import heapq


class PriorityQueue(object):
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self, item, priority):
        # Negate the priority so the highest priority is popped first;
        # the index keeps items with equal priority in insertion order
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]


class Item(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


if __name__ == "__main__":
    q = PriorityQueue()
    q.push(Item("Jack"), 1)
    q.push(Item("Mike"), 2)
    q.push(Item("Ben"), 3)
    q.push(Item("David"), 1)
    for i in range(q._index):
        print(q.pop())

Ben
Mike
Jack
David
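The index stored in each tuple is what makes equal priorities safe. As a sketch (using a stripped-down Item class that, like the one above, defines no comparison methods), pushing two items with the same priority fails without the index and succeeds with it:

```python
import heapq


class Item:
    def __init__(self, name):
        self.name = name


# Without an index, equal priorities make Python compare the Item objects
heap = []
heapq.heappush(heap, (1, Item("Jack")))
try:
    heapq.heappush(heap, (1, Item("David")))
    tie_failed = False
except TypeError:
    tie_failed = True
print("tie raised TypeError:", tie_failed)

# With (priority, index, item) the index settles every tie first
heap = []
for index, name in enumerate(["Jack", "David"]):
    heapq.heappush(heap, (1, index, Item(name)))
order = [heapq.heappop(heap)[-1].name for _ in range(2)]
print(order)
```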

6. Mapping keys to multiple values in a dictionary

Use the defaultdict class in the collections module.

When the values are kept in a list:

from collections import defaultdict


d = defaultdict(list)
d["a"].append(1)
d["a"].append(1)
d["b"].append(2)
d["c"].append(3)
d["d"].append(4)
for key, values in d.items():
    print(key, ":", values)

a : [1, 1]
b : [2]
c : [3]
d : [4]

When the values are kept in a set:

from collections import defaultdict

d = defaultdict(set)
d["a"].add(1)
d["a"].add(1)
d["b"].add(2)
d["c"].add(3)
d["d"].add(4)
for key, values in d.items():
    print(key, ":", values)

a : {1}
b : {2}
c : {3}
d : {4}

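Note, however, that defaultdict automatically creates an empty entry for a key as soon as that key is accessed, even if nothing is ever stored under it.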
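The automatic creation of entries is easy to observe. In this small sketch (the key names are arbitrary), reading a missing key inserts it, while a membership test or get() does not:

```python
from collections import defaultdict

d = defaultdict(list)
d["a"].append(1)

# Merely reading a missing key inserts it with an empty list
_ = d["missing"]
print(dict(d))

# Membership tests and get() do not create entries
has_other = "other" in d
value = d.get("other")
print(has_other, value, len(d))
```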

The same result can be achieved with the setdefault() method of an ordinary dictionary.

d = {}
d.setdefault("a", []).append(1)
d.setdefault("a", []).append(2)
d.setdefault("b", []).append(3)
d.setdefault("c", []).append(4)
for key, values in d.items():
    print(key, ":", values)

a : [1, 2]
b : [3]
c : [4]

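However, this approach creates a new instance of the initial value ([] here, or set()) on every call, even when the key already exists.

An example of inserting in a loop:

from collections import defaultdict

# Sample input, assumed here for illustration: an iterable of (key, value) pairs
pairs = [("a", 1), ("a", 2), ("b", 3)]

d = defaultdict(list)
for key, value in pairs:
    d[key].append(value)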

7. Keeping dictionaries in order

Use OrderedDict from the collections module.

from collections import OrderedDict

d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3
d["grok"] = 4
d["foo"] = 5
for k in d:
    print(k, ":", d[k])

foo : 5
bar : 2
spam : 3
grok : 4

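As shown, changing the value of a key that has already been inserted does not affect its position in the ordered dictionary.

An OrderedDict maintains a doubly linked list internally, so it uses roughly twice the memory of a normal dictionary.

It is useful for controlling the order of fields when encoding data as JSON.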
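The JSON use case can be sketched with the standard json module. The object_pairs_hook argument of json.loads() is used here to keep the field order when decoding as well.

```python
import json
from collections import OrderedDict

d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3
d["grok"] = 4

# Fields are written in insertion order
encoded = json.dumps(d)
print(encoded)

# Decode back into an OrderedDict, preserving the order found in the text
decoded = json.loads(encoded, object_pairs_hook=OrderedDict)
print(list(decoded))
```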

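8. Calculating with dictionaries

prices = {
    "ACME": 45.23,
    "AAPL": 612.78,
    "IBM": 205.55,
    "HPQ": 37.20,
    "FB": 10.75,
}
# Put the values first so that the tuples are compared by price
print(max(zip(prices.values(), prices.keys())))
print(min(zip(prices.values(), prices.keys())))
print("-"*10)
prices_sorted = sorted(zip(prices.values(), prices.keys()))
for k in prices_sorted:
    print(k)

(612.78, 'AAPL')
(10.75, 'FB')
----------
(10.75, 'FB')
(37.2, 'HPQ')
(45.23, 'ACME')
(205.55, 'IBM')
(612.78, 'AAPL')

zip() can invert the keys and values into (value, key) tuples without changing the original dictionary. It returns an iterator, which can be consumed only once.

Applying min(), max() or sorted() to the dictionary itself compares only the keys.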
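The "consumed only once" point is worth seeing in action. This sketch assumes the (value, key) ordering, so tuples compare by price first, and shows that a second pass over the same zip object finds nothing:

```python
prices = {"ACME": 45.23, "AAPL": 612.78, "IBM": 205.55, "HPQ": 37.20, "FB": 10.75}

# (value, key) pairs compare by value first
prices_and_names = zip(prices.values(), prices.keys())
lowest = min(prices_and_names)
print(lowest)

# The iterator is now exhausted, so a second pass sees no data
try:
    max(prices_and_names)
    second_pass_failed = False
except ValueError:
    second_pass_failed = True
print(second_pass_failed)

# Build a fresh zip() for each computation instead
highest = max(zip(prices.values(), prices.keys()))
print(highest)
```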

Alternatively, pass a key function to min() and max() to get the key whose value is smallest or largest:

prices = {
    "ACME": 45.23,
    "AAPL": 612.78,
    "IBM": 205.55,
    "HPQ": 37.20,
    "FB": 10.75,
}
minItem = min(prices, key=lambda k: prices[k])
maxItem = max(prices, key=lambda k: prices[k])
minValue = prices[minItem]
maxValue = prices[maxItem]
print(minItem, maxItem, "="*5, minValue, maxValue)

FB AAPL ===== 10.75 612.78

9. Finding commonalities in two dictionaries

The views returned by keys() and items() support set operations such as &, - and |.

a = {"x": 1, "y": 2, "z": 3}
b = {"w": 10, "x": 11, "y": 2}
# Find keys in common
print(a.keys() & b.keys())
# Find keys in a but not in b
print(a.keys() - b.keys())
# Find (key, value) pairs in common
print(a.items() & b.items())
# Create a new dictionary with certain keys removed
c = {key: a[key] for key in a.keys() - {"z", "w"}}
print(c)

{'y', 'x'}
{'z'}
{('y', 2)}
{'y': 2, 'x': 1}

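10. Removing duplicates from a sequence while preserving order

If the values in the sequence are hashable, that is, objects that do not change during their lifetime and have a __hash__() method, such as integers, floats, strings and tuples:

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


a = [1, 5, 2, 1, 9, 1, 5, 10]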
lst = list(dedupe(a))
print(lst)

[1, 5, 2, 9, 10]

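If the values are not hashable:

def dedupe(items, key=None):
    seen = set()
    for item in items:
        # key turns an unhashable item into something hashable
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


b = [
    {"x": 1, "y": 2},
    {"x": 1, "y": 3},
    {"x": 1, "y": 2},
    {"x": 2, "y": 4},
]
lst = list(dedupe(b, key=lambda k: (k["x"], k["y"])))
print(lst)
lst2 = list(dedupe(b, key=lambda k: (k["x"])))
print(lst2)

The idea is to find a way to convert each unhashable item into a hashable one.

A set also removes duplicates, but it does not guarantee that the original order is preserved.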
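For hashable values there is also a compact alternative that is not part of the original notes: dictionary keys are unique and, since Python 3.7, keep their insertion order, so dict.fromkeys() can remove duplicates without losing the order. A minimal sketch:

```python
a = [1, 5, 2, 1, 9, 1, 5, 10]

# set() removes duplicates but makes no promise about order
unique_unordered = set(a)

# dict keys are unique and, since Python 3.7, keep insertion order
unique_ordered = list(dict.fromkeys(a))
print(unique_ordered)
```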

11. Naming a slice

s = "Hello World"
a = slice(2, 5)
print(s[a])

llo

Use indices(size) to clamp a slice to a safe range for a sequence of the given size.

s = "HelloWorld"
a = slice(5, 50, 2)
print(a.start)
print(a.stop)
print(a.step)
print(a.indices(len(s)))
for i in range(*a.indices(len(s))):
    print(s[i])

5
50
2
(5, 10, 2)
W
r
d

This way the loop never raises an IndexError because the slice is larger than the sequence.
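The main payoff of naming slices is readability when pulling fields out of fixed-width records. The record layout below is invented for this sketch; only the slice() calls reflect the technique itself.

```python
# Hypothetical fixed-width record: date (8 chars), name (6 chars), score (2 chars)
record = "20171228Mike  73"

YEAR = slice(0, 4)
MONTH = slice(4, 6)
DAY = slice(6, 8)
NAME = slice(8, 14)
SCORE = slice(14, 16)

year = int(record[YEAR])
month = int(record[MONTH])
day = int(record[DAY])
name = record[NAME].strip()
score = int(record[SCORE])
print(year, month, day, name, score)
```

Compared with writing record[8:14] inline, the named version documents what each range means and keeps the offsets in one place.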

12. Finding the most frequently occurring items in a sequence

The Counter class in collections does this.

from collections import Counter

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]
word_counter = Counter(words)
most_three_couter = word_counter.most_common(3)
print(most_three_couter)

[('ear', 7), ('head', 4), ('see', 4)]

Incrementing counts manually

from collections import Counter

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]
addwords = ['ear', 'head', 'small', 'big', 'do']
word_counter = Counter(words)
for word in addwords:
    word_counter[word] += 1
print(word_counter)

Or

from collections import Counter

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]
addwords = ['ear', 'head', 'small', 'big', 'do']
word_counter = Counter(words)
word_counter.update(addwords)
print(word_counter)

Counter({'ear': 8, 'head': 5, 'see': 4, 'nose': 3, 'big': 3, 'do': 3, 'look': 2, 'hair': 2, 'small': 2, 'read': 1, 'watch': 1, 'large': 1})

Counters support addition and subtraction

from collections import Counter

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]
addwords = ['ear', 'head', 'small', 'big', 'do']
word_counter = Counter(words)
addwords_counter = Counter(addwords)
mix = word_counter + addwords_counter
print(mix)
subtract = word_counter - addwords_counter
print(subtract)

Counter({'ear': 8, 'head': 5, 'see': 4, 'nose': 3, 'big': 3, 'do': 3, 'look': 2, 'hair': 2, 'small': 2, 'read': 1, 'watch': 1, 'large': 1})
Counter({'ear': 6, 'see': 4, 'head': 3, 'nose': 3, 'look': 2, 'hair': 2, 'read': 1, 'watch': 1, 'big': 1, 'do': 1, 'large': 1})
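Note that 'small' is missing from the subtraction result: the - operator keeps only positive counts. The sketch below (with made-up stock figures) contrasts it with the in-place subtract() method, which keeps zero and negative counts.

```python
from collections import Counter

stock = Counter(apple=2, pear=1)
sold = Counter(apple=3, pear=1)

# The - operator keeps only positive counts
remaining = stock - sold
print(remaining)

# subtract() works in place and keeps zero and negative counts
ledger = Counter(stock)
ledger.subtract(sold)
print(ledger)
```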

13. Sorting a list of dictionaries by a common key

Use the itemgetter function from the operator module.

from operator import itemgetter

data = [
    {'ID': 1, "Name": "Ben", "score": 88},
    {'ID': 2, "Name": "Jack", "score": 92},
    {'ID': 3, "Name": "Mike", "score": 73},
    {'ID': 4, "Name": "Mark", "score": 81},
    {'ID': 5, "Name": "Lucy", "score": 95},
]
rows_by_Name = sorted(data, key=itemgetter('Name'))
rows_by_Score = sorted(data, key=itemgetter('score'))
print(rows_by_Name)
print(rows_by_Score)

[{'ID': 1, 'Name': 'Ben', 'score': 88}, {'ID': 2, 'Name': 'Jack', 'score': 92}, {'ID': 5, 'Name': 'Lucy', 'score': 95}, {'ID': 4, 'Name': 'Mark', 'score': 81}, {'ID': 3, 'Name': 'Mike', 'score': 73}]
[{'ID': 3, 'Name': 'Mike', 'score': 73}, {'ID': 4, 'Name': 'Mark', 'score': 81}, {'ID': 1, 'Name': 'Ben', 'score': 88}, {'ID': 2, 'Name': 'Jack', 'score': 92}, {'ID': 5, 'Name': 'Lucy', 'score': 95}]

A lambda can be used in place of itemgetter, but itemgetter is usually faster.
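itemgetter also accepts several keys at once, in which case it returns a tuple and ties on the first key fall back to the next one. The data below is a shortened variant of the list above with one score changed to create a tie, purely for illustration.

```python
from operator import itemgetter

data = [
    {"ID": 1, "Name": "Ben", "score": 88},
    {"ID": 2, "Name": "Jack", "score": 92},
    {"ID": 3, "Name": "Mike", "score": 88},
]

# Several keys: rows are ordered by score, then by Name within equal scores
rows = sorted(data, key=itemgetter("score", "Name"))
ids = [row["ID"] for row in rows]
print(ids)

# The same key function works with min() and max()
best = max(data, key=itemgetter("score"))
print(best["Name"])
```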

14. Sorting objects that do not natively support comparison

For example, two objects can be compared through one of their attributes.

class User(object):
    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return "User({})".format(self.id)


users = [User(2), User(3), User(1)]
print(users)
user_ordered = sorted(users, key=lambda k: k.id)
print(user_ordered)

[User(2), User(3), User(1)]
[User(1), User(2), User(3)]

You can also use attrgetter() from the operator module.

from operator import attrgetter


class User(object):
    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return "User({})".format(self.id)


users = [User(2), User(3), User(1)]
print(users)
user_ordered = sorted(users, key=attrgetter('id'))
print(user_ordered)

Likewise, attrgetter is somewhat faster than the lambda.

15. Grouping records by a field

Use itemgetter together with groupby from the itertools module.

from operator import itemgetter
from itertools import groupby

data = [
    {"Name": "Jack", "Age": 21},
    {"Name": "Ben", "Age": 22},
    {"Name": "Lucy", "Age": 23},
    {"Name": "Lily", "Age": 23},
    {"Name": "Mike", "Age": 21},
    {"Name": "Martin", "Age": 21},
    {"Name": "Susan", "Age": 23},
    {"Name": "Rose", "Age": 22},
    {"Name": "Hill", "Age": 22},
]

sortdata = sorted(data, key=itemgetter('Age'))

for age, rows in groupby(sortdata, key=itemgetter("Age")):
    print(age)
    for row in rows:
        print(" ", row)

21
  {'Name': 'Jack', 'Age': 21}
  {'Name': 'Mike', 'Age': 21}
  {'Name': 'Martin', 'Age': 21}
22
  {'Name': 'Ben', 'Age': 22}
  {'Name': 'Rose', 'Age': 22}
  {'Name': 'Hill', 'Age': 22}
23
  {'Name': 'Lucy', 'Age': 23}
  {'Name': 'Lily', 'Age': 23}
  {'Name': 'Susan', 'Age': 23}

groupby() only groups consecutive items with the same key, so sort the sequence by that key before grouping.
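The effect of skipping the sort is easy to demonstrate. In this sketch (a shortened version of the data above), the unsorted input yields age 21 as two separate groups, while the sorted input yields one group per age:

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"Name": "Jack", "Age": 21},
    {"Name": "Ben", "Age": 22},
    {"Name": "Mike", "Age": 21},
]

# Unsorted input: age 21 shows up as two separate groups
unsorted_keys = [age for age, _ in groupby(data, key=itemgetter("Age"))]
print(unsorted_keys)

# Sorted input: one group per age
ordered = sorted(data, key=itemgetter("Age"))
grouped = {
    age: [row["Name"] for row in rows]
    for age, rows in groupby(ordered, key=itemgetter("Age"))
}
print(grouped)
```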

16. Filtering sequence elements

The built-in filter() function

def is_int(Val):
    # True for integers, False for everything else
    return isinstance(Val, int)


a = [1, 2, "aaa", "b", 5, "cc"]
b = list(filter(is_int, a))
print(b)

[1, 2, 5]

compress() from the itertools library

from itertools import compress

name = ["Jack", "Lucy", "Ben", "Lily", "Mike"]
age = [8, 11, 12, 9, 11]
more = [n > 10 for n in age]
print(more)
l = list(compress(name, more))
print(l)

[False, True, True, False, True]
['Lucy', 'Ben', 'Mike']

First build a sequence of Boolean values, then pass it to compress() as the selector.
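For simple conditions, a list comprehension or generator expression is a common alternative to filter(). This sketch reuses the mixed list from the filter() example and is offered as one option, not as the approach taken in the notes above.

```python
a = [1, 2, "aaa", "b", 5, "cc"]

# List comprehension: builds the whole result at once
ints = [x for x in a if isinstance(x, int)]
print(ints)

# Generator expression: yields matches lazily, useful for large inputs
lazy_ints = (x for x in a if isinstance(x, int))
total = sum(lazy_ints)
print(total)

# A comprehension can also replace rejected values instead of dropping them
replaced = [x if isinstance(x, int) else 0 for x in a]
print(replaced)
```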

17. Extracting a subset of a dictionary

Dictionary comprehension

data = {
    "a": 1,
    "b": 3,
    "c": 5,
    "d": 7,
    "f": 8,
}
sondata1 = {k: v for k, v in data.items() if v > 4}
dataname = ["a", "c", "f"]
sondata2 = {k: v for k, v in data.items() if k in dataname}
print(sondata1)
print(sondata2)

{'c': 5, 'd': 7, 'f': 8}
{'a': 1, 'c': 5, 'f': 8}

18. Mapping names to sequence elements

Use namedtuple from the collections module.

from collections import namedtuple

name = ("Data", ["Name", "Id"])
Data = namedtuple(name[0], name[1])
data = Data("Mike", 1)
print(data)


def change_data(s):
    # _replace() returns a new tuple with the given fields swapped in
    return data._replace(**s)


a = {"Name": "Jack", "Id": 2}
b = change_data(a)
print(b)

Data(Name='Mike', Id=1)
Data(Name='Jack', Id=2)

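The fields of a namedtuple cannot be modified in place; _replace() returns a new instance with the given fields changed. For large amounts of data that must be updated often, define a class with __slots__ instead.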
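A short sketch of what "cannot be modified" means in practice, using the same Data type as above: fields can be read by name or by index, direct assignment raises AttributeError, and _replace() leaves the original untouched.

```python
from collections import namedtuple

Data = namedtuple("Data", ["Name", "Id"])
data = Data("Mike", 1)

# Fields are readable by name, by index, and by unpacking
print(data.Name, data[1])
name, id_ = data

# Assigning to a field fails because the tuple is immutable
try:
    data.Name = "Jack"
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)

# _replace() leaves the original alone and returns a new instance
updated = data._replace(Name="Jack")
print(data, updated)
```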

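19. Transforming and reducing data at the same time

List comprehension

num = [1, 2, 3, 4, 5]
# Named squares rather than sum, to avoid shadowing the built-in sum()
squares = [n * n for n in num]
print(squares)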

[1, 4, 9, 16, 25]
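To combine the transformation with a reduction in one step, a generator expression can be passed directly to a function such as sum(), min() or any(), which avoids building the intermediate list. A minimal sketch, with a small records list assumed for illustration:

```python
num = [1, 2, 3, 4, 5]

# Square each number and add the results without an intermediate list
total = sum(n * n for n in num)
print(total)

# The same pattern works with other reducing functions
records = [{"name": "Jack", "score": 99}, {"name": "Cook", "score": 53}]
lowest_score = min(r["score"] for r in records)
has_failing = any(r["score"] < 60 for r in records)
print(lowest_score, has_failing)
```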

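20. Combining multiple mappings into a single mapping

ChainMap in the collections module

When the same key appears in more than one mapping, the value from the first mapping is used. Lookups search all the mappings, but operations that insert, update or delete always affect the first one.

from collections import ChainMap

a = {"x": 1, "y": 2}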
b = {"z": 3, "y": 4}
c = ChainMap(a, b)
print(c)
print(c["y"])
a["x"] = 5
print(c)

ChainMap({'x': 1, 'y': 2}, {'z': 3, 'y': 4})
2
ChainMap({'x': 5, 'y': 2}, {'z': 3, 'y': 4})
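The rule about writes can be checked directly. In this sketch, which reuses the two dictionaries above, a write to a key that lives only in b still lands in a, and deletion only ever touches a:

```python
from collections import ChainMap

a = {"x": 1, "y": 2}
b = {"z": 3, "y": 4}
c = ChainMap(a, b)

# Lookups search every mapping, so keys that exist only in b are visible
z_before = c["z"]

# Writes go to the first mapping, even for a key that lives in b
c["z"] = 30
print(a, b)

# Deleting removes the key from the first mapping only
del c["y"]          # removes a["y"]; b["y"] becomes visible
visible_y = c["y"]
try:
    del c["y"]      # "y" is no longer in the first mapping
    second_delete_failed = False
except KeyError:
    second_delete_failed = True
print(visible_y, second_delete_failed)
```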

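Alternatively, build a new dictionary that merges the two with update(). Unlike a ChainMap, the merged copy does not reflect later changes to the original dictionaries.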

a = {"x": 1, "y": 2}
b = {"z": 3, "y": 4}
c = dict(a)
c.update(b)
print(c)
a["x"] = 5
print(c)

{'x': 1, 'y': 4, 'z': 3}
{'x': 1, 'y': 4, 'z': 3}

Reposted from: https://www.cnblogs.com/Msh0923/p/8138271.html
