Python Cookbook Study Notes ch2_01: Chapter 2

2.1 Splitting a string on multiple delimiters

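Problem: You need to split a string into fields, but the delimiters (and the spacing around them) are not consistent throughout the string.
Solution: The split() method of str objects only handles simple cases. It does not allow multiple delimiters or account for variable whitespace around them. Use re.split() instead, which accepts a regular expression: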

line = 'asdf fsff; frf, dfsfe,asd. daffpp'
import re
re.split(r'[.;,\s]\s*',line)

['asdf', 'fsff', 'frf', 'dfsfe', 'asd', 'daffpp']
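For comparison, here is a quick sketch of what str.split() does with the same line. It accepts only one literal separator, so the other delimiters and the stray spaces stay inside the fields:

```python
line = 'asdf fsff; frf, dfsfe,asd. daffpp'

# str.split() takes a single fixed separator string
parts = line.split(',')
print(parts)  # ['asdf fsff; frf', ' dfsfe', 'asd. daffpp']
```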

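When using re.split(), pay attention to whether the pattern contains a capture group enclosed in parentheses. If it does, the matched delimiter text is also included in the result list: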

fields = re.split(r'(;|,|\s)\s*',line)
fields

['asdf', ' ', 'fsff', ';', 'frf', ',', 'dfsfe', ',', 'asd.', ' ', 'daffpp']
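Keeping the captured delimiters is useful when you want to rebuild the string later. A minimal sketch: values sit at the even indices and delimiters at the odd ones, so slicing separates them (note that the extra whitespace consumed by \s* is not captured and is lost on rejoin):

```python
import re

line = 'asdf fsff; frf, dfsfe,asd. daffpp'
fields = re.split(r'(;|,|\s)\s*', line)

# Even indices hold the values, odd indices hold the captured delimiters
values = fields[::2]
delimiters = fields[1::2] + ['']  # pad so zip() pairs every value

# Reassemble using the original delimiters
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(values)   # ['asdf', 'fsff', 'frf', 'dfsfe', 'asd.', 'daffpp']
print(rebuilt)  # asdf fsff;frf,dfsfe,asd. daffpp
```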

If you don't want the delimiters in the result but still need parentheses to group parts of the pattern, make the group non-capturing with (?:...):

re.split(r'(?:,|;|\s)\s*',line)

['asdf', 'fsff', 'frf', 'dfsfe', 'asd.', 'daffpp']

2.2 Matching text at the start or end of a string

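Problem: You need to check the start or end of a string for specific text patterns, such as file name extensions or URL schemes.
Solution: Use str.startswith() or str.endswith():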

filename = 'spam.txt'
filename.startswith('file:')

False

filename.endswith('.txt')

True

url = 'http://www.python.org'
url.startswith('http://')

True

To check against multiple choices, put all of them in a tuple and pass the tuple:

filenames = ['Makefile','foo.c','bar.py','spam.c','spam.h']
[name for name in filenames if name.endswith(('.c','.h'))]

['foo.c', 'spam.c', 'spam.h']

any(name.endswith('.py') for name in filenames)

True

from urllib.request import urlopen

def read_data(name):
    # Fetch URLs over the network; treat anything else as a local file path
    if name.startswith(('http:', 'https:', 'ftp:')):
        return urlopen(name).read()
    else:
        with open(name) as f:
            return f.read()

Note: when checking several options, both methods require a tuple. If your choices are in a list or another sequence, convert it with tuple() first:

choices = ['http:','ftp:']
url = 'http://www.python.org'
url.startswith(choices)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-78cd8b4bba7d> in <module>()
      1 choices = ['http:','ftp:']
      2 url = 'http://www.python.org'
----> 3 url.startswith(choices)

TypeError: startswith first arg must be str or a tuple of str, not list

url.startswith(tuple(choices))

True

startswith() and endswith() can also be emulated with slicing:

filename = 'spam.txt'
filename[-4:] == '.txt'

True
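Slicing gets clumsy once several prefixes are involved, because each comparison needs its own slice length. A small sketch comparing the two approaches on the same URL:

```python
url = 'http://www.python.org'

# Slicing: one comparison per prefix, each with a hand-counted length
by_slicing = url[:5] == 'http:' or url[:6] == 'https:' or url[:4] == 'ftp:'

# startswith() with a tuple expresses the same check in one call
by_startswith = url.startswith(('http:', 'https:', 'ftp:'))

print(by_slicing, by_startswith)  # True True
```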

A regular expression works too:

import re
url = 'http://www.python.org'
re.match('http:|https:|ftp:',url)

<_sre.SRE_Match object; span=(0, 5), match='http:'>

2.3 Matching strings with shell wildcard patterns

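Problem: You want to match text using the wildcard patterns common in Unix shells (for example *.py, Dat[0-9]*.csv, and so on).
Solution: The fnmatch module provides two functions for this, fnmatch() and fnmatchcase():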

from fnmatch import fnmatch,fnmatchcase
fnmatch('foo.txt','*.txt')

True

fnmatch('foo.txt','?oo.txt')

True

fnmatch('Dat45.csv','Dat[0-9]*')

True

names = ['Dat1.csv','Dat2.csv','config.ini','foo.py']
[name for name in names if fnmatch(name,'Dat[1-9].csv')]
#[name for name in names if fnmatch(name,'Dat*.csv')]

['Dat1.csv', 'Dat2.csv']

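fnmatch() applies the case-sensitivity rules of the underlying operating system (it normalizes case with os.path.normcase()), so results differ between platforms: matching is case-sensitive on macOS and Linux and case-insensitive on Windows.

fnmatch('foo.txt','*.TXT')

False

(This is the result on macOS and Linux; on Windows the same call returns True.)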

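If that matters, use fnmatchcase() instead. It matches using exactly the upper/lower case you write in the pattern, on every platform:

fnmatchcase('foo.txt','*.TXT')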

False
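If you want case-insensitive matching that behaves the same on every platform, one option is to fold the case of both the name and the pattern yourself and then call fnmatchcase(). A sketch; `fnmatch_nocase` is a hypothetical helper, not part of the fnmatch module:

```python
from fnmatch import fnmatchcase

def fnmatch_nocase(name, pattern):
    # Hypothetical helper: lowercase both sides explicitly so the
    # result does not depend on the operating system
    return fnmatchcase(name.lower(), pattern.lower())

print(fnmatch_nocase('foo.txt', '*.TXT'))  # True
print(fnmatchcase('foo.txt', '*.TXT'))     # False
```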

addresses = [
'5412 N CLARK ST',
'1060 W ADDISON ST',
'1039 W GRANVILLE AVE',
'2122 N CLARK ST',
'4802 N BROADWAY',
]
from fnmatch import fnmatchcase
[addr for addr in addresses if fnmatchcase(addr,'*ST')]

['5412 N CLARK ST', '1060 W ADDISON ST', '2122 N CLARK ST']

[addr for addr in addresses if fnmatchcase(addr,'54[0-9][0-9] *CLARK*')]

['5412 N CLARK ST']
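The fnmatch module also has two related helpers: filter() applies fnmatch() to a whole list, and translate() shows the regular expression a shell pattern is converted into. A short sketch (the exact string from translate() varies between Python versions, so only its behavior is relied on here):

```python
import fnmatch
import re

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']

# filter() keeps only the names that match the pattern
print(fnmatch.filter(names, 'Dat[0-9].csv'))  # ['Dat1.csv', 'Dat2.csv']

# translate() returns the equivalent regular expression as a string
regex = fnmatch.translate('Dat[0-9].csv')
print(regex)
print(bool(re.match(regex, 'Dat7.csv')))  # True
```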

2.4 Matching and searching for text patterns

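Problem: You want to match or search text for a specific pattern.
Solution: If the text you are looking for is a simple literal string, the str methods find(), startswith(), and endswith() are usually enough: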

text = 'yeah, but no, but yeah,but no,but yeah'
text == 'yeah'

False

text.startswith('yeah')

True

text.endswith('no')

False

find() returns the index of the first occurrence of the search text:

text.find('no')

10
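find() also accepts an optional start index and returns -1 when the substring is not found. A small sketch that uses both to list every position of a substring; `find_all` is a hypothetical helper name:

```python
def find_all(text, sub):
    """Hypothetical helper: return every index where sub occurs in text."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume searching one character after the current match
        start = text.find(sub, start + 1)
    return positions

text = 'yeah, but no, but yeah,but no,but yeah'
print(find_all(text, 'no'))  # [10, 27]
print(text.find('maybe'))    # -1
```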

For more complex matching, use regular expressions and the re module.
In the patterns below, \d+ matches one or more digits.

text1 = '11/27/2012'
text2 = 'Nov 27,2012'
import re
if re.match(r'\d+/\d+/\d+', text1):
    print('yes')
else:
    print('no')

yes

if re.match(r'\d+/\d+/\d+', text2):
    print('yes')
else:
    print('no')

no

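If you are going to match the same pattern many times, compile the pattern string into a pattern object first.
Note that match() always matches at the beginning of the string; to find every match anywhere in the text, use the findall() method.

datepat = re.compile(r'\d+/\d+/\d+')
if datepat.match(text1):
    print('yes')
else:
    print('no')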

yes
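Besides findall(), the re module has search(), which returns the first match found anywhere in the string. A short sketch contrasting it with match() on a sentence that does not start with a date:

```python
import re

datepat = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/23/2018.Pycon starts 3/13/2019'

# match() only tries position 0, where the text starts with 'Today'
print(datepat.match(text))  # None

# search() scans forward and returns the first match it finds
m = datepat.search(text)
print(m.group(), m.span())  # 11/23/2018 (9, 19)
```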

text = 'Today is 11/23/2018.Pycon starts 3/13/2019'
datepat.findall(text)

['11/23/2018', '3/13/2019']

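When defining a regular expression, it is common to wrap parts of it in parentheses to create capture groups. Capture groups simplify later processing because the contents of each group can be extracted separately:

datepat2 = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat2.match('11/23/2018')
m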

<_sre.SRE_Match object; span=(0, 10), match='11/23/2018'>

m.group(0)

'11/23/2018'

m.group(1)

'11'

m.group(2)

'23'

m.group(3)

'2018'

m.groups()

('11', '23', '2018')

month,day,year = m.groups()
year

'2018'

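findall() searches the whole text and returns all matches as a list. When the pattern contains capture groups, each item in the list is a tuple of the group values:

text = 'Today is 11/23/2018.Pycon starts 3/13/2019'
datepat3 = re.compile(r'(\d+)/(\d+)/(\d+)')
datepat3.findall(text)

[('11', '23', '2018'), ('3', '13', '2019')]

for month, day, year in datepat3.findall(text):
    print('{}-{}-{}'.format(year, month, day))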

2018-11-23
2019-3-13

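The basic recipe for the re module is to compile the pattern string with re.compile() and then call match(), findall(), or finditer() on the resulting pattern object.
Keep in mind that match() only checks the beginning of the string, so its result may not be what you expect: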

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
m = datepat.match('11/27/2012asdafa')
m

<_sre.SRE_Match object; span=(0, 10), match='11/27/2012'>

m.group()

'11/27/2012'

If you need an exact match, end the pattern with $:

datepat = re.compile(r'(\d+)/(\d+)/(\d+)$')
# No match: this returns None, so nothing is displayed
datepat.match('11/27/2012asdafa')

datepat.match('11/27/2012')

<_sre.SRE_Match object; span=(0, 10), match='11/27/2012'>
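Two related tools: finditer() (mentioned in the recipe above) yields match objects one at a time instead of building a list, and fullmatch() (Python 3.4+) requires the whole string to match without adding $ to the pattern. A sketch of both; note that $ also matches just before a trailing newline, while fullmatch() does not:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Today is 11/23/2018.Pycon starts 3/13/2019'

# finditer() yields match objects lazily
for m in datepat.finditer(text):
    print(m.group(0), m.groups())
# 11/23/2018 ('11', '23', '2018')
# 3/13/2019 ('3', '13', '2019')

# fullmatch() succeeds only if the entire string matches
print(datepat.fullmatch('11/27/2012'))        # a Match object
print(datepat.fullmatch('11/27/2012asdafa'))  # None

# $ also accepts a trailing newline; fullmatch() rejects it
dollar = re.compile(r'(\d+)/(\d+)/(\d+)$')
print(dollar.match('11/27/2012\n') is not None)   # True
print(datepat.fullmatch('11/27/2012\n') is None)  # True
```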

2.5 Searching and replacing text

Problem: You want to search for a pattern in a string and replace it.
Solution: For simple literal patterns, use str.replace():

text = 'yeah,but no,yeah,but no,but yeah'
text.replace('yeah','yea')

'yea,but no,yea,but no,but yea'

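For more complex patterns, use the sub() function in the re module.
Its first argument is the pattern to match and its second is the replacement pattern. Backslash-digit references such as \3 refer to capture group numbers in the pattern.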

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
import re
re.sub(r'(\d+)/(\d+)/(\d+)',r'\3-\1-\2',text)

'today is 2012-11-27.PyCon starts 2013-3-13'

If you will perform the same substitution repeatedly, compile the pattern first for better performance:

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
import re
datepat = re.compile(r"(\d+)/(\d+)/(\d+)")
datepat.sub(r'\3-\1-\2',text)

'today is 2012-11-27.PyCon starts 2013-3-13'

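For more complex substitutions, pass a callback function as the replacement. It receives each match object and returns the replacement text: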

from calendar import month_abbr

def change_date(m):
    # Convert the month number in group 1 to its abbreviation, e.g. 11 -> 'Nov'
    mon_name = month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2), mon_name, m.group(3))

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
datepat.sub(change_date, text)

'today is 27 Nov 2012.PyCon starts 13 Mar 2013'

If you also need to know how many substitutions were made, use subn() instead:

text = 'today is 11/27/2012.PyCon starts 3/13/2013'
new_text, n = datepat.subn(r'\3-\1-\2', text)

new_text

'today is 2012-11-27.PyCon starts 2013-3-13'

n

2

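2.6 Case-insensitive search and replace

Problem: You need to search for and possibly replace text while ignoring case.
Solution: Pass the re.IGNORECASE flag to the functions in the re module: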

text = 'UPPER PYTHON,lower python,Mixed Python'
re.findall('python',text,flags=re.IGNORECASE)

['PYTHON', 'python', 'Python']

re.sub('python','snake',text,flags=re.IGNORECASE)

'UPPER snake,lower snake,Mixed snake'
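The plain re.sub() call above always inserts 'snake' in lowercase, whatever the case of the matched text. One way to preserve the original case is a replacement callback. A sketch; `matchcase` is a helper name chosen for illustration:

```python
import re

def matchcase(word):
    """Hypothetical helper: build a callback that copies the case of each match onto word."""
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace

text = 'UPPER PYTHON, lower python, Mixed Python'
result = re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
print(result)  # UPPER SNAKE, lower snake, Mixed Snake
```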
