A quick-start summary of the re library after hands-on practice

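I. Functions in the re library
- 1. The match() function in the re library
  - (1) match(): the most basic match
  - (2) match(): generic matching
  - (3) match(): capturing a target group
  - (4) match(): greedy and non-greedy matching
  - (5) match(): multi-line matching
  - (6) match(): matching escaped special characters
- 2. The search() function in the re library
  - (1) How search() differs from match()
  - (2) search() in practice
- 3. The findall() function in the re library
  - (1) How findall() differs from search()
- 4. The sub() function in the re library
  - (1) What sub() replaces
  - (2) sub() in practice
- 5. The compile() function in the re library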

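I. Functions in the re library

1. The match() function in the re library

(1) match(): the most basic match

match() starts matching at the first character of the string. If the beginning does not match, nothing after it is matched either and the result is None (just like in life, the first button has to be fastened correctly).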

import re

content = "Hello 123 4567 World_this is a Regex Demo"
print(len(content))
# Raw string so that \s, \d and \w reach the regex engine unchanged
result = re.match(r"^Hello\s\d\d\d\s\d{4}\s\w{10}.*Demo$", content)
print(result)
print(result.group())  # the text that was matched
print(result.span())   # the (start, end) range of the match
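
Since match() returns None when the start of the string does not fit, calling .group() directly on the result can fail with AttributeError. Here is a minimal sketch of a guarded call. The helper name leading_greeting and the second sample string are made up for illustration and are not part of the original notes.

```python
import re

def leading_greeting(text):
    """Hypothetical helper: return 'Hello' plus the digits after it, or None."""
    result = re.match(r"Hello\s\d+", text)
    # re.match() only tries position 0, so a mismatch at the start gives None
    if result is None:
        return None
    return result.group()

matched = leading_greeting("Hello 123 4567 World")
missed = leading_greeting("Say Hello 123 4567 World")
print(matched)
print(missed)
```

The second call returns None because the text starts with "Say", even though "Hello 123" appears later in the string.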

(2) match(): generic matching

import re

content = "Hello 123 4567 World_this is a Regex Demo"
print(len(content))
# ".*" stands in for everything between "Hello" and "Demo"
result = re.match(r"^Hello.*Demo$", content)
print(result)
print(result.group())  # the text that was matched
print(result.span())   # the (start, end) range of the match

(3) match(): capturing a target group

import re

content = "Hello 1234567 World_this is a Regex Demo"
print(len(content))
# The parentheses mark the part we want to pull out
result = re.match(r"^Hello\s(\d+)\sWorld.*Demo$", content)
print(result)
print(result.group(1))  # the content of the first () in the pattern
print(result.span())    # the (start, end) range of the whole match
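
Beyond group(1), a match object offers a few other ways to read captured groups. The sketch below reuses the sample string from above. The group name "number" is my own choice for illustration, not something from the original example.

```python
import re

content = "Hello 1234567 World_this is a Regex Demo"
# (?P<number>...) is a named group; the second group is unnamed
result = re.match(r"^Hello\s(?P<number>\d+)\s(\w+)", content)

whole = result.group(0)          # the entire match, same as group()
number = result.group("number")  # a named group can be read by name
all_groups = result.groups()     # every captured group as a tuple
number_span = result.span(1)     # the range covered by group 1 only
print(whole, number, all_groups, number_span)
```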

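(4) match(): greedy and non-greedy matching

import re

content = "Hello 1234567 World_this is a Regex Demo"
# Greedy ".*" swallows as many characters as it can and leaves only one digit for (\d+)
result1 = re.match(r"^He.*(\d+).*World.*Demo$", content)
# Non-greedy ".*?" takes as few characters as it can, so (\d+) gets the whole number
result2 = re.match(r"^He.*?(\d+).*World.*Demo$", content)
print(result1)
print(result2)
print(result1.group(1))  # 7
print(result2.group(1))  # 1234567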
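
The same effect is easier to see on a very short string. This is only a sketch, and the sample text "id=12345;" is invented for the illustration.

```python
import re

text = "id=12345;"
# Greedy: ".*" takes "1234" and leaves the minimum (one digit) for (\d+)
greedy = re.match(r"id=.*(\d+);", text)
# Non-greedy: ".*?" takes nothing, so (\d+) captures every digit
lazy = re.match(r"id=.*?(\d+);", text)

greedy_digits = greedy.group(1)
lazy_digits = lazy.group(1)
print(greedy_digits)
print(lazy_digits)
```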

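(5) match(): multi-line matching

import re

content = """Hello 1234567 World_this
is a Regex Demo"""
# Without re.S, "." does not match the newline, so the match fails
result = re.match(r"^He.*?(\d+).*?Demo$", content)
print(result)  # None
# result.group(1) would raise AttributeError here because result is None

import re

content = """Hello 1234567 World_this
is a Regex Demo"""
print(len(content))
# With re.S, "." also matches the newline, so the pattern can span both lines
result = re.match(r"^He.*?(\d+).*?Demo$", content, re.S)
print(result)
print(result.group(1))  # the content of the first () in the pattern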
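
To isolate what re.S changes, here is a small side-by-side sketch on an invented two-line string. re.S is the short name for re.DOTALL.

```python
import re

text = "first line\nsecond line"
# Default behaviour: "." stops at the newline, so "second" is never reached
without_flag = re.match(r"first.*second", text)
# re.S (re.DOTALL) lets "." cross the newline
with_flag = re.match(r"first.*second", text, re.S)

print(without_flag)
print(repr(with_flag.group()))
```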

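(6) match(): matching escaped special characters

import re

content = 'price is $5.00'
# Unescaped, "$" means "end of string", so this pattern cannot match
result = re.match('price is $5.00', content)
print(result)  # None

import re

content = 'price is $5.00'
# Escaping "$" and "." makes them match the literal characters
result = re.match(r'price is \$5\.00', content)
print(result)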
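
When the text to match is entirely literal, the standard library's re.escape() can add the backslashes for you. The sketch below also shows why escaping the dot matters. The string "price is $5x00" is a made-up counterexample.

```python
import re

content = "price is $5.00"
# re.escape() backslash-escapes every character that is special in a regex
pattern = re.escape(content)
result = re.match(pattern, content)
print(result.group())

# An unescaped "." matches any character, so this pattern is too permissive
loose = re.match(r"price is \$5.00", "price is $5x00")
# With the dot escaped, only a literal "." is accepted
strict = re.match(r"price is \$5\.00", "price is $5x00")
print(loose is not None, strict is None)
```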

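2. The search() function in the re library

(1) How search() differs from match()

re.search() scans the whole string and returns the first successful match, so it drops re.match()'s requirement that the pattern fit from the very first character. That makes re.search() the more flexible of the two.

import re

content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
# match() fails because the string starts with "Extra", not "Hello"
result = re.match(r"Hello.*?(\d+).*?Demo", content)
print(result)  # None

import re

content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
# search() finds the pattern in the middle of the string
result = re.search(r"Hello.*?(\d+).*?Demo", content)
print(result)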
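
Putting the two side by side on one string makes the difference concrete. The sketch also adds re.fullmatch(), a standard-library function not covered in these notes, which requires the whole string to fit the pattern.

```python
import re

content = "Extra strings Hello 1234567 World_This is a Regex Demo Extra strings"
pattern = r"Hello.*?(\d+).*?Demo"

at_start = re.match(pattern, content)    # only tries position 0
anywhere = re.search(pattern, content)   # scans forward for the first hit
whole = re.fullmatch(pattern, content)   # must cover the entire string

print(at_start, whole)
print(anywhere.group(1), anywhere.start())
```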

(2) search() in practice

import re

html = """<div id="songs-list">
<h2 class="title">"经典老歌"</h2>
<p class="introduction">经典老歌列表</p>
<ul id="list" class="list-group">
<li data-view="2">一路有你</li>
<li data-view="7"><a href="/2.mp3" singer="任贤齐">沧海一声笑</a></li>
<li data-view="4" class="active"><a href="/3.mp3" singer="齐秦">往事随风</a></li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
<li data-view="5"><a href="/6.mp3" singer="邓丽君"><i class="fa fa-user"></i>但愿人长久</a></li>
</ul>
</div>
"""
# Requiring "active" in the pattern targets the <li> that carries class="active"
result1 = re.search('<li.*?active.*?singer="(.*?)">(.*?)</a>', html, re.S)
if result1:
    print(result1.group(1), result1.group(2))
# Without "active", search() returns the first <li> followed by a singer attribute
result2 = re.search('<li.*?singer="(.*?)">(.*?)</a>', html, re.S)
if result2:
    print(result2.group(1), result2.group(2))
# Without re.S, "." cannot cross a newline, so the whole match has to sit on one line
result3 = re.search('<li.*?singer="(.*?)">(.*?)</a>', html)
if result3:
    print(result3.group(1), result3.group(2))

3. The findall() function in the re library

(1) How findall() differs from search()

Compared with search(), findall() returns more: it extracts every substring that matches the regular expression, whereas search() only returns the first one.

import re

html = """<div id="songs-list">
<h2 class="title">"经典老歌"</h2>
<p class="introduction">经典老歌列表</p>
<ul id="list" class="list-group">
<li data-view="2">一路有你</li>
<li data-view="7"><a href="/2.mp3" singer="任贤齐">沧海一声笑</a></li>
<li data-view="4" class="active"><a href="/3.mp3" singer="齐秦">往事随风</a></li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
<li data-view="5"><a href="/6.mp3" singer="邓丽君">但愿人长久</a></li>
</ul>
</div>
"""
# With several groups in the pattern, findall() returns a list of tuples
results = re.findall('<li.*?href="(.*?)".*?singer="(.*?)">(.*?)</a>', html, re.S)
print(type(results), "\n")
for result in results:
    print(result)

# A more advanced version: the <a> tags are optional, so plain-text <li> items are also caught
results = re.findall(r'<li.*?>\s*?(<a.*?>)?(\w+)(</a>)?\s*?</li>', html, re.S)
print(results)
for result in results:
    print(result[1])  # the second group holds the song title
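
What findall() puts in the list depends on how many groups the pattern has. A minimal sketch on an invented string, with re.finditer() added for cases where the match positions are needed as well:

```python
import re

text = "a=1, b=22, c=333"

no_group = re.findall(r"\w=\d+", text)        # whole matches as strings
one_group = re.findall(r"\w=(\d+)", text)     # only the single group's text
two_groups = re.findall(r"(\w)=(\d+)", text)  # one tuple per match

# finditer() yields match objects, so span() and group() stay available
spans = [m.span() for m in re.finditer(r"\d+", text)]

print(no_group)
print(one_group)
print(two_groups)
print(spans)
```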

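4. The sub() function in the re library

(1) What sub() replaces

re.sub() replaces every match of the pattern in a string and returns the new string. The original string is left unchanged.

import re

content = 'Extra strings Hello 1234567 World_This is a 7654_myname_321 Regex Demo Extra strings'
# Replace every run of digits with an empty string, i.e. delete it
content1 = re.sub(r'\d+', '', content)
print(content1)
# Replace every run of digits with a fixed word
content2 = re.sub(r'\d+', 'Replacement', content)
print(content2)
# \1 refers back to the captured digits, so " 8910" is appended after each number
content3 = re.sub(r'(\d+)', r'\1 8910', content)
print(content3)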
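
re.sub() also accepts a function as the replacement and a count argument that limits the number of substitutions, and re.subn() reports how many were made. A short sketch of these options follows. The helper name double is my own, not something from the original notes.

```python
import re

content = "Hello 1234567 World_This is a 7654_myname_321 Regex Demo"

def double(match):
    """Hypothetical replacement function: double the matched number."""
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, content)            # the function is called once per match
first_only = re.sub(r"\d+", "#", content, count=1)   # stop after the first replacement
replaced, how_many = re.subn(r"\d+", "#", content)   # also returns the replacement count

print(doubled)
print(first_only)
print(replaced, how_many)
```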

(2) sub() in practice

import re

html = """<div id="songs-list">
<h2 class="title">"经典老歌"</h2>
<p class="introduction">经典老歌列表</p>
<ul id="list" class="list-group">
<li data-view="2">一路有你</li>
<li data-view="7"><a href="/2.mp3" singer="任贤齐">沧海一声笑</a></li>
<li data-view="4" class="active"><a href="/3.mp3" singer="齐秦">往事随风</a></li>
<li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
<li data-view="5"><a href="/5.mp3" singer="陈慧琳">记事本</a></li>
<li data-view="5"><a href="/6.mp3" singer="邓丽君">但愿人长久</a></li>
</ul>
</div>
"""
# "|" means "or": remove every opening <a ...> tag and every closing </a> tag
html = re.sub("<a.*?>|</a>", "", html)
print(html)
# With the <a> tags gone, each <li> holds only the song title
results = re.findall("<li.*?>(.*?)</li>", html, re.S)
print(results, "\n")
for result in results:
    print(result.strip())  # strip leading and trailing newlines and spaces

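5. The compile() function in the re library

re.compile() compiles a regular expression into a pattern object, which makes the same pattern easy to reuse.

import re

content = """Hello 1234567 World_this
is a Regex Demo"""
# Compile once; the flag is stored in the pattern object
pattern = re.compile("Hello.*Demo", re.S)
result = re.match(pattern, content)
print(result)

import re

content = """Hello 1234567 World_this
is a Regex Demo"""
# The same match without compiling first
result = re.match("Hello.*Demo", content, re.S)
print(result)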
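
A compiled pattern object also carries its own match(), search() and findall() methods, which is where the reuse pays off. A minimal sketch, where the date pattern and the sample lines are invented for illustration:

```python
import re

# Compile once, then reuse the same pattern object across many strings
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
lines = ["2023-01-15 start", "no date here", "finished on 2023-02-01"]

matched_at_start = [bool(pattern.match(line)) for line in lines]
found_anywhere = [bool(pattern.search(line)) for line in lines]
all_dates = pattern.findall(" ".join(lines))

print(matched_at_start)
print(found_anywhere)
print(all_dates)
```

The methods on the pattern object behave like the module-level functions of the same name, just without passing the pattern each time.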
