Python: the re module (regular expressions)

1. Metacharacters
(1) \b:

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string. This means that r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.

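In short: \b matches the empty string only at the start or end of a run of letters, digits, or underscores. Formally, it is the boundary between \w and \W, or between \w and the start or end of the string (when the word sits at the very start or end of the string).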

By default Unicode alphanumerics are the ones used in Unicode patterns, but this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

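In short: word characters are the Unicode alphanumerics by default, the ASCII flag changes that, and with the LOCALE flag the current locale decides where word boundaries fall. Inside a character range, \b means the backspace character, for compatibility with Python string literals. (For the same reason, \b in an ordinary Python string literal is converted to a backspace before re sees it. Prefix the pattern with r, or write \\b.)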

Examples:

re.findall(r'\b9', ' the i love 9 the 9applethe')
['9', '9']
re.findall(r'\bthe', ' the i thelove 9 the 9applethe')
['the', 'the', 'the']
re.findall(r'\bthe\b', ' the i thelove 9 the 9applethe')
['the', 'the']
re.findall(r'the\b', ' the i thelove 9 the 9applethe')
['the', 'the', 'the']
re.findall(r'\b@', ' the i thelove 9 @ the @9applethe')
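[]     #  no match: the character before each '@' is a space, so there is no \w/\W boundary here; \b needs a letter, digit, or underscore on one side
re.findall('the\\b', ' the i thelove 9 the 9applethe')
['the', 'the', 'the']  # a doubled backslash works too
re.findall(r'\b_', ' the i thelove 9 _ the _9applethe')
['_', '_']  # the underscore counts as a word character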
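
The escaping note above can be checked directly. The following is a minimal sketch (the sample strings and variable names are made up for illustration) showing that '\b' in an ordinary string literal is a backspace, that r'\b' and '\\b' are the same pattern, that [\b] inside a character class matches a backspace, and that re.ASCII narrows what counts as a word character.

```python
import re

text = ' the i thelove 9 the 9applethe'

# In an ordinary string literal, '\b' is the single backspace character (0x08).
assert '\b' == '\x08'
# A raw string and a doubled backslash produce the same two-character pattern.
assert r'\b' == '\\b'

raw_hits = re.findall(r'\bthe\b', text)    # word-boundary pattern
plain_hits = re.findall('\bthe\b', text)   # searches for backspace + 'the' + backspace
print(raw_hits)    # ['the', 'the']
print(plain_hits)  # []

# Inside a character class, \b means backspace even in a raw string.
backspace_hits = re.findall(r'[\b]', 'a\x08b')
print(backspace_hits)  # ['\x08']

# By default \w (and therefore \b) is Unicode-aware;
# re.ASCII restricts word characters to [a-zA-Z0-9_].
sample = 'caf\u00e9 au lait'
unicode_hits = re.findall(r'\b\w+\b', sample)
ascii_hits = re.findall(r'\b\w+\b', sample, flags=re.ASCII)
print(unicode_hits)  # ['café', 'au', 'lait']
print(ascii_hits)    # ['caf', 'au', 'lait']
```

With re.ASCII the accented letter is no longer a word character, so a boundary appears between 'f' and 'é' and the match stops at 'caf'.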

(2) \B

Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so word characters in Unicode patterns are Unicode alphanumerics or the underscore, although this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used.

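In short: \B matches the empty string only where it is not at the start or end of a word (a run of letters, digits, or underscores). It is the exact opposite of \b, and the same flags and locale settings decide where the word boundaries are.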

Examples:

re.findall(r'\Bthe', ' the i thelove 9 _ the _9applethe')
['the']
re.findall(r'\Bthe\B', ' the i thelove 9 _ the _9applethe')
[]
re.findall(r'\Bthe\B', ' the i thelove 9 aathebb _ the _9applethe')
['the']
re.findall(r'_\B', ' the i thelove 9 aathebb _ the _9applethe')
['_']
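
The py\B example from the quoted documentation can be run as a quick check. This is only a sketch; the list name words and the two result names are chosen here for illustration.

```python
import re

words = ['python', 'py3', 'py2', 'py', 'py.', 'py!']

# Keep only the strings in which 'py' is followed by another word character.
inside_word = [w for w in words if re.search(r'py\B', w)]
print(inside_word)  # ['python', 'py3', 'py2']

# \b is the complement: 'py' must end at a word boundary.
at_boundary = [w for w in words if re.search(r'py\b', w)]
print(at_boundary)  # ['py', 'py.', 'py!']
```

Every string lands in exactly one of the two lists, which is what "opposite of \b" means in practice.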

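(3) '.'
Wildcard: matches any single character except the newline \n (unless the re.DOTALL flag is set). It always consumes exactly one character, so it never matches an empty string.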
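
A small sketch of the wildcard's behaviour, using a made-up two-line sample string. The re.DOTALL flag is part of the standard re module; the variable names are illustrative.

```python
import re

sample = 'a\nb'

default_hits = re.findall(r'.', sample)                   # the newline is skipped
dotall_hits = re.findall(r'.', sample, flags=re.DOTALL)   # the newline is included
print(default_hits)  # ['a', 'b']
print(dotall_hits)   # ['a', '\n', 'b']

# '.' always consumes exactly one character, so it cannot match an empty string.
empty_match = re.search(r'.', '')
print(empty_match)   # None
```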
(4) The group and groups methods

1)Match.group([group1, …])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1…99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

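In short: the method is called on a match object and returns one or more subgroups of the match. With one argument it returns a string; with several arguments it returns a tuple holding one result per argument. With no argument it defaults to 0 and returns the whole match. A non-zero argument returns the string matched by the corresponding parenthesized group. A negative argument, or one larger than the number of groups, raises IndexError. If the group took no part in the match, the result is None. If the group matched several times, only the last match is returned and the earlier ones are overwritten.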

2)Match.groups(default=None)
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.

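In short: returns a tuple with the results of all subgroups. The default argument is the value reported for subgroups that did not participate in the match; it defaults to None (see Example 2).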

Example 1: basic usage

re.search('(?P<name>ab)+cd', 'abababcdc').group()
'abababcd'    #  the whole match
re.search('(?P<name>ab)+cd', 'abababcdc').group(1)
'ab'    #  a group that matched several times keeps only the last match
re.search('(?P<name>(ab)+)cd', 'abababcdc').group()
'abababcd'  #  an extra pair of parentheses does not change the whole match
re.search('(?P<name>(ab)+)cd', 'abababcdc').groups()
('ababab', 'ab')  # but the subgroup results are quite different
re.search('(?P<name>((ab)+)+)cd', 'abababcdc').groups()
('ababab', 'ababab', 'ab')  # groups() has one element per pair of capturing parentheses
re.search('(?P<name>((ab)+)+)cd', 'abababcdc').group('name')
'ababab'  #  a group can also be looked up by name
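
Two points from the description of group are not exercised above: passing several arguments at once, and asking for a group that does not exist. A minimal sketch follows; the group name name and the variable names are chosen here for illustration.

```python
import re

m = re.search(r'(?P<name>(ab)+)cd', 'abababcdc')

# Several arguments return a tuple, one item per argument;
# group numbers and group names can be mixed.
several = m.group(0, 'name', 2)
print(several)  # ('abababcd', 'ababab', 'ab')

# The pattern defines only two groups, so asking for a third raises IndexError.
try:
    m.group(3)
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__
print(error_name)  # IndexError
```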

Example 2: the default argument of groups

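re.search(r'\w+(\.?)\w+(\.?)\d+', 'www.baidu123').groups('n/a')
('.', '')   # the ? is inside the group, so the group still takes part by matching zero characters and returns an empty string
re.search(r'\w+(\.?)([a-z]+)(\.)?\d+', 'www.baidu123').groups('n/a')
('.', 'baidu', 'n/a')   #  the ? is outside the group; the group did not take part in the match, so the supplied default is shown
re.search(r'\w+(\.?)[a-z]+(\.)?\d+', 'www.baidu123').groups()
('.', None)  #  without an argument the default is None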

2. Using the "?" extension syntax

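import re
print(re.match(r'industr(?P<name>y)(aaa)(\1)', 'industryaaay').group(2))  # grouping; \1 refers back to the text matched by the first group
print(re.match(r'b', 'bbc').group())  # match only tries at the start of the string and returns the first match; it returns None if nothing matches
print(re.search(r'b', 'abcb').group())  # search returns only the first match; it returns None if nothing matches
print(re.findall(r'b', 'abcb'))  # finds every match and returns a list
print(re.findall(r'industr(?P<name>y)(aaa)(\1)', 'industryaaay'))  # with groups in the pattern, findall returns only the group results
print(re.findall(r'industr(?:yaaa)', 'industryaaay'))  # returns the full match; (?:...) is a non-capturing group
print(re.match(r'industr(?:yaaa)+', 'industryaaayaaa').group())  # returns the full match; (?:...) is a non-capturing group
print(re.findall(r'industr(?=yaaa)', 'industryaaafff'))  # (?=...) is a non-capturing lookahead; matches only when 'industr' is followed by yaaa
print(re.findall(r'industr(?!syaaa)', 'industryaaafff'))  # (?!...) is a non-capturing negative lookahead; matches only when 'industr' is not followed by syaaa
print(re.search(r'a(?<=yaaa)bcdef', 'ssyaaabcdef').group())  # (?<=...) is a non-capturing lookbehind; matches only when the text just before this point is yaaa
print(re.search(r'a(?<!saaa)bcdef', 'ssyaaabcdef').group())  # (?<!...) is a non-capturing negative lookbehind; matches only when the text just before this point is not saaa
# result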
aaa
b
b
['b', 'b']
[('y', 'aaa', 'y')]
['industryaaa']
industryaaayaaa
['industr']
['industr']
abcdef
abcdef
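
Because match and search return None when nothing matches, calling .group() directly on the result fails with AttributeError in that case. The sketch below wraps the call in a small helper, first_match, which is a hypothetical name introduced here and not part of re, and uses it to show a lookahead and a negative lookbehind both succeeding and failing.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when re.search finds nothing."""
    m = re.search(pattern, text)
    return m.group() if m is not None else None

# Lookahead: 'industr' only when it is immediately followed by 'yaaa'.
print(first_match(r'industr(?=yaaa)', 'industryaaafff'))  # industr
print(first_match(r'industr(?=yaaa)', 'industrial'))      # None

# Negative lookbehind: 'bcdef' only when it is not preceded by 'saaa'.
print(first_match(r'(?<!saaa)bcdef', 'ssyaaabcdef'))      # bcdef
print(first_match(r'(?<!saaa)bcdef', 'sssaaabcdef'))      # None
```

The lookaround text is checked but never included in the result, which is why the first call prints industr rather than industryaaa.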

3. Other functions

print(re.split(r'\d+', 'hello 12abc 34def'))
# result
['hello ', 'abc ', 'def']
re.subn(r'\d+', 'ppp', 'sdfsdf123sfkj4dfjkd5')
# result
('sdfsdfpppsfkjpppdfjkdppp', 3)
a = re.compile(r'\d+')
a.findall('asdsf343sdf45')
# result
['343', '45']
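
The three calls above can share one compiled pattern. The sketch below is an illustration only: it also uses sub, the count argument, and maxsplit, which the text above does not cover but which are standard methods and arguments of compiled patterns.

```python
import re

digits = re.compile(r'\d+')   # compile once, reuse for every call
source = 'sdfsdf123sfkj4dfjkd5'

replaced = digits.sub('ppp', source)              # returns the new string only
replaced_n = digits.subn('ppp', source)           # returns (new string, number of replacements)
first_only = digits.sub('ppp', source, count=1)   # replace at most one run of digits
parts = digits.split('hello 12abc 34def', maxsplit=1)  # split at most once

print(replaced)    # sdfsdfpppsfkjpppdfjkdppp
print(replaced_n)  # ('sdfsdfpppsfkjpppdfjkdppp', 3)
print(first_only)  # sdfsdfpppsfkj4dfjkd5
print(parts)       # ['hello ', 'abc 34def']
```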
