A Brief Introduction to Basic Regular Expression Usage

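Regular expressions are extremely useful. Entire books are devoted to them, which says something about their depth. Wherever there are people there is conflict, and wherever there are strings there are regular expressions. Still, a regular expression is nothing more than a pattern: a description of the form a string takes. There is nothing mysterious about it.

The tools we covered earlier, grep, sed and awk, are text/string processing tools. A regular expression is different: it is only a description of a string's form. grep, sed and awk can each take a regular expression and use it to process text. To keep the focus on regular expressions themselves, we use the simplest of the three, grep, as the processing tool.

Regular expressions are also not the same as shell wildcards, although the two look similar in places.

In particular, * in a regular expression does not mean what * means in a wildcard. Keep this firmly in mind.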
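As a rough illustration of that difference, the sketch below runs both kinds of * side by side in a scratch directory. The file names and variable names are made up for the example and are not from the article.

```shell
# Scratch directory with illustrative (hypothetical) file names.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch ab.txt b.txt notes.md

# Shell wildcard: * stands for any run of characters, so a*.txt
# expands to the file names that start with "a" and end in ".txt".
glob_hits=$(ls a*.txt)

# Regular expression: * repeats the item before it, so 'a*b' means
# "zero or more a, then b". It matches b, ab and aab, but not a.
regex_hits=$(printf '%s\n' b ab aab a | grep -c 'a*b')

echo "wildcard a*.txt expands to: $glob_hits"
echo "regex a*b matches $regex_hits of 4 lines"

cd / && rm -rf "$dir"
```

The wildcard never matches b.txt, while the regular expression happily matches a bare b, because zero repetitions of a are allowed.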

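Besides basic regular expressions there are also extended regular expressions, which add +, ?, () and so on. For those we need egrep or grep -E; this article uses egrep.

Take pride in hands-on practice; only reading without trying is nothing to be proud of.

This article only covers the basics, and new material may be added to this blog later. Let's try things out: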

1. ^xxx matches lines that begin with xxx, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep ^go test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$

2. xxx$ matches lines that end with xxx, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep up$ test.txt
daydayup

Administrator@51B6904C3C8A485 ~/reg
$

By the way, ^$ matches an empty line: the start of the line is immediately followed by its end.
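A minimal sketch of ^$ in use, with made-up sample input and variable names. It also shows one thing that is easy to overlook: a line holding only a space is not empty as far as ^$ is concerned.

```shell
# Four lines of sample input; the second and fourth are empty.
empty_count=$(printf 'good\n\nstudy\n\n' | grep -c '^$')

# -v inverts the match, which drops the empty lines.
non_empty=$(printf 'good\n\nstudy\n\n' | grep -v '^$')

# A line that contains a single space is not matched by ^$.
space_only=$(printf ' \n' | grep -c '^$')

echo "empty lines: $empty_count"
echo "non-empty lines:"
echo "$non_empty"
echo "space-only line counted as empty: $space_only"
```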

3. A dot matches any single character except a newline, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep d.y test.txt
daydayup

Administrator@51B6904C3C8A485 ~/reg
$

Now look at a mistaken usage:

Administrator@51B6904C3C8A485 ~/reg
$ echo "w.x.y.z" | grep "w.x"
w.x.y.z

Administrator@51B6904C3C8A485 ~/reg
$

The result happens to be right, but only by luck. If you do not believe it, look at this:

Administrator@51B6904C3C8A485 ~/reg
$ echo "w_x_y_z" | grep "w.x"
w_x_y_z

Administrator@51B6904C3C8A485 ~/reg
$

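Why did filtering for w.x also select w_x_y_z? Because in a regular expression the dot is no longer an ordinary dot. It stands for any single character except a newline.

So what if we really do want to filter for the literal string w.x? Then we have to escape the dot with \, which we will cover shortly. As a warm-up:

Administrator@51B6904C3C8A485 ~/reg
$ echo "www.x.y.z" | grep "w\.x"
www.x.y.z

Administrator@51B6904C3C8A485 ~/reg
$ echo "w_x_y_z" | grep "w\.x"

Administrator@51B6904C3C8A485 ~/reg
$

That is the result we want.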
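Escaping is not the only way to get a literal dot. The sketch below compares it with two alternatives that the article does not cover: a one-character bracket expression, and grep's -F option, which treats the whole pattern as a fixed string. The variable names are illustrative.

```shell
# Two sample lines: only the first contains a literal "w.x".
inputs='www.x.y.z
w_x_y_z'

# 1. Escape the dot.
escaped=$(printf '%s\n' "$inputs" | grep 'w\.x')

# 2. Put the dot in a bracket expression, where it is an ordinary character.
bracket=$(printf '%s\n' "$inputs" | grep 'w[.]x')

# 3. -F disables regular expression syntax altogether.
fixed=$(printf '%s\n' "$inputs" | grep -F 'w.x')

echo "escaped: $escaped"
echo "bracket: $bracket"
echo "fixed:   $fixed"
```

All three select only www.x.y.z. -F is convenient when the search text comes from user input and may contain several special characters.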

4. * means zero or more repetitions of the preceding item, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep d.*y test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep s.*t test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$ grep s.t test.txt

Administrator@51B6904C3C8A485 ~/reg
$

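Note that s.t does not select the line, because it requires exactly one character between s and t, and in "study" the two letters are adjacent. s.*t does select it, because .* stands for zero or more characters.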
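The "zero" in "zero or more" has a consequence that is worth trying once. The following sketch uses the same two lines as test.txt; the variable names are assumptions made for the example.

```shell
lines='good good study
daydayup'

# 'z*' allows zero z characters, so it matches every line,
# including lines that contain no z at all.
star_hits=$(printf '%s\n' "$lines" | grep -c 'z*')

# 'zz*' is "one z, then zero or more z", so at least one z is required.
at_least_one=$(printf '%s\n' "$lines" | grep -c 'zz*')

# * binds only to the item right before it: 'go*d' is
# g, any number of o, then d. It is not "go" repeated.
god_hits=$(printf '%s\n' "$lines" | grep -c 'go*d')

echo "z*   matches $star_hits lines"
echo "zz*  matches $at_least_one lines"
echo "go*d matches $god_hits lines"
```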

5. [] specifies the set that a character must belong to. Note that [] matches exactly one character from the set, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep s[rst]u test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$ grep s[abc]u test.txt

Administrator@51B6904C3C8A485 ~/reg
$

What set should we use to cover all English letters? As follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep s[a-zA-Z]u test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$

For digits, simply use [0-9]. That one is easy, so we move on.

Let's continue with the following:

Administrator@51B6904C3C8A485 ~/reg
$ cat test1.txt
good good study
day day up
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep [a-z]ay test1.txt
day day up
daydayup

Administrator@51B6904C3C8A485 ~/reg
$

As you can see, [a-z]ay also selects the daydayup line. What if we only want lines that contain day as a separate word? That is covered next.

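6. \ is the escape character. For example, \<xxx matches a word that begins with xxx, and xxx\> matches a word that ends with xxx. As follows (it is best to always wrap regular expressions in double quotes):

Administrator@51B6904C3C8A485 ~/reg
$ cat test1.txt
good good study
day day up
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "\<[a-z]ay\>" test1.txt
day day up

Administrator@51B6904C3C8A485 ~/reg
$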
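The advice about quoting is worth a small experiment. In the sketch below, an unquoted pattern is expanded by the shell before grep ever sees it, because a file in the current directory happens to match it as a wildcard. The directory contents are invented for the example.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'day day up\nmay\n' > test1.txt
touch day   # a file whose name happens to match the wildcard [a-z]ay

# Unquoted: the shell expands [a-z]ay to the file name "day",
# so grep is really searching for the pattern "day".
unquoted=$(grep -c [a-z]ay test1.txt)

# Quoted: grep receives the bracket expression itself.
quoted=$(grep -c "[a-z]ay" test1.txt)

echo "unquoted pattern matched $unquoted line(s)"
echo "quoted pattern matched $quoted line(s)"

cd / && rm -rf "$dir"
```

The quoted form matches both lines, while the unquoted form silently misses "may". Nothing warns you when this happens, which is why quoting every pattern is a good habit.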

And do not forget that the -w option can also select lines by whole word, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test1.txt
good good study
day day up
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep -w [a-z]ay test1.txt
day day up

Administrator@51B6904C3C8A485 ~/reg
$

However, the following result may surprise both of us:

Administrator@51B6904C3C8A485 ~/reg
$ cat test2.txt
good good study
day day up
oh day'abc oh
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "\<[a-z]ay\>" test2.txt
day day up
oh day'abc oh

Administrator@51B6904C3C8A485 ~/reg
$ grep -w [a-z]ay test2.txt
day day up
oh day'abc oh

Administrator@51B6904C3C8A485 ~/reg
$

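Why was the line containing day'abc selected as well? This comes down to how regular expressions define a word. Like a space, ' is a separator: for grep, any character other than a letter, a digit or an underscore ends a word.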
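To see where grep draws the line, the following sketch feeds a few made-up variations on day'abc through grep -w. The sample strings and variable names are assumptions for illustration.

```shell
# Five candidate lines, each starting with "day".
lines="day'abc
day_abc
day2
day-abc
day"

# -w keeps a match only when the characters on both sides of it
# are not letters, digits or underscores (or the line starts/ends there).
word_hits=$(printf '%s\n' "$lines" | grep -w '[a-z]ay')

echo "$word_hits"
```

The apostrophe and the hyphen end the word, so day'abc and day-abc are selected along with the bare day. The underscore and the digit are part of the word, so day_abc and day2 are not.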

7. ^ inside [] means negation

We know that ^xxx matches lines that begin with xxx. Inside [], however, a leading ^ negates the set, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "d[^abcd]" test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$ grep "d[^abcdy]" test.txt
good good study

Administrator@51B6904C3C8A485 ~/reg
$ grep "d[^abcdy ]" test.txt

Administrator@51B6904C3C8A485 ~/reg
$
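One detail behind these results: a negated set still has to match a character. The sketch below, with invented sample input, shows that a d at the very end of a line is not matched by d[^a], and one possible way to allow for it using an extended regular expression.

```shell
# "good" ends in d, so there is no character after d for [^a] to match.
end_of_line=$(printf 'good\n' | grep -c 'd[^a]')

# Here the first d is followed by a space, which is "not a".
mid_line=$(printf 'good study\n' | grep -c 'd[^a]')

# Accept "d followed by a non-a character" or "d at end of line".
either=$(printf 'good\n' | grep -c -E 'd([^a]|$)')

echo "d[^a] on 'good':        $end_of_line"
echo "d[^a] on 'good study':  $mid_line"
echo "d([^a]|\$) on 'good':    $either"
```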

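8. Some character classes

For example, [[:lower:]] is equivalent to [a-z] (at least in the C/POSIX locale)

For example, [[:upper:]] is equivalent to [A-Z] (same caveat)

There are quite a few others, which we will not list one by one. Here is [[:lower:]] in use:

Administrator@51B6904C3C8A485 ~/reg
$ grep "^[[:lower:]]" test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "^[[:upper:]]" test.txt

Administrator@51B6904C3C8A485 ~/reg
$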
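A few of the other POSIX classes in action, as a sketch with made-up sample lines. Note that the class name goes inside a bracket expression, which is why the brackets are doubled.

```shell
sample='abc
ABC
123
a1 b2'

# Lines that contain at least one digit.
digit_lines=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Lines made up entirely of letters (needs -E for +).
alpha_only=$(printf '%s\n' "$sample" | grep -E '^[[:alpha:]]+$')

# Lines that contain whitespace.
space_lines=$(printf '%s\n' "$sample" | grep '[[:space:]]')

echo "lines with a digit: $digit_lines"
echo "letters only:"
echo "$alpha_only"
echo "lines with whitespace: $space_lines"
```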

9. Repetition, which cannot be skipped

As mentioned earlier, * repeats the preceding character zero or more times. A quick review:

Administrator@51B6904C3C8A485 ~/reg
$ cat test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "d.*u" test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$ grep "d.*p" test.txt
daydayup

Administrator@51B6904C3C8A485 ~/reg
$

To repeat one or more times, use +. Note that this requires extended regular expressions, so use grep -E or simply egrep:

Administrator@51B6904C3C8A485 ~/reg
$ echo "gd" | egrep "go*d"
gd

Administrator@51B6904C3C8A485 ~/reg
$ echo "gd" | egrep "go+d"

Administrator@51B6904C3C8A485 ~/reg
$ echo "god" | egrep "go+d"
god

Administrator@51B6904C3C8A485 ~/reg
$ echo "good" | egrep "go+d"
good

Administrator@51B6904C3C8A485 ~/reg
$

And how do we say "zero times or once"? Use a question mark. In the transcript below, the first command uses plain grep, which treats + as an ordinary character, so it prints nothing:

Administrator@51B6904C3C8A485 ~/reg
$ echo "good" | grep "go+d"

Administrator@51B6904C3C8A485 ~/reg
$ echo "gd" | egrep "go?d"
gd

Administrator@51B6904C3C8A485 ~/reg
$ echo "god" | egrep "go?d"
god

Administrator@51B6904C3C8A485 ~/reg
$ echo "good" | egrep "go?d"

Administrator@51B6904C3C8A485 ~/reg
$

What about repeating exactly 4 times? As follows:

Administrator@51B6904C3C8A485 ~/reg
$ echo "goood" | egrep "go{4}d"

Administrator@51B6904C3C8A485 ~/reg
$ echo "gooood" | egrep "go{4}d"
gooood

Administrator@51B6904C3C8A485 ~/reg
$ echo "goooood" | egrep "go{4}d"

Administrator@51B6904C3C8A485 ~/reg
$

What about repeating 4 or more times? As follows:

Administrator@51B6904C3C8A485 ~/reg
$ echo "goood" | egrep "go{4,}d"

Administrator@51B6904C3C8A485 ~/reg
$ echo "gooood" | egrep "go{4,}d"
gooood

Administrator@51B6904C3C8A485 ~/reg
$ echo "goooood" | egrep "go{4,}d"
goooood

Administrator@51B6904C3C8A485 ~/reg
$

What about repeating 4 to 6 times? As follows:

Administrator@51B6904C3C8A485 ~/reg
$ echo "goood" | egrep "go{4,6}d"

Administrator@51B6904C3C8A485 ~/reg
$ echo "gooood" | egrep "go{4,6}d"
gooood

Administrator@51B6904C3C8A485 ~/reg
$ echo "goooood" | egrep "go{4,6}d"
goooood

Administrator@51B6904C3C8A485 ~/reg
$ echo "gooooood" | egrep "go{4,6}d"
gooooood

Administrator@51B6904C3C8A485 ~/reg
$ echo "goooooood" | egrep "go{4,6}d"

Administrator@51B6904C3C8A485 ~/reg
$

At this point, repetition should be quite clear. To summarize:

x* means zero or more x

x+ means one or more x

x? means zero or one x

x{4} means exactly 4 x

x{4,} means 4 or more x

x{4,6} means 4, 5 or 6 x
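The summary above can be checked mechanically. The sketch below defines a small helper, whole_line_match, which is a hypothetical name invented for this example. It anchors the pattern with ^ and $ so that the quantifier has to account for the entire input.

```shell
# whole_line_match PATTERN TEXT
# Prints "yes" if the extended regex PATTERN matches all of TEXT, else "no".
whole_line_match() {
  if printf '%s\n' "$2" | grep -q -E "^($1)\$"; then
    echo yes
  else
    echo no
  fi
}

echo "x*     on ''        : $(whole_line_match 'x*' '')"
echo "x+     on ''        : $(whole_line_match 'x+' '')"
echo "x?     on 'xx'      : $(whole_line_match 'x?' 'xx')"
echo "x{4}   on 'xxxx'    : $(whole_line_match 'x{4}' 'xxxx')"
echo "x{4,}  on 'xxx'     : $(whole_line_match 'x{4,}' 'xxx')"
echo "x{4,6} on 'xxxxxx'  : $(whole_line_match 'x{4,6}' 'xxxxxx')"
echo "x{4,6} on 'xxxxxxx' : $(whole_line_match 'x{4,6}' 'xxxxxxx')"
```

Without the anchors, most of these would report a match, because grep only needs the pattern to match somewhere in the line.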

Having learned this much, how would you match a 5-digit positive integer? As follows (the first command, without word boundaries, wrongly accepts a 6-digit number):

Administrator@51B6904C3C8A485 ~/reg
$ echo "123456" | egrep "[1-9][0-9]{4}"
123456

Administrator@51B6904C3C8A485 ~/reg
$ echo "1234" | egrep "\<[1-9][0-9]{4}\>"

Administrator@51B6904C3C8A485 ~/reg
$ echo "12345" | egrep "\<[1-9][0-9]{4}\>"
12345

Administrator@51B6904C3C8A485 ~/reg
$ echo "123456" | egrep "\<[1-9][0-9]{4}\>"

Administrator@51B6904C3C8A485 ~/reg
$
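The word-boundary version finds a 5-digit number anywhere in a line. If the goal is instead to check that an entire input is a 5-digit positive integer, one option is grep's -x flag, which requires the pattern to match the whole line. The helper name is_five_digit below is made up for this sketch.

```shell
# is_five_digit VALUE
# Succeeds only when VALUE as a whole is a 5-digit positive integer.
is_five_digit() {
  printf '%s\n' "$1" | grep -q -x -E '[1-9][0-9]{4}'
}

for candidate in 12345 1234 123456 01234 'id 12345'; do
  if is_five_digit "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rejected"
  fi
done
```

Only 12345 is accepted. The last candidate would pass the \<...\> test because it contains a 5-digit word, but it fails here because the line holds more than the number.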

10. () groups several items into a single unit, as follows:

Administrator@51B6904C3C8A485 ~/reg
$ echo "abababc" | egrep "(ab){3,}"
abababc

Administrator@51B6904C3C8A485 ~/reg
$ echo "abababc" | egrep "(ab){4,}"

Administrator@51B6904C3C8A485 ~/reg
$
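To see what the parentheses change, compare the grouped pattern with the same pattern written without them. This is a small sketch with invented input; the variable names are illustrative.

```shell
# With the group, {3,} repeats the whole "ab".
grouped=$(printf 'ababab\nabbb\n' | grep -E '(ab){3,}')

# Without it, {3,} applies only to the b: one a, then three or more b.
ungrouped=$(printf 'ababab\nabbb\n' | grep -E 'ab{3,}')

echo "(ab){3,} selects: $grouped"
echo "ab{3,}   selects: $ungrouped"
```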

Also note that () is sometimes used to stand for the empty string (not a space), for example:

Administrator@51B6904C3C8A485 ~/reg
$ echo "ab" | egrep "a b"

Administrator@51B6904C3C8A485 ~/reg
$ echo "ab" | egrep "a()b"
ab

Administrator@51B6904C3C8A485 ~/reg
$ echo "ab" | egrep "a b"

Administrator@51B6904C3C8A485 ~/reg
$ echo "a b" | egrep "a()b"

Administrator@51B6904C3C8A485 ~/reg
$ echo "a b" | egrep "a b"
a b

Administrator@51B6904C3C8A485 ~/reg
$

11. | means "or", which is easy to understand.

Administrator@51B6904C3C8A485 ~/reg
$ egrep "^g|p$" test.txt
good good study
daydayup

Administrator@51B6904C3C8A485 ~/reg
$

This selects lines that begin with g or end with p.
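The anchors in that pattern belong to the individual alternatives, not to the pattern as a whole. The sketch below, using made-up sample lines, contrasts it with a grouped version where both anchors apply to either letter.

```shell
lines='good
daydayup
p
g
study'

# ^g|p$ reads as (^g) or (p$): begins with g, or ends with p.
loose=$(printf '%s\n' "$lines" | grep -c -E '^g|p$')

# ^(g|p)$ requires the whole line to be exactly g or exactly p.
strict=$(printf '%s\n' "$lines" | grep -E '^(g|p)$')

echo "^g|p\$   matches $loose lines"
echo "^(g|p)\$ matches:"
echo "$strict"
```

Alternation binds more loosely than anything else in the expression, so parentheses are needed whenever the | should cover only part of it.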

OK, that is enough for now. More will be added when there is something new.
