turnmissile's Blog http://blog.csdn.net/turnmissile/

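Microsoft has documented the regular expression rules in MSDN; anyone interested can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Listed here are some tables of the syntax elements I found, for you to explore on your own.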

Character Escapes

| Escaped character | Description |
| --- | --- |
| ordinary characters | Characters other than `. $ ^ { [ ( \| ) * + ? \` match themselves. |
| `\a` | Matches a bell (alarm) `\u0007`. |
| `\b` | Matches a backspace `\u0008` if in a `[]` character class; otherwise, see the note following this table. |
| `\t` | Matches a tab `\u0009`. |
| `\r` | Matches a carriage return `\u000D`. |
| `\v` | Matches a vertical tab `\u000B`. |
| `\f` | Matches a form feed `\u000C`. |
| `\n` | Matches a new line `\u000A`. |
| `\e` | Matches an escape `\u001B`. |
| `\040` | Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character `\040` represents a space. |
| `\x20` | Matches an ASCII character using hexadecimal representation (exactly two digits). |
| `\cC` | Matches an ASCII control character; for example, `\cC` is control-C. |
| `\u0020` | Matches a Unicode character using hexadecimal representation (exactly four digits). |
| `\` | When followed by a character that is not recognized as an escaped character, matches that character. For example, `\*` is the same as `\x2A`. |

Note: The escaped character `\b` is a special case. In a regular expression, `\b` denotes a word boundary (between `\w` and `\W` characters) except within a `[]` character class, where `\b` refers to the backspace character. In a replacement pattern, `\b` always denotes a backspace.
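The escapes above are easy to try out. The following is only a minimal sketch using Python's `re` module, not the .NET `Regex` class that this table documents; the assumption is that the escapes shown here (hex, Unicode, and octal forms, an escaped literal, and the two meanings of `\b`) behave the same way in both engines for these inputs.

```python
import re

# \x20 (hex), \u0020 (Unicode) and \040 (octal) all denote a space.
space_forms = [r"a\x20b", r"a\u0020b", r"a\040b"]
space_results = [bool(re.fullmatch(p, "a b")) for p in space_forms]

# A backslash before a metacharacter such as * matches it literally.
star_match = re.search(r"\*", "2*3").group()

# Outside a character class, \b is a zero-width word boundary.
boundary_hits = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, [\b] is the backspace character U+0008.
backspace_hit = re.search(r"[\b]", "x\x08y") is not None

print(space_results, star_match, boundary_hits, backspace_hit)
```

Raw strings (`r"..."`) are used so that the backslashes reach the regex engine unchanged.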

Character Classes

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

| Character class | Description |
| --- | --- |
| `.` | Matches any character except `\n`. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options. |
| `[aeiou]` | Matches any single character included in the specified set of characters. |
| `[^aeiou]` | Matches any single character not in the specified set of characters. |
| `[0-9a-fA-F]` | Use of a hyphen (`-`) allows specification of contiguous character ranges. |
| `\p{name}` | Matches any character in the named character class specified by `{name}`. Supported names are Unicode groups and block ranges. For example, `Ll`, `Nd`, `Z`, `IsGreek`, `IsBoxDrawing`. |
| `\P{name}` | Matches text not included in groups and block ranges specified in `{name}`. |
| `\w` | Matches any word character. Equivalent to the Unicode character categories `[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\w` is equivalent to `[a-zA-Z_0-9]`. |
| `\W` | Matches any nonword character. Equivalent to the Unicode categories `[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\W` is equivalent to `[^a-zA-Z_0-9]`. |
| `\s` | Matches any white-space character. Equivalent to the Unicode character categories `[\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\s` is equivalent to `[ \f\n\r\t\v]`. |
| `\S` | Matches any non-white-space character. Equivalent to the Unicode character categories `[^\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\S` is equivalent to `[^ \f\n\r\t\v]`. |
| `\d` | Matches any decimal digit. Equivalent to `\p{Nd}` for Unicode and `[0-9]` for non-Unicode, ECMAScript behavior. |
| `\D` | Matches any nondigit. Equivalent to `\P{Nd}` for Unicode and `[^0-9]` for non-Unicode, ECMAScript behavior. |

You can find the Unicode category a character belongs to with the method
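As a rough illustration of the classes in this table, here is a small sketch in Python rather than .NET. Two things are assumptions on my part: Python's `re` has no `\p{name}` syntax, so that row is not shown, and `re.ASCII` is used only as an approximate analogue of the ECMAScript behavior for `\d`. Python's `unicodedata.category` is used to look up the Unicode category of a character.

```python
import re
import unicodedata

# Positive set, negated set, and hyphenated ranges.
vowels = re.findall(r"[aeiou]", "regex")
consonants = re.findall(r"[^aeiou]", "regex")
hex_runs = re.findall(r"[0-9a-fA-F]+", "x=7F3a; y=00")

# "7" followed by ARABIC-INDIC DIGIT THREE (U+0663), which is in category Nd.
mixed = "7\u0663"
unicode_digits = re.findall(r"\d", mixed)            # Unicode-aware \d
ascii_digits = re.findall(r"\d", mixed, re.ASCII)    # restricted to [0-9]

# Unicode general categories: Ll (lowercase letter), Nd (decimal digit),
# Pc (connector punctuation, which is why "_" counts as a word character).
categories = [unicodedata.category(ch) for ch in ("a", "\u0663", "_")]

print(vowels, consonants, hex_runs, len(unicode_digits), ascii_digits, categories)
```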

Regular Expression Options

and ECMAScript are not allowed inline.

| RegexOptions member | Inline character | Description |
| --- | --- | --- |
| None | N/A | Specifies that no options are set. |
| IgnoreCase | `i` | Specifies case-insensitive matching. |
| Multiline | `m` | Specifies multiline mode. Changes the meaning of `^` and `$` so that they match at the beginning and end, respectively, of any line, not just the beginning and end of the whole string. |
| ExplicitCapture | `n` | Specifies that the only valid captures are explicitly named or numbered groups of the form `(?<name>…)`. This allows parentheses to act as noncapturing groups without the syntactic clumsiness of `(?:…)`. |
| Compiled | N/A | Specifies that the regular expression will be compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression; yields faster execution at the expense of startup time. |
| Singleline | `s` | Specifies single-line mode. Changes the meaning of the period character (`.`) so that it matches every character (instead of every character except `\n`). |
| IgnorePatternWhitespace | `x` | Specifies that unescaped white space is excluded from the pattern and enables comments following a number sign (`#`). (For a list of escaped white-space characters, see Character Escapes.) Note that white space is never eliminated from within a character class. |
| RightToLeft | N/A | Specifies that the search moves from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position instead of to the right. (Therefore, the starting position should be specified as the end of the string instead of the beginning.) This option cannot be specified in midstream, to prevent the possibility of crafting regular expressions with infinite loops. However, the `(?<)` lookbehind constructs provide something similar that can be used as a subexpression. RightToLeft changes the search direction only. It does not reverse the substring that is searched for. The lookahead and lookbehind assertions do not change: lookahead looks to the right; lookbehind looks to the left. |
| ECMAScript | N/A | Specifies that ECMAScript-compliant behavior is enabled for the expression. This option can be used only in conjunction with the IgnoreCase and Multiline flags. Use of ECMAScript with any other flags results in an exception. |
| CultureInvariant | N/A | Specifies that cultural differences in language are ignored. See Performing Culture-Insensitive Operations in the RegularExpressions Namespace for more information. |
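To see what the `i`, `m`, `s`, and `x` options do in practice, here is a hedged sketch in Python. It assumes that Python's `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, and `re.VERBOSE` flags correspond to IgnoreCase, Multiline, Singleline, and IgnorePatternWhitespace for these simple inputs; options such as Compiled, RightToLeft, and ExplicitCapture have no direct counterpart in this sketch.

```python
import re

text = "first line\nSecond line"

# IgnoreCase (i): case-insensitive matching, as a flag and as inline (?i).
ignore_case = re.findall(r"second", text, re.IGNORECASE)
inline_ignore_case = re.findall(r"(?i)SECOND", text)

# Multiline (m): ^ matches at the start of every line, not only the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Singleline (s): the period also matches \n.
dot_default = re.search(r"line.Second", text) is not None
dot_singleline = re.search(r"line.Second", text, re.DOTALL) is not None

# IgnorePatternWhitespace (x): unescaped white space in the pattern is ignored.
spaced = re.compile(r"""
    (\w+)
    \s+
    line
""", re.VERBOSE)
first_word = spaced.search(text).group(1)

print(ignore_case, starts_default, starts_multiline, dot_default, dot_singleline, first_word)
```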

Atomic Zero-Width Assertions

| Assertion | Description |
| --- | --- |
| `^` | Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options. |
| `$` | Specifies that the match must occur at the end of the string, before `\n` at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options. |
| `\A` | Specifies that the match must occur at the beginning of the string (ignores the Multiline option). |
| `\Z` | Specifies that the match must occur at the end of the string or before `\n` at the end of the string (ignores the Multiline option). |
| `\z` | Specifies that the match must occur at the end of the string (ignores the Multiline option). |
| `\G` | Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous. |
| `\b` | Specifies that the match must occur on a boundary between `\w` (alphanumeric) and `\W` (nonalphanumeric) characters. The match must occur on word boundaries, that is, at the first or last characters in words separated by any nonalphanumeric characters. |
| `\B` | Specifies that the match must not occur on a `\b` boundary. |
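A few of these assertions can be checked with a short Python sketch. This is an illustration under assumptions, not the .NET engine: Python has no `\G`, and its `\Z` behaves like the `\z` described above, so only `^`, `$`, `\A`, `\b`, and `\B` are shown.

```python
import re

text = "one\ntwo\n"

# Without Multiline, $ matches at the very end or before a final \n.
end_default = re.findall(r"\w+$", text)

# With Multiline, $ also matches at the end of every line.
end_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \A ignores the Multiline option: only the true start of the string counts.
start_only = re.findall(r"\A\w+", text, re.MULTILINE)

# \b requires a word boundary; \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat catalog bobcat")
inner_only = re.sub(r"\Bcat\B", "CAT", "cat catalog concatenate")

print(end_default, end_multiline, start_only, whole_words, inner_only)
```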

Quantifiers

| Quantifier | Description |
| --- | --- |
| `*` | Specifies zero or more matches; for example, `\w*` or `(abc)*`. Equivalent to `{0,}`. |
| `+` | Specifies one or more matches; for example, `\w+` or `(abc)+`. Equivalent to `{1,}`. |
| `?` | Specifies zero or one matches; for example, `\w?` or `(abc)?`. Equivalent to `{0,1}`. |
| `{n}` | Specifies exactly n matches; for example, `(pizza){2}`. |
| `{n,}` | Specifies at least n matches; for example, `(abc){2,}`. |
| `{n,m}` | Specifies at least n, but no more than m, matches. |
| `*?` | Specifies the first match that consumes as few repeats as possible (equivalent to lazy `*`). |
| `+?` | Specifies as few repeats as possible, but at least one (equivalent to lazy `+`). |
| `??` | Specifies zero repeats if possible, or one (lazy `?`). |
| `{n}?` | Equivalent to `{n}` (lazy `{n}`). |
| `{n,}?` | Specifies as few repeats as possible, but at least n (lazy `{n,}`). |
| `{n,m}?` | Specifies as few repeats as possible between n and m (lazy `{n,m}`). |
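The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's `re` module, on the assumption that these quantifiers behave the same way there as in the .NET engine this table describes; the sample strings are made up for illustration.

```python
import re

html = "<b>bold</b><i>it</i>"

# Greedy + takes as much as it can; lazy +? stops at the first possible point.
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# {n} repeats the preceding group exactly n times.
exact = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None

# {n,m} is greedy by default; {n,m}? prefers the minimum.
range_greedy = re.match(r"a{2,3}", "aaaa").group()
range_lazy = re.match(r"a{2,3}?", "aaaa").group()

# ? prefers one repeat; ?? prefers zero.
optional_greedy = re.match(r"a?", "a").group()
optional_lazy = re.match(r"a??", "a").group()

print(greedy, lazy, exact, range_greedy, range_lazy, repr(optional_greedy), repr(optional_lazy))
```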

Grouping Constructs

Grouping constructs allow you to capture groups of subexpressions and to increase the efficiency of regular expressions with noncapturing lookahead and lookbehind modifiers. The following table describes the Regular Expression Grouping Constructs.

| Grouping construct | Description |
| --- | --- |
| `( )` | Captures the matched substring (or noncapturing group; for more information, see the ExplicitCapture option in Regular Expression Options). Captures using `()` are numbered automatically based on the order of the opening parenthesis, starting from one. The first capture, capture element number zero, is the text matched by the whole regular expression pattern. |
| `(?<name> )` | Captures the matched substring into a group name or number name. The string used for name must not contain any punctuation and it cannot begin with a number. You can use single quotes instead of angle brackets; for example, `(?'name')`. |
| `(?<name1-name2> )` | Balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, `(?'name1-name2')`. |
| `(?: )` | Noncapturing group. |
| `(?imnsx-imnsx: )` | Applies or disables the specified options within the subexpression. For example, `(?i-s: )` turns on case insensitivity and disables single-line mode. For more information, see Regular Expression Options. |
| `(?= )` | Zero-width positive lookahead assertion. Continues match only if the subexpression matches at this position on the right. For example, `\w+(?=\d)` matches a word followed by a digit, without matching the digit. This construct does not backtrack. |
| `(?! )` | Zero-width negative lookahead assertion. Continues match only if the subexpression does not match at this position on the right. For example, `\b(?!un)\w+\b` matches words that do not begin with un. |
| `(?<= )` | Zero-width positive lookbehind assertion. Continues match only if the subexpression matches at this position on the left. For example, `(?<=19)99` matches instances of 99 that follow 19. This construct does not backtrack. |
| `(?<! )` | Zero-width negative lookbehind assertion. Continues match only if the subexpression does not match at the position on the left. |
| `(?> )` | Nonbacktracking subexpression (also known as a "greedy" subexpression). The subexpression is fully matched once, and then does not participate piecemeal in backtracking. (That is, the subexpression matches only strings that would be matched by the subexpression alone.) |
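The lookaround examples quoted in the table can be run more or less as written. The sketch below uses Python's `re`, which is an assumption on my part and differs from .NET in a few places: named groups are spelled `(?P<name>...)` instead of `(?<name>...)`, and balancing groups are not available, so they are left out. The sample strings are invented for illustration.

```python
import re

# Named and numbered access to the same captures (Python spelling: ?P<name>).
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2003-10")
year_by_name, year_by_number, whole = m.group("year"), m.group(1), m.group(0)

# (?: ) groups without capturing, so only (c) is reported.
captured = re.findall(r"(?:ab)+(c)", "ababc")

# Options scoped to a subexpression: only "hello" is case-insensitive.
scoped_ok = re.fullmatch(r"(?i:hello) World", "HELLO World") is not None
scoped_fail = re.fullmatch(r"(?i:hello) World", "HELLO world") is not None

# Lookahead and lookbehind consume no characters.
before_digit = re.findall(r"\w+(?=\d)", "abc1 def")
not_un = re.findall(r"\b(?!un)\w+\b", "undo redo unfold fold")
after_19 = [hit.start() for hit in re.finditer(r"(?<=19)99", "1999 and 2099")]
not_negative = re.findall(r"(?<!-)\b\d+", "5 -3 8")

print(year_by_name, captured, before_digit, not_un, after_19, not_negative)
```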

Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted. For instance, the pattern `((?<One>abc)\d+)?(?<Two>xyz)(.*)` produces the following capturing groups by number and name. (The first capture, number 0, always refers to the entire pattern.)

| Number | Name | Pattern |
| --- | --- | --- |
| 0 | 0 (default name) | `((?<One>abc)\d+)?(?<Two>xyz)(.*)` |
| 1 | 1 (default name) | `((?<One>abc)\d+)` |
| 2 | 2 (default name) | `(.*)` |
| 3 | One | `(?<One>abc)` |
| 4 | Two | `(?<Two>xyz)` |

Backreference Constructs

The following table lists optional parameters that add backreference modifiers to a regular expression.

| Backreference construct | Definition |
| --- | --- |
| `\number` | Backreference. For example, `(\w)\1` finds doubled word characters. |
| `\k<name>` | Named backreference. For example, `(?<char>\w)\k<char>` finds doubled word characters. The expression `(?<43>\w)\43` does the same. You can use single quotes instead of angle brackets; for example, `\k'char'`. |

Note the ambiguity between octal escape codes and `\number` backreferences that use the same notation. See Backreferences for details on how the regular expression engine resolves the ambiguity.
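The doubled-character example can be sketched as follows. This uses Python's `re`, where, as an assumption worth flagging, the named backreference is written `(?P=name)` rather than the `\k<name>` form documented here; numbered backreferences look the same in both.

```python
import re

# Numbered backreference: \1 must repeat whatever group 1 captured.
doubled = re.findall(r"(\w)\1", "bookkeeper")

# Named backreference (Python spelling of the \k<char> idea).
first_double = re.search(r"(?P<char>\w)(?P=char)", "balloon").group()

print(doubled, first_double)
```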

Miscellaneous Constructs

The following table lists subexpressions that modify a regular expression.

| Construct | Definition |
| --- | --- |
| `(?imnsx-imnsx)` | Sets or disables options such as case insensitivity to be turned on or off in the middle of a pattern. For information on specific options, see Regular Expression Options. Option changes are effective until the end of the enclosing group. See also the information on the grouping construct `(?imnsx-imnsx: )`, which is a cleaner form. |
| `(?# )` | Inline comment inserted within a regular expression. The comment terminates at the first closing parenthesis character. |
| `# [to end of line]` | X-mode comment. The comment begins at an unescaped `#` and continues to the end of the line. (Note that the `x` option or the RegexOptions.IgnorePatternWhitespace enumerated option must be activated for this kind of comment to be recognized.) |
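Both comment styles can be tried with a short Python sketch. It assumes that Python's `(?#...)` and `re.VERBOSE` comments work like the inline and x-mode comments described here; the mid-pattern `(?imnsx-imnsx)` form is not shown because recent Python versions only accept global inline flags at the start of a pattern.

```python
import re

# Inline comment: everything up to the first ")" is ignored.
inline = re.search(r"\d+(?# one or more digits)px", "width: 120px").group()

# X-mode comment: an unescaped # runs to the end of the line,
# while an escaped \# still matches a literal number sign.
issue = re.search(r"\#\d+   # a number sign followed by digits", "see #42", re.VERBOSE).group()

print(inline, issue)
```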
