

In most cases, the RegEx engine performs pattern matching quickly and efficiently. However, in some cases, the engine can be very slow.

This article describes some best practices that developers can adopt to ensure that regular expressions perform optimally.


Intro to Regular Expression with 5 different use cases:

Consider Input Source

Regular expressions accept two types of input:

  • Constrained: the input text originates from a known, reliable source.
  • Unconstrained: the input text originates from an unknown or unreliable source.


When processing unconstrained input, a RegEx must efficiently handle:

  • Text that successfully matches the RegEx pattern.
  • Text that does not match the RegEx pattern.
  • Text that nearly matches the RegEx pattern.

Consider the RegEx @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$" and compare its performance on constrained and unconstrained input.

// Requires: using System; using System.Diagnostics; using System.Text.RegularExpressions;
static void InputConsiderExample()
{
    Stopwatch sw;
    string inputValue = "AAAAAAAAAAA"; // Constrained input
    // string inputValue = "aaaaaAAAAAAAA!"; // Unconstrained input
    string pattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$";
    string input;
    int index = 0;

    // Walk backwards through the string, matching a longer suffix on each pass.
    for (int ctr = inputValue.Length - 1; ctr >= 0; ctr--)
    {
        index++;
        input = inputValue.Substring(ctr, index);

        // Time a single match attempt.
        sw = Stopwatch.StartNew();
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        sw.Stop();

        if (m.Success)
            Console.WriteLine("{0,2}. Matched '{1,25}' in {2}", index, m.Value, sw.Elapsed);
        else
            Console.WriteLine("{0,2}. Failed '{1,25}' in {2}", index, input, sw.Elapsed);
    }
}

Code Description

The code above matches progressively longer substrings, taken from the end of the input string, against the RegEx pattern and prints success or failure along with the elapsed time.


Test for constrained input

inputValue="AAAAAAAAAAA"

Output
======
 1. Matched '                        A' in 00:00:00.0143831
 2. Matched '                       AA' in 00:00:00.0001389
 3. Matched '                      AAA' in 00:00:00.0000069
 4. Matched '                     AAAA' in 00:00:00.0000057
 5. Matched '                    AAAAA' in 00:00:00.0000194
 6. Matched '                   AAAAAA' in 00:00:00.0000155
 7. Matched '                  AAAAAAA' in 00:00:00.0000118
 8. Matched '                 AAAAAAAA' in 00:00:00.0000223
 9. Matched '                AAAAAAAAA' in 00:00:00.0000237
10. Matched '               AAAAAAAAAA' in 00:00:00.0000111
11. Matched '              AAAAAAAAAAA' in 00:00:00.0000111

As the output shows, apart from the first two calls, the elapsed time stays roughly constant regardless of the length of the input.


Test for unconstrained input

inputValue="aaaaaAAAAAAAA!"

Output
======
 1. Failed '              !' in 00:00:00.0128292
 2. Failed '             A!' in 00:00:00.0001336
 3. Failed '            AA!' in 00:00:00.0000150
 4. Failed '           AAA!' in 00:00:00.0000073
 5. Failed '          AAAA!' in 00:00:00.0000090
 6. Failed '         AAAAA!' in 00:00:00.0000146
 7. Failed '        AAAAAA!' in 00:00:00.0000380
 8. Failed '       AAAAAAA!' in 00:00:00.0000239
 9. Failed '      AAAAAAAA!' in 00:00:00.0000587
10. Failed '     aAAAAAAAA!' in 00:00:00.0001914
11. Failed '    aaAAAAAAAA!' in 00:00:00.0003574
12. Failed '   aaaAAAAAAAA!' in 00:00:00.0007371
13. Failed '  aaaaAAAAAAAA!' in 00:00:00.0011010
14. Failed ' aaaaaAAAAAAAA!' in 00:00:00.0018678

As the output shows, for the longer near-matching inputs the elapsed time roughly doubles with each additional character of unconstrained input.


To solve this problem:

  • When developing a RegEx, consider how backtracking might affect the performance of the regular expression engine, especially if your regular expression is designed to process unconstrained input.
  • Thoroughly test your RegEx with invalid and near-valid data as well as valid data.
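
The testing advice above can be sketched as a small harness that runs the same pattern against a valid, an invalid, and a near-valid input. This is only an illustrative sketch: the class name InputClassTest, the method Classify, the sample strings, and the one-second time-out are assumptions made for the example, not part of the original article.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class InputClassTest
{
    // Pattern taken from the article's example.
    private const string Pattern = @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$";

    // Hypothetical helper: reports "match", "no match", or "timeout" for one input.
    public static string Classify(string input, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase, timeout)
                ? "match"
                : "no match";
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up because matching exceeded the time-out.
            return "timeout";
        }
    }

    public static void Main()
    {
        // One valid, one clearly invalid, and one near-valid sample (assumed values).
        string[] samples = { "AAAAAAAAAAA", "!!!", "aaaaaAAAAAAAA!" };
        TimeSpan timeout = TimeSpan.FromSeconds(1);

        foreach (string sample in samples)
        {
            Stopwatch sw = Stopwatch.StartNew();
            string result = Classify(sample, timeout);
            sw.Stop();
            Console.WriteLine("{0,-16} {1,-9} {2}", sample, result, sw.Elapsed);
        }
    }
}
```

Timing all three input classes side by side makes it easier to spot a pattern whose cost grows sharply on near-valid text.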

Choose instantiation appropriately

There are four ways to couple the RegEx engine with a particular regular expression pattern.

Use static Method

A static method does not require you to instantiate a Regex object. An example is Regex.Match(String, String).

Consider an example of RegEx to match a valid currency.


The example below creates a new Regex object on every call, which is inefficient:

string pattern = @"\p{Sc}+\s*\d+";
Regex currencyRegex = new Regex(pattern);
return currencyRegex.IsMatch(currencyValue);

Replace the inefficient code with a call to the static Regex.IsMatch(String, String) method, as shown below:

string pattern = @"\p{Sc}+\s*\d+";
return Regex.IsMatch(currencyValue, pattern);

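Pattern Breakdown: @"\p{Sc}+\s*\d+"

  • \p{Sc}+ matches one or more characters in the Unicode Symbol, Currency category.
  • \s* matches zero or more white-space characters.
  • \d+ matches one or more decimal digits.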
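
As a minimal sketch of the static-method approach, the following wraps the currency pattern in a helper and tries it on a few sample strings. The class name CurrencyCheck, the method IsValidCurrency, and the sample values are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyCheck
{
    // Pattern from the article: currency symbol(s), optional white space, digits.
    private const string Pattern = @"\p{Sc}+\s*\d+";

    // Hypothetical helper built on the static Regex.IsMatch(String, String) method.
    // The static methods reuse recently used patterns from the internal regex cache,
    // so no Regex object has to be constructed by the caller.
    public static bool IsValidCurrency(string currencyValue)
    {
        return Regex.IsMatch(currencyValue, Pattern);
    }

    public static void Main()
    {
        // Sample values are assumptions for the example.
        foreach (string value in new[] { "$100", "$ 100", "100" })
        {
            Console.WriteLine("'{0}': {1}", value, IsValidCurrency(value));
        }
    }
}
```

Because the pattern is not anchored, IsMatch succeeds if a currency amount appears anywhere in the string; add ^ and $ if the whole string must be a currency value.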

Use RegEx instantiation without options flag

The Regex object is instantiated without an options argument that includes the Compiled flag, so the pattern is interpreted.

Use RegEx instantiation with options flag

The Regex object is instantiated with an options argument that includes the Compiled flag, so the pattern is compiled.

There are two types of RegEx available:

  • Interpreted RegEx
  • Compiled RegEx

Specify Interpreted RegEx


new Regex(pattern, RegexOptions.Singleline)

Specify Compiled RegEx


new Regex(pattern, RegexOptions.Compiled)
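
One common way to benefit from the Compiled option is to construct the Regex once and reuse it, since compilation has an up-front cost. The sketch below assumes this usage pattern; the class name WordCounter, the field CapitalizedWord, and the method CountCapitalizedWords are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Constructed once and reused for every call, so the compilation cost
    // is paid a single time. The pattern matches words starting with an
    // uppercase letter.
    private static readonly Regex CapitalizedWord =
        new Regex(@"\b\p{Lu}\w*\b", RegexOptions.Compiled);

    // Hypothetical helper: counts capitalized words in the given text.
    public static int CountCapitalizedWords(string text)
    {
        return CapitalizedWord.Matches(text).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountCapitalizedWords("Hello my Name is Sukhpinder"));
    }
}
```

If a pattern is used only a handful of times, an interpreted Regex may be the better choice, because the extra start-up cost of compilation is never recovered.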

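Use the CompileToAssembly Method

Use this method when the Regex object is tightly coupled with a particular regular expression: the Regex.CompileToAssembly method compiles it and saves it to an assembly. Note that Regex.CompileToAssembly is supported on .NET Framework only; on .NET Core and .NET 5 and later it throws PlatformNotSupportedException.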


Consider Backtracking

When a RegEx contains quantifiers such as *, +, and ?, the engine may give up part of a partial match and return to a previously saved state in order to find a match for the complete pattern. This process is called backtracking.

Backtracking gives regular expressions much of their power and flexibility, but excessive backtracking degrades performance.


Consider a RegEx that finds all words that start with a capital letter. Let's use it to understand backtracking.

With Backtracking

string input = "Hello my Name is Sukhpinder";
string pattern = @"\b\p{Lu}\w*\b";
foreach (Match match in Regex.Matches(input, pattern))
    Console.WriteLine(match.Value);

// Output
// Hello
// Name
// Sukhpinder

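Pattern Breakdown: @"\b\p{Lu}\w*\b"

  • \b begins the match at a word boundary.
  • \p{Lu} matches one uppercase letter.
  • \w* matches zero or more word characters.
  • \b ends the match at a word boundary.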

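Disable Backtracking

To disable backtracking for a subexpression, wrap it in the (?>subexpression) language element, known as an atomic group. Once the group has matched, the engine will not give back any of the characters it consumed.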


Without Backtracking: Notice the pattern change

string input = "Hello my Name is Sukhpinder";
string pattern = @"\b\p{Lu}(?>\w*)\b";
foreach (Match match in Regex.Matches(input, pattern))
    Console.WriteLine(match.Value);

// Output
// Hello
// Name
// Sukhpinder
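
In the example above both patterns print the same words, because \w* followed by \b never needs to give characters back. The following sketch uses a different, deliberately simple pair of patterns (an assumption for illustration, not from the original article) to show a case where the atomic group changes the result. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // a+ is greedy, but it can backtrack and give one 'a' back
    // so that the final 'a' in the pattern can match.
    public static bool MatchesWithBacktracking(string input)
    {
        return Regex.IsMatch(input, @"^a+a$");
    }

    // (?>a+) is atomic: once it has consumed every 'a', it will not
    // give any back, so the final 'a' in the pattern has nothing to match.
    public static bool MatchesWithAtomicGroup(string input)
    {
        return Regex.IsMatch(input, @"^(?>a+)a$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWithBacktracking("aaa")); // True
        Console.WriteLine(MatchesWithAtomicGroup("aaa"));  // False
    }
}
```

This is why an atomic group is safe only where the subexpression never needs to release characters for the rest of the pattern to succeed.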

Use Time-Out Overloads

If your regular expressions process input that nearly matches the RegEx pattern, the engine often relies on excessive backtracking, which hurts performance.

Always set a time-out to limit the impact of excessive backtracking.


Overloads:

  • Regex(String, RegexOptions, TimeSpan)
  • Regex.Match(String, String, RegexOptions, TimeSpan)
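
A minimal sketch of the time-out approach might look like the following. It uses the Regex(String, RegexOptions, TimeSpan) constructor and catches RegexMatchTimeoutException; the class name TimeoutDemo, the method SafeIsMatch, the sample input, and the one-second limit are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Hypothetical helper: returns true or false for a completed match attempt,
    // or null if the engine exceeded the time-out.
    public static bool? SafeIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            Regex regex = new Regex(pattern, RegexOptions.None, timeout);
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Matching took longer than the allowed interval.
            return null;
        }
    }

    public static void Main()
    {
        bool? result = SafeIsMatch(
            "aaaaaAAAAAAAA!",
            @"^[0-9A-Z]([-.\w]*[0-9A-Z])*$",
            TimeSpan.FromSeconds(1));

        Console.WriteLine(result.HasValue ? result.Value.ToString() : "timed out");
    }
}
```

Returning null on a time-out lets the caller decide whether to reject the input, retry with a longer interval, or log the event.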

