2019独角兽企业重金招聘Python工程师标准>>>

Regular expressions are a language of their own. When you learn a new programming language, they're this little sub-language that makes no sense at first glance. Many times you have to read another tutorial, article, or book just to understand the "simple" pattern described. Today, we'll review eight regular expressions that you should know for your next coding project.

Background Info on Regular Expressions

This is what Wikipedia has to say about them:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Now, that doesn't really tell me much about the actual patterns. The regexes I'll be going over today contains characters such as \w, \s, \1, and many others that represent something totally different from what they look like.

If you'd like to learn a little about regular expressions before you continue reading this article, I'd suggest watching the Regular Expressions for Dummies screencast series.

The eight regular expressions we'll be going over today will allow you to match a(n): username, password, email, hex value (like #fff or #000), slug, URL, IP address, and an HTML tag. As the list goes down, the regular expressions get more and more confusing. The pictures for each regex in the beginning are easy to follow, but the last four are more easily understood by reading the explanation.

The key thing to remember about regular expressions is that they are almost read forwards and backwards at the same time. This sentence will make more sense when we talk about matching HTML tags.

Note: The delimiters used in the regular expressions are forward slashes, "/". Each pattern begins and ends with a delimiter. If a forward slash appears in a regex, we must escape it with a backslash: "\/".

1. Matching a Username

Pattern:

1
/^[a-z0-9_-]{3,16}$/

Description:

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).

String that matches:

my-us3r_n4m3

String that doesn't match:

th1s1s-wayt00_l0ngt0beausername (too long)

2. Matching a Password

Pattern:

1
/^[a-z0-9_-]{6,18}$/

Description:

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).

String that matches:

myp4ssw0rd

String that doesn't match:

mypa$$w0rd (contains a dollar sign)

3. Matching a Hex Value

Pattern:

1
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed a question mark. The question mark tells the parser that the preceding character — in this case a number sign — is optional, but to be "greedy" and capture it if it's there. Next, inside the first group (first group of parentheses), we can have two different situations. The first is any lowercase letter between a and f or a number six times. The vertical bar tells us that we can also have three lowercase letters between a and f or numbers instead. Finally, we want the end of the string ($).

The reason that I put the six character before is that parser will capture a hex value like #ffffff. If I had reversed it so that the three characters came first, the parser would only pick up #fff and not the other three f's.

String that matches:

#a3c113

String that doesn't match:

#4d82h4 (contains the letter h)

4. Matching a Slug

Pattern:

1
/^[a-z0-9-]+$/

Description:

You will be using this regex if you ever have to work with mod_rewrite and pretty URL's. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) letters, numbers, or hyphens. Finally, we want the end of the string ($).

String that matches:

my-title-here

String that doesn't match:

my_title_here (contains underscores)

5. Matching an Email

Pattern:

1
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Description:

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name which must be: one or more lowercase letters, numbers, underscores, dots, or hyphens. Then another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country specific TLD's (.ny.us or .co.uk). Finally, we want the end of the string ($).

String that matches:

john@doe.com

String that doesn't match:

john@doe.something (TLD is too long)

6. Matching a URL

Pattern:

1
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Description:

This regex is almost like taking the ending part of the above regex, slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is all option. It allows the URL to begin with "http://", "https://", or neither of them. I have a question mark after the s to allow URL's that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hypens followed by another dot then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. Pretty much this allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark was to be used there, only one file/directory would be able to be matched.

Then a trailing slash is matched, but it can be optional. Finally we end with the end of the line.

String that matches:

http://net.tutsplus.com/about

String that doesn't match:

http://google.com/some/file!.html (contains an exclamation point)

7. Matching an IP Address

Pattern:

1
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/

Description:

Now, I'm not going to lie, I didn't write this regex; I got it from here. Now, that doesn't mean that I can't rip it apart character for character.

The first capture group really isn't a captured group because

1

was placed inside which tells the parser to not capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times — the {3} at the end of the group. This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup then a dot to move on.

The subgroup is also another non-capture group. It's just a bunch of character sets (things inside brackets): the string "25" followed by a number between 0 and 5; or the string "2" and a number between 0 and 4 and any number; or an optional zero or one followed by two numbers, with the second being optional.

After we match three of those, it's onto the next non-capturing group. This one wants: the string "25" followed by a number between 0 and 5; or the string "2" with a number between 0 and 4 and another number at the end; or an optional zero or one followed by two numbers, with the second being optional.

We end this confusing regex with the end of the string.

String that matches:

73.60.124.136 (no, that is not my IP address :P)

String that doesn't match:

256.60.124.136 (the first group must be "25" and a number between zero and five)

Advertisement

8. Matching an HTML Tag

Pattern:

1
/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Description:

One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usually, we begin with the start of the line.

First comes the tag's name. It must be one or more letters long. This is the first capture group, it comes in handy when we have to grab the closing tag. The next thing are the tag's attributes. This is any character but a greater than sign (>). Since this is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star says as many attributes as you want.

Next comes the third non-capture group. Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters, and the closing tag. \1 is used which represents the content that was captured in the first capturing group. In this case it was the tag's name. Now, if that couldn't be matched we want to look for a self closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by "/>".

The regex is ended with the end of the line.

String that matches:

<a href="http://net.tutsplus.com/">Nettuts+</a>

String that doesn't match:

<img src="img.jpg" alt="My image>" /> (attributes can't contain greater than signs)

Conclusion

I hope that you have grasped the ideas behind regular expressions a little bit better. Hopefully you'll be using these regexes in future projects! Many times you won't need to decipher a regex character by character, but sometimes if you do this it helps you learn. Just remember, don't be afraid of regular expressions, they might not seem it, but they make your life a lot easier. Just try and pull out a tag's name from a string without regular expressions! ;)

  • Follow us on Twitter, or subscribe to the NETTUTS RSS Feed for more daily web development tuts and articles.
Advertisement

Difficulty:
Intermediate

Categories:
Web DevelopmentRegular Expressions

Translations:

Tuts+ tutorials are translated into other languages by our community members—you can be involved too!

Translate this post

About Vasili
N/A

Advertisement

Suggested Tuts+ Course
3:24
POWERED BY WISTIA

JavaScript Fundamentals$15
Related Tutorials
The Tuts+ Guide to Template Tags: Seventh Batch
Code

Creating Alfred Workflows in Haskell
Computer Skills

The Tuts+ Guide to Template Tags: Second Batch
Code

Jobs
Front-End Engineerat Truth Labs in Chicago, IL, USA
Back-End Wordpress Developerat Vidstore in Denver, CO, USA

Envato Market Item

What Would You Like to Learn?
Suggest an idea to the content editorial team at Tuts+.

转载于:https://my.oschina.net/u/2363463/blog/635777

8 Regular Expressions You Should Know相关推荐

  1. Coursera课程Python for everyone:Quiz: Regular Expressions

    Quiz: Regular Expressions 10 试题 1. Which of the following best describes "Regular Expressions&q ...

  2. 正则表达式(Regular Expressions)

    正则表达式(Regular Expressions) 正则表达式在其他编程语言中的应用非常广泛,网上资料也非常多,而网上在ABAP语言中应用的资料却很少,尽管各语言中正则表达式语法知识都很类似,但仍然 ...

  3. .NET Regular Expressions

    HTML去空白回车换行 private static readonly Regex REGEX_LINE_BREAKS = new Regex(@"\n\s*", RegexOpt ...

  4. Optimizing regular expressions in Java

    2019独角兽企业重金招聘Python工程师标准>>> SRC URL:http://www.javaworld.com/article/2077757/core-java/opti ...

  5. 《D o C P》学习笔记(3 - 0)Regular Expressions, other languages and interpreters - 简介

    Regular Expressions, other languages and interpreters 你可以学到什么: 定义正则表达式的语言:解释这个语言. 定义被1个正则表达式匹配的字符串集合 ...

  6. 《D o C P》学习笔记(3 - 1)Regular Expressions, other languages and interpreters - Lesson 3

    备注1:每个视频的英文字幕,都翻译成中文,太消耗时间了,为了加快学习进度,我将暂停这个工作,仅对英文字幕做少量注释. 备注2:将.flv视频文件与Subtitles文件夹中的.srt字幕文件放到同1个 ...

  7. UltraEdit正则表达式使用(Regular Expressions in UltraEdit)

    正则表达式作为模式匹配,经常用于查找/替换操作的特定字符串.使用正则表达式来简化操作和提高效率的方式有许多.下面列出了一个用于ultra - edit样式和unix样式正则表达式的参考以及一些示例,演 ...

  8. 斯坦福大学Dan Jurafsky主讲NLP笔记 2.1 Regular Expressions

    2.1 regular expressions 正则表达式 2.1.1 问题的提出 在英语中,有大小写和单复数的问题,例如: woodchuck      woodchucks      Woodch ...

  9. 9.7.2. SIMILAR TO Regular Expressions

    9.7.2. SIMILAR TO Regular Expressions 9.7.2.SIMILAR TO正则表达式 string SIMILAR TO pattern [ESCAPE escape ...

最新文章

  1. Linux(10)用户和组管理命令
  2. 存储器芯片国产化布局加速 数千亿投资欲打破进口依赖
  3. java简历达内_达内教你怎么写大牛简历
  4. 每天一道LeetCode-----只可能有'.'和'*'的字符串正则匹配
  5. php 合并两个数组并去重,合并两个数组 以KEY 作为键
  6. [家里蹲大学数学杂志]第236期钟玉泉复变函数论前六章第二组习题参考解答
  7. Python+matplotlib数据可视化鼠标悬停自动标注功能实现
  8. 8086微型计算机结构功能,3.2 8086微处理器的功能结构
  9. 安装activex手机控件_86/BRZ 免“油饼”安装 Defi 机油压力表
  10. 数电-汽车尾灯控制电路设计
  11. Django项目使用NGINX通过LDAP实现用户验证
  12. markdown温习笔记
  13. 转载CSDN博客时的错误
  14. java绘制五子棋棋盘
  15. 如何隐藏公共 IP 地址,进行 DDos
  16. Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers简记
  17. 服装行业如何用手持PDA盘点?
  18. 深扒“亚稳态”的底裤,从MOS管到CMOS门电路,再到亚稳态分析
  19. 诚之和:首个俄罗斯太空电影摄制组准备返回地球
  20. 我眼中的光明·第三周

热门文章

  1. tcpcopy使用方法
  2. C++ Vecctor容器浅析
  3. 英语口语-文章朗读Week10 Thursday
  4. Coding For Fun 32小时:充满创造、激情、团结的编程马拉松
  5. jQuery的name选择器 模糊匹配
  6. spring IoC/DI
  7. jquery函数加载及生成随机数
  8. BZOJ 4710 [Jsoi2011]分特产 解题报告
  9. May 18:PHP 输出语句
  10. 鼠标移入视频播放,鼠标移出播放停止,恢复到原来状态