Regular Expressions

April 1, 2010 9:19 am

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$.
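
As a small illustration of the wildcard comparison, the sketch below applies the regex counterpart of *.txt to a few file names using Python's re module. The file names are made up for the example.

```python
import re

# Regex counterpart of the wildcard *.txt. The dot before "txt" is escaped
# so that it matches a literal dot rather than any character.
TXT_FILE = re.compile(r".*\.txt$")

# Hypothetical file names, chosen only to show what does and does not match.
names = ["notes.txt", "report.pdf", "todo.txt.bak", "atxt"]

text_files = [name for name in names if TXT_FILE.match(name)]
print(text_files)
```

Only names that end in a literal ".txt" are kept, which is why "atxt" and "todo.txt.bak" are filtered out.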

But you can do much more with regular expressions. In a text editor like EditPad Pro or a specialized text processing tool like PowerGREP, you could use the regular expression \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b to search for an email address. Any email address, to be exact. A very similar regular expression (replace the first \b with ^ and the last one with $) can be used by a programmer to check whether the user entered a properly formatted email address, in just one line of code, whether that code is written in Perl, PHP, Java, a .NET language or a multitude of other languages.
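
A minimal Python sketch of both uses follows: searching text for email addresses, and checking a single input value. The function names find_emails and is_valid_email are hypothetical, and re.IGNORECASE is an assumption added here because the pattern itself lists only uppercase letters.

```python
import re

# Shared core of the pattern: local part, "@", domain, dot, 2-4 letter suffix.
EMAIL_BODY = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"

# Word boundaries for searching inside larger text.
FIND_EMAIL = re.compile(r"\b" + EMAIL_BODY + r"\b", re.IGNORECASE)

# Anchors for validating one complete value.
VALID_EMAIL = re.compile(r"^" + EMAIL_BODY + r"$", re.IGNORECASE)


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return FIND_EMAIL.findall(text)


def is_valid_email(value):
    """Return True if the whole value is formatted like an email address."""
    return VALID_EMAIL.match(value) is not None


print(find_emails("Contact john@example.com or jane.doe@mail.example.org today."))
print(is_valid_email("john@example.com"))
print(is_valid_email("john@example"))
```

Note that the {2,4} quantifier rejects addresses whose last domain label is longer than four letters, so the pattern trades some accuracy for simplicity.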

Sample Regular Expressions

Below, you will find many example patterns that you can use for and adapt to your own purposes. Key techniques used in crafting each regex are explained, with links to the corresponding pages in the tutorial where these concepts and techniques are explained in great detail.

If you are new to regular expressions, you can take a look at these examples to see what is possible. Regular expressions are very powerful. They do take some time to learn. But you will earn back that time quickly when using regular expressions to automate searching or editing tasks in EditPad Pro or PowerGREP, or when writing scripts or applications in a variety of languages.

RegexBuddy offers the fastest way to get up to speed with regular expressions. RegexBuddy will analyze any regular expression and present it to you in an easy-to-understand, detailed outline. The outline links to RegexBuddy’s regex tutorial (the same one you find on this website), where you can always get in-depth information with a single click.

Oh, and you definitely do not need to be a programmer to take advantage of regular expressions!

Grabbing HTML Tags

<TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, so that it stops before the first closing tag rather than before the last, as a greedy star would. This regex will not properly match tags nested inside themselves, as in <TAG>one<TAG>two</TAG>one</TAG>.

<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> will match the opening and closing pair of any HTML tag. Be sure to turn off case sensitivity. The key in this solution is the use of the backreference \1 in the regex. Anything between the tags is captured into the second backreference. This solution will also not match tags nested in themselves.
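
The two approaches can be sketched in Python as follows. The tag name b, the sample HTML, and the re.DOTALL flag (so that the dot also matches line breaks) are assumptions made for this illustration. The patterns are written out in full here, with the tag name captured in group 1 and reused through the backreference \1.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# A specific tag (here <b>): the lazy (.*?) stops at the first closing tag.
SPECIFIC_TAG = re.compile(r"<b\b[^>]*>(.*?)</b>", FLAGS)

# Any tag: group 1 captures the tag name, \1 requires the same name to close.
ANY_TAG = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", FLAGS)

html = '<p class="intro">Hello <b>bold</b> world</p><i>italic</i>'

bold_contents = SPECIFIC_TAG.findall(html)
pairs = [(m.group(1), m.group(2)) for m in ANY_TAG.finditer(html)]

# A tag nested inside itself is not handled: the match ends at the first </b>.
nested = SPECIFIC_TAG.findall("<b>one<b>two</b>one</b>")

print(bold_contents)
print(pairs)
print(nested)
```

The last result shows the nesting limitation: the captured text stops at the inner closing tag, so the outer pair is never matched as a whole.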

Trimming Whitespace

You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace. Do both by combining the regular expressions into ^[ \t]+|[ \t]+$. Instead of [ \t], which matches a space or a tab, you can expand the character class into [ \t\r\n] if you also want to strip line breaks. Or you can use the shorthand \s instead.
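
A minimal Python sketch of this search-and-replace is shown below. The function names trim_lines and trim_string are hypothetical. re.MULTILINE is used in the first pattern so that ^ and $ match at every line, which is an assumption about how a text editor would apply the regex to a file.

```python
import re

# Leading or trailing spaces and tabs on every line of a multi-line text.
TRIM_LINES = re.compile(r"^[ \t]+|[ \t]+$", re.MULTILINE)

# Leading or trailing whitespace of the string as a whole, line breaks included.
TRIM_STRING = re.compile(r"^\s+|\s+$")


def trim_lines(text):
    """Strip spaces and tabs from both ends of each line, keeping line breaks."""
    return TRIM_LINES.sub("", text)


def trim_string(text):
    """Strip all whitespace, including line breaks, from both ends of text."""
    return TRIM_STRING.sub("", text)


print(repr(trim_lines("  first line\t\n\tsecond line  \n")))
print(repr(trim_string("\n  padded text \r\n")))
```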

IP Addresses

Matching an IP address is another good example of a trade-off between regex complexity and exactness. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b will match any IP address just fine, but will also match 999.999.999.999 as if it were a valid IP address. Whether this is a problem depends on the files or data you intend to apply the regex to. To restrict all 4 numbers in the IP address to 0..255, you can use this complex beast: \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b (everything on a single line). The long regex stores each of the 4 numbers of the IP address into a capturing group. You can use these groups to further process the IP number.

If you don’t need access to the individual numbers, you can shorten the regex with a quantifier to: \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b. Similarly, you can shorten the quick regex to \b(?:\d{1,3}\.){3}\d{1,3}\b.
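
The trade-off between the quick and the strict pattern can be sketched in Python as follows. The helper name parse_ip and the sample strings are hypothetical. The strict pattern is assembled from one octet sub-pattern only to keep the source readable, and the resulting regex has four capturing groups, one per number.

```python
import re

# One number in the range 0..255, in its own capturing group.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Strict pattern: four capturing groups separated by literal dots.
STRICT_IP = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")

# Quick pattern: accepts any four groups of one to three digits.
QUICK_IP = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_ip(text):
    """Return the four numbers of the first valid IP address in text, or None."""
    match = STRICT_IP.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_ip("server at 192.168.0.1 responded"))
print(parse_ip("999.999.999.999"))
print(QUICK_IP.search("999.999.999.999") is not None)
```

The quick pattern accepts 999.999.999.999 while the strict one rejects it, which is the difference in exactness discussed above.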

More Detailed Examples

Numeric Ranges. Since regular expressions work with text rather than numbers, matching specific numeric ranges requires a bit of extra care.

Matching a Floating Point Number. Also illustrates the common mistake of making everything in a regular expression optional.

Matching an Email Address. There’s a lot of controversy about what is a proper regex to match email addresses. It’s a perfect example showing that you need to know exactly what you’re trying to match (and what not), and that there’s always a trade-off between regex complexity and accuracy.

Matching Valid Dates. A regular expression that matches 31-12-1999 but not 31-13-1999.

Finding or Verifying Credit Card Numbers. Validate credit card numbers entered on your order form. Find credit card numbers in documents for a security audit.

Matching Complete Lines. Shows how to match complete lines in a text file rather than just the part of the line that satisfies a certain requirement. Also shows how to match lines in which a particular regex does not match.

Removing Duplicate Lines or Items. Illustrates simple yet clever use of capturing parentheses or backreferences.

Regex Examples for Processing Source Code. How to match common programming language syntax such as comments, strings, numbers, etc.

Two Words Near Each Other. Shows how to use a regular expression to emulate the “near” operator that some tools have.

Common Pitfalls

Catastrophic Backtracking. If your regular expression seems to take forever, or simply crashes your application, it has likely contracted a case of catastrophic backtracking. The solution is usually to be more specific about what you want to match, so the number of matches the engine has to try doesn’t rise exponentially.

Making Everything Optional. If all the parts in your regex are optional, it will match a zero-width string anywhere. Your regex will need to express the fact that different parts are optional depending on which parts are present.
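
To illustrate this pitfall, the Python sketch below uses a floating point pattern in which every part is optional, next to one possible fix that requires at least one digit. Both patterns and the sample strings are assumptions chosen for this example.

```python
import re

# Every part is optional: sign, integer digits, dot and fraction digits.
ALL_OPTIONAL = re.compile(r"[-+]?[0-9]*\.?[0-9]*")

# One possible fix: either digits after a dot, or a plain integer.
REQUIRES_DIGIT = re.compile(r"[-+]?(?:[0-9]*\.[0-9]+|[0-9]+)")

text = "no numbers here"

# The all-optional pattern "succeeds" with an empty match at position 0.
empty_match = ALL_OPTIONAL.search(text)
print(repr(empty_match.group()), empty_match.span())

# The stricter pattern correctly reports that there is no number.
print(REQUIRES_DIGIT.search(text))

numbers = REQUIRES_DIGIT.findall("width 3.14, offset -2, scale .5")
print(numbers)
```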

Repeating a Capturing Group vs. Capturing a Repeated Group. Repeating a capturing group will capture only the last iteration of the group. Capture a repeated group if you want to capture all iterations.
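
The difference can be seen in a short Python sketch. The pattern (abc|123)+ and the subject string are assumptions chosen for illustration.

```python
import re

# Repeating a capturing group: the group is overwritten on each iteration.
REPEATED_CAPTURE = re.compile(r"(abc|123)+")

# Capturing a repeated group: the quantifier sits inside the capturing group,
# and a non-capturing group holds the alternatives.
CAPTURED_REPEAT = re.compile(r"((?:abc|123)+)")

subject = "abc123"

last_iteration = REPEATED_CAPTURE.match(subject).group(1)
all_iterations = CAPTURED_REPEAT.match(subject).group(1)

print(last_iteration)
print(all_iterations)
```

Both patterns match the whole subject, but only the second one keeps all of it in group 1.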

Mixing Unicode and 8-bit Character Codes. Using 8-bit character codes like \x80 with a Unicode engine and subject string may give unexpected results.

Most Used Examples

To search for a URL: (http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-.,@?^=%&:/~+#]*[\w\-@?^=%&/~+#])?
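
A minimal Python sketch of a URL search follows. The pattern is written with the hyphens inside the character classes escaped, which Python's re module requires after \w. The function name find_urls and the sample text are hypothetical.

```python
import re

URL = re.compile(
    r"(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-.,@?^=%&:/~+#]*[\w\-@?^=%&/~+#])?"
)


def find_urls(text):
    """Return every URL found in text, using the overall match of each hit."""
    # finditer is used instead of findall because the pattern contains
    # capturing groups, and findall would return the groups, not the URLs.
    return [match.group(0) for match in URL.finditer(text)]


sample = (
    "Docs at http://www.example.com/path/page.html?x=1, "
    "mirror at ftp://files.example.org."
)
urls = find_urls(sample)
print(urls)
```

Because the final character class leaves out the dot and the comma, punctuation that merely follows a URL in running text is not included in the match.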

 

 

 

