
Perl Compatible Regular Expressions


Character Classes

Note

Most special characters (e.g., ., +, *) lose their special meaning inside a character class and match themselves literally. The exceptions are \, ], and the context-sensitive - and ^.

Note

Both - and ^ are context-sensitive. The - only acts as a range operator when it appears between two characters (e.g., [a-z]). If it’s placed at the very beginning or very end of the brackets (e.g., [-az] or [az-]), or if it’s escaped with a backslash (e.g., [a\-z]), regex treats it as a literal hyphen character, not a range. The ^ only negates the entire class when it appears as the first character inside the brackets (e.g., [^abc]). If ^ appears anywhere else inside the brackets (e.g., [a^bc]), it is treated as a literal caret character and is simply included as one of the characters to match.
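The rules in the two notes above can be checked with a short sketch. It uses Python's re module as a stand-in for PCRE (an assumption; the two engines agree on these class rules), and the sample string and variable names are made up for illustration.

```python
import re

text = "a-z^b."

# A dot inside brackets is a literal dot, not "any character".
literal_dot = re.findall(r"[.]", text)

# '-' between two characters forms a range (here: any lowercase ASCII letter).
lowercase = re.findall(r"[a-z]", text)

# '-' at the start of the class, or escaped, is a literal hyphen.
leading_hyphen = re.findall(r"[-az]", text)
escaped_hyphen = re.findall(r"[a\-z]", text)

# '^' as the first character negates the class; elsewhere it is a literal caret.
negated = re.findall(r"[^abz]", text)
literal_caret = re.findall(r"[a^b]", text)

print(literal_dot)     # ['.']
print(lowercase)       # ['a', 'z', 'b']
print(leading_hyphen)  # ['a', '-', 'z']
print(escaped_hyphen)  # ['a', '-', 'z']
print(negated)         # ['-', '^', '.']
print(literal_caret)   # ['a', '^', 'b']
```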

Shorthand Character Classes

The lowercase version matches something, and the uppercase version matches the exact opposite (its complement):

Notation  Description
\d        Any digit (same as [0-9])
\D        Any non-digit
\w        Any alphanumeric (letters, digits) or _
\W        Neither alphanumeric nor _
\s        Any whitespace (space, tab, newline)
\S        Not whitespace
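As a quick illustration of each shorthand and its complement, here is a sketch using Python's re module as a stand-in for PCRE. The re.ASCII flag is an assumption made so that \d, \w, and \s stay ASCII-only, which is closer to PCRE's default behaviour; the sample string is invented.

```python
import re

sample = "Room 42_b\tok"  # contains one space and one tab

# Each lowercase class is paired with its uppercase complement.
digits = re.findall(r"\d", sample, flags=re.ASCII)
non_digit_runs = re.findall(r"\D+", sample, flags=re.ASCII)

words = re.findall(r"\w+", sample, flags=re.ASCII)
non_word = re.findall(r"\W", sample, flags=re.ASCII)

whitespace = re.findall(r"\s", sample, flags=re.ASCII)
non_space_runs = re.findall(r"\S+", sample, flags=re.ASCII)

print(digits)          # ['4', '2']
print(non_digit_runs)  # ['Room ', '_b\tok']
print(words)           # ['Room', '42_b', 'ok']
print(non_word)        # [' ', '\t']
print(whitespace)      # [' ', '\t']
print(non_space_runs)  # ['Room', '42_b', 'ok']
```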

Alternation

<pattern>|<pattern>: matches either the pattern to its left or the pattern to its right.

Note

Alternation has the lowest precedence, so it applies to whole subpatterns; group when needed (e.g., foo|barbaz matches foo or barbaz, while (foo|bar)baz matches foobaz or barbaz).
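The grouping example from the note can be checked with a minimal sketch. Python's re module stands in for PCRE here (an assumption), and fullmatch is used so that only whole-string matches count.

```python
import re

ungrouped = re.compile(r"foo|barbaz")   # "foo" OR "barbaz"
grouped = re.compile(r"(foo|bar)baz")   # ("foo" OR "bar") followed by "baz"

# Record whether each pattern matches each whole string.
ungrouped_results = {s: bool(ungrouped.fullmatch(s)) for s in ("foo", "barbaz", "foobaz")}
grouped_results = {s: bool(grouped.fullmatch(s)) for s in ("foo", "barbaz", "foobaz")}

print(ungrouped_results)  # {'foo': True, 'barbaz': True, 'foobaz': False}
print(grouped_results)    # {'foo': False, 'barbaz': True, 'foobaz': True}
```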


Wildcards

Notation  Description
.         Matches any single character (by default, any character except a newline)
?         Optional previous (i.e., 0 or 1 of previous)
*         0 or more of previous
+         1 or more of previous
{n}       Exactly n of previous
{n,}      n or more of previous
{n,m}     Between n and m (inclusive) of previous

Note

To match one of these characters literally, escape it with a backslash (e.g., \. or \*).
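Each quantifier, and the effect of escaping, can be tried out with a small sketch. Python's re module is assumed as a stand-in for PCRE, and whole_match is a hypothetical helper written only for this example.

```python
import re

def whole_match(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire string."""
    return re.fullmatch(pattern, s) is not None

optional = [whole_match(r"colou?r", s) for s in ("color", "colour", "colouur")]
star = [whole_match(r"ab*", s) for s in ("a", "ab", "abbb")]
plus = [whole_match(r"ab+", s) for s in ("a", "ab", "abbb")]
exact = [whole_match(r"\d{3}", s) for s in ("12", "123", "1234")]
at_least = [whole_match(r"\d{2,}", s) for s in ("1", "12", "12345")]
between = [whole_match(r"\d{2,4}", s) for s in ("1", "123", "12345")]

# An unescaped '.' matches any character; '\.' matches only a literal dot.
any_char = re.findall(r"3.14", "3.14 3x14")
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(optional)     # [True, True, False]
print(star)         # [True, True, True]
print(plus)         # [False, True, True]
print(exact)        # [False, True, False]
print(at_least)     # [False, True, True]
print(between)      # [False, True, False]
print(any_char)     # ['3.14', '3x14']
print(literal_dot)  # ['3.14']
```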

Note

By default, quantifiers are greedy (i.e., they match as much as possible). Append ? to make them lazy (i.e., match as little as possible): *?, +?, ??, {n,m}?.
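The difference between greedy and lazy matching is easiest to see side by side. The sketch below assumes Python's re module as a stand-in for PCRE; the input strings are invented.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the end, then backtracks only as far as the LAST </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? expands one character at a time and stops at the FIRST </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# The same applies to counted quantifiers.
greedy_counted = re.findall(r"\d{2,3}", "1234567")
lazy_counted = re.findall(r"\d{2,3}?", "1234567")

print(greedy)          # ['<b>one</b><b>two</b>']
print(lazy)            # ['<b>one</b>', '<b>two</b>']
print(greedy_counted)  # ['123', '456']
print(lazy_counted)    # ['12', '34', '56']
```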


Anchors

Note

Using ^<pattern>$ forces the entire string to match the pattern, instead of letting the regex match a substring somewhere in the middle.
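A minimal sketch of the effect of anchoring, assuming Python's re module as a stand-in for PCRE. One caveat worth knowing: by default, $ also matches just before a single trailing newline, in both engines.

```python
import re

unanchored = re.compile(r"\d+")
anchored = re.compile(r"^\d+$")

# Without anchors, the pattern finds digits anywhere in the string.
found_substring = unanchored.search("abc123def").group(0)

# With anchors, the whole string must consist of digits.
rejects_mixed = anchored.search("abc123def") is None
accepts_digits = anchored.search("123") is not None

# Caveat: $ also matches immediately before a final newline.
accepts_trailing_newline = anchored.search("123\n") is not None

print(found_substring)           # 123
print(rejects_mixed)             # True
print(accepts_digits)            # True
print(accepts_trailing_newline)  # True
```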


Groups and Backreferences

Note

(\w+) \1 matches a repeated word like hello hello, because \1 must match whatever (\w+) captured.
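The repeated-word pattern from the note can be sketched as follows, assuming Python's re module as a stand-in for PCRE (the \1 syntax is the same in both). The sample sentences are invented.

```python
import re

repeated = re.compile(r"(\w+) \1")

m = repeated.search("say hello hello world")
whole = m.group(0)     # the full match: both words and the space between them
captured = m.group(1)  # what (\w+) captured, which \1 then had to repeat

# Two different words do not satisfy the backreference.
no_repeat = repeated.search("hello world") is None

print(whole)      # hello hello
print(captured)   # hello
print(no_repeat)  # True
```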


Lookarounds

Lookarounds are zero-width assertions, which means they check for a pattern at the current position without consuming any characters.

Note

\d+(?= dollars) matches the number in 100 dollars but not in 100 euros. The match is just 100, not 100 dollars, since the lookahead doesn’t consume characters.
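The lookahead example can be compared with a plain, consuming pattern in a short sketch. It assumes Python's re module as a stand-in for PCRE; the (?=...) syntax is the same in both.

```python
import re

with_lookahead = re.compile(r"\d+(?= dollars)")
consuming = re.compile(r"\d+ dollars")

m = with_lookahead.search("100 dollars")
lookahead_match = m.group(0)  # only the digits are part of the match
lookahead_end = m.end()       # the match ends right after the digits

consuming_match = consuming.search("100 dollars").group(0)

# The assertion fails when the digits are not followed by " dollars".
no_match = with_lookahead.search("100 euros") is None

print(lookahead_match)  # 100
print(lookahead_end)    # 3
print(consuming_match)  # 100 dollars
print(no_match)         # True
```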

Note

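Lookbehind patterns in PCRE must match strings of a fixed length, so unbounded quantifiers such as * and + are not allowed inside them. Top-level alternatives may each have a different fixed length (e.g., (?<=ab|cde)). Lookaheads have no such restriction.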
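The sketch below shows a fixed-width lookbehind and the rejection of a variable-width one. It uses Python's re module, which is an assumption: Python enforces a similar fixed-width rule, but its exact limits and error message are its own, not PCRE's.

```python
import re

# Fixed-width lookbehind: digits that are immediately preceded by a dollar sign.
amount = re.compile(r"(?<=\$)\d+")
amounts = amount.findall("cost $40, tax 5")

# A '+' inside the lookbehind makes its width variable, so compilation fails.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

print(amounts)                  # ['40']
print(variable_width_rejected)  # True
```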