Regular Expressions Guide

Regular expressions (regex) are the universal language for pattern matching in text. Whether you are validating input, parsing logs, or performing search-and-replace operations, understanding regex is a force multiplier. This page covers the three main flavors you will encounter on the command line -- BRE, ERE, and PCRE -- along with character classes, quantifiers, anchors, groups, lookahead, lookbehind, and practical examples with grep.

Three Flavors of Regex

Understanding which flavor you are working with is the first step to writing correct patterns. Each tool defaults to a specific flavor, and the differences in escaping rules can cause frustrating bugs.

BRE (Basic Regular Expressions)

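BRE is the default for grep and sed. In BRE, the characters {, }, (, and ) are literal unless escaped with a backslash: \{ \} \( \) give intervals and grouping their special meaning. GNU grep and sed apply the same rule to \+, \?, and \| (strictly POSIX BRE has no one-or-more, optional, or alternation operators), so those escapes may not work with BSD or macOS tools. Without the backslash, all of these characters are treated literally.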

# BRE: must escape grouping and quantifier
grep 'colou\?r' file.txt          # matches color or colour
echo "abc 123" | sed 's/\([0-9]\{1,3\}\)/[\1]/g'
# abc [123]
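The sketch below shows the escaping rule directly: the same text is searched once with an unescaped + and once with \+. It assumes GNU grep (\+ is a GNU extension), and the function names are illustrative only.

```shell
# In BRE an unescaped + is an ordinary character, so 'a+b' matches the literal text "a+b".
bre_literal_plus() {
  printf '%s\n' "$1" | grep -o 'a+b'
}

# Escaped as \+ (GNU extension) it means "one or more of the preceding a", then b.
bre_gnu_plus() {
  printf '%s\n' "$1" | grep -o 'a\+b'
}

bre_literal_plus 'a+b aab'   # a+b
bre_gnu_plus 'a+b aab'       # aab
```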

ERE (Extended Regular Expressions)

ERE is activated with grep -E (or the older egrep command, which recent GNU grep releases warn is obsolescent) and sed -E. Metacharacters work without escaping, making patterns much more readable. This is the flavor most people prefer for interactive use.

# ERE: no escaping needed
grep -E 'colou?r' file.txt        # matches color or colour
echo "abc 123" | sed -E 's/([0-9]{1,3})/[\1]/g'
# abc [123]
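Alternation follows the same rule. A minimal comparison, assuming GNU grep for the BRE \| form; the helper names are hypothetical:

```shell
# ERE: | is alternation without escaping.
ere_alternation() {
  printf '%s\n' "$@" | grep -E 'cat|dog'
}

# BRE: the same alternation must be written \| (a GNU extension).
bre_alternation() {
  printf '%s\n' "$@" | grep 'cat\|dog'
}

ere_alternation cat bird hotdog   # cat, hotdog (one per line)
bre_alternation cat bird hotdog   # same output
```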

PCRE (Perl-Compatible Regular Expressions)

PCRE adds features like lookahead, lookbehind, non-greedy quantifiers, and shorthand character classes (\d, \w, \s). Use grep -P (GNU grep only, not available on macOS by default).

# PCRE: shorthand classes and lookbehind
grep -P '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' access.log
grep -P '(?<=user=)\w+' config.txt

On macOS, install GNU grep via Homebrew (brew install grep) and use ggrep -P, or use perl directly for PCRE support.
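If a script has to run on machines with and without GNU grep, one option is to probe for -P and fall back to perl. This is a sketch under the assumption that perl is installed; pcre_only_matching is a hypothetical helper name, and the perl fallback only roughly mimics grep -o by printing each match on its own line.

```shell
# Print only the parts of stdin that match a PCRE pattern.
# Uses grep -P when the local grep supports it, otherwise falls back to perl.
pcre_only_matching() {
  if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    grep -oP -- "$1"
  else
    # Pass the pattern through the environment to avoid shell-quoting issues.
    PCRE_PATTERN="$1" perl -ne 'print "$&\n" while /$ENV{PCRE_PATTERN}/g'
  fi
}

echo "user=alice role=admin" | pcre_only_matching '(?<=user=)\w+'   # alice
```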

Character Classes

Character classes match any single character from a defined set.

[abc]       # matches a, b, or c
[a-z]       # matches any lowercase letter
[A-Za-z]    # matches any letter
[0-9]       # matches any digit
[^0-9]      # matches any non-digit (caret negates inside brackets)
[a-zA-Z0-9_]  # matches word characters
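A quick way to see bracket expressions at work is grep -o, which prints each match on its own line. The sketch sets LC_ALL=C (our choice, not something grep requires) so that ranges such as a-z are interpreted as plain ASCII; the function names are illustrative.

```shell
# [^0-9]+ : one or more characters that are NOT digits.
non_digit_runs() {
  printf '%s\n' "$1" | LC_ALL=C grep -oE '[^0-9]+'
}

# [a-zA-Z0-9_]+ : runs of "word" characters; the hyphen and space split the runs.
word_runs() {
  printf '%s\n' "$1" | LC_ALL=C grep -oE '[a-zA-Z0-9_]+'
}

non_digit_runs 'abc123def'    # abc, def (one per line)
word_runs 'foo_bar-baz 42'    # foo_bar, baz, 42 (one per line)
```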

POSIX Classes

POSIX classes are portable across locales and tools. They must be used inside bracket expressions.

[[:alpha:]]   # any letter
[[:digit:]]   # any digit (same as [0-9])
[[:alnum:]]   # any letter or digit
[[:space:]]   # whitespace (space, tab, newline, etc.)
[[:upper:]]   # uppercase letter
[[:lower:]]   # lowercase letter
[[:punct:]]   # punctuation
[[:xdigit:]]  # hexadecimal digit [0-9a-fA-F]
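Two small helpers built on POSIX classes, as a sketch: one trims surrounding whitespace with sed -E, the other pulls hexadecimal literals out of a line. The helper names are hypothetical, and the combined ^...|...$ substitution assumes a sed that supports -E (GNU or BSD).

```shell
# Remove leading and trailing whitespace; inner spaces are kept.
trim() {
  printf '%s\n' "$1" | sed -E 's/^[[:space:]]+|[[:space:]]+$//g'
}

# Extract hexadecimal literals such as 0x1F.
hex_literals() {
  printf '%s\n' "$1" | grep -oE '0x[[:xdigit:]]+'
}

trim '   hi there  '                  # hi there
hex_literals 'a=0x1F, b=0xff, c=12'   # 0x1F, 0xff (one per line)
```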

PCRE Shorthand

PCRE provides convenient shorthand classes that are more concise than POSIX equivalents.

\d    # digit, equivalent to [0-9]
\D    # non-digit
\w    # word character [a-zA-Z0-9_]
\W    # non-word character
\s    # whitespace (space, tab, newline, carriage return)
\S    # non-whitespace

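\d and \D are PCRE-only. GNU grep and sed also accept \w, \W, \s, and \S in BRE and ERE as extensions, but POSIX does not define them and BSD/macOS tools may not support them. With grep -E, \d does not mean "digit": it is an undefined escape that GNU grep treats as a literal d (newer versions also print a "stray \" warning). Use [0-9] or [[:digit:]] instead.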
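The sketch below extracts the same numbers with the PCRE shorthand and with its POSIX spelling, which is the portable way to write \d outside PCRE. It assumes GNU grep for -P; the function names are illustrative.

```shell
# PCRE: \d is a digit.
digits_pcre() {
  printf '%s\n' "$1" | grep -oP '\d+'
}

# ERE: spell digits out with a POSIX class (or [0-9]) instead.
digits_ere() {
  printf '%s\n' "$1" | grep -oE '[[:digit:]]+'
}

digits_pcre 'order 42, item 7'   # 42, 7 (one per line)
digits_ere 'order 42, item 7'    # same output
```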

Quantifiers

Quantifiers control how many times the preceding element must appear.

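Quantifier   Meaning                 Example
*            0 or more               ab*c matches ac, abc, abbc
+            1 or more (ERE/PCRE)    ab+c matches abc, abbc, not ac
?            0 or 1 (ERE/PCRE)       colou?r matches color, colour
{n}          Exactly n               \d{4} matches 2026
{n,}         n or more               \d{2,} matches 10, 100, 1000
{n,m}        Between n and m         \d{1,3} matches 1, 12, 123

In BRE, write + and ? as the GNU escapes \+ and \?, and the interval forms as \{n\}, \{n,\}, and \{n,m\}. The \d examples are PCRE; in BRE or ERE, substitute [0-9].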
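A sketch of the interval syntax in both flavors, pulling out standalone four-digit numbers. It assumes GNU grep, since \b is a GNU extension outside PCRE; the helper names are hypothetical.

```shell
# ERE: braces work unescaped.
four_digit_words_ere() {
  printf '%s\n' "$1" | grep -oE '\b[0-9]{4}\b'
}

# BRE: the same interval must be written \{4\}.
four_digit_words_bre() {
  printf '%s\n' "$1" | grep -o '\b[0-9]\{4\}\b'
}

four_digit_words_ere 'in 1999 and 2026, not 123 or 12345'   # 1999, 2026 (one per line)
four_digit_words_bre 'in 1999 and 2026, not 123 or 12345'   # same output
```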

Greedy vs Non-Greedy (PCRE)

By default, quantifiers are greedy -- they match as much text as possible. Append ? to make them non-greedy (lazy), matching as little as possible.

# Greedy: matches the longest possible string
echo '<b>bold</b> and <i>italic</i>' | grep -oP '<.*>'
# <b>bold</b> and <i>italic</i>

# Non-greedy: matches the shortest possible string
echo '<b>bold</b> and <i>italic</i>' | grep -oP '<.*?>'
# <b>
# </b>
# <i>
# </i>

This distinction matters enormously when parsing HTML, XML, or any format with paired delimiters. Greedy matching is almost never what you want for extracting individual tags.
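Without PCRE, the usual way to get the same result is a negated bracket expression instead of a lazy quantifier. A sketch using grep -E (extract_tags is a hypothetical name):

```shell
# Match "<", then a run of characters that are not ">", then ">".
# Because [^>] can never step past a closing bracket, each match stops at the first ">".
extract_tags() {
  printf '%s\n' "$1" | grep -oE '<[^>]+>'
}

extract_tags '<b>bold</b> and <i>italic</i>'   # <b>, </b>, <i>, </i> (one per line)
```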

Anchors

Anchors match positions in the text, not characters. They are zero-width assertions.

^       # start of line
$       # end of line
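\b      # word boundary (PCRE; GNU extension in BRE/ERE)
\B      # non-word boundary (PCRE; GNU extension in BRE/ERE)
\A      # start of string (PCRE, different from ^ in multiline mode)
\Z      # end of string, or before a final newline (PCRE; \z is the absolute end)
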
# Lines that start with ERROR
grep -E '^ERROR' logfile.txt

# Lines that end with a semicolon
grep -E ';$' source.c

# The word "the" as a whole word, not part of "other"
grep -E '\bthe\b' document.txt

# Blank lines
grep -E '^$' file.txt

# Lines that are exactly "OK"
grep -E '^OK$' status.txt
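Anchors combine well with -v. As a sketch, the filter below drops blank lines and comment lines from a config-style file; treating # as the comment character is an assumption about the input format, and the function name is illustrative.

```shell
# Drop empty lines, whitespace-only lines, and lines whose first non-blank character is #.
# ^ and $ tie the pattern to the whole line; -v prints the lines that do NOT match.
strip_comments_and_blanks() {
  grep -vE '^[[:space:]]*(#|$)'
}

printf 'a=1\n# comment\n\n   # indented comment\nb=2\n' | strip_comments_and_blanks
# a=1
# b=2
```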

Groups and Backreferences

Parentheses create capture groups. Backreferences let you refer to captured text later in the same pattern or in a replacement string.

# Capture and backreference: find repeated words (GNU grep; \b, \w, \s, and \1 in ERE are GNU extensions)
grep -E '\b(\w+)\s+\1\b' document.txt
# Matches: "the the", "is is", etc.

# Sed: swap first and last name
echo "Doe, John" | sed -E 's/^(\w+), (\w+)$/\2 \1/'
# John Doe

# Non-capturing group (PCRE): group without capturing
grep -P '(?:https?|ftp)://\S+' links.txt

# Alternation within a group
grep -E '(cat|dog|bird)' animals.txt
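Backreferences also work on the pattern side of a sed substitution. A minimal sketch that collapses an immediately repeated word; it assumes GNU sed, because \b and \1 inside an ERE pattern are GNU extensions, and dedupe_words is a hypothetical name.

```shell
# Replace "word word" with a single "word".
# \b on both sides keeps "is isolated" from being treated as a repeat.
dedupe_words() {
  printf '%s\n' "$1" | sed -E 's/\b([[:alpha:]]+)[[:space:]]+\1\b/\1/g'
}

dedupe_words 'this is is a test test'   # this is a test
```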

Named Groups (PCRE)

Named groups make complex patterns more readable by giving captures descriptive names.

# Named capture group
echo "2026-04-16" | grep -oP '(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# Use named backreferences in replacement (Perl)
echo "2026-04-16" | perl -pe 's/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/$+{d}\/$+{m}\/$+{y}/'
# 16/04/2026
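Inside the pattern itself, PCRE refers back to a named group with \k<name>. A sketch that finds doubled words, assuming GNU grep with -P support (repeated_words is an illustrative name):

```shell
# (?<w>\w+) captures a word as "w"; \k<w> requires the same text again after whitespace.
repeated_words() {
  printf '%s\n' "$1" | grep -oP '\b(?<w>\w+)\s+\k<w>\b'
}

repeated_words 'see the the cat'   # the the
```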

Lookahead and Lookbehind (PCRE)

Lookaround assertions match a position based on what comes before or after it, without consuming characters. The text checked by the lookaround is therefore not part of the match; only the portion outside the assertion is returned.

# Positive lookahead: digits followed by "px"
echo "font-size: 16px" | grep -oP '\d+(?=px)'
# 16

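# Negative lookahead: numbers NOT followed by "px"
# (\d inside the lookahead stops \d+ from backtracking to a shorter run such as "5" in "50px")
echo "width: 100%; height: 50px" | grep -oP '\d+(?!\d|px)'
# 100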

# Positive lookbehind: value after "user="
echo "user=alice role=admin" | grep -oP '(?<=user=)\w+'
# alice

# Negative lookbehind: "cat" not preceded by "bob"
echo "bobcat wildcat" | grep -oP '(?<!bob)cat'
# cat (from "wildcat")

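Lookbehind has a restriction in most regex engines: the pattern inside must have a fixed length. In PCRE, each top-level alternative may have a different fixed length, so (?<=cat|horse) is allowed, but unbounded quantifiers such as * and + are rejected. PCRE2 10.43 and later accept bounded variable-length lookbehind, and some other engines (for example .NET and modern JavaScript) allow arbitrary variable-length lookbehind.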
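When the prefix you want to skip has a variable length, PCRE's \K is a common workaround: everything matched before it is still required, but it is dropped from the reported match. A sketch assuming GNU grep -P; value_after_user is a hypothetical helper.

```shell
# "user=" plus any amount of whitespace must be present, but \K discards it from the output.
# A lookbehind like (?<=user=\s*) would be rejected because \s* has no bounded length.
value_after_user() {
  printf '%s\n' "$1" | grep -oP 'user=\s*\K\w+'
}

value_after_user 'user=   alice role=admin'   # alice
```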

grep Examples

A comprehensive set of grep invocations for everyday work:

# Basic grep: find lines containing "error" (case-insensitive)
grep -i "error" /var/log/syslog

# Extended regex: match email-like patterns
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

# PCRE: match IPv4 addresses
grep -P '\b\d{1,3}(\.\d{1,3}){3}\b' access.log

# Show only the matched part
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log

# Count matches
grep -c "404" access.log

# Recursive search with file names and line numbers
grep -rn "TODO" src/

# Invert match: lines that do NOT contain "debug"
grep -v "debug" app.log

# Multiple patterns (OR)
grep -E "error|warning|critical" logfile.txt

# Show filenames only (useful for large searches)
grep -rl "deprecated" lib/

# Match whole words only
grep -w "test" results.txt
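Counting lines and counting occurrences give different answers. A sketch, assuming GNU grep (the \b in the last call is a GNU extension in ERE); count_occurrences is a hypothetical helper name.

```shell
# grep -o prints one line per match, so counting its output lines counts occurrences.
count_occurrences() {
  grep -oE -- "$1" | grep -c ''
}

text='the other theme
the end'

printf '%s\n' "$text" | grep -c 'the'                 # 2 (matching lines)
printf '%s\n' "$text" | count_occurrences 'the'       # 4 (every "the", including inside words)
printf '%s\n' "$text" | count_occurrences '\bthe\b'   # 2 (whole words only)
```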

Practical Patterns

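These patterns come up repeatedly in real-world log parsing, validation, and data extraction tasks. Patterns that contain \d, \s, or (?:...) are PCRE and need grep -P; the others also work with GNU grep -E.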

# IPv4 address
\b([0-9]{1,3}\.){3}[0-9]{1,3}\b

# Email address (simplified)
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# URL
https?://[^\s"'<>]+

# ISO date (YYYY-MM-DD)
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# Log timestamp (common format)
\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}

# Hex color code (#rgb or #rrggbb)
#([0-9a-fA-F]{3}){1,2}\b

# Semantic version
\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?

# MAC address
([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}

# US phone number
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
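These patterns find a substring; to validate a whole value, anchor them with ^ and $. The sketch below does that for the MAC pattern and for a stricter IPv4 variant that limits each octet to the range 0 to 255. It assumes GNU grep (or any grep with -E and -q), and the helper names are illustrative.

```shell
# ^ and $ force the entire value to match, not just part of it.
is_mac() {
  printf '%s\n' "$1" | grep -qE '^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
}

# Stricter IPv4: each octet is 250-255, 200-249, 100-199, or 0-99
# (the loose pattern above would also accept 999.999.999.999).
is_ipv4() {
  octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
  printf '%s\n' "$1" | grep -qE "^($octet\.){3}$octet\$"
}

is_mac 00:1a:2B:3c:4D:5e && echo valid   # valid
is_ipv4 256.1.1.1 || echo invalid        # invalid
```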

Next Steps

Regular expressions are the foundation of sed pattern matching and awk conditionals. Deepen your skills with the Sed and Awk page. For advanced scripting constructs that use regex in conditionals and parameter expansion, see Advanced Bash. Return to the Shell Scripting hub for the full topic list.