Sed and Awk: Stream Editing and Text Processing

Sed and Awk are two of the most important text-processing tools in the UNIX toolkit. Sed excels at line-by-line transformations -- find and replace, deletion, insertion -- while Awk is a full pattern-scanning language designed for structured, column-oriented data. Together they can handle virtually any text-manipulation task you encounter on the command line. Learning these tools pays off immediately because they appear in scripts, one-liners, and automation tasks everywhere.

Sed Fundamentals

Sed (stream editor) reads input line by line, applies editing commands, and writes the result to standard output. It does not modify the original file unless you tell it to.
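A quick way to see this non-destructive behavior (the /tmp path here is just for illustration):

```shell
# Create a throwaway file, transform it, then confirm it is unchanged
printf 'one\ntwo\n' > /tmp/sed_demo.txt

sed 's/one/ONE/' /tmp/sed_demo.txt
# ONE
# two

cat /tmp/sed_demo.txt
# one
# two
```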

Basic Substitution

The most common sed command is s (substitute).

# Replace first occurrence on each line
echo "hello world hello" | sed 's/hello/hi/'
# hi world hello

# Replace all occurrences on each line with the g flag
echo "hello world hello" | sed 's/hello/hi/g'
# hi world hi

# Case-insensitive substitution (GNU sed)
echo "Hello HELLO hello" | sed 's/hello/hi/gI'
# hi hi hi
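When the pattern or replacement contains slashes, escaping each one gets noisy. Any character after the s can serve as the delimiter:

```shell
# Use | (or any other character) as the delimiter to avoid escaping slashes
echo "/usr/local/bin" | sed 's|/usr/local|/opt|'
# /opt/bin
```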

Addresses: Targeting Specific Lines

You can restrict which lines a command acts on by specifying addresses. Addresses can be line numbers, ranges, or regular expression patterns.

# Substitute only on line 3
sed '3s/old/new/' file.txt

# Substitute on lines 2 through 5
sed '2,5s/old/new/g' file.txt

# Substitute on lines matching a pattern
sed '/^ERROR/s/foo/bar/g' file.txt

# Substitute from a pattern to the end of the file
sed '/START/,$s/old/new/g' file.txt

# Negate an address: substitute on all lines EXCEPT those matching
sed '/^#/!s/old/new/g' file.txt
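GNU sed also accepts step addresses of the form first~step, which match every step-th line starting at first. This is a GNU extension and will not work with BSD sed:

```shell
# GNU sed extension: 0~2 matches every even-numbered line
printf 'a\nb\nc\nd\n' | sed -n '0~2p'
# b
# d
```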

In-Place Editing

The -i flag modifies the file directly. On macOS (BSD sed), you must provide a backup suffix; use -i '' for no backup. GNU sed accepts -i with no argument.

# GNU sed: edit in place, no backup
sed -i 's/old/new/g' file.txt

# BSD sed (macOS): edit in place, no backup
sed -i '' 's/old/new/g' file.txt

# Create a backup before editing
sed -i.bak 's/old/new/g' file.txt
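When a script must run under both variants, a portable alternative is to skip -i entirely and write through a temporary file (the /tmp demo file here is illustrative):

```shell
# Portable in-place edit: write to a temp file, then replace the original
printf 'old value\n' > /tmp/portable.txt
tmp=$(mktemp)
sed 's/old/new/g' /tmp/portable.txt > "$tmp" && mv "$tmp" /tmp/portable.txt

cat /tmp/portable.txt
# new value
```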

Delete and Print

# Delete lines 1 through 5
sed '1,5d' file.txt

# Delete blank lines
sed '/^$/d' file.txt

# Print only lines matching a pattern (use -n to suppress default output)
sed -n '/ERROR/p' file.txt

# Print lines 10 to 20
sed -n '10,20p' file.txt

# Delete lines matching a pattern
sed '/^DEBUG/d' logfile.txt

Insert, Append, and Change

# Insert a line before line 3
sed '3i\This line was inserted' file.txt

# Append a line after every line matching a pattern
sed '/^SECTION/a\--- end of section ---' file.txt

# Replace (change) an entire line
sed '/^deprecated/c\# This line has been removed' config.txt

Note that the one-line i\, a\, and c\ syntax shown here is a GNU sed convenience; POSIX and BSD sed expect the inserted text to start on the line after the backslash.

Hold Space and Pattern Space

Sed has two buffers: the pattern space (current line being processed) and the hold space (auxiliary buffer). Commands like h (copy pattern to hold), g (copy hold to pattern), H/G (append), and x (exchange) enable multi-line operations.

# Reverse line order (classic sed trick)
sed -n '1!G;h;$p' file.txt

# Join every two lines with a comma
sed 'N;s/\n/,/' file.txt

The hold space is powerful but can be difficult to reason about. For complex multi-line processing, Awk is usually the better and more readable choice.
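For comparison, here is the line-reversal trick rewritten in awk. Buffering lines in an array and printing them backwards in END is arguably easier to follow than the hold-space version:

```shell
# Reverse line order in awk: buffer every line, emit them in reverse at END
printf '1\n2\n3\n' | awk '{line[NR]=$0} END {for (i=NR; i>=1; i--) print line[i]}'
# 3
# 2
# 1
```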

Awk Fundamentals

Awk reads input records (by default, lines) and splits each record into fields (by default, whitespace-separated). Fields are accessible as $1, $2, etc. $0 is the entire record.

Basic Field Extraction

# Print the first and third columns
echo "Alice 30 Paris" | awk '{print $1, $3}'
# Alice Paris

# Print the last field
echo "one two three four" | awk '{print $NF}'
# four

# Built-in variables: NR = record number, NF = number of fields
echo -e "a b c\nd e" | awk '{print NR, NF, $0}'
# 1 3 a b c
# 2 2 d e

Custom Field Separator

Use -F to set the field separator, or set FS in a BEGIN block.

# Parse /etc/passwd (colon-separated)
awk -F: '{print $1, $3}' /etc/passwd

# Multiple character separator
echo "one::two::three" | awk -F'::' '{print $2}'
# two

# Tab-separated input
awk -F'\t' '{print $1, $3}' data.tsv
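Setting FS in a BEGIN block is equivalent to -F; because BEGIN runs before the first record is read, the separator applies to all input:

```shell
# Equivalent to awk -F: -- set FS before any input is read
echo "one:two:three" | awk 'BEGIN {FS=":"} {print $2}'
# two
```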

Pattern-Action Rules

Awk programs consist of pattern { action } pairs. If the pattern matches, the action runs. Omitting the pattern means the action runs for every record.

# Print lines where the third field is greater than 100
awk '$3 > 100 {print $0}' data.txt

# Print lines matching a regex
awk '/ERROR/ {print NR": "$0}' logfile.txt

# Combine patterns with logical operators
awk '$3 > 50 && $4 == "ACTIVE" {print $1}' report.txt

# Range pattern: from START to END
awk '/START/,/END/ {print}' file.txt
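One detail worth knowing: omitting the action is also legal and defaults to printing the whole record, so a bare pattern works as a filter:

```shell
# A pattern with no action defaults to {print $0}
printf 'a 1 200\nb 2 50\n' | awk '$3 > 100'
# a 1 200
```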

BEGIN and END Blocks

BEGIN runs before any input is read; END runs after all input has been processed. They are invaluable for initialization and summary output.

# Sum values in column 2
awk 'BEGIN {total=0} {total += $2} END {print "Total:", total}' data.txt

# CSV header and footer
awk 'BEGIN {print "Name,Score"} {print $1","$2} END {print "---done---"}' scores.txt

# Count records
awk 'END {print NR, "records processed"}' data.txt

Printf for Formatted Output

# Left-align names in a 20-character column, right-align prices to two decimals
awk '{printf "%-20s %8.2f\n", $1, $2}' prices.txt

# Pad with zeros
awk '{printf "%05d %s\n", NR, $0}' file.txt

Awk Variables and Assignment

# Set output field separator
awk 'BEGIN {OFS=","} {print $1, $3, $5}' data.txt

# Set output record separator
awk 'BEGIN {ORS="\n---\n"} {print $0}' file.txt
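A common surprise: OFS separates the arguments of a comma-separated print, but it is only applied to $0 itself when the record is rebuilt, which happens when any field is assigned. The no-op assignment $1=$1 forces the rebuild:

```shell
# Without touching a field, $0 is printed verbatim
echo "a b c" | awk 'BEGIN {OFS=","} {print}'
# a b c

# Assigning any field rebuilds $0 using OFS
echo "a b c" | awk 'BEGIN {OFS=","} {$1=$1; print}'
# a,b,c
```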

Practical One-Liners

These one-liners come up frequently in real-world work. Bookmark this section for quick reference.

# Remove trailing whitespace from every line
sed 's/[[:space:]]*$//' file.txt

# Extract unique IP addresses from an access log
awk '{print $1}' access.log | sort -u

# Sum a column of numbers (column 5)
awk '{sum += $5} END {print sum}' data.txt

# Replace a string only in lines between two markers
sed '/START/,/END/s/old/new/g' config.txt

# Print lines longer than 80 characters
awk 'length > 80' file.txt

# Swap the first two fields
awk '{tmp=$1; $1=$2; $2=tmp; print}' file.txt

# Number non-blank lines
awk 'NF {printf "%4d %s\n", ++n, $0; next} {print}' file.txt

# Extract the value from key=value lines
sed -n 's/^database_host=//p' config.ini

# Count occurrences of each HTTP status code
awk '{count[$9]++} END {for (c in count) print c, count[c]}' access.log | sort -rn -k2

# Remove duplicate adjacent lines (like uniq, but with awk)
awk 'prev != $0; {prev=$0}' file.txt

# Calculate average of a column
awk '{sum+=$1; n++} END {print "Average:", sum/n}' numbers.txt

# Print only lines between line 20 and line 30
sed -n '20,30p' file.txt

When to Use Sed vs Awk

Task                                      Preferred Tool
----                                      --------------
Simple find-and-replace                   sed
Delete or print lines by pattern/range    sed
Column-based extraction or computation    awk
Conditional logic per field               awk
Multi-line transformations                awk (usually clearer than sed hold space)
Quick in-place edits                      sed -i
Aggregation and counting                  awk

In practice you will often chain them together in a pipeline, e.g. sed '...' data.txt | awk '...'. Each tool handles the step it does best.

Advanced Sed: Multiple Commands

You can supply multiple sed commands with -e or by separating them with semicolons.

sed -e 's/foo/bar/g' -e '/^#/d' -e '1i\# Generated file' config.txt

Or place them in a sed script file and use -f:

# commands.sed contains:
s/foo/bar/g
/^#/d
1i\# Generated file

sed -f commands.sed config.txt

Advanced Awk: Arrays and Functions

Awk supports associative arrays and user-defined functions, making it surprisingly capable for a "one-liner" language.

# Word frequency counter
awk '{
    for (i = 1; i <= NF; i++)
        freq[tolower($i)]++
}
END {
    for (word in freq)
        printf "%4d %s\n", freq[word], word
}' book.txt | sort -rn | head -20

# User-defined function
awk '
function max(a, b) { return a > b ? a : b }
{ biggest = max(biggest, $1) }
END { print "Max:", biggest }
' numbers.txt

# Multiple array dimensions (simulated with SUBSEP)
awk '{
    data[$1,$2] = $3
}
END {
    for (key in data)
        print key, data[key]
}' matrix.txt
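The combined key can be unpacked again with split() on SUBSEP:

```shell
# Recover the individual indices from a simulated 2-D key
printf 'r1 c1 5\n' | awk '
{ data[$1,$2] = $3 }
END {
    for (key in data) {
        split(key, idx, SUBSEP)   # idx[1]="r1", idx[2]="c1"
        print idx[1], idx[2], data[key]
    }
}'
# r1 c1 5
```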

Next Steps

Sed and Awk pair naturally with regular expressions. For a deep dive into regex syntax, see the Regular Expressions Guide. For productivity tips that complement your text-processing skills, visit CLI Productivity. Return to the Shell Scripting hub for the complete topic list.