Sed and Awk: Stream Editing and Text Processing
Sed and Awk are two of the most important text-processing tools in the UNIX toolkit. Sed excels at line-by-line transformations -- find and replace, deletion, insertion -- while Awk is a full pattern-scanning language designed for structured, column-oriented data. Together they can handle virtually any text-manipulation task you encounter on the command line. Learning these tools pays off immediately because they appear in scripts, one-liners, and automation tasks everywhere.
Sed Fundamentals
Sed (stream editor) reads input line by line, applies editing commands, and writes the result to standard output. It does not modify the original file unless you tell it to.
Basic Substitution
The most common sed command is s (substitute).
# Replace first occurrence on each line
echo "hello world hello" | sed 's/hello/hi/'
# hi world hello
# Replace all occurrences on each line with the g flag
echo "hello world hello" | sed 's/hello/hi/g'
# hi world hi
# Case-insensitive substitution (GNU sed)
echo "Hello HELLO hello" | sed 's/hello/hi/gI'
# hi hi hi
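The delimiter in an s command does not have to be a slash: sed accepts almost any character, which saves you from escaping slashes when substituting in file paths.

```shell
# Use an alternate delimiter ('|' here) to avoid escaping slashes in paths
echo "/usr/local/bin" | sed 's|/usr/local|/opt|'
# /opt/bin
```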
Addresses: Targeting Specific Lines
You can restrict which lines a command acts on by specifying addresses. Addresses can be line numbers, ranges, or regular expression patterns.
# Substitute only on line 3
sed '3s/old/new/' file.txt
# Substitute on lines 2 through 5
sed '2,5s/old/new/g' file.txt
# Substitute on lines matching a pattern
sed '/^ERROR/s/foo/bar/g' file.txt
# Substitute from a pattern to the end of the file
sed '/START/,$s/old/new/g' file.txt
# Negate an address: substitute on all lines EXCEPT those matching
sed '/^#/!s/old/new/g' file.txt
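An address can also govern a whole group of commands wrapped in braces. The semicolon-separated form below works in GNU sed; BSD sed prefers each command on its own line or in a separate -e expression.

```shell
# Apply two substitutions only to lines starting with ERROR (GNU sed syntax)
printf 'ERROR foo baz\nINFO foo\n' | sed '/^ERROR/{s/foo/bar/; s/baz/qux/}'
# ERROR bar qux
# INFO foo
```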
In-Place Editing
The -i flag modifies the file directly. On macOS (BSD sed), you must provide a backup suffix; use -i '' for no backup. GNU sed accepts -i with no argument.
# GNU sed: edit in place, no backup
sed -i 's/old/new/g' file.txt
# BSD sed (macOS): edit in place, no backup
sed -i '' 's/old/new/g' file.txt
# Create a backup before editing
sed -i.bak 's/old/new/g' file.txt
Delete and Print
# Delete lines 1 through 5
sed '1,5d' file.txt
# Delete blank lines
sed '/^$/d' file.txt
# Print only lines matching a pattern (use -n to suppress default output)
sed -n '/ERROR/p' file.txt
# Print lines 10 to 20
sed -n '10,20p' file.txt
# Delete lines matching a pattern
sed '/^DEBUG/d' logfile.txt
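When you only need the top of a file, the q (quit) command stops sed as soon as it has printed the addressed line, which is cheaper than scanning the whole input with -n '1,3p' on very large files.

```shell
# Print the first 3 lines, then quit (an efficient head replacement)
printf 'a\nb\nc\nd\ne\n' | sed '3q'
# a
# b
# c
```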
Insert, Append, and Change
# Insert a line before line 3
sed '3i\This line was inserted' file.txt
# Append a line after every line matching a pattern
sed '/^SECTION/a\--- end of section ---' file.txt
# Replace (change) an entire line
sed '/^deprecated/c\# This line has been removed' config.txt
Note: the one-line forms above are GNU sed extensions. BSD sed (macOS) requires a literal newline after i\, a\, and c\ before the text.
Hold Space and Pattern Space
Sed has two buffers: the pattern space (current line being processed) and the hold space (auxiliary buffer). Commands like h (copy pattern to hold), g (copy hold to pattern), H/G (append), and x (exchange) enable multi-line operations.
# Reverse line order (classic sed trick)
sed -n '1!G;h;$p' file.txt
# Join every two lines with a comma
sed 'N;s/\n/,/' file.txt
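As a small worked example of h and x together, the following sketch (GNU sed syntax) prints the line immediately before each match: every line is saved to the hold space with h, and on a match the buffers are swapped, printed, and swapped back. If the match is the very first line, an empty line is printed instead.

```shell
# Print the line before each line matching ERROR (GNU sed)
printf 'one\ntwo\nERROR boom\nfour\n' | sed -n '/ERROR/{x;p;x};h'
# two
```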
The hold space is powerful but can be difficult to reason about. For complex multi-line processing, Awk is usually the better and more readable choice.
Awk Fundamentals
Awk reads input records (by default, lines) and splits each record into fields (by default, whitespace-separated). Fields are accessible as $1, $2, etc. $0 is the entire record.
Basic Field Extraction
# Print the first and third columns
echo "Alice 30 Paris" | awk '{print $1, $3}'
# Alice Paris
# Print the last field
echo "one two three four" | awk '{print $NF}'
# four
# Built-in variables: NR = record number, NF = number of fields
echo -e "a b c\nd e" | awk '{print NR, NF, $0}'
# 1 3 a b c
# 2 2 d e
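Field numbers are ordinary expressions, so you can compute them: $(NF-1) is the second-to-last field, a handy companion to $NF.

```shell
# Computed field number: second-to-last field
echo "one two three four" | awk '{print $(NF-1)}'
# three
```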
Custom Field Separator
Use -F to set the field separator, or set FS in a BEGIN block.
# Parse /etc/passwd (colon-separated)
awk -F: '{print $1, $3}' /etc/passwd
# Multiple character separator
echo "one::two::three" | awk -F'::' '{print $2}'
# two
# Tab-separated input
awk -F'\t' '{print $1, $3}' data.tsv
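Setting FS in a BEGIN block, as mentioned above, is equivalent to -F and keeps the separator inside the program itself:

```shell
# Same as -F'::' -- assign FS before any input is read
echo "one::two::three" | awk 'BEGIN {FS="::"} {print $2}'
# two
```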
Pattern-Action Rules
Awk programs consist of pattern { action } pairs. If the pattern matches, the action runs. Omitting the pattern means the action runs for every record.
# Print lines where the third field is greater than 100
awk '$3 > 100 {print $0}' data.txt
# Print lines matching a regex
awk '/ERROR/ {print NR": "$0}' logfile.txt
# Combine patterns with logical operators
awk '$3 > 50 && $4 == "ACTIVE" {print $1}' report.txt
# Range pattern: from START to END
awk '/START/,/END/ {print}' file.txt
BEGIN and END Blocks
BEGIN runs before any input is read; END runs after all input has been processed. They are invaluable for initialization and summary output.
# Sum values in column 2
awk 'BEGIN {total=0} {total += $2} END {print "Total:", total}' data.txt
# CSV header and footer
awk 'BEGIN {print "Name,Score"} {print $1","$2} END {print "---done---"}' scores.txt
# Count records
awk 'END {print NR, "records processed"}' data.txt
Printf for Formatted Output
# Left-align names in a 20-char column, right-align prices with 2 decimals
awk '{printf "%-20s %8.2f\n", $1, $2}' prices.txt
# Pad with zeros
awk '{printf "%05d %s\n", NR, $0}' file.txt
Awk Variables and Assignment
# Set output field separator
awk 'BEGIN {OFS=","} {print $1, $3, $5}' data.txt
# Set output record separator
awk 'BEGIN {ORS="\n---\n"} {print $0}' file.txt
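A common gotcha: OFS only takes effect when awk rebuilds the record, which happens when a field is assigned. Printing $0 untouched keeps the original separators; the idiomatic $1=$1 forces the rebuild.

```shell
# OFS is ignored until a field assignment rebuilds $0
echo "a b c" | awk 'BEGIN {OFS=","} {print $0}'
# a b c
echo "a b c" | awk 'BEGIN {OFS=","} {$1=$1; print}'
# a,b,c
```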
Practical One-Liners
These one-liners come up frequently in real-world work. Bookmark this section for quick reference.
# Remove trailing whitespace from every line
sed 's/[[:space:]]*$//' file.txt
# Extract unique IP addresses from an access log
awk '{print $1}' access.log | sort -u
# Sum a column of numbers (column 5)
awk '{sum += $5} END {print sum}' data.txt
# Replace a string only in lines between two markers
sed '/START/,/END/s/old/new/g' config.txt
# Print lines longer than 80 characters
awk 'length > 80' file.txt
# Swap the first two fields
awk '{tmp=$1; $1=$2; $2=tmp; print}' file.txt
# Number non-blank lines
awk 'NF {printf "%4d %s\n", ++n, $0; next} {print}' file.txt
# Extract the value from key=value lines
sed -n 's/^database_host=//p' config.ini
# Count occurrences of each HTTP status code
awk '{count[$9]++} END {for (c in count) print c, count[c]}' access.log | sort -rn -k2
# Remove duplicate adjacent lines (like uniq, but with awk)
awk 'prev != $0; {prev=$0}' file.txt
# Calculate average of a column
awk '{sum+=$1; n++} END {print "Average:", sum/n}' numbers.txt
# Print only lines between line 20 and line 30
sed -n '20,30p' file.txt
When to Use Sed vs Awk
| Task | Preferred Tool |
|---|---|
| Simple find-and-replace | sed |
| Delete or print lines by pattern/range | sed |
| Column-based extraction or computation | awk |
| Conditional logic per field | awk |
| Multi-line transformations | awk (usually clearer than sed hold space) |
| Quick in-place edits | sed -i |
| Aggregation and counting | awk |
In practice you will often chain them together in a pipeline, e.g. sed '...' data.txt | awk '...'. Each tool handles the step it does best.
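A toy pipeline illustrating the chain (the input here is invented for the example): sed strips the lines awk should never see, then awk does the arithmetic.

```shell
# sed removes comment lines, awk sums the second column of what remains
printf '# header\nwidget 3\ngadget 5\n' \
  | sed '/^#/d' \
  | awk '{sum += $2} END {print sum}'
# 8
```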
Advanced Sed: Multiple Commands
You can supply multiple sed commands with -e or by separating them with semicolons.
sed -e 's/foo/bar/g' -e '/^#/d' -e '1i\# Generated file' config.txt
Or place them in a sed script file and use -f:
# commands.sed
s/foo/bar/g
/^#/d
1i\# Generated file
sed -f commands.sed config.txt
Advanced Awk: Arrays and Functions
Awk supports associative arrays and user-defined functions, making it surprisingly capable for a "one-liner" language.
# Word frequency counter
awk '{
for (i = 1; i <= NF; i++)
freq[tolower($i)]++
}
END {
for (word in freq)
printf "%4d %s\n", freq[word], word
}' book.txt | sort -rn | head -20
# User-defined function
awk '
function max(a, b) { return a > b ? a : b }
{ biggest = max(biggest, $1) }
END { print "Max:", biggest }
' numbers.txt
# Multiple array dimensions (simulated with SUBSEP)
awk '{
data[$1,$2] = $3
}
END {
for (key in data)
print key, data[key]
}' matrix.txt
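The joined key from a simulated 2-D array can be taken apart again with split() on SUBSEP. A small sketch (piped through sort, since for-in iteration order is unspecified):

```shell
# Recover the two key components of a SUBSEP-joined array key
printf 'r1 c1 10\nr1 c2 20\n' | awk '
{ data[$1,$2] = $3 }
END {
    for (key in data) {
        split(key, parts, SUBSEP)
        print parts[1], parts[2], data[key]
    }
}' | sort
# r1 c1 10
# r1 c2 20
```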
Next Steps
Sed and Awk pair naturally with regular expressions. For a deep dive into regex syntax, see the Regular Expressions Guide. For productivity tips that complement your text-processing skills, visit CLI Productivity. Return to the Shell Scripting hub for the complete topic list.