Perl makes a pretty nice replacement for `grep`, `sed`, and `awk`.
Say you’re looking in your web server’s logs for failed requests for a given URI. The venerable Apache common and combined log formats are space delimited. To maintain your sanity, you configured your server to log tab delimited request records instead, and you added a few fields to tell you what upstream servers did with the request.
172.16.16.128 - - [20/Jul/2013:13:02:25 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 200 0.065 "-"
172.16.16.128 - - [20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 500 0.065 "-"
172.16.16.128 - - [20/Jul/2013:13:02:27 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 200 0.065 "-"
You can number the fields using `head`, `tr`, and `nl`.
$ <~/tmp/input.txt head -n 1 | tr '\t' '\n' | nl -ba
1 172.16.16.128
2 -
3 -
4 [20/Jul/2013:13:02:25 -0500]
5 "GET /uri/that/fails/intermittently HTTP/1.1"
6 200
7 1234
8 "-"
9 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
10 www.example.com
11 66.249.73.104
12 0.065
13 172.16.17.132:3032
14 200
15 0.065
16 "-"
You’re looking for `500` in field `6`. `nl` counts lines starting with `1`.
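The first tool you pull out is `grep`. Give it a pattern that matches `500` in the field after the fifth tab character. POSIX extended regular expressions have no `\t` escape, so the pattern relies on the shell’s `$'...'` quoting (bash, zsh, ksh) to turn each `\t` into a literal tab before `grep` sees it.
$ <~/tmp/input.txt grep -E $'^([^\t]*\t){5}500\t'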
172.16.16.128 - - [20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 500 0.065 "-"
Not bad, but it took you a while to get that pattern syntax right.
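Part of what makes the pattern fiddly is that tools disagree about whether `\t` means a tab. One way around that, sketched here assuming only a POSIX shell, is to put a real tab in a variable and interpolate it into the pattern. The sample records are shortened and made up for illustration, and the names `sample`, `tab`, and `matches` are hypothetical.

```shell
# Build a small tab-delimited sample (hypothetical, shortened records).
sample=$(mktemp)
printf '10.0.0.1\t-\t-\t[t1]\t"GET /a HTTP/1.1"\t200\t1234\n'   > "$sample"
printf '10.0.0.1\t-\t-\t[t2]\t"GET /a HTTP/1.1"\t500\t12345\n' >> "$sample"

# printf emits a real tab, so grep never has to interpret "\t" itself.
tab=$(printf '\t')

# Five tab-terminated fields, then 500 as the whole sixth field.
matches=$(grep -E "^([^${tab}]*${tab}){5}500${tab}" "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the second record should come back, since the first has `200` in field 6.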
On its own, this is kind of a contrived job for `sed`, but you could use it.
$ <~/tmp/input.txt sed -r -n -e '/^([^\t]*\t){5}500\t/p'
172.16.16.128 - - [20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 500 0.065 "-"
So `sed` probably isn’t the best tool for this job, but it does have the
advantage of editing the line, if that’s what you’re looking to do.
$ <~/tmp/input.txt sed -r -e 's/^(([^\t]*\t){5})([^\t]*)(\t.*)/The status code was \3./'
The status code was 200.
The status code was 500.
The status code was 200.
Back to the original problem. It’s right in `awk`’s wheelhouse. `awk`
splits fields on a given delimiter for you, and lets you match the text
in only the field of interest. `awk` counts fields starting with `1`.
$ <~/tmp/input.txt awk -F '\t' '$6 == 500 { print $0 }'
172.16.16.128 - - [20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 500 0.065 "-"
If you’re printing individual fields instead of the whole line, you want to specify the output delimiter in addition to the input delimiter.
This produces space delimited output:
$ <~/tmp/input.txt awk -F '\t' '$6 == 500 { print $4,$5,$6 }'
[20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500
This produces tab delimited output:
$ <~/tmp/input.txt awk 'BEGIN { OFS=FS="\t"} $6 == 500 { print $4,$5,$6 }'
[20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500
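`OFS` can also be set from the command line with `-v`, which keeps the program text free of a `BEGIN` block. This is a minimal sketch of that variant, not a replacement for the command above. The sample records are shortened and made up, and the names `sample` and `fields` are hypothetical.

```shell
# Build a small tab-delimited sample (hypothetical, shortened records).
sample=$(mktemp)
printf '10.0.0.1\t-\t-\t[t1]\t"GET /a HTTP/1.1"\t200\t1234\n'   > "$sample"
printf '10.0.0.1\t-\t-\t[t2]\t"GET /a HTTP/1.1"\t500\t12345\n' >> "$sample"

# -F sets the input separator; -v OFS=... sets the output separator
# before any input is read, so print joins its arguments with tabs.
fields=$(awk -F '\t' -v OFS='\t' '$6 == 500 { print $4, $5, $6 }' "$sample")
printf '%s\n' "$fields"

rm -f "$sample"
```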
At some point, you decide that remembering the details of each program’s
command-line options and the various regular expression flavors is a
hassle, and you just do it all with `perl`.
You can match lines with `500` in field `6`. Perl counts array indices
starting with `0`, so field `6` is array index `5`.
$ <~/tmp/input.txt perl -F'\t' -ane 'if ($F[5] == 500) { print join("\t", @F); }'
172.16.16.128 - - [20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500 12345 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" www.example.com 66.249.73.104 0.065 172.16.17.132:3032 500 0.065 "-"
You can edit the line:
$ <~/tmp/input.txt perl -pe 's/^(([^\t]*\t){5})([^\t]*)(\t.*)/The status code was $3./'
The status code was 200.
The status code was 500.
The status code was 200.
And you can print individual fields.
$ <~/tmp/input.txt perl -F'\t' -ane 'if ($F[5] == 500) { print join("\t", $F[3], $F[4], $F[5]), "\n"; }'
[20/Jul/2013:13:02:26 -0500] "GET /uri/that/fails/intermittently HTTP/1.1" 500
The `perl` commands are a little more verbose than the `grep`, `sed`, and
`awk` equivalents, but `perl` can solve all the above problems alone.
Perl also has unique advantages over each of the other tools. It has
Perl regular expressions, which provide features like lookaround that
you can’t get in the basic or extended regular expressions of `grep`,
`sed`, and `awk`. And Perl is a full-featured procedural programming
language, making it easier to extend these solutions to more complex problems.