There is a great post on a blog called Gregable, titled “Why you should know just a little awk”. The comments are also worth reading. I discovered the link through another post, “A little awk”, on John Cook’s blog, The Endeavour.
I use a wide variety of Unix text processing tools on a regular basis, but over time, like many others, I started migrating the tasks that required the power of ‘awk’ over to another language; in my case that language was Python. Typically I can write scripts faster in Python and I find the code more readable. However, after reading the above post I was reminded that there are some one-liner ‘awk’ tasks that are really clean and effective. Lately I have started using ‘awk’ again, sparingly. Here’s why…
When to use ‘awk’ instead of ‘cut’
- Cut’s delimiter is a single character; awk’s delimiter can be a regular expression.
- Awk allows fields to be specified relative to the last field position using ‘NF’.
- Cut always displays fields in order of ascending field number, regardless of the order in which they are given in the field list parameter; awk can print the fields in any order that you specify.
Examples:
splits fields at any one of the characters a, b, c or d
awk -F'[abcd]'
splits fields at one or more spaces
awk -F' +'
re-orders fields
awk '{print $3 "\t" $2 "\t" $1}'
prints last field
awk '{print $NF}'
prints next to last field
awk '{print $(NF-1)}'
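The fragments above can be tried against a line of input. The following is a minimal sketch; the sample records (a:b:c, one1two22three and so on) are made up for illustration and do not come from the original post.

```shell
# cut ignores the order in the field list and prints ascending field numbers.
printf 'a:b:c\n' | cut -d: -f3,1
# a:c

# awk prints the fields in whatever order you ask for.
printf 'a:b:c\n' | awk -F: '{print $3 ":" $1}'
# c:a

# A regular expression as the delimiter: split at any run of digits.
printf 'one1two22three\n' | awk -F'[0-9]+' '{print $2}'
# two

# Split at one or more spaces.
printf 'alpha   beta  gamma\n' | awk -F' +' '{print $2}'
# beta

# Fields relative to the end of the line: last, then next to last.
printf 'x y z\n' | awk '{print $NF, $(NF-1)}'
# z y
```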
When to use ‘awk’ instead of Python, Perl, etc.?
- When you can write the task in one simple, readable line with awk, e.g.
  - Simple reformatting of data.
  - Simple comparisons on fields.
  - Rearranging the order of fields.
  - Splitting on regular expressions, including multiple characters.
  - Feel free to comment on other reasons.
- When Python, Perl, etc. scripts are too slow for repeated use; this is rare when they are coded properly.
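As a rough illustration of the kind of task that fits in one readable line, here is a sketch of a field comparison and a simple reformat. The "name score" input is hypothetical and only there to make the commands runnable.

```shell
# Hypothetical input: one "name score" pair per line.
data='ann 72
bob 91
cy 85'

# Simple comparison on a field: keep lines whose second field exceeds 80.
# A pattern with no action prints the matching line unchanged.
printf '%s\n' "$data" | awk '$2 > 80'
# bob 91
# cy 85

# Simple reformatting: swap the two columns and join them with a comma.
printf '%s\n' "$data" | awk '{print $2 "," $1}'
# 72,ann
# 91,bob
# 85,cy
```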
Watch your quotes with ‘awk’ …
Here is the standard Unix method of quoting:
$ awk '$NF > 385 && $(NF-1) ~ "^Sh" {print NR "\t" $0}' orders.txt
4 10416 2005-05-10 00:00:00 Shipped 386
6 10418 2005-05-16 00:00:00 Shipped 412
Here is the equivalent command using unxutils for Windows:
Note the difference in quoting…
C:\> gawk "$NF>385 && $(NF-1) ~ \"^Sh\" {print NR \"\t\" $0}" orders.txt
4 10416 2005-05-10 00:00:00 Shipped 386
6 10418 2005-05-16 00:00:00 Shipped 412
Here is what the above command is doing:
- Iterates through every line in the file “orders.txt”.
- Splits each line into fields at runs of whitespace (spaces or tabs), which is awk’s default delimiter.
- Tests if the last field is greater than 385.
- Tests if the next to last field matches the regular expression “^Sh”, i.e. begins with the letters “Sh”.
- If items 3 & 4 are both true, prints the line number followed by a tab and then the line text itself.
Sample text being processed:
$ cat orders.txt
10413 2005-05-05 00:00:00 Shipped 175
10414 2005-05-06 00:00:00 On Hold 362
10415 2005-05-09 00:00:00 Disputed 471
10416 2005-05-10 00:00:00 Shipped 386
10417 2005-05-13 00:00:00 Disputed 141
10418 2005-05-16 00:00:00 Shipped 412
10419 2005-05-17 00:00:00 Shipped 382
10420 2005-05-29 00:00:00 In Process 282
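One way to sidestep the shell quoting differences altogether is to keep the awk program in its own file and run it with awk's -f option, so no shell ever has to parse the quotes. The sketch below recreates the sample data in a temporary directory. The file name shipped.awk is hypothetical, and the sample is written here with single spaces, which awk's default splitting treats the same way as tabs.

```shell
# Work in a scratch directory so nothing in the current one is overwritten.
workdir=$(mktemp -d)

# Recreate the sample data shown above.
cat > "$workdir/orders.txt" <<'EOF'
10413 2005-05-05 00:00:00 Shipped 175
10414 2005-05-06 00:00:00 On Hold 362
10415 2005-05-09 00:00:00 Disputed 471
10416 2005-05-10 00:00:00 Shipped 386
10417 2005-05-13 00:00:00 Disputed 141
10418 2005-05-16 00:00:00 Shipped 412
10419 2005-05-17 00:00:00 Shipped 382
10420 2005-05-29 00:00:00 In Process 282
EOF

# The same filter as the one-liner, stored in a file (hypothetical name).
cat > "$workdir/shipped.awk" <<'EOF'
# Last field above 385 and next-to-last field starting with "Sh":
# print the line number, a tab, and the whole line.
$NF > 385 && $(NF-1) ~ "^Sh" { print NR "\t" $0 }
EOF

# No shell quoting of the program is needed when it is read with -f.
awk -f "$workdir/shipped.awk" "$workdir/orders.txt"
```

On this sample it should select the orders 10416 and 10418, which are lines 4 and 6 of the file. The same shipped.awk file can be passed unchanged to gawk on Windows.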