- A utility and a programming language
- Written by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan
- The Swiss Army knife of text file processing; it can replace sed, grep, ed, …
- A programming language with a syntax very similar to C
- Suitable for both light and heavy tasks, and used in real programs
- Able to work on the output of other commands (via a pipe)
- Using gawk on Linux
  → Some improvements and optimizations compared to the original Awk
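To make the "can replace sed, grep" claim concrete, here is a minimal sketch of awk doing a grep-like selection and a sed-like substitution. The sample text and the variable names (sample, grep_like, sed_like) are made up for this illustration.

```shell
# Hypothetical sample input, used only for this illustration.
sample='hello world
goodbye world
hello again'

# grep-like: a pattern with no action prints every matching line.
# (Roughly what grep "^hello" would do.)
grep_like=$(printf '%s\n' "$sample" | awk '/^hello/')

# sed-like: gsub() replaces every "world" with "awk", then the line is printed.
# (Roughly what sed "s/world/awk/g" would do.)
sed_like=$(printf '%s\n' "$sample" | awk '{ gsub(/world/, "awk"); print }')

printf '%s\n' "$grep_like"
printf '%s\n' "$sed_like"
```

The first command keeps only the two lines starting with hello; the second prints all three lines with world replaced.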
When to use awk?
… The awk language is very useful for producing reports from large amounts of raw data, such as summarizing information from the output of other utility programs like ls. …
Programs written with awk are usually much smaller than they would be in other languages. This makes awk programs easy to compose and use. Often, awk programs can be quickly composed at your keyboard, used once, and thrown away. Because awk programs are interpreted, you can avoid the (usually lengthy) compilation part of the typical edit-compile-test-debug cycle of software development.
…
If you find yourself writing awk scripts of more than, say, a few hundred lines, you might consider using a different programming language. The shell is good at string and pattern matching; in addition, it allows powerful use of the system utilities. Python offers a nice balance between high-level ease of programming and access to system facilities.
Program Structure
- An awk program (script) is made up of one or more rules
- Each rule is composed of a pattern and an action
  - The pattern is used to select the lines on which to perform the action
  - If the pattern is omitted, the action is performed on all lines to be processed
- No special separator is needed between rules; they are simply written one after another, usually one per line.
pattern { action }
pattern { action }
...
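As a small sketch of this structure, the program below has three rules; every input line is tested against each rule in order, so one line can trigger several actions. The fruit names and the variable rules_out are hypothetical and chosen only for the demonstration.

```shell
# Three rules applied to three hypothetical input lines.
rules_out=$(printf 'apple\nbanana\norange\n' | awk '
/an/ { print "rule 1 (an):", $0 }    # pattern + action
/e/  { print "rule 2 (e):", $0 }     # pattern + action
     { print "rule 3 (all):", $0 }   # no pattern: runs on every line
')

printf '%s\n' "$rules_out"
```

The line orange contains both "an" and "e", so all three rules fire for it, in the order in which they are written.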
Patterns
- Allow you to select certain lines in the data to be processed
- Can be
  - regular expressions written between two / characters
  - selectors (conditions) based on internal variables
- There are special patterns, including BEGIN and END, which match before the first input line is read and after the last one has been processed, respectively.
Select lines beginning with hello or Hello
/^[Hh]ello/ { ... }
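The different kinds of patterns can be combined in one program. The following sketch uses a regular expression, two selectors on the built-in variables NR and NF, and the special patterns BEGIN and END; the input lines and the name patterns_out are invented for the example.

```shell
patterns_out=$(printf 'Hello world\nfoo\nhello there friend\nbar baz\n' | awk '
BEGIN       { print "start" }              # runs once, before any input is read
/^[Hh]ello/ { print "greeting:", $0 }      # regular expression pattern
NR == 2     { print "second line:", $0 }   # selector on the line number
NF >= 3     { print "3+ fields:", $0 }     # selector on the number of fields
END         { print "lines:", NR }         # runs once, after the last line
')

printf '%s\n' "$patterns_out"
```

In the END rule, NR still holds the number of the last line read, which gives the total line count.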
Actions
- Run each time the pattern matches
- Can use special variables (NR, NF, FS, $0, $1, …)
- Can use the language's operators and functions (similar to C)
- Can run system commands
- Can redirect output to files (the file name must be enclosed in quotation marks)
- Within an action, instructions are separated by the ; character, as in C, or by a newline.
Display the number of words on lines beginning with Hello or hello
/^[Hh]ello/ { print NF }
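The sketch below exercises the main ingredients of an action: special variables, built-in functions, redirection to a quoted file name, and a system command. The input data, the file name names.txt, and the scratch directory are assumptions made for this illustration.

```shell
# Work in a scratch directory so the example leaves no files behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

actions_out=$(printf 'alice:42\nbob:7\n' | awk '
BEGIN { FS = ":" }                          # split fields on ":" instead of blanks
{
    print NR, NF, toupper($1), length($0)   # special variables and functions
    print $1 > "names.txt"                  # redirection: file name in quotes
}
END {
    close("names.txt")                      # flush the file before testing it
    status = system("test -s names.txt")    # system command; returns its exit status
    print "exit status:", status
}')

printf '%s\n' "$actions_out"
cat names.txt
```

An exit status of 0 from test -s means that names.txt exists and is not empty.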
Examples
Print the lines of src.c that contain include
$> awk '/include/ { print $0 }' src.c
Output of ls -l
$> ls -l
-rw-r--r-- 1 root root 66194123 févr. 22 09:42 20190220-dump.csv.gz
-rw-r--r-- 1 root root 32951207 févr. 22 09:48 20190222-dump.csv.gz
-rw-r--r-- 1 root root 110620 mars 8 13:40 awk.html
-rw-r--r-- 1 root root 138684 mars 6 16:39 LDC.html
$>
Print the size column (5th field) of the ls -l output
$> ls -l | awk '{ print $5 }'
$> ls -l | awk '{ tot+=$5 } END { print tot }'
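The same summing idea can be tried on fixed data, which makes the result reproducible. This is only a sketch: the listing below is hypothetical (it mimics the column layout of ls -l, including its leading "total" line), and the names listing and summary are arbitrary.

```shell
# Hypothetical listing in the same column layout as ls -l.
listing='total 8
-rw-r--r-- 1 root root 1000 Feb 22 09:42 a.csv
-rw-r--r-- 1 root root 2500 Feb 22 09:48 b.csv
-rw-r--r-- 1 root root  500 Mar  8 13:40 c.html'

summary=$(printf '%s\n' "$listing" | awk '
NR > 1 {                                   # skip the "total" header line
    tot += $5                              # 5th field: size in bytes
    n++
    if ($5 > max) { max = $5; biggest = $9 }
}
END {
    if (n > 0)
        printf "%d files, %d bytes, average %.1f, largest %s\n", n, tot, tot / n, biggest
}')

printf '%s\n' "$summary"
```

Uninitialized awk variables such as tot, n, and max start out as zero (or the empty string), so no explicit initialization is needed.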
A complete example
$> ls -l | awk '
BEGIN {
    dst = "report.txt"
    print "List of files in the directory:" > dst
}
NR>1 {    # skip the "total" line that ls -l prints first
    print $9 >> dst
    tot+=$5
}
END {
    if ( tot != 0 ) print "\nTotal file sizes:", tot, "bytes" >> dst
    else print "(No files found)" >> dst
}'
$> cat report.txt
List of files in the directory:
20190220-dump.csv.gz
20190222-dump.csv.gz
awk.html
LDC.html

Total file sizes: 99394634 bytes
Using Shell variables
- It is possible to pass an external value to awk so that it assigns the value to an internal variable
- This is done with the -v option when awk is invoked.
$> VAR="Hello from AWK"
$> awk -v ext="$VAR" 'BEGIN { print ext }'
Hello from AWK
$>
Here, ext is the (arbitrary) name of the awk variable that is initialized with the contents of the shell variable VAR.
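The -v option can be repeated, and a value that looks like a number is compared numerically. A minimal sketch, in which the shell variables LIMIT and LABEL, the awk variables limit and label, and the input data are all hypothetical:

```shell
LIMIT=10
LABEL="over limit"

# Each -v assigns one shell value to one awk variable before the program starts.
v_out=$(printf 'a 5\nb 15\nc 25\n' | awk -v limit="$LIMIT" -v label="$LABEL" '
$2 > limit { print $1, label }   # numeric comparison against the passed-in value
')

printf '%s\n' "$v_out"
```

Quoting "$LIMIT" and "$LABEL" on the shell side keeps values containing spaces intact.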
References
- "Unix Text Processing", Dale Dougherty and Tim O’Reilly, Hayden Books, 1987
  https://www.oreilly.com/openbook/utp/
- Christophe Blaess cheat sheets [FR]
  https://www.blaess.fr/christophe/developpements/aides-memoires/
  - Unix commands [FR]
    https://www.blaess.fr/christophe/memo_commandes_unix.html
  - Shell programming [FR]
    https://www.blaess.fr/christophe/memo_programmation_shell.html
- Rich’s sh (POSIX shell) tricks
  https://www.etalabs.net/sh_tricks.html
- Bash Reference Manual
  https://www.gnu.org/software/bash/manual/bashref.html
- Advanced Bash-Scripting Guide
  http://tldp.org/LDP/abs/html/
- "Mastering Regular Expressions, 3rd Edition: Understand Your Data and Be More Productive", Jeffrey Friedl
  https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/
- "GAWK: Effective AWK Programming", Edition 4.1
  http://www.gnu.org/software/gawk/manual
- Manual pages‡: bash(1), grep(1), regex(7), gawk(1)
‡: read those pages on your own operating system, not on the Internet!