A utility and a programming language

awk
  • Written by Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan

  • The Swiss Army knife of text file processing that can often replace sed, grep, ed, …

  • A programming language with a syntax close to C

  • Suitable both for quick one-off tasks and for heavier processing in real programs

  • Able to work on the output of other commands (via a pipe)
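The claim that awk can stand in for grep or sed can be illustrated with a short sketch. The two shell functions below, awk_grep and awk_subst, are hypothetical names chosen for this example: the first prints the lines matching a regular expression (a pattern without an action), the second applies a global substitution with awk's gsub() function. Both read the output of another command through a pipe.

```shell
# Hypothetical helper: acts like `grep "$1"` on standard input.
# A rule with a pattern and no action prints every matching line.
awk_grep() {
  awk -v re="$1" '$0 ~ re'
}

# Hypothetical helper: acts like `sed "s/$1/$2/g"` on standard input.
# gsub() replaces every match of the regex in $0; the always-true
# pattern 1 then prints the (possibly modified) line.
awk_subst() {
  awk -v re="$1" -v rep="$2" '{ gsub(re, rep) } 1'
}

# Both helpers read the output of another command through a pipe.
printf 'hello world\ngoodbye\nhello awk\n' | awk_grep '^hello'
# Expected output: "hello world" then "hello awk"
printf 'a-b-c\n' | awk_subst '-' '+'
# Expected output: "a+b+c"
```

Since -v processes escape sequences in its value, a backslash in the regular expression has to be doubled, and an & in the replacement stands for the matched text, as in sed.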

gawk
  • The examples here use gawk (GNU Awk) on Linux
    → Some extensions and optimizations compared to the original awk

When to use awk?

… The awk language is very useful for producing reports from large amounts of raw data, such as summarizing information from the output of other utility programs like ls. …

Programs written with awk are usually much smaller than they would be in other languages. This makes awk programs easy to compose and use. Often, awk programs can be quickly composed at your keyboard, used once, and thrown away. Because awk programs are interpreted, you can avoid the (usually lengthy) compilation part of the typical edit-compile-test-debug cycle of software development.

…

If you find yourself writing awk scripts of more than, say, a few hundred lines, you might consider using a different programming language. The shell is good at string and pattern matching; in addition, it allows powerful use of the system utilities. Python offers a nice balance between high-level ease of programming and access to system facilities.

Gawk: Effective AWK Programming

Program Structure

  • An awk program (script) is made up of one or more rules

  • Each rule is made up of a pattern and an action

    • The pattern selects the lines on which the action is performed

    • If the pattern is omitted, the action is performed on every input line

    • If the action is omitted, the default action prints the selected line ({ print $0 })

  • No explicit separator is needed between rules; they are usually written one per line.

General Syntax
pattern { action }
pattern { action }
...
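As a minimal sketch of this structure, the hypothetical rules_demo function below runs one awk program made of four rules written one per line; the input lines are made up for the example. Each input line is tested against every rule in order, so one line can trigger several actions, and a rule without a pattern (here { n++ }) runs for every line.

```shell
# Hypothetical function name: rules_demo. Four rules, one per line,
# with nothing but newlines between them.
rules_demo() {
  awk '
/error/ { print "error line:", $0 }
NF == 0 { print "empty line at", NR }
        { n++ }
END     { print n, "lines read" }'
}

printf 'ok\nerror: disk full\n\nok again\n' | rules_demo
# Expected output:
# error line: error: disk full
# empty line at 3
# 4 lines read
```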

Patterns

  • Select the input lines to which an action applies

  • Can be

    • regular expressions enclosed between two slashes: /regex/

    • expressions (conditions), for example based on built-in variables such as NR or NF

  • The special patterns BEGIN and END match before the first input line is read and after the last input line has been processed, respectively.

Example
Select lines beginning with hello or Hello
/^[Hh]ello/ { ... }
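The kinds of patterns described above can be combined in a single program. In this sketch, the function name patterns_demo and the sample input are assumptions made for the example: BEGIN and END frame the processing, while a regular expression and two conditions on the built-in variables NR and NF select lines.

```shell
# Hypothetical function name: patterns_demo. One rule per kind of pattern.
patterns_demo() {
  awk '
BEGIN       { print "== start ==" }              # before the first line is read
/^[Hh]ello/ { print "greeting:", $0 }            # regular expression
NR == 1     { print "first line:", $0 }          # condition on NR
NF > 2      { print "long line:", $0 }           # condition on NF
END         { print "== end:", NR, "lines ==" }  # after the last line
'
}

printf 'Hello there\nsay hello to all of you\nhello\n' | patterns_demo
```

The second input line contains hello but does not begin with it, so only the NF > 2 rule selects it.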

Actions

  • Executed each time the pattern matches

  • Can use built-in variables and fields (NR, NF, FS, $0, $1, …)

  • Can use the language's operators and functions (similar to C)

  • Can run system commands (e.g. with the system() function)

  • Can redirect output to files with > or >> (a literal file name must be enclosed in double quotes)

  • Within an action, statements are separated by the ; character, as in C, or by newlines.

Example
Display the number of words on lines beginning with Hello or hello
/^[Hh]ello/ { print NF }
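The sketch below, built around the hypothetical function hello_stats and made-up input, extends the previous example: the action attached to /^[Hh]ello/ contains several statements, separated either by ; or by a newline, and uses C-like operators (++, +=, /) and printf. The END rule then prints a small summary.

```shell
# Hypothetical function name: hello_stats.
hello_stats() {
  awk '
/^[Hh]ello/ {
  lines++; words += NF          # two statements separated by ;
  if (NF > max) max = NF        # a statement ended by a newline
}
END {
  if (lines > 0)
    printf "%d lines, %.1f words per line, max %d\n", lines, words / lines, max
}'
}

printf 'Hello world\nhello big wide world\nbye\n' | hello_stats
# Expected output: 2 lines, 3.0 words per line, max 4
```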

Examples

Display the lines containing the string include
$> awk '/include/ { print $0 }' src.c
Example output of the command ls -l
$> ls -l
-rw-r--r-- 1 root root 66194123 févr. 22 09:42 20190220-dump.csv.gz
-rw-r--r-- 1 root root 32951207 févr. 22 09:48 20190222-dump.csv.gz
-rw-r--r-- 1 root root   110620 mars   8 13:40 awk.html
-rw-r--r-- 1 root root   138684 mars   6 16:39 LDC.html
$>
Display only the file sizes (5th field) of the ls -l output
$> ls -l | awk '{ print $5 }'
Sum of the file sizes
$> ls -l | awk '{ tot+=$5 } END { print tot }'
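A pattern can also test a single field. As a sketch, the hypothetical gz_total function sums the sizes (field 5) of the .gz files only, matching the name in field 9 against a regular expression. It is fed the sample listing shown above instead of a live ls -l so that the result is reproducible: with those lines it should report the two dump files, 66194123 + 32951207 = 99145330 bytes.

```shell
# Hypothetical function name: gz_total. Reads "ls -l"-style lines on
# standard input and sums field 5 (size) for names ending in .gz.
gz_total() {
  awk '$9 ~ /\.gz$/ { tot += $5; n++ }
       END { print n + 0, "files,", tot + 0, "bytes" }'
}

# Fixed sample copied from the ls -l output above.
printf '%s\n' \
  '-rw-r--r-- 1 root root 66194123 févr. 22 09:42 20190220-dump.csv.gz' \
  '-rw-r--r-- 1 root root 32951207 févr. 22 09:48 20190222-dump.csv.gz' \
  '-rw-r--r-- 1 root root   110620 mars   8 13:40 awk.html' \
  '-rw-r--r-- 1 root root   138684 mars   6 16:39 LDC.html' | gz_total
# Expected output: 2 files, 99145330 bytes
# On a real directory: ls -l | gz_total
```

The + 0 in the END rule makes awk print 0 rather than an empty string when no line matched.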

A complete example

Display a small report on the contents of the current directory
$> ls -l | awk '
BEGIN {
  dst = "report.txt"
  # dst is truncated when first opened; later prints to it append
  print "List of files in the directory : " > dst
}
NR > 1 {   # skip the "total" line printed first by ls -l
  print $9 > dst
  tot += $5
}
END {
  if ( tot != 0 ) print "\nTotal file sizes :", tot, "bytes" > dst
  else print "(No files found)" > dst
}'
$> cat report.txt
List of files in the directory :
20190220-dump.csv.gz
20190222-dump.csv.gz
awk.html
LDC.html

Total file sizes : 99394634 bytes

Using Shell variables

  • An external value can be passed to awk, which assigns it to a variable of the awk program

  • This is done with the -v var=value option when awk is invoked.

Example
$> VAR="Hello from AWK"
$> awk -v ext="$VAR" 'BEGIN { print ext }'
Hello from AWK
$>

ext is the (arbitrarily chosen) name of the awk variable; it is initialized with the contents of the Shell variable VAR before the BEGIN rule runs.
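A value passed with -v can be used anywhere in the program, including in a pattern. In this sketch, the hypothetical bigger_than function receives a size threshold from the shell and prints the names of the files larger than it. The NF >= 9 test is an assumption about the ls -l layout: it skips the "total" line, and a file name containing spaces would be cut to its first word by $9.

```shell
# Hypothetical function name: bigger_than. The shell argument $1 becomes
# the awk variable min through -v and is then used inside a pattern.
# NF >= 9 skips lines that are not file entries, such as the "total"
# line printed first by ls -l.
bigger_than() {
  awk -v min="$1" 'NF >= 9 && $5 + 0 > min + 0 { print $9 }'
}

LIMIT=1000000
printf '%s\n' \
  '-rw-r--r-- 1 root root 66194123 févr. 22 09:42 20190220-dump.csv.gz' \
  '-rw-r--r-- 1 root root   110620 mars   8 13:40 awk.html' | bigger_than "$LIMIT"
# Expected output: 20190220-dump.csv.gz
# On a real directory: ls -l | bigger_than "$LIMIT"
```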

References

‡ : read those pages on your own operating system, not on the Internet!