AWK is a scripting language that has been available on Unix since the late 1970s. It is named after the initials of its three authors, Alfred Aho, Peter Weinberger and Brian Kernighan (of legendary C fame), who were all working at Bell Laboratories at the time.

Even though AWK is used far less these days, it is still a good tool to keep under your belt. For quick text manipulation, it can be used in practically any Unix-like environment. I will be working on my MacBook, and the version I am using is shown below (there is no standard way of getting the AWK version, but the one I am using supports --version).
```
% awk --version
awk version 20200816
```
Use Cases for AWK
I have already given a one-line history of AWK above, so I will not go into it any further. Instead, I will describe some situations where I write a small AWK script to solve a problem. One of the biggest use cases that comes to mind is data exported from a database. Such exports are normally CSV files, which AWK handles easily. For example, if I need to create a simple one-time report for the business, an AWK script is enough; there is no need to write a full-fledged program. Similarly, when I am analyzing a program and want to pull out specific information to understand it better, I can write a script to extract exactly what I am looking for instead of going through it line by line.
Basics Before We Start
Let’s start with some basics. AWK is built to loop through its input one record (by default, one line) at a time. Before going further, let me introduce the two files that I will use as examples.
```
% cat prices.txt
PRODUCT, PRICE, MANUFACTURER
Executive Chair, 300.00, Henredon
Manager Office Chair, 170.00, La-Z-Boy
Mesh Office Chair, 120.00, True Innovations
Office Task Chair, 90.00, True Innovations
48 inches Swivel Desk, 370.00, Camden
Adjustable Height Desk, 350.00, ApexDesk
Panorama Desk, 790.00, Bestar
Executive Desk, 3400.00, Harrington
3-piece Bookcase, 4000.00, Tuscan
3-piece Bookcase, 3000.00, Harrington
% cat employees.txt
NAME, DESIGNATION, FURNITURE, BRAND
Leon Rollins, CEO, Executive Chair, Henredon
Leon Rollins, CEO, Panorama Desk, Bestar
Leon Rollins, CEO, 3-piece Bookcase, Tuscan
Angelo Barnett, CFO, Executive Chair, Henredon
Angelo Barnett, CFO, 48 inches Swivel Desk, Camden
Angelo Barnett, CFO, 3-piece Bookcase, Harrington
Dane Rubio, Senior Manager, Mesh Office Chair, True Innovations
Dane Rubio, Senior Manager, Adjustable Height Desk, ApexDesk
Kenji Dyer, Manager, Office Task Chair, True Innovations
Kenji Dyer, Manager, Adjustable Height Desk, ApexDesk
```
Consider prices.txt above. When we run a script on it, AWK reads the heading line first (to AWK it is just another record), then the first data record, and so on. The actions in an AWK program are written between braces {}. Code that should run before AWK starts looping over the records, such as setting up variables, goes in a BEGIN block. Anything that should run after the loop goes in an END block. Here is how the overall structure looks.
```
BEGIN {
    # Before the main loop starts
}
{
    # Main Loop
}
END {
    # After the main loop
}
```
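To see the three blocks run in order, here is a minimal sketch, assuming a POSIX sh and a POSIX-compatible awk. It prints a banner in BEGIN, echoes every record in the main block, and reports a count in END. The function name show_phases and the sample input are hypothetical, made up only for illustration.

```shell
#!/bin/sh
# show_phases is a hypothetical helper name used only for this sketch.
show_phases() {
    awk '
    BEGIN { print "start" }                 # runs once, before any input is read
    { print "record:", $0; n++ }            # runs once for every input record
    END   { print "end, records:", n + 0 }  # runs once, after the last record
    '
}

printf 'alpha\nbeta\n' | show_phases
```

With these two input lines, it prints start, then record: alpha and record: beta, and finally end, records: 2. Adding 0 to n makes the END block print 0 rather than an empty string when there is no input at all.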
Next, let’s talk about the $ variables. When AWK reads a record, by default the entire line is stored in $0, and the individual fields are stored in $1, $2 and so on. Let’s see how these variables map onto employees.txt.
```
# $ variables assigned:
# NAME, DESIGNATION, FURNITURE, BRAND
# $1    $2           $3         $4
% awk -F, '{printf("|%-15s|%-20s|%-20s|\n", $4, $3, $1)}' employees.txt | head -5
| BRAND         | FURNITURE          |NAME                |
| Henredon      | Executive Chair    |Leon Rollins        |
| Bestar        | Panorama Desk      |Leon Rollins        |
| Tuscan        | 3-piece Bookcase   |Leon Rollins        |
| Henredon      | Executive Chair    |Angelo Barnett      |
```
A few details here, such as the -F, option and the printf format string, may not be clear yet. For now, this example is only meant to show how the $ variables work.
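One detail in the output above is that BRAND and FURNITURE appear indented while NAME does not. With -F, every field after the first keeps the space that followed its comma. This minimal sketch (assuming a POSIX-compatible awk; the function name show_fields is hypothetical) wraps $0, $1 and $2 in brackets so that the whitespace becomes visible.

```shell
#!/bin/sh
# show_fields is a hypothetical helper name used only for this sketch.
show_fields() {
    # Print the whole record and the first two fields, bracketed so spaces show.
    awk -F, '{ printf("$0=[%s] $1=[%s] $2=[%s]\n", $0, $1, $2) }'
}

printf 'Leon Rollins, CEO, Executive Chair, Henredon\n' | show_fields
```

For the sample line, $1 comes out as [Leon Rollins], but $2 comes out as [ CEO], including the leading space.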
AWK Built-in Variables
AWK provides some built-in variables for convenience.
We will keep using the two sample files above: prices.txt lists prices for some furniture, and employees.txt lists the furniture provided to some employees.
Let’s move on to the variables. Specifically, we will look at FS, OFS, RS, ORS, NR, NF, FNR and FILENAME.
NR (Number of Records)
NR holds the number of records read so far. In the END block, after the whole file has been read, it therefore gives the total number of records, so we do not have to keep our own counter.
```
% awk 'END{print NR}' prices.txt
11
```
Here we are printing the total number of records in the file: 11, that is, the header line plus 10 products.
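Because NR is updated as each record is read, it also works as a running line counter inside the main block. Here is a minimal sketch, assuming a POSIX-compatible awk. The function name number_lines is hypothetical.

```shell
#!/bin/sh
# number_lines is a hypothetical helper name used only for this sketch.
number_lines() {
    # NR is updated as each record is read, so here it is the current line number.
    awk '{ printf("%2d: %s\n", NR, $0) }'
}

printf 'PRODUCT, PRICE\nExecutive Chair, 300.00\n' | number_lines
```

This prints " 1: PRODUCT, PRICE" followed by " 2: Executive Chair, 300.00".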
NF (Number of Fields)
NF holds the number of fields in the current record.
```
% awk '{print "Field Count:", NF, "::", $0}' prices.txt
Field Count: 3 :: PRODUCT, PRICE, MANUFACTURER
Field Count: 4 :: Executive Chair, 300.00, Henredon
Field Count: 5 :: Manager Office Chair, 170.00, La-Z-Boy
Field Count: 6 :: Mesh Office Chair, 120.00, True Innovations
```
We know that each record has three fields, yet AWK reports inconsistent field counts. This is because, by default, AWK splits fields on runs of spaces and tabs, whereas our fields are separated by commas.
FS (Input Field Separator)
Let’s see if we can fix the problem from above using the Field Separator variable.
```
% awk 'BEGIN{FS=","}{print "Field Count:", NF, "::", $0}' prices.txt
Field Count: 3 :: PRODUCT, PRICE, MANUFACTURER
Field Count: 3 :: Executive Chair, 300.00, Henredon
Field Count: 3 :: Manager Office Chair, 170.00, La-Z-Boy
Field Count: 3 :: Mesh Office Chair, 120.00, True Innovations
Field Count: 3 :: Office Task Chair, 90.00, True Innovations
```
Better! Now every record reports 3 fields.
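Even with FS set to a comma, every field after the first still starts with the space that follows the comma. POSIX awk treats an FS longer than one character as an extended regular expression, so one option is to let the separator absorb the surrounding spaces. This is a sketch of that alternative, not what the rest of this post uses. The function name split_clean is hypothetical.

```shell
#!/bin/sh
# split_clean is a hypothetical helper name used only for this sketch.
split_clean() {
    # A multi-character FS is treated as an extended regular expression:
    # here, a comma together with any spaces on either side of it.
    awk 'BEGIN { FS = " *, *" } { print NF ":" $1 "|" $2 "|" $3 }'
}

printf 'Executive Chair, 300.00, Henredon\n' | split_clean
```

This prints 3:Executive Chair|300.00|Henredon. The space inside "Executive Chair" is kept because it is not next to a comma.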
OFS (Output Field Separator)
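For the next examples, we will not use the files; instead, we will echo some text into AWK. (The \n inside the echo string is expanded by zsh, the default shell on macOS; in bash, use printf instead.) OFS is the separator AWK places between output fields. In the example below, we ask AWK to use : as the output field separator.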
```
% echo "Bye Cruel World\nHello New World" | awk 'BEGIN{OFS=":"}{print $1, $2, $3}'
Bye:Cruel:World
Hello:New:World
```
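OFS is only inserted where print has a comma between its arguments, or when AWK rebuilds $0 after a field has been assigned. Printing $0 unchanged keeps the original spacing. This minimal sketch (assuming a POSIX-compatible awk; the function name rejoin is hypothetical) contrasts the two cases using the common idiom of assigning $1 to itself.

```shell
#!/bin/sh
# rejoin is a hypothetical helper name used only for this sketch.
rejoin() {
    awk 'BEGIN { OFS = ":" }
    {
        print $0    # record printed as read: OFS is not applied
        $1 = $1     # assigning any field makes AWK rebuild $0 with OFS
        print $0
    }'
}

printf 'Bye Cruel World\n' | rejoin
```

This prints Bye Cruel World and then Bye:Cruel:World.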
RS (Record Separator)
Now let’s look at the Record Separator. By default, the record separator is a newline. Let’s switch it to |.
```
% echo "Bye Cruel World | Hello New World" | awk 'BEGIN{RS="|"}{print $1, $2, $3}'
Bye Cruel World
Hello New World
```
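A special case worth knowing: when RS is set to the empty string, POSIX awk treats one or more blank lines as the record separator ("paragraph mode"), and a newline always acts as a field separator as well. This sketch assumes a POSIX-compatible awk and reads blank-line-separated blocks. The function name para_summary is hypothetical.

```shell
#!/bin/sh
# para_summary is a hypothetical helper name used only for this sketch.
para_summary() {
    # RS = "" enables paragraph mode: records are separated by blank lines.
    # FS = "\n" makes every line of a paragraph a separate field.
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " (" NF " lines)" }'
}

printf 'Leon Rollins\nCEO\n\nDane Rubio\nSenior Manager\n' | para_summary
```

This prints 1: Leon Rollins (2 lines) and 2: Dane Rubio (2 lines).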
ORS (Output Record Separator)
Now let’s use ^ as the output record separator.
```
% echo "Bye Cruel World\nHello New World" | awk 'BEGIN{ORS=" ^ "}{print $1, $2, $3}'
Bye Cruel World ^ Hello New World ^
```
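Because ORS is appended after every record, including the last one, using it to join lines leaves a trailing separator and no final newline. A common workaround, sketched here as an alternative to ORS, is to print the separator before every record except the first and then finish with a newline in END. This assumes a POSIX-compatible awk, and the function name join_lines is hypothetical.

```shell
#!/bin/sh
# join_lines is a hypothetical helper name used only for this sketch.
join_lines() {
    # Print ", " before every record except the first, then one newline at the end.
    awk '{ printf("%s%s", (NR > 1 ? ", " : ""), $0) } END { print "" }'
}

printf 'Bye Cruel World\nHello New World\n' | join_lines
```

This prints Bye Cruel World, Hello New World on a single line that ends with a newline.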
FNR (Number of Records in Current File)
FNR is a bit different, and is mainly useful when we are reading more than one file. FNR gives the record number within the current file and restarts at 1 when AWK moves on to the next file. NR, on the other hand, keeps a running count across all files.
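A quick way to see the difference is to print both counters while reading two small files. This sketch assumes a POSIX sh, a POSIX-compatible awk and mktemp -d. The function name show_counts and the file names a.txt and b.txt are hypothetical.

```shell
#!/bin/sh
# show_counts is a hypothetical helper name used only for this sketch.
show_counts() {
    # FNR restarts at 1 for every input file; NR keeps counting across files.
    awk '{ print $0, "NR=" NR, "FNR=" FNR }' "$@"
}

dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"
show_counts "$dir/a.txt" "$dir/b.txt"
rm -rf "$dir"
```

For b1, the first line of the second file, this prints NR=3 but FNR=1.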
FILENAME (Name of the data file)
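FILENAME, as the name suggests, holds the name of the input file currently being read. Its value is undefined inside a BEGIN block, since no input has been read yet.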
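FILENAME pairs naturally with FNR. The pattern FNR == 1 is true on the first record of each file, which makes it easy to label each file's header line. Here is a minimal sketch, assuming a POSIX-compatible awk. The function name list_headers is hypothetical.

```shell
#!/bin/sh
# list_headers is a hypothetical helper name used only for this sketch.
list_headers() {
    # FNR == 1 holds on the first record of each file, where FILENAME is that file.
    awk 'FNR == 1 { print FILENAME ": " $0 }' "$@"
}
```

Run as list_headers prices.txt employees.txt in the directory holding the two sample files, it prints prices.txt: PRODUCT, PRICE, MANUFACTURER followed by employees.txt: NAME, DESIGNATION, FURNITURE, BRAND.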
Sample Calls for Experimenting
Here are some sample calls to experiment with. For most of them, I will not show the full output.
Loop through the file and dump
```
% awk '{print $0}' prices.txt | head -5
PRODUCT, PRICE, MANUFACTURER
Executive Chair, 300.00, Henredon
Manager Office Chair, 170.00, La-Z-Boy
Mesh Office Chair, 120.00, True Innovations
Office Task Chair, 90.00, True Innovations
```
Reformat the Prices and Print
```
% awk -F, 'NR>1{printf("%-20s%-20s %6.2f\n", $3, $1, $2)}' prices.txt | head -5
 Henredon           Executive Chair      300.00
 La-Z-Boy           Manager Office Chair 170.00
 True Innovations   Mesh Office Chair    120.00
 True Innovations   Office Task Chair     90.00
 Camden             48 inches Swivel Desk 370.00
```
There are a few things of interest here.
- -F, sets the field separator to a comma. It is the command-line alternative to setting FS in a BEGIN block.
- NR>1 skips record 1 and processes records from 2 onward, so the header is not printed. If we want a header, we can print a static one in a BEGIN block.
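Putting both points together, here is a sketch that prints a static header from BEGIN and skips the file's own header with NR > 1. It assumes a POSIX-compatible awk and a single input file, since with several files NR > 1 would only skip the first file's header. The function name price_report is hypothetical.

```shell
#!/bin/sh
# price_report is a hypothetical helper name used only for this sketch.
price_report() {
    # -F, sets FS to a comma; BEGIN prints a static header;
    # NR > 1 skips the header line already in the data (single input file).
    awk -F, '
    BEGIN  { printf("%-20s%-20s %6s\n", "MANUFACTURER", "PRODUCT", "PRICE") }
    NR > 1 { printf("%-20s%-20s %6.2f\n", $3, $1, $2) }
    ' "$@"
}
```

Called as price_report prices.txt, it prints the MANUFACTURER / PRODUCT / PRICE header followed by one formatted line per product.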
Find all Bookcases
```
% awk 'BEGIN{FS=","}/Bookcase/{print $0}' prices.txt
3-piece Bookcase, 4000.00, Tuscan
3-piece Bookcase, 3000.00, Harrington
```
Here the pattern /Bookcase/ keeps only the records that contain the text Bookcase.
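A bare /Bookcase/ pattern matches anywhere in the record. To test a single field, or to filter on a numeric value, the pattern can compare fields directly. Here is a sketch that assumes the prices.txt layout above and a POSIX-compatible awk. The function name expensive_desks and the 500 threshold are hypothetical.

```shell
#!/bin/sh
# expensive_desks is a hypothetical helper name used only for this sketch.
expensive_desks() {
    # $1 ~ /Desk/ tests only the PRODUCT field; $2 + 0 forces a numeric price.
    awk -F, 'NR > 1 && $1 ~ /Desk/ && $2 + 0 > 500 { print $1 "," $2 }' "$@"
}
```

Run against prices.txt, it prints Panorama Desk, 790.00 and Executive Desk, 3400.00.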
Larger Program
Now let’s work on a larger program. It reads all the furniture prices from prices.txt, then adds up the cost of the furniture for each employee in employees.txt. We will call the program empcost.awk.
Let’s look at the program first, and then walk through it.
```awk
# empcost.awk
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }

BEGIN {
    FS=","
    printf("%-25s%-25s%8s\n", "EMPLOYEE NAME", "FURNITURE", "COST");
    printf("----------------------------------------------------------\n");
}

{
    if (FNR != 1) {                 # skip the header line of each file
        if (FNR==NR) {              # first file (prices.txt): build the price map
            a=trim($1) "-" trim($3);
            prices[a]=$2;
            next;
        } else {                    # second file (employees.txt): look up and accumulate
            a=trim($3) "-" trim($4);
            printf("%-25s%-25s%8s\n", $1, $3, prices[a]);
            empcost[trim($1)] = empcost[trim($1)] + prices[a];
        }
    }
}

END {
    printf("----------------------------------------------------------\n");
    print("\n\nCost per Employee...");
    printf("%-25s%8s\n", "EMPLOYEE NAME", "EXPENSE");
    printf("---------------------------------\n");
    for (key in empcost) {
        printf("%-25s%8.2f\n", key, empcost[key]);
    }
    printf("---------------------------------\n");
}
```
A few interesting things are happening here. The functions ltrim(), rtrim() and trim() on lines 2 to 4 trim whitespace from text. As improbable as it may seem, AWK has no built-in trim() function, so we have to define our own.
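The trim helpers can be tried on their own. This sketch copies the three functions from empcost.awk into a small program that prints every comma-separated field trimmed and in brackets, so the removed whitespace is easy to spot. It assumes a POSIX-compatible awk, and the wrapper name trim_fields is hypothetical.

```shell
#!/bin/sh
# trim_fields is a hypothetical helper name used only for this sketch;
# the three functions are copied from empcost.awk.
trim_fields() {
    awk -F, '
    function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
    function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
    function trim(s)  { return rtrim(ltrim(s)) }
    {   # print every field trimmed and in brackets, space-separated
        for (i = 1; i <= NF; i++)
            printf("[%s]%s", trim($i), (i < NF ? " " : "\n"))
    }
    ' "$@"
}

printf '  Executive Chair , 300.00,Henredon \n' | trim_fields
```

This prints [Executive Chair] [300.00] [Henredon]. Because AWK passes scalars to functions by value, sub() only changes the local copy s, which is why each function returns it.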
Since we are working with two files, we use FNR (the per-file NR). FNR != 1 skips line 1 of each file, because in both files that line is just a heading.
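FNR==NR means we are reading the first file. FNR restarts for every file, whereas NR keeps counting across files, so the two are equal only while AWK is reading the first file (assuming that file is not empty). In this branch we build a map, prices, from each product and manufacturer to its price. The key includes the manufacturer because the same product, such as 3-piece Bookcase, can come from different manufacturers at different prices. The next statement then skips straight to the next record.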
When we are reading the second file, we print the cost of each piece of furniture assigned to an employee and add it to that employee’s running total in empcost. Finally, the totals are reported in the END block.
```
% awk -f empcost.awk prices.txt employees.txt
EMPLOYEE NAME            FURNITURE                    COST
----------------------------------------------------------
Leon Rollins              Executive Chair           300.00
Leon Rollins              Panorama Desk             790.00
Leon Rollins              3-piece Bookcase         4000.00
Angelo Barnett            Executive Chair           300.00
Angelo Barnett            48 inches Swivel Desk     370.00
Angelo Barnett            3-piece Bookcase         3000.00
Dane Rubio                Mesh Office Chair         120.00
Dane Rubio                Adjustable Height Desk    350.00
Kenji Dyer                Office Task Chair          90.00
Kenji Dyer                Adjustable Height Desk    350.00
----------------------------------------------------------


Cost per Employee...
EMPLOYEE NAME             EXPENSE
---------------------------------
Leon Rollins              5090.00
Dane Rubio                 470.00
Kenji Dyer                 440.00
Angelo Barnett            3670.00
---------------------------------
```
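The per-employee totals above are not in any particular order. They come from for (key in empcost), and POSIX leaves the traversal order of that loop unspecified, so it may differ between AWK implementations. If a stable order matters, one option is to pipe the lines to sort from inside AWK and close the pipe in END. This is a standalone sketch of that approach, assuming a POSIX-compatible awk and the sort command. The function name sorted_totals and its simple name,amount input format are hypothetical.

```shell
#!/bin/sh
# sorted_totals is a hypothetical helper name used only for this sketch.
sorted_totals() {
    # Sum column 2 per key in column 1, then let sort(1) order the output lines,
    # because the order of "for (k in total)" is unspecified.
    awk -F, '
    { total[$1] += $2 }
    END {
        for (k in total)
            printf("%-25s%8.2f\n", k, total[k]) | "sort"
        close("sort")   # flush the pipe and wait for sort to finish
    }
    ' "$@"
}

printf 'Leon Rollins,300\nDane Rubio,120\nLeon Rollins,790\n' | sorted_totals
```

With this input, the Dane Rubio line (120.00) comes before the Leon Rollins line (1090.00). The same printf ... | "sort" change could be applied to the END loop of empcost.awk.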
Conclusion
I thought about writing on AWK because I keep going back to it for small one-time requirements. For anything larger, I turn to Python. I would not use Java for these tasks, as it is far more verbose. I hope this post inspires you to start using AWK for your own small one-off jobs. Ciao for now!