awk command in Linux

Given problem
Solution with AWK command
How to seperate complex awk command into smaller program
Some control statements that are supported in awk command
Wrapping up

Given problem

When working with files, we usually use redirect I/O to read each line at one time. But in this case, we normally does not process, analysis these line, we only print their contents. It is really tedious and we have to read multiple line without extracting some useful information that we want.

So our question in this case is that how do we use text processing?

Solution with AWK command

Syntax of AWK

 awk 'pattern { action }'

The following an awk command is the sequence of rules. Each rule contains a pair of pattern and action. And they are seperated by newline or semicolons (;).

An action is always enclosed in braces {} and this braces will contains lots of statements. Each statement should be separated by new line or semicolons.

An action will be run when pattern is satisfied our conditions that are defined. By default, a pattern is the blank or null pattern, so it can match whole record.

In awk command, we have some types of pattern that we need to know:

regular expression
expression

We can use conditional expressions in this pattern.
pattern1, pattern2

A pair of patterns separated by a comma, of the form “begin-pattern, end-pattern”, specifying a range of records. The first pattern, begin-pattern, controls where the range begin, and end-pattern controls where it ends.

For example:
```
  # data
  start field1 field2
  run 1 2
  run 2 4
  run 5 8
  end 10 10

  # command
  awk '$1 == "start", $1 == "end" { print $2, $3 }'
```
BEGIN, END pattern

BEGIN pattern will be called before processing any lines, and END pattern happened after the last line is read.

BEGIN can have multiple actions, each action will be defined in the braces {}.
```
  awk 'BEGIN {print "starting"} \
          {print $0} \
      END {print "the end"}'
```
empty pattern

Belows are some statements that an action uses.

Expressions, such as assignment operator, arithmetic operators, increment, and decrement operators.

About assignment operator

  # assign single value
  # variable=value
  OFS="/"

  # assign a value of a arithmetic expression
  #variable=arithmetic_expression
  awk 'BEGIN {a=1+3+4; print a;}'

About arithmetic operators

Below is a table that is describe operators that awk supports in its actions.

Operator	Type	Meaning
+	Arithmetic	Addition
-	Arithmetic	Subtraction
*	Arithmetic	Multiplication
/	Arithmetic	Division
%	Arithmetic	Modulo
<space>	String	Concatention
+=	Arithmetic	Add result to variable
-=	Arithmetic	Subtract result to variable
*=	Arithmetic	Multiply variable by result
/=	Arithmetic	Divide variable by result
%=	Arithmetic	Apply modulo to variable

We can easily find that all operators that look like in C/C++.

For example:

  # result: 163
  awk 'BEGIN {a=1+3*5 6; print a}'

  # print lines with its length is greater than 5
  awk 'length($0) > 5' <input-file>

About increment, decrement operators

As same as operators in C/C++, awk also supports pre, post increment operator ++ that is used to increase its value by 1, and pre, post decrement operator – that is used to decrease its value by 1.
```
  awk 'BEGIN {a=3; b=++a; print a, b;}'

  awk 'BEGIN {a=3; b=a--; print a, b;}'
```

About conditional expression

Operator	Meaning
==	is equal to
!=	is not equal to
>	is greater than
>=	is greater than or equal to
<	is less than
<=	is less than or equal to
&&	and operator for two conditional expression
\|\|	or operator for two conditional expression
!	not operator for a conditional expression

These operators are used to compare numbers or strings. And with comparing strings, the lower case letters are always greater than upper case letters.

For example:

  awk 'BEGIN { if ("a" == "a") print "same";}'

About regular expression

Operator	Meaning
~	matches
!~	does not match

These operators are used to compare strings. To define a regular expression, we need to embed it in the slashes before these operators.

To know more about the symbols of regular expression, we can refer to the link.

  # syntax
  word ~ /match/

  word !~ /match/

  # use with awk command
  awk -e 'word ~ /match/ { action }'

To use find-and-replace pattern in awk, we can use sub, gsub, and gensub command.

sub command

It will substitutes the first matched entity in a record with a replacement string.
```
  awk -e 'sub(/match/, replace-string)'
```
gsub command

It will substitutes all matched entity in a record with a replacement string.
```
  awk -e 'gsub(/match/, replace-string)'
```
gensub command

This command will use & operator to use the capturing group, and provides a parameter to specify how many word to replace by our string.
```
  awk -e '{ print gensub(/match/, "text-string &", num-replacement) }'
```

Control statements, used to control the flow of the program (if, for, while, switch, and more)

For example:

  # check the third field of each record that is satisfied an condition
  awk '{ if ($3 == "content") print $0; }' <input-file>

  awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is", i*i; }'

Output statements, such as print and printf.
Compound statements, to group other statements.
Input statements, to control the processing of the input.
Deletion statements, to remove array elements.

Accessing fields and properties of a line

Field Identifiers	Description
$0	the entire line of the text
$n	the nth field in a line
$NF	acronym as Number of Fields, it is the last field
$(NF-1)	the field that before the last field
$NR	acronym as Number or Records, it means that the position of a line that is processed
FILENAME	The input file’s name
FS	Field Separator
RS	Record Separator
OFS	Output Field Separator
ORS	Output Record Separator

Format input of awk command

By default, the input separator character in awk command is whitespace, including tabs, space, or newline characters. To specify the different separator characters between fields in a line, we can use FS variable.

There are two ways to use FS variable correctly.
- Using the BEGIN pattern.
```
  #!/bin/bash
  awk 'BEGIN { FS = "," } ; { print $1 }'
```
- Using -F option
```
  # define with awk program in a file
  awk -F, -f <awk-file> <input-file>

  awk -F: '{print $1, $6}' <input-file>
```
  We shouldn’t have mistake about -F option and -f option because they have the different meanings. The -f option specifies a file that containing an awk program. Use -f option when we want to seperate our complex script to multiple smaller things.
Format output of awk command

The output of awk command is created by using print statement. By default, a whitespace is also the separator character that print statement uses.
```
 awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }' <input-file>
```

How to seperate complex awk command into smaller program

Below is the script of awk command with the file name: sample.awk

#!/usr/bin/awk -f

BEGIN {
    # define the initialization of variables
}
{
    # list all our actions
}
END {
    # action for the end of execution
}

To run the above file, we can type:

awk -f sample.awk

Some control statements that are supported in awk command

if/else statement

single if statement

  # with one action        
  if (condition)
      action

  # with lots of action
  if (condition) {
      action;
      action;
  }

if/else statement

  # use specify if/else statement
  if (condition)
      action
  else
      action

  # use tenary operator
  condition / action : action;

if/else if/else statement

  if (condition)
      action;
  else if (condition)
      action;
  else
      action;        

for statement
```
 for (i = 1; i < 6; ++i)
     action
```
while statement
```
 while (condition)
     action
```
do..while statement
```
 do
     action
 while (condition)
```
break and continue statements
exit statement

Wrapping up

Understanding about how to use awk command, take note about patterns, and actions.
To define an array type in awk command, we can follow a link.

Thanks for your reading.

Refer:

https://www3.physnet.uni-hamburg.de/physnet/Tru64-Unix/HTML/APS32DTE/WKXXXXXX.HTM

https://linuxize.com/post/awk-command/

https://www.geeksforgeeks.org/awk-command-unixlinux-examples/?ref=lbp

https://www.thegeekstuff.com/2010/01/8-powerful-awk-built-in-variables-fs-ofs-rs-ors-nr-nf-filename-fnr

https://www.howtogeek.com/562941/how-to-use-the-awk-command-on-linux/

https://www.grymoire.com/Unix/Awk.html

https://opensource.com/article/19/11/how-regular-expressions-awk

https://likegeeks.com/awk-command/

Table of contents

Given problem

Solution with AWK command

How to seperate complex awk command into smaller program

Some control statements that are supported in awk command

Wrapping up