AWK Introduction -Learn the Basics
06-15-2008, 06:31 AM


Awk Introduction

Explained by examples rather than by definitions

Syntax for one line awk commands

awk: awk -Fs '/search/ {action}' awkvar="$shellvar" infile
nawk: nawk -Fs -v awkvar="$shellvar" '/search/ {action}' infile
gawk: gawk -Fs -v awkvar="$shellvar" '/search/ {action}' infile

Concept

Awk scans ASCII files or standard input. It can search for strings easily and offers many ways to process the matching lines and output them in a new format. It does not change the input file but sends its results to standard output.

awk/nawk/gawk

Awk is the original awk. Nawk is new_awk and gawk is the gnu_awk. Gawk can do the most, but it is not available everywhere. So it is best to use only what nawk can do; if nawk is not installed, the system is not in good shape anyway.

Search and Action

Searching happens within "//" and actions within "{}". The main action is to print.
Reprint all: awk '{print}' infile
Print lines that contain "love": awk '/love/ { print }' infile
Print first entry in lines that contain "money": awk '/money/ { print $1 }' infile

Variables

Awk variables are untyped: a value is treated as a string or as a number depending on how it is used. So one may just put anything into a variable with varname = othervar or varname = "string". To get the value back out, just write the variable's name as a function argument or on the right side of any operator.

Multiline awk in a shell script

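Everything between the '' is awk. With -v a="$var" (nawk and gawk) awk gets a shell variable. An assignment written after the program, as in a=$var, is only applied when awk reaches it in the argument list, which is after BEGIN has already run, so it cannot be used here.
The action is to print variable a and put it into a file.

awk -v a="$var" '
BEGIN { print a > "testfile" }
'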
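
A small sketch of when a shell variable becomes visible inside awk. The variable name var and its value are made up for the demo, and it assumes an awk that knows -v (nawk or gawk). With -v the value is already there in BEGIN; an assignment placed after the program only shows up once the main block runs.

```shell
var="hello world"

# -v assigns before BEGIN runs, so BEGIN can already print the value.
with_v=$(awk -v a="$var" 'BEGIN { print a }')

# An assignment after the program is applied only when awk reaches it in
# the argument list: BEGIN still sees an empty a, the main block sees it.
as_operand=$(echo "one line" |
    awk 'BEGIN { print "[" a "]" } { print "[" a "]" }' a="$var")

echo "$with_v"
echo "$as_operand"
```

The second command prints an empty pair of brackets first and the filled pair second, which is why a BEGIN-only script needs -v.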

BEGIN { }, { } and END { }

An awk script can have three types of blocks. At least one of them must be there. The BEGIN{} block is processed before any input is read. The {} block runs for every line of input and the END{} block is processed after the final line of the input file.

awk '
BEGIN { myvalue = 1700 }
/debt/ { myvalue -= $4 }
/want/ { myvalue += $4 }
END { print myvalue }
' infile
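
To watch the three kinds of blocks at work, here is a sketch with made-up input piped in instead of infile. The sample lines, and the assumption that the amount sits in the fourth field, are for illustration only.

```shell
# Hypothetical input: the amount is in field 4.
result=$(printf '%s\n' \
    'debt to bank 500' \
    'want from fred 200' \
    'note on nothing 999' |
awk '
BEGIN  { myvalue = 1700 }      # runs once, before any input is read
/debt/ { myvalue -= $4 }       # runs only for lines containing debt
/want/ { myvalue += $4 }       # runs only for lines containing want
END    { print myvalue }       # runs once, after the last line
')

echo "$result"    # 1700 - 500 + 200 = 1400
```

The third line matches neither pattern, so its 999 is ignored.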

Match in a particular field

Awk autosplits each line on whitespace by default. The fields are stored in $1 through $NF and the whole line is in $0. One can match or not match an individual field.

awk '
$1 ~ /fred/ && $4 !~ /ok/ {
    print "Fred has not yet paid " $3
}
' infile
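
One thing worth seeing with field matches: a regex on a field matches anywhere inside that field. The sketch below uses invented data (name, month, amount, status) to compare a regex match with an exact string comparison.

```shell
# Hypothetical data: name, month, amount, status.
data='fred 2014-01 12.50 open
fred 2014-02 9.00 ok
alfred 2014-01 3.00 open
anna 2014-01 7.25 open'

# Regex match on field 1: /fred/ also hits "alfred".
regex_hits=$(printf '%s\n' "$data" |
    awk '$1 ~ /fred/ && $4 !~ /ok/ { print $1, $3 }')

# Exact comparison on field 1: only "fred".
exact_hits=$(printf '%s\n' "$data" |
    awk '$1 == "fred" && $4 != "ok" { print $1, $3 }')

echo "$regex_hits"
echo "$exact_hits"
```

Anchoring the regex as /^fred$/ would give the same result as the == comparison.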

For, If, substr()

Awk can do for() loops as in C and has the usual if and while structures. NR holds the current line number and NF the number of fields on the current line.

awk '
BEGIN { count = 0 }
/myline/ {
    for(i=1;i<=NF;i++){
        if(substr($i,3,2) == "ae"){
            bla = "Found it on line: "
            print bla NR " in field: " i
            count++
        }
    }
}
END { print "Found " count " instances of it" }
' infile

Turn around each word in a file:

awk '
{
    for(i=1;i<=NF;i++){
        len = length($i)
        for(j=len;j>0;j--){
            char = substr($i,j,1)
            tmp = tmp char
        }
        $i = tmp
        tmp = ""
    }
    print
}
' infile
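
A quick way to try the word reversal without a file is to pipe a line in. The input string is made up; note that assigning to a field makes awk rebuild the line with single spaces between fields, so the extra blanks in the input disappear.

```shell
# Same loop as above, fed from a pipe instead of infile.
reversed=$(echo "hello   awk  world" | awk '
{
    for (i = 1; i <= NF; i++) {
        tmp = ""
        for (j = length($i); j > 0; j--)
            tmp = tmp substr($i, j, 1)
        $i = tmp          # assigning to a field rebuilds $0 with single spaces
    }
    print
}')

echo "$reversed"    # olleh kwa dlrow
```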

Awk scripts within a shell script

Extract email addresses from incoming mail. The mail would be routed to the following script from within the ~/.forward file. This is not an efficient method, but only an example to show serial processing of text. The next example will do the same thing within awk only and will be efficient. The mail comes into the script over standard input.
Between the commands there must be a pipe "|". To continue on the next line, put a "\" after the pipe to escape the invisible newline (a line that ends with the pipe itself also continues on the next line).

#!/usr/bin/ksh
{ while read line;do
    print - "$line"
done } |\
tee -a /path/mymailfile |\
awk '
/^From/ || /^Reply-To/ {
    for(i=1;i<=NF;i++){
        if($i ~ /@/){
            print $i
        }
    }
}
' |\
sed '
s/[<>]//g;
s/[()]//g;
s/"//g;
...more substitutions for really extracting the email only...
' |\
{ while read addr;do

    if [[ $(grep -c "$addr" /path/antimailfile) -gt 0 ]];then

        mail "$addr" <<EOF
Please don't send me mail any more!
EOF
    else
        mail "$addr" <<EOF
Thanks for mailing me. I'll answer as soon as possible!:-))
EOF
    fi

done }

All the above in an awk script

nawk -f, while, break, >>, gsub(), getline, system()


With #!/usr/bin/nawk -f the whole script is interpreted entirely as an awk script and no more shell escapes are needed, but one can and has to do everything in awk itself. It's nawk because of the getline function.
While iterates until the expression becomes false or until a break is encountered.
Gsub() is for string substitution.
Getline reads in a line each time it is called.
System() executes a unix command.
">>" appends to a file.

This script is an example only. For really extracting email addresses several special cases would have to be considered...

#!/usr/bin/nawk -f

# Lines from a mail are dropping in over stdin. Append every line to a
# file before checking anything.

{ print >> "/path/mymailfile" }

# Find lines with From: or Reply-To: at beginning.

/^From:/ || /^Reply-To:/ {

    # Find fields with @. Iterate over the fields and check for @

    for(i=1;i<=NF;i++){

        if($i ~ /@/){

            # Clean the email addresses with gsub()

            gsub(/[<>()"]/,"",$i)

            # Check whether the email address is in the antimailfile.
            # getline returns 1 for each line read, 0 at end of file
            # and -1 on error, so loop only while it is > 0.

            found = 0
            while( (getline antiaddr < "/path/antimailfile") > 0 ){

                # Compare actual address in $i with loaded address

                if($i == antiaddr){
                    found = 1

                    # Now end the while loop

                    break
                }
            }

            # Close the file so the next address is checked from the top

            close("/path/antimailfile")

            if(found){

                # Send a negative mail

                system("mail " $i " < /path/badmail")
            }else{

                # Send a positive mail

                system("mail " $i " < /path/goodmail")
            }
        }
    }
}
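
The getline lookup is the tricky part of the script, so here is a stripped-down sketch of just that loop. The addresses and the temporary list file are invented for the demo, and the awk variable name list is an assumption, not something from the script above.

```shell
# Hypothetical block list, written to a temporary file for the demo.
listfile=$(mktemp)
printf '%s\n' 'spam@example.com' 'junk@example.org' > "$listfile"

verdicts=$(printf '%s\n' 'anna@example.net' 'junk@example.org' 'bob@example.net' |
awk -v list="$listfile" '
{
    found = 0
    # getline gives 1 per line read, 0 at end of file, -1 on error;
    # testing "> 0" avoids an endless loop on an unreadable file.
    while ((getline entry < list) > 0) {
        if ($0 == entry) { found = 1; break }
    }
    close(list)       # so the next lookup starts at the first line again
    print $0, (found ? "blocked" : "ok")
}')

rm -f "$listfile"
echo "$verdicts"
```

For a long list it would be cheaper to read the file once into an array in BEGIN and test membership with (addr in arr), but the loop above stays close to the example.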

Calculate on columns and print formatted output


If one has formatted input with columns of numbers, one can still split them on white space, but has to take care of the output format with printf()

#!/usr/bin/nawk -f

# Reprint lines without foo or boo

! /(foo|boo)/ { print }

# Rearrange and calculate with columns but only on lines with foo or boo

/(foo|boo)/ {

    # Extract fields

    mytype = $1
    var1 = $2
    var2 = $3
    var3 = $4

    # Calculate

    if(mytype == "foo"){

        var1 *= 10
        var2 += 20
        var3 = log(var3)
    }
    if(mytype == "boo"){

        var1 *= 4
        var2 += 10
        var3 = cos(var3)
    }

    # Print formatted output in reverse order

    printf("%-4s%10.3f%10.3f%10.3f\n",mytype,var3,var2,var1)
}

How to iterate over each word of a shell variable in awk

In this example a shell variable is filled first and then passed to awk. Awk splits it into an array, then iterates over the array and looks for each word on the current line of a file. If it finds a word, it prints the whole line.

#!/usr/bin/ksh
var="term1 term2 term3 term4 term5"
awk -v myvar="$var" '
BEGIN { split(myvar,myarr) }
{
    for(val in myarr){
        if($0 ~ myarr[val]){
            print
        }
    }
}
' file
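
A side note on the loop: for(val in myarr) visits the elements in no guaranteed order. When the order of the words matters, a numeric loop over the count returned by split() is one way to do it. This is only a sketch with a shortened, made-up term list; it needs an awk with -v.

```shell
var="term1 term2 term3"

ordered=$(awk -v myvar="$var" '
BEGIN {
    n = split(myvar, myarr)        # split() returns the number of pieces
    for (i = 1; i <= n; i++)       # numeric loop keeps the original order
        print i ": " myarr[i]
}')

echo "$ordered"
```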

Functions

This example substitutes the first three occurrences of "searchterm" with a different term in each case; from the fourth occurrence on it just prints the line as it is.

It should show where to place a function and how to call it.


#!/usr/bin/nawk -f
BEGIN{
    mysub1 = "first_sub"
    mysub2 = "second_sub"
    mysub3 = "third_sub"
    mycount = 1
    find = "searchterm"
}
{
    if($0 ~ find){
        if(mycount == 1){ replace(mysub1); }
        if(mycount == 2){ replace(mysub2); }
        if(mycount == 3){ replace(mysub3); }
        if(mycount > 3){ print; }
        mycount++
    }else{
        print
    }
}

# The function replaces the first match on the current line and prints it.
# No break here: break is only valid inside a loop.
function replace(mysub) {
    sub(find,mysub)
    print
}

CGI with gawk

As an example of a CGI script in awk, I made one which presents the unix man pages in HTML.

man.cgi

String functions

sub(regexp,sub)          Substitute sub for the first match of regexp in $0
sub(regexp,sub,var)      Substitute sub for the first match of regexp in var
gsub(regexp,sub)         Globally substitute sub for regexp in $0
gsub(regexp,sub,var)     Globally substitute sub for regexp in var
split(var,arr)           Split var on white space into arr
split(var,arr,sep)       Split var into arr using sep as separator
index(bigvar,smallvar)   Find index of smallvar in bigvar
match(bigvar,expr)       Find index of the first match of regexp expr in bigvar
length(var)              Number of characters in var
substr(var,num)          Extract chars from position num to end
substr(var,num1,num2)    Extract num2 chars starting at position num1
sprintf(format,vars)     Format vars to a string
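
To get a feel for these functions, a sketch that runs a few of them on made-up strings in a BEGIN block, so no input file is needed. The expected values are noted in the comments; positions count from 1.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                      # 11
    print index(s, "world")              # 7
    print substr(s, 7)                   # world
    print substr(s, 1, 5)                # hello (5 chars from position 1)
    m = match(s, /o+/)                   # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH             # 5 5 1
    t = s; gsub(/o/, "0", t); print t    # hell0 w0rld
    n = split("a:b:c", parts, ":")
    print n, parts[2]                    # 3 b
    print sprintf("%03d|%-4s|", 7, "ab") # 007|ab  |
}')

echo "$out"
```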


When to use awk, when to use perl?

Perl can do far more than awk can, but awk is present on any standard unix system, whereas perl first has to be installed. And for short commands awk seems to be more practical. The autosplit mode of perl splits into pieces called $F[0] through $F[$#F], which is not as nice as $1 through $NF, where awk retains the whole line in $0 at the same time.
To get the first column of any file in awk and in perl:

awk '{print $1}' infile
perl -nae 'print $F[0],"\n";' infile

I think this will be very useful for newbies.

Thanks,
Narendra

