Programming with awk

Generating reports

awk is especially useful for producing reports that summarize and format information. Suppose you want to produce a report from the file countries in which the continents are listed alphabetically, with the countries on each continent listed beneath it in decreasing order of population:

   Africa:
   	Sudan          19
   	Algeria        18

   Asia:
   	China         866
   	India         637
   	USSR          262

   Australia:
   	Australia      14

   North America:
   	USA           219
   	Canada         24

   South America:
   	Brazil        116
   	Argentina      26

As with many data processing tasks, it is much easier to produce this report in several stages. First, create a list of continent-country-population triples, in which each field is separated by a colon. This can be done with the following program triples, which uses an array pop indexed by subscripts of the form continent:country to store the population of a given country. The print statement in the END section of the program creates the list of continent-country-population triples that are piped to the sort routine.

   BEGIN { FS = "\t" }
         { pop[$4 ":" $1] += $3 }
   END   { for (cc in pop)
           print cc ":" pop[cc] | "sort -t: +0 -1 +2nr" }
The arguments for sort deserve special mention. The -t: argument tells sort to use : as its field separator. The +0 -1 arguments make the first field the primary sort key. In general, +i -j makes fields i+1, i+2, . . ., j the sort key. If -j is omitted, the fields from i+1 to the end of the record are used. The +2nr argument makes the third field, numerically decreasing, the secondary sort key (n is for numeric, r for reverse order). Invoked on the file countries, this program produces as output
   Africa:Sudan:19
   Africa:Algeria:18
   Asia:China:866
   Asia:India:637
   Asia:USSR:262
   Australia:Australia:14
   North America:USA:219
   North America:Canada:24
   South America:Brazil:116
   South America:Argentina:26
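
The +i -j key notation described above is the older form, and many current sort implementations no longer accept it. As a hedged sketch, the same ordering can be requested with the POSIX -k option: -k1,1 corresponds to +0 -1, and -k3nr corresponds to +2nr. The variable names and the LC_ALL=C setting are illustrative choices, not part of the original program; the input is the same ten triples, deliberately shuffled.

```shell
#!/bin/sh
# The ten triples from the listing above, in shuffled order.
triples='Asia:India:637
North America:Canada:24
Africa:Algeria:18
South America:Brazil:116
Asia:USSR:262
Australia:Australia:14
Africa:Sudan:19
North America:USA:219
Asia:China:866
South America:Argentina:26'

# -t:    use ":" as the field separator
# -k1,1  primary key: field 1 only (older form: +0 -1)
# -k3nr  secondary key: field 3 to end of line, numeric, reversed
#        (older form: +2nr)
# LC_ALL=C is an assumption made here so that the ordering of the
# continent names does not depend on the local collation rules.
sorted=$(printf '%s\n' "$triples" | LC_ALL=C sort -t: -k1,1 -k3nr)

printf '%s\n' "$sorted"
```

Run on the shuffled input, this should reproduce the ten lines in the order listed above.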
This output is in the right order but the wrong format. To transform the output into the desired form, run it through a second awk program format:
   BEGIN  { FS = ":" }
   {      if ($1 != prev) {
               print "\n" $1 ":"
               prev = $1
          }
          printf "\t%-10s %6d\n", $2, $3
   }
This is a control-break program that prints only the first occurrence of a continent name and formats the country-population lines associated with that continent in the desired manner. The command line
   $ awk -f triples countries | awk -f format
gives the desired report. As this example suggests, complex data transformation and formatting tasks can often be reduced to a few simple awk commands and sorts.
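
To try the two-stage pipeline end to end, the following script is a minimal sketch that writes countries, triples, and format into a scratch directory and runs the command line shown above. Several details are assumptions rather than part of the original text: the countries file is taken to hold four tab-separated fields (name, area, population, continent), as the use of $1, $3, and $4 suggests; the area field, which neither program reads, is filled with a placeholder 0; and the sort keys are written in the POSIX -k form (-k1,1 -k3nr) in place of +0 -1 +2nr, because newer sort implementations may reject the older notation.

```shell
#!/bin/sh
set -e
LC_ALL=C; export LC_ALL          # assumption: fixed collation for sort
workdir=$(mktemp -d)             # scratch directory for the three files

# countries: name <TAB> area <TAB> population <TAB> continent.
# Populations and continents are taken from the report above;
# the area column is a placeholder (0) because it is never used.
printf '%s\t%s\t%s\t%s\n' \
    USSR      0 262 'Asia' \
    Canada    0 24  'North America' \
    China     0 866 'Asia' \
    USA       0 219 'North America' \
    Brazil    0 116 'South America' \
    Australia 0 14  'Australia' \
    India     0 637 'Asia' \
    Argentina 0 26  'South America' \
    Sudan     0 19  'Africa' \
    Algeria   0 18  'Africa' > "$workdir/countries"

# Stage 1: build continent:country:population triples and sort them.
cat > "$workdir/triples" <<'EOF'
BEGIN { FS = "\t" }
      { pop[$4 ":" $1] += $3 }
END   { for (cc in pop)
          print cc ":" pop[cc] | "sort -t: -k1,1 -k3nr" }
EOF

# Stage 2: control-break formatting, one heading per continent.
cat > "$workdir/format" <<'EOF'
BEGIN  { FS = ":" }
{      if ($1 != prev) {
            print "\n" $1 ":"
            prev = $1
       }
       printf "\t%-10s %6d\n", $2, $3
}
EOF

report=$(awk -f "$workdir/triples" "$workdir/countries" |
         awk -f "$workdir/format")
rm -rf "$workdir"

printf '%s\n' "$report"
```

Because format prints a newline before each continent heading, the report begins with a blank line, and each country line starts with a tab.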

© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005