Normally, awk reads its input one line, or record, at a time; a record is, by default, a sequence of characters ending with a newline. Then awk splits each record into fields, where, by default, a field is a string of non-blank, non-tab characters.
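The default splitting described above can be checked with a small sketch. The two input lines here are invented for illustration and are not part of the countries file; NR (the current record number) and NF (the number of fields in the current record) are standard awk built-in variables, used here before their proper introduction.

```shell
# Feed awk two records. The second starts with blanks and separates
# its words with a mix of blanks and a tab.
out=$(printf 'alpha beta\tgamma\n   delta \t epsilon\n' |
    awk '{ print NR, NF, $1 }')
printf '%s\n' "$out"
# prints:
# 1 3 alpha
# 2 2 delta
```

Each newline ends a record, and any run of blanks and tabs, including a leading run, acts as a single separator between fields.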
As input for many of the awk programs in this topic's sections, we use a file called countries, which contains information about the ten largest countries in the world. (See "The sample input file countries".)
Each record contains the name of a country, its area in thousands of square miles, its population in millions, and the continent on which it is located. (Data are from 1978; the U.S.S.R. has been arbitrarily placed in Asia.) The white space between fields is a tab in the original input; a single blank separates North and South from America.
The sample input file countries
USSR | 8650 | 262 | Asia |
Canada | 3852 | 24 | North America |
China | 3692 | 866 | Asia |
USA | 3615 | 219 | North America |
Brazil | 3286 | 116 | South America |
Australia | 2968 | 14 | Australia |
India | 1269 | 637 | Asia |
Argentina | 1072 | 26 | South America |
Sudan | 968 | 19 | Africa |
Algeria | 920 | 18 | Africa |
This file is typical of the kind of data awk is good at processing: a mixture of words and numbers separated into fields by blanks and tabs.
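To follow along, the sample file can be recreated with one tab between fields and a single blank inside the two-word continent names. The following is one possible way to do it, assuming a POSIX shell whose printf reuses its format string for each group of four arguments; it writes a file named countries in the current directory.

```shell
# Write the ten records, tab-separated, to the file "countries".
# The quoted continent names keep their single embedded blank.
printf '%s\t%s\t%s\t%s\n' \
    USSR      8650 262 Asia \
    Canada    3852 24  'North America' \
    China     3692 866 Asia \
    USA       3615 219 'North America' \
    Brazil    3286 116 'South America' \
    Australia 2968 14  Australia \
    India     1269 637 Asia \
    Argentina 1072 26  'South America' \
    Sudan     968  19  Africa \
    Algeria   920  18  Africa \
    > countries
```

The extra blanks between the arguments only align the command for reading; the shell discards them, so the file itself contains only the tabs supplied by the format string.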
The number of fields in a record is determined by the field separator. Fields are normally separated by sequences of blanks and/or tabs, so that the first record of countries would have four fields, the second five, and so on. It is possible to set the field separator to just tab, so each line would have four fields, matching the meaning of the data; we will show how to do this shortly. For the time being, we will use the default: fields separated by blanks and/or tabs. The first field within a line is called $1, the second $2, and so forth. The entire record is called $0.
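The effect of the default separator on the first two records can be seen in a short sketch. The records are supplied inline with printf rather than read from the file, and NF is awk's built-in count of fields in the current record.

```shell
# The first two records of countries, tab-separated as in the original input.
out=$(printf 'USSR\t8650\t262\tAsia\nCanada\t3852\t24\tNorth America\n' |
    awk '{ print $1, NF, $4 }')
printf '%s\n' "$out"
# prints:
# USSR 4 Asia
# Canada 5 North
```

With the default separator, the blank inside "North America" splits the continent into two fields, so for Canada $4 holds only "North" and the record has five fields instead of four.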