Programming with awk

Fields

Normally, awk reads its input one line, or record, at a time; by default, a record is a sequence of characters ending with a newline. awk then splits each record into fields; by default, a field is a string of non-blank, non-tab characters.
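As a minimal sketch of this default behavior, the following shell commands feed awk two made-up input lines and print, for each record, its number and how many fields it contains. The input text and the shell function name count_fields are hypothetical, chosen only for illustration; NR and NF are awk's built-in variables holding the current record number and the number of fields in the current record.

```shell
# count_fields is a hypothetical helper name used only in this sketch.
# It reads records from standard input and reports the field count of each.
count_fields() {
    awk '{ print NR ": " NF " fields" }'
}

# The first line mixes a blank and a tab as separators; the second uses a
# run of three blanks, which awk treats as a single separator by default.
printf 'one two\tthree\nfour   five\n' | count_fields
# prints:
# 1: 3 fields
# 2: 2 fields
```

Each newline ends a record, and any run of blanks and tabs counts as one separator, so the two lines yield three and two fields respectively.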

As input for many of the awk programs in this topic's sections, we use a file called countries, which contains information about the ten largest countries in the world. (See ``The sample input file countries''.)

Each record contains the name of a country, its area in thousands of square miles, its population in millions, and the continent on which it is located. (Data are from 1978; the U.S.S.R. has been arbitrarily placed in Asia.) The white space between fields is a tab in the original input; a single blank separates North and South from America.

The sample input file countries

USSR	8650	262	Asia
Canada	3852	24	North America
China	3692	866	Asia
USA	3615	219	North America
Brazil	3286	116	South America
Australia	2968	14	Australia
India	1269	637	Asia
Argentina	1072	26	South America
Sudan	968	19	Africa
Algeria	920	18	Africa

This file is typical of the kind of data awk is good at processing--a mixture of words and numbers separated into fields by blanks and tabs.

The number of fields in a record is determined by the field separator. Fields are normally separated by sequences of blanks and/or tabs, so that the first record of countries would have four fields, the second five, and so on. It is possible to set the field separator to just tab, so each line would have four fields, matching the meaning of the data; we will show how to do this shortly. For the time being, we will use the default: fields separated by blanks and/or tabs. The first field within a line is called $1, the second $2, and so forth. The entire record is called $0.
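The effect of the default separator on the field count can be sketched as follows. The commands recreate the first three records of countries with tab-separated fields, then print the number of fields, $1, and $4 for each record. The shell function name field_summary is hypothetical; NF is awk's built-in count of the fields in the current record. Note that this writes a file named countries in the current directory.

```shell
# Recreate the first three records of countries; fields are separated by
# tabs, and a single blank separates North from America.
printf 'USSR\t8650\t262\tAsia\n'            >  countries
printf 'Canada\t3852\t24\tNorth America\n'  >> countries
printf 'China\t3692\t866\tAsia\n'           >> countries

# field_summary is a hypothetical helper name used only in this sketch.
# For each record on standard input it prints the number of fields,
# the first field, and the fourth field.
field_summary() {
    awk '{ print NF, $1, $4 }'
}

field_summary < countries
# prints:
# 4 USSR Asia
# 5 Canada North
# 4 China Asia
```

Because blanks as well as tabs separate fields by default, the record for Canada has five fields and $4 holds only North; America lands in $5.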
