awk provides one-dimensional arrays. Arrays and array elements need not be declared; like variables, they spring into existence by being mentioned. An array subscript may be a number or a string.
As an example of a conventional numeric subscript, the statement
    x[NR] = $0
assigns the current input line to the NRth element of the array x. In fact, it is possible to read the entire input into an array with the program
    { x[NR] = $0 }
    END { . . . processing . . . }
The first action merely records each input line in the array x, indexed by line number; processing is done in the END action.
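As a minimal sketch of this read-everything-then-process pattern, the following fills in the END action with one possible kind of processing: printing the input in reverse order. The shell function name reverse_lines and the sample input are assumptions made for illustration; they are not part of the original program.

```sh
# reverse_lines is a hypothetical helper name chosen for this sketch.
reverse_lines() {
    awk '
        { x[NR] = $0 }                 # record each input line, indexed by line number
        END {
            for (i = NR; i >= 1; i--)  # walk the stored lines from last to first
                print x[i]
        }
    '
}

# Made-up sample input; prints third, second, first on separate lines.
printf 'first\nsecond\nthird\n' | reverse_lines
```

Because every line is held in memory until END, this approach is only reasonable for inputs that fit comfortably in memory.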
Array elements may also be named by nonnumeric values. For example, the following program accumulates the total population of Asia and Africa into the associative array pop. The END action prints the total population of these two continents.
    /Asia/   { pop["Asia"] += $3 }
    /Africa/ { pop["Africa"] += $3 }
    END {
        print "Asian population in millions is", pop["Asia"]
        print "African population in millions is", pop["Africa"]
    }
On the file countries, this program prints the two totals, one line per continent.
In this program, if you had used pop[Asia] instead of pop["Asia"], the expression would have used the value of the variable Asia as the subscript; since that variable is uninitialized, the values would have been accumulated in pop[""].
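The difference between a quoted and an unquoted subscript can be seen in a small sketch. The function name subscript_demo, the array names quoted and unquoted, and the input numbers are all assumptions made for illustration.

```sh
# subscript_demo is a hypothetical helper; the input is made-up numbers, one per line.
subscript_demo() {
    awk '
        {
            quoted["Asia"] += $1   # the subscript is the string "Asia"
            unquoted[Asia] += $1   # Asia is an uninitialized variable, so the subscript is ""
        }
        END {
            print "quoted subscript total:", quoted["Asia"]
            print "empty-string subscript total:", unquoted[""]
        }
    '
}

printf '10\n20\n' | subscript_demo
```

Both lines report 30: the unquoted form accumulated everything under the empty string rather than under "Asia".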
Suppose your task is to determine the total area in each continent of the file countries. Any expression can be used as a subscript in an array reference. Thus
    area[$4] += $2
uses the string in the fourth field of the current input record to index the array area and, in that entry, accumulates the value of the second field:
    BEGIN { FS = "\t" }
    { area[$4] += $2 }
    END { for (name in area) print name, area[name] }
Invoked on the file countries, this program produces
    Africa 1888
    South America 4358
    North America 7467
    Australia 2968
    Asia 13611
This program uses a form of the for statement that iterates over all defined subscripts of an array:
    for (i in array) statement
executes statement with the variable i set in turn to each value of i for which array[i] has been defined. The loop is executed once for each defined subscript, in an order that is unspecified and may differ between implementations. Results are unpredictable when i or array is altered during the loop.
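Since the iteration order cannot be relied on, one common way to get repeatable output is to sort it outside awk. The sketch below assumes a hypothetical helper named sum_by_key and made-up input consisting of a key and a value on each line.

```sh
# sum_by_key is a hypothetical helper; input lines are made-up "key value" pairs.
sum_by_key() {
    awk '
        { total[$1] += $2 }      # any expression, here the first field, can be a subscript
        END {
            for (k in total)     # visits every defined subscript, in no guaranteed order
                print k, total[k]
        }
    ' | sort                     # sort outside awk to get a repeatable order
}

# Prints "a 1" and then "b 5".
printf 'b 2\na 1\nb 3\n' | sum_by_key
```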
awk does not provide multi-dimensional arrays, but it does permit a list of subscripts. They are combined into a single subscript with the values separated by an unlikely string (stored in the variable SUBSEP). For example,
    for (i = 1; i <= 10; i++)
        for (j = 1; j <= 10; j++)
            arr[i,j] = ...
creates an array that behaves like a two-dimensional array; the subscript is the concatenation of i, SUBSEP, and j.
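To make the role of SUBSEP concrete, here is a small sketch that stores a 2-by-2 table, reads one element back through an explicitly concatenated key, and splits each composite key into its parts again. The function name grid_demo and the stored values (i * j) are assumptions chosen for illustration.

```sh
# grid_demo is a hypothetical helper; the stored values i * j are arbitrary.
grid_demo() {
    awk '
        BEGIN {
            for (i = 1; i <= 2; i++)
                for (j = 1; j <= 2; j++)
                    arr[i, j] = i * j      # stored under the key i SUBSEP j
            # The explicit concatenation names the same element as arr[2, 2].
            print arr[2 SUBSEP 2]
            # A composite key can be split back into its parts on SUBSEP.
            for (key in arr) {
                split(key, part, SUBSEP)
                sum += part[1] + part[2]
            }
            print sum
        }
    '
}

# Prints 4, then 12 (the sum of all row and column indices).
grid_demo
```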
You can determine whether a particular subscript i occurs in an array arr by testing the condition i in arr, as in
    if ("Africa" in area) ...
This condition performs the test without the side effect of creating area["Africa"], which would happen if you used
    if (area["Africa"] != "") ...
Note that neither is a test of whether the array area contains an element with the value "Africa".
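The side effect can be observed by counting the elements of the array after each kind of test. This is a sketch only; the function name membership_demo and the single starting element are assumptions made for illustration.

```sh
# membership_demo is a hypothetical helper that counts elements after each test.
membership_demo() {
    awk '
        BEGIN {
            area["Asia"] = 1
            if ("Africa" in area) print "unexpected"      # test only; creates nothing
            n = 0; for (k in area) n++
            print "after in test:", n
            if (area["Africa"] != "") print "unexpected"  # the reference itself creates the element
            n = 0; for (k in area) n++
            print "after reference:", n
        }
    '
}

# Prints "after in test: 1" and then "after reference: 2".
membership_demo
```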
It is also possible to split any string into fields in the elements of an array using the built-in function split. The call
    split("s1:s2:s3", a, ":")
splits the string s1:s2:s3 into three fields, using the separator :, and stores s1 in a[1], s2 in a[2], and s3 in a[3]. The number of fields found, here three, is returned as the value of split. The third argument of split is an extended regular expression to be used as the field separator. If the third argument is missing, FS is used as the field separator.
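The following sketch uses the return value of split to drive a loop and then uses a multi-character regular expression as the separator. The function name split_demo and the second sample string are assumptions made for illustration.

```sh
# split_demo is a hypothetical helper; the string "x12y345z" is made-up sample data.
split_demo() {
    awk '
        BEGIN {
            n = split("s1:s2:s3", a, ":")   # n is the number of fields found
            for (i = 1; i <= n; i++)
                print i, a[i]
            # The separator is a regular expression: here, one or more digits.
            m = split("x12y345z", b, "[0-9]+")
            print m, b[1], b[2], b[3]
        }
    '
}

# Prints "1 s1", "2 s2", "3 s3", and then "3 x y z".
split_demo
```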
An array element may be deleted with the delete statement:
    delete arrayname[subscript]
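As a brief sketch of the effect, the example below deletes one element and then checks both the membership test and the remaining subscripts. The function name delete_demo and the stored values are assumptions made for illustration.

```sh
# delete_demo is a hypothetical helper; the stored values are arbitrary.
delete_demo() {
    awk '
        BEGIN {
            area["Asia"] = 1
            area["Africa"] = 2
            delete area["Africa"]          # remove the element, not just its value
            if ("Africa" in area)
                print "Africa still present"
            else
                print "Africa deleted"
            for (k in area)                # only the surviving subscript is visited
                print k
        }
    '
}

# Prints "Africa deleted" and then "Asia".
delete_demo
```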