The following sections summarize the features of awk.
awk program filenames
awk -f program-file filenames
awk -Fs sets field separator to string s; -Ft sets separator to tab
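As a minimal sketch of the two invocation forms, the commands below run one program given inline with -F and one read from a file with -f. The sample input and the temporary program file are made up for illustration.

```shell
# Hypothetical program file, created only for this sketch.
tmp=$(mktemp)
printf '{ print $2 }\n' > "$tmp"

# Inline program; -F: sets the field separator to a colon.
out_inline=$(printf 'a:b:c\n' | awk -F: '{ print $3 }')

# Program read from a file with -f; default field separator.
out_file=$(printf 'a b c\n' | awk -f "$tmp")

rm -f "$tmp"
printf '%s %s\n' "$out_inline" "$out_file"
```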
BEGIN
END
/extended regular expression/
relational expression
pattern && pattern
pattern || pattern
(pattern)
!pattern
pattern, pattern
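The pattern forms listed above can be combined in one program. The following is an illustrative sketch only; the input records and the messages printed are invented for the example.

```shell
# Sample input is made up for illustration.
input='alpha 1
beta 20
start 3
gamma 40
stop 5
delta 60'

out=$(printf '%s\n' "$input" | awk '
BEGIN            { print "begin" }             # runs before any input is read
/^b/             { print "regex:", $1 }        # extended regular expression
$2 > 30 && !/^d/ { print "relational:", $1 }   # relational, combined with && and !
/start/, /stop/  { print "range:", $1 }        # pattern, pattern selects a range
END              { print "end" }               # runs after all input is read
')
printf '%s\n' "$out"
```

Rules are tried in order for each record, so the record "gamma 40" is reported by the relational rule before the range rule.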
if (expr) statement [else statement]
if (subscript in array) statement [else statement]
while (expr) statement
for (expr; expr; expr) statement
for (var in array) statement
do statement while (expr)
break
continue
next
exit [expr]
return [expr]
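A short sketch exercising several of the statements above. The variable names (words, count, total, seen) are hypothetical, and the result does not depend on the order in which for (var in array) visits the subscripts, since that order is not specified.

```shell
out=$(awk 'BEGIN {
    n = split("a b a c a", words, " ")
    for (i = 1; i <= n; i++)          # for (expr; expr; expr)
        count[words[i]]++
    total = 0
    for (w in count)                  # for (var in array), order unspecified
        total += count[w]
    if ("a" in count)                 # if (subscript in array)
        seen = "yes"
    else
        seen = "no"
    i = 0
    while (i < 3)                     # while (expr) statement
        i++
    do { i-- } while (i > 1)          # do statement while (expr)
    print total, count["a"], seen, i
}')
printf '%s\n' "$out"
```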
close(filename) | close file |
getline | set $0 from next input record; set NF, NR, FNR |
getline <file | set $0 from next record of file; set NF |
getline var | set var from next input record; set NR, FNR |
getline var <file | set var from next record of file |
print | print current record |
print expr-list | print expressions |
print expr-list >file | print expressions on file |
printf fmt, expr-list | format and print |
printf fmt, expr-list >file | format and print on file |
system(cmd-line) | execute command cmd-line, return status |
In print and printf above, >>file appends to the file, and | command writes on a pipe. Similarly, command | getline pipes into getline. getline returns 0 on end of file, and -1 on error.
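The return values of getline can be used to drive a loop, as in this sketch. The command string and the file path are placeholders chosen for the example; the path is assumed not to exist.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    n = 0
    while ((cmd | getline line) > 0)      # positive while records remain, 0 at end of file
        lines[++n] = line
    close(cmd)
    # Hypothetical path that is assumed not to exist, so getline reports an error.
    status = (getline junk < "/nonexistent/file")
    print n, lines[1], lines[2], status
}')
printf '%s\n' "$out"
```

Testing for "> 0" rather than for a non-zero value keeps the loop from running forever when getline returns -1.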
func name(parameter list) { statement }
function name(parameter list) { statement }
function-name(expr, expr, ...)
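A minimal sketch of defining and calling functions, using the function keyword. The names max and fact are invented for the example.

```shell
out=$(awk '
function max(a, b) { return a > b ? a : b }
function fact(n)   { return n <= 1 ? 1 : n * fact(n - 1) }   # recursion is allowed
BEGIN { print max(3, 7), fact(5) }
')
printf '%s\n' "$out"
```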
gsub(r,s,t) | substitute string s for each substring matching extended regular expression r in string t, return number of substitutions; if t omitted, use $0 |
index(s,t) | return index of string t in string s, or 0 if not present |
length(s) | return length of string s |
match(s,r) | return position in s where extended regular expression r occurs, or 0 if r is not present |
split(s,a,r) | split string s into array a on extended regular expression r, return number of fields; if r omitted, FS is used in its place |
sprintf(fmt, expr-list) | print expr-list according to fmt, return resulting string |
sub(r,s,t) | like gsub except only the first matching substring is replaced |
substr(s,i,n) | return n-char substring of s starting at i; if n omitted, use rest of s |
atan2(y,x) | arctangent of y/x in radians |
cos(expr) | cosine (angle in radians) |
exp(expr) | exponential |
int(expr) | truncate to integer |
log(expr) | natural logarithm |
rand() | random number between 0 and 1 |
sin(expr) | sine (angle in radians) |
sqrt(expr) | square root |
srand(expr) | new seed for random number generator; use time of day if no expr |
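The sketch below calls several of the built-in functions listed above on made-up strings. It copies s into t before calling gsub, because gsub modifies its third argument in place.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    t = s
    n = gsub(/o/, "0", t)                 # t is changed in place, n counts substitutions
    print n, t
    print index(s, "wor"), length(s), substr(s, 7), substr(s, 1, 5)
    print match(s, /o+/), RSTART, RLENGTH
    k = split("2024-01-15", parts, "-")
    print k, parts[1], parts[3]
    print sprintf("%05.1f", 3.14159), int(-3.7), sqrt(16)
}')
printf '%s\n' "$out"
```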
= += -= *= /= %= ^= | assignment |
?: | conditional expression |
|| | logical OR |
&& | logical AND |
~ !~ | extended regular expression match, negated match |
< <= > >= != == | relationals |
blank | string concatenation |
+ - | add, subtract |
* / % | multiply, divide, mod |
+ - ! | unary plus, unary minus, logical negation |
^ | exponentiation (** is a synonym) |
++ -- | increment, decrement (prefix and postfix) |
$ | field |
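A few of the operators above in use. This sketch assumes the usual awk precedence, in which arithmetic binds more tightly than concatenation and concatenation more tightly than comparison; the variable names are hypothetical.

```shell
out=$(awk 'BEGIN {
    x = 5; x += 2; x ^= 2                 # assignment operators: x is now 49
    print x, 1 + 2 * 3, 7 % 3, 2 ^ 3
    print 1 " " 2 + 3                     # 2 + 3 is evaluated before concatenation
    size = (x > 10) ? "big" : "small"     # conditional expression
    m  = ("abc" ~ /^a/)                   # match yields 1
    nm = ("abc" !~ /c$/)                  # negated match yields 0 here
    print size, m, nm
}')
printf '%s\n' "$out"
```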
c | matches non-metacharacter c |
\c | matches literal character c |
. | matches any character but newline |
^ | matches beginning of line or string |
$ | matches end of line or string |
[abc...] | character class matches any of abc... |
[^abc...] | negated class matches any but abc... |
r1|r2 | matches either r1 or r2 |
r1r2 | concatenation: matches r1, then r2 |
r+ | matches one or more r's |
r* | matches zero or more r's |
r? | matches zero or one r's |
r{low,high} | matches at least low r's but no more than high |
(r) | grouping: matches r |
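To illustrate some of the regular expression forms above, this sketch collects the input words matched by each pattern. The word list is made up, and the interval form r{low,high} is left out because support for it varies between implementations.

```shell
out=$(printf '%s\n' 'cat' 'cart' 'c.t' 'coat' 'ct' 'dog' | awk '
/^c.t$/       { a = a " " $0 }   # . matches any single character
/^c\.t$/      { b = b " " $0 }   # \. matches a literal dot
/^c[ao]+t$/   { c = c " " $0 }   # one or more characters from the class
/^ca?r*t$/    { d = d " " $0 }   # optional a, then zero or more r
/^(cat|dog)$/ { e = e " " $0 }   # grouping and alternation
END { print a; print b; print c; print d; print e }
')
printf '%s\n' "$out"
```

Each output line begins with a blank because every match is appended as " " followed by the word.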
ARGC | number of command-line arguments |
ARGV | array of command-line arguments (0..ARGC-1) |
FILENAME | name of current input file |
FNR | input record number in current file |
FS | input field separator (default blank) |
NF | number of fields in current input record |
NR | input record number since beginning |
OFMT | output format for numbers (default %.6g) |
OFS | output field separator (default blank) |
ORS | output record separator (default newline) |
RS | input record separator (default newline) |
RSTART | index of first character matched by match(); 0 if no match |
RLENGTH | length of string matched by match(); -1 if no match |
SUBSEP | separates multiple subscripts in array elements; default \034 |
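A sketch showing several built-in variables together. The input is invented; assigning to a field, as in $1 = $1, causes $0 to be rebuilt using OFS, which is why the printed records use "-" as the separator.

```shell
out=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "-" }
{ $1 = $1; print NR ": " NF " fields: " $0 }   # $0 is rebuilt with OFS
END {
    t[1,2] = "v"                   # stored under the subscript 1 SUBSEP 2
    key = 1 SUBSEP 2
    found = (key in t)
    print "records: " NR ", subsep key found: " found
}')
printf '%s\n' "$out"
```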
Any particular implementation of awk enforces some limits. Here are typical values:
100 fields
2500 characters per input record
2500 characters per output record
1024 characters per individual field
1024 characters per printf string
400 characters maximum quoted string
400 characters in character class
15 open files
1 pipe
numbers are limited to what can be represented on the local machine, for example, 1e-38..1e+38
Each variable and field can potentially be a string or a number or both at any time. When a variable is set by the assignment
    var = expr
its type is set to that of the expression. (Assignment includes +=, -=, and so on.) An arithmetic expression is of type number, a concatenation is of type string, and so on. If the assignment is a simple copy, as in
    v1 = v2
then the type of v1 becomes that of v2.
In comparisons, if both operands are numeric,
the comparison is made numerically.
Otherwise, operands are coerced to string if necessary, and the
comparison is made on strings.
The type of any expression can be coerced to numeric by subterfuges such as
    expr + 0
and to string by
    expr ""
(that is, concatenation with a null string).
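The effect of these coercions on a comparison can be sketched as follows. The variables a, b, and n are hypothetical; a and b hold string constants, so comparing them directly is a string comparison.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"                       # string constants
    r1 = (a < b) ? "lt" : "ge"              # compared as strings: "10" sorts before "9"
    r2 = (a + 0 < b + 0) ? "lt" : "ge"      # coerced to numbers: 10 is not less than 9
    n = 10
    r3 = (n "" < b) ? "lt" : "ge"           # n coerced to the string "10"
    print r1, r2, r3
}')
printf '%s\n' "$out"
```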
Uninitialized variables have the numeric value 0 and the string value "". Accordingly, if x is uninitialized,
    if (x) ...
is false, and
    if (!x) ...
    if (x == 0) ...
    if (x == "") ...
are all true. But the following is false:
    if (x == "0") ...
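The five tests above can be checked directly. In this sketch each test that succeeds appends a letter to r, a name chosen only for the example; x is never assigned.

```shell
out=$(awk 'BEGIN {
    r = ""
    if (x)        r = r "A"    # false: x is uninitialized
    if (!x)       r = r "B"    # true
    if (x == 0)   r = r "C"    # true: numeric value is 0
    if (x == "")  r = r "D"    # true: string value is ""
    if (x == "0") r = r "E"    # false: compared as strings, "" is not "0"
    print r
}')
printf '%s\n' "$out"
```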
The type of a field is determined by context when possible; for example,
    $1++
clearly implies that $1 is to be numeric, and
    $1 = $1 "," $2
implies that $1 and $2 are both to be strings. Coercion is done as needed.
In contexts where types cannot be reliably determined, for example,
    if ($1 == $2) ...
the type of each field is determined on input. All fields are strings; also, each field that contains only a number is also considered numeric.
Fields that are explicitly null have the string value ""; they are not numeric. Non-existent fields (that is, fields past NF) are treated this way, too.
As it is for fields, so it is for array elements created by split.
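As a sketch of how field types decided on input affect comparisons, the program below prints each pair of fields followed by the results of $1 == $2 and $1 < $2. The three input records are made up for illustration.

```shell
out=$(printf '10 10.0\n10 9\n10x 9\n' | awk '{
    e = ($1 == $2)             # 1 if equal, 0 otherwise
    l = ($1 < $2)              # 1 if less, 0 otherwise
    print $1, $2, e, l
}')
printf '%s\n' "$out"
```

The first two records hold only numbers, so they are compared numerically: 10 equals 10.0, and 10 is not less than 9. In the third record 10x is not a number, so the comparison is made on strings, and "10x" sorts before "9".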
Mentioning a variable in an expression causes it to exist, with the value "" as described above. Thus, if arr[i] does not currently exist,
    if (arr[i] == "") ...
causes it to exist with the value "" so the if is satisfied. The special construction
    if (i in arr) ...
determines if arr[i] exists without the side effect of creating it if it does not.
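The side effect can be observed by testing membership before and after the comparison, as in this sketch. The subscript "k" and the variable names are hypothetical.

```shell
out=$(awk 'BEGIN {
    before = ("k" in arr) ? "yes" : "no"       # no element yet, and none is created
    if (arr["k"] == "") touched = "yes"        # the mention creates the element
    after = ("k" in arr) ? "yes" : "no"
    print before, touched, after
}')
printf '%s\n' "$out"
```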