12. Running awk and gawk
This chapter covers how to run awk, both POSIX-standard
and gawk-specific command-line options, and what
awk and
gawk do with non-option arguments.
It then proceeds to cover how gawk searches for source files,
obsolete options and/or features, and known bugs in gawk.
This chapter rounds out the discussion of awk
as a program and as a language.
While a number of the options and features described here were discussed in passing earlier in the book, this chapter provides the full details.
12.1 Invoking awk                        How to run awk.
12.2 Command-Line Options                Command-line options and their meanings.
12.3 Other Command-Line Arguments        Input file names and variable assignments.
12.4 The AWKPATH Environment Variable    Searching directories for awk programs.
12.5 Obsolete Options and/or Features    Obsolete options and/or features.
12.6 Undocumented Options and Features
12.7 Known Bugs in gawk
12.1 Invoking awk
There are two ways to run awk: with an explicit program or with
one or more program files. Here are templates for both of them; items
enclosed in [...] in these templates are optional:
     awk [options] -f progfile [--] file ...
     awk [options] [--] 'program' file ...
Besides traditional one-letter POSIX-style options, gawk also
supports GNU long options.
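As an illustration of the two ways of running awk, the following sketch runs the same one-line program once as command-line text and once from a program file. The file names `data.txt' and `swap.awk' and the shell variable names are hypothetical, chosen only for this example:

```shell
# Hypothetical scratch directory and file names, used only for illustration.
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$tmpdir/data.txt"
printf '{ print $2, $1 }\n' > "$tmpdir/swap.awk"

# Form 1: the program text is given directly on the command line.
inline_out=$(awk '{ print $2, $1 }' "$tmpdir/data.txt")

# Form 2: the same program is read from a file named with -f.
file_out=$(awk -f "$tmpdir/swap.awk" "$tmpdir/data.txt")

printf '%s\n' "$inline_out"
printf '%s\n' "$file_out"
rm -r "$tmpdir"
```

Both invocations should print the two fields of each line in swapped order.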
It is possible to invoke awk with an empty program:
     awk '' datafile1 datafile2
Doing so makes little sense though; awk exits
silently when given an empty program.
(d.c.)
If `--lint' has
been specified on the command-line, gawk issues a
warning that the program is empty.
12.2 Command-Line Options
Options begin with a dash and consist of a single character. GNU-style long options consist of two dashes and a keyword. The keyword can be abbreviated, as long as the abbreviation allows the option to be uniquely identified. If the option takes an argument, then the keyword is either immediately followed by an equals sign (`=') and the argument's value, or the keyword and the argument's value are separated by whitespace. If a particular option with a value is given more than once, it is the last value that counts.
Each long option for gawk has a corresponding
POSIX-style option.
The long and short options are
interchangeable in all contexts.
The options and their meanings are as follows:
-F fs
--field-separator fs
Sets the FS variable to fs
(see section Specifying How Fields Are Separated).
-f source-file
--file source-file
Indicates that the awk program is to be found in source-file
instead of in the first non-option argument.
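-v var=val
--assign var=val
Sets the variable var to the value val before execution of the
program begins. Such variable values are available inside the BEGIN rule
(see section Other Command-Line Arguments).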
The `-v' option can only set one variable, but it can be used more than once, setting another variable each time, like this: `awk -v foo=1 -v bar=2 ...'.
Caution: Using `-v' to set the values of the built-in
variables may lead to surprising results. awk will reset the
values of those variables as it needs to, possibly ignoring any
predefined value you may have given.
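-mf N
-mr N
Sets various memory limits to the value N. The `f' flag sets the
maximum number of fields and the `r' flag sets the maximum record size.
These two flags and the `-m' option are from the Bell Laboratories
research version of Unix awk. They are provided
for compatibility but otherwise ignored by
gawk, since gawk has no predefined limits.
(The Bell Laboratories awk no longer needs these options;
it continues to accept them to avoid breaking old programs.)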
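-W gawk-opt
Following the POSIX standard, implementation-specific options are
supplied as arguments to the `-W' option. These options also have
corresponding GNU-style long options. The full list of
gawk-specific options is provided next.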
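--
Signals the end of the command-line options. The following arguments
are not treated as options even if they begin with `-'.
This is useful if you have file names that start with `-', or in shell scripts, if you have file names that will be specified by the user that could start with `-'.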
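The following sketch combines several of the options just described. It assumes a hypothetical colon-separated data file named `-data' and a hypothetical program file `prog.awk'; the variable names `min' and `label' are likewise invented for the example:

```shell
tmpdir=$(mktemp -d)
# Hypothetical data file whose name starts with a dash.
printf 'root:x:0\nuser:x:1000\n' > "$tmpdir/-data"
# Hypothetical program file; min and label are set with -v below.
printf '$3 >= min { print label, $1, $3 }\n' > "$tmpdir/prog.awk"

# -F sets FS, each -v sets one variable, and -- ends option processing,
# so -data is taken as an input file rather than as a group of options.
opts_out=$(cd "$tmpdir" && awk -F: -v min=1 -v label=uid -f prog.awk -- -data)
printf '%s\n' "$opts_out"
rm -r "$tmpdir"
```

Only the record whose third field is at least 1 should be printed, prefixed with the label.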
The previous list described options mandated by the POSIX standard,
as well as options available in the Bell Laboratories version of awk.
The following list describes gawk-specific options:
-W compat
-W traditional
--compat
--traditional
Specifies compatibility mode, in which the GNU extensions to the
awk language are disabled, so that gawk behaves just
like the Bell Laboratories research version of Unix awk.
`--traditional' is the preferred form of this option.
See section Extensions in gawk Not in POSIX awk,
which summarizes the extensions. Also see
Downward Compatibility and Debugging.
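-W copyright
--copyright
Prints the short version of the General Public License and then exits.
-W copyleft
--copyleft
Just like `--copyright'.
This option may disappear in a future version of gawk.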
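-W dump-variables[=file]
--dump-variables[=file]
Prints a sorted list of global variables, their types, and final values
to file. If no file is provided, gawk prints this
list to a file named `awkvars.out' in the current directory.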
Having a list of all the global variables is a good way to look for
typographical errors in your programs.
You would also use this option if you have a large program with a lot of
functions, and you want to be sure that your functions don't
inadvertently use global variables that you meant to be local.
(This is a particularly easy mistake to make with simple variable
names like i, j, and so on.)
-W gen-po
--gen-po
Analyzes the source program and generates a GNU
gettext Portable Object file on standard
output for all string constants that have been marked for translation.
See section Internationalization with gawk,
for information about this option.
-W help
-W usage
--help
--usage
Prints a "usage" message summarizing the short and long style options
that gawk accepts and then exits.
-W lint[=fatal]
--lint[=fatal]
Warns about constructs that are dubious or nonportable to
other awk implementations.
Some warnings are issued when gawk first reads your program. Others
are issued at runtime, as your program executes.
With an optional argument of `fatal',
lint warnings become fatal errors.
This may be drastic, but its use will certainly encourage the
development of cleaner awk programs.
-W lint-old
--lint-old
Warns about constructs that are not available in the original version of
awk from Version 7 Unix
(see section Major Changes Between V7 and SVR3.1).
-W non-decimal-data
--non-decimal-data
Enables automatic interpretation of octal and hexadecimal values in input data.
Caution: This option can severely break old programs. Use with care.
-W posix
--posix
Operates in strict POSIX mode. This disables all gawk
extensions (just like `--traditional') and adds the following additional
restrictions:
\x escape sequences are not recognized
(see section 3.2 Escape Sequences).
Newlines do not act as whitespace to separate fields when FS is
equal to a single space
(see section Examining Fields).
The synonym func for the keyword function is not
recognized (see section Function Definition Syntax).
Specifying `-Ft' on the command-line does not set the value of
FS to be a single tab character
(see section Specifying How Fields Are Separated).
The fflush built-in function is not supported
(see section Input/Output Functions).
If you supply both `--traditional' and `--posix' on the
command-line, `--posix' takes precedence. gawk
also issues a warning if both options are supplied.
-W profile[=file]
--profile[=file]
Enables profiling of awk programs
(see section Profiling Your awk Programs).
By default, profiles are created in a file named `awkprof.out'.
The optional file argument allows you to specify a different
file name for the profile file.
When run with gawk, the profile is just a "pretty printed" version
of the program. When run with pgawk, the profile contains execution
counts for each statement in the program in the left margin, and function
call counts for each function.
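-W re-interval
--re-interval
Allows interval expressions in regexps.
Because interval expressions were traditionally not available in awk,
gawk does not provide them by default. This prevents old awk
programs from breaking.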
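-W source program-text
--source program-text
Program source code is taken from program-text. This option allows you
to mix source code in files with source code that you enter on the
command-line. It is particularly useful when you have library functions
that you want to use from your command-line programs
(see section The AWKPATH Environment Variable).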
-W version
--version
Prints version information for this particular copy of gawk.
This allows you to determine if your copy of gawk is up to date
with respect to whatever the Free Software Foundation is currently
distributing.
It is also useful for bug reports
(see section Reporting Problems and Bugs).
As long as program text has been supplied, any other options are flagged as invalid with a warning message but are otherwise ignored.
In compatibility mode, as a special case, if the value of fs supplied
to the `-F' option is `t', then FS is set to the tab
character ("\t"). This is only true for `--traditional' and not
for `--posix'
(see section Specifying How Fields Are Separated).
The `-f' option may be used more than once on the command-line.
If it is, awk reads its program source from all of the named files, as
if they had been concatenated together into one big file. This is
useful for creating libraries of awk functions. These functions
can be written once and then retrieved from a standard place, instead
of having to be included into each individual program.
(As mentioned in
Function Definition Syntax,
function names must be unique.)
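A minimal sketch of the library idea follows. The file names `lib.awk' and `main.awk' and the function name `twice' are hypothetical; the point is only that two `-f' options are read as if they were one program:

```shell
tmpdir=$(mktemp -d)

# Hypothetical library file holding one reusable function.
cat > "$tmpdir/lib.awk" <<'EOF'
function twice(n) { return 2 * n }
EOF

# Hypothetical main program that relies on the library function.
cat > "$tmpdir/main.awk" <<'EOF'
{ print twice($1) }
EOF

# Both files are read as though concatenated into a single program.
lib_out=$(printf '3\n10\n' | awk -f "$tmpdir/lib.awk" -f "$tmpdir/main.awk")
printf '%s\n' "$lib_out"
rm -r "$tmpdir"
```

Keeping the function in its own file means other programs can name the same library with an extra `-f' rather than copying the definition.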
Library functions can still be used, even if the program is entered at the terminal, by specifying `-f /dev/tty'. After typing your program, type Ctrl-d (the end-of-file character) to terminate it. (You may also use `-f -' to read program source from the standard input but then you will not be able to also use the standard input as a source of data.)
Because it is clumsy using the standard awk mechanisms to mix source
file and command-line awk programs, gawk provides the
`--source' option. This does not require you to pre-empt the standard
input for your source code; it allows you to easily mix command-line
and library source code
(see section The AWKPATH Environment Variable).
If no `-f' or `--source' option is specified, then gawk
uses the first non-option command-line argument as the text of the
program source code.
If the environment variable POSIXLY_CORRECT exists,
then gawk behaves in strict POSIX mode, exactly as if
you had supplied the `--posix' command-line option.
Many GNU programs look for this environment variable to turn on
strict POSIX mode. If `--lint' is supplied on the command-line
and gawk turns on POSIX mode because of POSIXLY_CORRECT,
then it issues a warning message indicating that POSIX
mode is in effect.
You would typically set this variable in your shell's startup file.
For a Bourne-compatible shell (such as bash), you would add these
lines to the `.profile' file in your home directory:
     POSIXLY_CORRECT=true
     export POSIXLY_CORRECT
For a csh-compatible shell,
you would add this line to the `.login' file in your home directory:
     setenv POSIXLY_CORRECT true
Having POSIXLY_CORRECT set is not recommended for daily use,
but it is good for testing the portability of your programs to other
environments.
12.3 Other Command-Line Arguments
Any additional arguments on the command-line are normally treated as
input files to be processed in the order specified. However, an
argument that has the form var=value assigns
the value value to the variable var; it does not specify a
file at all.
(This was discussed earlier in
Assigning Variables on the Command Line.)
All these arguments are made available to your awk program in the
ARGV array (see section 7.5 Built-in Variables). Command-line options
and the program text (if present) are omitted from ARGV.
All other arguments, including variable assignments, are
included. As each element of ARGV is processed, gawk
sets the variable ARGIND to the index in ARGV of the
current element.
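To see what ends up in ARGV, a short sketch such as the following can help. The argument names `x=1' and `nofile.txt' are hypothetical, and the file does not need to exist because a program consisting only of a BEGIN rule never opens its input files. ARGV[0] is skipped here because its value depends on the awk implementation:

```shell
# The -v option and the program text are left out of ARGV; the assignment
# x=1 and the (hypothetical) file name nofile.txt are kept.
argv_out=$(awk -v a=1 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' x=1 nofile.txt)
printf '%s\n' "$argv_out"
```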
The distinction between file name arguments and variable-assignment
arguments is made when awk is about to open the next input file.
At that point in execution, it checks the file name to see whether
it is really a variable assignment; if so, awk sets the variable
instead of reading a file.
Therefore, the variables actually receive the given values after all
previously specified files have been read. In particular, the values of
variables assigned in this fashion are not available inside a
BEGIN rule
(see section The BEGIN and END Special Patterns),
because such rules are run before awk begins scanning the argument list.
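The timing difference can be observed with a small experiment. In this sketch the variable name `n' and the file name `data.txt' are hypothetical; the first command assigns `n' as a command-line argument, and the second uses `-v':

```shell
tmpdir=$(mktemp -d)
printf 'line\n' > "$tmpdir/data.txt"

# n is assigned as a command-line argument, so the BEGIN rule runs
# before the assignment takes effect.
arg_out=$(awk 'BEGIN { print "BEGIN: n=[" n "]" }
{ print "main: n=[" n "]" }' n=5 "$tmpdir/data.txt")

# With -v, the assignment happens before the BEGIN rule runs.
v_out=$(awk -v n=5 'BEGIN { print "BEGIN: n=[" n "]" }')

printf '%s\n' "$arg_out" "$v_out"
rm -r "$tmpdir"
```

In the first run the BEGIN rule should see an empty value while the main rule sees 5; in the second run the BEGIN rule should already see 5.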
The variable values given on the command-line are processed for escape sequences (see section 3.2 Escape Sequences). (d.c.)
In some earlier implementations of awk, when a variable assignment
occurred before any file names, the assignment would happen before
the BEGIN rule was executed. awk's behavior was thus
inconsistent; some command-line assignments were available inside the
BEGIN rule, while others were not. Unfortunately,
some applications came to depend
upon this "feature." When awk was changed to be more consistent,
the `-v' option was added to accommodate applications that depended
upon the old behavior.
The variable assignment feature is most useful for assigning to variables
such as RS, OFS, and ORS, which control input and
output formats before scanning the data files. It is also useful for
controlling state if multiple passes are needed over a data file. For
example:
     awk 'pass == 1  { pass 1 stuff }
          pass == 2  { pass 2 stuff }' pass=1 mydata pass=2 mydata
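As one possible concrete instance of the two-pass pattern, the following sketch totals the second field on the first pass and prints each value as a percentage of that total on the second. The data and the variable name `total' are invented for the example:

```shell
tmpdir=$(mktemp -d)
printf 'a 1\nb 3\n' > "$tmpdir/mydata"

# Pass 1 accumulates the total; pass 2 reports each value's share of it.
# The assignments pass=1 and pass=2 take effect just before each reading
# of the file.
pass_out=$(awk 'pass == 1 { total += $2 }
pass == 2 { printf "%s %d%%\n", $1, 100 * $2 / total }' \
    pass=1 "$tmpdir/mydata" pass=2 "$tmpdir/mydata")

printf '%s\n' "$pass_out"
rm -r "$tmpdir"
```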
Given the variable assignment feature, the `-F' option for setting
the value of FS is not
strictly necessary. It remains for historical compatibility.
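The equivalence can be checked directly. In this sketch the file name `passwd.txt' is hypothetical; the first command uses `-F', and the second assigns FS as a command-line argument placed before the data file:

```shell
tmpdir=$(mktemp -d)
printf 'root:x:0\n' > "$tmpdir/passwd.txt"

# Field separator set with the -F option.
dashF_out=$(awk -F: '{ print $1 }' "$tmpdir/passwd.txt")

# Field separator set by a variable assignment that precedes the file name.
assign_out=$(awk '{ print $1 }' FS=: "$tmpdir/passwd.txt")

printf '%s\n' "$dashF_out" "$assign_out"
rm -r "$tmpdir"
```

One difference worth keeping in mind: the assignment form takes effect only when awk reaches that argument, so unlike `-F' it is not visible inside a BEGIN rule.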
12.4 The AWKPATH Environment Variable
The previous sections noted that awk program files can be named
on the command-line with the `-f' option.
In most awk
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
But in gawk, if the file name supplied to the `-f' option
does not contain a `/', then gawk searches a list of
directories (called the search path), one by one, looking for a
file with the specified name.
The search path is a string consisting of directory names
separated by colons. gawk gets its search path from the
AWKPATH environment variable. If that variable does not exist,
gawk uses a default path, which is
`.:/usr/local/share/awk'. (Programs written for use by
system administrators should use an AWKPATH variable that
does not include the current directory, `.'.)
The search path feature is particularly useful for building libraries
of useful awk functions. The library files can be placed in a
standard directory in the default path and then specified on
the command-line with a short file name. Otherwise, the full file name
would have to be typed for each file.
By using both the `--source' and `-f' options, your command-line
awk programs can use facilities in awk library files.
See section A Library of awk Functions.
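The combination might look like the following sketch, which assumes that gawk is installed under the name `gawk'. The directory `lib', the file `twice.awk', and the function `twice' are hypothetical. The library is found through AWKPATH because the name given to `-f' contains no `/':

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/lib"
# Hypothetical library file placed in a directory named by AWKPATH.
printf 'function twice(n) { return 2 * n }\n' > "$tmpdir/lib/twice.awk"

if command -v gawk >/dev/null 2>&1; then
    # -f names the library by its short name; --source supplies the
    # command-line part of the program.
    awkpath_out=$(AWKPATH="$tmpdir/lib" gawk -f twice.awk --source 'BEGIN { print twice(21) }')
else
    # This example depends on gawk-specific features.
    awkpath_out="gawk not installed"
fi

printf '%s\n' "$awkpath_out"
rm -r "$tmpdir"
```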
Path searching is not done if gawk is in compatibility mode.
This is true for both `--traditional' and `--posix'.
See section Command-Line Options.
Note: If you want files in the current directory to be found, you must include the current directory in the path, either by including `.' explicitly in the path or by writing a null entry in the path. (A null entry is indicated by starting or ending the path with a colon or by placing two colons next to each other (`::').) If the current directory is not included in the path, then files cannot be found in the current directory. This path search mechanism is identical to the shell's.
Starting with version 3.0, if AWKPATH is not defined in the
environment, gawk places its default search path into
ENVIRON["AWKPATH"]. This makes it easy to determine
the actual search path that gawk will use
from within an awk program.
While you can change ENVIRON["AWKPATH"] within your awk
program, this has no effect on the running program's behavior. This makes
sense: the AWKPATH environment variable is used to find the program
source files. Once your program is running, all the files have been
found, and gawk no longer needs to use AWKPATH.
12.5 Obsolete Options and/or Features
This section describes features and/or command-line options from
previous releases of gawk that are either not available in the
current version or that are still supported but deprecated (meaning that
they will not be in the next release).
For version 3.1 of gawk, there are no
deprecated command-line options
from the previous version of gawk.
The use of `next file' (two words) for nextfile was deprecated
in gawk 3.0 but still worked. Starting with version 3.1, the
two word usage is no longer accepted.
The process-related special files described in
Special Files for Process-Related Information
work as described but
are now considered deprecated.
gawk prints a warning message every time they are used.
(Use PROCINFO instead; see
Built-in Variables That Convey Information.)
They will be removed from the next release of gawk.
12.6 Undocumented Options and Features
Use the Source, Luke!
Obi-Wan
This section intentionally left blank.
12.7 Known Bugs in gawk
The `-F' option for changing the value of FS
(see section Command-Line Options)
is not necessary given the command-line variable
assignment feature; it remains only for backwards compatibility.