|
|
yacc then turns the specification into a C language function that examines the input stream. This function, called a parser, works by calling the low-level scanner. The scanner, called a lexical analyzer, picks up items from the input stream. The selected items are known as tokens. Tokens are compared to the input construct rules, called grammar rules. When one of the rules is recognized, the code you have supplied for the rule is invoked. This code is called an action. Actions are fragments of C language code. They can return values and make use of values returned by other actions.
The heart of the yacc specification is the collection of grammar rules. Each rule describes a construct and gives it a name. For example, one grammar rule might be
date : month_name day ´,´ year ;where date, month_name, day, and year represent constructs of interest; presumably, month_name, day, and year are defined in greater detail elsewhere. In the example, the comma is enclosed in single quotes. This means that the comma is to appear literally in the input. The colon and semicolon merely serve as punctuation in the rule and have no significance in evaluating the input. With proper definitions, the input
July 4, 1776might be matched by the rule.
The lexical analyzer is an important part of the parsing function. This user-supplied routine reads the input stream, recognizes the lower-level constructs, and communicates these as tokens to the parser. The lexical analyzer recognizes constructs of the input stream as terminal symbols; the parser recognizes constructs as nonterminal symbols. To avoid confusion, we will refer to terminal symbols as tokens.
There is considerable leeway in deciding whether to recognize constructs using the lexical analyzer or grammar rules. For example, the rules
month_name : 'J' 'a' 'n' ; month_name : 'F' 'e' 'b' ; . . . month_name : 'D' 'e' 'c' ;might be used in the above example. While the lexical analyzer only needs to recognize individual letters, such low-level rules tend to waste time and space, and may complicate the specification beyond the ability of yacc to deal with it. Usually, the lexical analyzer recognizes the month names and returns an indication that a month_name is seen. In this case, month_name is a token and the detailed rules are not needed.
Literal characters such as a comma must also be passed through the lexical analyzer and are also considered tokens.
Specification files are very flexible. It is relatively easy to add to the above example the rule
date : month '/' day '/' year ;allowing
7/4/1776as a synonym for
July 4, 1776on input. In most cases, this new rule could be slipped into a working system with minimal effort and little danger of disrupting existing input.
The input being read may not conform to the specifications. With a left-to-right scan, input errors are detected as early as is theoretically possible. Thus, not only is the chance of reading and computing with bad input data substantially reduced, but the bad data usually can be found quickly. Error handling, provided as part of the input specifications, permits the reentry of bad data or the continuation of the input process after skipping over the bad data.
In some cases, yacc fails to produce a parser when given a set of
specifications.
For example, the specifications may be self-contradictory, or they may
require a more powerful recognition
mechanism than that available to yacc.
The former cases represent design errors;
the latter cases often
can be corrected
by making
the lexical analyzer
more powerful or by rewriting some of the grammar rules.
While yacc cannot handle all possible specifications, its power
compares favorably with similar systems.
Moreover, the
constructs that are difficult for yacc to handle are
also frequently difficult for human beings to handle.
Some users have reported
that the discipline of formulating valid
yacc specifications for their input revealed errors of
conception or design early in program development.
The remainder of this topic describes the following subjects:
In addition, there are two examples and a summary of the yacc input syntax.