(flex.info.gz) Lex and Posix

 20 Incompatibilities with Lex and Posix
 ***************************************
 
 `flex' is a rewrite of the AT&T Unix _lex_ tool (the two
 implementations do not share any code, though), with some extensions and
 incompatibilities, both of which are of concern to those who wish to
 write scanners acceptable to both implementations.  `flex' is fully
 compliant with the POSIX `lex' specification, except that when using
 `%pointer' (the default), a call to `unput()' destroys the contents of
 `yytext', which is counter to the POSIX specification.  In this section
 we discuss all of the known areas of incompatibility between `flex',
 AT&T `lex', and the POSIX specification.  `flex''s `-l' option turns on
 maximum compatibility with the original AT&T `lex' implementation, at
 the cost of a major loss in the generated scanner's performance.  We
 note below which incompatibilities can be overcome using the `-l'
 option.  `flex' is fully compatible with `lex' with the following
 exceptions:
 
    * The undocumented `lex' scanner internal variable `yylineno' is not
      supported unless `-l' or `%option yylineno' is used.
 
    * `yylineno' should be maintained on a per-buffer basis, rather than
      a per-scanner (single global variable) basis.
 
    * `yylineno' is not part of the POSIX specification.
 
    * The `input()' routine is not redefinable, though it may be called
      to read characters following whatever has been matched by a rule.
      If `input()' encounters an end-of-file the normal `yywrap()'
      processing is done.  A "real" end-of-file is returned by `input()'
      as `EOF'.
 
    * Input is instead controlled by defining the `YY_INPUT()' macro.
 
    * The `flex' restriction that `input()' cannot be redefined is in
      accordance with the POSIX specification, which simply does not
      specify any way of controlling the scanner's input other than by
      making an initial assignment to `yyin'.
 
    * The `unput()' routine is not redefinable.  This restriction is in
      accordance with POSIX.
 
    * `flex' scanners are not as reentrant as `lex' scanners.  In
      particular, if you have an interactive scanner and an interrupt
      handler which long-jumps out of the scanner, and the scanner is
      subsequently called again, you may get the following message:
 
 
               fatal flex scanner internal error--end of buffer missed
 
      To reenter the scanner, first use:
 
 
               yyrestart( yyin );
 
      Note that this call will throw away any buffered input; usually
      this isn't a problem with an interactive scanner. 
      See Reentrant, for `flex''s reentrant API.
 
    * Also note that `flex' C++ scanner classes _are_ reentrant, so if
      using C++ is an option for you, you should use them instead.
      See Cxx, and Reentrant, for details.
 
    * `output()' is not supported.  Output from the ECHO macro is done
      to the file-pointer `yyout' (default `stdout').
 
    * `output()' is not part of the POSIX specification.
 
    * `lex' does not support exclusive start conditions (%x), though they
      are in the POSIX specification.
 
    * When definitions are expanded, `flex' encloses them in parentheses.
      With `lex', the following:
 
 
               NAME    [A-Z][A-Z0-9]*
               %%
               foo{NAME}?      printf( "Found it\n" );
               %%
 
      will not match the string `foo' because when the macro is expanded
      the rule is equivalent to `foo[A-Z][A-Z0-9]*?'  and the precedence
      is such that the `?' is associated with `[A-Z0-9]*'.  With `flex',
      the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
      string `foo' will match.
 
    * Note that if the definition begins with `^' or ends with `$' then
      it is _not_ expanded with parentheses, to allow these operators to
      appear in definitions without losing their special meanings.  But
      the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
      definition.
 
    * Using `-l' results in the `lex' behavior of no parentheses around
      the definition.
 
    * The POSIX specification is that the definition be enclosed in
      parentheses.
 
    * Some implementations of `lex' allow a rule's action to begin on a
      separate line, if the rule's pattern has trailing whitespace:
 
 
               %%
               foo|bar<space here>
                 { foobar_action();}
 
      `flex' does not support this feature.
 
    * The `lex' `%r' (generate a Ratfor scanner) option is not
      supported.  It is not part of the POSIX specification.
 
    * After a call to `unput()', _yytext_ is undefined until the next
      token is matched, unless the scanner was built using `%array'.
      This is not the case with `lex' or the POSIX specification.  The
      `-l' option does away with this incompatibility.
 
    * The precedence of the `{,}' (numeric range) operator is different.
      The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
      as "match one, two, or three occurrences of `abc'", whereas `flex'
      interprets it as "match `ab' followed by one, two, or three
      occurrences of `c'".  The `-l' and `--posix' options do away with
      this incompatibility.
 
    * The precedence of the `^' operator is different.  `lex' interprets
      `^foo|bar' as "match either `foo' at the beginning of a line, or
      `bar' anywhere", whereas `flex' interprets it as "match either
      `foo' or `bar' if they come at the beginning of a line".  The
      latter is in agreement with the POSIX specification.
 
    * The special table-size declarations such as `%a' supported by
      `lex' are not required by `flex' scanners.  `flex' ignores them.
 
    * The name `FLEX_SCANNER' is `#define''d so scanners may be written
      for use with either `flex' or `lex'.  Scanners also include
      `YY_FLEX_MAJOR_VERSION', `YY_FLEX_MINOR_VERSION' and
      `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
      generated the scanner.  For example, for the 2.5.22 release, these
      defines would be 2, 5 and 22 respectively.  If the version of
      `flex' being used is a beta version, then the symbol `FLEX_BETA'
      is defined.
 
    * The symbols `[[' and `]]' in the code sections of the input may
      conflict with the m4 delimiters.  See M4 Dependency.
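
 As a rough illustration of the definition-expansion rule described
 above, the following C sketch performs the textual substitution with
 and without the enclosing parentheses.  The helper
 `expand_definition' is hypothetical and is not part of `flex' or
 `lex'; it handles only the first `{NAME}' reference and is meant to
 show the resulting pattern text, not how either tool is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of flex or lex): textually substitute
   `definition` for the first "{name}" reference in `pattern`.
   When `parenthesize` is nonzero the definition is wrapped in
   parentheses, unless it begins with '^' or ends with '$'.
   Returns 0 on success, -1 if the reference is absent or a buffer
   is too small. */
static int expand_definition(const char *pattern, const char *name,
                             const char *definition, int parenthesize,
                             char *out, size_t outsize)
{
    char ref[64];
    const char *hit;
    size_t deflen;
    int n;

    assert(pattern != NULL && name != NULL);
    assert(definition != NULL && out != NULL);

    /* Build the "{name}" reference that appears in the rule. */
    n = snprintf(ref, sizeof ref, "{%s}", name);
    if (n < 0 || (size_t)n >= sizeof ref)
        return -1;

    hit = strstr(pattern, ref);
    if (hit == NULL)
        return -1;

    /* Anchored definitions stay bare so '^' and '$' keep their meaning. */
    deflen = strlen(definition);
    if (deflen > 0 &&
        (definition[0] == '^' || definition[deflen - 1] == '$'))
        parenthesize = 0;

    /* Prefix, optional "(", definition, optional ")", then the suffix. */
    n = snprintf(out, outsize, "%.*s%s%s%s%s",
                 (int)(hit - pattern), pattern,
                 parenthesize ? "(" : "",
                 definition,
                 parenthesize ? ")" : "",
                 hit + strlen(ref));
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

 Under these assumptions, expanding `foo{NAME}?' with `parenthesize'
 set to 0 gives `foo[A-Z][A-Z0-9]*?', and with it set to 1 gives
 `foo([A-Z][A-Z0-9]*)?'.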
 
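 The two readings of `abc{1,3}' described above can be made explicit
 by adding parentheses.  The following C sketch compares them using
 the POSIX `<regex.h>' interface, which is assumed to be available on
 the build system.  The helper `full_match' and the two pattern
 constants are hypothetical names; the regular-expression library is
 only a stand-in for comparing the readings and says nothing about
 how `lex' or `flex' compile patterns.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if `text` matches the POSIX extended
   regular expression `ere`, 0 if it does not, and -1 if the pattern
   fails to compile.  The patterns below carry their own ^ and $
   anchors, so a match covers the whole string. */
static int full_match(const char *ere, const char *text)
{
    regex_t re;
    int rc;

    assert(ere != NULL && text != NULL);

    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Reading attributed above to AT&T lex and POSIX:
   one to three occurrences of "abc". */
static const char *const READING_LEX_POSIX = "^(abc){1,3}$";

/* Reading attributed above to flex:
   "ab" followed by one to three occurrences of "c". */
static const char *const READING_FLEX = "^ab(c){1,3}$";
```

 With these patterns, `abcabc' satisfies only the first reading and
 `abccc' satisfies only the second.  Writing the parentheses
 explicitly in a rule is one way to get the same behavior from either
 tool.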
 
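 The `FLEX_SCANNER' and version macros mentioned above can be tested
 with the C preprocessor.  The following is a minimal sketch of code
 that might be placed in the user code section of a scanner
 specification; only the macro names come from the text above, while
 `describe_generator', `GENERATOR_SUFFIX' and the label strings are
 hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical suffix marking a beta release of flex. */
#ifdef FLEX_BETA
#define GENERATOR_SUFFIX " (beta)"
#else
#define GENERATOR_SUFFIX ""
#endif

/* Hypothetical helper: write a short label naming the scanner
   generator into `out`.  Returns 1 when compiled as part of a
   flex-generated scanner and 0 otherwise. */
static int describe_generator(char *out, size_t outsize)
{
    assert(out != NULL && outsize > 0);

#if defined(FLEX_SCANNER) && defined(YY_FLEX_MAJOR_VERSION) && \
    defined(YY_FLEX_MINOR_VERSION) && defined(YY_FLEX_SUBMINOR_VERSION)
    /* flex, with all three version macros available. */
    snprintf(out, outsize, "flex %d.%d.%d%s",
             YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION,
             YY_FLEX_SUBMINOR_VERSION, GENERATOR_SUFFIX);
    return 1;
#elif defined(FLEX_SCANNER)
    /* flex, but without the full set of version macros. */
    snprintf(out, outsize, "flex (version unknown)%s", GENERATOR_SUFFIX);
    return 1;
#else
    /* lex, or any other build where FLEX_SCANNER is not defined. */
    snprintf(out, outsize, "not flex");
    return 0;
#endif
}
```

 Compiled outside a `flex'-generated scanner, the function takes the
 last branch and returns 0.  The same `#ifdef FLEX_SCANNER' test can
 guard calls to `flex'-only facilities such as `yyrestart()'.
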
    The following `flex' features are not included in `lex' or the POSIX
 specification:
 
    * C++ scanners
 
    * %option
 
    * start condition scopes
 
    * start condition stacks
 
    * interactive/non-interactive scanners
 
    * yy_scan_string() and friends
 
    * yyterminate()
 
    * yy_set_interactive()
 
    * yy_set_bol()
 
    * YY_AT_BOL()
 
    * <<EOF>>
 
    * <*>
 
    * YY_DECL
 
    * YY_START
 
    * YY_USER_ACTION
 
    * YY_USER_INIT
 
    * #line directives
 
    * %{}'s around actions
 
    * reentrant C API
 
    * multiple actions on a line
 
    * almost all of the `flex' command-line options
 
    The feature "multiple actions on a line" refers to the fact that
 with `flex' you can put multiple actions on the same line, separated
 with semi-colons, while with `lex', the following:
 
 
          foo    handle_foo(); ++num_foos_seen;
 
    is (rather surprisingly) truncated to
 
 
          foo    handle_foo();
 
    `flex' does not truncate the action.  Actions that are not enclosed
 in braces are simply terminated at the end of the line.
 