44 The Rx Regular Expression Library
[FIXME: this is taken from Gary and Mark's quick summaries and should be
reviewed and expanded. Rx is pretty stable, so could already be done!]
The `guile-lang-allover' package provides an interface to Tom Lord's
Rx library (currently only to POSIX regular expressions). Use of the
library requires a two step process: compile a regular expression into
an efficient structure, then use the structure in any number of string
comparisons.
For example, given the regular expression `abc.' (which matches any
string containing `abc' followed by any single character):
guile> (define r (regcomp "abc."))
guile> (regexec r "abc")
#f
guile> (regexec r "abcd")
#((0 . 4))
The definitions of `regcomp' and `regexec' are as follows:
-- Scheme Procedure: regcomp pattern [flags]
Compile the regular expression PATTERN using POSIX rules. FLAGS is
optional and should be specified using the following symbolic names:
-- Variable: REG_EXTENDED
use extended POSIX syntax
-- Variable: REG_ICASE
use case-insensitive matching
-- Variable: REG_NEWLINE
allows anchors to match after newline characters in the string
and prevents `.' or `[^...]' from matching newlines.
The `logior' procedure can be used to combine multiple flags. The
default is to use POSIX basic syntax, which makes `+' and `?'
literals and `\+' and `\?' operators. Backslashes in PATTERN must
be escaped if specified in a literal string e.g., `"\\(a\\)\\?"'.
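
As a minimal sketch of the flag handling described above, the following
compiles the same optional group twice: once with the default basic
syntax, and once with extended syntax combined with case-insensitive
matching. The names `basic-r' and `ext-r' are illustrative only, and the
code assumes the Rx interface is already loaded.

```scheme
;; Basic POSIX syntax (the default): the group and the `?' operator
;; need backslashes, and each backslash is doubled inside the
;; Scheme string literal.
(define basic-r (regcomp "\\(a\\)\\?"))

;; Extended syntax, case-insensitive: combine the flags with logior.
(define ext-r (regcomp "(a)?" (logior REG_EXTENDED REG_ICASE)))
```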
-- Scheme Procedure: regexec regex string [match-pick] [flags]
Match STRING against the compiled POSIX regular expression REGEX.
MATCH-PICK and FLAGS are optional. Possible flags (which can be
combined using the `logior' procedure) are:
-- Variable: REG_NOTBOL
The beginning-of-line operator `^' will not match the beginning of
STRING (for use when STRING does not start at the beginning of a
line).
-- Variable: REG_NOTEOL
Similar to REG_NOTBOL, but prevents the end-of-line operator `$'
from matching the end of STRING.
If no match is possible, regexec returns #f. Otherwise MATCH-PICK
determines the return value:
`#t' or unspecified: a newly-allocated vector is returned,
containing pairs with the indices of the matched part of STRING
and any substrings.
`""': a list is returned: the first element contains a nested list
with the matched part of STRING surrounded by the unmatched
parts. Remaining elements are matched substrings (if any). All
returned substrings share memory with STRING.
`#f': regexec returns #t if a match is made, otherwise #f.
vector: the supplied vector is returned, with the first element
replaced by a pair containing the indices of the matched portion
of STRING and further elements replaced by pairs containing the
indices of matched substrings (if any).
list: a list will be returned, with each member of the list
specified by a code in the corresponding position of the supplied
list:
  a number: the numbered matching substring (0 for the entire match).
  `#\<': the beginning of STRING to the beginning of the matched part.
  `#\>': the end of the matched part of STRING to the end of STRING.
  `#\c': the "final tag", which seems to be associated with the "cut
  operator", which doesn't seem to be available through the POSIX
  interface.
e.g., `(list #\< 0 1 #\>)'. The returned substrings share memory
with STRING.
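
The MATCH-PICK forms described above can be sketched as follows. This is
an illustration under the assumption that the Rx interface is loaded;
the names `r', `text' and `anchored' are hypothetical, and return values
are not shown because they depend on the installed library.

```scheme
(define r (regcomp "abc."))
(define text "xxabcdyy")

;; Default (same as passing #t): a fresh vector of index pairs.
(regexec r text)

;; #f: only report whether a match exists.
(regexec r text #f)

;; A list of codes: the text before the match, the whole match,
;; and the text after it.
(regexec r text (list #\< 0 #\>))

;; FLAGS is the fourth argument, so MATCH-PICK has to be supplied
;; explicitly whenever flags are needed.
(define anchored (regcomp "^abc"))
(regexec anchored "abcd" #f REG_NOTBOL)
```

Going by the description of REG_NOTBOL, the last call would be expected
to report no match, since `^' is not allowed to match at the start of
the string.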
Here are some other procedures that may be useful when working with
regular expressions:
-- Scheme Procedure: compiled-regexp? obj
Test whether obj is a compiled regular expression.
-- Scheme Procedure: regexp->dfa regex [flags]
-- Scheme Procedure: dfa-fork dfa
-- Scheme Procedure: reset-dfa! dfa
-- Scheme Procedure: dfa-final-tag dfa
-- Scheme Procedure: dfa-continuable? dfa
-- Scheme Procedure: advance-dfa! dfa string