
regcmp(S-osr5)


regcmp, regex -- compiles and executes regular expressions

Syntax



cc . . . -lc

#include <libgen.h>

char *regcmp (string1 [, string2, . . . ], (char *)0)
char *string1, *string2, . . . ;

char *regex (re, subject [, ret0, . . . ])
char *re, *subject, *ret0, . . . ;

extern char *__loc1;

Description

The regcmp routine compiles a regular expression (consisting of the concatenated arguments) and returns a pointer to the compiled form. Space for the compiled form is obtained with malloc(S-osr5). The caller is responsible for releasing it with free(S-osr5) when it is no longer needed. A NULL return from regcmp indicates an incorrect argument. The regcmp(CP) command, which precompiles regular expressions into C source (see Example 3), has been written to generally preclude the need for this routine at execution time.
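regcmp itself is not available on every system. As a minimal sketch of the calling convention described above (a (char *)0-terminated list of pieces that are concatenated, compiled, and returned in malloc'd storage that the caller must free), the hypothetical helpers compile_concat and free_compiled below substitute POSIX regcomp with REG_EXTENDED for the compiler. The pattern syntax is therefore POSIX ERE, not the regcmp syntax described on this page.

```c
#include <regex.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates a (char *)0-terminated argument list,
   compiles the result as a POSIX ERE (an assumption; not regcmp syntax),
   and returns a heap-allocated regex_t, or NULL on an invalid pattern. */
regex_t *compile_concat(const char *first, ...)
{
    va_list ap;
    size_t len = 0;
    const char *s;

    /* First pass: total length of all pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    char *pattern = malloc(len + 1);
    if (pattern == NULL)
        return NULL;
    pattern[0] = '\0';

    /* Second pass: append each piece in order. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        strcat(pattern, s);
    va_end(ap);

    regex_t *re = malloc(sizeof *re);
    if (re != NULL && regcomp(re, pattern, REG_EXTENDED) != 0) {
        free(re);               /* bad pattern: report failure as NULL */
        re = NULL;
    }
    free(pattern);              /* the compiled form no longer needs it */
    return re;
}

/* Hypothetical counterpart to free(ptr) in the regcmp examples. */
void free_compiled(regex_t *re)
{
    if (re != NULL) {
        regfree(re);
        free(re);
    }
}
```

As with regcmp, every successful call must eventually be paired with a release; otherwise a loop that compiles patterns repeatedly leaks memory.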

The regex routine executes a compiled pattern against the subject string. Additional arguments, if supplied, receive the substrings matched by subexpressions marked with ( ... )$n. regex returns NULL on failure, or on success a pointer to the next unmatched character. The global character pointer __loc1 points to where the match began.
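The return convention of regex (NULL on failure, otherwise a pointer just past the match, with __loc1 marking the start) can be sketched on top of POSIX regexec. The wrapper name match_like_regex and its loc1 out-parameter are hypothetical; the out-parameter stands in for the global __loc1, and the pattern is assumed to be compiled with regcomp rather than regcmp.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical analogue of regex(re, subject): on success returns a pointer
   to the first character after the match and stores the match start in
   *loc1 (the role __loc1 plays); on failure returns NULL. */
const char *match_like_regex(const regex_t *re, const char *subject,
                             const char **loc1)
{
    regmatch_t m;

    if (regexec(re, subject, 1, &m, 0) != 0)
        return NULL;                 /* no match: same convention as regex */
    if (loc1 != NULL)
        *loc1 = subject + m.rm_so;   /* where the match began */
    return subject + m.rm_eo;        /* next unmatched character */
}
```

Returning a pointer into the subject rather than an index lets a caller resume scanning from the returned position, which is how the cursor/newcursor pattern in the examples uses regex.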

The regex and regcmp routines were borrowed from the editor, ed(C); however, the syntax and semantics have been changed slightly.

The following are the symbols understood by regex and regcmp, and their meanings.


[]*.^
These symbols retain their meanings from ed(C).

$
Matches the end of the string; \n matches a new-line.

-
Within brackets, the minus means ``through''. For example, [a-z] is equivalent to [abcd...xyz]. The ``-'' can appear as itself only if it is the first or last character. For example, the character class expression []-] matches the characters ] and -.

+
A regular expression followed by ``+'' means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*.

{m} {m,} {m,u}
Integer values enclosed in ``{}'' indicate the number of times the preceding regular expression is to be applied. The value m is the minimum number of times, and u, which must be less than 256, is the maximum. If only m is present (for example, {m}), it indicates the exact number of times the regular expression is to be applied. The form {m,} is analogous to {m,infinity}. The plus (``+'') and star (``*'') operations are equivalent to {1,} and {0,}, respectively.

( ... )$n
The string matched by the enclosed regular expression is returned: it is stored in the (n+1)th argument following the subject argument. At most ten enclosed regular expressions are allowed. regex makes its assignments unconditionally.

( ... )
Parentheses are used for grouping. An operator such as ``*'', ``+'', or ``{}'' can work on a single character or on a regular expression enclosed in parentheses, as in (a*(cb+)*)$0.

Because all the symbols defined above are special, they must be escaped with a \ (backslash) to be used as themselves.
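POSIX extended regular expressions share the ``+'', {m,u}, and grouping operators described above, so their equivalences can be checked with regcomp/regexec. The sketch below is an assumption-laden stand-in, not regcmp itself: POSIX numbers groups by the position of each ``('' instead of using an explicit $n suffix, and the hypothetical helper capture_group copies one group into a caller buffer, much as ( ... )$n copies into a ret argument.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compiles pattern as a POSIX ERE, runs it on subject,
   and copies parenthesized group `group` (0 = whole match) into buf.
   Returns 0 on success, -1 on a bad pattern, no match, an unset group,
   or a buffer too small for the text plus its terminating NUL. */
int capture_group(const char *pattern, const char *subject,
                  size_t group, char *buf, size_t bufsize)
{
    regex_t re;
    regmatch_t m[10];   /* whole match plus up to nine groups */
    int rc = -1;

    if (group >= 10 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 10, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < bufsize) {
            memcpy(buf, subject + m[group].rm_so, len);
            buf[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

For example, [0-9]+, [0-9][0-9]*, and [0-9]{1,} all capture "1234" from "ab1234cd", while a{2,3} captures only "aaa" from "aaaaa" because u caps the repetition.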

Notes

A program may run out of memory if it calls regcmp repeatedly without freeing compiled patterns that are no longer needed.

See also

ed(C), free(S-osr5), malloc(S-osr5), re_comp(S-osr5)

Standards conformance

regcmp, regex and __loc1 are not part of any currently supported standard; they are an extension of AT&T System V provided by the Santa Cruz Operation.

Examples

Example 1:
   char *cursor, *newcursor, *ptr;
           ...
   newcursor = regex((ptr = regcmp("\n", (char *)0)), cursor);
   free(ptr);
This example matches a leading new-line in the subject string pointed at by cursor.

Example 2:

   char ret0[9];
   char *newcursor, *name;
          ...
   name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *)0);
   newcursor = regex(name, "012Testing345", ret0);
This example matches through the string ``Testing3'' and returns the address of the character after the last matched character (the ``4''). The string ``Testing3'' is copied to the character array ret0.
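On a system without regcmp, the same match can be sketched with POSIX regcomp/regexec, assuming the pattern is rewritten as an ERE by dropping the regcmp-only $0 suffix and reading group 1 instead. The function name match_name is hypothetical; it fills ret0 the way regex does and returns the pointer regex would return.

```c
#include <regex.h>
#include <string.h>

/* Sketch of Example 2 using POSIX regcomp/regexec instead of regcmp/regex.
   Copies the captured name into ret0 (at least 9 bytes) and returns a
   pointer to the character after the match, or NULL if nothing matched. */
const char *match_name(const char *subject, char ret0[9])
{
    regex_t re;
    regmatch_t m[2];          /* m[0]: whole match, m[1]: the group */
    const char *next = NULL;

    /* Same pattern as Example 2, minus the regcmp-only $0 suffix. */
    if (regcomp(&re, "([A-Za-z][A-Za-z0-9]{0,7})", REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);  /* at most 8 */
        memcpy(ret0, subject + m[1].rm_so, len);
        ret0[len] = '\0';
        next = subject + m[0].rm_eo;
    }
    regfree(&re);
    return next;
}
```

Called with "012Testing345", this copies "Testing3" into ret0 and returns a pointer to the ``4''. The {0,7} bound is what makes a 9-byte ret0 sufficient: at most one letter plus seven more characters, plus the terminating NUL.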

Example 3:

   #include "file.i"
   char *string, *newcursor;
          ...
   newcursor = regex(name, string);
This example applies a precompiled regular expression in file.i against string. See regcmp(CP).

Example 4:

   char *ptr, *newcursor;

   ptr = regcmp("[a-[=i=][:digit:]]*", (char *)0);
   newcursor = regex(ptr, "123CHICO321");

It is assumed in this example that the current locale's collation rules specify the following sequence:
   A,a,B,b,C,c,CH,Ch,ch,D,d,E,e,F,f,G,g,H,h,I,i.....
The characters I and i are also both in the same ``primary'' collation group.

The following characters are all members of the digit ctype class:

   0, 1, 2, 3, 4, 5, 6, 7, 8, 9
This example matches through the string ``123CHIC'' and returns the address of the character ``O'' in the string.
© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 02 June 2005