lex generates a file named lex.yy.c. When lex.yy.c is compiled and linked with the lex library, the resulting program copies its input to its output except when a string specified in the file is found; it then executes the corresponding program text. The actual string matched is left in yytext, an external character array. When more than one pattern can match, lex chooses the one that matches the longest string; if several patterns match strings of the same length, the one that appears first in the file is used.
The patterns may contain square brackets to indicate character classes, as in ``[abx-z]'' to indicate ``a'', ``b'', ``x'', ``y'', and ``z''; and the operators ``*'', ``+'', and ``?'' mean, respectively, any non-negative number of, any positive number of, and either zero or one occurrence of, the previous character or character class. Thus, ``[a-zA-Z]+'' matches a string of letters. The character ``.'' is the class of all characters except new-line. Parentheses for grouping and vertical bar for alternation are also supported. The notation r{d,e} in a rule indicates between d and e instances of regular expression r. It has higher precedence than |, but lower than *, ?, +, and concatenation. The character ``^'' at the beginning of an expression permits a successful match only immediately after a new-line, and the character ``$'' at the end of an expression requires a trailing new-line. The character ``/'' in an expression indicates trailing context; only the part of the expression up to the slash is returned in yytext, but the remainder of the expression must follow in the input stream. An operator character may be used as an ordinary symbol if it is within ``"'' symbols or preceded by ``\''.
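lex resolves competing patterns by preferring the longest match and breaking ties by rule order. The following is a minimal sketch of that choice, assuming hand-coded recognizers in place of the automaton lex actually generates. The recognizers cover four patterns from the example at the end of this section (if, [a-z]+, 0{O}+ with O defined as [0-7], and {D}+ with D defined as [0-9]); the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical recognizers standing in for the automaton lex builds.
 * Each returns the length of the longest prefix of s that its pattern
 * matches, or 0 if the pattern matches nothing there.
 */
static size_t match_if(const char *s)          /* if */
{
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

static size_t match_word(const char *s)        /* [a-z]+ */
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

static size_t match_octal(const char *s)       /* 0[0-7]+ */
{
    size_t n = 1;
    if (s[0] != '0')
        return 0;
    while (s[n] >= '0' && s[n] <= '7')
        n++;
    return n > 1 ? n : 0;                      /* need at least one octal digit */
}

static size_t match_decimal(const char *s)     /* [0-9]+ */
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* Rules in the order they appear in the lex source. */
static size_t (*const rules[])(const char *) = {
    match_if, match_word, match_octal, match_decimal
};

/*
 * Return the index of the rule lex would choose at the start of s
 * (or -1 if none matches) and store the match length in *len:
 * the longest match wins; among equally long matches, the earliest rule wins.
 */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {        /* strictly longer, so ties keep the earlier rule */
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With these rules, ``if'' followed by a blank goes to the first rule (both if and [a-z]+ match two characters), ``iffy'' goes to [a-z]+ because that match is longer, and ``0189'' is a decimal number because 0[0-7]+ can only match ``01''.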
Three macros are used for character I/O: input() reads a character, unput(c) pushes the character c back onto the input so that it is read again, and output(c) writes the character c. They are defined in terms of the standard I/O library, reading from stdin and writing to stdout, but you can override them. The generated scanner is a function named yylex, and the lex library contains a main that calls it.
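The contract of the three macros can be illustrated with a small in-memory version. This is a sketch, not lex's own definitions: set_input, my_input, my_unput and my_output are hypothetical names, input comes from a string rather than stdin, and output goes to a buffer rather than stdout. Like the original lex input macro, my_input returns 0 at end of input (flex's input() returns EOF instead).

```c
/*
 * In-memory stand-ins for input(), unput(c) and output(c).
 * Pushed-back characters are kept on a small stack and are
 * returned before any further characters from the string.
 */
static const char *src;          /* remaining input; call set_input first */
static char pushback[64];        /* characters returned by my_unput */
static int npushed;
static char out[256];            /* collected output, NUL-terminated */
static int nout;

void set_input(const char *s)
{
    src = s;
    npushed = 0;
    nout = 0;
    out[0] = '\0';
}

/* Like input: the next character, or 0 at end of input. */
int my_input(void)
{
    if (npushed > 0)
        return (unsigned char)pushback[--npushed];  /* last pushed, first read */
    if (*src == '\0')
        return 0;
    return (unsigned char)*src++;
}

/* Like unput(c): the next my_input() call returns c. */
void my_unput(int c)
{
    if (npushed < (int)sizeof pushback)
        pushback[npushed++] = (char)c;
}

/* Like output(c): append one character to the output buffer. */
void my_output(int c)
{
    if (nout < (int)sizeof out - 1) {
        out[nout++] = (char)c;
        out[nout] = '\0';
    }
}
```

A stack is used for pushed-back characters so that several unput calls in a row are read back in reverse order, which is what a scanner backing up over a string it has read needs.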
Calling yymore() causes the next string matched to be appended to the current contents of yytext instead of replacing them. The function yyless(n) keeps the first n characters of yytext and pushes the remaining yyleng - n characters back into the input stream. (yyleng is an external int variable giving the length in bytes of yytext.) The function yywrap is called whenever the scanner reaches end of file; it returns 1 if normal wrapup should proceed, or 0 if more input has been arranged and scanning should continue. The action REJECT on the right side of a rule causes the current match to be rejected and the action for the next suitable match to be executed instead. The action ECHO on the right side of a rule is equivalent to printf("%s", yytext).
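The effect of yyless(n) and yymore() on yytext, yyleng, and the input position can be sketched on a scanner that keeps its whole input in one buffer. This is an illustration under that assumption only: struct scan, take(), take_more() and sketch_yyless() are hypothetical, and the buffer handling in a real lex.yy.c is organized differently.

```c
#include <string.h>

/* A toy scanner state; yytext and yyleng mirror lex's variables. */
struct scan {
    const char *buf;     /* whole input */
    size_t pos;          /* index of the next unread character */
    char yytext[128];    /* current token */
    size_t yyleng;       /* its length */
};

/* Record a match of len characters at the current position
   (the caller must ensure that many characters remain). */
void take(struct scan *s, size_t len)
{
    if (len >= sizeof s->yytext)
        len = sizeof s->yytext - 1;
    memcpy(s->yytext, s->buf + s->pos, len);
    s->yytext[len] = '\0';
    s->yyleng = len;
    s->pos += len;
}

/* What the next match does after yymore(): append instead of replace. */
void take_more(struct scan *s, size_t len)
{
    if (s->yyleng + len >= sizeof s->yytext)
        len = sizeof s->yytext - 1 - s->yyleng;
    memcpy(s->yytext + s->yyleng, s->buf + s->pos, len);
    s->yyleng += len;
    s->yytext[s->yyleng] = '\0';
    s->pos += len;
}

/* yyless(n): keep the first n characters, push back yyleng - n. */
void sketch_yyless(struct scan *s, size_t n)
{
    if (n > s->yyleng)
        return;
    s->pos -= s->yyleng - n;   /* those characters will be scanned again */
    s->yytext[n] = '\0';
    s->yyleng = n;
}
```

For example, after matching ``foobar'', yyless(3) leaves ``foo'' in yytext and makes ``bar'' the next input; if yymore() is then called and the next match is ``bar'', yytext becomes ``foobar'' again.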
Any line beginning with a blank is assumed to contain only C text and is copied; if it precedes ``%%'', it is copied into the external definition area of the lex.yy.c file. All rules should follow a %%, as in yacc. A line preceding %% that begins with a non-blank character defines the name at its start to stand for the remainder of the line; the name can be referenced later in a rule by enclosing it in ``{}''. In this section, C code (and preprocessor statements) can also be included between ``%{'' and ``%}''. Note that curly brackets do not imply parentheses; only string substitution is done.
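Because only string substitution is done, a definition is spliced into a pattern character for character. The following sketch of that expansion step assumes a simple table of definitions; expand() and struct def are hypothetical names. With AB defined as ab|cd, the pattern x{AB}y becomes xab|cdy, which matches ``xab'' or ``cdy'' rather than x(ab|cd)y. Later implementations may differ here; flex, for instance, wraps an expanded definition in parentheses by default.

```c
#include <string.h>

struct def { const char *name, *text; };

/*
 * Copy pat into out (capacity cap, at least 1), replacing each {NAME}
 * whose NAME is in defs by the definition text, with no parentheses added.
 * Braces that do not name a definition, such as the repetition in
 * r{2,3}, are copied unchanged.
 */
void expand(const char *pat, const struct def *defs, size_t ndefs,
            char *out, size_t cap)
{
    size_t o = 0;

    while (*pat != '\0' && o + 1 < cap) {
        const char *close = (*pat == '{') ? strchr(pat, '}') : NULL;
        const char *rep = NULL;

        if (close != NULL) {
            size_t len = (size_t)(close - pat - 1);   /* length of NAME */
            for (size_t i = 0; i < ndefs; i++)
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pat + 1, len) == 0)
                    rep = defs[i].text;
        }
        if (rep != NULL) {
            while (*rep != '\0' && o + 1 < cap)       /* splice in the text */
                out[o++] = *rep++;
            pat = close + 1;                          /* skip past the '}' */
        } else {
            out[o++] = *pat++;
        }
    }
    out[o] = '\0';
}
```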
The external names generated by lex all begin with the prefix yy or YY.
The flags must appear before any files.
Certain default table sizes are too small for some users. The table sizes for the resulting finite state machine can be set in the definitions section:
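The following is an example of a lex source file:

D    [0-9]
O    [0-7]
%{
/* Skip the rest of a C comment; the opening slash-star has been matched. */
void skipcommnts(void)
{
    int c;

    for (;;) {
        /* discard characters up to the next '*' */
        while ((c = input()) != '*')
            if (c == 0 || c == EOF)   /* end of input inside a comment */
                return;
        c = input();
        if (c == '/')                 /* closing star-slash found */
            return;
        if (c == 0 || c == EOF)
            return;
        unput(c);                     /* read it again: it may be another '*' */
    }
}
%}
%%
if        printf("IF statement\n");
[a-z]+    printf("tag, value %s\n", yytext);
0{O}+     printf("octal number %s\n", yytext);
{D}+      printf("decimal number %s\n", yytext);
"++"      printf("unary op\n");
"+"       printf("binary op\n");
"\n"      ;    /* no action */
"/*"      skipcommnts();
%%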
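The comment-skipping logic a routine such as skipcommnts() needs can be tried without lex by running the same loop over a NUL-terminated string. In this sketch (skip_comment() is a hypothetical name), p points just past the opening slash-star; the result points just past the closing star-slash, or is NULL if the comment is never closed. The character after each '*' is left unread when it is not '/', which is what unput(c) achieves in the scanner, so a run of stars such as ``**/'' still ends the comment.

```c
#include <stddef.h>

const char *skip_comment(const char *p)
{
    for (;;) {
        /* discard characters up to the next '*' */
        while (*p != '*') {
            if (*p == '\0')
                return NULL;      /* end of input inside the comment */
            p++;
        }
        p++;                      /* consume the '*' */
        if (*p == '/')
            return p + 1;         /* closing star-slash found */
        /* otherwise leave *p unread: it may be another '*' */
    }
}
```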