This module implements a simple "standard" lexical analyzer, presented as a function from character streams to token streams. It implements roughly the lexical conventions of Caml, but is parameterized by the set of keywords of your language.
type token =
    Kwd of string
  | Ident of string
  | Int of int
  | Float of float
  | String of string
  | Char of char
The type of tokens. The lexical classes are: Int and Float for integer and floating-point numbers; String for string literals, enclosed in double quotes; Char for character literals, enclosed in single quotes; Ident for identifiers (either sequences of letters, digits, underscores and quotes, or sequences of "operator characters" such as +, *, etc.); and Kwd for keywords (either identifiers or single "special characters" such as (, }, etc.).
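For instance, here is a minimal sketch (assuming the module is available under its usual name Genlex, together with the legacy Stream module) that tokenizes a short input and prints the class of each token:

    (* Sketch: tokenize a small input and print each token's lexical class.
       "let" and "=" are declared as keywords; "+" is not, so it lexes as
       an identifier made of operator characters. *)
    let () =
      let lexer = Genlex.make_lexer ["let"; "="] in
      let tokens = lexer (Stream.of_string "let x1 = 3 + 4.5 \"hi\" 'c'") in
      Stream.iter
        (function
          | Genlex.Kwd s    -> Printf.printf "Kwd %s\n" s
          | Genlex.Ident s  -> Printf.printf "Ident %s\n" s
          | Genlex.Int n    -> Printf.printf "Int %d\n" n
          | Genlex.Float f  -> Printf.printf "Float %g\n" f
          | Genlex.String s -> Printf.printf "String %s\n" s
          | Genlex.Char c   -> Printf.printf "Char %c\n" c)
        tokens
    (* Prints: Kwd let, Ident x1, Kwd =, Int 3, Ident +,
       Float 4.5, String hi, Char c *)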
val make_lexer: string list -> (char Stream.t -> token Stream.t)
Construct the lexer function. The first argument is the list of keywords. An identifier s is returned as Kwd s if s belongs to this list, and as Ident s otherwise. A special character s is returned as Kwd s if s belongs to this list, and causes a lexical error (exception Parse_error) otherwise. Blanks and newlines are skipped. Comments delimited by (* and *) are skipped as well, and can be nested.
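As a quick sketch of the whitespace and comment handling (again assuming the module is available as Genlex), the stream below yields only Int 1 followed by Int 2:

    (* Blanks, newlines, and nested comments are skipped by the lexer. *)
    let () =
      let lex = Genlex.make_lexer [] in
      let toks = lex (Stream.of_string "1 (* a (* nested *) comment *) 2") in
      Stream.iter
        (function Genlex.Int n -> Printf.printf "Int %d\n" n | _ -> ())
        toks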
Example: a lexer suitable for a desk calculator is obtained by

    let lexer = make_lexer ["+"; "-"; "*"; "/"; "let"; "="; "("; ")"]

The associated parser would be a function from token stream to, for instance, int, and would have rules such as:

    let rec parse_expr = parser
        [< 'Int n >] -> n
      | [< 'Kwd "("; n = parse_expr; 'Kwd ")" >] -> n
      | [< n1 = parse_expr; n2 = parse_remainder n1 >] -> n2
    and parse_remainder n1 = parser
        [< 'Kwd "+"; n2 = parse_expr >] -> n1 + n2
      | ...
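These rules use the camlp4 stream-parser syntax (the parser keyword), which requires that syntax extension to compile. Below is a hedged, self-contained sketch of the same calculator written with plain Stream functions (Stream.next, Stream.peek, Stream.junk) instead, so it runs without the extension; the flat left-to-right evaluation with no operator precedence is an assumption made here for brevity, not part of the example above.

    let lexer = Genlex.make_lexer ["+"; "-"; "*"; "/"; "let"; "="; "("; ")"]

    (* Grammar sketched here (an assumption, chosen to avoid the
       left-recursion in the rules above):
         atom ::= Int | "(" expr ")"
         expr ::= atom { ("+" | "-" | "*" | "/") atom }
       evaluated left to right, with no operator precedence. *)
    let rec parse_expr toks =
      let n1 = parse_atom toks in
      parse_remainder n1 toks

    and parse_atom toks =
      match Stream.next toks with
      | Genlex.Int n -> n
      | Genlex.Kwd "(" ->
          let n = parse_expr toks in
          (match Stream.next toks with
           | Genlex.Kwd ")" -> n
           | _ -> failwith "expected )")
      | _ -> failwith "expected integer or ("

    and parse_remainder n1 toks =
      (* Peek at the next token; consume it only if it is an operator. *)
      match Stream.peek toks with
      | Some (Genlex.Kwd "+") -> Stream.junk toks; parse_remainder (n1 + parse_atom toks) toks
      | Some (Genlex.Kwd "-") -> Stream.junk toks; parse_remainder (n1 - parse_atom toks) toks
      | Some (Genlex.Kwd "*") -> Stream.junk toks; parse_remainder (n1 * parse_atom toks) toks
      | Some (Genlex.Kwd "/") -> Stream.junk toks; parse_remainder (n1 / parse_atom toks) toks
      | _ -> n1

    let () =
      (* Prints 9: (1 + 2) evaluates to 3, then 3 * 3. *)
      Printf.printf "%d\n" (parse_expr (lexer (Stream.of_string "(1 + 2) * 3")))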