Default Syntax
Larceny is case-sensitive by default. This can be changed on Larceny's command line by using the -foldcase or -nofoldcase options, and can be changed at runtime using the case-sensitive? parameter described below.
Case-sensitivity is a property of individual ports. The case-sensitivity of a newly created textual port is determined by the case-sensitive? parameter, and can be changed by reading a #!r6rs, #!err5rs, #!r5rs, #!larceny, #!fold-case, or #!no-fold-case flag from the port. The parameter and the flags are described below.
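As a rough illustration of per-port case-sensitivity, the sketch below creates two string ports under different settings of case-sensitive? and reads the same text from each. It assumes that case-sensitive? can be rebound with parameterize like an ordinary parameter object, and that a string port counts as a newly created textual port. The helper read-datum-with is hypothetical and not part of Larceny.

```scheme
;; Hypothetical helper: read one datum from a fresh string port that is
;; created while case-sensitive? is bound to the given value.
(define (read-datum-with sensitive? text)
  (parameterize ((case-sensitive? sensitive?))
    ;; The port is created inside parameterize, so it should pick up
    ;; the setting in effect at creation time (assumption).
    (let ((port (open-input-string text)))
      (read port))))

(define folded   (read-datum-with #f "Hello"))
(define unfolded (read-datum-with #t "Hello"))

(display (symbol->string folded))   (newline)
(display (symbol->string unfolded)) (newline)
```

If the description above holds, the first symbol should print as hello and the second as Hello.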
Except for case-sensitivity, which is required by the current draft R6RS but incompatible with the R5RS, Larceny's default lexical syntax extends both R5RS and R6RS lexical syntax. The draft R6RS does not permit extensions, however, so Larceny provides several mechanisms for turning its extensions on and off.
R6RS Lexical Syntax
The R6RS describes a language that extends the R5RS lexical syntax in several important ways, while forbidding most other extensions to the lexical syntax.
Known incompatibilities between R5RS and R6RS lexical syntax
- case sensitivity
- identifiers, numbers, characters, booleans, and dot must be followed by a delimiter
Important extensions provided by the R6RS lexical syntax
- hexadecimal escape sequences allow identifiers to contain any Unicode characters (e.g. i\x2665;\x3bb;arceny)
- hexadecimal escape sequences allow strings to contain any Unicode characters (e.g. "Kurt G\xf6;del")
- hexadecimal character syntax for any Unicode character (e.g. #\x2192)
- names for selected ASCII characters (e.g. #\vtab)
- single letter escapes for selected ASCII characters within strings (e.g. "Posterity shall ne'er survey\nA nobler grave than this...")
- external representations for bytevectors (e.g. #vu8(105 226 153 165 206 187 97 114 99 101 110 121))
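The examples in the list above fit together: the bytevector is the UTF-8 encoding of the identifier with the hexadecimal escapes. The following short R6RS program is a sketch that checks this, using only standard procedures from (rnrs). The names bytes and name are introduced here for illustration.

```scheme
#!r6rs
(import (rnrs))

;; The bytevector from the list above.
(define bytes #vu8(105 226 153 165 206 187 97 114 99 101 110 121))

;; The identifier from the list above, written with hexadecimal escapes.
(define name (symbol->string 'i\x2665;\x3bb;arceny))

(display (string=? name (utf8->string bytes))) (newline) ; #t
(display (string-length name))                 (newline) ; 9
(display (bytevector-length bytes))            (newline) ; 12

;; Hexadecimal escapes in strings and characters, and a named character.
(display (string-length "Kurt G\xf6;del"))     (newline) ; 10
(display (char->integer #\x2192))              (newline) ; 8594
(display (char->integer #\vtab))               (newline) ; 11
```

The identifier has 9 characters but 12 bytes because the heart (U+2665) takes three bytes in UTF-8 and the lambda (U+3BB) takes two.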
Larceny supports almost all lexical syntax of the R6RS. The only exception is mantissa widths, which parse correctly but are not yet accepted by string->number.
Flags
These flags may be placed at the beginning of a file that contains Scheme code or data. Their effect is limited to that file.
These flags may also be typed at an interactive top level, in which case their effect is limited to the (current-input-port) from which they are read.
With the annoying exception of #!r6rs, these flags read as unspecified values. Preceding them with #; will preserve their side effects on the port but cause them to be treated as comments otherwise. The #!r6rs flag behaves as though it were already preceded by #;, however, so it should never be preceded by an explicit #;.
#!r6rs
Tells the read and get-datum procedures to enforce all lexical restrictions imposed by the R6RS. Implies case-sensitivity.
#!err5rs
Tells the read and get-datum procedures to read from the port in an ERR5RS-compatible mode that allows Larceny's usual lexical extensions to R6RS syntax. Does not affect case-sensitivity.
#!r5rs
Tells the read and get-datum procedures to read from the port in an R5RS-compatible mode that allows Larceny's usual lexical extensions. Implies case-insensitivity. Equivalent to #!err5rs followed by #!fold-case.
#!larceny
Tells the read and get-datum procedures to allow Larceny's usual extensions with Larceny's usual case-sensitivity. Implies case-sensitivity. Equivalent to #!err5rs followed by #!no-fold-case.
#!fold-case
Tells the read and get-datum procedures to use Unicode's locale-independent case-folding algorithm on the names of symbols that are not written with a hexadecimal escape. (If Larceny's default extensions are enabled, then surrounding the symbol with vertical bars will also disable folding on that symbol. If traditional extensions are enabled as well, then any backslash escapes within a symbol will disable case-folding on the entire symbol.) This behavior is Larceny's definition of case-insensitivity.
#!no-fold-case
Tells the read and get-datum procedures to leave the names of symbols exactly as written. This behavior is Larceny's definition of case-sensitivity.
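To make the per-port effect of the folding flags concrete, here is a small sketch that reads from a string port whose contents switch folding on and then off again. It assumes that flags other than #!r6rs are enabled (see read-r6rs-flags? below) and that they read as unspecified values, as described above. The helper read-symbols is hypothetical.

```scheme
;; A string port whose contents turn case folding on, then off.
(define port
  (open-input-string "#!fold-case Foo #!no-fold-case Foo"))

;; Hypothetical helper: read every datum from PORT and keep only the
;; symbols. The flags are expected to read as unspecified values, so
;; anything that is not a symbol is skipped.
(define (read-symbols port)
  (let loop ((acc '()))
    (let ((x (read port)))
      (cond ((eof-object? x) (reverse acc))
            ((symbol? x)     (loop (cons x acc)))
            (else            (loop acc))))))

(define names (map symbol->string (read-symbols port)))

(write names)
(newline)
```

Under the descriptions above, the first Foo should be folded to foo and the second left as Foo, and only this one port is affected.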
Parameters
The following parameters are carried over from v0.93, and control the indicated features:
case-sensitive?
- if true: symbols are case-sensitive
- if false: symbols are not case-sensitive (with exceptions listed in the description of #!fold-case)
read-square-bracket-as-paren
- if true: allow square brackets
- Note: This parameter is deprecated. The reader should always accept square brackets.
recognize-keywords?
- if true: treat colon keywords specially (e.g. :foo)
- Note: The reader sets this parameter but does not consult it. The macro expander consults it.
- Note: This parameter is deprecated.
datum-source-locations?
- if true: keep track of source code locations
- Note: This parameter is present in v0.94 but has no effect. It will become more useful in a future version of Larceny.
recognize-javadot-symbols?
- if true: recognize JavaDot symbols (for the subset of JavaDot symbols that are allowed by the lexical mode in effect when the symbol is read)
The following parameters were added in v0.94, and control the indicated features:
read-r6rs-flags?
- if true: allow flags other than #!r6rs
- if false: treat flags other than #!r6rs as errors
read-larceny-weirdness?
- allow # as insignificant digit in numerals (required by R5RS)
- allow some nonstandard peculiar identifiers (-- -1+ 1+ 1-)
- allow leading . or @ or +: or -: in symbols
- allow backslashes in strings before characters that don't have to be escaped
- allow vertical bar as a <subsequent> in symbols (used in FASL files)
- allow #^B #^C #^F #^P #^G randomness (used in FASL files)
- Note: all of these extensions are deprecated
read-traditional-weirdness?
- allow vertical bars surrounding symbol
- allow backslash escaping within symbols
- allow unconditional downcasing of the character following #
- allow #!...!# comments (but these are not implemented in v0.94; see lib/Standard/exec-comment.sch)
- allow #.(...) read-time evaluation (see lib/Standard/sharp-dot.sch)
- allow #&... (but this doesn't work in v0.94; see lib/Standard/box.sch)
- Note: all of these extensions are deprecated
read-mzscheme-weirdness?
- allow MzScheme #\uXX character extension
- allow MzScheme #% randomness
- allow #"..." randomness
- Note: all of these extensions are deprecated
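One way to see which extensions are currently enabled is to print the values of the reader parameters. The sketch below assumes that each is an ordinary parameter object that returns its current value when called with no arguments. The list reader-parameters is introduced here for illustration only.

```scheme
;; Pair each parameter's name (for printing) with the parameter itself.
(define reader-parameters
  (list (cons "case-sensitive?"             case-sensitive?)
        (cons "read-r6rs-flags?"            read-r6rs-flags?)
        (cons "read-larceny-weirdness?"     read-larceny-weirdness?)
        (cons "read-traditional-weirdness?" read-traditional-weirdness?)
        (cons "read-mzscheme-weirdness?"    read-mzscheme-weirdness?)))

;; Print one "name = value" line per parameter.
(for-each
 (lambda (entry)
   (display (car entry))
   (display " = ")
   (write ((cdr entry)))   ; call the parameter with no arguments
   (newline))
 reader-parameters)
```

Because reading a flag such as #!r6rs changes the mode of one port, these values are best read as defaults rather than as the state of any particular open port.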
The purpose of the following lexical syntaxes is unknown. They were supported in v0.93 but are not supported by v0.94:
- #r randomness (regular expressions?)
- some kind of weirdness in flush-whitespace-until-rparen
Past and Future
A completely new reader was introduced in Larceny v0.94. Its state machine and parser were generated by Will's LexGen and ParseGen tools, so we can regenerate the reader from a declarative specification.
The new reader is not programmable, but future versions of Larceny should provide tools for constructing and installing custom readers. Programmability is not a requirement for R6RS programs, since the R6RS does not permit extensions anyway.
