Larceny Note #9: The I/O system

Lars T Hansen / November 14, 1997
William D Clinger / 18 July 2008

This note is being written.

The I/O architecture

Larceny's I/O system is strongly influenced by the Modula-3 I/O system presented in Greg Nelson's book, Systems Programming with Modula-3. The I/O system is designed with performance and user-extensibility in mind.

Larceny's I/O system is composed of several layers:

Low-level operations

Larceny translates some of the basic I/O operations provided by the underlying operating system into a small set of OS-independent operations that open, close, read, and write a slight abstraction of low-level file and console ports. This abstraction is implemented by the following files:

The ioproc layer

At the ioproc layer, a port is implemented using a Scheme procedure of one argument: a symbol that indicates the operation to perform. After dispatching on that symbol, the ioproc procedure returns a port-specific procedure that, when called, performs the operation. The operations are:

The read operation, which attempts to fill the given input buffer, is supported only for input ports; the write operation, which empties the given output buffer, is supported only for output ports; and the set-position! operation is supported only for ports that support set-port-position!.

The iodata argument to an ioproc represents any additional state that may be associated with a port but encoded separately from the port's ioproc and its methods. There is no standard representation for the iodata object; its nature and interpretation vary from one kind of port to another.
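The dispatch-on-symbol protocol can be modelled in a few lines. The sketch below is written in Python rather than Scheme, and it is only a model: the return conventions (a byte count, or the string "eof"), the layout of the iodata object, and the helper name make_bytes_ioproc are assumptions made for illustration, not Larceny's definitions. Only the operation names read and set-position! come from the text above.

```python
def make_bytes_ioproc(source: bytes):
    """Return a hypothetical ioproc for an in-memory binary input port."""

    def read(iodata, buffer):
        # Try to fill `buffer` from the current position. Report how many
        # bytes were delivered, or "eof" when nothing is left (assumed convention).
        pos = iodata["pos"]
        chunk = source[pos:pos + len(buffer)]
        if not chunk:
            return "eof"
        buffer[:len(chunk)] = chunk
        iodata["pos"] = pos + len(chunk)
        return len(chunk)

    def set_position(iodata, offset):
        if not 0 <= offset <= len(source):
            raise ValueError("position out of range")
        iodata["pos"] = offset

    # An input port: there is deliberately no "write" entry.
    methods = {"read": read, "set-position!": set_position}

    def ioproc(op):
        # One argument: the operation name. The result is the procedure
        # that performs the operation, not the result of the operation.
        try:
            return methods[op]
        except KeyError:
            raise ValueError(f"unsupported operation: {op}") from None

    return ioproc


ioproc = make_bytes_ioproc(b"hello, world")
iodata = {"pos": 0}            # port state kept outside the ioproc
buffer = bytearray(5)
count = ioproc("read")(iodata, buffer)
```

Keeping the position in iodata rather than in the closure mirrors the separation described above: the ioproc supplies behaviour, and the iodata carries whatever per-port state that behaviour needs.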

The iosys layer

The iosys layer implements Scheme ports. Each port is an object that encapsulates:

The port state

Once a port enters the closed, error, or eof states, it remains in that state.

On Unix systems, where some programmers (and the R6RS, alas) like to think an end-of-file on a console port can be followed by more input, what actually happens in Larceny is that the end-of-file causes the console port to be abandoned by the (current-input-port) procedure, which opens a fresh console port that becomes the current input port.

An open binary port is normally in the binary state.

A textual input port is normally in the textual state, but often enters the auxstart state when the main buffer is filled and often enters the auxend state when the main buffer has almost been emptied. These two states use a small auxiliary buffer to deal with multi-byte encodings that span a buffer boundary. They also support the use of a sentinel byte that simplifies the inline code for get-char.

Textual output ports make similar use of the auxiliary buffer.
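The reason an auxiliary buffer is needed can be shown with a small model. The Python sketch below decodes UTF-8 that arrives one buffer fill at a time and carries any incomplete trailing character over to the next fill. It is not Larceny's code: the function names are hypothetical, it assumes well-formed UTF-8, and it does not model the auxstart and auxend states or the sentinel byte.

```python
def incomplete_tail(buf: bytes) -> int:
    """Return how many trailing bytes of `buf` begin a UTF-8 character
    that is not yet complete (0 if the buffer ends on a boundary)."""
    # A UTF-8 character is at most 4 bytes, so look back at most 3 bytes.
    for back in range(1, min(3, len(buf)) + 1):
        byte = buf[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if byte >= 0xF0:
            needed = 4
        elif byte >= 0xE0:
            needed = 3
        elif byte >= 0xC0:
            needed = 2
        else:
            needed = 1
        return back if needed > back else 0
    return 0


def decode_chunks(chunks):
    """Decode UTF-8 delivered in arbitrary chunks, one buffer fill at a time."""
    aux = b""            # plays the role of the small auxiliary buffer
    out = []
    for chunk in chunks:
        data = aux + chunk
        cut = len(data) - incomplete_tail(data)
        out.append(data[:cut].decode("utf-8"))
        aux = data[cut:]  # at most 3 bytes are ever carried over
    if aux:
        raise ValueError("input ended inside a multi-byte character")
    return "".join(out)


original = "na\u00efve \u03bb \u20acuro \U0001d11e"
encoded = original.encode("utf-8")
# Three-byte fills force several characters to straddle a boundary.
chunks = [encoded[i:i + 3] for i in range(0, len(encoded), 3)]
decoded = decode_chunks(chunks)
```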

Combined input/output ports are normally in the input/output state. They buffer at most one character, for a maximum of four bytes.
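The rule that closed, error, and eof are permanent can be stated as a tiny state machine. The Python sketch below uses the state names that appear in this section; the enum and the next_state helper are illustrative assumptions and do not describe how Larceny represents port state.

```python
from enum import Enum


class PortState(Enum):
    # State names taken from the text; the enum itself is illustrative.
    CLOSED = "closed"
    ERROR = "error"
    EOF = "eof"
    BINARY = "binary"
    TEXTUAL = "textual"
    AUXSTART = "auxstart"
    AUXEND = "auxend"
    INPUT_OUTPUT = "input/output"


# Once entered, these states are never left.
TERMINAL = frozenset({PortState.CLOSED, PortState.ERROR, PortState.EOF})


def next_state(current: PortState, requested: PortState) -> PortState:
    """Terminal states absorb every later transition request."""
    return current if current in TERMINAL else requested
```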

Port position

All of Larceny's textual input ports keep track of their current port position in units of characters, lines, and offsets within a line. Larceny's end-of-line processing depends upon the offset within a line together with a boolean that indicates whether the previous end-of-line was represented by a lone return character.
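A minimal model of this bookkeeping, in Python, is sketched below. It is an assumption-laden illustration rather than Larceny's algorithm: it handles only return, linefeed, and return-linefeed (the R6RS also recognizes other line endings), it counts translated rather than raw characters, and the class and field names are hypothetical.

```python
class PositionTracker:
    """Tracks characters, lines, and the offset within the current line."""

    def __init__(self):
        self.chars = 0                 # translated characters delivered
        self.line = 0                  # number of completed lines
        self.offset = 0                # characters since the last end-of-line
        self.prev_eol_was_cr = False   # previous end-of-line was a lone return

    def advance(self, ch):
        """Consume one raw character. Return the character the reader sees,
        or None when `ch` is the linefeed half of a return-linefeed pair."""
        if ch == "\n" and self.offset == 0 and self.prev_eol_was_cr:
            # Offset 0 plus the boolean means we are directly after a
            # return, so this linefeed belongs to the same end-of-line.
            self.prev_eol_was_cr = False
            return None
        self.chars += 1
        if ch in "\r\n":
            self.line += 1
            self.offset = 0
            self.prev_eol_was_cr = (ch == "\r")
            return "\n"
        self.offset += 1
        return ch


def translate(raw):
    """Run `raw` through a fresh tracker; return (translated text, tracker)."""
    tracker = PositionTracker()
    seen = [c for c in map(tracker.advance, raw) if c is not None]
    return "".join(seen), tracker


text, tracker = translate("ab\r\ncd\rx\n\ny")
```

Note that the boolean alone is not enough: after "cd\rx" the previous end-of-line was still a lone return, but the offset is 1, so the next linefeed starts a new line.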

The R6RS end-of-line semantics has an interesting interaction with set-port-position!. Unless the end-of-line style of a textual input port is none, setting a new port position sometimes requires R6RS-conforming implementations to examine the character that precedes the new position in order to implement the mandated end-of-line semantics.
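One reading of that interaction is sketched below in Python. It assumes that positions index the raw, untranslated characters and that only return, linefeed, and return-linefeed matter; both are simplifying assumptions, and read_from is a hypothetical helper, not part of any Larceny or R6RS API.

```python
def read_from(raw: str, position: int) -> str:
    """Return the translated characters a reader would see after the
    port position is set to raw index `position`."""
    # Examine the character before the new position: if it is a return,
    # a linefeed at `position` is the second half of an end-of-line that
    # precedes the new position, so it must not produce another line break.
    if (position > 0
            and raw[position - 1] == "\r"
            and raw[position:position + 1] == "\n"):
        position += 1
    return raw[position:].replace("\r\n", "\n").replace("\r", "\n")


raw = "ab\r\ncd"
after_pair = read_from(raw, 3)    # position falls between return and linefeed
before_pair = read_from(raw, 2)   # position falls just before the return
```

Without the look-behind, a reader positioned at index 3 would see a spurious empty line.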

Specialized ports

Custom ports

Thread-aware I/O

To support thread-aware I/O, we need two things:

Mutual exclusion is not hard; the procedures in Lib/stdio.sch can be wrapped in a without-interrupts form. The lock should probably be a public part of the port structure so that it's possible for (system) code to acquire it once and then call low-level primitives for better performance.

Since the threads system is (currently) written in Scheme on top of continuations, blocking system calls are no good: a call that blocks stops every thread, not just the caller. I/O system calls that may block indefinitely must therefore be avoided.

The right thing to do seems to be to consider two subtypes of I/O ports, along the lines of the Modula-3 I/O system. A port is classified either as intermittent or not. Intermittent ports may have to wait an unbounded amount of time before input is available or output is accepted. Currently, the only intermittent ports are console I/O ports, but when the extensible I/O system goes public, we'll have sockets pretty quickly.

Intermittent ports have the following unique attribute: the underlying read and write methods return would-block if no work was accomplished (no input was ready or no output would be accepted). If that token is returned to the fill or flush methods, then the operation on the port must block until the port is ready. This blocking can be done either by polling or by interrupts. If I/O interrupts are available, then the I/O system must enable them and set up an I/O event handler. If not, the I/O system must register an I/O poll procedure for the port as a periodic system task. In either event, the I/O system will then block the thread on a condition variable that will be signalled by the ready handler, whichever method is used.
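The would-block protocol can be sketched as follows. This Python model uses operating-system threads and threading.Condition as stand-ins for Larceny's Scheme-level threads and condition variables, and the class and method names (IntermittentSource, deliver, fill) are hypothetical. It shows only the shape of the interaction: a non-blocking read, a fill that waits, and a ready handler that signals.

```python
import threading

WOULD_BLOCK = "would-block"   # token returned when no work was accomplished


class IntermittentSource:
    """Hypothetical intermittent input port: data arrives at unknown times."""

    def __init__(self):
        self._pending = bytearray()
        self._ready = threading.Condition()   # backed by a reentrant lock

    def read(self, limit):
        # Underlying read method: never blocks, reports would-block instead.
        with self._ready:
            if not self._pending:
                return WOULD_BLOCK
            data = bytes(self._pending[:limit])
            del self._pending[:limit]
            return data

    def deliver(self, data):
        # Ready handler: run by an interrupt handler or a periodic poll task.
        with self._ready:
            self._pending += data
            self._ready.notify_all()

    def fill(self, limit):
        # Fill method: blocks only the calling thread until input is ready.
        with self._ready:
            while True:
                result = self.read(limit)
                if result is not WOULD_BLOCK:
                    return result
                self._ready.wait()


source = IntermittentSource()
# Simulate input that shows up a little later, from another thread.
threading.Timer(0.05, source.deliver, args=(b"ping",)).start()
received = source.fill(16)    # waits here until deliver() has run
```

The check for would-block and the wait happen under the same lock, so a delivery cannot slip in between them and be missed.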

I think that the actual underlying mechanism chosen for unblocking threads can and should be independent of the Scheme I/O system. This is possible if the I/O system supports an installable "ioblock" handler that it will call to wait for I/O on a port. System code will then install the correct ioblock handler for the I/O event system chosen on the particular platform.
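A minimal sketch of that separation, in Python, might look like this. Every name here (install_ioblock_handler, fill, polling_ioblock, ScriptedPort) is hypothetical; the point is only that fill knows nothing about how waiting is implemented, and that a polling handler could be swapped for an interrupt-driven one without touching it.

```python
import time

WOULD_BLOCK = "would-block"
_ioblock_handler = None


def install_ioblock_handler(handler):
    """Install the platform's way of waiting for a port (hypothetical API)."""
    global _ioblock_handler
    _ioblock_handler = handler


def fill(port, limit):
    """Port-independent fill: it delegates all waiting to the handler."""
    while True:
        result = port.read(limit)
        if result is not WOULD_BLOCK:
            return result
        if _ioblock_handler is None:
            raise RuntimeError("no ioblock handler installed")
        _ioblock_handler(port)


def polling_ioblock(port, interval=0.001):
    """One possible handler: poll the port until it reports ready."""
    while not port.ready():
        time.sleep(interval)


class ScriptedPort:
    """Stand-in port that becomes ready after a fixed number of polls."""

    def __init__(self, data, polls_until_ready):
        self.data = data
        self.polls_left = polls_until_ready

    def ready(self):
        if self.polls_left > 0:
            self.polls_left -= 1
            return False
        return True

    def read(self, limit):
        if self.polls_left > 0:
            return WOULD_BLOCK
        chunk, self.data = self.data[:limit], self.data[limit:]
        return chunk


install_ioblock_handler(polling_ioblock)
port = ScriptedPort(b"data", polls_until_ready=3)
result = fill(port, 16)
```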

$Id: note9-iosys.html 87 1998-11-25 14:38:41Z lth $