Foreign-Function Interface to C
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Larceny provides a general foreign-function interface (FFI) substrate
on which other FFIs can be built; see
link:LarcenyNotes/note7-ffi.html[Larceny Note #7].
The FFI described in this manual section is a simple example of a
derived FFI. It is not yet fully evolved, but it is useful.

[WARNING]
================================================================
Some of the information in this section may be out of date.
================================================================

[[FfiCompiliing, Creating loadable modules]]

==== Creating loadable modules

You must first compile your C code and create one or more loadable
object modules. These object modules may then be loaded into Larceny,
and Scheme foreign functions may link to specific functions in the
loaded module. Defining foreign functions in Scheme is covered in a
later section.

The method for creating a loadable object module varies from platform
to platform. In the following, assume you have two C source files,
file1.c and file2.c, that define functions that you want to make
available as foreign functions in Larceny.

===== SunOS 4

Compile your source files and create a shared library. Using GCC, the
command line might look like this:

    gcc -fPIC -shared file1.c file2.c -o my-library.so

The command creates my-library.so in the current directory. This
library can now be loaded into Larceny using <>. Any other shared
libraries used by your library files should also be loaded into
Larceny using <> before any procedures are linked using <>.

By default, /lib/libc.so is made available to the dynamic linker and
to the foreign function interface, so there is no need for you to load
that library explicitly.

===== SunOS 5

Compile your source files and create a shared library, linking with
all the necessary libraries.
Using GCC, the command line might look like this:

    gcc -fPIC -shared file1.c file2.c -lc -lm -lsocket -o my-library.so

Now you can use foreign-file to load my-library.so into Larceny.

By default, /lib/libc.so is made available to the foreign function
interface, so there is no need for you to load that library
explicitly.

[[FfiInterface, Loading and linking foreign functions]]

==== The Interface

===== Procedures

proc:foreign-file[args="filename",result="unspecified"]

Foreign-file loads the named object file into Larceny and makes it
available for dynamic linking.

Larceny uses the dynamic linker provided by the operating system. The
operation of the dynamic linker varies from platform to platform:

* On some versions of SunOS 4, if the linker is given a file that does not exist, it will terminate the process. (Most likely this is a bug.) This means you should never call foreign-file with the name of a file that does not exist.
* On SunOS 5, if a foreign file is given to foreign-file without a directory specification, then the dynamic linker will search its load path (the LD_LIBRARY_PATH environment variable) for the file. Hence, a foreign file in the current directory should be "./file.so", not "file.so".

proc:foreign-procedure[args="name (arg-type ...) return-type",result="procedure"]

FIXME: The interface to this function has been extended to support
hooking into Windows procedures that use the Pascal calling convention
instead of the C one. The way to select which convention to use
should be documented.

Returns a Scheme procedure _p_ that calls the foreign procedure whose
name is _name_. When _p_ is called, it converts its parameters to the
representations indicated by the __arg-type__s and invokes the foreign
procedure, passing the converted values as parameters. When the
foreign procedure returns, its return value is converted to a Scheme
value according to _return-type_. Types are described below.
The address of the foreign procedure is obtained by searching for
_name_ in the symbol tables of the foreign files that have been loaded
with _foreign-file_.

proc:foreign-null-pointer[args="",result="integer"]

Returns a foreign null pointer.

proc:foreign-null-pointer?[args="integer",result="boolean"]

Tests whether its argument is a foreign null pointer.

===== Types

A _type_ is denoted by a symbol. The following is a list of the
accepted types and their conversions at the call-out to the foreign
procedure:

**int** Any exact integer value in the range [-2^31^, 2^31^-1] is acceptable and is converted to a C "int".

**unsigned** Any exact integer value in the range [0, 2^32^-1] is acceptable and is converted to a C "unsigned".

**short** Synonymous with **int** in the current implementation.

**ushort** Synonymous with **unsigned** in the current implementation.

**char** A character is acceptable. It is converted to a C "int" type.

**uchar** A character is acceptable. It is converted to a C "unsigned" type.

**long** Synonymous with **int** in the current implementation.

**ulong** Synonymous with **unsigned** in the current implementation.

**float** A flonum is acceptable. It is converted to a C "float".

**double** A flonum is acceptable. It is converted to a C "double".

**bool** Any object is acceptable. It is converted to a C "int": #f is converted to 0, and all other objects to 1.

**boxed** Any heap-allocated data structure (pair, bytevector-like, vector-like, procedure) is acceptable. It is converted to a C "void*" pointing to the first element of the structure. The value #f is also acceptable. It is converted to a C "(void*)0" value.

**string** A string or #f is acceptable. A string is _copied_ into a NUL-terminated bytevector, and the resulting pointer is passed. #f is converted to a C "(char*)0" value.

The following types are accepted as the return type, with the
indicated conversions back to Scheme values:

**int** A C "int" is expected; it is converted to an exact integer.


**unsigned** A C "unsigned" is expected; it is converted to an exact integer.

**short** Synonymous with **int** in the current implementation.

**ushort** Synonymous with **unsigned** in the current implementation.

**char** A C "int" is expected; it is converted to a character.

**uchar** A C "unsigned" is expected; it is converted to a character.

**long** Synonymous with **int** in the current implementation.

**ulong** Synonymous with **unsigned** in the current implementation.

**float** A C "float" is expected. It is converted to a flonum.

**double** A C "double" is expected. It is converted to a flonum.

**bool** A C "int" is expected. 0 is converted to #f, all other values to #t.

**void** No return value.

**string** A C "char*" is expected. If it is non-null, it is expected to point to a NUL-terminated string, which is copied into a newly allocated Scheme string that is then returned. If the return value is null, then #f is returned.

FIXME: add documentation for the ** tramp ** type, which marshals a
callback trampoline.

FIXME: add documentation for the ** -> ** and ** maybe ** type
constructors. Make sure to note that ** -> ** currently has a space
leak on _every_ marshalling (because I believe it generates a fresh
trampoline on every invocation). We could at least tighten the leak
here if we kept a cache of generated trampolines and reused them, but
currently do not do so.

FIXME: add documentation for the ** void* ** type, the corresponding
_void*-rt_ record type descriptor, and the procedures

    void*->address
    void*-byte-ref
    void*-byte-set!
    void*-word-ref
    void*-word-set!
    void*-void*-ref
    void*-void*-set!
    void*-double-ref
    void*-double-set!

Also document ffi-add-attribute-core-entry! (and perhaps
ffi-attribute-core-entry ?)

[[FfiAccess, Foreign data access]]

==== Foreign Data Access

===== Raw memory access

The two primitives _peek-bytes_ and _poke-bytes_ are provided for
reading and writing memory at specific addresses.
These procedures are typically used for copying data from foreign data
structures into Scheme bytevectors for subsequent decoding.

(The use of _peek-bytes_ and _poke-bytes_ can often be avoided by
keeping foreign data in a Scheme bytevector and passing the bytevector
to a call-out using the **boxed** parameter type. However, this
technique is inappropriate if the foreign code retains a pointer to
the Scheme datum, which may be moved by the garbage collector.)

proc:peek-bytes[args="addr bytevector count",result="unspecified"]

_Addr_ must be an exact nonnegative integer. _Count_ must be a
fixnum. The bytes in the range from _addr_ through _addr+count-1_ are
copied into _bytevector_, which must be long enough to hold that many
bytes.

If any address in the range is not accessible to the process, the
behavior is unpredictable; typically you will get a segmentation
fault. Larceny does not yet catch segmentation faults.

proc:poke-bytes[args="addr bytevector count",result="unspecified"]

_Addr_ must be an exact nonnegative integer. _Count_ must be a
fixnum. The first _count_ bytes of _bytevector_ are copied into
memory in the range from _addr_ through _addr+count-1_.

If any address in the range is not accessible to the process, the
behavior is unpredictable; typically you will get a segmentation
fault. Larceny does not yet catch segmentation faults.

It is also possible to corrupt memory with _poke-bytes_. Don't do
that.

===== Foreign data sizes

The following constants define the sizes of basic C data types:

* **sizeof:short** The size of a "short int".
* **sizeof:int** The size of an "int".
* **sizeof:long** The size of a "long int".
* **sizeof:pointer** The size of any pointer type.

===== Decoding foreign data

Foreign data is visible to a Scheme program either as an object
pointed to by a memory address (which is itself represented as an
integer), or as a bytevector that contains the bytes of the foreign
datum.
A number of utility procedures for reading and writing data of common
C primitive types are provided for both kinds of foreign objects.

_Bytevector accessor procedures_

proctempl:%get16[args="bv i",result="integer"]
proctempl:%get16u[args="bv i",result="integer"]
proctempl:%get32[args="bv i",result="integer"]
proctempl:%get32u[args="bv i",result="integer"]
proctempl:%get-int[args="bv i",result="integer"]
proctempl:%get-unsigned[args="bv i",result="integer"]
proctempl:%get-short[args="bv i",result="integer"]
proctempl:%get-ushort[args="bv i",result="integer"]
proctempl:%get-long[args="bv i",result="integer"]
proctempl:%get-ulong[args="bv i",result="integer"]
proctempl:%get-pointer[args="bv i",result="integer"]

These procedures decode bytevectors that contain the bytes of foreign
objects. In each case, _bv_ is a bytevector and _i_ is the offset of
the first byte of a field in that bytevector. The field is fetched
and returned as an integer (signed or unsigned as appropriate).

_Bytevector updater procedures_

proctempl:%set16[args="bv i val",result="unspecified"]
proctempl:%set16u[args="bv i val",result="unspecified"]
proctempl:%set32[args="bv i val",result="unspecified"]
proctempl:%set32u[args="bv i val",result="unspecified"]
proctempl:%set-int[args="bv i val",result="unspecified"]
proctempl:%set-unsigned[args="bv i val",result="unspecified"]
proctempl:%set-short[args="bv i val",result="unspecified"]
proctempl:%set-ushort[args="bv i val",result="unspecified"]
proctempl:%set-long[args="bv i val",result="unspecified"]
proctempl:%set-ulong[args="bv i val",result="unspecified"]
proctempl:%set-pointer[args="bv i val",result="unspecified"]

These procedures update bytevectors that contain the bytes of foreign
objects. In each case, _bv_ is a bytevector, _i_ is the offset of the
first byte of a field in that bytevector, and _val_ is the value to be
stored in that field. The values must be exact integers in the range
implied by the data type.

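
The effect of the bytevector accessors and updaters can be pictured
from the C side with a small sketch. The helper names below (get16,
get16u, get32u, set32u) are hypothetical and are not part of Larceny;
the sketch also assumes that fields are stored in the host's native
byte order, which this section does not state. It only illustrates
what "fetch the field that starts at offset _i_, signed or unsigned as
appropriate" means for a buffer of raw bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not Larceny's implementation.
   Each one reads or writes a fixed-width field that starts at byte
   offset i of the buffer bv, using the host's native byte order
   (an assumption made for this sketch). */

/* Read a signed 16-bit field. */
int16_t get16(const unsigned char *bv, size_t i) {
    int16_t v;
    memcpy(&v, bv + i, sizeof v);   /* memcpy avoids unaligned access */
    return v;
}

/* Read the same two bytes as an unsigned 16-bit field. */
uint16_t get16u(const unsigned char *bv, size_t i) {
    uint16_t v;
    memcpy(&v, bv + i, sizeof v);
    return v;
}

/* Read an unsigned 32-bit field. */
uint32_t get32u(const unsigned char *bv, size_t i) {
    uint32_t v;
    memcpy(&v, bv + i, sizeof v);
    return v;
}

/* Store an unsigned 32-bit value into the field at offset i. */
void set32u(unsigned char *bv, size_t i, uint32_t val) {
    memcpy(bv + i, &val, sizeof val);
}
```

With an 8-byte buffer whose bytes 4 through 7 have all been set to
0xFF by set32u, the signed 16-bit read at offset 4 yields -1 while the
unsigned read of the same bytes yields 65535. The signed and unsigned
variants differ only in how the same bytes are interpreted.
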

_Foreign-pointer accessor procedures_

proctempl:%peek8[args="addr",result="integer"]
proctempl:%peek8u[args="addr",result="integer"]
proctempl:%peek16[args="addr",result="integer"]
proctempl:%peek16u[args="addr",result="integer"]
proctempl:%peek32[args="addr",result="integer"]
proctempl:%peek32u[args="addr",result="integer"]
proctempl:%peek-int[args="addr",result="integer"]
proctempl:%peek-long[args="addr",result="integer"]
proctempl:%peek-unsigned[args="addr",result="integer"]
proctempl:%peek-ulong[args="addr",result="integer"]
proctempl:%peek-short[args="addr",result="integer"]
proctempl:%peek-ushort[args="addr",result="integer"]
proctempl:%peek-pointer[args="addr",result="integer"]
proctempl:%peek-string[args="addr",result="string"]

These procedures read raw memory. In each case, _addr_ is an address,
and the value stored at that address (the size of which is indicated
by the name of the procedure) is fetched and returned as an integer.

_%Peek-string_ expects to find a NUL-terminated string of 8-bit bytes
at the given address. It is returned as a Scheme string.

_Foreign-pointer updater procedures_

proctempl:%poke8[args="addr val",result="unspecified"]
proctempl:%poke8u[args="addr val",result="unspecified"]
proctempl:%poke16[args="addr val",result="unspecified"]
proctempl:%poke16u[args="addr val",result="unspecified"]
proctempl:%poke32[args="addr val",result="unspecified"]
proctempl:%poke32u[args="addr val",result="unspecified"]
proctempl:%poke-int[args="addr val",result="unspecified"]
proctempl:%poke-long[args="addr val",result="unspecified"]
proctempl:%poke-unsigned[args="addr val",result="unspecified"]
proctempl:%poke-ulong[args="addr val",result="unspecified"]
proctempl:%poke-short[args="addr val",result="unspecified"]
proctempl:%poke-ushort[args="addr val",result="unspecified"]
proctempl:%poke-pointer[args="addr val",result="unspecified"]

These procedures update raw memory.
In each case, _addr_ is an address, and _val_ is a value to be stored
at that address.

[[FfiDumping, Heap dumping and the FFI]]

==== Heap dumping and the FFI

If foreign functions are linked into Larceny using the FFI, and a
Larceny heap image is subsequently dumped (with <> or <>), then the
foreign functions are not saved as part of the heap image. When the
heap image is subsequently loaded into Larceny at startup, the FFI
will attempt to re-link all the foreign functions in the heap image.

During the relinking phase, foreign files will again be loaded into
Larceny, and Larceny's FFI will use the file names _as they were
originally given to the FFI_ when it tries to load the files. In
particular, if relative pathnames were used, Larceny will not have
converted them to absolute pathnames.

An error during relinking will result in Larceny aborting with an
error message and returning to the operating system. This is
considered a feature.

[[FfiExamples, Examples]]

==== Examples

===== Change directory

This procedure uses the chdir() system call to set the process's
current working directory. The string parameter type is used to pass
a Scheme string to the C procedure.

    (define cd
      (let ((chdir (foreign-procedure "chdir" '(string) 'int)))
        (lambda (newdir)
          (if (not (zero? (chdir newdir)))
              (error "cd: " newdir " is not a valid directory name."))
          (unspecified))))

===== Print Working Directory

This procedure uses the getcwd() (get current working directory)
system call to retrieve the name of the process's current working
directory. A bytevector is created and passed in as a buffer in which
to store the return value -- a 0-terminated ASCII string. Then the
FFI utility function ffi/asciiz->string is called to convert the
bytevector to a string.

    (define pwd
      (let ((getcwd (foreign-procedure "getcwd" '(boxed int) 'int)))
        (lambda ()
          (let ((s (make-bytevector 1024)))
            (getcwd s 1024)
            (ffi/asciiz->string s)))))

===== Quicksort

WARNING: this example is bogus.
It is not safe to pass a collectable object into a C procedure when
the callback invocation might cause a garbage collection, thus moving
the object and invalidating the address stored in the C machine
context.

This demonstrates how to use a callback such as the comparator
argument to qsort. It is specified in the type signature using -> as
a type constructor. (Note that one should probably use the built-in
sort routines rather than call out like this; this example is for
demonstrating callbacks, not how to sort.)

    (define qsort!
      (foreign-procedure "qsort"
                         '(boxed ushort ushort (-> (void* void*) int))
                         'void))

    (let ((bv (list->vector '(40 10 30 20 1 2 3 4))))
      (qsort! bv 8 4
              (lambda (x y)
                (let ((x (/ (void*-word-ref x 0) 4))
                      (y (/ (void*-word-ref y 0) 4)))
                  (- x y))))
      bv)

    (let ((bv (list->bytevector '(40 10 30 20 1 2 3 4))))
      (qsort! bv 8 1
              (lambda (x y)
                (let ((x (void*-byte-ref x 0))
                      (y (void*-byte-ref y 0)))
                  (- x y))))
      bv)

===== Other examples

The Experimental directory contains several examples of use of the
FFI. See in particular the files unix.sch (Unix system calls) and
socket.sch (procedures for communicating over sockets).

==== Higher level layers

The general foreign-function interface functionality described above
is powerful but awkward to use in practice. A user might be tempted
to hard-code offsets or constants whose values are compiler dependent.
Also, the FFI will marshal some low-level values such as strings or
integers, but other values, such as enumerations that could be
naturally mapped to sets of symbols, are not marshalled, since the host
environment does not provide the necessary type information to the
FFI.

This section documents a collection of libraries to mitigate these and
other problems.
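
To see why hard-coded offsets are fragile, consider the following C
sketch. The structure sample_record and the helper functions are
hypothetical and stand in for a structure declared in some C header;
the Scheme-style output format is only an illustration, not the
output of any Larceny library. The point is that field offsets and
the structure size are chosen by the compiler and the ABI, so the
reliable way to learn them is to ask the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure standing in for one from a C header. */
struct sample_record {
    char   tag;
    double value;
    int    count;
};

/* The compiler, not the programmer, decides where each field lives;
   padding after 'tag' differs between ABIs. */
size_t sample_value_offset(void) {
    return offsetof(struct sample_record, value);
}

size_t sample_count_offset(void) {
    return offsetof(struct sample_record, count);
}

/* Print the layout as Scheme-style definitions (illustrative format). */
void print_sample_layout(FILE *out) {
    fprintf(out, "(define sample-record-size %lu)\n",
            (unsigned long) sizeof(struct sample_record));
    fprintf(out, "(define sample-record-value-offset %lu)\n",
            (unsigned long) sample_value_offset());
    fprintf(out, "(define sample-record-count-offset %lu)\n",
            (unsigned long) sample_count_offset());
}
```

Calling print_sample_layout(stdout) on two different platforms may
print different numbers for the same source, which is exactly the
information a Scheme program cannot safely guess.
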

===== foreign-ctools

Foreign data access is performed by peeking at manually calculated
addresses, but in practice one often needs to inspect fields of C
structures, whose offsets are dependent on the application binary
interface (ABI) of the host environment. Similarly, C programs often
refer to values via constant macro definitions; since the values of
such names are not provided by the object code, and Scheme programs do
not have a C preprocessor run on them prior to execution, it is
difficult to refer to the same value without encoding "magic numbers"
into the Scheme source code.

The foreign-ctools library is meant to mitigate problems like the two
described above. It provides special forms for introducing global
definitions of values typically available at compile time for a C
program.

The library assumes the presence of a C compiler (such as _cc_ on Unix
systems or _cl.exe_ on Windows systems). The special forms work by
dynamically generating, compiling, and running C code at expansion
time to determine the desired values of structure offsets or macro
constants.

_Syntax define-c-values_

++ (define-c-values ( ...) ( ...) ( ...))++

_Syntax define-c-info_

++ (define-c-info ... ...)++

where
++ ::= (compiler ) | (path ) | (include ) | (include<> )++
and
++ ::= cc | cl++
and
++ ::= (const ) | (sizeof ) | (struct ( ) ...) | (fields ( ) ...)++
and
++ ::= int | uint | long | ulong++

===== foreign-sugar

The <> function is sufficient to link in dynamically loaded C
procedures, but it can be annoying to use when there are many
procedures to define that all follow a regular pattern where one could
infer a mapping between Scheme identifiers and C function names. For
example, some libraries follow a naming convention where words within
a name are separated by underscores; such functions could be
immediately mapped to Scheme names where the underscores have been
replaced by dashes.

The foreign-sugar library provides a special form,
++define-foreign++, which lets the user define a foreign function by
providing only the Scheme name, the argument types, and the return
type. The ++define-foreign++ form then attempts to infer which C
function the name was meant to refer to.

_Syntax define-foreign_

++ (define-foreign (name arg-type ...) result-type)++

NOTE: Other functionality allows the user to introduce new rules for
inferring C function names, but it is undocumented because it will
probably have to change when we switch to an R6RS macro expander.

===== foreign-stdlib

proc:stdlib/malloc[args="rtd",optarg="ctor",result="procedure"]

Given a record extension of _void*-rt_, returns an allocator that uses
the C ++malloc++ procedure to allocate instances of such an object.
Note that the client is responsible for eventually freeing such
objects with <>.

proc:stdlib/free[args="void*-obj"]

Frees objects produced by allocators returned from <>.

proc:ffi-install-void*-subtype[var="ffi-install-void*-subtype"]
proctempl:ffi-install-void*-subtype[args="rtd",result="rtd"]
proctempl:ffi-install-void*-subtype[args="string",optarg="parent-rtd",result="rtd"]
proctempl:ffi-install-void*-subtype[args="symbol",optarg="parent-rtd",result="rtd"]

proc:establish-void*-subhierarchy![args="symbol-tree",result="unspecified"]

_Type char*_ extends _void*_

proc:string->char*[args="string",result="char*"]

proc:char*-strlen[args="char*",result="fixnum"]

proc:char*->string[args="char*",result="string"]
proctempl:char*->string[args="char* len",result="string"]

proc:CallWithCharStar[var="call-with-char*",args="string string-function",result="value"]

_Type char\*\*_ extends _void*_

proc:CallWithCharStarStar[var="call-with-char\*\*",args="string-vector function",result="value"]

_Type int*_ extends _void*_

proc:CallWithIntStar[var="call-with-int*",args="fixnum-vector function",result="value"]

_Type short*_ extends _void*_

proc:CallWithShortStar[var="call-with-short*",args="fixnum-vector function",result="value"]

_Type double*_ extends _void*_

proc:CallWithDoubleStar[var="call-with-double*",args="num-vector function",result="value"]

FIXME: (There are other functions, but I want to test and document the
ones above first...)

===== foreign-cenums

Provides the special form ++define-c-enum++

FIXME: add the doc