                       TinySCHEME Version 1.31

                    "Safe if used as prescribed"
                    -- Philip K. Dick, "Ubik"

This software is open source, covered by a BSD-style license.
Please read accompanying file license.txt.
-------------------------------------------------------------------------------

     This Scheme interpreter is based on MiniSCHEME version 0.85k4
     (see miniscm.tar.gz in the Scheme Repository)
     Original credits at end of file.

     D. Souflis (dsouflis@acm.org)

-------------------------------------------------------------------------------
     What is TinyScheme?
     -------------------

     TinyScheme is a lightweight Scheme interpreter that implements as large
     a subset of R5RS as was possible without getting very large and
     complicated. It is meant to be used as an embedded scripting interpreter
     for other programs. As such, it does not offer IDEs or extensive toolkits
     although it does sport a small top-level loop, included conditionally.
     A lot of functionality in TinyScheme is included conditionally, to allow
     developers freedom in balancing features and footprint.

     As an embedded interpreter, it allows multiple interpreter states to
     coexist in the same program, without any interference between them.
     Programmatically, foreign functions in C can be added and values
     can be defined in the Scheme environment. Being a quite small program,
     it is easy to comprehend, get to grips with, and use.

     Known bugs
     ----------

     SCM tests had revealed a memory allocation error in the past, but not
     anymore. It probably had to do with vectors, and version 1.21 probably
     got rid of it.

     Things that keep missing, or that need fixing
     ---------------------------------------------

     There are no hygienic macros. No rational or complex numbers.
     No unwind-protect and call-with-values.

     Maybe (a subset of) SLIB will work with TinySCHEME...

     I will add a debugger... The user will be able to specify breakpoints,
     and a new toplevel will be entered when the breakpoint is reached.
     Most of the actual debugger will be in Scheme, with minimal additions
     to scheme.c.

     Change Log
     ----------

     Version 1.02 (25 Aug 1998):
          First part of R5RS I/O.

     Version 1.03 (26 Aug 1998):
          Extended .h with useful functions for FFI
          Library: with-input-* etc.
          Finished R5RS I/O, added string ports.

     Version 1.04
          Added missing T_ATOM bits...
          Added vectors
          Free-list is sorted by address, since vectors need consecutive cells.
          (quit <retcode>) for use with scripts

     Version 1.05
          Support for scripts, *args*, "-1" option.
          Various R5RS procedures.
          *sharp-hook*
          Handles unmatched parentheses.
          New architecture for procedures.

     Version 1.06
          #! is now skipped
          generic-assoc bug removed
          strings are now managed differently, hack.txt is removed
          various delicate points fixed

     Version 1.07
          '=>' in cond now exists
          list? now checks for circularity
          some reader bugs removed
          Reader is more consistent wrt vectors
          Quote and Quasiquote work with vectors

     Version 1.08
          quotient, remainder and modulo. gcd.

     Version 1.09
          Removed bug when READ met EOF. lcm.

     Version 1.10
          Another bug when file ends with comment!
          Added DEFINE-MACRO in init.scm, courtesy of Andy Gaynor.

     Version 1.11
          BSDI defines isnumber... changed all similar functions to is_*
          EXPT now has correct definition. Added FLOOR, CEILING, TRUNCATE
          and ROUND, courtesy of Bengt Kleberg. Preprocessor symbols now
          have values 1 or 0, and can be set as compiler defines (proposed
          by Andy Gaynor *months* ago). 'prompt' and 'InitFile' can now be
          defined during compilation, too.

     Version 1.12
          Cis* incorrectly called isalpha() instead of isascii()
          Added USE_CHAR_CLASSIFIERS, USE_STRING_PORTS.

     Version 1.13
          Silly bug involving division by zero resolved by Roland Kaufman.
          Macintosh support from Shmulik Regev.
          Float parser bug fixed by Alexander Shendi.
          GC bug from Andru Luvisi.

     Version 1.14
          Unfortunately, after Andru fixed the GC it became obvious that the
          algorithm was too slow... Fortunately, David Gould found a way to
          speed it up.

     Version 1.15
          David Gould also contributed some changes that speed up operation.
          Kirk Zurell fixed HASPROP.
          The Garbage Collection didn't collect all the garbage...fixed.

     Version 1.16
          Dynamically-loaded extensions introduced (USE_DL).
          Santeri Paavolainen found a race condition: When a cons is executed,
          and each of the two arguments is a constructing function, GC could
          happen before all arguments are evaluated and cons() is called, and
          the evaluated arguments would all be reclaimed!
          Fortunately, such a case was rare in the code, although it is
          a pitfall in new code and code in foreign functions. Currently, only
          one such case remains, when COLON_HOOK is defined.

     Version 1.17
          Dynamically-loaded extensions are more fully integrated.
          TinyScheme is now distributed under the BSD open-source license.

     Version 1.18
          The FFI has been extended. USE_VERBOSE_GC has gone. Anyone wanting
          the same functionality can put (gcverbose #t) in init.scm.
          print-width was removed, along with three corresponding op-codes.
          Extended character constants with ASCII names were added.
          mk_counted_string paves the way for full support of binary strings.
          As much as possible of the type-checking chores were delegated
          to the inner loop, thus reducing the code size to less than
          4200 loc!

     Version 1.19
          Carriage Return now delimits identifiers. DOS-formatted Scheme files
          can be used by Unix. Random number generator added to library.
          Fixed some glitches of the new type-checking scheme. Fixed erroneous
          (append '() 'a) behavior. Will continue with r4rstest.scm to
          fix errors.

     Version 1.20
          Tracing has been added. The toplevel loop has been slightly
          rearranged. Backquote reading for vector templates has been
          sanitized. Symbol interning is now correct. Arithmetic functions
          have been corrected. APPLY, MAP, FOR-EACH, numeric comparison
          functions fixed. String reader/writer understands \xAA notation.

     Version 1.21
          Jason Felice submitted a radically different datatype representation
          which he had implemented.
          While discussing its pros and cons, it became apparent that the
          current implementation of ports suffered from a grave fault: ports
          were not garbage-collected. I changed the ports to be heap-allocated,
          which enabled the use of string ports for loading. Jason also fixed
          errors in the garbage collection of vectors. USE_VERBATIM is gone.
          "ssp_compiler.c" has a better solution on HTML generation. A bug
          involving backslash notation in strings has been fixed. '-c' flag
          now executes next argument as a stream of Scheme commands. Foreign
          functions are now also heap allocated, and scheme_define is used to
          define everything.

     Version 1.22
          The new ports had a bug in LOAD. MK_CLOSURE is introduced.
          Shawn Wagner inquired about string->number and number->string.
          I added string->atom and atom->string and defined the number
          functions from them. Doing that, I fixed WRITE applied to symbols
          (it didn't quote them). Unfortunately, minimum build is now
          slightly larger than 64k... I postpone action because Jason's idea
          might solve it elegantly.

     Version 1.23
          Finally I managed to mess it up with my version control. Version
          1.22 actually lacked some of the things I have been fixing in the
          meantime. This should be considered as a complete replacement for
          1.22.

     Version 1.24
          SCM tests now pass again after change in atom2str.

     Version 1.25
          Types have been homogenized to be able to accommodate a different
          representation. Plus, promises are no longer closures.
          Unfortunately, I discovered that continuations and force/delay
          do not pass the SCM test (and never did)... However, on the bright
          side, what little modifications I did had a large impact on the
          footprint: USE_NO_FEATURES now produces an object file of
          63960 bytes on Linux!

     Version 1.26
          Version 1.26 was never released. I changed a lot of things, in fact
          too much, even the garbage collector, and hell broke loose. I'll
          try a more gradual approach next time.

     Version 1.27
          Version 1.27 is the successor of 1.25.
          Bug fixes only, but I had to release them so that everybody can
          profit. 'Backchar' tried to write back to the string, which
          obviously didn't work for const strings. 'Substring' didn't check
          for crossed start and end indices. Defines changed to restore
          the ability to compile under MSVC.

     Version 1.28
          Many people have contacted me with bugfixes or remarks in
          the three months I was inactive. A lot of them spotted that
          scheme_deinit crashed while reporting gc results. They suggested
          that sc->outport be set to NIL in scheme_deinit, which I did.
          Dennis Taylor remarked that OP_VALUEPRINT reset sc->value instead
          of preserving it. He submitted a modification which I adopted
          partially. David Hovemeyer sent me many little changes, that you
          will find in version 1.28, and Partice Stoessel modified the
          float reader to conform to R5RS.

     Version 1.29
          The previous version contained a lot of corrections, but there
          were a lot more that still wait on a sheet of paper lost in a
          carton someplace after my house move... Manuel Heras-Gilsanz
          noticed this and resent his own contribution, which relies on
          another bugfix that v.1.28 was missing: a problem with string
          output, that this version fixes. I hope other people will take
          the time to resend their contributions, if they didn't make it
          to v.1.28.

     Version 1.30
          After many months, I followed Preston Bannister's advice of
          using macros and a single source text to keep the enums and the
          dispatch table in sync, and I used his contributed "opdefines.h".
          Timothy Downs contributed a helpful function, "scheme_call".
          Stephen Gildea contributed new versions of the makefile and
          practically all other sources. He created a built-in STRING-APPEND,
          and fixed a lot of other bugs.
          Ruhi Bloodworth reported fixes necessary for OS X and a small
          bug in dynload.c.

     Version 1.31
          Patches to the hastily-done version 1.30. Stephen Gildea fixed
          some things done wrongly, and Richard Russo fixed the makefile
          for building on Windows.
          Property lists (heritage from MiniScheme) are now optional and have
          disappeared from the interface. They should be considered as
          deprecated.

     Scheme Reference
     ----------------

     If something seems to be missing, please refer to the code and
     "init.scm", since some are library functions. Refer to the MiniSCHEME
     readme as a last resort.

     Environments
          (interaction-environment)
          See R5RS. In TinySCHEME, immutable list of association lists.

          (current-environment)
          The environment in effect at the time of the call. An example of
          its use and its utility can be found in the sample code that
          implements packages in "init.scm":

               (macro (package form)
                    `(apply (lambda ()
                         ,@(cdr form)
                         (current-environment))))

          The environment containing the (local) definitions inside the
          closure is returned as an immutable value.

          (defined? <symbol>) (defined? <symbol> <environment>)
          Checks whether the given symbol is defined in the current (or
          given) environment.

     Symbols
          (gensym)
          Returns a new interned symbol each time. Will probably move to the
          library when string->symbol is implemented.

     Directives
          (gc)
          Performs garbage collection immediately.

          (gcverbose) (gcverbose <bool>)
          The argument (defaulting to #t) controls whether GC produces
          visible outcome.

          (quit) (quit <retcode>)
          Stops the interpreter and sets the 'retcode' internal field
          (defaults to 0). When standalone, 'retcode' is returned as exit
          code to the OS.

          (tracing <num>)
          1 turns on tracing, 0 turns it off. (Only when USE_TRACING is 1).

     Mathematical functions
          Since rationals and complexes are absent, the respective functions
          are also missing.
          Supported: exp, log, sin, cos, tan, asin, acos, atan, floor,
          ceiling, truncate, round and also sqrt and expt when USE_MATH=1.
          Number-theoretical quotient, remainder and modulo, gcd, lcm.
          Library: exact?, inexact?, odd?, even?, zero?, positive?,
          negative?, exact->inexact. inexact->exact is a core function.

     Type predicates
          boolean?, eof-object?, symbol?, number?, string?, integer?, real?,
          list?, null?, char?, port?, input-port?, output-port?, procedure?,
          pair?, environment?, vector?. Also closure?, macro?.

     Types
          Types supported:

               Numbers (integers and reals)
               Symbols
               Pairs
               Strings
               Characters
               Ports
               Eof object
               Environments
               Vectors

     Literals
          String literals can contain escaped quotes \" as usual, but also
          \n, \r, \t and \xDD (hex representations). Note also that it is
          possible to include literal newlines in string literals, e.g.

               (define s "String with newline here
               and here
               that can function like a HERE-string")

          Character literals contain #\space and #\newline and are
          supplemented with #\return and #\tab, with obvious meanings.
          Hex character representations are allowed (e.g. #\x20 is #\space).
          When USE_ASCII_NAMES is defined, various control characters can be
          referred to by their ASCII name.

                0   #\nul       17   #\dc1
                1   #\soh       18   #\dc2
                2   #\stx       19   #\dc3
                3   #\etx       20   #\dc4
                4   #\eot       21   #\nak
                5   #\enq       22   #\syn
                6   #\ack       23   #\etb
                7   #\bel       24   #\can
                8   #\bs        25   #\em
                9   #\ht        26   #\sub
               10   #\lf        27   #\esc
               11   #\vt        28   #\fs
               12   #\ff        29   #\gs
               13   #\cr        30   #\rs
               14   #\so        31   #\us
               15   #\si
               16   #\dle      127   #\del

          Numeric literals support #x #o #b and #d. Flonums are currently
          read only in decimal notation. Full grammar will be supported
          soon.

     Quote, quasiquote etc.
          As usual.

     Immutable values
          Immutable pairs cannot be modified by set-car! and set-cdr!.
          Immutable strings cannot be modified via string-set!

     I/O
          As per R5RS, plus String Ports (see below).
          current-input-port, current-output-port,
          close-input-port, close-output-port, input-port?, output-port?,
          open-input-file, open-output-file.
          read, write, display, newline, write-char, read-char, peek-char.
          char-ready? returns #t only for string ports, because there is no
          portable way in stdio to determine if a character is available.
          Also open-input-output-file, set-input-port, set-output-port (not
          R5RS)
          Library: call-with-input-file, call-with-output-file,
          with-input-from-file, with-output-to-file and
          with-input-output-from-to-files, close-port and input-output-port?
          (not R5RS).
          String Ports: open-input-string, open-output-string,
          open-input-output-string. Strings can be used with I/O routines.

     Vectors
          make-vector, vector, vector-length, vector-ref, vector-set!,
          list->vector, vector-fill!, vector->list, vector-equal?
          (auxiliary function, not R5RS)

     Strings
          string, make-string, list->string, string-length, string-ref,
          string-set!, substring, string->list, string-fill!, string-append,
          string-copy.
          string=?, string<?, string>?, string<=?, string>=?.
          (No string-ci*? yet). string->number, number->string. Also
          atom->string, string->atom (not R5RS).

     Symbols
          symbol->string, string->symbol

     Characters
          integer->char, char->integer.
          char=?, char<?, char>?, char<=?, char>=?.
          (No char-ci*?)

     Pairs & Lists
          cons, car, cdr, list, length, map, for-each, foldr, list-tail,
          list-ref, last-pair, reverse, append.
          Also member, memq, memv, based on generic-member, assoc, assq, assv
          based on generic-assoc.

     Streams
          head, tail, cons-stream

     Control features
          Apart from procedure?, also macro? and closure?
          map, for-each, force, delay, call-with-current-continuation (or
          call/cc), eval, apply. 'Forcing' a value that is not a promise
          produces the value.
          There is no call-with-values, values, nor dynamic-wind.
          Dynamic-wind in the presence of continuations would require
          support from the abstract machine itself.

     Property lists
          TinyScheme inherited from MiniScheme property lists for symbols.
          put, get.

     Dynamically-loaded extensions
          (load-extension )
          Loads a DLL declaring foreign procedures.

     Esoteric procedures
          (oblist)
          Returns the oblist, an immutable list of all the symbols.

          (macro-expand <form>)
          Returns the expanded form of the macro call denoted by the
          argument.

          (define-with-return (<procname> <args>...) <body>)
          Like plain 'define', but makes the continuation available as
          'return' inside the procedure. Handy for imperative programs.

          (new-segment )
          Allocates more memory segments.

          defined?
          See "Environments"

          (get-closure-code <closure>)
          Gets the code as scheme data.

          (make-closure <code> <environment>)
          Makes a new closure in the given environment.

     Obsolete procedures
          (print-width )

     Programmer's Reference
     ----------------------

     The interpreter state is initialized with "scheme_init".
     Custom memory allocation routines can be installed with an alternate
     initialization function: "scheme_init_custom_alloc".
     Files can be loaded with "scheme_load_file". Strings containing Scheme
     code can be loaded with "scheme_load_string". It is a good idea to
     "scheme_load" init.scm before anything else.

     External data for keeping external state (of use to foreign functions)
     can be installed with "scheme_set_external_data".
     Foreign functions are installed with "assign_foreign". Additional
     definitions can be added to the interpreter state, with "scheme_define"
     (this is the way HTTP header data and HTML form data are passed to the
     Scheme script in the Altera SQL Server). If you wish to define the
     foreign function in a specific environment (to enhance modularity),
     use "assign_foreign_env".

     The procedure "scheme_apply0" has been added with persistent scripts in
     mind. Persistent scripts are loaded once, and every time they are needed
     to produce HTTP output, appropriate data are passed through global
     definitions and function "main" is called to do the job. One could
     easily add "scheme_apply1" etc.

     The interpreter state should be deinitialized with "scheme_deinit".

     DLLs containing foreign functions should define a function named
     init_<base-name>. E.g. foo.dll should define init_foo, and bar.so
     should define init_bar. This function should assign_foreign any foreign
     function contained in the DLL.
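
     The naming rule above (foo.dll gives init_foo, bar.so gives init_bar)
     can be sketched in C. The helper below is hypothetical and is not part
     of TinyScheme or its dynload.c; it only shows how a loader might derive
     the expected init function name, assuming the name is built from the
     file's base name with any directory part and extension removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not TinyScheme code): builds "init_<base-name>"
   from a library path, e.g. "/usr/lib/bar.so" -> "init_bar".
   Returns 0 on success, -1 if the result does not fit in out[cap]. */
int init_fn_name(const char *path, char *out, size_t cap)
{
    const char *base, *p, *dot;
    size_t stem_len;
    int n;

    assert(path != NULL && out != NULL && cap > 0);

    /* Skip any directory part; accept both '/' and '\\' separators. */
    base = path;
    for (p = path; *p != '\0'; ++p) {
        if (*p == '/' || *p == '\\')
            base = p + 1;
    }

    /* Drop the extension: everything from the last '.' of the base name. */
    dot = strrchr(base, '.');
    stem_len = (dot != NULL && dot != base) ? (size_t)(dot - base)
                                            : strlen(base);

    n = snprintf(out, cap, "init_%.*s", (int)stem_len, base);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

     A loader written along these lines would call init_fn_name on the
     library path and then look up the resulting symbol with whatever
     mechanism the platform offers.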

     The first dynamically loaded extension available for TinyScheme is
     a regular expression library. Although it's by no means an
     established standard, this library is supposed to be installed in
     a directory mirroring its name under the TinyScheme location.

     Foreign Functions
     -----------------

     The user can add foreign functions in C. For example, a function
     that squares its argument:

          pointer square(scheme *sc, pointer args) {
               if(args!=sc->NIL) {
                    if(sc->isnumber(sc->pair_car(args))) {
                         double v=sc->rvalue(sc->pair_car(args));
                         return sc->mk_real(sc,v*v);
                    }
               }
               return sc->NIL;
          }

     Foreign functions are now defined as closures:

          sc->interface->scheme_define(
               sc,
               sc->global_env,
               sc->interface->mk_symbol(sc,"square"),
               sc->interface->mk_foreign_func(sc, square));

     Foreign functions can use the external data in the "scheme" struct
     to implement any kind of external state.

     External data are set with the following function:

          void scheme_set_external_data(scheme *sc, void *p);

     As of v.1.17, the canonical way for a foreign function in a DLL to
     manipulate Scheme data is using the function pointers in sc->interface.

     Standalone
     ----------

     Usage: tinyscheme -?
     or:    tinyscheme [<file1> <file2> ...]
     followed by
               -1 <file> [<arg1> <arg2> ...]
               -c <Scheme commands> [<arg1> <arg2> ...]
     assuming that the executable is named tinyscheme.

     Use - in the place of a filename to denote stdin.
     The -1 flag is meant for #! usage in shell scripts. If you specify

          #! /somewhere/tinyscheme -1

     then tinyscheme will be called to process the file. For example, the
     following script echoes the Scheme list of its arguments.

          #! /somewhere/tinyscheme -1
          (display *args*)

     The -c flag permits execution of arbitrary Scheme code.

     Customizing
     -----------

     The following symbols are defined to default values in scheme.h.
     Use the -D flag of cc to set to either 1 or 0.

     STANDALONE
          Define this to produce a standalone interpreter.
     USE_MATH
          Includes math routines.
     USE_CHAR_CLASSIFIERS
          Includes character classifier procedures.
     USE_ASCII_NAMES
          Enable extended character notation based on ASCII names.
     USE_STRING_PORTS
          Enables string ports.
     USE_ERROR_HOOK
          To force system errors through user-defined error handling.
          (see "Error handling")
     USE_TRACING
          To enable use of TRACING.
     USE_COLON_HOOK
          Enable use of qualified identifiers. (see "Colon Qualifiers -
          Packages") Defining this as 0 has the rather drastic consequence
          that any code using packages will stop working, and will have to
          be modified. It should only be used if you *absolutely* need to
          use '::' in identifiers.
     USE_STRCASECMP
          Defines stricmp as strcasecmp, for Unix.
     STDIO_ADDS_CR
          Informs TinyScheme that stdio translates "\n" to "\r\n". For
          DOS/Windows.
     USE_DL
          Enables dynamically loaded routines. If you define this symbol,
          you should also include dynload.c in your compile.
     USE_PLIST
          Enables property lists (not Standard Scheme stuff). Off by
          default.
     USE_NO_FEATURES
          Shortcut to disable USE_MATH, USE_CHAR_CLASSIFIERS,
          USE_ASCII_NAMES, USE_STRING_PORTS, USE_ERROR_HOOK, USE_TRACING,
          USE_COLON_HOOK, USE_DL.

     Build instructions
     ------------------

     Easy. Define the appropriate symbols, compile scheme.c (dynload.c,
     too, if you define USE_DL=1) and link with your application.
     For demonstration purposes, a sample makefile that produces a
     standalone interpreter for Linux is included in this distribution.

     Error Handling
     --------------

     Errors are recovered from without damage. The user can install his
     own handler for system errors, by defining *error-hook*. Defining
     to '() gives the default behavior, which is equivalent to "error".
     USE_ERROR_HOOK must be defined.

     A simple exception handling mechanism can be found in "init.scm".
     A new syntactic form is introduced:

          (catch <exception-expr> <expr1> <expr2> ... <exprN>)

     "Catch" establishes a scope spanning multiple call-frames
     until another "catch" is encountered.

     Exceptions are thrown with:

          (throw "message")

     If used outside a (catch ...), reverts to (error "message").

     Example of use:

          (define (foo x) (write x) (newline) (/ x 0))

          (catch (begin (display "Error!\n") 0)
               (write "Before foo ... ")
               (foo 5)
               (write "After foo"))

     The exception mechanism can be used even by system errors, by

          (define *error-hook* throw)

     which makes use of the error hook described above.

     If necessary, the user can devise his own exception mechanism with
     tagged exceptions etc.

     Reader extensions
     -----------------

     When encountering an unknown character after '#', the user-specified
     procedure *sharp-hook* (if any), is called to read the expression.
     This can be used to extend the reader to handle user-defined constants
     or whatever. It should be a procedure without arguments, reading from
     the current input port (which will be the load-port).

     Colon Qualifiers - Packages
     ---------------------------

     When USE_COLON_HOOK=1:
     The lexer now recognizes the construction <qualifier>::<symbol> and
     transforms it in the following manner (T is the transformation
     function):

          T(<qualifier>::<symbol>) = (*colon-hook* 'T(<symbol>) <qualifier>)

     where <qualifier> is a symbol not containing any double-colons.

     As the definition is recursive, qualifiers can be nested.
     The user can define his own *colon-hook*, to handle qualified names.
     By default, "init.scm" defines *colon-hook* as EVAL. Consequently,
     the qualifier must denote a Scheme environment, such as one returned
     by (interaction-environment). "Init.scm" defines a new syntactic form,
     PACKAGE, as a simple example. It is used like this:

          (define toto
               (package
                    (define foo 1)
                    (define bar +)))

          foo                                     ==>  Error, "foo" undefined
          (eval 'foo)                             ==>  Error, "foo" undefined
          (eval 'foo toto)                        ==>  1
          toto::foo                               ==>  1
          ((eval 'bar toto) 2 (eval 'foo toto))   ==>  3
          (toto::bar 2 toto::foo)                 ==>  3
          (eval '(bar 2 foo) toto)                ==>  3

     If the user installs another package infrastructure, he must define
     a new 'package' procedure or macro to retain compatibility with supplied
     code.

     Note: Older versions used ':' as a qualifier. Unfortunately, the use
     of ':' as a pseudo-qualifier in existing code (i.e. SLIB) essentially
     precludes its use as a real qualifier.

     O R I G I N A L   C R E D I T S
     -------------------------------

     TinyScheme would not exist if it wasn't for MiniScheme. I had just
     written the HTTP server for Ovrimos SQL Server, and I was lamenting the
     lack of a scripting language. Server-side Javascript would have been the
     preferred solution, had there been a Javascript interpreter I could
     lay my hands on. But there wasn't. Perl would have been another solution,
     but it was probably ten times bigger than the program it was supposed to
     be embedded in. There would also be thorny licencing issues.

     So, the obvious thing to do was find a truly small interpreter. Forth
     was a language I had once quasi-implemented, but the difficulty of
     handling dynamic data and the weirdness of the language put me off.
     I then looked around for a LISP interpreter, the next thing I knew was
     easy to implement. Alas, the LeLisp I knew from my days in UPMC
     (Universite Pierre et Marie Curie) had given way to Common Lisp, a
     megalith of a language! Then my search led me to Scheme, a language
     I knew was very orthogonal and clean. When I found Mini-Scheme, a single
     C file of some 2400 loc, I fell in love with it! What if it lacked
     floating-point numbers and strings! The rest, as they say, is history.

     Below are the original credits. Don't email Akira KIDA, the address has
     changed.

     ---------- Mini-Scheme Interpreter Version 0.85 ----------

                    coded by Atsushi Moriwaki (11/5/1989)

               E-MAIL :  moriwaki@kurims.kurims.kyoto-u.ac.jp

                   THIS SOFTWARE IS IN THE PUBLIC DOMAIN
                   ------------------------------------
     This software is completely free to copy, modify and/or re-distribute.
     But I would appreciate it if you left my name on the code as the author.

     This version has been modified by R.C. Secrist.

     Mini-Scheme is now maintained by Akira KIDA.

     This is a revised and modified version by Akira KIDA.
     current version is 0.85k4 (15 May 1994)

     Please send suggestions, bug reports and/or requests to:

     Features compared to MiniSCHEME
     -------------------------------

     All code is now reentrant. Interpreter state is held in a 'scheme'
     struct, and many interpreters can coexist in the same program, possibly
     in different threads. The user can specify user-defined memory
     allocation primitives. (see "Programmer's Reference")

     The reader is more consistent.

     Strings, characters and flonums are supported. (see "Types")

     Files being loaded can be nested up to some depth.

     R5RS I/O is there, plus String Ports. (see "Scheme Reference","I/O")

     Vectors exist.

     As a standalone application, it supports command-line arguments.
     (see "Standalone")

     Running out of memory is now handled.

     The user can add foreign functions in C. (see "Foreign Functions")

     The code has been changed slightly, core functions have been moved
     to the library, behavior has been aligned with R5RS etc.

     Support has been added for user-defined error recovery.
     (see "Error Handling")

     Support has been added for modular programming.
     (see "Colon Qualifiers - Packages")

     To enable this, EVAL has changed internally, and can
     now take two arguments, as per R5RS. Environments are supported.
     (see "Colon Qualifiers - Packages")

     Promises are now evaluated once only.

     (macro (foo form) ...) is now equivalent to (macro foo (lambda(form) ...))

     The reader can be extended using new #-expressions
     (see "Reader extensions")