JScheme: Design Decisions

In June of 1997, I wanted to learn Java beter. I had a pretty good theoretical understanding of the language, but hadn't written much actual Java code. I decided to write a Lisp interpreter, and JSCHEME is the result. Here are the design decisions I dealt with.

  1. What resources? I decided I didn't want to spend more than 20 hours on this exercise of learning Java, learning the IDE I was using, and producing a Lisp interpreter. I met the goal in the allocated time: I had a Lisp with 638 lines of code, 4 data types (pair, symbol, number, and procedure). I put the project aside (and went on to write lots more Java at work), but then I found Aubrey Jaffrey's r4rstest.scm test suite, which tests Scheme compliance. My system failed most of the tests, of course, so every now and then I would get some spare time and make some additions. Eventually I passed all the tests, released it to the Internet, and made a few updates once I found that people were finding it useful and downloading it. Because the system came out to about 50KB I decided to call it SILK, or Scheme In L KB (L is 50 in Roman), with the intention that if it ever bloats up to 100KB, I'd change the name to SICK. In version 1.4 the jar files (source and .class) are comfortably below 30KB, but the uncompressed source code has climbed to 69KB (still closer to L than C).

    Addendum: In 2001 (and again in 2002) we got letters from the friendly folks at Thredtec.com, who have a trademark on ``Silk'' for a Java simulation tool, so we changed the name to ``Jscheme,'' which was the name of Tim Hickey's implementation before he joined the SILK project.

  2. What Lisp? I settled on R4RS Scheme, because it has the test suite, is widely used, and has a publicly accessible manual.  Some day I should upgrade to R5RS Scheme, which is the most recent verison, and is compatible with the IEEE Scheme standard.

  3. Compiler or Interpreter? I just wanted something that works, so I did a simple interpreter.
  4. What kind of performance? This was not a major concern.  I wanted an interpreter that would work, but it didn't have to be fast. It turns out I got about what I expected: it takes about half a second to load the 500 line primitives.scm file, and another half a second to loop 1000 times using this code:
    (define (loop n) (if (> n 0) (loop (- n 1))))
    (loop 1000)
    
    I conjecture that most of the time goes into a linear lookup of functions in the global environment. This conjecture is confirmed by seeing that the following variation runs about nine times faster:
    (define loop2 (let ((> >) (- -)) 
                    (lambda (n) (if (> n 0) (loop2 (- n 1))))))
    (loop2 1000)
    
    I can get another 15% faster by doing:
    (eval `(define (loop3 n) (if (,< n 0) 0 (loop3 (,- n 1)))))
    (loop3 1000) 
    
    With this I can loop 14,000 times in a second, which seems fast enough. Perhaps I should have the notion of immutable variables, and have lambda do some pre-treatment analysis, so the user can write the loop version and get the effect of the loop3 version.

  5. How to Organize into classes?  At first, I was confused about the syntax for static methods in Java. I'd seen both Math.sqrt(x) and java.lang.Math.sqrt(x), but I thought you could do sqrt(x) if you import java.lang.Math. I quickly discovered that is not correct: you can only write sqrt(x) in code that is in the Math class, or a class that extends Math. So now I had a problem: over 100 error messages, and a question of how to re-organize.  My first thought was that I could live with one use of Math.sqrt(x), but I really wanted to write list(x, y), not SomeOtherClass.list(x, y), and the only way I could think of to achieve that was to put everything (well, almost everything) into one big class.  That's when I thought of the name SIOC, or Scheme in One Class, which is a play on George Carrette's SIOD, or Scheme in One Defun.

    However, I knew I really wanted to have a separate class for the interpreter (so I could swap several interpreters in; perhaps a simple interpreter, a tail-recursive interpreter, and an interpreter that supports full call/cc, as I did in PAIP). I also wanted to have a class for the primitive procedures, so I could have an R4RS, and an R5RS version some day. (I should probably clean things up so that the extensions are in a separate class from the R4RS primitives.) But if I put the list method in the primitive procedure class (where it will be used fairly often), then I would have to write Primitive.list(x, y) in the other classes (which also want to use list fairly often). I guess most Java programmers would just have accepted this, but I came up with the idea of putting the utility methods in their own class, SchemeUtils, and then having the five other top-level classes extend SchemeUtils.  That way, I can use the unqualified name, and I get the modularity I want. So the class structure I ended up with is:

  6. SchemePrimitives
    SchemeUtils
        Scheme
        Environment
        Pair
        InputPort
        Procedure
            Primitive
            JavaMethod
            Continuation
            Closure
                Macro
    

  7. Basic Implementation Parts Once these design issues are out of the way, the rest is pretty easy. In order to build a Scheme interpreter, you basically need six things:

For numbers, I have only one type: Double. This is allowable by the specification but raises a problem with respect to exact/inexact numbers. I didn't implement inexact contagion properly. Technically, (max 4 3.9) should be an inexact 4. But I have no way to distinguish #e4 from #i4. I implement (exact? x) as true iff (= x (round x)) and if adding or subtracting 1 from x changes x. Rather than do the add/subtract each time, I test against the maximum allowable number. I guess I could have looked up the format for IEEE double floating point to see that there are 53 bits in the mantissa, but instead I wrote a program to figure out that I can represent exact integers up to 0x1fffffffffffff (just under 1016). I still have the problem that (* 2 0.5) comes out to an exact number, when it should be inexact. Here's the program:

(define (f lo hi) 
  (let ((mid (+ (/ hi 2) (/ lo 2)) (round (sqrt (* lo hi)))))
    (if (or (= lo mid) (= mid hi))
	lo
	(if (ok? mid) (f mid hi) (f lo mid)))))

(define (ok? n)
  (and (< (- n 1) n (+ n 1)) 
       (= (- (+ n 1) 1) n (+ (- n 1) 1))))

(number->string (f 1 (expt 2 63)) 16)

Scheme Type Java Type
Notes
pair Pair Java has no equivalent to use.
empty list (null) We could have Pair and Empty as subclasses of List, or EMPTY could be a static var.  But I decided its easiest to have null for ().  This means only static methods on lists.
boolean Boolean Simple enough. Note that the empty list is not considered false in Scheme, only #f is false.
number Double We might add BigDecimal or BigInteger.
string char[] Can't be String; they're immutable.  Could be StringBuffer, but char[] is simpler and sufficient.
character Character Simple enough.
symbol String String is a somewhat confusing choice, because you have to remember that String means symbol and char[] means string, but String has the right properties: has a name, is internable, is immutable, doesn't need anything else. Another option is to have a class for Symbol that includes a slot for the global value.  That would make lookup faster, but note that then we'd need a symbol table.  Actually, we'd need one symbol table for each instance of the Scheme interpreter, and we'd need separate copies of every symbol in each instance.
vector Object[] Because Vector is more powerful than is needed.
procedure Procedure 
Primitive 
Closure 
SchemePrimitives
Macro 
Continuation 
JavaMethod
Procedure is the abstract superclass. 
Primitive is a Scheme primitive. 
Closure is a user-defined function. 
SchemePrimitives holds Scheme code in a string; loaded at start-up.
Macro is a subclass of Closure for doing macros. It has an expand method.
Continuation is what you get from call/cc
JavaMethod encapsulates a Java Method.
input port InputPort It would have been simpler to use Reader, but I wanted someplace to put the code for read, so I created a class.
output port PrintWriter We could have had a class to hold the stringify method, but its not really necessary to have another class.
Scheme Concept Java Type Notes
interpreter Scheme  The class Scheme implements eval, apply, and a few other methods. Note you can instantiate several different Schemes at once. 
environment  Environment  An environment has lists of variables and values. It uses linear search, which is slow, especially in the global environment.
continuation Continuation call/cc builds a Continuation procedure which, when applied, throws a RuntimeException. call/cc then sets up a try/catch block that catches that identical exception only. Got it?
error RuntimeException Call me lazy, but I didn't want to clutter the program with throws clauses, so I use RuntimeException, which doesn't need to be declared.
symbol table (none) We use the internal Hashtable that Java uses in its String.intern() method as the symbol table.  That is, symbols are represented as Strings that have been interned.
variable (none) Variables are refered to implicitly by their position in Environments.