Why does this matter? As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 problems that can be solved with compilers (or equally well with interpreters, or with Yegge's typical heavy dosage of cynicism).
Scheme syntax is different from most other programming languages. Consider:
Java has a wide variety of syntactic conventions (keywords, infix operators, three kinds of brackets, operator precedence, dot notation, quotes, commas, comments, semicolons), but Scheme syntax is much simpler:
Java Scheme if (x.val() > 0) {
return fn(A[i] + 3 * i,
new int[] {1, 2});
}(if (> (val x) 0)
(fn (+ (aref A i) (* 3 i))
(list 1 2))
In this page we will cover all the important points of the Scheme language and its interpretation (omitting some minor details), but we will take two steps to get there, defining a simplified language first, before defining the more complete Scheme language. In a followup page we make an even more complete implementation.
(define r 10) (* pi (* r r))Here is a table of all the allowable expressions:
| Expression | Syntax | Semantics and Example |
|---|---|---|
| variable reference | symbol | A symbol is interpreted as a variable name;
its value is the variable's
value. Example: r ⇒ 10 (assuming r was previously defined to be 10) |
| constant literal | number | A number
evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6 |
| conditional | (if test conseq alt) | Evaluate test; if true,
evaluate and return conseq; otherwise
alt. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6 |
| definition | (define symbol exp) | Define a new variable and give it
the value of evaluating the expression exp.
Examples: (define r 10) |
| procedure call | (proc arg...) | If proc is
anything other than the symbols if or define then it is treated as a procedure. Evaluate proc
and all the args, and then the procedure is applied to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0 |
In the Syntax column of this table, symbol must be a symbol, number must be an integer or floating point number, and the other italicized words can be any expression. The notation arg... means zero or more repetitions of arg.
program ➡ parse ➡ abstract-syntax-tree ➡ eval ➡ result
And here is a short example of what we want parse and eval to be able to do (begin evaluates each expression in order and returns the final one):
>> program = "(begin (define r 10) (* pi (* r r)))" >>> parse(program) ['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]] >>> eval(parse(program)) 314.1592653589793
Symbol = str # A Scheme Symbol is implemented as a Python str Number = (int, float) # A Scheme Number is implemented as a Python int or float Atom = (Symbol, Number) # A Scheme Atom is a Symbol or Number List = list # A Scheme List is implemented as a Python list Exp = (Atom, List) # A Scheme expression is an Atom or List
def tokenize(chars: str) -> list:
"Convert a string of characters into a list of tokens."
return chars.replace('(', ' ( ').replace(')', ' ) ').split()
Here we apply tokenize to our sample program:
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> tokenize(program)
['(', 'begin', '(', 'define', 'r', '10', ')', '(', '*', 'pi', '(', '*', 'r', 'r', ')', ')', ')']
Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of sub-expressions until we hit a matching ')'. Any non-parenthesis token must be a symbol or number. We'll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and if it is neither of those, it must be a symbol. Here is the parser:
def parse(program: str) -> Exp:
"Read a Scheme expression from a string."
return read_from_tokens(tokenize(program))
def read_from_tokens(tokens: list) -> Exp:
"Read an expression from a sequence of tokens."
if len(tokens) == 0:
raise SyntaxError('unexpected EOF')
token = tokens.pop(0)
if token == '(':
L = []
while tokens[0] != ')':
L.append(read_from_tokens(tokens))
tokens.pop(0) # pop off ')'
return L
elif token == ')':
raise SyntaxError('unexpected )')
else:
return atom(token)
def atom(token: str) -> Atom:
"Numbers become numbers; every other token is a symbol."
try: return int(token)
except ValueError:
try: return float(token)
except ValueError:
return Symbol(token)
parse works like this:
>>> program = "(begin (define r 10) (* pi (* r r)))" >>> parse(program) ['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]We're almost ready to define eval. But we need one more concept first.
import math
import operator as op
def standard_env():
"An environment with some Scheme standard procedures."
env = {}
env.update(vars(math)) # sin, cos, sqrt, pi, ...
env.update({
'+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv,
'>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
'abs': abs,
'append': op.add,
'apply': lambda proc, args: proc(*args),
'begin': lambda *x: x[-1],
'car': lambda x: x[0],
'cdr': lambda x: x[1:],
'cons': lambda x,y: [x] + y,
'eq?': op.is_,
'equal?': op.eq,
'length': len,
'list': lambda *x: list(x),
'list?': lambda x: isinstance(x,list),
'map': lambda *args: list(map(*args)),
'max': max,
'min': min,
'not': op.not_,
'null?': lambda x: x == [],
'number?': lambda x: isinstance(x, Number),
'procedure?': callable,
'round': round,
'symbol?': lambda x: isinstance(x, Symbol),
})
return env
global_env = standard_env()
We are now ready for the implementation of eval. As a refresher, we repeat the table of Lispy Calculator forms:
| Expression | Syntax | Semantics and Example |
|---|---|---|
| variable reference | symbol | A symbol is interpreted as a variable name;
its value is the variable's
value. Example: r ⇒ 10 (assuming r was previously defined to be 10) |
| constant literal | number | A number
evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6 |
| conditional | (if test conseq alt) | Evaluate test; if true,
evaluate and return conseq; otherwise
alt. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6 |
| definition | (define symbol exp) | Define a new variable and give it
the value of evaluating the expression exp.
Examples: (define r 10) |
| procedure call | (proc arg...) | If proc is
anything other than the symbols if or define then it is treated as a procedure. Evaluate proc
and all the args, and then the procedure is applied to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0 |
Here is the code for eval, which closely follows the table:
def eval(x: Exp, env=global_env) -> Exp:
"Evaluate an expression in an environment."
if isinstance(x, Symbol): # variable reference
return env[x]
elif isinstance(x, Number): # constant number
return x
elif x[0] == 'if': # conditional
(_, test, conseq, alt) = x
exp = (conseq if eval(test, env) else alt)
return eval(exp, env)
elif x[0] == 'define': # definition
(_, symbol, exp) = x
env[symbol] = eval(exp, env)
else: # procedure call
proc = eval(x[0], env)
args = [eval(arg, env) for arg in x[1:]]
return proc(*args)
We're done! You can see it all in action:
>>> eval(parse("(begin (define r 10) (* pi (* r r)))"))
314.1592653589793
def repl(prompt='lis.py> '):
"A prompt-read-eval-print loop."
while True:
val = eval(parse(raw_input(prompt)))
if val is not None:
print(schemestr(val))
def schemestr(exp):
"Convert a Python object back into a Scheme-readable string."
if isinstance(exp, List):
return '(' + ' '.join(map(schemestr, exp)) + ')'
else:
return str(exp)
Here is repl in action:
>>> repl() lis.py> (define r 10) lis.py> (* pi (* r r)) 314.159265359 lis.py> (if (> (* 11 11) 120) (* 7 6) oops) 42 lis.py> (list (+ 1 1) (+ 2 2) (* 2 3) (expt 2 3)) lis.py>
| Expression | Syntax | Semantics and Example | |||
|---|---|---|---|---|---|
| quotation | (quote exp)|
Return the exp literally; do not evaluate it. | Example: (quote (+ 1 2)) ⇒ (+ 1 2) procedure | (lambda (symbol...)
exp) | Create a procedure
with parameter(s) named symbol... and exp as the body. | Example: (lambda (r) (* pi (* r r))) |
The lambda special form (an obscure nomenclature choice that refers to Alonzo Church's lambda calculus) creates a procedure. We want procedures to work like this:
lis.py> (define r (* 6 7)) lis.py> (define circle-area (lambda (r) (* pi (* r r)))) lis.py> (circle-area (+ 5 5)) 314.159265359 lis.py> r 42There are two steps here. First, we evaluate (lambda (r) (* pi (* r r))) to create a procedure, one which refers to the global variables pi and *, takes a single parameter, which it calls r. This procedure becomes the value of the new variable circle-area. Second, we call the procedure, which means that we evaluate the body of the procedure, (* pi (* r r)), with r set to 10. Note that this has no effect on the value of r in the global environment.
|
pi: 3.141592653589793
*: <built-in function mul> r: 42 ...
|
We implement environments as ChainMaps, and implement Procedures as follows:
from collections import ChainMap as Environment
class Procedure(object):
"A user-defined Scheme procedure."
def __init__(self, parms, body, env):
self.parms, self.body, self.env = parms, body, env
def __call__(self, *args):
env = Environment(dict(zip(self.parms, args)), self.env)
return eval(self.body, env)
We see that every procedure has three components: a list of parameter names, a body expression, and an environment
that tells us what other variables are accessible from the body. For a procedure defined at the top level this
will be the global environment, but it is also possible for a procedure to refer to the local variables of the
environment in which it was defined (and not the environment in which it is called).
The expression Environment(dict(zip(self.parms, args)), self.env) says to build a new mapping that first binds each of the parameters to the corresponding argument value, and then also has access to all the variables in the environment in which the procedure was defined.
To see how these all go together, here is the new definition of eval, with new clauses for quote and lambda:
def eval(x, env=global_env):
"Evaluate an expression in an environment."
if isinstance(x, Symbol): # variable reference
return env[x]
elif not isinstance(x, List): # constant literal
return x
elif x[0] == 'quote': # (quote exp)
(_, exp) = x
return exp
elif x[0] == 'if': # (if test conseq alt)
(_, test, conseq, alt) = x
exp = (conseq if eval(test, env) else alt)
return eval(exp, env)
elif x[0] == 'define': # (define var exp)
(_, var, exp) = x
env[var] = eval(exp, env)
elif x[0] == 'lambda': # (lambda (var...) body)
(_, parms, body) = x
return Procedure(parms, body, env)
else: # (proc arg...)
proc = eval(x[0], env)
args = [eval(exp, env) for exp in x[1:]]
return proc(*args)
Let's see what we can do now:
>>> repl() lis.py> (define circle-area (lambda (r) (* pi (* r r)))) lis.py> (circle-area 3) 28.274333877 lis.py> (define fact (lambda (n) (if (<= n 1) 1 (* n (fact (- n 1)))))) lis.py> (fact 10) 3628800 lis.py> (fact 100) 9332621544394415268169923885626670049071596826438162146859296389521759999322991 5608941463976156518286253697920827223758251185210916864000000000000000000000000 lis.py> (circle-area (fact 10)) 4.1369087198e+13 lis.py> (define first car) lis.py> (define rest cdr) lis.py> (define count (lambda (item L) (if L (+ (equal? item (first L)) (count item (rest L))) 0))) lis.py> (count 0 (list 0 1 2 3 0 0)) 3 lis.py> (count (quote the) (quote (the more the merrier the bigger the better))) 4 lis.py> (define twice (lambda (x) (* 2 x))) lis.py> (twice 5) 10 lis.py> (define repeat (lambda (f) (lambda (x) (f (f x))))) lis.py> ((repeat twice) 10) 40 lis.py> ((repeat (repeat twice)) 10) 160 lis.py> ((repeat (repeat (repeat twice))) 10) 2560 lis.py> ((repeat (repeat (repeat (repeat twice)))) 10) 655360 lis.py> (pow 2 16) 65536.0 lis.py> (define fib (lambda (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))) lis.py> (define range (lambda (a b) (if (= a b) (quote ()) (cons a (range (+ a 1) b))))) lis.py> (range 0 10) (0 1 2 3 4 5 6 7 8 9) lis.py> (map fib (range 0 10)) (1 1 2 3 5 8 13 21 34 55) lis.py> (map fib (range 0 20)) (1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)We now have a language with procedures, variables, conditionals (if), and sequential execution (the begin procedure). If you are familiar with other languages, you might think that a while or for loop would be needed, but Scheme manages to do without these just fine. The Scheme report says "Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language." In Scheme you iterate by defining recursive functions.
bash$ grep "^\s*[^#\s]" lis.py | wc
107 449 3901
From there Tony and I split paths. He reasoned that the hard part was the interpreter for expressions; he needed Lisp for that, but he knew how to write a tiny C routine for reading and echoing the non-Lisp characters and link it in to the Lisp program. I didn't know how to do that linking, but I reasoned that writing an interpreter for this trivial language (all it had was set variable, fetch variable, and string concatenate) was easy, so I wrote an interpreter in C. So, ironically, Tony wrote a Lisp program (with one small routine in C) because he was a C programmer, and I wrote a C program because I was a Lisp programmer.
In the end, we both got our theses done (Tony, Peter).
To learn more about Scheme consult some of the fine books (by Friedman and Fellesein, Dybvig, Queinnec, Harvey and Wright or Sussman and Abelson), videos (by Abelson and Sussman), tutorials (by Dorai, PLT, or Neller), or the reference manual.
I also have another page describing a more advanced version of Lispy.