Life (1961) |
The Universe (1956) |
Mathematics (1964) |
Information (1966) |
But the issue that had the biggest effect on me was the September 1966 on Information (which I read about 40 years ago). The issue featured a terrific collection of authors who are now acknowledged as pioneering leaders in computer science: Evans and Sutherland explaining computer hardware; Fano and Corbato on operating systems; Tony Oettinger describing his natural language parser; and the two giants of my own subfield (Artificial Intelligence), McCarthy and Minsky, on Information Theory and AI. If I had somehow been able to comprehend everything in this issue, I could have cut a decade's time off my education in Computer Science.
In this essay I'll concentrate on on one article from the issue: Christopher Strachey's contribution on System Analysis and Programming. At the time I had seen only a few snippets of BASIC code; nothing more than a few lines. This short article was my first introduction to a high-level programming language and the first non-trivial program I'd ever read: a checkers-playing program. When I rediscovered this article recently I was surprised to find two things: (1) the programming language and programming style are thoroughly modern, and (2) there is a serious mismatch between the design and the implementation, or the systems analysis and programming as Strachey calls it. Let's look at what Strachey did surprisingly well, what he got wrong, and what he didn't include at all.
"The trouble, I think, is that so many educational processes put a high premium on getting the correct answer the first time. If you give the wrong answer on an examination question, you lose your mark and that is the end of the matter. If you make a mistake in writing your program--or indeed, in many other situations outside a classroom--it is by no means a catastrophe; you do, however, have to find your error and put it right; maybe it would be better if more academic teaching adopted this attitude also."How much better would the world be today if everyone had adopted Strachey's approach 45 years ago?
ChosenPosition(P) = value of
§ let L = LegalPositionsFrom(P)
if Null(L) then Resign
let (p,v) = BestPosition(NIL, - ∞,L,0)
result is p §
BestPosition(P,V,L,d) = Null(L) → (P,V), value of
§ let (p, l) = Next(L)
let v = - PositionValue(p,d + 1)
result is (v > V) → BestPosition(p,v,l,d),
BestPosition(P,V,l,d) §
A reader needs to recognize that the first two arguments to
BestPosition are holding the best position and value found so
far, and the third argument, L, is a list of the remaining
positions left to consider. After studying the function for a bit,
one realizes that it is computing the maximum-valued position
according to the function PositionValue (with the complication
that PositionValue gives the value for the player to move, so
you need to negate the value of the position to your opponent). It takes some mental
effort to arrive at this understanding.
I would prefer an approach that makes it immediately obvious that ChosenPosition is computing the maximum according to PositionValue, so I would code it as follows:
ChosenPosition(P) = Max(LegalPositions(P), PositionValue)Here I am assuming a slightly different representation of positions where the depth is encoded as part of the position (and you can handle the negation of opponent scores by noting whether the depth is odd or even). The function Max takes two arguments: a list of items and a function that computes the value of an item. Max returns the item with the highest value according to this function. I think this version of ChosenPosition is easier to understand. The function max in this form is built-in to the language Python, but could easily be defined in CPL as follows:
Max(Items, ValueFunction) = value of
§ (Best, BestVal) = (NIL, -∞)
while Items do §
(Item, Val) = (Head(Items), ValueFunction(Head(Items)))
if Val > BestVal then (Best, BestVal) := (Item, Val)
Items := Rest(Items) §
result is Best §
My
implementation does two things differently than Strachey. First,
separation of concerns: I've abstracted out the idea of finding a
maximum from the specifics of findings the legal position with the
maximum value, while Strachey conflates the two ideas. Second,
re-use: Once I have defined Max, I can use it again and again.
ChosenPosition(P) = value of
§ let L = LegalPositionsFrom(P)
if Null(L) then Resign
let (p,v) = BestPosition(NIL, - ∞,L,0)
result is p §
BestPosition(P,V,L,d) = Null(L) → (P,V), value of
§ let (p, l) = Next(L)
let v = - PositionValue(p,d + 1)
result is (v > V) → BestPosition(p,v,l,d),
BestPosition(P,V,l,d) §
PositionValue(P,d) = Terminal(P,d) → TerminalValue(P), value of
§ let L = LegalPositionsFrom(P)
let (p,v) = BestPosition(NIL,- ∞,L,d)
result is v §
LegalPositionsFrom(P) = value of
§ let L = RemainingPositionList(P,Capture,5)
result is Null(L)→RemainingPositionList(P,NonCapture,5),L §
RemainingPositionList(P,C,s) =
PartialPositionList(P,C,s,FindMoveList(P,C,s))
PartialPositionList(P,C,s,L) =
Null(L)→( (s = -5)→NIL,
RemainingPositionList(P,C,NextMoveDirection(s) ),
value of
§ let Φ = SingleDigitFrom(L)
let Ip = MakeMove(P,C,s,Φ)
let l = (C = Capture)→CaptureTree(Ip),
FinalPosition(Ip)
result is Join(l,PartialPositionList(P,C,s, L - Φ))§
NextMoveDirection(s) = (s = 5) → 4, ( (s = 4) → - 4, -5)
FindMoveList(P,C,s) = value of
§ let (X,Y,K,σ) = P
let Empty = ~ X ∧ ~ Y ∧ Board
let ψ = (C = Capture) → (Shift(Empty,σs) ∧ Y), Empty
let Φ = Shift(ψ, σs) ∧ X
result is (s > 0) → Φ, Φ ∧ K §
MakeMove(P,C,s,Φ) = value of
§ let (X,Y,K,σ) = P
let ψ = (C = Capture) → Shift(Φ, - σs), NIL
let θ = (C = Capture) → Shift(ψ, - σs),
Shift(Φ, - σs)
let Xk = Null(Φ ∧ K) → (θ ∧ LastRows), (θ - Φ)
result is ((X - Φ + θ), (Y - ψ), (K - ψ ∧ K + Xk),σ,θ)§
FinalPosition(Ip) = value of
§ let (X,Y,K,σ,Φ) = Ip
result is (Y,X,K, - σ)§
CaptureTree(Ip) = value of
§ let L = PartialCapturePositionList(Ip)
result is Null(L) → (FinalPosition(Ip) ),
CombineCaptureTrees(L) §
PartialCapturePositionList(Ip) = value of
§ let (X,Y,K,σ,Φ) = Ip
let P = (X,Y,K,σ)
result is MinList(PCP(P,Φ,5),PCP(P,Φ,4),
PCP(P,Φ ∧ K, - 4), PCP(P,Φ ∧ K, - 5) )§
PCP(P,Φ,s) = value of
§ let (X,Y,K,σ) = P
let ψ = Shift(Φ, - σs) ∧ Y
let Empty = ~ X ∧ ~ Y ∧ Board
let θ = Shift(ψ, - σs) ∧ Empty
let Xk = Null(Φ ∧ K) → (θ ∧ LastRows), (θ - Φ)
result is Null(θ) → NIL,
((X - Φ + θ),(Y - ψ),(K - ψ ∧ K + Xk),σ,θ)§
CombineCaptureTrees(L) = Null(L) → NIL, value of
§ let (Ip,l) = Next(L)
result is Join(CaptureTree(Ip),CombineCaptureTrees(l) )§
|
Note that this representation is not dense; some numbered squares (13, 22, 31) are not on the board. The advantage of this approach is that every square is either 4 or 5 away from its diagonal neighbors.
Given this numbering, a board position is a four-tuple (X, Y, K,
σ), where σ is the player to move, and X,Y and K are bit
patterns with a 1 in the square number where there is a piece: the X
bit pattern represents the player's pieces; Y represents the
opoonent's pieces; and K represents the kings (of either player).
A Translator for CPL
Strachey didn't have a CPL compiler when he wrote the article in 1966.
The first CPL compiler emerged around 1970, and by the 1980s they were
gone. So I had to create my own before I could run and debug the
checkers program. I had three choices to make: (1) What CPL syntax is
legal and what does it mean? (2) What parser-generator system should I
use to create the CPL translator? (3) What language should I translate
into?
(1) Most of CPL is straightforward, but I got some help by reading the 1963 and 1968 articles on CPL. I learned that "A block consists of one or more definitions followed by one or more commands, the whole being enclosed within section brackets." However, the checkers program does not follow this definition of block, because in the first function definition, ChosenPosition, we have a block that has a definition (let L), followed by a command (if Null(L)), followed by another definition. According to the 1963 paper, we should have a second block for this second definition. So either Strachey made a mistake in the checkers program or he was working with a different version of CPL then. I decided to accept definitions anywhere, not just at the start of blocks. That means that "value of" and "result is" are no-ops; the result of a function is always the last expression. I'm not sure if that was intended for CPL, but this interpretation works for all the code in the checkers program.
Another thing I learned from the 1963 paper is that CPL has two kinds of variable names: small names, which are single lower-case letters (or Greek letters, I presume from the checkers program), and large names, which start with an upper-case letter and continue with any letters. Small names need not be delimited by spaces, and it seems that implicit multiplication is allowed, so "σs" is equivalent to "σ * s".
(2) For the parser generator, I settled on YAPPS2 (Yet Another Python Parsing System), mostly because the author of the system, Amit Patel, is a friend who sits nearby, so if I ran into trouble I knew I'd be able to walk down the hall and get help. (It turns out his system was very easy to use, and I didn't need any help. Good job, Amit!)
(3) CPL is a flexible mostly-functional language, so it seems that Lisp would be a good target language. But CPL uses infix notation; I'd have to get all the operator precedence right to generate Lisp code with the parens in the right place. So I decided to generate Python code instead, and to ignore the problem of operator precedence: I would translate the CPL code "(K - ψ ∧ K + Xk)" into the Python code "(K - psi & K + Xk)" and hope that the operator precedence just works out. But there is one problem in using Python: it has a strict separation between expressions and statements (statements cannot appear inside expressions), and CPL does not have this strict distinction. Therefore, I generate code in a subset of Python that uses only expressions. For example, given the CPL function:
LegalPositionsFrom(P) = value of
§ let L = RemainingPositionList(P,Capture,5)
result is Null(L)→RemainingPositionList(P,NonCapture,5),L §
the most straightforward translation to Python is:
def LegalPositionsFrom(P):
L = RemainingPositionList(P, Capture, 5)
return (RemainingPositionList(P, NonCapture, 5) if Null(L) else L)
That would work fine for this function. But in general, a CPL block can be
an expression embedded inside another expression. Python statements
(such as "L = ..." and "return ...") can't appear
inside expressions. So we won't use them; instead we will translate a
CPL block into a Python lambda, which we will then
immediately call, using the syntax "(lambda ...)()". No
arguments are passed in this call to the lambda (because it is just a
block of code); however, we can introduce new local variables within the
block by defining optional arguments to the lambda expression, with
their default values. So we have:
def LegalPositionsFrom(P):
return (lambda L=RemainingPositionList(P, Capture, 5):
(RemainingPositionList(P, NonCapture, 5) if Null(L) else L))()
Notice that there is one exception to the "everything is an expression" rule:
every function definition is a statement of the form "def f(x): return expr".
The 1963 article also made it clear that CPL has a logical data type that is distinct from the integer data type. In languages like C and Python, the bit pattern 0b1111 is identical to the integer 15; they ae both of type int. But in CPL the two are different. I implemented the CPL logical data type with a Logical class in Python. The reason I need this is that the expression int(0) - int(1) should be -1 (because here the minus sign is ordinary integer subtraction), but Logical(0) - Logical(1) should be Logical(0) (because here the minus sign is the logical-bitset-difference operator). The Logical class has the usual set of operators (and, or, not, etc.) and has a constructor that takes as input either a bit pattern or a list of set bits. Thus, we can use either Logical(0b1111) or Logical([0,1,2,3]) to represent the logical bitset in which the four rightmost bits are true.
With those choices made, I was ready to write the grammar of CPL in
the YAPPS formalism,
with the translation into Python. The original code uses Greek letters
and math symbols (§, ∨, ∧, →); I chose to use HTML
entities for these (rather than Unicode). You can see the grammar in
the file cpl.g.
The Rest of the Checkers Program
Scientific American limited Strachey in his page count, and so didn't show the whole
program. He left out some of the basic primitives of CPL (such as
Shift, which does a logical shift of a bit string) and some of the parts of the
checkers program that didn't seem crucial (such as printing the
board). You can see these missing parts in the file checkers.py or as an
annotated web page.
Debugging the Checkers Program
Here are the bugs/misunderstandings I discovered as I implemented the CPL parser and
checkers program:
However, the correspondance between software and documentation is sloppy. The number of arguments on functions don't match; not all documented functions are implemented; there is confusion between what is a list and what is a string of bits. If only Strachey the analyst and Strachey the programmer had iterated a few times, the article would have been even more of a work of beauty!