Solving Every Sudoku Puzzle Quickly

In this essay I develop and implement an algorithm to solve every Sudoku puzzle. It turns out to be quite easy (about one page of code for the main idea and two pages for embellishments that make it faster) using two ideas: search and constraint propagation.

Sudoku notation and preliminary notions

First we have to agree on some notation. A Sudoku puzzle is a square grid of 81 squares, which is composed of a 3x3 array of boxes, where each box is a 3x3 array of squares. The columns are numbered 0 to 8 (left to right), the rows also 0 to 8 (top to bottom), and the squares are numbered 0 to 80, left to right first, then top to bottom. A collection of nine squares (either row, column, or box) is called a unit and the peers of a square are all the squares that share a unit with that square. We define Sudoku puzzles and solutions as follows:

A puzzle is a grid in which a subset of squares have been filled with a digit 1 to 9 and the rest are blank.
A solution to a puzzle is a grid where every square is filled with a digit 1 to 9, every unit contains all of the 9 digits, and the filled squares of the original puzzle remain unchanged in the solution.
A well-formed puzzle has exactly one solution.

The definition of a solution implies that no digit can appear twice in a unit, every digit must appear once, and every square must have a different value from any of its peers. Here are the numbers of the squares, a typical puzzle, and the solution to the puzzle:

 00 01 02| 03 04 05| 06 07 08    4 . . |. . . |8 . 5     4 1 7 |3 6 9 |8 2 5 
 09 10 11| 12 13 14| 15 16 17    . 3 . |. . . |. . .     6 3 2 |1 5 8 |9 4 7
 18 19 20| 21 22 23| 24 25 26    . . . |7 . . |. . .     9 5 8 |7 2 4 |3 1 6 
---------+---------+---------    ------+------+------    ------+------+------
 27 28 29| 30 31 32| 33 34 35    . 2 . |. . . |. 6 .     8 2 5 |4 3 7 |1 6 9 
 36 37 38| 39 40 41| 42 43 44    . . . |. 8 . |4 . .     7 9 1 |5 8 6 |4 3 2 
 45 46 47| 48 49 50| 51 52 53    . . . |. 1 . |. . .     3 4 6 |9 1 2 |7 5 8 
---------+---------+---------    ------+------+------    ------+------+------
 54 55 56| 57 58 59| 60 61 62    . . . |6 . 3 |. 7 .     2 8 9 |6 4 3 |5 7 1 
 63 64 65| 66 67 68| 69 70 71    5 . . |2 . . |. . .     5 7 3 |2 9 1 |6 8 4 
 72 73 74| 75 76 77| 78 79 80    1 . 4 |. . . |. . .     1 6 4 |8 7 5 |2 9 3

Every square has exactly 3 units and 20 peers. For example, here are the units and peers for square number 19:

    01   |         |                    |         |            00 01 02|         |         
    10   |         |                    |         |            09 10 11|         |         
    19   |         |            18 19 20| 21 22 23| 24 25 26   18 19 20|         |         
---------+---------+---------  ---------+---------+---------  ---------+---------+---------
    28   |         |                    |         |                    |         |         
    37   |         |                    |         |                    |         |         
    46   |         |                    |         |                    |         |         
---------+---------+---------  ---------+---------+---------  ---------+---------+---------
    55   |         |                    |         |                    |         |         
    64   |         |                    |         |                    |         |         
    73   |         |                    |         |                    |         |

It is also possible to have puzzles of different sizes. Instead of having boxes of size 3x3, each box could be 2x2 or 4x4 or 5x5, etc. We'll use the variable name Nbox for the box size, and N for the whole puzzle size. So the traditional Sudoku puzzle has Nbox = 3 and N = 9, and therefore 81 squares; a puzzle with Nbox = 5 has N = 25 and Nbox⁴ = 625 squares. Of course, each puzzle must have N "digits" as well (we can use letters as digits when N > 9).

We can implement the notions of units, peers, and squares in the programming language Python as follows:

def cross(R, C):
    "All squares in these rows and columns"
    return [N*r+c for r in R for c in C]

def grouper(n, items):
    "Group items n at a time: grouper(3, 'abcdefghi') => ['abc', 'def', 'ghi']"
    return [items[i:i+n] for i in range(0, len(items), n)]

Nbox     = 3         # Each box is of size Nbox x Nbox
N        = Nbox*Nbox # The whole grid is of size N x N
rows     = range(N)
cols     = rows
squares  = range(N*N)
digits   = '123456789abcdefghijklmnopqrstuvwxyz!'[:N] # E.g., '123456789' for N = 9
digitset = set(digits)
rowunits = [cross([r], cols) for r in rows]
colunits = [cross(rows, [c]) for c in cols]
blocks   = grouper(Nbox, rows) # E.g., [[0, 1, 2], [3, 4, 5], [6, 7, 8]] for N = 9
boxunits = [cross(rblock, cblock) for rblock in blocks for cblock in blocks]
allunits = rowunits + colunits + boxunits
units    = [[u for u in allunits if s in u] for s in squares]
peers    = [set(s2 for u in units[s] for s2 in u if s2!=s) for s in squares]

The next step is to define the Sudoku playing grid. We will actually define two representations for grids: First, a textual format used to specify the initial state of a puzzle, suitable for representing example puzzles in a file:

A gridstring is defined as a character string with 1-9 indicating a digit, and a 0 or period specifying an empty square. All other characters are ignored (including spaces, newlines, dashes, and bars). These conventions cover most, but not all, puzzle formats that are readily available for download. Each of the following three gridstring strings represent the same puzzle:

"4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......"

"""
400000805
030000000
000700000
020000060
000080400
000010000
000603070
500200000
104000000"""

"""
4 . . |. . . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. . . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .
"""

Our second representation is used as an internal representation of any state of a puzzle (initial, partially solved or complete):

A grid is a list (the Python name for a vector/array) of 81 elements, we will call this a values list because we use it to describe the values for each square.

Now for the values list representation. One might think that a 9x9 array would be the obvious data structure. But two-dimensional arrays are a bit complicated in Python, so we will use a one-dimensional array (which Python calls a list) of length 81. There is one complication: we will soon start filling in Sudoku grids with digits. If we make a mistake and reach a contradiction, we will use the value False to indicate the contradiction. Thus, functions that take values as a parameter must also handle False appropriately. Here is the function parse to parse a gridstring into a values list, and the inverse function, show, to turn a values list into a gridstring--one that can be printed in a pretty format:

def parse(gridstring):
    "Convert gridstring into a list of values. '.' or '0' used for an empty square."
    gridstring = gridstring.replace('0', '.')
    values = [c for c in gridstring if c in digits or c == '.']
    assert len(values) == N*N
    return values

def show(values):
    "Convert a list of values to a string laid out as a 2-dimensional gridstring."
    if values is False: return "NO SOLUTION"
    line = '\n' + '+'.join(['--'*Nbox]*Nbox) + '\n'
    def showsquare(s):
        r, c = s/N, s % N
        sep = ((line if (r % Nbox == Nbox-1 and r != N-1) else '\n')
               if (c % N == N-1) else (' |' if (c % Nbox == Nbox-1) else ' '))
        return values[s] + sep
    return ''.join(showsquare(s) for s in squares)

And here is an example of the functions at work:

>>> grid1 = "4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......"
>>> parse(grid1)
['4', '.', '.', '.', '.', '.', '8', '.', '5',
 '.', '3', '.', '.', '.', '.', '.', '.', '.',
 '.', '.', '.', '7', '.', '.', '.', '.', '.',
 '.', '2', '.', '.', '.', '.', '.', '6', '.',
 '.', '.', '.', '.', '8', '.', '4', '.', '.',
 '.', '.', '.', '.', '1', '.', '.', '.', '.',
 '.', '.', '.', '6', '.', '3', '.', '7', '.',
 '5', '.', '.', '2', '.', '.', '.', '.', '.',
 '1', '.', '4', '.', '.', '.', '.', '.', '.']
>>> print show(parse(grid1))
4 . . |. . . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. . . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .

Now let's define what it means to be a solution: as we said before, a solution is a grid (represented as a values list) where every unit is filled with a permutation of the digits 1 to 9, and the filled squares of the original puzzle remain unchanged in the solution:

def is_solution(values, puzzle):
    """A list of values is a solution to a puzzle if each unit is a permutation
    of the digits 1 to 9 and if the filled squares remain unchanged."""
    filled_squares = [s for s in squares if puzzle[s] in digitset]
    def is_permutation(unit): return set(values[s] for s in unit) == digitset
    return (values is not False
            and all(is_permutation(u) for u in allunits)
            and all(values[s]==puzzle[s] for s in filled_squares))

Searching for solutions

Humans solve Sudoku problems by using logical deduction to infer that a certain digit is (or is not) the value of a square; such deductions are repeated until all squares are filled with a single known digit. We could write an algorithm to do deductions like that, but the difficulty is in knowing when you have enough inference rules to solve every possible puzzle.

Therefore we will take a different route. Technically, it is known as model checking rather than deduction, and it consists of searching for a solution that works by trying out possibilities one by one. We will describe a series of search algorithms, starting with the hopelessly inefficient and ending with one that can solve over 50 of the hardest Sudoku puzzles per second.

Algorithm 1: generate and test

Here is the simplest possible search algorithm, which we call generate and test:

First fill in all the empty squares with digits. Then check to see if we have a solution. If not, then fill the empty squares with a different combination of digits. Repeat with different combinations until a solution is found.

Sometimes generate and test is the best algorithm available. Herb Simon, in his classic book The Sciences of the Artificial (page 194), describes the task of opening a safe that has ten dials, each with 100 numbers on it. If the safe is well-designed there is no better approach than trying all 100¹⁰ combinations.

Here is the implementation of generate and test for Sudoku:

def solve(gridstring):
    "Try every combination of digits for the empty squares; test each one."
    from itertools import product
    puzzle = parse(gridstring)
    for candidate in product(*[(v if v in digits else digits) for v in puzzle]):
        if is_solution(candidate, puzzle):
            return candidate
    return None

And here it is solving a tiny puzzle. There are ten empty squares, so generate and test first tries to fill them with the digits 1111111111, then 1111111112, then 1111111113, and so on until a solution is found at 3441321214.

>>> tiny = '12..3..24......3'
>>> print show(parse(tiny))
1 2 |. .
3 . |. 2
----+----
4 . |. .
. . |. 3

>>> print show(solve(tiny))
1 2 |3 4
3 4 |1 2
----+----
4 3 |2 1
2 1 |4 3

It took 4 seconds to solve this tiny 4x4 puzzle. Now consider the 9x9 puzzle defined above as grid1. It has 17 filled squares, and thus 64 empty squares. So there are 9⁶⁴ possibile combinations (about 10⁶¹) to consider. How long will it take to process them?

Suppose we have a custom 1024 core 10 GHz CPU that features a special instruction that can generate and test a position in one clock cycle, and let's say we could afford to buy a billion of these, and while we're shopping, let's say we also pick up a time machine and go back 13 billion years to the early days of the universe, and a fleet of starships with which we visit 10 billion galaxies and in each galaxy convince the inhabitants of 10 billion planets to each buy the same number of computers, and we all start our programs running, managing to distribute the cases perfectly so that no computation is duplicated or wasted. Then we can estimate that we'd be less than 4% done with this one puzzle by now.

Generate and test isn't the algorithm you're looking for. Move along.

Algorithm 2: incremental search

In Simon's safe example, he next supposes a defective safe, in which a faint click is heard whenever a dial is set to the right position. With this safe it only takes 100 × 10 trials to discover the correct combination, not 100¹⁰. If we have some selective knowledge of components of the safe (or puzzle), our search will be much easier.

Fortunately for us, Sudoku is more like the defective safe. To be precise, it is like a safe with 81 dials, each with 9 numbers, and sometimes if you put one of the dials in the wrong position you hear a sound, but sometimes not (depending on the positions of the other dials).

For example, if we fill the upper right square of the tiny puzzle with a 1, 2, or 3 then we can detect right away there is a problem: there would be two of the same digit in the top row or the rightmost column. Once we detect that we needn't consider any combination of values for the remaining 8 unfilled squares--nothing we can do with them will correct the flaw in the upper right square. Instead of enumerating values for the 8 squares we can skip ahead and consider a different value (4) for the upper right square. This suggests the incremental search algorithm:

Start filling squares with digits, always making sure that the digit we put in a square is not the same as any peer. If we reach a square where every possible digit conflicts with a peer, back up and consider a different digit for the previous square. Stop when we successfully fill every square, or give up if we tried all possibilities without finding a solution.

The progress of this algorithm can be described as a search tree, where each node represents a partially-filled state of the puzzle, and a branch corresponds to filling in a square with a digit. In the following tree we order the squares left-to-right, top-to-bottom, and order the digits to 1-to-4 (different orderings would give different trees).

        1 2 |. .
        3 . |. 2
        ----+----
        4 . |. .
        . . |. 3
            :
            :
    :.......:.......:
    :               :
    :               :
1 2 |3 .        1 2 |4 .   # The first empty square can be filled with 3 or 4
3 . |. 2        3 . |. 2
----+----       ----+----
4 . |. .        4 . |. .
. . |. 3        . . |. 3
    :
    :
1 2 |3 4    # For this square there is only one choice: 4
3 . |. 2
----+----
4 . |. .
. . |. 3
    :
 (etc.)                    # We skip a few squares
    :
    :...............:
    :               :
1 2 |3 4        1 2 |3 4
3 4 |1 2        3 4 |1 2
----+----       ----+----
4 1 |. .        4 3 |. .   # There are two choices here, 1 and 3
. . |. 3        . . |. 3   # We consider 1 first, then come back to 3 after the dead end
    :               :
    :               :
1 2 |3 4        1 2 |3 4
3 4 |1 2        3 4 |1 2
----+----       ----+----
4 1 |2 .        4 3 |2 .
. . |. 3        . . |. 3
#(dead end)         :
                    :
                 (etc.)
                    :
                    :
                1 2 |3 4
                3 4 |1 2
                ----+----
                4 3 |2 1
                2 1 |4 3   # Found the solution!

The code to implement this is straightforward:

def solve(gridstring):
    "Parse gridstring and search for a solution."
    return search(parse(gridstring))

def search(values):
    """Select an unfilled square, try each possible value in order, recursively searching.
    When all filled: success; when no more digits to try: failure."""
    if values is None: return None
    s = select_square(values)
    if s is None: return values
    for d in possible(values, s):
        result = search(assign(values[:], s, d))
        if result: return result
    return None

def select_square(values):
    "Just choose the first square that is not filled yet."
    for s in squares:
        if values[s] not in digitset: return s
    return None

def possible(values, s):
    "A square can be any digit that is not already taken by a peer."
    return digitset - set(values[s] for s in peers[s])

def assign(values, s, d):
    "For now, simply assign values[s] = d."
    values[s] = d
    return values

The key function is search. It obeys the convention that if it is passed None (to indicate an invalid grid, such as one with two 1s in the top row) it returns None, meaning that no solution can be found. Otherwise it selects some unfilled square to work on. If select_square returns None that is actually good news: it means that all the squares are already filled, and we are done: we have a solution, so we just return it. Otherwise, line 60 says that for each possible digit that could fill square s we try to assign that digit to s and search for a solution from there. If some call to search succeeds, return it; but if a digit assignment causes the search to fail, just keep going with the next digit assignment. If all digit assignments fail, then the whole call to search fails, and we back up to the previous recursive call.

Here we see solve (and therefore search) in action:

>>> print show(solve(grid1))
4 1 7 |3 6 9 |8 2 5
6 3 2 |1 5 8 |9 4 7
9 5 8 |7 2 4 |3 1 6
------+------+------
8 2 5 |4 3 7 |1 6 9
7 9 1 |5 8 6 |4 3 2
3 4 6 |9 1 2 |7 5 8
------+------+------
2 8 9 |6 4 3 |5 7 1
5 7 3 |2 9 1 |6 8 4
1 6 4 |8 7 5 |2 9 3

That's all there is to it! The only reason we don't stop this article now is that the program is still too slow for some purposes. We're not talking billions-of-years slow, but it did take 3 minutes and 45 seconds to solve this puzzle (a rather hard one). On a benchmark of 50 easy puzzles the program took a total of 9.5 seconds; a rate of 5 Hz. Not bad, but we can do much better.

Algorithm 3: forward checking

The incremental search algorithm prevents us from wasting time enumerating values for the rest of a grid after we have discovered that two filled squares conflict with each other. But what it does not do is prevent us from generating the same conflict over and over again. Consider again grid1, and the two squares marked A and B:


4 . . |. A . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. B . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .

We can tell right from the start (by examining peers) that square A must be one of 2,3,6,7,9 and B must be one of 3,4,5,7,9. Suppose we are at a point in the search where we are considering 3 as a possible value for A. We are considering squares in left-to-right, top-to-bottom order, so square B has not been considered yet. But when we eventually get to B we will detect a contradiction, because it turns out that B must be 3. So we will back up in the search, and will end up trying many combinations for all the squares between A and B. We'd like to make the process of failing faster--we don't want to wait until we get around to checking square B to see that there is a contradiction if there is a way to detect it faster.

Our approach will be for each square to keep track of its own remaining possible values. Then we can fail immediately when any square runs out of possibilities; we won't have to wait until the square is chosen by select_square. This approach is called forward checking because we look forward and check squares other than the currently selected square.

The implementation of forward checking is easy. Currently values[s] is equal to '.' for an empty square; we will change that so that values[s] holds a collection of all the remaining possible digits for the square. When this is reduced to the empty collection, for any square, we can immediately fail. The collection could be represented by a Python set or list, but I chose instead to use a string of digits. So the value for square A would start out as the string '23679'.

We'll change solve so that it calls initialize to set all the unfilled squares to their initial value according to this convention. We'll also change assign(values, s, d) so that it eliminates d from all the peers of s and fails if any peer has no remaining possible values. With these added complications one function becomes easier: possible can now just look up the collection of possible values rather than having to compute it.

def solve(gridstring):
    "Parse gridstring, handle initial constraints, and search for a solution."
    return search(initialize(parse(gridstring)))

def initialize(initvalues):
    """First set each square to have the domain '123456789'.
    Then do forward checking for each of the filled squares."""
    values = N * N * [digits]
    for s in squares:
        if initvalues[s] in digitset: values = assign(values, s, initvalues[s])
    return values

def possible(values, s): return values[s]

def assign(values, s, d):
    "Assign d to s and eliminate d from the peers of s. Return None on contradiction."
    if d not in values[s]: return None
    values[s] = d
    if not all(eliminate(values, p, d) for p in peers[s]): return None
    return values

def eliminate(values, s, d):
    "Eliminate d as a possible value of s. Fail if forward checking finds a contradiction."
    values[s] = values[s].replace(d, '')
    return forward_check(values, s)

def forward_check(values, s):
    "Fail if s has no values, or a single value the same as a peer."
    V = len(values[s])
    if V==0 or (V==1 and any(values[p]==values[s] for p in peers[s])):
        return None
    return values

Forward checking is a kind of constraint satisfaction. The fact that peers must have different values is a constraint, and we say that a solution is an assignment of values that satisfies all the constraints.

How much does forward checking help? For the easy puzzles, not much. The reason they are called "easy" is that they don't require much forward checking. For the harder grid1 puzzle the solving speed is improved by a factor of 5. But that's still 43 seconds. Let's keep improving.

Algorithm 4: arc consistency

Forward checking helps because every time we eliminate a possible value from a square we peek forward to peer squares to see if they will cause a failure. We can improve that idea by looking even farther ahead to the peers of peers. You might think that eliminating, say, the digit 1 from square s would effect only the peers of s and could have no effect on the peers of the peers of s. However, suppose that the peer s2 had only the possibilities 1 and 2. If we eliminate 1, then s2 must be 2, and we can then eliminate 2 from all the peers of s2. If some peer of s2 had only the possibilities 2 and 3, then it must become 3, and we can keep on going from there.

This idea is called arc consistency; it gets its name from the practice of describing a constraint satisfaction problem as a graph where the nodes are the variables (in Sudoku, the squares) and the arcs are the constraints. We make the arcs (the peer constraints) consistent one by one, and we can end up traversing multiple links in the graph (whereas the forward checking algorithm only looks ahead one arc in the graph).

The implementation of arc consistency for Sudoku is a simple change to what we already have:

def eliminate(values, s, d):
    "Eliminate d as a possible value of s.  Fail if not arc consistent."
    if d not in values[s]: return values # Already eliminated d
    values[s] = values[s].replace(d, '')
    return arc_consistent(values, s)

def arc_consistent(values, s):
    "Fail if s has no values left, or"
    V = len(values[s])
    if V==0 or (V==1 and not assign(values, s, values[s])):
        return None
    return values

Compared to forward checking, arc consistency reduces the time to solve grid1 from 43 seconds down to 0.16 seconds (a 266-fold speedup). The 50 easy problems are speeded up from a rate of 5 per second to 282 per second, a 56-fold speedup. Clearly, arc consistency is very effective on Sudoku problems (as it is on many constraint satisfaction problems), and one might be tempted to stop here. But before we declare victory and go home, let's do two things: first, benchmark the algorithm on a sampling of difficult problems as well as easy ones, and second, try some variations to see if we can do even better.

Benchmarks

I have collected some benchmark problems from around the web that will be used to measure the performance of algorithms. Thanks particularly to Gordon Royle, Peter Cock, Guenter Stertenbrink and Jean Charles Meyrignac. Here is a description of the files:

Filename N puzzles Description
grid1.txt 1 hard puzzle, called grid1 above
easy50.txt 50 easy puzzles
hardest.txt 11 puzzles from around the web advertised as "hardest"
top95.txt 95 hard puzzles
top870.txt 870 hard puzzles
top2365.txt 2365 hard puzzles
subig20.txt 17,445 hard puzzles
min17.txt 49,151 hard puzzles with 17 filled squares

Filename	N puzzles	Description
`grid1.txt`	1	hard puzzle, called `grid1` above
`easy50.txt`	50	easy puzzles
`hardest.txt`	11	puzzles from around the web advertised as "hardest"
`top95.txt`	95	hard puzzles
`top870.txt`	870	hard puzzles
`top2365.txt`	2365	hard puzzles
`subig20.txt`	17,445	hard puzzles
`min17.txt`	49,151	hard puzzles with 17 filled squares

Here is the code to read puzzles from a file, run them, and print statistics about the results:

def from_file(filename, sep=None):
    "Parse a file into a list of strings, separated by sep."
    return file(filename).read().strip().split(sep)

benchmarks = [ # A collection of files with puzzles, and the URLs they came from.
    (0, "grid1.txt",      "magictour.free.fr/top95"),
    (1, "easy50.txt",     "projecteuler.net/index.php?section=problems&id=96"),
    (2, "hardest.txt",    "norvig.com"),
    (3, "top95.txt",      "magictour.free.fr/top95"),
    (4, "top870.txt",     "magictour.free.fr/top870"),
    (5, "top2365.txt",    "magictour.free.fr/top2365"),
    (6, "subig20.txt",    "www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/python/sudoku/"),
    (7, "min17.txt",      "mapleta.maths.uwa.edu.au/~gordon/sudokumin.php"),
    ]

def benchmark(todo=range(10), version="", showif=60.0):
    "Run several benchmark puzzles, in files. Show ones that take longer than showif seconds."
    show_header = True
    for (num, filename, url) in benchmarks:
        if num in todo:
            solve_all(from_file(filename), filename.replace('.txt',''), version, showif, show_header)
            show_header = False
        
def solve_all(gridstrings, name, version, showif=0.0, show_header=True):
    """Attempt to solve a sequence of gridstrings. Report stats on solving times.
    Show puzzles that take longer than showif seconds to solve."""
    times = []
    for gridstring in gridstrings:
        values, t = time_call(solve, gridstring)
        times.append(t)
        ok = is_solution(values, parse(gridstring))
        ## Display puzzles that take long enough, or that fail
        if t > showif or not ok:
            print side_by_side(show(parse(gridstring)), show(values),
                               'Solution in %.3f seconds' % t)
            assert ok
    print stats(times, show_header, name=name, version=version)

def time_call(function, *args, **keyword_args):
    "Return the result of applying function to args, and the number of seconds it took."
    t0 = time.clock()
    result = function(*args, **keyword_args)
    t1 = time.clock()
    return result, t1-t0

def side_by_side(*strings):
    "Concatenate several strings on a line-by-line basis."
    split_strings = [st.split('\n') for st in strings]
    nlines = max(len(st) for st in split_strings)
    def next_line(): return '        '.join((st.pop(0) if st else '') for st in split_strings)
    return '\n'.join(next_line() for _ in range(nlines))

def stats(times, show_header=True, **extras):
    "Return a tab-separated header+data string representing various statistics about data."
    # Compute Stats
    total = sum(times)
    n = len(times)
    mean = total/n
    stddev = (sum((t-mean)**2 for t in times)/n)**0.5
    times.sort()
    # Format stats
    def p(percentile): return times[n*percentile/100]
    def fmt(vals): return '  '.join(('%7.3f'%x if isinstance(x, float) else '%7s'%x) for x in vals)
    return ((fmt(extras.keys() + 'N Hz avg stddev min 1% 25% median 75% 99% max'.split()) + '\n'
            if show_header else '')
            + fmt(extras.values() + [n, int(n/total), mean, stddev, min(times),
                                     p(1), p(25), p(50), p(75), p(99), max(times)]))

Algorithm 5: unit consistency

The next step is to do a better job of exploiting the constraints. So far, we detect a failure if some square has no remaining possible value. But we could also detect the case where some value (digit) has no remaining possible square. That is, all the squares in a unit might all have several possible values left, but if one digit, say 3, does not appear in any square, then we can fail immediately. And if the digit 3 appears as a possibility in only one square, then we can assign it to that square. Here is the code to implement this idea:

def eliminate(values, s, d):
    "Eliminate d as a possible value of s.  Fail if not arc and unit consistent."
    if d not in values[s]: return values # Already eliminated d
    values[s] = values[s].replace(d, '')
    return arc_consistent(values, s) and unit_consistent(values, s, d)

def unit_consistent(values, s, d):
    "Fail if some unit has no place for digit d."
    for u in units[s]:
        dplaces = [s for s in u if d in values[s]] # Remaining places for d
        D = len(dplaces)
        if D==0 or (D==1 and not assign(values, dplaces[0], d)):
            return None
    return values

Algorithm 6: naked pairs

The literature on Sudoku solving strategies includes some colorful names; one of them is the so-called naked pairs strategy. It involves finding two squares in a unit which each have the same two possible values. For example, suppose two squares in a box have both been wittled down to the possible values 3 and 5. We don't know which is 3 and which is 5, but either way, we know that no other square in the box can be either 3 or 5, so those values can be eliminated. Here is the code:

def eliminate(values, s, d):
    "Eliminate d as a possible value of s.  Fail if not arc and unit consistent; detect naked pairs."
    if d not in values[s]: return values # Already eliminated d
    values[s] = values[s].replace(d, '')
    return arc_consistent(values, s) and unit_consistent(values, s, d) and naked_pairs(values, s)

def naked_pairs(values, s):
    """If two squares s and s2 in a unit have the same two values, eliminate
    those values from all other squares s3 in the unit."""
    if values is None: return None
    v = values[s]
    if len(v) != 2: return values
    for u in units[s]:
        for s2 in u:
            if s2 != s and values[s2] == v:
                # s and s2 are a naked pair; remove values from other squares (s3) in unit
                for s3 in u:
                    if s != s3 != s2:
                        if not (eliminate(values, s3, v[0]) and eliminate(values, s3, v[1])):
                            return None
    return values

Benchmark comparison of algorithms so far

I ran the benchmark code on the five serious algorithms: incremental search (incr), forward checking (FC), arc consistency (AC), unit consistency (AC+Unit), and naked pairs (AC+U+NP). For each benchmark file we can see the name of the file, the number of puzzles, the average speed of solving (expressed two ways, as a frequency in Hertz and as the average number of seconds), the standard deviation of the results, and then, to give an idea of the distribution of times, the minimum, the first percentile, 25th percentile, median, 75th percentile, 99th percentile, and maximum times. Here are the results:

version     name        N       Hz      avg   stddev      min       1%      25%   median      75%      99%      max
   incr    grid1        1        0  225.729    0.000  225.729  225.729  225.729  225.729  225.729  225.729  225.729

     FC    grid1        1        0   42.132    0.000   42.132   42.132   42.132   42.132   42.132   42.132   42.132
     FC   easy50       50        5    0.172    0.272    0.004    0.004    0.010    0.050    0.161    0.987    0.987

     AC    grid1        1        6    0.159    0.000    0.159    0.159    0.159    0.159    0.159    0.159    0.159
     AC   easy50       50      285    0.004    0.005    0.001    0.001    0.001    0.002    0.003    0.025    0.025
     AC  hardest       11       22    0.044    0.113    0.001    0.001    0.002    0.010    0.014    0.401    0.401
     AC    top95       95        0    1.030    2.668    0.001    0.001    0.024    0.072    0.329   16.211   16.211
     AC   top870      870        4    0.221    1.111    0.001    0.001    0.006    0.020    0.060    6.943   16.197

AC+Unit    grid1        1       62    0.016    0.000    0.016    0.016    0.016    0.016    0.016    0.016    0.016
AC+Unit   easy50       50      188    0.005    0.001    0.005    0.005    0.005    0.005    0.005    0.009    0.009
AC+Unit  hardest       11       67    0.015    0.010    0.005    0.005    0.006    0.011    0.018    0.035    0.035
AC+Unit    top95       95       15    0.067    0.128    0.005    0.005    0.010    0.021    0.055    0.890    0.890
AC+Unit   top870      870       33    0.030    0.074    0.005    0.005    0.008    0.014    0.028    0.298    1.273

AC+U+NP    grid1        1      147    0.007    0.000    0.007    0.007    0.007    0.007    0.007    0.007    0.007
AC+U+NP   easy50       50      183    0.005    0.001    0.005    0.005    0.005    0.005    0.005    0.007    0.007
AC+U+NP  hardest       11       83    0.012    0.007    0.005    0.005    0.007    0.009    0.015    0.031    0.031
AC+U+NP    top95       95       30    0.033    0.063    0.005    0.005    0.007    0.012    0.025    0.489    0.489
AC+U+NP   top870      870       58    0.017    0.033    0.005    0.005    0.006    0.010    0.017    0.137    0.596

Note that for the easy50 file, the arc consistency algorithm is fastest. That is because it is the simplest algorithm (it has no unnecessary checks or bookkeeping), yet it does enough to solve all the easy puzzles. For all the other benchmarks, it is worth the effort to incorporate arc consistency plus unit consistency plus naked pairs.

Algorithms 6-9: variable ordering

Note that our select_square function does not do much in the way of selection: it just returns the first square it finds that has not been filled yet. Could being more clever about selecting a square speed things up? We'll stick with arc consistency plus unit consistency plus naked pairs, but we'll compare the square selection strategy we've been using so far (which we will call firstV) with three alternative selection strategies: randomV, which chooses a random unfilled square; MinRV which chooses a square with the Minimum number of Remaining Values (that is, if there is a square with only 2 possible values we would choose that over one with 3 or more values), and MaxRV, which reverses that choice. MinRV is generally held to be a good strategy; the idea is that if there are only two possibilities you have a 50% chance of guessing right.

Here are the results; the benchmarks uphold the idea that MinRV is a good strategy; it performs the best on all the benchmarks except easy50, where it falls just slightly behind firstV. This is true on the average time to solution, and even more so on the maximum time.

version     name        N       Hz      avg   stddev      min       1%      25%   median      75%      99%      max
 firstV   easy50       50      183    0.005    0.001    0.005    0.005    0.005    0.005    0.005    0.007    0.007
 firstV  hardest       11       83    0.012    0.007    0.005    0.005    0.007    0.009    0.015    0.031    0.031
 firstV    top95       95       30    0.033    0.063    0.005    0.005    0.007    0.012    0.025    0.489    0.489
 firstV   top870      870       58    0.017    0.033    0.005    0.005    0.006    0.010    0.017    0.137    0.596

randomV   easy50       50      165    0.006    0.003    0.005    0.005    0.005    0.005    0.006    0.019    0.019
randomV  hardest       11      113    0.009    0.004    0.005    0.005    0.006    0.006    0.010    0.018    0.018
randomV    top95       95        7    0.127    0.203    0.005    0.005    0.021    0.047    0.120    1.090    1.090
randomV   top870      870       25    0.039    0.101    0.005    0.005    0.010    0.018    0.032    0.490    1.762

  MinRV   easy50       50      182    0.005    0.001    0.005    0.005    0.005    0.005    0.005    0.008    0.008
  MinRV  hardest       11      142    0.007    0.002    0.005    0.005    0.005    0.006    0.010    0.011    0.011
  MinRV    top95       95       57    0.017    0.017    0.005    0.005    0.007    0.012    0.021    0.132    0.132
  MinRV   top870      870       73    0.014    0.012    0.005    0.005    0.007    0.010    0.016    0.055    0.185

  MaxRV   easy50       50      164    0.006    0.002    0.005    0.005    0.005    0.005    0.005    0.017    0.017
  MaxRV  hardest       11       25    0.038    0.058    0.005    0.005    0.007    0.013    0.058    0.210    0.210
  MaxRV    top95       95        0    1.161    2.647    0.005    0.005    0.075    0.172    0.913   19.633   19.633
  MaxRV   top870      870        4    0.223    0.983    0.005    0.005    0.025    0.050    0.109    5.129   19.664

Here is the code to implement the three new variable selection strategies and produce the benchmark results:

benchmark([1,2,3,4], "firstV")

import random

def select_square(values):
    "Return an unfilled square with the fewest possible values; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled==[] else random.choice(unfilled)

benchmark([1,2,3,4], "randomV")

def select_square(values):
    "Return an unfilled square with the fewest possible values; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled==[] else min(unfilled, key=numvalues)

benchmark([1,2,3,4], "MinRV")

def select_square(values):
    "Return an unfilled square with the fewest possible values; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled==[] else max(unfilled, key=numvalues)

benchmark([1,2,3,4], "MaxRV")

Algorithm 10-12: value ordering

The next question is: given that we have selected a variable, in what order should we consider the possible values? The current definition of possible just goes in increasing order of digits: 1,2,3, etc. In the literature on constraint satisfaction there is the concept of the least constraining value: the value that rules out the fewest choices for the neighbors (peers). We will benchmark least constraining value, most constraining value, and random ordering. Here is the code:

def constraining_value(values, s, d):
    "Count the number of peers of s that would have 0 or 1 value left without d."
    return sum((d in values[s2] and len(values[s2]) <= 2) for s2 in peers[s])

def possible(values, s):
    return sorted(values[s], key=lambda d: constraining_value(values, s, d))

benchmark([1,2,3,4], "MostCVl")

def possible(values, s):
    return sorted(values[s], key=lambda d: -constraining_value(values, s, d))

benchmark([1,2,3,4], "LeastCV")

def possible(values, s):
    vals = list(values[s])
    random.shuffle(vals)
    return vals

benchmark([1,2,3,4], "RandVal")

And here are the results. We see that value ordering makes almost no difference.

 version    name        N       Hz      avg   stddev      min       1%      25%   median      75%      99%      max
MostCVl   easy50       50      184    0.005    0.000    0.005    0.005    0.005    0.005    0.005    0.008    0.008
MostCVl  hardest       11      113    0.009    0.006    0.005    0.005    0.006    0.006    0.009    0.029    0.029
MostCVl    top95       95       57    0.018    0.018    0.005    0.005    0.008    0.012    0.021    0.137    0.137
MostCVl   top870      870       72    0.014    0.013    0.005    0.005    0.007    0.011    0.016    0.057    0.193

LeastCV   easy50       50      181    0.006    0.001    0.005    0.005    0.005    0.005    0.005    0.009    0.009
LeastCV  hardest       11      124    0.008    0.004    0.005    0.005    0.006    0.006    0.010    0.018    0.018
LeastCV    top95       95       54    0.018    0.018    0.005    0.005    0.008    0.012    0.022    0.135    0.135
LeastCV   top870      870       72    0.014    0.013    0.005    0.005    0.008    0.011    0.016    0.058    0.194

RandVal   easy50       50      185    0.005    0.000    0.005    0.005    0.005    0.005    0.005    0.008    0.008
RandVal  hardest       11       91    0.011    0.007    0.005    0.005    0.006    0.007    0.014    0.029    0.029
RandVal    top95       95       52    0.019    0.018    0.005    0.005    0.008    0.013    0.023    0.123    0.123
RandVal   top870      870       75    0.013    0.012    0.005    0.005    0.007    0.010    0.015    0.056    0.209

Robustness

The benchmark results above cover about 1000 puzzles, and every one was solved in a quarter second or less. Does that mean that every puzzle can be solved in a quarter second? I think the answer is clearly no. There must be some puzzles in which the solution is way down in the lower-right corner of the search tree, and thus the search has to backtrack many times to get there. How long could that take? One way to estimate this is to sample bigger and bigger numbers of puzzles and watch how the maximum time goes up. Here it is:

version     name        N       Hz      avg   stddev      min       1%      25%   median      75%      99%      max
 Normal    top95       95       38    0.026    0.025    0.008    0.008    0.011    0.018    0.032    0.196    0.196
 Normal   top870      870       49    0.020    0.018    0.008    0.008    0.011    0.016    0.023    0.082    0.275
 Normal  top2365     2365       54    0.018    0.018    0.008    0.008    0.010    0.014    0.021    0.076    0.435
 Normal  subig20    17445       83    0.012    0.022    0.007    0.007    0.008    0.008    0.010    0.066    1.583
 Normal    min17    49151       49    0.020    0.052    0.007    0.007    0.008    0.010    0.016    0.179    3.385
 Normal     200K   200000       47    0.021    0.051    0.007    0.007    0.008    0.011    0.017    0.178    3.732

1 . . |. . . |. 6 4        1 3 2 |8 5 7 |9 6 4        Solution in 3.732 seconds
. . . |3 2 . |. . .        9 6 4 |3 2 1 |5 7 8        
. 5 . |. . . |. . .        7 5 8 |9 6 4 |1 3 2        
------+------+------       ------+------+------        
. 2 . |6 4 . |. . .        5 2 1 |6 4 8 |7 9 3        
6 . . |. . . |. . 1        6 8 9 |5 7 3 |2 4 1        
. . . |. . . |8 . .        3 4 7 |2 1 9 |8 5 6        
------+------+------       ------+------+------        
. . . |. . 5 |3 2 .        8 1 6 |4 9 5 |3 2 7        
4 . . |7 . . |. . .        4 9 3 |7 8 2 |6 1 5        
. . . |. . . |. . .        2 7 5 |1 3 6 |4 8 9

version     name        N     max  N-ratio max-ratio
 Normal    top95       95   0.196
 Normal   top870      870   0.275  9.2      1.4
 Normal  top2365     2365   0.435  2.7      1.6
 Normal  subig20    17445   1.583  7.4      3.6
 Normal    min17    49151   3.385  2.8      2.1
 Normal     200K   200000   3.732  4.0      1.1

Why?

Why did I do this? As computer security expert Ben Laurie has stated, Sudoku is "a denial of service attack on human intellect". My wife was infected by the virus, and I wanted to convince her that the problem had been solved and didn't need any more of her time. It didn't work for her (although she has since independently substantially cut down on the habit), but at least one other person has told me it worked for them, so I've made the world more productive. And perhaps along the way I've taught something about Python, constraint propagation, and search.

Translations

Peter Norvig