A puzzle is a grid in which a subset of the squares have been filled with a digit 1 to 9; the rest are blank. A solution to a puzzle is a grid where every square is filled with a digit 1 to 9, every unit contains all of the 9 digits, and the filled squares of the original puzzle remain unchanged in the solution. The definition of a solution implies that no digit can appear twice in a unit, that every digit must appear once in every unit, and that every square must have a different value from each of its peers. Here are the numbers of the squares, a typical puzzle, and the solution to the puzzle:
A well-formed puzzle has exactly one solution.
00 01 02| 03 04 05| 06 07 08    4 . . |. . . |8 . 5    4 1 7 |3 6 9 |8 2 5
09 10 11| 12 13 14| 15 16 17    . 3 . |. . . |. . .    6 3 2 |1 5 8 |9 4 7
18 19 20| 21 22 23| 24 25 26    . . . |7 . . |. . .    9 5 8 |7 2 4 |3 1 6
---------+---------+---------   ------+------+------   ------+------+------
27 28 29| 30 31 32| 33 34 35    . 2 . |. . . |. 6 .    8 2 5 |4 3 7 |1 6 9
36 37 38| 39 40 41| 42 43 44    . . . |. 8 . |4 . .    7 9 1 |5 8 6 |4 3 2
45 46 47| 48 49 50| 51 52 53    . . . |. 1 . |. . .    3 4 6 |9 1 2 |7 5 8
---------+---------+---------   ------+------+------   ------+------+------
54 55 56| 57 58 59| 60 61 62    . . . |6 . 3 |. 7 .    2 8 9 |6 4 3 |5 7 1
63 64 65| 66 67 68| 69 70 71    5 . . |2 . . |. . .    5 7 3 |2 9 1 |6 8 4
72 73 74| 75 76 77| 78 79 80    1 . 4 |. . . |. . .    1 6 4 |8 7 5 |2 9 3
Every square has exactly 3 units and 20 peers. For example, here are the units and peers for square number 19:
   01    |         |                      |         |            00 01 02 |         |
   10    |         |                      |         |            09 10 11 |         |
   19    |         |             18 19 20| 21 22 23| 24 25 26    18 19 20 |         |
---------+---------+---------   ---------+---------+---------   ---------+---------+---------
   28    |         |                      |         |                     |         |
   37    |         |                      |         |                     |         |
   46    |         |                      |         |                     |         |
---------+---------+---------   ---------+---------+---------   ---------+---------+---------
   55    |         |                      |         |                     |         |
   64    |         |                      |         |                     |         |
   73    |         |                      |         |                     |         |
It is also possible to have puzzles of different sizes. Instead of having boxes of size 3x3, each box could be 2x2 or 4x4 or 5x5, etc. We'll use the variable name Nbox for the box size, and N for the whole puzzle size. So the traditional Sudoku puzzle has Nbox = 3 and N = 9, and therefore 81 squares; a puzzle with Nbox = 5 has N = 25 and N*N = Nbox^4 = 625 squares. Of course, each puzzle must have N "digits" as well (we can use letters as digits when N > 9).
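The size arithmetic can be checked directly with a throwaway sketch (this is just a check, not part of the solver):

```python
# A grid has N = Nbox**2 rows and columns, hence N*N = Nbox**4 squares.
for Nbox in (2, 3, 4, 5):
    N = Nbox ** 2
    print(Nbox, N, N * N)
# prints:
# 2 4 16
# 3 9 81
# 4 16 256
# 5 25 625
```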
We can implement the notions of units, peers, and squares in the programming language Python as follows:
def cross(R, C):
"All squares in these rows and columns"
return [N*r+c for r in R for c in C]
def grouper(n, items):
"Group items n at a time: grouper(3, 'abcdefghi') => ['abc', 'def', 'ghi']"
return [items[i:i+n] for i in range(0, len(items), n)]
Nbox = 3 # Each box is of size Nbox x Nbox
N = Nbox*Nbox # The whole grid is of size N x N
rows = range(N)
cols = rows
squares = range(N*N)
digits = '123456789abcdefghijklmnopqrstuvwxyz!'[:N] # E.g., '123456789' for N = 9
digitset = set(digits)
rowunits = [cross([r], cols) for r in rows]
colunits = [cross(rows, [c]) for c in cols]
blocks = grouper(Nbox, rows) # E.g., [[0, 1, 2], [3, 4, 5], [6, 7, 8]] for N = 9
boxunits = [cross(rblock, cblock) for rblock in blocks for cblock in blocks]
allunits = rowunits + colunits + boxunits
units = [[u for u in allunits if s in u] for s in squares]
peers = [set(s2 for u in units[s] for s2 in u if s2!=s) for s in squares]
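Before moving on, it is worth verifying the "3 units and 20 peers" claim. The following self-contained snippet repeats the definitions above (in Python 3 syntax) and checks the counts; it is a test sketch, not new solver code:

```python
def cross(R, C):
    "All squares in these rows and columns"
    return [N*r + c for r in R for c in C]

def grouper(n, items):
    "Group items n at a time."
    return [items[i:i+n] for i in range(0, len(items), n)]

Nbox = 3
N = Nbox * Nbox
rows = range(N)
cols = rows
squares = range(N * N)
rowunits = [cross([r], cols) for r in rows]
colunits = [cross(rows, [c]) for c in cols]
blocks = grouper(Nbox, list(rows))
boxunits = [cross(rb, cb) for rb in blocks for cb in blocks]
allunits = rowunits + colunits + boxunits
units = [[u for u in allunits if s in u] for s in squares]
peers = [set(s2 for u in units[s] for s2 in u if s2 != s) for s in squares]

assert len(allunits) == 27                        # 9 rows + 9 columns + 9 boxes
assert all(len(units[s]) == 3 for s in squares)   # every square is in 3 units
assert all(len(peers[s]) == 20 for s in squares)  # ... and has 20 peers
assert units[19][1] == [1, 10, 19, 28, 37, 46, 55, 64, 73]  # column unit of square 19
```

The peer count follows from 8 + 8 + 8 squares in the three units, minus the 4 squares the box shares with the row and the column.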
The next step is to define the Sudoku playing grid. We will actually define two representations for grids: First, a textual format used to specify the initial state of a puzzle, suitable for representing example puzzles in a file:
A gridstring is defined as a character string in which the characters 1-9 indicate a digit and a 0 or period indicates an empty square. All other characters are ignored (including spaces, newlines, dashes, and bars). These conventions cover most, but not all, puzzle formats that are readily available for download. Each of the following three gridstrings represents the same puzzle:
"4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......" """ 400000805 030000000 000700000 020000060 000080400 000010000 000603070 500200000 104000000""" """ 4 . . |. . . |8 . 5 . 3 . |. . . |. . . . . . |7 . . |. . . ------+------+------ . 2 . |. . . |. 6 . . . . |. 8 . |4 . . . . . |. 1 . |. . . ------+------+------ . . . |6 . 3 |. 7 . 5 . . |2 . . |. . . 1 . 4 |. . . |. . . """Our second representation is used as an internal representation of any state of a puzzle (initial, partially solved or complete):
A grid is a list (the Python name for a vector/array) of 81 elements; we will call this a values list because we use it to describe the values for each square.
Now for the values list representation. One might think
that a 9x9 array would be the obvious data structure. But
two-dimensional arrays are a bit complicated in Python, so we will
use a one-dimensional array (which Python calls a list) of
length 81. There is one complication: we will soon start filling in
Sudoku grids with digits. If we make a mistake and reach a
contradiction, we will use the value None to indicate the
contradiction. Thus, functions that take values
as a parameter must also handle None appropriately.
Now let's define what it means to be a solution: as we said before,
a solution is a grid (represented as a values list) where every unit
is filled with a permutation of the digits 1 to 9, and the filled
squares of the original puzzle remain unchanged in the solution:
Humans solve Sudoku problems by using logical deduction to infer
that a certain digit is (or is not) the value of a square; such
deductions are repeated until all squares are filled with a single
known digit. We could write an algorithm to do deductions like that,
but the difficulty is in knowing when you have enough inference rules
to solve every possible puzzle.
Therefore we will take a different route. Technically, it is known
as model checking rather than deduction, and it consists
of searching for a solution that works by trying out
possibilities one by one. We will describe a series of search
algorithms, starting with the hopelessly inefficient and ending with
one that can solve over 50 of the hardest Sudoku puzzles per second.
Sometimes generate and test is the best algorithm available. Herb
Simon, in his classic book The Sciences of
the Artificial (page 194), describes the task of opening a
safe that has ten dials, each with 100 numbers on it. If the safe is
well-designed there is no better approach than trying all
100^10 combinations.
Here is the implementation of generate and test for Sudoku:
It took 4 seconds to solve this tiny 4x4 puzzle. Now consider the
9x9 puzzle defined above as grid1. It has 17 filled squares, and
thus 64 empty squares. So there are 9^64 possible
combinations (about 10^61) to consider. How long will it
take to process them?
Suppose we have a custom 1024 core 10 GHz
CPU that features a special instruction that can generate and
test a position in one clock cycle, and let's say we could afford to buy a billion
of these, and while we're shopping, let's say we also pick up a time
machine and go back 13 billion years to the early days of the universe,
and a fleet of starships with which we visit 10 billion galaxies and
in each galaxy convince the inhabitants of 10 billion planets to each
buy the same number of computers, and we all start our programs
running, managing to distribute the cases perfectly so that no
computation is duplicated or wasted. Then we can estimate
that we'd be less than 4% done with this one puzzle by now.
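We can sanity-check that estimate with a little arithmetic; the figures below are simply the ones assumed in the story (one test per core per clock cycle, and 13 billion years of runtime):

```python
tests_per_second = 1024 * 10e9           # one 1024-core, 10 GHz CPU; 1 test per core per cycle
computers_per_planet = 1e9               # a billion such CPUs per planet
planets = 10e9 * 10e9                    # 10 billion galaxies, 10 billion planets each
seconds = 13e9 * 365.25 * 24 * 3600      # 13 billion years
tested = tests_per_second * computers_per_planet * planets * seconds
total = 9 ** 64                          # candidate grids for 64 empty squares
print(round(100 * tested / total, 1))    # prints 3.6 -- indeed less than 4% done
```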
Generate and test isn't the algorithm you're looking for. Move along.
Fortunately for us, Sudoku is more like the defective safe. To be
precise, it is like a safe with 81 dials, each with 9 numbers,
and sometimes if you put one of the dials in the wrong position you hear a sound,
but sometimes not (depending on the positions of the other dials).
For example, if we fill the upper right
square of the tiny puzzle with a 1, 2, or 3 then we can detect right away
there is a problem: there would be two of the same digit in the top
row or the rightmost column. Once we detect that we needn't consider
any combination of values for the remaining 8 unfilled squares--nothing we can do with
them will correct the flaw in the upper right square. Instead of
enumerating values for the 8 squares we can skip ahead and consider
a different value (4) for the upper right square. This suggests the
incremental search algorithm:
The code to implement this is straightforward:
Here we see solve (and therefore search) in action:
Our approach will be for each square to keep track of its own
remaining possible values. Then we can fail immediately when
any square runs out of possibilities; we won't have to wait
until the square is chosen by select_square. This approach
is called forward checking because we look forward and check
squares other than the currently selected square.
The implementation of forward checking is easy. We'll
change solve so that it calls initialize to
set all the unfilled squares to their initial value according to this
convention. We'll also change assign(values, s, d) so that
it eliminates d from all the peers of s and fails if
any peer has no remaining possible values. With these added
complications one function becomes easier: possible can now
just look up the collection of possible values rather than having to
compute it.
How much does forward checking help? For the easy puzzles, not
much. The reason they are called "easy" is that they don't require
much forward checking. For the harder grid1 puzzle the
solving speed is improved by a factor of 5. But that's still 43
seconds. Let's keep improving.
This idea is called arc consistency; it gets its name from
the practice of describing a constraint satisfaction problem as a
graph where the nodes are the variables (in Sudoku, the squares) and
the arcs are the constraints. We make the arcs (the peer constraints)
consistent one by one, and we can end up traversing multiple links in
the graph (whereas the forward checking algorithm only looks ahead
one arc in the graph).
The implementation of arc consistency for Sudoku is a simple change to what we already have:
Compared to forward checking, arc consistency reduces the time to
solve grid1 from 43 seconds down to 0.16 seconds (a 266-fold
speedup). The 50 easy problems are sped up from a rate of 5 per
second to 282 per second, a 56-fold speedup. Clearly, arc consistency
is very effective on Sudoku problems (as it is on many constraint
satisfaction problems), and one might be tempted to stop here. But
before we declare victory and go home, let's do two things: first,
benchmark the algorithm on a sampling of difficult problems as well as
easy ones, and second, try some variations to see if we can do even
better.
Here is the code to read puzzles from a file, run them, and print
statistics about the results:
Here are the results; the benchmarks uphold the idea that MinRV is a
good strategy; it performs the best on all the benchmarks except
easy50, where it falls just slightly behind firstV.
This is true on the average time to solution, and even more so on the
maximum time.
def parse(gridstring):
"Convert gridstring into a list of values. '.' or '0' used for an empty square."
gridstring = gridstring.replace('0', '.')
values = [c for c in gridstring if c in digits or c == '.']
assert len(values) == N*N
return values
def show(values):
"Convert a list of values to a string laid out as a 2-dimensional gridstring."
if values is None: return "NO SOLUTION"
line = '\n' + '+'.join(['--'*Nbox]*Nbox) + '\n'
def showsquare(s):
r, c = s/N, s % N
sep = ((line if (r % Nbox == Nbox-1 and r != N-1) else '\n')
if (c % N == N-1) else (' |' if (c % Nbox == Nbox-1) else ' '))
return values[s] + sep
return ''.join(showsquare(s) for s in squares)
And here is an example of the functions at work:
>>> grid1 = "4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......"
>>> parse(grid1)
['4', '.', '.', '.', '.', '.', '8', '.', '5',
'.', '3', '.', '.', '.', '.', '.', '.', '.',
'.', '.', '.', '7', '.', '.', '.', '.', '.',
'.', '2', '.', '.', '.', '.', '.', '6', '.',
'.', '.', '.', '.', '8', '.', '4', '.', '.',
'.', '.', '.', '.', '1', '.', '.', '.', '.',
'.', '.', '.', '6', '.', '3', '.', '7', '.',
'5', '.', '.', '2', '.', '.', '.', '.', '.',
'1', '.', '4', '.', '.', '.', '.', '.', '.']
>>> print show(parse(grid1))
4 . . |. . . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. . . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .
def is_solution(values, puzzle):
"""A list of values is a solution to a puzzle if each unit is a permutation
of the digits 1 to 9 and if the filled squares remain unchanged."""
filled_squares = [s for s in squares if puzzle[s] in digitset]
def is_permutation(unit): return set(values[s] for s in unit) == digitset
return (values is not None
and all(is_permutation(u) for u in allunits)
and all(values[s]==puzzle[s] for s in filled_squares))
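To see is_solution at work without building the full 9x9 tables, here is a self-contained sketch for a 4x4 grid (Nbox = 2); the puzzle and solution are the tiny example used later in the article, and None stands for a contradicted grid, as in the search code:

```python
# Rebuild the units machinery for Nbox = 2 (a 4x4 grid with 2x2 boxes).
Nbox = 2
N = Nbox * Nbox
squares = range(N * N)
digits = '1234'
digitset = set(digits)

def cross(R, C): return [N*r + c for r in R for c in C]
rows = cols = range(N)
blocks = [range(i, i + Nbox) for i in range(0, N, Nbox)]
allunits = ([cross([r], cols) for r in rows] +
            [cross(rows, [c]) for c in cols] +
            [cross(rb, cb) for rb in blocks for cb in blocks])

def is_solution(values, puzzle):
    "Mirror of the article's is_solution, specialized to this 4x4 grid."
    filled_squares = [s for s in squares if puzzle[s] in digitset]
    def is_permutation(unit): return set(values[s] for s in unit) == digitset
    return (values is not None
            and all(is_permutation(u) for u in allunits)
            and all(values[s] == puzzle[s] for s in filled_squares))

puzzle   = list('12..3..24......3')
solution = list('1234341243212143')
assert is_solution(solution, puzzle)      # the real solution checks out
assert not is_solution(puzzle, puzzle)    # an unfinished grid is not a solution
assert not is_solution(None, puzzle)      # neither is a contradicted grid
```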
Searching for solutions
Algorithm 1: generate and test
Here is the simplest possible search algorithm,
which we call generate and test:
First fill in all the
empty squares with digits. Then check to see if we have a
solution. If not, then fill the empty squares with a different
combination of digits. Repeat with different combinations until a solution is found.
def solve(gridstring):
"Try every combination of digits for the empty squares; test each one."
from itertools import product
puzzle = parse(gridstring)
for candidate in product(*[(v if v in digits else digits) for v in puzzle]):
if is_solution(candidate, puzzle):
return candidate
return None
And here it is solving a tiny puzzle. There are ten empty squares,
so generate and test first tries to fill them with the digits 1111111111, then
1111111112, then 1111111113, and so on until a solution is found at 3441321214.
>>> tiny = '12..3..24......3'
>>> print show(parse(tiny))
1 2 |. .
3 . |. 2
----+----
4 . |. .
. . |. 3
>>> print show(solve(tiny))
1 2 |3 4
3 4 |1 2
----+----
4 3 |2 1
2 1 |4 3
Algorithm 2: incremental search
In Simon's safe example, he next supposes a defective safe, in which a
faint click is heard whenever a dial is set to the right position.
With this safe it only takes 100 × 10 trials to discover the
correct combination, not 100^10. If we have some selective
knowledge of components of the safe (or puzzle), our search will be
much easier.
Start filling squares with digits, always making sure that the digit we put in a square is not the same as any peer.
If we reach a square where every possible digit conflicts with a peer, back up and consider a different digit for the previous square.
Stop when we successfully fill every square, or give up if we tried all possibilities without finding a solution.
The progress of this algorithm can be described as a search
tree, where each node represents a partially-filled state of the
puzzle, and a branch corresponds to filling in a square with a digit.
In the following tree we order the squares left-to-right,
top-to-bottom, and order the digits 1 to 4 (different orderings
would give different trees).
1 2 |. .
3 . |. 2
----+----
4 . |. .
. . |. 3
:
:
:.......:.......:
: :
: :
1 2 |3 . 1 2 |4 . # The first empty square can be filled with 3 or 4
3 . |. 2 3 . |. 2
----+---- ----+----
4 . |. . 4 . |. .
. . |. 3 . . |. 3
:
:
1 2 |3 4 # For this square there is only one choice: 4
3 . |. 2
----+----
4 . |. .
. . |. 3
:
(etc.) # We skip a few squares
:
:...............:
: :
1 2 |3 4 1 2 |3 4
3 4 |1 2 3 4 |1 2
----+---- ----+----
4 1 |. . 4 3 |. . # There are two choices here, 1 and 3
. . |. 3 . . |. 3 # We consider 1 first, then come back to 3 after the dead end
: :
: :
1 2 |3 4 1 2 |3 4
3 4 |1 2 3 4 |1 2
----+---- ----+----
4 1 |2 . 4 3 |2 .
. . |. 3 . . |. 3
#(dead end) :
:
(etc.)
:
:
1 2 |3 4
3 4 |1 2
----+----
4 3 |2 1
2 1 |4 3 # Found the solution!
def solve(gridstring):
"Parse gridstring and search for a solution."
return search(parse(gridstring))
def search(values):
"""Select an unfilled square, try each possible value in order, recursively searching.
When all filled: success; when no more digits to try: failure."""
if values is None: return None
s = select_square(values)
if s is None: return values
for d in possible(values, s):
result = search(assign(values[:], s, d))
if result: return result
return None
def select_square(values):
"Just choose the first square that is not filled yet."
for s in squares:
if values[s] not in digitset: return s
return None
def possible(values, s):
"A square can be any digit that is not already taken by a peer."
return digitset - set(values[p] for p in peers[s])
def assign(values, s, d):
"For now, simply assign values[s] = d."
values[s] = d
return values
The key function is search. It obeys the convention that if
it is passed None (to indicate an invalid grid, such as one
with two 1s in the top row) it returns None, meaning that no
solution can be found. Otherwise it selects some unfilled square to
work on. If select_square returns None that is
actually good news: it means that all the squares are already filled,
and we are done: we have a solution, so we just return it. Otherwise,
the for loop says that for each possible digit that could fill square s
we try to assign that digit to s and search for a solution from there.
If some call to search succeeds, return it; but if a digit assignment
causes the search to fail, just keep going with the next digit assignment.
If all digit assignments fail, then the whole call to search fails,
and we back up to the previous recursive call.
>>> print show(solve(grid1))
4 1 7 |3 6 9 |8 2 5
6 3 2 |1 5 8 |9 4 7
9 5 8 |7 2 4 |3 1 6
------+------+------
8 2 5 |4 3 7 |1 6 9
7 9 1 |5 8 6 |4 3 2
3 4 6 |9 1 2 |7 5 8
------+------+------
2 8 9 |6 4 3 |5 7 1
5 7 3 |2 9 1 |6 8 4
1 6 4 |8 7 5 |2 9 3
That's all there is to it! The only reason we don't stop this article
now is that the program is still too slow for some purposes. We're
not talking billions-of-years slow, but it did take 3 minutes and 45
seconds to solve this puzzle (a rather hard one). On a benchmark of
50 easy puzzles the program took a total of 9.5 seconds; a rate of 5
Hz. Not bad, but we can do much better.
Algorithm 3: forward checking
The incremental search algorithm prevents us from wasting time
enumerating values for the rest of a grid after we have discovered
that two filled squares conflict with each other. But what it does
not do is prevent us from generating the same conflict over and over
again. Consider again grid1, and the two squares marked A and B:
4 . . |. A . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. B . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .
We can tell right from the start (by examining peers) that square
A must be one of 2,3,6,7,9 and B must be one of
3,4,5,7,9. Suppose we are at a point in the search where we are
considering 3 as a possible value for A. We are considering
squares in left-to-right, top-to-bottom order, so square B has
not been considered yet. But when we eventually get to B we
will detect a contradiction, because it turns out that B must
be 3. So we will back up in the search, and will end up trying many
combinations for all the squares between A and B. We'd
like to make the process of failing faster--we don't want to wait
until we get around to checking square B to see that there is a
contradiction if there is a way to detect it faster.
Currently values[s] is '.'
for an empty square; we will change that so that values[s]
holds a collection of all the remaining possible digits for the
square. When this is reduced to the empty collection, for any square,
we can immediately fail. The collection could be represented by a
Python set or list, but I chose instead to use a
string of digits. So the value for square A would start out as
the string '23679'.
def solve(gridstring):
"Parse gridstring, handle initial constraints, and search for a solution."
return search(initialize(parse(gridstring)))
def initialize(initvalues):
"""First set each square to have the domain '123456789'.
Then do forward checking for each of the filled squares."""
values = N * N * [digits]
for s in squares:
if values is not None and initvalues[s] in digitset: values = assign(values, s, initvalues[s])
return values
def possible(values, s): return values[s]
def assign(values, s, d):
"Assign d to s and eliminate d from the peers of s. Return None on contradiction."
if d not in values[s]: return None
values[s] = d
if not all(eliminate(values, p, d) for p in peers[s]): return None
return values
def eliminate(values, s, d):
"Eliminate d as a possible value of s. Fail if forward checking finds a contradiction."
values[s] = values[s].replace(d, '')
return forward_check(values, s)
def forward_check(values, s):
"Fail if s has no values, or a single value the same as a peer."
V = len(values[s])
if V==0 or (V==1 and any(values[p]==values[s] for p in peers[s])):
return None
return values
Forward checking is a kind of constraint satisfaction.
The fact that peers must have different values is a constraint,
and we say that a solution is an assignment of values that satisfies
all the constraints.
Algorithm 4: arc consistency
Forward checking helps because every time we eliminate a possible
value from a square we peek forward to peer squares to see if they
will cause a failure. We can improve that idea by looking even
farther ahead to the peers of peers. You might think that
eliminating, say, the digit 1 from square s would effect only
the peers of s and could have no effect on the peers of the
peers of s. However, suppose that the peer s2 had
only the possibilities 1 and 2. If we eliminate 1, then s2
must be 2, and we can then eliminate 2 from all the peers of
s2. If some peer of s2 had only the possibilities 2
and 3, then it must become 3, and we can keep on going from there.
def eliminate(values, s, d):
"Eliminate d as a possible value of s. Fail if not arc consistent."
if d not in values[s]: return values # Already eliminated d
values[s] = values[s].replace(d, '')
return arc_consistent(values, s)
def arc_consistent(values, s):
"Fail if s has no values left, or"
V = len(values[s])
if V==0 or (V==1 and not assign(values, s, values[s])):
return None
return values
Benchmarks
I have collected some benchmark problems from around the web that will
be used to measure the performance of algorithms. Thanks particularly
to Gordon Royle, Peter Cock, Guenter Stertenbrink and Jean Charles Meyrignac. Here is
a description of the files:
Filename       N puzzles  Description
grid1.txt              1  hard puzzle, called grid1 above
easy50.txt            50  easy puzzles
hardest.txt           11  puzzles from around the web advertised as "hardest"
top95.txt             95  hard puzzles
top870.txt           870  hard puzzles
top2365.txt         2365  hard puzzles
subig20.txt       17,445  hard puzzles
min17.txt         49,151  hard puzzles with 17 filled squares
def from_file(filename, sep=None):
"Parse a file into a list of strings, separated by sep."
return file(filename).read().strip().split(sep)
benchmarks = [ # A collection of files with puzzles, and the URLs they came from.
(0, "grid1.txt", "magictour.free.fr/top95"),
(1, "easy50.txt", "projecteuler.net/index.php?section=problems&id=96"),
(2, "hardest.txt", "norvig.com"),
(3, "top95.txt", "magictour.free.fr/top95"),
(4, "top870.txt", "magictour.free.fr/top870"),
(5, "top2365.txt", "magictour.free.fr/top2365"),
(6, "subig20.txt", "www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/python/sudoku/"),
(7, "min17.txt", "mapleta.maths.uwa.edu.au/~gordon/sudokumin.php"),
]
def benchmark(todo=range(10), version="", showif=60.0):
"Run several benchmark puzzles, in files. Show ones that take longer than showif seconds."
show_header = True
for (num, filename, url) in benchmarks:
if num in todo:
solve_all(from_file(filename), filename.replace('.txt',''), version, showif, show_header)
show_header = False
def solve_all(gridstrings, name, version, showif=0.0, show_header=True):
"""Attempt to solve a sequence of gridstrings. Report stats on solving times.
Show puzzles that take longer than showif seconds to solve."""
times = []
for gridstring in gridstrings:
values, t = time_call(solve, gridstring)
times.append(t)
ok = is_solution(values, parse(gridstring))
## Display puzzles that take long enough, or that fail
if t > showif or not ok:
print side_by_side(show(parse(gridstring)), show(values),
'Solution in %.3f seconds' % t)
assert ok
print stats(times, show_header, name=name, version=version)
import time

def time_call(function, *args, **keyword_args):
"Return the result of applying function to args, and the number of seconds it took."
t0 = time.clock()
result = function(*args, **keyword_args)
t1 = time.clock()
return result, t1-t0
def side_by_side(*strings):
"Concatenate several strings on a line-by-line basis."
split_strings = [st.split('\n') for st in strings]
nlines = max(len(st) for st in split_strings)
def next_line(): return ' '.join((st.pop(0) if st else '') for st in split_strings)
return '\n'.join(next_line() for _ in range(nlines))
def stats(times, show_header=True, **extras):
"Return a tab-separated header+data string representing various statistics about data."
# Compute Stats
total = sum(times)
n = len(times)
mean = total/n
stddev = (sum((t-mean)**2 for t in times)/n)**0.5
times.sort()
# Format stats
def p(percentile): return times[n*percentile/100]
def fmt(vals): return ' '.join(('%7.3f'%x if isinstance(x, float) else '%7s'%x) for x in vals)
return ((fmt(extras.keys() + 'N Hz avg stddev min 1% 25% median 75% 99% max'.split()) + '\n'
if show_header else '')
+ fmt(extras.values() + [n, int(n/total), mean, stddev, min(times),
p(1), p(25), p(50), p(75), p(99), max(times)]))
Algorithm 5: unit consistency
The next step is to do a better job of exploiting the constraints. So
far, we detect a failure if some square has no remaining possible
value. But we could also detect the case where some value (digit) has
no remaining possible square. That is, all the squares in a unit might
have several possible values left, but if one digit, say 3, does
not appear in any square, then we can fail immediately. And if the
digit 3 appears as a possibility in only one square, then we can
assign it to that square. Here is the code to implement this idea:
def eliminate(values, s, d):
"Eliminate d as a possible value of s. Fail if not arc and unit consistent."
if d not in values[s]: return values # Already eliminated d
values[s] = values[s].replace(d, '')
return arc_consistent(values, s) and unit_consistent(values, s, d)
def unit_consistent(values, s, d):
"Fail if some unit has no place for digit d."
for u in units[s]:
dplaces = [s2 for s2 in u if d in values[s2]] # Remaining places for d
D = len(dplaces)
if D==0 or (D==1 and not assign(values, dplaces[0], d)):
return None
return values
Algorithm 6: naked pairs
The literature on Sudoku solving strategies includes some colorful
names; one of them is the so-called naked pairs strategy. It
involves finding two squares in a unit which each have the same two
possible values. For example, suppose two squares in a box have both
been whittled down to the possible values 3 and 5. We don't know which
is 3 and which is 5, but either way, we know that no other square in
the box can be either 3 or 5, so those values can be eliminated. Here is the code:
def eliminate(values, s, d):
"Eliminate d as a possible value of s. Fail if not arc and unit consistent; detect naked pairs."
if d not in values[s]: return values # Already eliminated d
values[s] = values[s].replace(d, '')
return arc_consistent(values, s) and unit_consistent(values, s, d) and naked_pairs(values, s)
def naked_pairs(values, s):
"""If two squares s and s2 in a unit have the same two values, eliminate
those values from all other squares s3 in the unit."""
if values is None: return None
v = values[s]
if len(v) != 2: return values
for u in units[s]:
for s2 in u:
if s2 != s and values[s2] == v:
# s and s2 are a naked pair; remove values from other squares (s3) in unit
for s3 in u:
if s != s3 != s2:
if not (eliminate(values, s3, v[0]) and eliminate(values, s3, v[1])):
return None
return values
Benchmark comparison of algorithms so far
I ran the benchmark code on the five serious algorithms: incremental
search (incr), forward checking (FC), arc consistency (AC), unit
consistency (AC+Unit), and naked pairs (AC+U+NP). For each benchmark
file we can see the name of the file, the number of puzzles, the
average speed of solving (expressed two ways, as a frequency in Hertz
and as the average number of seconds), the standard deviation of the
results, and then, to give an idea of the distribution of times, the
minimum, the first percentile, 25th percentile, median, 75th
percentile, 99th percentile, and maximum times. Here are the results:
version name N Hz avg stddev min 1% 25% median 75% 99% max
incr grid1 1 0 225.729 0.000 225.729 225.729 225.729 225.729 225.729 225.729 225.729
FC grid1 1 0 42.132 0.000 42.132 42.132 42.132 42.132 42.132 42.132 42.132
FC easy50 50 5 0.172 0.272 0.004 0.004 0.010 0.050 0.161 0.987 0.987
AC grid1 1 6 0.159 0.000 0.159 0.159 0.159 0.159 0.159 0.159 0.159
AC easy50 50 285 0.004 0.005 0.001 0.001 0.001 0.002 0.003 0.025 0.025
AC hardest 11 22 0.044 0.113 0.001 0.001 0.002 0.010 0.014 0.401 0.401
AC top95 95 0 1.030 2.668 0.001 0.001 0.024 0.072 0.329 16.211 16.211
AC top870 870 4 0.221 1.111 0.001 0.001 0.006 0.020 0.060 6.943 16.197
AC+Unit grid1 1 62 0.016 0.000 0.016 0.016 0.016 0.016 0.016 0.016 0.016
AC+Unit easy50 50 188 0.005 0.001 0.005 0.005 0.005 0.005 0.005 0.009 0.009
AC+Unit hardest 11 67 0.015 0.010 0.005 0.005 0.006 0.011 0.018 0.035 0.035
AC+Unit top95 95 15 0.067 0.128 0.005 0.005 0.010 0.021 0.055 0.890 0.890
AC+Unit top870 870 33 0.030 0.074 0.005 0.005 0.008 0.014 0.028 0.298 1.273
AC+U+NP grid1 1 147 0.007 0.000 0.007 0.007 0.007 0.007 0.007 0.007 0.007
AC+U+NP easy50 50 183 0.005 0.001 0.005 0.005 0.005 0.005 0.005 0.007 0.007
AC+U+NP hardest 11 83 0.012 0.007 0.005 0.005 0.007 0.009 0.015 0.031 0.031
AC+U+NP top95 95 30 0.033 0.063 0.005 0.005 0.007 0.012 0.025 0.489 0.489
AC+U+NP top870 870 58 0.017 0.033 0.005 0.005 0.006 0.010 0.017 0.137 0.596
Note that for the easy50 file, the arc consistency algorithm
is fastest. That is because it is the simplest algorithm (it has no
unnecessary checks or bookkeeping), yet it does enough to solve all
the easy puzzles. For all the other benchmarks, it is worth the effort
to incorporate arc consistency plus unit consistency plus naked pairs.
Algorithms 6-9: variable ordering
Note that our select_square function does not do much in the
way of selection: it just returns the first square it finds that has
not been filled yet. Could being more clever about selecting a square
speed things up? We'll stick with arc consistency plus unit
consistency plus naked pairs, but we'll compare the square selection strategy we've been using so far
(which we will call firstV) with three alternative selection
strategies: randomV, which chooses a random unfilled square;
MinRV, which chooses a square with the Minimum number of
Remaining Values (that is, if there is a square with only 2 possible
values we would choose that over one with 3 or more values); and
MaxRV, which reverses that choice. MinRV is generally held to
be a good strategy; the idea is that if there are only two
possibilities you have a 50% chance of guessing right.
version name N Hz avg stddev min 1% 25% median 75% 99% max
firstV easy50 50 183 0.005 0.001 0.005 0.005 0.005 0.005 0.005 0.007 0.007
firstV hardest 11 83 0.012 0.007 0.005 0.005 0.007 0.009 0.015 0.031 0.031
firstV top95 95 30 0.033 0.063 0.005 0.005 0.007 0.012 0.025 0.489 0.489
firstV top870 870 58 0.017 0.033 0.005 0.005 0.006 0.010 0.017 0.137 0.596
randomV easy50 50 165 0.006 0.003 0.005 0.005 0.005 0.005 0.006 0.019 0.019
randomV hardest 11 113 0.009 0.004 0.005 0.005 0.006 0.006 0.010 0.018 0.018
randomV top95 95 7 0.127 0.203 0.005 0.005 0.021 0.047 0.120 1.090 1.090
randomV top870 870 25 0.039 0.101 0.005 0.005 0.010 0.018 0.032 0.490 1.762
MinRV easy50 50 182 0.005 0.001 0.005 0.005 0.005 0.005 0.005 0.008 0.008
MinRV hardest 11 142 0.007 0.002 0.005 0.005 0.005 0.006 0.010 0.011 0.011
MinRV top95 95 57 0.017 0.017 0.005 0.005 0.007 0.012 0.021 0.132 0.132
MinRV top870 870 73 0.014 0.012 0.005 0.005 0.007 0.010 0.016 0.055 0.185
MaxRV easy50 50 164 0.006 0.002 0.005 0.005 0.005 0.005 0.005 0.017 0.017
MaxRV hardest 11 25 0.038 0.058 0.005 0.005 0.007 0.013 0.058 0.210 0.210
MaxRV top95 95 0 1.161 2.647 0.005 0.005 0.075 0.172 0.913 19.633 19.633
MaxRV top870 870 4 0.223 0.983 0.005 0.005 0.025 0.050 0.109 5.129 19.664
Here is the code to implement the three new variable selection
strategies and produce the benchmark results:
benchmark([1,2,3,4], "firstV")

import random

def select_square(values):
    "Return a random unfilled square; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled == [] else random.choice(unfilled)

benchmark([1,2,3,4], "randomV")

def select_square(values):
    "Return an unfilled square with the fewest possible values; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled == [] else min(unfilled, key=numvalues)

benchmark([1,2,3,4], "MinRV")

def select_square(values):
    "Return an unfilled square with the most possible values; or None if no unfilled squares."
    def numvalues(s): return len(values[s])
    unfilled = [s for s in squares if numvalues(s) > 1]
    return None if unfilled == [] else max(unfilled, key=numvalues)

benchmark([1,2,3,4], "MaxRV")
Algorithms 10-12: value ordering
The next question is:
given that we have selected a variable, in what order should we
consider the possible values? The current definition of
possible just goes in increasing order of digits: 1,2,3, etc.
In the literature on constraint satisfaction there is the concept of
the least constraining value: the value that rules out the
fewest choices for the neighbors (peers). We will benchmark least
constraining value, most constraining value, and random ordering. Here
is the code:
def constraining_value(values, s, d):
    "Count the number of peers of s that would have 0 or 1 value left without d."
    return sum((d in values[s2] and len(values[s2]) <= 2) for s2 in peers[s])

def possible(values, s):
    "Try the most constraining values first."
    return sorted(values[s], key=lambda d: -constraining_value(values, s, d))

benchmark([1,2,3,4], "MostCVl")

def possible(values, s):
    "Try the least constraining values first."
    return sorted(values[s], key=lambda d: constraining_value(values, s, d))

benchmark([1,2,3,4], "LeastCV")

def possible(values, s):
    "Try the values in random order."
    vals = list(values[s])
    random.shuffle(vals)
    return vals

benchmark([1,2,3,4], "RandVal")
And here are the results. We see that value ordering makes almost no
difference.
version name N Hz avg stddev min 1% 25% median 75% 99% max
MostCVl easy50 50 184 0.005 0.000 0.005 0.005 0.005 0.005 0.005 0.008 0.008
MostCVl hardest 11 113 0.009 0.006 0.005 0.005 0.006 0.006 0.009 0.029 0.029
MostCVl top95 95 57 0.018 0.018 0.005 0.005 0.008 0.012 0.021 0.137 0.137
MostCVl top870 870 72 0.014 0.013 0.005 0.005 0.007 0.011 0.016 0.057 0.193
LeastCV easy50 50 181 0.006 0.001 0.005 0.005 0.005 0.005 0.005 0.009 0.009
LeastCV hardest 11 124 0.008 0.004 0.005 0.005 0.006 0.006 0.010 0.018 0.018
LeastCV top95 95 54 0.018 0.018 0.005 0.005 0.008 0.012 0.022 0.135 0.135
LeastCV top870 870 72 0.014 0.013 0.005 0.005 0.008 0.011 0.016 0.058 0.194
RandVal easy50 50 185 0.005 0.000 0.005 0.005 0.005 0.005 0.005 0.008 0.008
RandVal hardest 11 91 0.011 0.007 0.005 0.005 0.006 0.007 0.014 0.029 0.029
RandVal top95 95 52 0.019 0.018 0.005 0.005 0.008 0.013 0.023 0.123 0.123
RandVal top870 870 75 0.013 0.012 0.005 0.005 0.007 0.010 0.015 0.056 0.209
Robustness
The benchmark results above cover about 1000 puzzles, and every one
was solved in a quarter second or less. Does that mean that every
puzzle can be solved in a quarter second? I think the answer is
clearly no. There must be some puzzles in which the solution is way
down in the lower-right corner of the search tree, and thus the search
has to backtrack many times to get there. How long could that take?
One way to estimate this is to sample bigger and bigger numbers of
puzzles and watch how the maximum time goes up. Here it is:
version name N Hz avg stddev min 1% 25% median 75% 99% max
Normal top95 95 38 0.026 0.025 0.008 0.008 0.011 0.018 0.032 0.196 0.196
Normal top870 870 49 0.020 0.018 0.008 0.008 0.011 0.016 0.023 0.082 0.275
Normal top2365 2365 54 0.018 0.018 0.008 0.008 0.010 0.014 0.021 0.076 0.435
Normal subig20 17445 83 0.012 0.022 0.007 0.007 0.008 0.008 0.010 0.066 1.583
Normal min17 49151 49 0.020 0.052 0.007 0.007 0.008 0.010 0.016 0.179 3.385
Normal 200K 200000 47 0.021 0.051 0.007 0.007 0.008 0.011 0.017 0.178 3.732
1 . . |. . . |. 6 4 1 3 2 |8 5 7 |9 6 4 Solution in 3.732 seconds
. . . |3 2 . |. . . 9 6 4 |3 2 1 |5 7 8
. 5 . |. . . |. . . 7 5 8 |9 6 4 |1 3 2
------+------+------ ------+------+------
. 2 . |6 4 . |. . . 5 2 1 |6 4 8 |7 9 3
6 . . |. . . |. . 1 6 8 9 |5 7 3 |2 4 1
. . . |. . . |8 . . 3 4 7 |2 1 9 |8 5 6
------+------+------ ------+------+------
. . . |. . 5 |3 2 . 8 1 6 |4 9 5 |3 2 7
4 . . |7 . . |. . . 4 9 3 |7 8 2 |6 1 5
. . . |. . . |. . . 2 7 5 |1 3 6 |4 8 9
version name N max N-ratio max-ratio
Normal top95 95 0.196
Normal top870 870 0.275 9.2 1.4
Normal top2365 2365 0.435 2.7 1.6
Normal subig20 17445 1.583 7.4 3.6
Normal min17 49151 3.385 2.8 2.1
Normal 200K 200000 3.732 4.0 1.1
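For concreteness, the two ratio columns can be recomputed from the (N, max) pairs in the table: each row's value divided by the previous row's. This is a small illustrative script, not part of the article's code.

```python
# Recompute the N-ratio and max-ratio columns of the table above:
# each row's N and max time divided by the previous row's.
rows = [("top95",   95,     0.196),
        ("top870",  870,    0.275),
        ("top2365", 2365,   0.435),
        ("subig20", 17445,  1.583),
        ("min17",   49151,  3.385),
        ("200K",    200000, 3.732)]
for (_, n0, m0), (name, n1, m1) in zip(rows, rows[1:]):
    print(f"{name:8} N-ratio: {n1/n0:4.1f}  max-ratio: {m1/m0:4.1f}")
```

The point of the comparison is that the maximum solution time grows much more slowly than the number of puzzles sampled.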
Why?
Why did I do this? As computer security expert Ben Laurie has
stated, Sudoku is "a denial of service attack on human intellect". My
wife was infected by the virus, and I wanted to convince her that the
problem had been solved and didn't need any more of her time. It
didn't work for her (although she has since independently
substantially cut down on the habit), but at least one other person
has told me it worked for them, so I've made the world more
productive. And perhaps along the way I've taught something about
Python, constraint propagation, and search.
Peter Norvig