Below we see the three units (row, column, and box) for the square that is filled with the bold-face red 2, and we can verify that no two squares in any unit have the same digit:
We will define the peers of a square to be all the other squares in that square's row, column, and box. Thus, each square must be filled by a digit that is different from all of the square's peers. Here are all the peers of the square with the bold red 2. Notice that this square (and every other square) has exactly three units (one row, one column, and one box) and 20 peers: 8 in the row, 8 in the column, and 8 in the box, minus the 4 box squares that were already counted in the row or column.
| 4 | 1 | 7 | 3 | 6 | 9 | 8 | 2 | 5 |
| 6 | 3 | 2 | 1 | 5 | 8 | 9 | 4 | 7 |
| 9 | 5 | 8 | 7 | 2 | 4 | 3 | 1 | 6 |
| 8 | 2 | 5 | 4 | 3 | 7 | 1 | 6 | 9 |
| 7 | 9 | 1 | 5 | 8 | 6 | 4 | 3 | 2 |
| 3 | 4 | 6 | 9 | 1 | 2 | 7 | 5 | 8 |
| 2 | 8 | 9 | 6 | 4 | 3 | 5 | 7 | 1 |
| 5 | 7 | 3 | 2 | 9 | 1 | 6 | 8 | 4 |
| 1 | 6 | 4 | 8 | 7 | 5 | 2 | 9 | 3 |
Let's start to think about how to write a program to solve Sudoku. We'll need to define an algorithm for solving puzzles, and we'll need data structures for representing all the concepts we defined above: puzzle, grid, digit, row, column, box, unit, peer, and solution. We will also see that it will be useful to represent a set of possible digits that could potentially fill a square.
Let's start with the grid, because that drives everything else. I can think of three possibilities for representing grids. Below I show how each square is represented under each of the three schemes, along with sample Python code to create a grid, define a square, and fill a square with a digit.
2-D [9 × 9] Array:
square = (y, x) = (3, 1); grid[y][x] = '2'

1-D [81] Array:
square = 28; grid[square] = '2'

Hash table:
square = 'D2'; grid[square] = '2'
2-D [9 × 9] Array:
Pro: Seems like the most "obvious" choice to mirror the 2-D layout of squares.
Con: If we want to iterate over all squares, we need two loops, not one. Making a copy of a grid is complicated. We can't say grid[square] = value, we have to say grid[y][x] = value. The notion of "square" becomes less important; x and y coordinates become more important, but they really aren't crucial to the puzzle (just to the way it is typically drawn).
1-D [81] Array:
Pro: Easy to iterate over all the squares. Easy to make a copy of a grid. Takes a minimal amount of storage.
Con: We need to be able to map from the 1-D array into a 2-D layout, for printing, or for determining where the rows and columns are. This can be done as follows: Rows are numbered 0-8 (top to bottom), columns are numbered 0-8 (left to right), and then row r, column c is square number 9r + c.
Hash table:
Pro: Squares don't have to be integers. Easy to represent a partially-filled grid.
Con: Slightly less efficient. Harder to make a copy of a grid.
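The 1-D con above mentions mapping between the 1-D array and the 2-D layout; that mapping is short enough to sketch (a standalone illustration; the helper names are mine, not part of the program that follows):

```python
# Mapping between 1-D square indices and (row, column) coordinates for N = 9.
# (Illustrative helpers only; the program below works directly with indices.)
N = 9

def square_number(r: int, c: int) -> int:
    "Row r, column c (both 0-8) correspond to square number 9r + c."
    return N * r + c

def row_and_col(s: int) -> tuple:
    "The inverse: square s is in row s // 9 and column s % 9."
    return s // N, s % N

print(square_number(3, 1), row_and_col(28))   # 28 (3, 1)
```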
Given these pros and cons, I chose the 1-D array (or list in Python) to represent a grid. That means a square is represented as an integer from 0 to 80. The contents of each square is a character, either a digit or '.' for empty. (In my previous essay, grids were hash tables (what Python calls a dict), because I was focusing on simplicity and debuggability and didn't care about efficiency.)
Note: It is also possible to consider puzzles of different sizes. Instead of having boxes of size 3×3, each box could be 2×2 or 4×4 or 5×5, etc. We'll use the variable name N for the size of a unit (row or column), and the variable Nbox for the width or height of a box. So the traditional Sudoku puzzle has N = 9 (and thus N² = 81 squares) and Nbox = 3. A puzzle with Nbox = 6 has N = 36 and Nbox⁴ = N² = 1296 squares. Of course, each puzzle must have N "digits" as well (we can use letters as digits when N > 9).
We can implement the notions of squares, digits, units, and peers as follows. The squares are all the integers from 0 to N×N − 1; the digits are the characters from '1' to '9' when N=9; they include more characters when N is 16 or 25 or 36 or 49 (N=49 gives a 2401-square grid; I thought that was big enough). And we define units[s] to be the list of three lists of squares denoting the three units that s participates in, and peers[s] is a set of squares that are the peers of s. The choice of what to represent as a set, and what as a list, is determined by convenience in writing the program later on; for now consider those choices unimportant.
# The size of the grid, and the squares within.
Nbox = 3                # Each box is of size Nbox x Nbox (set Nbox = 2 for the tiny 4x4 puzzles below)
N = Nbox * Nbox         # The whole grid is of size N x N
squares = range(N * N)  # ... and thus has N x N squares
Square = int            # Squares are implemented as integers
Digit = str             # Digits are implemented as characters
empty = '.'             # The character that denotes an empty square
# The allowable digits, as a string and as a set.
digitstr = '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmn'[:N] # E.g., '123456789' for N = 9
digitset = set(digitstr)
def roundto(n: int, m: int) -> int:
    "Round the integer n down to a multiple of m."
    return m * (n // m)
def unitsfor(s: Square) -> [[Square]]:
    "Return the three units for square s."
    # (r, c, b) are the first squares in the (row, col, box) of s
    r = roundto(s, N)
    c = s % N
    b = roundto(s, N * Nbox) + roundto(c, Nbox)
    return [list(range(r, r + N)),     # row
            list(range(c, N * N, N)),  # col
            [br + bc                   # box
             for br in range(b, b + Nbox * N, N)
             for bc in range(Nbox)]]
# Define units[s] and peers[s] for all squares s
units = [unitsfor(s) for s in squares]
peers = [set().union(*unitsfor(s)) - {s} for s in squares]
every_unit = set(tuple(u) for s in squares for u in units[s])
We can check that the peers and units look good:
>>> units[28]
[[27, 28, 29, 30, 31, 32, 33, 34, 35],
 [1, 10, 19, 28, 37, 46, 55, 64, 73],
 [27, 28, 29, 36, 37, 38, 45, 46, 47]]
>>> peers[28]
{1, 10, 19, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 45, 46, 47, 55, 64, 73}
How are we going to input a puzzle to our program? Looking around on the web, we see that there are several common notations for describing a puzzle as a string of text. Below I list four example formats, each representing the puzzle diagrammed at the top of this page.
4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......

4.....8.5
.3.......
...7.....
.2.....6.
....8.4..
....1....
...6.3.7.
5..2.....
1.4......

400000805
030000000
000700000
020000060
000080400
000010000
000603070
500200000
104000000

4 . . |. . . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. . . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .

We will develop a function that accepts any of these formats; we call the format a gridstring and define it to be a string that contains exactly N² instances of either a digit or the characters '.' or '0' to indicate an empty square. Other characters (such as spaces and hyphens) are ignored.
Here is the function Grid that takes a gridstring as input and returns a grid, and the inverse function, show, that takes a grid and returns a pretty gridstring:
def Grid(gridstring: str) -> list:
    """Convert a gridstring into a grid: a list of values for squares.
    Accepts digits, or '.' or '0' for empty; all other characters are ignored."""
    gridstring = gridstring.replace('0', empty)
    result = [c for c in gridstring if c in digitset or c == empty]
    if len(result) != N * N:
        raise ValueError('Wrong length for a grid')
    return result
def show(grid: Grid) -> str:
    "Convert a grid to a pretty gridstring."
    dashes = '\n' + '+'.join(['--' * Nbox] * Nbox) + '\n'
    boxes = '\n'.join(['|'.join(['{} ' * Nbox] * Nbox)] * Nbox)
    template = dashes.join([boxes] * Nbox)
    return template.format(*grid)
And here is an example of the functions at work:
>>> puzzle1 = "4.....8.5.3..........7......2.....6.....8.4......1.......6.3.7.5..2.....1.4......"
>>> Grid(puzzle1)
['4', '.', '.', '.', '.', '.', '8', '.', '5',
 '.', '3', '.', '.', '.', '.', '.', '.', '.',
 '.', '.', '.', '7', '.', '.', '.', '.', '.',
 '.', '2', '.', '.', '.', '.', '.', '6', '.',
 '.', '.', '.', '.', '8', '.', '4', '.', '.',
 '.', '.', '.', '.', '1', '.', '.', '.', '.',
 '.', '.', '.', '6', '.', '3', '.', '7', '.',
 '5', '.', '.', '2', '.', '.', '.', '.', '.',
 '1', '.', '4', '.', '.', '.', '.', '.', '.']
>>> print(show(Grid(puzzle1)))
4 . . |. . . |8 . 5
. 3 . |. . . |. . .
. . . |7 . . |. . .
------+------+------
. 2 . |. . . |. 6 .
. . . |. 8 . |4 . .
. . . |. 1 . |. . .
------+------+------
. . . |6 . 3 |. 7 .
5 . . |2 . . |. . .
1 . 4 |. . . |. . .
We'll need a function to check/verify a solution. The definition of a solution was given at the top of this page, but we will implement the check slightly differently from the way the definition is stated: instead of checking both that every square is filled and that no two squares in a unit have the same digit, we will instead check that every unit forms a set of digits which is equal to the set of all digits (digitset). We still check that the filled-in squares of the puzzle are unchanged in the proposed solution.
def is_solution(candidate: Grid, puzzle: Grid) -> bool:
    """A candidate grid is a solution to a puzzle if every unit of the candidate
    contains all the digits, and the filled squares of the original puzzle remain
    unchanged in the candidate."""
    return (all(set(candidate[s] for s in unit) == digitset for unit in every_unit)
            and
            all(puzzle[s] == candidate[s] for s in squares if puzzle[s] != empty))
Humans solve Sudoku problems by using logical deduction to infer that a certain digit is (or is not) the value of a square; such deductions are repeated until all squares are filled with a single known digit. An example of an inference rule that allows deductions is "If there is only one empty square in a unit, fill it with the digit that has not yet appeared in the unit." If the set of inference rules are sound and complete and the puzzle is well-formed, the human will only need a pencil--never an eraser. We could write an algorithm to do deductions like that, but the trick is in knowing when you have enough inference rules to solve every possible puzzle. I don't know how to guarantee that.
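As a toy illustration of the single inference rule quoted above (a standalone sketch, not part of the solver that follows): if a unit has exactly one empty square, fill it with the one digit that does not yet appear in the unit.

```python
# Toy version of the rule "if only one empty square remains in a unit,
# fill it with the missing digit". A unit here is just a list of digit
# characters, with '.' marking the empty square.
def fill_last_empty(unit, digits='123456789'):
    empties = [i for i, d in enumerate(unit) if d == '.']
    if len(empties) == 1:                          # the rule only fires here
        missing = (set(digits) - set(unit)).pop()  # the one absent digit
        unit[empties[0]] = missing
    return unit

print(''.join(fill_last_empty(list('12.456789'))))   # 123456789
```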
Therefore we will take a different route: searching for a solution that works by trying out possibilities one by one. We will describe a series of search algorithms, starting with the hopelessly inefficient and ending with one that can solve over 40,000 hard Sudoku puzzles per second. However, when implemented by a human, this strategy requires an eraser as well as a pencil.
Generate and test algorithm: First fill in all the empty squares with digits. Then check to see if we have a solution. If not, then fill the empty squares with a different combination of digits. Repeat with different combinations until a solution is found.
Sometimes generate and test is the best you can do. Herb Simon, in his classic book The Sciences of the Artificial (page 194), describes the task of opening a safe that has ten dials, each with 100 numbers on it. If the safe is well-designed there is no better approach than to generate-and-test with all 100¹⁰ combinations.
Let's consider an example of a tiny puzzle, with N = 4 (that is, with Nbox = 2):
>>> tiny = '...4 4.3. .4.. .241'
>>> print(show(Grid(tiny)))
. . |. 4
4 . |3 .
----+----
. 4 |. .
. 2 |4 1

There are nine empty squares, so we need to fill them in with nine digits. Let's do it systematically, starting with 111111111, then 111111112, 111111113, 111111114, and on to 111111121, 111111122, and so on. The function itertools.product is useful for doing this kind of enumeration over possibilities. Here is the implementation of generate and test, encoded in the function solve, which takes a gridstring as input and returns a solution if it can find one, else None.
import itertools
def solve(gridstring: str) -> Grid:
    "Generate every combination of digits for the empty squares; test each one."
    puzzle = Grid(gridstring)
    possibilities = [(digitstr if v == empty else v) for v in puzzle]
    for candidate in itertools.product(*possibilities):
        if is_solution(candidate, puzzle):
            return list(candidate)
    return None
We can use it to solve the tiny puzzle:
>>> print(show(solve(tiny)))
2 3 |1 4
4 1 |3 2
----+----
1 4 |2 3
3 2 |4 1
It took 0.8 seconds to solve this tiny 4×4 puzzle (it tried about 100,000 possibilities before finding the solution). Now consider the 9×9 puzzle defined above as puzzle1. It has 17 filled squares, and thus 81 − 17 = 64 empty squares. So there are 9⁶⁴ possible combinations (about 10⁶¹) to consider. How long will it take to run through them?
Suppose we have access to a secret new custom 1024 core 10 GHz CPU that features special hardware instructions that can generate and test a position in one clock cycle, and let's say we could afford to buy a billion of these, and while we're shopping, let's say we also pick up a time machine and go back 13 billion years to the early days of the universe, and a fleet of starships with which we visit 10 billion galaxies and in each galaxy convince the inhabitants of 10 billion planets to each buy a billion of these computers, and we all start our programs running, managing to distribute the cases perfectly so that no computation is duplicated or wasted. Then we can estimate that we'd be less than 4% done with this one puzzle by now.
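If you want to check that estimate yourself, the arithmetic is quick (a rough sketch; the seconds-per-year figure is approximate, and all the other numbers come from the scenario above):

```python
# Rough check of the "less than 4% done" claim.
tests_per_sec_per_cpu = 1024 * 10e9   # 1024 cores at 10 GHz, one test per cycle
cpus = 1e9 * 10e9 * 10e9              # a billion CPUs on each of 10B planets in 10B galaxies
seconds = 13e9 * 31_557_600           # 13 billion years (~3.16e7 seconds per year)
fraction_done = tests_per_sec_per_cpu * cpus * seconds / 9**64
print(f"{fraction_done:.1%}")         # about 3.6%
```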
Generate and test is not the algorithm you're looking for. Move along.
Fortunately for us, Sudoku is like the defective safe. To be precise, it is like a safe with 81 dials, each with 9 numbers, and sometimes if you put one of the dials in the wrong position you hear a sound, but sometimes not (depending on the positions of the other dials and on how carefully you listen). For example, in the lower-left corner of the tiny puzzle above, if we try 1, 2, or 4 we would immediately get feedback that those digits won't work, because they already appear elsewhere in the bottom row. Therefore, the lower-left corner must be filled by a 3. There are 262,144 ways to fill in the 9 empty squares, but right away we can eliminate 196,608 of them; we need only consider the ones that have a 3 in that position.
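The counts above are easy to verify (a quick sketch):

```python
# 9 empty squares, each with 4 choices: 4**9 total fillings. Putting 1, 2,
# or 4 in the lower-left corner accounts for 3/4 of them, all eliminated
# at once by the observation above.
total = 4 ** 9
eliminated = 3 * total // 4
print(total, eliminated)   # 262144 196608
```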
Unfortunately for us, Sudoku does not give up all its secrets so easily. Sometimes, when we consider a square, we can't immediately tell what digit to put there. For example, in the upper-left of the tiny puzzle, only 4 can be eliminated as a possibility. That's ok--we're allowing ourselves the use of an eraser, so just guess one of the remaining three possibilities; if it doesn't pan out, erase it and try one of the others. This gives us:
Combinatorial search algorithm: Start filling squares with digits, one at a time, always making sure that the digit we put in a square is not the same as any peer. If there are several possible digits, pick one, but remember the others. If we reach a square where every digit conflicts with a peer, back up and consider a different digit for the previous square. Stop when we successfully fill every square, or give up if we tried all possibilities without finding a solution.

The progress of this algorithm can be described as a search tree, where each node represents a partially-filled (incremental) state of the puzzle, and a branch corresponds to filling in a square with a digit. (We display the new digits in red.) Here is a search tree for the tiny puzzle:
A few things to note about this particular search tree:
The code to implement combinatorial search is straightforward:
def search(grid: Grid) -> Grid:
    """Select an unfilled square, try each possible digit in order, recursively searching.
    When all squares are filled: success; when no more digits to try: return Inconsistent."""
    if grid is Inconsistent:
        return Inconsistent
    s = select_square(grid)
    if s is None:
        return grid
    for d in possible_values(grid, s):
        result = search(assign(grid[:], s, d))
        if result is not Inconsistent:
            return result
    return Inconsistent
def select_square(grid: Grid) -> Square:
    "Return the first square that is not filled with a digit; or None if all squares are filled."
    for s in squares:
        if grid[s] not in digitset:
            return s
    return None
def possible_values(grid: Grid, s: Square) -> {Digit}:
    "A square can be any digit that is not already taken by a peer."
    return digitset - set(grid[p] for p in peers[s])
def assign(grid, s, d):
    "For now, simply assign grid[s] = d."
    grid[s] = d
    return grid
def solve(gridstring):
    "Parse gridstring and search for a solution."
    return search(Grid(gridstring))
Inconsistent = ['?'] * N * N  # A special grid used to indicate that the puzzle is unsolvable
The key function is search. It obeys the convention that if
it is passed Inconsistent (to indicate an invalid grid) it returns
Inconsistent, meaning that no solution can be found. Otherwise it
selects some unfilled square to work on. If select_square
returns None that is actually good news: it means that all
the squares are already filled, and we are done: we have a solution,
so we just return it. Otherwise, for each possible
digit that could fill square s we try to assign that digit to
s and search for a solution from there. If some call to
search succeeds, return it; but if a digit assignment causes
the search to fail, just keep going with the next digit assignment.
If all digit assignments fail, then the whole call to search
fails, and we back up to the previous recursive call. (Note that the
expression grid[:] means to make a copy of the entire
grid.)
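That grid[:] copy is what makes backtracking safe: the recursive call gets its own list to scribble on, while the caller's grid stays untouched. A minimal standalone sketch of the idea:

```python
# grid[:] copies the list, so mutations in a recursive call (simulated here
# by modifying `trial`) never disturb the caller's grid.
grid = ['.'] * 4
trial = grid[:]        # the copy passed down into the recursive search
trial[0] = '1'         # the recursive call assigns a digit...
print(grid[0], trial[0])   # . 1   (...but the original is unchanged)
```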
We call this type of combinatorial search a constraint satisfaction search: think of each square as being a variable that can take on a value (a digit), and the rules of Sudoku as being constraints on the possible values. A state in the search tree is then an assignment of values to some subset of variables in a way that does not violate any constraint. The constraint satisfaction approach has found many fun applications in puzzles and serious applications in problem solving.
Here we see solve (and therefore search) in action:
>>> print(show(solve(puzzle1)))
4 1 7 |3 6 9 |8 2 5
6 3 2 |1 5 8 |9 4 7
9 5 8 |7 2 4 |3 1 6
------+------+------
8 2 5 |4 3 7 |1 6 9
7 9 1 |5 8 6 |4 3 2
3 4 6 |9 1 2 |7 5 8
------+------+------
2 8 9 |6 4 3 |5 7 1
5 7 3 |2 9 1 |6 8 4
1 6 4 |8 7 5 |2 9 3

That's all there is to it! The only reason we don't stop this article now is that the program is still too slow for some purposes. We're not talking billions-of-years slow, but it did take 3 minutes and 45 seconds to solve this puzzle (a rather hard one). On a benchmark of 50 easy puzzles the program was fast enough, taking a total of 9.5 seconds (a rate of 5 puzzles per second, or 5 Hz).
However, if you want to solve a million hard puzzles, you probably don't want to wait a few years, and you would prefer a faster algorithm. Look again at the search tree above, and consider how we can find the solution faster. Here are four general strategies:
Consider this branch of the search tree from above:
As soon as we place a 1 in the upper-left corner, there is an inconsistency: there is no digit that can go in the square in the second row, second column (because 1 is in the same box, 2 is in the same column, and 3 and 4 are in the same row). But our combinatorial search does not discover that until three more levels in the tree, when it gets around to selecting the offending square. The idea of forward checking is that we don't wait: every time we place a digit, we check the unfilled squares and if one of them has no possible digits left we backtrack immediately rather than waiting to select the offending square.
However, there is a cost to forward checking: we need to figure out if any square anywhere in the puzzle has no possible digits left. To do that efficiently we modify the combinatorial search algorithm as follows:
Constraint checking with forward checking algorithm: Initialize each square (each grid[s]) to hold the set of possible digits for that square (that is, all the digits that have not been assigned to a peer). Start filling squares with digits, one at a time, always making sure that the digit we put in a square is not the same as any peer. Whenever we assign a digit d to a square, eliminate that digit from the set of possible digits for all peers of the square. Back up and consider a different digit for the previous square whenever any square is reduced to the empty set of possibilities. Stop when we successfully fill every square, or give up if we tried all possibilities without finding a solution.
Note that grid[s] holds the set of possible digits for square s in the same way that humans use a pencil to write the possible digits for some squares into the squares themselves.
We will describe the implementation in two steps. First, we change
the implementation of an unfilled square from a '.' to a set
of possible digits. (Each set could be represented by a Python
set or list, but I chose instead to use a string of
digits. So the value for the upper-left square in the tiny puzzle
(before it is assigned a digit) would be '123'.) This convention is set up by the
function initialize and we will see that it is maintained by
assign. One thing becomes easier: the function
possible_values can now just look up the precomputed set of
possible digits rather than having to compute it:
def solve(gridstring):
    "Parse gridstring, handle initial constraints, and search for a solution."
    return search(initialize(Grid(gridstring)))
def initialize(puzzle):
    """First set each square in the result grid to have the set of all possible digits.
    Then assign each filled square to the specified digit."""
    assert len(puzzle) == N * N
    grid = N * N * [digitstr]
    for s in squares:
        if puzzle[s] in digitset:
            grid = assign(grid, s, puzzle[s])
    return grid
def possible_values(grid, s): return grid[s]
Next we'll change assign(grid, s, d) so that it calls the function eliminate to eliminate d as a possibility from all the peers of s and fails if any peer has no remaining possible digits. (We add an extra level of indirection--eliminate calls check which calls forward_check to see if any square has no possible digits--we do this because we will soon introduce other kinds of checks.) Note that the functions search and select_square remain unchanged.
def assign(grid, s, d):
    """Assign grid[s] = d and eliminate d from the peers of s.
    Return the updated grid, or Inconsistent if an inconsistency is detected."""
    if d not in grid[s]: return Inconsistent  # d is not among the possibilities
    grid[s] = d
    if not all(eliminate(grid, p, d) for p in peers[s]): return Inconsistent
    return grid
def eliminate(grid, s, d):
    "Remove d from the possibilities for grid[s]. Return false if checking finds an inconsistency."
    if d not in grid[s]: return grid  # Already eliminated d
    grid[s] = grid[s].replace(d, '')
    return check(grid, s, d)
def check(grid, s, d):
    "Return True iff the constraints on square s after eliminating d are all satisfied."
    return forward_check(grid, s)
def forward_check(grid, s):
    "Fail if s has no possible digit, or one value which is the same as a peer."
    ndigits = len(grid[s])
    return not (ndigits == 0 or (ndigits == 1 and any(grid[p] == grid[s] for p in peers[s])))
How much does forward checking help? For the easy puzzles, not much. One reason they are called "easy" is that they don't require forward checking. For the harder puzzle1 puzzle the solving speed is improved by a factor of 5. But that's still 43 seconds. Let's keep improving.
This type of cascade affecting multiple squares throughout the puzzle is called constraint propagation, in contrast to the constraint checking that we saw in the previous section. Constraint propagation contributes to strategy (C), pruning earlier, and strategy (B), moving the solution nearer to the top (because we can fill in multiple squares at a time).
For example, suppose we assign grid[s] = '1', and suppose that some peer square s2 has the possibilities '12'. Now we know that s2 can only be '2', so we can make that assignment. Then if some peer of s2 has the possible digits '23', we now know it must take the digit '3', and so on.
This particular type of constraint propagation is called arc consistency; it gets its exotic name from the practice of describing a constraint satisfaction problem as a graph where the nodes are the variables (in Sudoku, the squares) and the arcs are the constraints. We make the arcs consistent one by one, and we can end up traversing multiple links in the graph. In contrast, the forward checking algorithm only peeks ahead one arc in the graph.
The implementation of arc consistency for Sudoku is a simple change to what we already have:
def arc_consistent(grid, s):
    "Return true if s has multiple digits left, or one that we can consistently assign."
    ndigits = len(grid[s])
    return ndigits >= 2 or (ndigits == 1 and assign(grid, s, grid[s]) is not Inconsistent)
def check(grid, s, d): return arc_consistent(grid, s)
Compared to forward checking, arc consistency reduces the time to solve puzzle1 from 43 seconds down to 0.16 seconds (a 266-fold speedup). The 50 easy problems improve from a rate of 5 per second to 282 per second, a 56-fold speedup. In fact, 70% of the easy problems have their search tree pruned to a single node with this approach! Clearly, arc consistency is very effective on Sudoku problems (as it is on many constraint satisfaction problems), and one might be tempted to stop here. But if you want to solve a million hard puzzles, you're still looking at a day or two of compute time, so let's keep going.
def dual_consistent(grid, s, d):
    """After eliminating d from grid[s], check each unit of s and make sure there is some
    place in the unit for d. If only one possible place is left for d, assign it."""
    for u in units[s]:
        places_for_d = [s2 for s2 in u if d in grid[s2]]
        nplaces = len(places_for_d)
        if nplaces == 0 or (nplaces == 1 and assign(grid, places_for_d[0], d) is Inconsistent):
            return False
    return True
def check(grid, s, d):
    return arc_consistent(grid, s) and dual_consistent(grid, s, d)
Dual consistency doesn't help on the easy puzzles, but on a set of hard puzzles, top870, dual consistency gives a 7-fold speedup over arc consistency.
def naked_pairs(grid, s):
    """Look for two squares in a unit with the same two possible digits.
    For example, if s and s2 both have the value '35', then we know that 3 and 5
    must go in those two squares. We don't know which is which, but we can eliminate
    3 and 5 from any other square s3 that is in the unit."""
    vals = grid[s]
    if len(vals) != 2: return True
    for u in units[s]:
        for s2 in u:
            if s2 != s and grid[s2] == vals:
                # Found a naked pair: s and s2; remove their two vals from others in the unit
                for s3 in u:
                    if s != s3 != s2:
                        if not all(eliminate(grid, s3, v) for v in vals):
                            return False
    return True
def check(grid, s, d):
    return arc_consistent(grid, s) and dual_consistent(grid, s, d) and naked_pairs(grid, s)
Naked pairs is also an effective strategy -- on top870, adding naked pairs gives us a speed almost twice as fast as dual consistency, and about 13 times faster than arc consistency alone.
import random
def select_square_randomV(grid):
    "Return a random unfilled square; or None if all squares are filled."
    unfilled = [s for s in squares if len(grid[s]) > 1]
    return random.choice(unfilled) if unfilled else None
def select_square_minRV(grid):
    "Return an unfilled square with the fewest possible digits; or None if all are filled."
    unfilled = [s for s in squares if len(grid[s]) > 1]
    return min(unfilled, key=lambda s: len(grid[s])) if unfilled else None
def select_square_maxRV(grid):
    "Return an unfilled square with the most possible digits; or None if all are filled."
    unfilled = [s for s in squares if len(grid[s]) > 1]
    return max(unfilled, key=lambda s: len(grid[s])) if unfilled else None
MinRV is generally held to be a good strategy; the idea is that when only two possibilities remain for a square, half the time (on average) the solution lies down the first branch we try, keeping the solution near the left of the search tree. Benchmarks show that MinRV is in fact the best variable-ordering strategy for my implementation of Sudoku; it gives a speedup of about 20% over the first-variable ordering (and even more over random ordering and MaxRV).
def constraining_value(grid, s, d):
    "Count the number of peers of s that would have 0 or 1 digits left without d."
    return sum((d in grid[s2] and len(grid[s2]) <= 2) for s2 in peers[s])
def possible_values_most_constrained(grid, s):
    "Most constrained value first."
    return sorted(grid[s], key=lambda d: -constraining_value(grid, s, d))
def possible_values_least_constrained(grid, s):
    "Least constrained value first."
    return sorted(grid[s], key=lambda d: constraining_value(grid, s, d))
def possible_values_random(grid, s):
    "Values in random order."
    vals = list(grid[s])
    random.shuffle(vals)
    return vals
It turns out that value ordering makes no consistent difference in the speed of execution. Perhaps that is because I re-do the constraining_value computation for each square each time. If I kept a sorted queue of precomputed constraining values, I might see a speedup.
| Filename | N puzzles | Description |
|---|---|---|
| puzzle1 | 1 | hard puzzle, called puzzle1 above |
| easy50 | 50 | easy puzzles |
| hardest | 11 | puzzles from around the web advertised as "hardest" |
| top95 | 95 | hard puzzles |
| top870 | 870 | hard puzzles |
| top2365 | 2365 | hard puzzles |
| subig20 | 17,445 | hard puzzles |
| min17 | 49,151 | hard puzzles with 17 filled squares |
| ALL | 69,091 | all of the above (duplicates eliminated) |
| ALL2 | 138,182 | puzzles from ALL, plus the reverse of each of them |
I ran the benchmark code on five algorithms: combinatorial search (comb), forward checking (FC), and arc consistency by itself (AC), with dual consistency (AC+Dual), and with dual plus naked pairs (AC+D+NP). For each benchmark file we can see the name of the file, the number of puzzles, the average speed of solving (expressed two ways: as a frequency in Hertz (cycles per second) and as the average number of seconds per puzzle), and the maximum time across all puzzles. Here are the results:
version name N Hz avg max
comb puzzle1 1 0 225.729 225.729
FC puzzle1 1 0 42.132 42.132
FC easy50 50 5 0.172 0.987
AC puzzle1 1 6 0.159 0.159
AC easy50 50 285 0.004 0.025
AC top870 870 4 0.221 16.197
AC+Dual puzzle1 1 62 0.016 0.016
AC+Dual easy50 50 188 0.005 0.009
AC+Dual top870 870 33 0.030 1.273
AC+D+NP puzzle1 1 147 0.007 0.007
AC+D+NP easy50 50 183 0.005 0.007
AC+D+NP top870 870 58 0.017 0.596
We can see from the table that on the hard puzzles (top870) the combination of all three checking procedures (AC+D+NP) is the fastest. (On the easy puzzles the more complex checking is not needed; AC alone is good enough.)

Here are the results for variable ordering; the benchmarks uphold the idea that MinRV is a good strategy: it decreases the average time by a modest amount and the maximum time by a lot. (On the easy50 benchmark, MinRV falls just slightly behind firstV, indicating again that on easy puzzles, simple is better, but the complexity pays off on harder puzzles.)

version name N Hz avg max
firstV easy50 50 183 0.005 0.007
firstV top870 870 58 0.017 0.596
randomV easy50 50 165 0.006 0.019
randomV top870 870 25 0.039 1.762
MaxRV easy50 50 164 0.006 0.017
MaxRV top870 870 4 0.223 19.664
MinRV easy50 50 182 0.005 0.008
MinRV top870 870 73 0.014 0.185

And here are the results for value ordering; we see that it makes almost no difference.

version name N Hz avg max
1stVal easy50 50 182 0.005 0.008
1stVal top870 870 73 0.014 0.185
MostCVl easy50 50 184 0.005 0.008
MostCVl top870 870 72 0.014 0.193
LeastCV easy50 50 181 0.006 0.009
LeastCV top870 870 72 0.014 0.194
RandVal easy50 50 185 0.005 0.008
RandVal top870 870 75 0.013 0.209
import concurrent.futures
def solve_list(puzzles, max_workers=6):
    "Solve each puzzle on a pool of threads; return True if all were solvable."
    results = concurrent.futures.ThreadPoolExecutor(max_workers).map(solve, puzzles)
    return Inconsistent not in results
Therefore I decided to port the program to Java, which has much better facilities for concurrency. The translation is fairly straightforward, but there are some changes; some because of the nature of Java, and some because I wanted to take advantage of this rewrite opportunity to improve efficiency. Here are the main differences between the Python version and the Java version:
When we start adding threads, the performance increases almost linearly with the number of threads up to 6 (we reach a speedup of 5.66 times over a single thread). That is about what I would expect with 6 CPUs.
As we add a few more threads we continue to run faster, but the slope of the curve is cut roughly in half from 7 up to 10 threads (where we reach a speedup of 7.26). From there we have a long flat plateau where the speedup remains between 7 and 7.5 no matter how many threads we add. The best rate recorded was 43,873 puzzles per second (with 40 threads, but random variation suggests that 40 threads may be no better than 30 or 20). That is 700 times faster than the Python program.
Here we see the performance curve, with two sample runs done for each number of threads, and with the results expressed two ways: by total number of puzzles solved per second and by the speedup over a single thread:

I have to say I was surprised by two things: that the curve continued to rise past 6 threads, and that the plateau was so long. I expected the curve to peak at 6 and then drop off. In retrospect, I think both surprises can be explained by the facts that thread creation is fast, thread overhead is small, and each core can take advantage of instruction level parallelism. I have a 6-Core Intel Xeon "Westmere" processor, which can execute (various parts of) several instructions simultaneously in a pipeline process. Apparently, multiple threads running at the same time (on each core) can keep this pipeline filled better than one thread. Another advantage of having more than 6 threads is that it is less likely that one thread will take an extra long time (because it happens to have a few very difficult puzzles). With more threads, each sublist is shorter, and thus the expected time when only one thread is left, working on the remaining puzzles, is shorter.
One more thing: we need to make clear the distinction between improving throughput and latency. Throughput is the total number of puzzles that can be solved in a given time. The multithreading approach increased throughput by a factor of 7. Latency is the time it takes to solve a single puzzle. Latency is not helped at all by multithreading. It takes the same amount of time to solve a single puzzle with single threading as with multithreading; it is just that in multithreading we are doing 6 or 7 puzzles at once.
Don't let my reluctance stop you: it would be a great project to implement any of these techniques for Sudoku and see how they perform.
There might be a bug in the program that causes it to crash or give erroneous results for some puzzle. That seems less likely every time the program successfully processes another 100,000 puzzles, but, even though the program does pass the test suite, I have not yet proved the program correct, nor provided complete test coverage of every line.
There might be a puzzle that takes a really long time to solve. Let's consider just the Java version for this, since it is so much faster. I modified Sudoku.java to keep track of the execution time of each individual puzzle, and applied it to all the puzzles in ALL.txt, along with the reversal of each of those puzzles (that is, put the last square first, etc.). That gives us 138,182 puzzles. The average time to solve these puzzles is 184 microseconds, and the maximum time is 34,762 microseconds (or 0.034762 seconds). But if we looked at 100 billion puzzles instead of 100 thousand, would some puzzle take significantly more time? To try to answer this I made a chart of the maximum time as a function of the number of puzzles attempted. I also show the average time and standard deviation:
| N | mean | stddev | max |
|---|---|---|---|
| 10 | 94 | 14 | 120 |
| 100 | 94 | 29 | 288 |
| 1,000 | 100 | 48 | 617 |
| 10,000 | 134 | 267 | 17,963 |
| 100,000 | 184 | 447 | 34,762 |
| 138,182 | 184 | 431 | 34,762 |

(times in microseconds)
We can see that the maximum time tends to about double for each 10-fold increase in the number of puzzles solved, except for one big jump where it goes up by a factor of 30. This suggests that if we increase the number of puzzles by a factor of 10^6 (to 100 billion puzzles) the maximum time might increase by a factor of 2^6 = 64 (to about 2 seconds), or there might be another 30-fold jump (to about 30 seconds). Or not--we just don't have enough data, nor do we have a reasonable model of the underlying distribution we are sampling from.
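The per-puzzle measurement behind the table above could be gathered along these lines. This is a hedged sketch in Python (the article actually instrumented the Java version); `solve` stands for any solver function, and the helper names are illustrative:

```python
# Hypothetical sketch: time each puzzle individually and summarize the
# distribution (N, mean, standard deviation, max) in microseconds.
import statistics
import time

def time_puzzles(puzzles, solve):
    """Return the wall-clock time, in microseconds, to solve each puzzle."""
    times = []
    for puzzle in puzzles:
        start = time.perf_counter()
        solve(puzzle)
        times.append((time.perf_counter() - start) * 1_000_000)
    return times

def summarize(times):
    """Summary statistics for a list of per-puzzle times."""
    return (len(times), statistics.mean(times), statistics.pstdev(times), max(times))
```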
Here's a chart of the base-10 log of the number of puzzles at each execution time interval up to 0.01 seconds. The peak point says that there are about 10^5 (that is, 100,000) puzzles that took between 50 and 100 microseconds:
It might help to look at the problem slightly differently: instead of measuring the time to completion, let's measure the size of the search tree, which we can compute just by incrementing a counter every time we make a recursive call to search. For the 138,182 puzzles, the minimum tree size was 1 and the maximum 6,724. The median was 6 and the mean 20. To get a better idea of the distribution, here is a table of percentiles for the distribution of tree sizes:
| Percentile: | 1% | 10% | 25% | 50% | 75% | 90% | 99% | 99.9% | 99.99% | 99.999% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tree size: | 1 | 1 | 3 | 6 | 15 | 38 | 245 | 1,041 | 3,122 | 4,102 | 6,724 |
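The instrumentation behind these tree sizes amounts to incrementing a counter on every recursive call. A toy illustration (not the article's solver; the "puzzle" here is a trivial all-different assignment problem, and only the counting technique is the point):

```python
# Toy illustration of measuring search-tree size: increment a counter on
# every recursive call to search. Assigning distinct values 1..n to n
# variables stands in for a real Sudoku grid.

def search(assignment, n, counter):
    counter[0] += 1                      # one search-tree node per call
    if len(assignment) == n:             # all variables assigned: a solution
        return assignment
    for value in range(1, n + 1):
        if value not in assignment:      # all-different constraint
            result = search(assignment + [value], n, counter)
            if result is not None:
                return result
    return None

counter = [0]
print(search([], 3, counter), counter[0])  # [1, 2, 3] found in a 4-node tree
```

Tree size is a better-behaved metric than wall-clock time because it is deterministic and independent of machine load.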
And here is a histogram of the number of puzzles at each search tree size up to size 100 nodes (which accounts for 96.6% of the puzzles):
In my previous attempt to explain Sudoku solvers, I generated a million random puzzles, almost all of which were ill-formed (that is, they didn't have exactly one solution). The toughest one took 189 seconds for the old Python program to solve, 101 seconds for the new Python program described here, and 1.03 seconds for the Java program. My random procedure also generated some impossible puzzles (ones with no solutions); the toughest one of those is really hard: it requires 11.5 seconds for the Java program. In terms of tree size, the toughest solvable puzzle had 162,310 nodes, and the impossible one had 2,112,362 nodes.
I'm fairly confident in saying that if you generate random puzzles according to any reasonable random distribution, fewer than one in a billion of them will take as long as a minute for the Java program to solve. But if you used a clever adversarial approach to find the hardest possible puzzle particularly designed to thwart my program (the puzzle where the solution is in the lower-right corner of the largest possible search tree), I just don't know how long it would take or how big the tree would be. (If I suspected such an adversarial attack, I would use a random variable-ordering procedure to defend against it.)
But if you have enough scratch paper, you don't need an eraser. Think about it: I said that when there are two choices for some square, we write down the first one, recursively search to see if we can find a solution, and if not, erase what we did and try the other possible digit. But we don't really need an eraser at all; we could just use pieces of scratch paper to try out one alternative and then the other.
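In code, the scratch-paper idea corresponds to copying the grid before trying a digit, rather than mutating it and undoing on failure. A minimal sketch, with illustrative names and a single all-different unit standing in for the real Sudoku constraints:

```python
def consistent(grid):
    """True if no two decided squares hold the same digit (one shared unit)."""
    singles = [next(iter(d)) for d in grid.values() if len(d) == 1]
    return len(singles) == len(set(singles))

def search(grid):
    if not consistent(grid):
        return None                        # this sheet of scratch paper fails
    square = next((s for s, d in grid.items() if len(d) > 1), None)
    if square is None:
        return grid                        # every square decided: solved
    for digit in sorted(grid[square]):
        copy = {s: set(d) for s, d in grid.items()}  # fresh scratch paper
        copy[square] = {digit}             # try the digit in the copy...
        result = search(copy)              # ...the original is never altered
        if result is not None:
            return result
    return None                            # no digit worked; caller moves on
```

Since the original grid is never altered, there is nothing to erase: a failed branch is simply a discarded sheet of scratch paper.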
Another way to look at it: consider the (human) Sudoku strategy known as naked quads, which is similar to naked pairs, but involves finding four squares in a unit that share the same four digits. If we can find such an arrangement, we can make a logical deduction that those four digits appear nowhere else in the unit. But just finding the four squares is a form of search. If you're going to allow a search through four squares as a prerequisite to a deduction, then you should equally allow a search through a search tree of four nodes--and 38% of all the puzzles tested can be completely solved in a four-node search tree. So the moral is that search is not qualitatively different from deduction--it just seems quantitatively different when the search trees are larger than a few nodes. From a logical point of view, even generate-and-test search can be seen as a form of deduction--it is just a very complex deduction involving 81 squares at a time.
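The naked-pairs deduction (of which naked quads is the four-square analogue) can be sketched as exactly the small search described above: a search over pairs of squares followed by a pruning step. An illustrative fragment, not the article's code; `unit` is a list of square names and `possibilities` maps each square to its set of possible digits:

```python
# Illustrative sketch of the naked-pairs deduction: if two squares in a unit
# share the same two possible digits, those digits can appear nowhere else in
# the unit. Finding the two squares is itself a (small) search over pairs.
from itertools import combinations

def eliminate_naked_pairs(unit, possibilities):
    for s1, s2 in combinations(unit, 2):
        pair = possibilities[s1]
        if len(pair) == 2 and possibilities[s2] == pair:
            for s in unit:
                if s not in (s1, s2):
                    possibilities[s] -= pair   # prune the rest of the unit
    return possibilities

poss = {'A': {'1', '2'}, 'B': {'1', '2'}, 'C': {'1', '2', '3'}}
print(eliminate_naked_pairs(['A', 'B', 'C'], poss)['C'])  # {'3'}
```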