Wednesday, March 2, 2022

Excel Sudoku Generation Techniques: An Overview

There has been some interest and several questions around the macros used to generate and solve Sudoku puzzles published in this blog. In this post, we will review the different methods available to generate Sudoku puzzles with Excel VBA (or any other programming language), while the macros to solve Sudoku puzzles have been covered in this previous post: Excel Sudoku Solver. There are basically two ways to generate Sudoku puzzles with a computer. We can use either incremental or decremental generation.


Incremental Generation

Incremental generation assigns numbers to one cell after another until sufficient hints are given for the puzzle to have a unique solution. The numbers are usually added randomly, and always following the Sudoku rules (a number must be unique within a row, column, and block). In order to ensure the configuration is leading to a solvable puzzle, and to check when a unique solution is reached, a Sudoku solver is needed. Thus, the solver should run after adding each number. If adding a number results in a configuration that cannot be solved, the number is removed (backtracking), or the process starts all over from the beginning. If the configuration can be solved and has a unique solution, the process is complete and the Sudoku puzzle is ready.

There are different ways to incrementally add random numbers to a Sudoku grid with Excel VBA. The recommended approach is to add random numbers (between 1 and 9) to random positions in the grid. We do that using the Randomize and Rnd functions to get and assign random values to the respective variables (num, CellRow, and CellCol in the example below). The process repeats within a loop until a successful candidate number and position is found.

  Randomize
  CellRow = Int((Rnd * 9) + 2)          'random row index between 2 and 10
  CellCol = Int((Rnd * 9) + 2)            'random column index between 2 and 10
  num = Int((Rnd * 9) + 1)                'random number between 1 and 9

Then we check if the chosen number and position is compliant with the Sudoku rules using the Excel CountIf worksheet function. Note that we use a custom function (qRng) to get the range of the block for a given row and column.

  WorksheetFunction.CountIf(Range("B" & CellRow & ":J" & CellRow), num)            ‘count in row
  WorksheetFunction.CountIf(Range(Cells(2, CellCol), Cells(10, CellCol)), num)        ‘count in column
  WorksheetFunction.CountIf(Range(qRng(CellRow, CellCol)), num)                        ‘count in block
 

Then it calls the solver macro each time a number is added. If the solver returns a conflict, it removes the last number added or starts all over. Here’s an example of a Sudoku puzzle created using incremental generation.


In this case, a simple solver was used, and therefore, the difficulty of the puzzle is rather low. But we cannot really tell what is the difficulty of the puzzle or what solving techniques are needed to solve it, unless we specifically use a solver that applies only certain techniques and solves for a certain difficulty (see more about that in this other post: Excel Sudoku Solver). We can tweak a bit the process and follow some pattern or algorithm to add the numbers, e.g., adding numbers by block. That could speed up the process, and should not have much of a greater impact on preserving the randomness of the configuration. We can further improve the performance using an array instead of read/write directly to the worksheet (we will see that later).

We can conclude that, despite it is possible to create Sudoku puzzles using incremental generation, the process requires a Sudoku solver to check the puzzle is correct and confirm when a unique solution is reached, which requires processing power and takes time. Furthermore, we cannot set or predict the difficulty of the puzzles generated, unless specifically programing the solver to do so. 

 

Decremental Generation

Decremental generation removes numbers from the cells of a fully filled Sudoku grid for as long as desired or possible in order for the solution to remain unique. It can be seen as a two-step process. In the first step, the entire Sudoku grid must be filled with numbers. The second step consists of removing some numbers to leave the gaps of the final puzzle.

 

Step 1: Add all numbers to fill the grid

There are several ways to add all numbers and fill the entire Sudoku grid with a computer. But if we were to do it on paper, we could use a trivial sequence or any other logical sequence derived from a mathematical formula. The simplest formula leads to a linear sequence as the one shown in the picture below.


Then we can apply transformations that preserve the validity of the puzzle. Thus, we can shuffle the rows or columns within a group of blocks, or also the group of blocks altogether, as many times as wished.  


However, this method creates a grid full of repetitions of the same three numbers along columns or rows within each block across the grid. Therefore, it is NOT a desired option to fill the entire grid, and filling the grid randomly is preferred instead.



We can use a similar approach to the one we have seen for incremental generation to fill the entire grid randomly. That is adding random numbers to random positions, or yet faster if adding each number one after another (first all 1s, then 2s, etc) to a random cell within each of the nine blocks, until either a wrong configuration is reached or the entire grid is filled (trial and error). When reaching a wrong configuration, we can either remove the last number or numbers (backtracking), or start all over again (brute force).

The simplest approach is to use brute force and write the values directly to the worksheet. There are several ways to achieve this with Excel VBA. One (hopefully) simple way to put it, is using two loops, one for rows and one for columns, and stepping every 3 cells to locate the starting cell at the top left of each block (see picture below).

 For GridRow = 2 To 10 Step 3
     For GridCol = 2 To 10 Step 3
            'tasks to do for each block
     Next GridCol
 Next GridRow


Then we start a Do loop to find a random place to add the number. The Do loop repeats until an empty cell where the number can be placed is found. The whole process continues until all blocks have been visited, and then moves to the next number until either filling the entire grid or getting stack – in such case, it starts all over.

This approach has been used in the simplified version of the Excel Sudoku Generator and it is explained in this other post: Excel Simple Sudoku Generator

A better approach though is to go back (backtracking) to the previous configuration (e.g., remove the positions for the last number added), instead of starting all over, when numbers cannot be placed any longer. A variation of this method has been used in the Excel Sudoku Generator, posted in this blog some years ago. An improved version of the backtracking algorithm is used to add all numbers more effectively in Excel Sudoku Pro

 

 

In the previous examples, we have been reading and writing the numbers directly from and into a range in the worksheet. That’s helpful to follow the process and see the numbers while testing. We can hide the progress by setting the screen updating of the Application object to false – that speeds up the macro performance considerably.

  Application.ScreenUpdating = False

We can also read and write the numbers from and into an array instead. That makes the process much faster and efficient. It creates a 2D array with the row position as the first dimension and the column index as the second dimension.

  Dim SudArray(1 To 9, 1 To 9) As Variant
  'if the conditions are met inside the loop
  SudArray(RowIndex, ColIndex) = num

When ready, we just print the array to the 9x9 range with a single line of code.

  Range("B2:J10") = SudArray

 

Step 2: Remove numbers from the grid

The second step is to remove numbers from the grid until a puzzle of the desired difficulty is reached. There are several ways to remove numbers without compromising a unique solution. At some point, a solver should be used to confirm the solution is unique. Also, the way the numbers are removed will ultimately determine the difficulty of the puzzle.

It is not advisable to follow a predetermined pattern or sequence to remove the numbers (for example, removing one number per block, or removing each number once in every round, etc.). A better approach is to randomly select positions and then check if the number can be removed. Initially, numbers that cannot be placed in any other cell within the same row, column, or block, are removed. For example, number 5 below can be safely removed.


However, number 3 cannot be safely removed (see the other picture below). It is also safe to remove numbers that can be placed in just one other empty cell, then blocking that row or column from removing more.

 

This usually ends up removing around 40 numbers and generates a puzzle of low difficulty that can be solved finding numbers with a single position (naked singles).


To increase the difficulty, we keep removing numbers targeting now certain positions to achieve the desired level of difficulty. But before removing those numbers, we need to make sure to not remove all numbers that belong to groups that can swap position between blocks. These are repetitions of the same numbers in groups of two or three, within a band or stack of blocks. In the Sudoku grid below, the numbers 3 and 6 form such group along the first band (top row of blocks), and numbers 7 and 8 form another group in the first stack (first column of blocks). There are many more if you look around.

 

Removing all four numbers that belong to a group of four or all six numbers in a group of six results in a puzzle with at least two solutions, or in other words, the puzzle does not have a unique solution (see example below removing all four numbers for the group with 3 and 6).


Therefore, it is important to keep at least one out of the four or six numbers in a group, so we need to check that before removing a number (if not using a solver to confirm the solution is unique).




We get to the next level targeting positions that reduce naked singles as much as possible. That leads to a puzzle of medium difficulty - see the difference between the two puzzles below (Easy on the left, Intermediate on the right).

 

Additionally, now we want to use a solver to ensure the final puzzle has a unique solution. From this point forward, and to create difficult and very difficult puzzles, the solver must be used to check removed numbers do not compromise a unique solution, and also to set the rules or solving techniques required for a desired level of difficulty (see more about solving techniques in this other post: Excel Sudoku Solver).

Finally, the highest level of difficult for a given configuration is the result of removing all numbers that can possibly be removed without compromising a unique solution, and therefore, a multi-solver must run for each of the remaining numbers. The multi-solver uses a backtracking algorithm to crack the solution or multiple solutions (if more than one), and tells if the number can be removed. That generates very difficult Sudoku puzzles that require advanced techniques and/or making assumptions/guessing to be solved.

Find below the links to other related posts that may be of your interest:

  

No comments:

Post a Comment

Popular Posts