It is a vision based sudoku solver. Show your sudoku puzzle to the camera and it will overlay the solution over the frame. You can either turn on your webcam or pass an image containing a sudoku. By default, it runs in video mode.
You will need to install opencv
, numpy
and pytesseract
.
You might have to install tesseract-ocr
package before installing pytesseract. Use the following command to install this:
$ sudo apt-get install tesseract-ocr
You can use the following commands to install (preferably in a virtual env) these dependencies. Make sure you have python and pip3 installed.
$ pip3 install opencv-python
$ pip3 install numpy
$ pip3 install pytesseract
Video mode:
python3 sudoku-vision.py
Image mode:
python3 sudoku-vision.py --file path_to_file
We process the image to find a square contour. We are assumming that the sudoku grid is the largest square contour in the frame. If there is no contour found, we wait for the next frame to come in.
Now that we found the square, our next target is to get a stable top view of the sudoku grid. We find the corners of the grid by getting it's convex hull. Since it is a (almost) square, the corners in the convex hull will represent the actual corners in the image quite well. After finding the corners, we use persective transform to get the top view of the grid. Also, resize this top view to a square (450 x 450) for better readability.
Divide the square top view into 9 x 9 small boxes. In each box, we search for a contour with significant size. If there is no contour, the box must be empty. If we find the contour, it must be a digit so get the bounding box of the contour and crop it out. Here comes tesseract to our rescue. It will read the image and return the digit. But if there is a contour but that contour is not a digit, there must be some disturbance in then image so we ignore that frame and start again on next frame.
The naive method i.e. backtracking is good enough but we can improve. We can see the sudoku as a Constraint Satisfaction Problem. This efficient algorithm first eliminates the possibilities (of a cell having a certain value) leading to a wrong solution. Then starting from the cell having least number of possible values, iterates over all possible solutions. One might say it iterrates greedily. This effectively solves a sudoku in 0.01 seconds on average. For a better explaination, refer here.
The bulk of the work has already been done. Noe that we have the solution of the sudoku, we place the answers over the empty cells in the top view. Return the top view to its original shape and reverse the perspective transform. Then we place it in the original frame. Mission completed!
When I started this project, I was a noob in Machine Learning and Deep Learning. This was the reason I chose to use a prebuilt library for OCR. But well, it was too slow. Or probably slow only in my low spec laptop (I haven't had an opportunity to test it in another system thanks to this quarantine). In MY laptop, it(tesseract) takes a good 3-4 seconds to read the whole grid and the video mode appears to be lagging heavily. I will be studying and exploring these domains and eventually replace tesseract with a custom trained model.