Building a Handwritten Multi-Digit Recognition

Building a Handwritten Multi-Digit Recognition

Setting up the Local Environment

Open a command prompt, created a folder digitapp and create a virtual environment

mkdir digitapp
cd digitapp
python -m venv digitapp

Activate the virtual environment

digitapp\Scripts\activate

Install the necessary libraries

File: requirements.txt

joblib==1.5.0

keras==3.8.0

matplotlib==3.10.0

numpy==2.0.2

pandas==2.2.2

scikit-learn==1.6.1

seaborn==0.13.2

streamlit==1.45.0

streamlit-drawable-canvas==0.9.3

# Command to execute for installation of the libraries

pip install -r requirements.txt

Create a model

File: trainer.py

#~~~ Working on the Data set ~~~

# 'Import necessary libraries'

import numpy as np

from tensorflow.keras import layers # For defining layers of the neural network

from tensorflow.keras import models # For building the model architecture

from keras.datasets import mnist # MNIST dataset of digits

from keras.utils import to_categorical # Utility to convert labels to one-hot encoding

# 'Set a random seed for reproducibility'

np.random.seed(42)

# 'load the mnist dataset from keras'

# 'Split the data into train and test sets'

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# 'Preprocess the training data'

# 'Reshape 60000 images with 28x28 pixels i.e., (60000, 28, 28) to (60000, 28, 28, 1)'

# 'to add a channel dimension (grayscale)'

train_images = train_images.reshape((60000, 28, 28, 1))

# 'Normalize pixel values from range[0, 255] to [0, 1] for better performance during training'

train_images = train_images.astype('float32') / 255

# 'Preprocess the test data in the same way'

test_images = test_images.reshape((10000, 28, 28, 1))

test_images = test_images.astype('float32') / 255

# 'Convert class vectors (integers) to binary class matrices (one-hot encoding)'

# 'e.g., label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]'

train_labels = to_categorical(train_labels)

test_labels = to_categorical(test_labels)

#~~~ Model~~~

# 'Create a Sequential model – layers are added'

# 'creating the instance of the model'

model = models.Sequential()

# 'adding layers to the model'

# 'First convolutional layer:

# - 32 filters

# - Each filter is 3x3 in size

# - ReLU activation introduces non-linearity

# - input_shape defines the shape of the input images (28x28 pixels, 1 channel)'

model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))

# 'MaxPooling layer:

# - Reduces spatial dimensions (downsampling) using 2x2 pooling

# - Helps reduce computation and overfitting'

model.add(layers.MaxPooling2D((2, 2)))

# 'Dropout layer:

# - Randomly sets 20% of inputs to zero during training

# - Helps prevent overfitting'

model.add(layers.Dropout(0.2))

# 'Second convolutional layer:

# - 64 filters, again with 3x3 kernels

# - ReLU activation'

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# 'Another MaxPooling layer'

model.add(layers.MaxPooling2D((2, 2)))

# 'Another Dropout layer (20%)'

model.add(layers.Dropout(0.2))

# 'Third convolutional layer:

# - 64 filters

# - No pooling here,so spatial dimensions stay same(except for being reduced by conv itself)'

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# 'Flatten layer:

# - Converts 3D feature maps into 1D feature vector for the dense (fully connected) layers'

model.add(layers.Flatten())

# 'Dense (fully connected) layer:

# - 64 units

# - ReLU activation'

model.add(layers.Dense(64, activation='relu'))

# 'Output layer:

# - 10 units for 10 digit classes (0–9)

# - Softmax activation to output probabilities for each class'

model.add(layers.Dense(10, activation='softmax'))

# 'Print a summary of the model:

# - Shows the number of parameters and the output shape of each layer'

model.summary()

'''Summary of Architecture:

3 Convolutional layers to extract features

2 MaxPooling layers to reduce dimensions

2 Dropout layers to prevent overfitting

1 Flatten layer to convert to vector

1 Hidden dense layer

1 Output layer with 10 classes (digits 0–9) '''

# 'Compile the model:

# - optimizer='rmsprop':

# RMSprop is used to adjust the learning rate during training (good for RNNs and CNNs)

# - loss='categorical_crossentropy':

# Appropriate for multi-class classification where labels are one-hot encoded

# - metrics=['accuracy']:

# Track accuracy as the performance metric during training and validation'

model.compile(optimizer='rmsprop',

loss='categorical_crossentropy',

metrics=['accuracy'])

# 'Fitting the model:

# - train_images and train_labels: the input data and labels

# - epochs=5: the model will go through the full dataset 5 times

# - batch_size=64: the model will update weights after every 64 samples

# - validation_split=0.1: use 10% of training data for validation

# (to monitor performance on unseen data)'

training_history = model.fit(train_images, train_labels,

epochs=5,

batch_size=64,

validation_split=0.1)

# 'Plot the model results upon Loss:

import matplotlib.pyplot as plt

# Set the x-axis and y-axis labels

plt.xlabel('Epoch Number') # X-axis will represent epochs

plt.ylabel('Loss') # Y-axis will represent loss values

# Plot the training loss

plt.plot(training_history.history['loss'], label='Training Loss')

# Plot the validation loss

plt.plot(training_history.history['val_loss'], label='Validation Loss')

# Add a legend to label the two curves

plt.legend()

# Display the plot

plt.show()

# 'Plot the model results upon Accuracy:

# Plot training and validation accuracy over epochs

plt.xlabel('Epoch Number') # Label for x-axis (number of epochs)

plt.ylabel('Accuracy') # Label for y-axis (accuracy values)

# Plot training accuracy

plt.plot(training_history.history['accuracy'], label='Training Accuracy')

# Plot validation accuracy

plt.plot(training_history.history['val_accuracy'], label='Validation Accuracy')

# Add legend to differentiate between the two curves

plt.legend()

# Display the plot

plt.show()

# Evaluate the model on the test data

# This computes the loss and accuracy of the model on the test set

test_loss, test_acc = model.evaluate(test_images, test_labels)

# Print the accuracy on the test set

print(f"Test Accuracy: {test_acc}")

# Save the trained model to a file (mnist.h5)

# This allows you to reload and use the model later without retraining

model.save('mnist.h5')

Execute trainer.py and generate the model

Build the Web Application

File: app.py

#~~~ Importing the libraries ~~~

# Import Streamlit for creating the web application UI

import streamlit as st

# Import the Streamlit drawable canvas component for drawing on the web page

from streamlit_drawable_canvas import st_canvas

# Import OpenCV for image processing tasks

import cv2

# Import Keras function to load a pre-trained model

from keras.models import load_model

# Import NumPy for numerical operations, such as reshaping and padding arrays

import numpy as np

# Import the warnings module to filter and suppress warning messages

import warnings

warnings.filterwarnings('ignore') # Ignore any warnings that might clutter the output

# Initialize a global list to store predicted digits

dgrs = []

# Initialize a global string to store the prediction result

res = " "

#~~~ Working on the digits drawn ~~~

# User defined function to perform the prediction

def predict():

global res # Declare res as global so it can be modified inside this function

# Load the pre-trained MNIST model (expects input images of size 28x28)

model = load_model('mnist.h5')

# Define path to the image saved from the canvas

image_folder = "./"

filename = f'img.jpg'

# Working on the captured image to match the model input

# Read the image in color format using OpenCV

image = cv2.imread(image_folder + filename, cv2.IMREAD_COLOR)

# Convert the image to grayscale to simplify the image data (removes color channels)

gray = cv2.cvtColor(image.copy(), cv2.COLOR_BGR2GRAY)

# Apply Gaussian blur to smooth the image and reduce noise

blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Apply adaptive thresholding to convert the grayscale image to binary (black & white)

# Adaptive thresholding is better than global thresholding in varying lighting conditions

th = cv2.adaptiveThreshold(

blurred, # Source image

255, # Maximum value to use with THRESH_BINARY

cv2.ADAPTIVE_THRESH_GAUSSIAN_C, # Use a weighted sum of neighborhood values

cv2.THRESH_BINARY_INV, # Invert the output (digits become white)

11, 2 # Block size and constant C

)

# Find contours (continuous lines or curves that bound the white regions)

contours = cv2.findContours(

th, # Binary image

cv2.RETR_EXTERNAL, # Only retrieve external contours

cv2.CHAIN_APPROX_SIMPLE # Compress horizontal, vertical, and diagonal segments

)[0] # Only need the contours details

# Loop through each contour (likely each digit drawn)

for cnt in contours:

# Compute the bounding rectangle around the contour

x, y, w, h = cv2.boundingRect(cnt)

# Draw a blue rectangle around each detected digit

cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 1)

# Crop the digit from the binary image using the bounding box

digit = th[y:y + h, x:x + w]

# Resize the digit to 18x18 pixels (MNIST model expects 28x28)

resized_digit = cv2.resize(digit, (18, 18))

# Pad the resized digit with 5 pixels of black pixels (zeros) on each side

# This results in a 28x28 image as expected by the model

padded_digit = np.pad(resized_digit, ((5, 5), (5, 5)), "constant", constant_values=0)

# Reshape the image to match the model's input shape: (1, 28, 28, 1)

digit = padded_digit.reshape(1, 28, 28, 1)

# Normalize pixel values from [0, 255] to [0, 1]

digit = digit / 255.0

# Get the prediction probabilities for each digit (0–9)

pred = model.predict(digit)[0]

# Get the digit with the highest probability (model's final prediction)

final_pred = np.argmax(pred)

# Append predicted digit to the global list

dgrs.append(int(final_pred))

# Add predicted digit to the result string

res = res + " " + str(final_pred)

# Prepare text showing prediction and confidence percentage

data = str(final_pred) + ' ' + str(int(max(pred) * 100)) + '%'

# Define font settings for overlaying prediction on the image

font = cv2.FONT_HERSHEY_SIMPLEX

fontScale = 0.5

color = (255, 255, 255) # White color

thickness = 1

# Overlay prediction text on the image at the top-left corner of the bounding box

cv2.putText(image, data, (x, y), font, fontScale, color, thickness)

#~~~ UI interface ~~~

# setting up the UI interface

# Set the app title

st.title("Drawable Canvas")

# Add a description below the title

st.markdown("""

Draw digits on the canvas, get the image data back into Python!

""")

# Create a canvas where users can draw digits

canvas_result = st_canvas(

stroke_width=10, # Thickness of the brush

stroke_color='red', # Color of the brush

height=150 # Height of the canvas

)

# Check if the user has drawn something

if canvas_result.image_data is not None:

# Save the drawn image to a file for processing

cv2.imwrite(f"img.jpg", canvas_result.image_data)

# Create a "Predict" button

if st.button("Predict"):

predict() # Call the predict function

st.write('The predicted digit:', res) # Display the result on the app

Run app.py using streamlit

Open the the below streamlit app URL in your browser

Web User Interface

Search This Blog

AI

Building a Handwritten Multi-Digit Recognition

Comments

Post a Comment

Popular posts from this blog

Building a Brand Detection Model with YOLOv11 in Google Colab