<aside> 💡

We’ll start off by creating a virtual environment

python -m venv .venv

Activate the venv

source .venv/bin/activate

Ensure setuptools and wheel are the latest version

pip install -U setuptools wheel

</aside>

Now install TensorFlow

pip install tensorflow

MNIST Dataset

We will be training a model to classify handwritten digits (0-9). I will try to explain the reasoning behind why each layer was chosen.

Start off by importing TensorFlow

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

Building our Model

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

Now, load the dataset

mnist = tf.keras.datasets.mnist

Let’s load the train and test data

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

<aside> 💡

The pixel values of the images range from 0 to 255. We scale these values to the range 0 to 1 by dividing them by 255.0. This also converts the sample data from integers to floating-point numbers.

</aside>
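As a quick sketch of what this normalization does, here is the same operation on a toy array (using NumPy directly, since `mnist.load_data()` returns NumPy arrays; the pixel values are made up for illustration):

```python
import numpy as np

# A toy "image" row with the same uint8 range as MNIST pixels
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = pixels / 255.0  # the division promotes uint8 to float64

print(scaled)        # values now lie in the range [0, 1]
print(scaled.dtype)  # float64
```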

Here is the reasoning behind each layer:

  1. Flatten - This layer converts the 2D image data (28x28 pixels) into a 1D array of 784 values, because the subsequent layers expect a 1D input.
  2. Dense - Here we use 128 units with the ReLU activation function.
    1. 128 units offers enough capacity to learn the features without being computationally expensive.
    2. ReLU allows the model to learn complex patterns while mitigating the vanishing gradient problem.
  3. Dropout - With a rate of 0.2 (20%), this layer randomly drops 20% of the activations during training. This helps to prevent overfitting.
  4. Dense - This layer has 10 units, each corresponding to a digit from 0-9. The output is raw logits, which can be passed to a softmax function to calculate probabilities.
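To put the "not computationally expensive" claim for 128 units in perspective, we can work out the trainable parameter counts by hand (a Dense layer has one weight per input-unit pair plus one bias per unit; these totals match what `model.summary()` would report for the model above):

```python
# Parameters in a Dense layer: (inputs * units) weights + units biases
def dense_params(inputs, units):
    return inputs * units + units

hidden = dense_params(28 * 28, 128)  # Flatten output (784 values) -> 128 units
output = dense_params(128, 10)       # 128 units -> 10 logits

print(hidden)           # 100480
print(output)           # 1290
print(hidden + output)  # 101770 trainable parameters in total
```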

<aside> 💡

Sequential is useful for stacking layers where each layer has one input tensor and one output tensor. Layers are functions with a known mathematical structure that can be reused and have trainable variables. Most TensorFlow models are composed of layers. This model uses the Flatten, Dense, and Dropout layers.

For each example, the model returns a vector of logits or log-odds scores, one for each class.

</aside>
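To see how those raw logits become class probabilities, here is a minimal softmax sketch in plain Python (in practice you would use `tf.nn.softmax` on the model's output; the logit values below are made up for illustration):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one example, one score per digit class (0-9)
logits = [1.2, -0.3, 0.5, 2.1, -1.0, 0.0, 0.7, 3.4, -0.5, 0.1]

probs = softmax(logits)
print(sum(probs))               # probabilities sum to 1 (up to float rounding)
print(probs.index(max(probs)))  # predicted digit: index of the largest probability
```

Note that softmax is monotonic, so the class with the largest logit is always the class with the largest probability.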