Merge pull request #1259 from dawidborycki/LP-PyTorch-Intro
LP Creating a PyTorch model for digit classification
jasonrandrews authored Sep 20, 2024
2 parents bf43b0a + 6636450 commit 108be23
Showing 9 changed files with 328 additions and 0 deletions.
(Four of the nine changed files are binary and cannot be displayed; the five text diffs follow.)
@@ -0,0 +1,36 @@
---
title: Learn how to create a PyTorch model for digit classification
minutes_to_complete: 40

who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create a feedforward neural network for digit classification.

learning_objectives:
- Prepare the environment.
- Understand the MNIST digit dataset.
- Create a neural network architecture using PyTorch.

prerequisites:
- An x86_64 or Apple development machine with a code editor (we recommend Visual Studio Code).

author_primary: Dawid Borycki

### Tags
skilllevels: Introductory
subjects: Neural Networks
armips:
- Cortex-A
- Cortex-X
operatingsystems:
- Windows
- Linux
- macOS
tools_software_languages:
- Android Studio
- Coding

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,43 @@
---
# ================================================================================
# Edit
# ================================================================================

next_step_guidance: >
    Proceed to Get Started with Arm Performance Studio for mobile to continue learning about Android performance analysis.
# 1-3 sentence recommendation outlining how the reader can generally keep learning about these topics, and a specific explanation of why the next step is being recommended.

recommended_path: "/learning-paths/smartphones-and-mobile/ams/"

# Link to the next learning path being recommended(For example this could be /learning-paths/servers-and-cloud-computing/mongodb).


# further_reading links to references related to this path. Can be:
# Manuals for a tool / software mentioned (type: documentation)
# Blog about related topics (type: blog)
# General online references (type: website)

further_reading:
    - resource:
        title: PyTorch
        link: https://pytorch.org
        type: documentation
    - resource:
        title: MNIST
        link: https://en.wikipedia.org/wiki/MNIST_database
        type: website
    - resource:
        title: Visual Studio Code
        link: https://code.visualstudio.com
        type: website



# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,48 @@
---
# ================================================================================
# Edit
# ================================================================================

# Always 3 questions. Should try to test the reader's knowledge, and reinforce the key points you want them to remember.
# question: A one sentence question
# answers: The correct answers (from 2-4 answer options only). Should be surrounded by quotes.
# correct_answer: An integer indicating which answer is correct (index starts from 1)
# explanation: A short (1-3 sentence) explanation of why the correct answer is correct. Can add additional context if desired


review:
    - questions:
        question: >
            Does the input layer of the model flatten the 28x28 pixel image into a 1D array of 784 elements?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes, the model uses nn.Flatten() to reshape the 28x28 pixel image into a 1D array of 784 elements for processing by the fully connected layers.
    - questions:
        question: >
            Does the model use dropout layers with a 20% dropout rate after each hidden layer?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes, the model applies dropout layers after each hidden layer, randomly setting 20% of the neurons to 0 during training to prevent overfitting.
    - questions:
        question: >
            Will the model make random predictions if it is run before training?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes. In that case the model produces random outputs, because the network has not been trained to recognize any patterns in the data.
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,93 @@
---
# User change
title: "Background and Installation"

weight: 2

layout: "learningpathall"
---

## Background
PyTorch is an open-source deep learning framework developed by Facebook’s AI Research lab, designed to provide a flexible and efficient platform for building and training neural networks. It is widely used due to its dynamic computational graph, which allows users to modify the architecture during runtime, making debugging and experimentation easier.

The major motivation for introducing PyTorch was to provide a more flexible, user-friendly deep learning framework that addressed the limitations of the static computational graphs found in earlier tools such as TensorFlow. Before PyTorch, many frameworks used static computational graphs that required the entire model structure to be defined before training, making experimentation and debugging cumbersome. PyTorch introduced dynamic computational graphs (also known as “define-by-run”), which allow the graph to be constructed on the fly as operations are executed. This flexibility significantly improved ease of use for researchers and developers, enabling faster prototyping, easier debugging, and more intuitive code.
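To make the define-by-run idea concrete, here is a minimal sketch (not part of the original learning path): ordinary Python control flow decides the shape of the graph on each call, and autograd tracks only the operations that actually run.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# The graph is built as the operations execute; a plain Python
# conditional chooses which branch becomes part of the graph.
if x > 2:
    y = x ** 2
else:
    y = x ** 3

y.backward()   # autograd walks the graph that was just constructed
print(x.grad)  # tensor(6.) -- dy/dx = 2x for the branch that ran
```

A static-graph framework would require both branches to be declared up front; here, only the code that executes becomes part of the graph.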

Additionally, PyTorch was designed to have seamless integration with Python, encouraging a more native coding experience. Its deep integration with GPU acceleration also made it a powerful tool for both research and production environments. This combination of flexibility, usability, and performance contributed to PyTorch’s rapid adoption, especially in academic research, where experimentation and iteration are crucial.

A typical process for creating a feedforward neural network in PyTorch involves defining a sequential stack of fully connected layers (also known as linear layers). Each layer transforms the input by applying a set of weights and biases, followed by an activation function like ReLU. PyTorch supports this process using the torch.nn module, where layers are easily defined and composed.
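As an illustration, a minimal feedforward stack can be composed with `nn.Sequential` in just a few lines (a sketch for orientation; the model built later in this path uses the subclassing style instead):

```python
import torch
from torch import nn

# Linear -> ReLU -> Linear: a tiny fully connected network.
net = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

out = net(torch.randn(1, 784))  # one dummy 784-element input
print(out.shape)                # torch.Size([1, 10])
```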

To create a model, users subclass the `torch.nn.Module` class, defining the network architecture in the `__init__` method, and implement the forward pass in the `forward` method. PyTorch’s intuitive API and strong support for GPU acceleration make it ideal for building efficient feedforward networks, particularly in tasks like image classification and digit recognition.

In this learning path, you will explore how to use PyTorch for creating a model for digit recognition.

## Before you begin
Before you begin, make sure Python 3 is installed on your system. You can check by running:

```console
python3 --version
```

If Python3 is not installed, download and install it from [python.org](https://www.python.org/downloads/).

Then, download and install [Visual Studio Code](https://code.visualstudio.com/download).

## Install PyTorch and other tools
Now you will prepare a Python virtual environment and install PyTorch, along with the other tools you will need for this learning path:
1. Open a terminal or command prompt and navigate to your project directory. Create a virtual environment by running:
```console
python -m venv pytorch-env
```
This will create a virtual environment named pytorch-env. You can replace pytorch-env with a name of your choice. If the python command is not found on your system, use python3 instead.

2. Activate the virtual environment:
* On Windows:
```console
pytorch-env\Scripts\activate
```
* On macOS/Linux:
```console
source pytorch-env/bin/activate
```

Once activated, you should see the virtual environment name in your terminal prompt.

3. Install PyTorch by typing:
```console
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```

4. Install torchsummary, Jupyter, and the IPython kernel:
```console
pip install torchsummary
pip install jupyter
pip install ipykernel
```

5. Register your virtual environment as a new kernel:
```console
python3 -m ipykernel install --user --name=pytorch-env
```

6. Install the Jupyter Extension in VS Code:
* Open VS Code and go to the Extensions view (click on the Extensions icon or press Ctrl+Shift+X).
* Search for “Jupyter” and install the official Jupyter extension.
* Optionally, also install the Python extension if you haven’t already, as it improves Python language support in VS Code.

To ensure everything is set up correctly:
1. Open Visual Studio Code.
2. Click New file, and select `Jupyter Notebook .ipynb Support`.
3. Save the file as `pytorch-digits.ipynb`.
4. Select the Python kernel you created earlier (pytorch-env). To do so, click Kernels in the top right corner. Then, click Jupyter Kernel..., and you will see the Python kernel as shown below:

![img1](Figures/01.png)

5. In your Jupyter notebook, run the following code to verify PyTorch is working correctly:
```python
import torch
print(torch.__version__)
```

It will look as follows:
![img2](Figures/02.png)

Now that everything is set up, you can proceed to creating the model.
@@ -0,0 +1,108 @@
---
# User change
title: "Create a Model"

weight: 3

layout: "learningpathall"
---

We will create a feedforward neural network to classify handwritten digits from the MNIST dataset. This dataset contains 70,000 images (60,000 training and 10,000 testing images) of handwritten numerals (0-9), each with dimensions of 28x28 pixels. Some representative MNIST digits with their corresponding labels are shown below.

![img3](Figures/03.png)
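If you would like to inspect the data yourself, it can be downloaded through torchvision, which you installed earlier. This is an optional sketch, not a required step in this path:

```python
from torchvision import datasets, transforms

# Downloads MNIST to ./data on the first run.
train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())

image, label = train_data[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and the digit label
print(len(train_data))     # 60000 training images
```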

Our neural network will begin with an input layer containing 28x28 = 784 input nodes, with each node accepting a single pixel from an MNIST image. Next, we will add a linear hidden layer with 96 nodes, using the hyperbolic tangent (tanh) activation function. To prevent overfitting, a dropout layer will be applied, randomly setting 20% of the nodes to zero.

We will then include another hidden layer with 256 nodes, using the sigmoid activation function, followed by a second dropout layer that again removes 20% of the nodes. Finally, the output layer will consist of ten nodes, each representing the probability of recognizing one of the digits (0-9).

The total number of trainable parameters for this network is calculated as follows:
* First hidden layer: 784 x 96 + 96 = 75,360 parameters (weights + biases).
* Second hidden layer: 96 x 256 + 256 = 24,832 parameters.
* Output layer: 256 x 10 + 10 = 2,570 parameters.

In total, the network will have 102,762 trainable parameters.
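As a quick sanity check, you can reproduce this arithmetic in a few lines of Python:

```python
# (inputs, outputs) for each fully connected layer described above.
layers = [(784, 96), (96, 256), (256, 10)]

# Each layer has inputs*outputs weights plus one bias per output node.
total = sum(inp * out + out for inp, out in layers)
print(total)  # 102762
```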

## Implementation
To implement the model, supplement the `pytorch-digits.ipynb` notebook with the following statements:

```Python
from torch import nn
from torchsummary import summary

# Ten output classes, one per digit (0-9).
class_names = range(10)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        # Reshape each 28x28 image into a 784-element vector.
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28, 96),   # first hidden layer
            nn.Tanh(),
            nn.Dropout(.2),

            nn.Linear(96, 256),     # second hidden layer
            nn.Sigmoid(),
            nn.Dropout(.2),

            nn.Linear(256, len(class_names)),  # output layer
            nn.Softmax(dim=1)       # convert scores to class probabilities
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_stack(x)
        return logits
```

To build the neural network in PyTorch, we define a class that inherits from PyTorch’s `nn.Module`. This approach is similar to TensorFlow’s subclassing API. Here, the class is named NeuralNetwork, and it consists of two main components:
1. The **`__init__`** method. This serves as the constructor for the class. We first initialize `nn.Module` with `super(NeuralNetwork, self).__init__()`. Inside this method, we define the architecture of the feedforward neural network. The input is first flattened from its original 28x28 pixel format into a 1D array of 784 elements using `nn.Flatten()`. Next, we create a sequential stack of layers using `nn.Sequential`. The network consists of:
* A fully connected (Linear) layer with 96 nodes, followed by the Tanh activation function.
* A Dropout layer with a 20% dropout rate to prevent overfitting.
* A second Linear layer with 256 nodes, followed by the Sigmoid activation function.
* Another Dropout layer that removes 20% of the nodes.
* A final Linear layer with 10 nodes (matching the number of classes in the dataset), followed by a Softmax activation function that outputs class probabilities.

2. The **`forward`** method. This method defines the forward pass of the network. It takes an input tensor x, flattens it using `self.flatten`, and passes it through the sequential stack of layers (`self.linear_stack`). The returned tensor, named logits in the code, already contains the class probabilities for the digit prediction, because the Softmax layer is applied as the final step of the stack.

In the next step, we initialize the model and display its summary using the torchsummary package:

```Python
model = NeuralNetwork()

summary(model, (1, 28, 28))
```

After running the notebook you will see the following output:

![img4](Figures/04.png)

You will see a detailed summary of the NeuralNetwork model’s architecture, including the following information:
1. Layer details. The summary lists each layer of the network sequentially, including:
* The Flatten layer, which reshapes the 28x28 input images into a 784-element vector.
* The Linear layers with 96 and 256 nodes, respectively, along with the activation functions (Tanh and Sigmoid) applied after each linear transformation.
* The Dropout layers that randomly deactivate 20% of the neurons in the respective layers.
* The final Linear layer with 10 nodes, corresponding to the output probabilities for the 10 digit classes, followed by the Softmax function.

2. Input and Output Shapes. For each layer, the summary shows the shape of the input and output tensors, helping to trace how the data flows through the network. For example, the input shape starts as (1, 28, 28) for the image, which gets flattened to (1, 784) after the Flatten layer.

3. Parameter counts. The summary shows the number of trainable parameters in each layer, including both weights and biases:
* 75,360 parameters for the first Linear layer (784 inputs × 96 nodes + 96 biases).
* 24,832 parameters for the second Linear layer (96 nodes × 256 nodes + 256 biases).
* 2,570 parameters for the output Linear layer (256 nodes × 10 output nodes + 10 biases).
* At the end, the summary reports the model total: 102,762 trainable parameters.

This summary provides a clear overview of the model architecture, the dimensional transformations happening at each layer, and the number of parameters that will be optimized during training.

Running the model now will produce random outputs, because the network has not been trained to recognize any patterns in the data. The weights are initialized randomly (or with PyTorch's default initialization methods), so the probabilities produced by the softmax layer are essentially arbitrary.

The output is still a probability distribution over the 10 digit classes (0-9), but the values do not correspond to the actual images, because the model has not yet learned the patterns in the MNIST dataset.

Technically, the code runs without errors as long as you provide an input image of the correct dimensions (28x28 pixels): the model accepts the input, passes it through the layers, and returns a prediction (a vector of 10 probabilities). However, the results will not be useful until the model has been trained with a dataset and an optimization process such as gradient descent.
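You can verify this behavior with a short, optional sketch: feed the untrained model a random tensor shaped like a single MNIST image (the exact numbers will differ on every run):

```python
import torch

model.eval()                       # disable dropout for a stable forward pass
dummy = torch.randn(1, 1, 28, 28)  # one fake single-channel 28x28 image

with torch.no_grad():
    probs = model(dummy)

print(probs)           # 10 probabilities that sum to 1
print(probs.sum())     # tensor(1.)
print(probs.argmax())  # an essentially arbitrary digit class
```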

## Summary
In this step, we successfully defined and initialized a feedforward neural network using PyTorch. The model was designed to classify handwritten digits from the MNIST dataset, and we examined its architecture using the **summary()** function. The network consists of input flattening, two hidden layers with activation functions and dropout for regularization, and an output layer with a softmax function to predict the digit class probabilities. We also confirmed that the model has a total of 102,762 trainable parameters.

The next step is to train the model using the MNIST dataset, which involves feeding the data through the network, calculating the loss, and optimizing the weights based on backpropagation to improve the model's accuracy in digit classification.
