Merge pull request #1259 from dawidborycki/LP-PyTorch-Intro
LP Creating a PyTorch model for digit classification
jasonrandrews authored Sep 20, 2024
2 parents bf43b0a + 6636450 commit 108be23
Showing 9 changed files with 328 additions and 0 deletions.
(Four of the nine changed files are binary and cannot be displayed; the five text diffs follow.)
@@ -0,0 +1,36 @@
---
title: Learn how to create a PyTorch model for digit classification
minutes_to_complete: 40

who_is_this_for: This is an introductory topic for software developers interested in learning how to use PyTorch to create a feedforward neural network for digit classification.

learning_objectives:
- Prepare the environment.
- Understand the MNIST digit dataset.
- Create a neural network architecture using PyTorch.

prerequisites:
- An x86_64 or Apple development machine with a code editor (we recommend Visual Studio Code).

author_primary: Dawid Borycki

### Tags
skilllevels: Introductory
subjects: Neural Networks
armips:
- Cortex-A
- Cortex-X
operatingsystems:
- Windows
- Linux
- macOS
tools_software_languages:
- Android Studio
- Coding

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,43 @@
---
# ================================================================================
# Edit
# ================================================================================

next_step_guidance: >
    Proceed to Get Started with Arm Performance Studio for mobile to continue learning about Android performance analysis.
# 1-3 sentence recommendation outlining how the reader can generally keep learning about these topics, and a specific explanation of why the next step is being recommended.

recommended_path: "/learning-paths/smartphones-and-mobile/ams/"

# Link to the next learning path being recommended(For example this could be /learning-paths/servers-and-cloud-computing/mongodb).


# further_reading links to references related to this path. Can be:
# Manuals for a tool / software mentioned (type: documentation)
# Blog about related topics (type: blog)
# General online references (type: website)

further_reading:
    - resource:
        title: PyTorch
        link: https://pytorch.org
        type: documentation
    - resource:
        title: MNIST
        link: https://en.wikipedia.org/wiki/MNIST_database
        type: website
    - resource:
        title: Visual Studio Code
        link: https://code.visualstudio.com
        type: website



# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,48 @@
---
# ================================================================================
# Edit
# ================================================================================

# Always 3 questions. Should try to test the reader's knowledge, and reinforce the key points you want them to remember.
# question: A one sentence question
# answers: The correct answers (from 2-4 answer options only). Should be surrounded by quotes.
# correct_answer: An integer indicating which answer is correct (index starts from 1)
# explanation: A short (1-3 sentence) explanation of why the correct answer is correct. Can add additional context if desired


review:
    - questions:
        question: >
            Does the input layer of the model flatten the 28x28 pixel image into a 1D array of 784 elements?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes, the model uses nn.Flatten() to reshape the 28x28 pixel image into a 1D array of 784 elements for processing by the fully connected layers.
    - questions:
        question: >
            Does the model use dropout layers with a 20% dropout rate after each hidden layer?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes, the model applies dropout layers after each hidden layer, randomly setting 20% of the neurons to 0 during training to prevent overfitting.
    - questions:
        question: >
            Will the model make random predictions if it is run before training?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Yes. In that case the model produces random outputs, because the network has not been trained to recognize any patterns in the data.
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,93 @@
---
# User change
title: "Background and Installation"

weight: 2

layout: "learningpathall"
---

## Background
PyTorch is an open-source deep learning framework developed by Facebook’s AI Research lab, designed to provide a flexible and efficient platform for building and training neural networks. It is widely used due to its dynamic computational graph, which allows users to modify the architecture during runtime, making debugging and experimentation easier.

The major motivation for introducing PyTorch was to provide a more flexible, user-friendly deep learning framework that addressed the limitations of the static computational graphs found in earlier tools such as TensorFlow. Before PyTorch, many frameworks used static computational graphs that required the entire model structure to be defined before training, making experimentation and debugging cumbersome. PyTorch introduced dynamic computational graphs (also known as “define-by-run”), which allow the graph to be constructed on the fly as operations are executed. This flexibility significantly improved ease of use for researchers and developers, enabling faster prototyping, easier debugging, and more intuitive code.
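To make the define-by-run idea concrete, here is a minimal sketch (not part of the original learning path): ordinary Python control flow decides the shape of the graph on each call, and autograd tracks only the operations that actually run.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# The graph is built as the operations execute; a plain Python
# conditional chooses which branch becomes part of the graph.
if x > 2:
    y = x ** 2
else:
    y = x ** 3

y.backward()   # autograd walks the graph that was just constructed
print(x.grad)  # tensor(6.) -- dy/dx = 2x for the branch that ran
```

A static-graph framework would require both branches to be declared up front; here, only the code that executes becomes part of the graph.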

Additionally, PyTorch was designed to have seamless integration with Python, encouraging a more native coding experience. Its deep integration with GPU acceleration also made it a powerful tool for both research and production environments. This combination of flexibility, usability, and performance contributed to PyTorch’s rapid adoption, especially in academic research, where experimentation and iteration are crucial.

A typical process for creating a feedforward neural network in PyTorch involves defining a sequential stack of fully connected layers (also known as linear layers). Each layer transforms the input by applying a set of weights and biases, followed by an activation function like ReLU. PyTorch supports this process using the torch.nn module, where layers are easily defined and composed.
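As an illustration, a minimal feedforward stack can be composed with `nn.Sequential` in just a few lines (a sketch for orientation; the model built later in this path uses the subclassing style instead):

```python
import torch
from torch import nn

# Linear -> ReLU -> Linear: a tiny fully connected network.
net = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

out = net(torch.randn(1, 784))  # one dummy 784-element input
print(out.shape)                # torch.Size([1, 10])
```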

To create a model, users subclass the `torch.nn.Module` class, defining the network architecture in the `__init__` method, and implement the forward pass in the `forward` method. PyTorch’s intuitive API and strong support for GPU acceleration make it ideal for building efficient feedforward networks, particularly in tasks like image classification and digit recognition.

In this learning path, you will explore how to use PyTorch for creating a model for digit recognition.

## Before you begin
Before you begin, make sure Python 3 is installed on your system. You can check by running:

```console
python3 --version
```

If Python3 is not installed, download and install it from [python.org](https://www.python.org/downloads/).

Then, download and install [Visual Studio Code](https://code.visualstudio.com/download).

## Install PyTorch and other tools
Now you will prepare a Python virtual environment and install PyTorch, along with the other tools you will need for this learning path:
1. Open a terminal or command prompt and navigate to your project directory. Create a virtual environment by running:
```console
python -m venv pytorch-env
```
This will create a virtual environment named pytorch-env. You can replace pytorch-env with a name of your choice. If the python command is not found on your system, use python3 instead.

2. Activate the virtual environment:
* On Windows:
```console
pytorch-env\Scripts\activate
```
* On macOS/Linux:
```console
source pytorch-env/bin/activate
```

Once activated, you should see the virtual environment name in your terminal prompt.

3. Install PyTorch by typing:
```console
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```

4. Install torchsummary, Jupyter, and the IPython kernel:
```console
pip install torchsummary
pip install jupyter
pip install ipykernel
```

5. Register your virtual environment as a new kernel:
```console
python3 -m ipykernel install --user --name=pytorch-env
```

6. Install the Jupyter Extension in VS Code:
* Open VS Code and go to the Extensions view (click on the Extensions icon or press Ctrl+Shift+X).
* Search for “Jupyter” and install the official Jupyter extension.
* Optionally, also install the Python extension if you haven’t already, as it improves Python language support in VS Code.

To ensure everything is set up correctly:
1. Open Visual Studio Code.
2. Click New file, and select `Jupyter Notebook .ipynb Support`.
3. Save the file as `pytorch-digits.ipynb`.
4. Select the Python kernel you created earlier (pytorch-env). To do so, click Kernels in the top right corner. Then, click Jupyter Kernel..., and you will see the Python kernel as shown below:

![img1](Figures/01.png)

5. In your Jupyter notebook, run the following code to verify PyTorch is working correctly:
```python
import torch
print(torch.__version__)
```

It will look as follows:
![img2](Figures/02.png)

Now that everything is set up, you can proceed to creating the model.
@@ -0,0 +1,108 @@
---
# User change
title: "Create a Model"

weight: 3

layout: "learningpathall"
---

We will create a feedforward neural network to classify handwritten digits from the MNIST dataset. This dataset contains 70,000 images (60,000 training and 10,000 testing images) of handwritten numerals (0-9), each with dimensions of 28x28 pixels. Some representative MNIST digits with their corresponding labels are shown below.

![img3](Figures/03.png)
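If you would like to inspect the data yourself, it can be downloaded through torchvision, which you installed earlier. This is an optional sketch, not a required step in this path:

```python
from torchvision import datasets, transforms

# Downloads MNIST to ./data on the first run.
train_data = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())

image, label = train_data[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and the digit label
print(len(train_data))     # 60000 training images
```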

Our neural network will begin with an input layer containing 28x28 = 784 input nodes, with each node accepting a single pixel from an MNIST image. Next, we will add a linear hidden layer with 96 nodes, using the hyperbolic tangent (tanh) activation function. To prevent overfitting, a dropout layer will be applied, randomly setting 20% of the nodes to zero.

We will then include another hidden layer with 256 nodes, using the sigmoid activation function, followed by a second dropout layer that again removes 20% of the nodes. Finally, the output layer will consist of ten nodes, each representing the probability of recognizing one of the digits (0-9).

The total number of trainable parameters for this network is calculated as follows:
* First hidden layer: 784 x 96 + 96 = 75,360 parameters (weights + biases).
* Second hidden layer: 96 x 256 + 256 = 24,832 parameters.
* Output layer: 256 x 10 + 10 = 2,570 parameters.

In total, the network will have 102,762 trainable parameters.
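As a quick sanity check, you can reproduce this arithmetic in a few lines of Python:

```python
# (inputs, outputs) for each fully connected layer described above.
layers = [(784, 96), (96, 256), (256, 10)]

# Each layer has inputs*outputs weights plus one bias per output node.
total = sum(inp * out + out for inp, out in layers)
print(total)  # 102762
```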

## Implementation
To implement the model, supplement the `pytorch-digits.ipynb` notebook with the following statements:

```Python
from torch import nn
from torchsummary import summary

# Ten output classes, one per digit (0-9).
class_names = range(10)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        # Reshape each 28x28 image into a 784-element vector.
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28, 96),   # first hidden layer
            nn.Tanh(),
            nn.Dropout(.2),

            nn.Linear(96, 256),     # second hidden layer
            nn.Sigmoid(),
            nn.Dropout(.2),

            nn.Linear(256, len(class_names)),  # output layer
            nn.Softmax(dim=1)       # convert scores to class probabilities
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_stack(x)
        return logits
```

To build the neural network in PyTorch, we define a class that inherits from PyTorch’s `nn.Module`. This approach is similar to TensorFlow’s subclassing API. Here, the class is named NeuralNetwork, and it consists of two main components:
1. The **`__init__`** method. This serves as the constructor for the class. We first initialize `nn.Module` with `super(NeuralNetwork, self).__init__()`. Inside this method, we define the architecture of the feedforward neural network. The input is first flattened from its original 28x28 pixel format into a 1D array of 784 elements using `nn.Flatten()`. Next, we create a sequential stack of layers using `nn.Sequential`. The network consists of:
* A fully connected (Linear) layer with 96 nodes, followed by the Tanh activation function.
* A Dropout layer with a 20% dropout rate to prevent overfitting.
* A second Linear layer with 256 nodes, followed by the Sigmoid activation function.
* Another Dropout layer that removes 20% of the nodes.
* A final Linear layer with 10 nodes (matching the number of classes in the dataset), followed by a Softmax activation function that outputs class probabilities.

2. The **`forward`** method. This method defines the forward pass of the network. It takes an input tensor x, flattens it using `self.flatten`, and passes it through the sequential stack of layers (`self.linear_stack`). The returned tensor, named logits in the code, already contains the class probabilities for the digit prediction, because the Softmax layer is applied as the final step of the stack.

In the next step, we initialize the model and display its summary using the torchsummary package:

```Python
model = NeuralNetwork()

summary(model, (1, 28, 28))
```

After running the notebook you will see the following output:

![img4](Figures/04.png)

You will see a detailed summary of the NeuralNetwork model’s architecture, including the following information:
1. Layer details. The summary lists each layer of the network sequentially, including:
* The Flatten layer, which reshapes the 28x28 input images into a 784-element vector.
* The Linear layers with 96 and 256 nodes, respectively, along with the activation functions (Tanh and Sigmoid) applied after each linear transformation.
* The Dropout layers that randomly deactivate 20% of the neurons in the respective layers.
* The final Linear layer with 10 nodes, corresponding to the output probabilities for the 10 digit classes, followed by the Softmax function.

2. Input and Output Shapes. For each layer, the summary shows the shape of the input and output tensors, helping to trace how the data flows through the network. For example, the input shape starts as (1, 28, 28) for the image, which gets flattened to (1, 784) after the Flatten layer.

3. Parameter counts. The summary shows the number of trainable parameters in each layer, including both weights and biases:
* 75,360 parameters for the first Linear layer (784 inputs × 96 nodes + 96 biases).
* 24,832 parameters for the second Linear layer (96 nodes × 256 nodes + 256 biases).
* 2,570 parameters for the output Linear layer (256 nodes × 10 output nodes + 10 biases).
* At the end, the summary reports the model total: 102,762 trainable parameters.

This summary provides a clear overview of the model architecture, the dimensional transformations happening at each layer, and the number of parameters that will be optimized during training.

Running the model now will produce random outputs, because the network has not been trained to recognize any patterns in the data. The weights are initialized randomly (or with PyTorch's default initialization methods), so the probabilities produced by the softmax layer are essentially arbitrary.

The output is still a probability distribution over the 10 digit classes (0-9), but the values do not correspond to the actual images, because the model has not yet learned the patterns in the MNIST dataset.

Technically, the code runs without errors as long as you provide an input image of the correct dimensions (28x28 pixels): the model accepts the input, passes it through the layers, and returns a prediction (a vector of 10 probabilities). However, the results will not be useful until the model has been trained with a dataset and an optimization process such as gradient descent.
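You can verify this behavior with a short, optional sketch: feed the untrained model a random tensor shaped like a single MNIST image (the exact numbers will differ on every run):

```python
import torch

model.eval()                       # disable dropout for a stable forward pass
dummy = torch.randn(1, 1, 28, 28)  # one fake single-channel 28x28 image

with torch.no_grad():
    probs = model(dummy)

print(probs)           # 10 probabilities that sum to 1
print(probs.sum())     # tensor(1.)
print(probs.argmax())  # an essentially arbitrary digit class
```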

## Summary
In this step, we successfully defined and initialized a feedforward neural network using PyTorch. The model was designed to classify handwritten digits from the MNIST dataset, and we examined its architecture using the **summary()** function. The network consists of input flattening, two hidden layers with activation functions and dropout for regularization, and an output layer with a softmax function to predict the digit class probabilities. We also confirmed that the model has a total of 102,762 trainable parameters.

The next step is to train the model using the MNIST dataset, which involves feeding the data through the network, calculating the loss, and optimizing the weights based on backpropagation to improve the model's accuracy in digit classification.
