Merge pull request #1457 from jasonrandrews/review
Continue MNIST Learning Path review
jasonrandrews authored Dec 18, 2024
2 parents 8b4b72e + d13d01b commit fe8694a
Showing 5 changed files with 12 additions and 13 deletions.
@@ -27,7 +27,6 @@ skilllevels: Advanced
 subjects: ML
 armips:
 - Cortex-A
-- Cortex-X
 - Neoverse
 operatingsystems:
 - Windows
@@ -17,7 +17,7 @@ Running a machine learning model on Android involves a few key steps.

First, you train and save the model in a mobile-friendly format, such as TensorFlow Lite, ONNX, or TorchScript, depending on the framework you are using.

-Next, you add the model file to your Android projects assets directory. In your app’s code, use the corresponding frameworks Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model.
+Next, you add the model file to your Android project's assets directory. In your application's code, use the corresponding framework's Android library, such as TensorFlow Lite or PyTorch Mobile, to load the model.

You then prepare the input data, ensuring it is formatted and preprocessed in the same way as during model training. The input data is passed through the model, and the output predictions are retrieved and interpreted accordingly. For improved performance, you can leverage hardware acceleration using Android’s Neural Networks API (NNAPI) or use GPU support if available. This process enables the Android app to make real-time predictions and execute complex machine learning tasks directly on the device.
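
To make the first step concrete, here is a minimal sketch of saving a PyTorch model in a mobile-friendly TorchScript format; the model and file name are illustrative placeholders, not the Learning Path's actual code.

```python
import torch
from torch import nn

# Illustrative stand-in for a trained model; any nn.Module works the same way.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()  # switch to inference mode before exporting

# Trace the model with a dummy input shaped like the real data,
# then save the TorchScript file to bundle in the Android assets directory.
example_input = torch.rand(1, 1, 28, 28)
scripted_model = torch.jit.trace(model, example_input)
scripted_model.save("model.pt")  # hypothetical file name
```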

@@ -20,7 +20,7 @@ Prior to PyTorch, many frameworks used static computation graphs that require th

Additionally, PyTorch seamlessly integrates with Python, encouraging a native coding experience. Its deep integration with GPU acceleration also makes it a powerful tool for both research and production environments. This combination of flexibility, usability, and performance has contributed to PyTorch’s rapid adoption, especially in academic research, where experimentation and iteration are crucial.

-A typical process for creating a feedforward neural network in PyTorch involves defining a sequential stack of fully-connected layers, which are also known as *linear layers*. Each layer transforms the input by applying a set of weights and biases, followed by an activation function like ReLU. PyTorch supports this process using the torch.nn module, where layers are easily defined and composed.
+A typical process for creating a feedforward neural network in PyTorch involves defining a sequential stack of fully-connected layers, which are also known as linear layers. Each layer transforms the input by applying a set of weights and biases, followed by an activation function like ReLU. PyTorch supports this process using the torch.nn module, where layers are easily defined and composed.

To create a model, users subclass the torch.nn.Module class, defining the network architecture in the __init__ method, and implement the forward pass in the forward method. PyTorch’s intuitive API and support for GPU acceleration make it ideal for building efficient feedforward networks, particularly in tasks such as image classification and digit recognition.
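
As a hedged sketch of this subclassing pattern (the layer sizes here are arbitrary examples, not the Learning Path's):

```python
import torch
from torch import nn

class FeedforwardNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the architecture in the constructor.
        self.flatten = nn.Flatten()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, 128),  # fully-connected (linear) layer
            nn.ReLU(),                # activation function
            nn.Linear(128, 10),
        )

    def forward(self, x):
        # Implement the forward pass: flatten, then apply the stack.
        return self.layers(self.flatten(x))

model = FeedforwardNet()
logits = model(torch.rand(1, 28, 28))  # one dummy 28x28 input
```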

@@ -17,13 +17,13 @@ The typical approach to training a neural network in PyTorch involves several ke

First, obtain and preprocess the dataset, which usually includes normalizing the data and converting it into a format suitable for the model.

-Next, the dataset is split into training and testing subsets. Training data is used to update the models parameters, while testing data evaluates its performance. During training, feed batches of input data through the network, calculate the prediction error or loss using a loss function (such as cross-entropy for classification tasks), and optimize the models weights and biases using backpropagation. Backpropagation involves computing the gradient of the loss with respect to each parameter and then updating the parameters using an optimizer, like Stochastic Gradient Descent (SGD) or Adam. This process is repeated for multiple epochs until the model achieves satisfactory performance, balancing accuracy and generalization.
+Next, the dataset is split into training and testing subsets. Training data is used to update the model's parameters, while testing data evaluates its performance. During training, feed batches of input data through the network, calculate the prediction error or loss using a loss function (such as cross-entropy for classification tasks), and optimize the model's weights and biases using backpropagation. Backpropagation involves computing the gradient of the loss with respect to each parameter and then updating the parameters using an optimizer, like Stochastic Gradient Descent (SGD) or Adam. This process is repeated for multiple epochs until the model achieves satisfactory performance, balancing accuracy and generalization.

### Loss, gradients, epoch and backpropagation

-Loss is a measure of how well a models predictions match the true labels of the data. It quantifies the difference between the predicted output and the actual output. The lower the loss, the better the models performance. In classification tasks, a common loss function is Cross-Entropy Loss, while Mean Squared Error (MSE) is often used for regression tasks. The goal of training is to minimize the loss, which indicates that the models predictions are getting closer to the actual labels.
+Loss is a measure of how well a model's predictions match the true labels of the data. It quantifies the difference between the predicted output and the actual output. The lower the loss, the better the model's performance. In classification tasks, a common loss function is Cross-Entropy Loss, while Mean Squared Error (MSE) is often used for regression tasks. The goal of training is to minimize the loss, which indicates that the model's predictions are getting closer to the actual labels.

-Gradients represent the rate of change of the loss with respect to each of the models parameters (weights and biases). They are used to update the models parameters in the direction that reduces the loss. Gradients are calculated during the backpropagation step, where the loss is propagated backward through the network to compute how each parameter contributes to the overall loss. Optimizers like SGD or Adam use these gradients to adjust the parameters, effectively “teaching” the model to improve its predictions.
+Gradients represent the rate of change of the loss with respect to each of the model's parameters (weights and biases). They are used to update the model's parameters in the direction that reduces the loss. Gradients are calculated during the backpropagation step, where the loss is propagated backward through the network to compute how each parameter contributes to the overall loss. Optimizers like SGD or Adam use these gradients to adjust the parameters, effectively “teaching” the model to improve its predictions.

An epoch refers to one complete pass through the entire training dataset. During each epoch, the model sees every data point once and updates its parameters accordingly. Multiple epochs are typically required to train a model effectively because, during each epoch, the model learns and fine-tunes its parameters based on the data it processes. The number of epochs is a hyperparameter that you set before training, and increasing it can improve the model’s performance, but too many epochs may lead to overfitting, where the model performs well on training data but poorly on new, unseen data.
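
The following minimal training loop sketches how loss, backpropagation, the optimizer, and epochs fit together; the data and hyperparameters are synthetic placeholders, not values from the Learning Path.

```python
import torch
from torch import nn

# Synthetic stand-in data; a real script would wrap the normalized MNIST
# training set in a torch.utils.data.DataLoader.
train_loader = [(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))]

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()  # common loss for classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):  # one epoch = one full pass over the training data
    for images, labels in train_loader:
        optimizer.zero_grad()                  # clear previous gradients
        loss = loss_fn(model(images), labels)  # forward pass and loss
        loss.backward()                        # backpropagation: compute gradients
        optimizer.step()                       # update weights and biases
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```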

@@ -60,7 +60,7 @@ class NeuralNetwork(nn.Module):

To build the neural network in PyTorch, define a class that inherits from PyTorch’s nn.Module. This approach is similar to TensorFlow’s subclassing API. In this case, define a class named NeuralNetwork, which consists of two main components:

-1. **__init__** method
+1. __init__ method

This method serves as the constructor for the class.

@@ -75,7 +75,7 @@ The network consists of:
* Another Dropout layer, that removes 20% of the nodes.
* A final Linear layer, with 10 nodes (matching the number of classes in the dataset), followed by a Softmax activation function that outputs class probabilities.

-2. **forward** method
+2. forward method

This method defines the forward pass of the network. It takes an input tensor x, flattens it using self.flatten, and then passes it through the defined sequential stack of layers (self.linear_stack).
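
Putting the two methods together, the class plausibly looks like the sketch below, reconstructed from the layer descriptions in this section; exact details may differ from the actual Learning Path source.

```python
import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_stack = nn.Sequential(
            nn.Linear(28 * 28, 96),  # 784-element input vector
            nn.Tanh(),
            nn.Dropout(0.2),         # deactivate 20% of the nodes
            nn.Linear(96, 256),
            nn.Sigmoid(),
            nn.Dropout(0.2),
            nn.Linear(256, 10),      # one output node per digit class
            nn.Softmax(dim=1),       # class probabilities
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_stack(x)
```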

@@ -99,14 +99,14 @@ You will see a detailed summary of the NeuralNetwork model’s architecture, inc

The summary lists each layer of the network sequentially, including:

-* The Flatten layer, which reshapes the 28x28 input images into a 784-element vector.
-* The Linear layers with 96 and 256 nodes, respectively, along with the activation functions (Tanh and Sigmoid) applied after each linear transformation.
-* The Dropout layers that randomly-deactivate 20% of the neurons in the respective layers.
-* The final Linear layer with 10 nodes, corresponding to the output probabilities for the 10 digit classes, followed by the Softmax function.
+* The flatten layer, which reshapes the 28x28 input images into a 784-element vector.
+* The linear layers with 96 and 256 nodes, respectively, along with the activation functions (Tanh and Sigmoid) applied after each linear transformation.
+* The dropout layers that randomly-deactivate 20% of the neurons in the respective layers.
+* The final linear layer with 10 nodes, corresponding to the output probabilities for the 10 digit classes, followed by the softmax function.

2. Input and Output Shapes

-For each layer, the summary shows the shape of the input and output tensors, helping to trace how the data flows through the network. For example, the input shape starts as (1, 28, 28) for the image, which gets flattened to (1, 784) after the Flatten layer.
+For each layer, the summary shows the shape of the input and output tensors, helping to trace how the data flows through the network. For example, the input shape starts as (1, 28, 28) for the image, which gets flattened to (1, 784) after the flatten layer.

3. The summary
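
A summary like this is typically produced with a helper such as torchsummary; the Learning Path may use a different tool, but a hedged sketch of the call would be:

```python
# Assumes the torchsummary package (pip install torchsummary).
from torchsummary import summary

model = NeuralNetwork()
# Print each layer's output shape and parameter count for a 28x28 input.
summary(model, (1, 28, 28), device="cpu")
```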

