diff --git a/content/general_advice/intro.md b/content/general_advice/intro.md index 7eb367f..4c06126 100644 --- a/content/general_advice/intro.md +++ b/content/general_advice/intro.md @@ -23,9 +23,9 @@ Therefore, this section is intended to **review potential issues on the ML side The General Advice chapter is divided into 3 sections. Things become logically aligned if presented from the perspective of the training procedure (fitting/loss minimisation part). That is, the sections will group validation items as they need to be investigated: -* Before training -* During training -* After training +* [Before training](./before/domains.md) +* [During training](./during/overfitting.md) +* [After training](./after/after.md) --- diff --git a/content/inference/conifer.md b/content/inference/conifer.md index 5fe8227..52a7030 100644 --- a/content/inference/conifer.md +++ b/content/inference/conifer.md @@ -19,7 +19,7 @@ All L1T algorithms require bit-exact emulation for performance studies and valid Both the conifer FPGA firmware and C++ emulation use Xilinx's arbitrary precision types for fixed-point arithmetic (`hls` external of CMSSW). This is cheaper and faster in the FPGA fabric than floating-point types. An important part of the model preparation process is choosing the proper fixed-point data types to avoid loss of performance compared to the trained model. Input preprocessing, in particular scaling, can help constrain the input variables to a smaller numerical range, but may also have a hardware cost to implement. In C++ the arbitrary precision types are specified like: `ap_fixed`. Minimal preparation from Python: -``` +```python import conifer model = conifer. ... # convert or load a conifer model # e.g.
model = conifer.converters.convert_from_xgboost(xgboost_model) @@ -27,7 +27,7 @@ model.save('my_bdt.json') ``` CMSSW C++ user code: -``` +```c++ // include the conifer emulation header file #include "L1Trigger/Phase2L1ParticleFlow/interface/conifer.h" diff --git a/content/inference/onnx.md b/content/inference/onnx.md index de52c62..8264f13 100644 --- a/content/inference/onnx.md +++ b/content/inference/onnx.md @@ -175,7 +175,7 @@ Let's construct the full example. The example assumes the following directory structure: - ``` + ```bash MySubsystem/MyModule/ │ ├── plugins/ @@ -216,7 +216,7 @@ Let's construct the full example. Under `MySubsystem/MyModule/test`, run `#!bash cmsRun my_plugin_cfg.py` to launch our module. You may see the following from the output, which includes the input and output vectors in the inference process. ??? hint "Click to see the output" - ``` + ```bash ... 19-Jul-2022 10:50:41 CEST Successfully opened file root://xrootd-cms.infn.it//store/mc/RunIISummer20UL18MiniAODv2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/MINIAODSIM/106X_upgrade2018_realistic_v16_L1v1-v2/230000/4C8619B2-D0C0-4647-B946-B33754F4ED16.root Begin processing the 1st record. Run 1, Event 27074045, LumiSection 10021 on stream 0 at 19-Jul-2022 10:50:43.494 CEST @@ -291,7 +291,7 @@ print('output ->', outputs) Under the directory `MySubsystem/MyModule/test`, run the example with `python3 my_standalone_test.py`. Then we see the output: -``` +```bash input -> [45. 46. 47. 48. 49. 50. 51. 52. 53. 54.] output -> [[0.9956566 0.00434343]] ``` @@ -326,7 +326,7 @@ Please find details in the following block.
``` We should see the output as follows - ``` + ```bash processing.examples.exampleOrtModule exampleOrtModuleConstr -N 10 Loading exampleOrtModuleConstr from PhysicsTools.NanoAODTools.postprocessing.examples.exampleOrtModule Will write selected trees to outDir diff --git a/content/inference/tensorflow2.md b/content/inference/tensorflow2.md index 1d4196a..ba39888 100644 --- a/content/inference/tensorflow2.md +++ b/content/inference/tensorflow2.md @@ -299,7 +299,7 @@ delete graphDef; The example assumes the following directory structure: - ``` + ```bash MySubsystem/MyModule/ │ ├── plugins/ diff --git a/content/inference/tensorflow_aot.md b/content/inference/tensorflow_aot.md index 8bb0351..ca1d7ef 100644 --- a/content/inference/tensorflow_aot.md +++ b/content/inference/tensorflow_aot.md @@ -140,7 +140,7 @@ The following files should have been created upon success. ??? hint "SavedModel files" - ``` + ```bash /path/to/saved_model │ ├── variables/ @@ -270,7 +270,7 @@ Upon success, all generated files can be found in `$CMSSW_BASE/tfaot/test` and s ???+ hint "Generated files" - ``` + ```bash ${CMSSW_BASE}/tfaot/test │ ├── lib/ @@ -398,7 +398,7 @@ std::tie(out1, out2) = model.run( The example assumes the following directory structure: - ``` + ```bash MySubsystem/MyModule/ │ ├── plugins/ diff --git a/content/optimization/data_augmentation.md b/content/optimization/data_augmentation.md index 22deb8f..5120ff2 100644 --- a/content/optimization/data_augmentation.md +++ b/content/optimization/data_augmentation.md @@ -113,7 +113,7 @@ RDA methods augment the existing dataset by performing some transformation on t In [Barnard et al., 2016][1e], the authors investigate the effect of parton shower modelling in DNN jet taggers using images of hadronically decaying W bosons. They introduce a method known as zooming to study the scale invariance of these networks. This is the RDA strategy used by [Dolan & Ore, 2021][1a].
Zooming is similar to a normalization procedure in that it standardizes features in signal data, but it aims not to create similar features in background. -After some standard data processing steps, including jet trimming and clustering via the $k_t$ algorithm, and some further processing to remove spatial symmetries, the resulting jet image depicts the leading subjet and subleading subjet directly below. [Barnard et al., 2016][1e] notes that the separation between the leading and subleading subjets varies linearly as $2m/p_T$ where $m$ and $p_T$ are the mass and transverse momentum of the jet. Standardizing this separation, or removing the linear dependence, would allow the DNN tagger to generalize to a wide range of jet $p_T$. To this end, the authors construct a factor, $R/\DeltaR_{act}$, where $R$ is some fixed value and $\DeltaR_{act}$ is the separation between the leading and subleading subjets. To discriminate between signal and background images with this factor, the authors enlarge the jet images by a scaling factor of $\text{max}(R/s,1)$ where $s = 2m_W/p_T$ and $R$ is the original jet clustering size. This process of jet image enlargement by a linear mass and $p_T$ dependent factor to account for the distane between the leading and subleading jet is known as zooming. This process can be thought of as an RDA technique to augment the data in a domain-specific way. +After some standard data processing steps, including jet trimming and clustering via the $k_t$ algorithm, and some further processing to remove spatial symmetries, the resulting jet image depicts the leading subjet and subleading subjet directly below. [Barnard et al., 2016][1e] notes that the separation between the leading and subleading subjets varies linearly as $2m/p_T$ where $m$ and $p_T$ are the mass and transverse momentum of the jet. Standardizing this separation, or removing the linear dependence, would allow the DNN tagger to generalize to a wide range of jet $p_T$.
To this end, the authors construct a factor, $R/\Delta R_{act}$, where $R$ is some fixed value and $\Delta R_{act}$ is the separation between the leading and subleading subjets. To discriminate between signal and background images with this factor, the authors enlarge the jet images by a scaling factor of $\text{max}(R/s,1)$ where $s = 2m_W/p_T$ and $R$ is the original jet clustering size. This process of jet image enlargement by a linear mass and $p_T$ dependent factor to account for the distance between the leading and subleading subjets is known as zooming. This process can be thought of as an RDA technique to augment the data in a domain-specific way. An advantage of the zooming technique is that it makes the construction of scale invariant taggers easier. Scale invariant searches which are able to interpolate between the boosted and resolved parts of phase space have the advantage of being applicable over a broad range of masses and kinematics, allowing a single search or analysis to be effective where previously more than one may have been necessary. @@ -168,7 +168,7 @@ Oversampling and undersampling are essentially opposite and roughly equivalent t It has been shown that the combination of SMOTE and undersampling performs better than only undersampling the majority class. However, over- and undersampling remain popular as each is much easier to implement alone than in some complex hybrid approach. -**Synthetic Minority Over-sampling Technique (SMOTE)** +### Synthetic Minority Over-sampling Technique (SMOTE) *Text mostly based on [Chawla et al., 2002][2j] and in part on [He et al., 2010][2k]* In the case of Synthetic Minority Over-sampling Technique (SMOTE), the minority class is oversampled by creating synthetic examples along the line segments joining any or all of the $k$-nearest neighbours in the minority class.
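The line-segment construction that SMOTE performs can be sketched in a few lines of NumPy. This is only an illustrative sketch of the idea, not the reference algorithm from [Chawla et al., 2002][2j]: the function name, the toy data, and the brute-force neighbour search are invented for illustration; in practice one would use a tested implementation such as the one in imbalanced-learn.

```python
import numpy as np

def smote_sketch(X, n_synthetic, k=2, seed=0):
    """Illustrative SMOTE sketch: create synthetic minority-class samples
    on the segments joining points in X to their k nearest neighbours."""
    rng = np.random.default_rng(seed)
    # brute-force pairwise distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    out = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(len(X))           # pick a minority sample at random
        neighbour = X[rng.choice(nn[j])]   # pick one of its k neighbours
        # synthetic point somewhere on the segment between the two
        out[i] = X[j] + rng.random() * (neighbour - X[j])
    return out

# toy minority class: the four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_synthetic=6)
```

Since each synthetic point is a convex combination of two existing minority samples, the augmented set never leaves the convex hull of the original minority class, which is the property that distinguishes SMOTE from simple oversampling by duplication.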
@@ -197,7 +197,7 @@ Extend X by SYNTHETIC_SAMPLES ``` -**Adaptive synthetic sampling approach (ADASYN)** +### Adaptive synthetic sampling approach (ADASYN) *Text mostly based on [He et al., 2010][2k]* Adaptive synthetic sampling approach (ADASYN) is a sampling approach for learning from imbalanced datasets. The main idea is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn compared to those minority examples that are easier to learn. Thus, ADASYN improves learning with respect to the data distributions by reducing the bias introduced by the class imbalance and by adaptively shifting the classification boundary toward the difficult examples. diff --git a/content/training/MLaaS4HEP.md b/content/training/MLaaS4HEP.md index 68ed2ab..727c1d6 100644 --- a/content/training/MLaaS4HEP.md +++ b/content/training/MLaaS4HEP.md @@ -31,7 +31,7 @@ Here is a list of the dependencies: ### Installation The easiest way to install and run [MLaaS4HEP](https://cloud.docker.com/u/veknet/repository/docker/veknet/mlaas4hep) and [TFaaS](https://cloud.docker.com/u/veknet/repository/docker/veknet/tfaas) is to use pre-built docker images -``` +```bash # run MLaaS4HEP docker container docker run veknet/mlaas4hep # run TFaaS docker container @@ -43,7 +43,7 @@ MLaaS4HEP python repository provides the `reader.py` module that defines a DataR [uproot](https://github.com/scikit-hep/uproot) framework. Basic usage -``` +```bash # setup the proper environment, e.g. # export PYTHONPATH=/path/src/python # path to MLaaS4HEP python framework # export PATH=/path/bin:$PATH # path to MLaaS4HEP binaries