Predicting the neural responses to visual stimuli of naturalistic scenes using machine learning
- About the project
- Dataset
- Feature engineering
- Machine learning models
- How to set up the environment to run the code?
- Structure of the repository
- Credits
- Further details
- Contact
The goal of this project is to employ machine learning techniques to predict the neural responses triggered by visual stimuli of naturalistic scenes. These computational models strive to replicate the intricate process through which neuronal activity encodes visual stimuli from the external environment. The following figure gives a schematic representation of the brain encoding and decoding processes.
Brain encoding and decoding in fMRI. Obtained from [1].
Visual encoding models based on fMRI data employ algorithms that transform image pixels into model features and map these features to brain activity. This framework enables the prediction of neural responses from images. The following figure illustrates the mapping between the pixel, feature, and brain spaces.
The general architecture of visual encoding models consists of three spaces (the input space, the feature space, and the brain activity space) and two in-between mappings. Obtained from [2].
The data for this project is part of the Natural Scenes Dataset (NSD), a massive dataset of 7T fMRI responses to images of natural scenes coming from the COCO dataset. The training dataset consists of brain responses measured at 10,000 brain locations (voxels) to 8857 images (in jpg format) for one subject. The 10,000 voxels are distributed along the visual pathway and may encode perceptual and semantic features in different proportions. The test dataset comprises 984 images (in jpg format), and the goal is to predict the brain responses to these images.
You can access the dataset through Zenodo with the following DOI: 10.5281/zenodo.7979730.
The training dataset was split into training and validation partitions with an 80/20 ratio. The training partition was used to train the models, and the validation partition was used to evaluate the models. The test dataset was used to make predictions with the best model on unseen data.
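For illustration, here is a minimal sketch of this split with scikit-learn, assuming the image features and voxel responses are already loaded as NumPy arrays (the file names below are placeholders, not files from this repository):

```python
# Hypothetical 80/20 split of the training data into training and validation
# partitions; X holds one feature row per image, y one response column per voxel.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.load("train_image_features.npy")   # placeholder path, shape (n_images, n_features)
y = np.load("train_fmri_responses.npy")   # placeholder path, shape (n_images, n_voxels)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42  # 80/20 split with a fixed seed
)
```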
The feature representation based on raw pixel values is very high-dimensional (the original images are 425x425 pixels with 3 channels (RGB), which results in 425x425x3 = 541,875 features per image), so I used the representations obtained from different layers of pre-trained CNNs to obtain a lower-dimensional representation of the images. In this case, I tried various layers of four different pre-trained CNNs available in the torchvision package: AlexNet, VGG16, ResNet50, and InceptionV3.
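For example, the following sketch extracts the output of AlexNet's features.12 layer with torchvision's feature-extraction API (a recent torchvision version is assumed; the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision import models, transforms
from torchvision.models.feature_extraction import create_feature_extractor

# Load a pre-trained AlexNet and expose one intermediate layer as output.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
extractor = create_feature_extractor(model, return_nodes=["features.12"])

# Standard ImageNet preprocessing for torchvision models.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example_scene.jpg").convert("RGB")  # placeholder image
with torch.no_grad():
    batch = preprocess(img).unsqueeze(0)         # add the batch dimension
    feats = extractor(batch)["features.12"]      # (1, 256, 6, 6) for AlexNet
    feature_vector = feats.flatten(start_dim=1)  # (1, 9216) flattened features
```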
The feature representations of the images were obtained by passing the images through the pre-trained CNNs and extracting the output of the desired layer. The resulting feature vectors were still very large, so I applied PCA to reduce them to a set of 30 features. I fit the PCA on the training image features and used the same fit to transform the training, validation, and test image features.
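A minimal sketch of this step, reusing the hypothetical array names from the sketches above (the key point is that PCA is fit on the training features only):

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=30)                # number of components used above
X_train_pca = pca.fit_transform(X_train)  # fit PCA on the training features only
X_val_pca = pca.transform(X_val)          # reuse the same fit for validation...
X_test_pca = pca.transform(X_test)        # ...and for the (hypothetical) test features
```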
I evaluated the best feature representation by training a simple linear regression model to predict the brain activity of the voxels from the feature representation of the images. The best feature representation was the one that resulted in the highest encoding accuracy (i.e., median correlation between the predicted and actual brain activity of the voxels) on the validation set.
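In code, this evaluation might look as follows (a sketch with the array names from above; scikit-learn's LinearRegression fits one set of coefficients per voxel when given a 2D target):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

baseline = LinearRegression().fit(X_train_pca, y_train)
y_pred = baseline.predict(X_val_pca)

# Pearson correlation per voxel (one column per voxel), then the median.
corr_per_voxel = np.array([
    np.corrcoef(y_val[:, v], y_pred[:, v])[0, 1]
    for v in range(y_val.shape[1])
])
encoding_accuracy = np.median(corr_per_voxel)
print(f"Encoding accuracy: {encoding_accuracy:.4f}")
```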
You can find the code for this part of the project here.
I trained 6 different machine learning algorithms (linear regression as the base model, ridge regression, lasso regression, elastic net regression, k-nearest neighbors regressor, and decision tree regressor) to predict the brain activity of the voxels from the feature representation of the images. In this project, the learning task was a multioutput regression problem, where the input is the feature representation of the images and the output is the brain activity of all the voxels. Each regressor maps from the feature space to a single voxel, so there is a separate encoding model per voxel, leading to voxelwise encoding models. Therefore, every model trained with this dataset consists of 10,000 independent regression models with n coefficients each (the number of features). As in the previous section, the best model was the one that resulted in the highest encoding accuracy on the validation set.
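A sketch of this comparison with scikit-learn estimators follows; the hyperparameter values shown are illustrative placeholders, not the values actually tuned in the project:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                  # placeholder hyperparameters
    "lasso": Lasso(alpha=0.01),
    "elasticnet": ElasticNet(alpha=1.0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "tree": DecisionTreeRegressor(max_depth=10),
}

scores = {}
for name, model in models.items():
    # All of these estimators accept a 2D target, so each fit yields one
    # independent regression model per voxel (voxelwise encoding models).
    model.fit(X_train_pca, y_train)
    y_pred = model.predict(X_val_pca)
    corrs = [np.corrcoef(y_val[:, v], y_pred[:, v])[0, 1]
             for v in range(y_val.shape[1])]
    scores[name] = np.median(corrs)  # encoding accuracy on the validation set
```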
The best model was the lasso regression, with an encoding accuracy of 0.2417 on the validation set. Its best hyperparameters were alpha=0.01 and the default max_iter=1000. This model was trained with the feature representation of the images obtained from the layer features.12 of the AlexNet CNN, reduced to 100 features using PCA. Although the encoding accuracy of the best model was low, it is a starting point to build upon.
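To reuse the saved best model for predictions on the test set, something like the following should work (this assumes the .bin file is a standard pickle, which should be checked against the notebook):

```python
import pickle

with open("LassoRegressor_alpha0.01_img2brain.bin", "rb") as f:
    best_model = pickle.load(f)

# X_test_pca: PCA-reduced features of the test images, as in the sketches above.
test_predictions = best_model.predict(X_test_pca)
```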
Check out the code for this part of the project here.
I used conda to create a virtual environment with the libraries and dependencies required to run the code. To set it up, clone this GitHub repository, open a terminal, move to the folder containing the repository, and run the following commands:
# Create the conda virtual environment
$ conda env create -f img2brain_env.yml
# Activate the conda virtual environment
$ conda activate img2brain_env
Then, you can open the Jupyter Notebook with the IDE of your choice and run the code.
The main files and directories of this repository are:
| File | Description |
|---|---|
| EDA_feateng_modelbuild_img2brain.ipynb | Jupyter notebook with EDA, feature engineering, creation of the machine learning algorithms, performance metrics of all models, and evaluation of the best model |
| LassoRegressor_alpha0.01_img2brain.bin | Bin file of the best model |
| img2brain_env.yml | File with libraries and dependencies to create the conda virtual environment |
| img2brain_report.pdf | Report with a detailed explanation of the project |
| Results/ | Folder to save performance metrics and other outputs of the machine learning models |
| Scripts_plots/ | Folder for the scripts to create the plots of the report |
| img/ | Images and GIFs |
- Developed by Sebastián Ayala Ruano. I created this project as the capstone project for the Machine Learning course of the MSc in Systems Biology at Maastricht University.
- Part of the code was inspired by the Algonauts Project 2023 Challenge development kit tutorial.
More details about the biological background of the project, the interpretation of the results, and ideas for further work are available in this pdf report.
If you have comments or suggestions about this project, you can open an issue in this repository or email me at sebasar1245@gamil.com.