Documentation #15

Merged
merged 16 commits into from
Nov 25, 2024
135 changes: 74 additions & 61 deletions README.md
@@ -1,95 +1,108 @@
# schneider-lab-to-nwb
NWB conversion scripts for Schneider lab data to the [Neurodata Without Borders](https://nwb-overview.readthedocs.io/) data format.


## Installation
## Basic installation

You can install the latest release of the package with pip:

```
pip install schneider-lab-to-nwb
```

We recommend that you install the package inside a [virtual environment](https://docs.python.org/3/tutorial/venv.html). A simple way of doing this is to use a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) from the `conda` package manager ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)). Detailed instructions on how to use conda environments can be found in their [documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).

### Running a specific conversion
Once you have installed the package with pip, you can run any of the conversion scripts in a notebook or a python file:

https://github.com/catalystneuro/schneider-lab-to-nwb//tree/main/src/schneider_2024/schneider_2024_convert_session.py




## Installation from GitHub
Another option is to install the package directly from GitHub. This option has the advantage that the source code can be modified if you need to amend some of the code we originally provided to adapt to future experimental differences. To install the conversion from GitHub you will need to use `git` ([installation instructions](https://github.com/git-guides/install-git)). We also recommend the installation of `conda` ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)) as it contains all the required machinery in a single and simple instal
We recommend installing the package directly from GitHub. This option has the advantage that the source code can be modified if you need to amend some of the code we originally provided to adapt to future experimental differences. To install the conversion from GitHub you will need to use `git` ([installation instructions](https://github.com/git-guides/install-git)). We also recommend the installation of `conda` ([installation instructions](https://docs.conda.io/en/latest/miniconda.html)) as it contains all the required machinery in a single and simple instal

From a terminal (note that conda should install one in your system) you can do the following:

```
```bash
git clone https://github.com/catalystneuro/schneider-lab-to-nwb
cd schneider-lab-to-nwb
conda env create --file make_env.yml
conda activate schneider-lab-to-nwb-env
conda activate schneider_lab_to_nwb_env
```

This creates a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html) which isolates the conversion code from your system libraries. We recommend that you run all your conversion related tasks and analysis from the created environment in order to minimize issues related to package dependencies.

Alternatively, if you want to avoid conda altogether (for example if you use another virtual environment tool) you can install the repository with the following commands using only pip:

```
```bash
git clone https://github.com/catalystneuro/schneider-lab-to-nwb
cd schneider-lab-to-nwb
pip install -e .
```

Note:
both of the methods above install the repository in [editable mode](https://pip.pypa.io/en/stable/cli/pip_install/#editable-installs).
The dependencies for this environment are stored in the dependencies section of the `pyproject.toml` file.

### Running a specific conversion
To run a specific conversion, you might need to install first some conversion specific dependencies that are located in each conversion directory:
```
pip install -r src/schneider_lab_to_nwb/schneider_2024/schneider_2024_requirements.txt
```
## Helpful Definitions

You can run a specific conversion with the following command:
```
python src/schneider_lab_to_nwb/schneider_2024/schneider_2024_convert_session.py
```
This conversion project consists primarily of DataInterfaces, NWBConverters, and conversion scripts.

In neuroconv, a [DataInterface](https://neuroconv.readthedocs.io/en/main/user_guide/datainterfaces.html) is a class that specifies the procedure to convert a single data modality to NWB.
This is usually accomplished with a single read operation from a distinct set of files.
For example, in this conversion, the `Zempolich2024BehaviorInterface` contains the code that converts all of the behavioral data to NWB from a raw .mat file.

In neuroconv, a [NWBConverter](https://neuroconv.readthedocs.io/en/main/user_guide/nwbconverter.html) is a class that combines many data interfaces and specifies the relationships between them, such as temporal alignment.
This allows users to combine multiple modalities into a single NWB file in an efficient and modular way.

In this conversion project, the conversion scripts determine which sessions to convert,
instantiate the appropriate NWBConverter object,
and convert all of the specified sessions, saving them to an output directory of .nwb files.
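
As a rough illustration of how these three pieces relate, here is a minimal, self-contained sketch. `ToyInterface` and `ToyConverter` are hypothetical stand-ins written only for this example: they are not neuroconv classes, they do not write NWB files, and the plain dictionary used as the "file" is an assumption made to keep the sketch runnable without any dependencies.

```python
class ToyInterface:
    """Hypothetical stand-in for a DataInterface: handles one data modality."""

    def __init__(self, name, samples):
        self.name = name
        self.samples = samples  # list of (timestamp_in_seconds, value) pairs

    def add_to_file(self, out_file, time_offset=0.0):
        # Write this modality into the shared output, shifted by its offset.
        out_file[self.name] = [(t + time_offset, v) for t, v in self.samples]


class ToyConverter:
    """Hypothetical stand-in for an NWBConverter: combines several interfaces."""

    def __init__(self, interfaces, time_offsets=None):
        self.interfaces = interfaces
        # Per-interface offsets play the role of temporal alignment.
        self.time_offsets = time_offsets or {}

    def run_conversion(self):
        out_file = {}
        for interface in self.interfaces:
            offset = self.time_offsets.get(interface.name, 0.0)
            interface.add_to_file(out_file, time_offset=offset)
        return out_file


# A "conversion script" picks the data, builds the converter, and runs it.
behavior = ToyInterface("behavior", [(0.0, "lick"), (0.5, "reward")])
ephys = ToyInterface("ephys", [(0.0, 0.1)])
converter = ToyConverter([behavior, ephys], time_offsets={"behavior": 2.0})
combined = converter.run_conversion()
print(combined)
```

In the real project the same roles are played by the neuroconv base classes and the `Zempolich2024...` interfaces described in this README; the sketch only shows the division of labor between them.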

## Repository structure
Each conversion is organized in a directory of its own in the `src` directory:

schneider-lab-to-nwb/
├── LICENSE
├── MANIFEST.in
├── README.md
├── make_env.yml
├── pyproject.toml
├── README.md
├── requirements.txt
├── setup.py
└── src
├── schneider_lab_to_nwb
│ ├── conversion_directory_1
│ └── schneider_2024
│ ├── schneider_2024_behaviorinterface.py
│ ├── schneider_2024_convert_session.py
│ ├── schneider_2024_metadata.yml
│ ├── schneider_2024_nwbconverter.py
│ ├── schneider_2024_requirements.txt
│ ├── schneider_2024_notes.md

│ └── __init__.py
│ ├── conversion_directory_b

└── __init__.py

For example, for the conversion `schneider_2024` you can find a directory located in `src/schneider_lab_to_nwb/schneider_2024`. Inside each conversion directory you can find the following files:

* `schneider_2024_convert_session.py`: this script defines the function to convert one full session of the conversion.
* `schneider_2024_requirements.txt`: dependencies specific to this conversion.
* `schneider_2024_metadata.yml`: metadata in yaml format for this specific conversion.
* `schneider_2024_behaviorinterface.py`: the behavior interface. Usually ad-hoc for each conversion.
* `schneider_2024_nwbconverter.py`: the place where the `NWBConverter` class is defined.
* `schneider_2024_notes.md`: notes and comments concerning this specific conversion.

The directory might contain other files that are necessary for the conversion but those are the central ones.
└── schneider_lab_to_nwb
├── __init__.py
├── another_conversion
└── zempolich_2024
├── __init__.py
├── zempolich_2024_behaviorinterface.py
├── zempolich_2024_convert_all_sessions.py
├── zempolich_2024_convert_session.py
├── zempolich_2024_intrinsic_signal_imaging_interface.py
├── zempolich_2024_metadata.yaml
├── zempolich_2024_notes.md
├── zempolich_2024_nwbconverter.py
├── zempolich_2024_open_ephys_recording_interface.py
└── zempolich_2024_optogeneticinterface.py

For the conversion `zempolich_2024` you can find a directory located in `src/schneider_lab_to_nwb/zempolich_2024`. Inside that conversion directory you can find the following files:

* `__init__.py` : This init file imports all the datainterfaces and NWBConverters so that they can be accessed directly from schneider_lab_to_nwb.zempolich_2024.
* `zempolich_2024_convert_session.py` : This conversion script defines the `session_to_nwb()` function, which converts a single session of data to NWB.
When run as a script, this file converts 4 example sessions to NWB, representing all the various edge cases in the dataset.
* `zempolich_2024_convert_dataset.py` : This conversion script defines the `dataset_to_nwb()` function, which converts the entire Zempolich 2024 dataset to NWB.
When run as a script, this file calls `dataset_to_nwb()` with the appropriate arguments.
* `zempolich_2024_nwbconverter.py` : This module defines the primary conversion class, `Zempolich2024NWBConverter`, which aggregates all of the various datainterfaces relevant for this conversion.
* `zempolich_2024_behaviorinterface.py` : This module defines `Zempolich2024BehaviorInterface`, which is the data interface for behavioral .mat files.
* `zempolich_2024_optogeneticinterface.py` : This module defines `Zempolich2024OptogeneticInterface`, which is the data interface for optogenetic stimulation from .mat files.
* `zempolich_2024_intrinsic_signal_imaging_interface.py` : This module defines `Zempolich2024IntrinsicSignalOpticalImagingInterface`, which is the data interface for intrinsic signal images (.tiff and .jpg).
* `zempolich_2024_open_ephys_recording_interface.py` : This module defines `Zempolich2024OpenEphysRecordingInterface`, which is a lightweight wrapper around neuroconv's `OpenEphysLegacyRecordingInterface` that is responsible for converting the OpenEphys recording data.
This interface adds some extra conversion-specific metadata like relative channel positions, brain area, etc.
* `zempolich_2024_metadata.yaml` : This metadata .yaml file provides high-level metadata for the NWB files directly as well as useful dictionaries for some of the data interfaces.
For example,
- Subject/species is "Mus musculus", which is directly included in the NWB file.
- Ecephys/folder_name_to_start_datetime gives a mapping from 2-part folder names (ex. m53/Day1_A1) to session start times,
which is used in cases where the session start time recorded by OpenEphys is ambiguous.

* `zempolich_2024_notes.md` : This markdown file contains my notes from the conversion for each of the data interfaces.
It specifically highlights various edge cases as well as questions I had for the Schneider Lab (active and resolved).

Future conversions for this repo should follow the example of zempolich_2024 and create another folder of
conversion scripts and datainterfaces. As a placeholder, here we have `src/schneider_lab_to_nwb/another_conversion`.

## Running a Conversion

To convert the 4 example sessions, simply run
```bash
python src/schneider_lab_to_nwb/zempolich_2024/zempolich_2024_convert_session.py
```

To convert the whole dataset, simply run
```bash
python src/schneider_lab_to_nwb/zempolich_2024/zempolich_2024_convert_dataset.py
```

Note that the dataset conversion uses multiprocessing, currently set to 4 workers. To use more or fewer workers, simply
change the `max_workers` argument to `dataset_to_nwb()`.
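
To make the role of `max_workers` more concrete, the sketch below shows one common way to run per-session conversions in parallel while recording failures to a file instead of stopping the whole run. It assumes a process pool from Python's standard `concurrent.futures` module; `convert_one`, `safe_convert`, `convert_all`, and the `ERROR_<session>.txt` naming are hypothetical names invented for this example and are not taken from this repository.

```python
import traceback
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def convert_one(session_id: str) -> str:
    """Hypothetical stand-in for converting a single session."""
    if session_id == "bad":
        raise ValueError(f"cannot convert session {session_id!r}")
    return f"sub-{session_id}.nwb"


def safe_convert(session_id: str, exception_file_path: Path) -> None:
    """Convert one session; on failure, save the traceback instead of raising."""
    try:
        convert_one(session_id)
    except Exception:
        Path(exception_file_path).write_text(traceback.format_exc())


def convert_all(session_ids, output_dir_path: Path, max_workers: int = 1) -> None:
    """Convert every session, using up to `max_workers` processes at once."""
    output_dir_path = Path(output_dir_path)
    output_dir_path.mkdir(parents=True, exist_ok=True)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(safe_convert, sid, output_dir_path / f"ERROR_{sid}.txt")
            for sid in session_ids
        ]
        # Wait for every job; failures were already written to their error files.
        for future in as_completed(futures):
            future.result()
```

With this layout, one failing session leaves an error file behind and the remaining sessions still run. On platforms that start worker processes by spawning, a call such as `convert_all(session_ids, output_dir_path, max_workers=4)` should sit under an `if __name__ == "__main__":` guard.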
@@ -3,13 +3,12 @@
from pydantic import FilePath
import numpy as np
from pymatreader import read_mat
from hdmf.common.table import DynamicTableRegion
from pynwb.behavior import BehavioralTimeSeries, TimeSeries
from pynwb.device import Device
from ndx_events import Events, AnnotatedEventsTable

from neuroconv.basedatainterface import BaseDataInterface
from neuroconv.utils import DeepDict, get_base_schema
from neuroconv.utils import get_base_schema
from neuroconv.tools import nwb_helpers


@@ -19,13 +18,14 @@ class Zempolich2024BehaviorInterface(BaseDataInterface):
keywords = ("behavior",)

def __init__(self, file_path: FilePath):
super().__init__(file_path=file_path)

def get_metadata(self) -> DeepDict:
# Automatically retrieve as much metadata as possible from the source files available
metadata = super().get_metadata()
"""Initialize the behavior interface.

return metadata
Parameters
----------
file_path : FilePath
Path to the behavior .mat file.
"""
super().__init__(file_path=file_path)

def get_metadata_schema(self) -> dict:
metadata_schema = super().get_metadata_schema()
Expand Down Expand Up @@ -89,9 +89,23 @@ def get_metadata_schema(self) -> dict:
}
return metadata_schema


def add_to_nwbfile(
self, nwbfile: NWBFile, metadata: dict, normalize_timestamps: bool = False, verbose: bool = False
):
"""Add behavior data to the NWBFile.

Parameters
----------
nwbfile : pynwb.NWBFile
The in-memory object to add the data to.
metadata : dict
Metadata dictionary with information used to create the NWBFile.
normalize_timestamps : bool, optional
Whether to normalize the timestamps to the start of the first behavioral time series, by default False.
verbose : bool, optional
Whether to print extra information during the conversion, by default False.
"""
# Read Data
file_path = self.source_data["file_path"]
file = read_mat(file_path)
@@ -5,24 +5,25 @@
import traceback
from tqdm import tqdm
import shutil
from pydantic import FilePath, DirectoryPath

from schneider_lab_to_nwb.zempolich_2024.zempolich_2024_convert_session import session_to_nwb


def dataset_to_nwb(
*,
data_dir_path: str | Path,
output_dir_path: str | Path,
data_dir_path: DirectoryPath,
output_dir_path: DirectoryPath,
max_workers: int = 1,
verbose: bool = True,
):
"""Convert the entire dataset to NWB.

Parameters
----------
data_dir_path : str | Path
data_dir_path : DirectoryPath
The path to the directory containing the raw data.
output_dir_path : str | Path
output_dir_path : DirectoryPath
The path to the directory where the NWB files will be saved.
max_workers : int, optional
The number of workers to use for parallel processing, by default 1
Expand Down Expand Up @@ -51,22 +52,34 @@ def dataset_to_nwb(
pass


def get_nwbfile_name_from_kwargs(session_to_nwb_kwargs):
def get_nwbfile_name_from_kwargs(session_to_nwb_kwargs: dict) -> str:
"""Get the name of the NWB file from the session_to_nwb kwargs.

Parameters
----------
session_to_nwb_kwargs : dict
The arguments for session_to_nwb.

Returns
-------
str
The name of the NWB file that would be created by running session_to_nwb(**session_to_nwb_kwargs).
"""
behavior_file_path = session_to_nwb_kwargs["behavior_file_path"]
subject_id = behavior_file_path.name.split("_")[1]
session_id = behavior_file_path.name.split("_")[2]
nwbfile_name = f"sub-{subject_id}_ses-{session_id}.nwb"
return nwbfile_name
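
The name derivation above can be tried in isolation. The following is a minimal sketch that mirrors the same split-on-underscore logic; the function name `nwbfile_name_from_behavior_path` and the example file name `raw_m53_231029_001.mat` are hypothetical and only assume that the subject and session identifiers are the second and third underscore-separated fields.

```python
from pathlib import Path


def nwbfile_name_from_behavior_path(behavior_file_path: Path) -> str:
    """Build an NWB file name from the 2nd and 3rd underscore-separated fields."""
    parts = Path(behavior_file_path).name.split("_")
    subject_id, session_id = parts[1], parts[2]
    return f"sub-{subject_id}_ses-{session_id}.nwb"


# Hypothetical file name, used only to show which fields are picked up.
example_name = nwbfile_name_from_behavior_path(Path("raw_m53_231029_001.mat"))
print(example_name)  # prints: sub-m53_ses-231029.nwb
```

A file name with fewer than three underscore-separated fields would raise an `IndexError` here, so this logic relies on a consistent naming scheme for the behavior files.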


def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: str | Path):
def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: FilePath):
"""Convert a session to NWB while handling any errors by recording error messages to the exception_file_path.

Parameters
----------
session_to_nwb_kwargs : dict
The arguments for session_to_nwb.
exception_file_path : Path
exception_file_path : FilePath
The path to the file where the exception messages will be saved.
"""
exception_file_path = Path(exception_file_path)
@@ -78,15 +91,12 @@ def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: str
f.write(traceback.format_exc())


def get_session_to_nwb_kwargs_per_session(
*,
data_dir_path: str | Path,
):
def get_session_to_nwb_kwargs_per_session(*, data_dir_path: DirectoryPath):
"""Get the kwargs for session_to_nwb for each session in the dataset.

Parameters
----------
data_dir_path : str | Path
data_dir_path : DirectoryPath
The path to the directory containing the raw data.

Returns
Expand Down Expand Up @@ -118,7 +128,27 @@ def get_session_to_nwb_kwargs_per_session(
return session_to_nwb_kwargs_per_session


def get_brain_region_kwargs(ephys_path, ephys_behavior_path, opto_path, brain_region):
def get_brain_region_kwargs(
ephys_path: DirectoryPath, ephys_behavior_path: DirectoryPath, opto_path: DirectoryPath, brain_region: str
):
"""Get the session_to_nwb kwargs for each session in the dataset for a given brain region.

Parameters
----------
ephys_path : DirectoryPath
Path to the directory containing electrophysiology data for subjects.
ephys_behavior_path : DirectoryPath
Path to the directory containing electrophysiology behavior data files.
opto_path : DirectoryPath
Path to the directory containing optogenetics behavior data files.
brain_region : str
The brain region associated with the sessions.

Returns
-------
list[dict[str, Any]]
A list of dictionaries containing the kwargs for session_to_nwb for each session in the dataset within a specific brain region.
"""
session_to_nwb_kwargs_per_session = []
for subject_dir in ephys_path.iterdir():
subject_id = subject_dir.name