Preliminary Steps

TeresaEsch edited this page Feb 16, 2024 · 7 revisions

What you need before you start using DaCapo

Data

DaCapo trains machine learning models to identify objects in 2D and 3D images collected with light or electron microscopy.

File Types

The following file types are supported:

Data Classes

To train and validate a model, you must tell it the "ground truth" (the correct answer) for some fraction of the data. You therefore begin by annotating small crops of the raw data.

You need two sets of paired raw and ground-truth data:

  • Training data: used for initial training of the model
  • Validation data: used to verify that your model has not simply memorized the training set
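The point of keeping the two sets apart can be made concrete with a small check. The sketch below is purely illustrative: the CropPair class, the check_split function, and the file names are hypothetical and are not part of DaCapo. It records each annotated crop as a raw/ground-truth pair and refuses a split in which the same crop appears in both the training and validation sets.

```python
# Illustrative sketch only: CropPair, check_split and the paths below are
# hypothetical names, not DaCapo's own dataset configuration.
from dataclasses import dataclass


@dataclass(frozen=True)
class CropPair:
    """One annotated crop: raw image data plus its ground-truth labels."""
    raw_path: str  # hypothetical location of the raw microscopy crop
    gt_path: str   # hypothetical location of the matching annotation


def check_split(training: list[CropPair], validation: list[CropPair]) -> None:
    """Raise ValueError if a set is empty or a crop is reused for validation."""
    if not training or not validation:
        raise ValueError("need at least one training and one validation pair")
    overlap = {p.raw_path for p in training} & {p.raw_path for p in validation}
    if overlap:
        raise ValueError(f"crops used for both training and validation: {sorted(overlap)}")


training = [
    CropPair("data/crop_01_raw", "data/crop_01_gt"),
    CropPair("data/crop_02_raw", "data/crop_02_gt"),
]
validation = [CropPair("data/crop_03_raw", "data/crop_03_gt")]

# Passes silently because no crop appears in both sets.
check_split(training, validation)
```

A validation crop that the model has already seen during training cannot reveal memorization, which is why the check compares the raw crops themselves.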

Hardware Requirements

We recommend that your hardware meet the following specifications:

Software Requirements

Conda Environment

We suggest creating a separate conda environment on your computer for DaCapo. This keeps DaCapo's Python version and dependencies isolated from your other software, so you always use the versions it expects.

To create a new environment, type the following on the command line:

conda create -n dacapo python=3.10

To switch to this environment, type:

conda activate dacapo
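To confirm that the right environment is active before you run anything, you could save the following as a small script (the file name check_env.py is arbitrary) and run it with python check_env.py. This is a sketch that relies on conda setting the CONDA_DEFAULT_ENV variable to the name of the active environment; the function name environment_report is hypothetical.

```python
# Sketch: report whether the "dacapo" conda environment and Python 3.10 are active.
# environment_report is a hypothetical helper, not part of DaCapo.
import os
import sys


def environment_report(expected_env: str = "dacapo",
                       expected_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return a list of problems; an empty list means the setup looks right."""
    problems = []
    # conda sets CONDA_DEFAULT_ENV to the name of the currently active environment.
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda environment is {active!r}, expected {expected_env!r}")
    if sys.version_info[:2] != expected_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {found} found, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = environment_report()
    print("\n".join(issues) if issues else "Environment looks good.")
```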

DaCapo

You can install DaCapo directly from GitHub using pip. With the dacapo environment active, type:

pip install git+https://github.com/janelia-cellmap/dacapo.git

Alternatively, you can clone the repository and install it locally:

git clone https://github.com/janelia-cellmap/dacapo.git
cd dacapo
pip install -e .

Be sure to activate this environment (conda activate dacapo) whenever you want to run an experiment.
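After installing, a quick way to check that the active environment can actually find DaCapo is to ask Python whether the package is importable. This sketch assumes the import name is dacapo, matching the repository name; the helper is_installed is hypothetical.

```python
# Sketch: confirm that the active environment can import DaCapo.
# Assumes the import name is "dacapo"; is_installed is a hypothetical helper.
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("dacapo"):
        print("DaCapo is installed in this environment.")
    else:
        print("DaCapo not found: check that the dacapo environment is active.")
```

find_spec only locates the package without importing it, so the check is fast even for packages with heavy dependencies.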

Python

DaCapo is written in Python, and you tell it what to do using Python scripts. This user guide aims to show you what you need to do in Python, even if you have never used the language before.

Recommended but optional software

MongoDB

We encourage you to save the data generated during a DaCapo run (e.g., training loss and validation results) to a MongoDB database. This lets you quickly fetch specific scores from a range of experiments and compare them.

Note: this option requires some setup. You need access to a running MongoDB server, and you need to configure DaCapo so it knows where to find that server.

If you just want to get started quickly, you can save all data to disk.
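If you opt for MongoDB, it can help to confirm that the server is reachable before configuring DaCapo to use it. The sketch below assumes the pymongo package is installed; the URI and the helper name mongodb_reachable are hypothetical, and it only tests connectivity. It does not show how DaCapo itself is configured.

```python
# Sketch: check that a MongoDB server answers at a given URI.
# Assumes pymongo is installed; MONGO_URI and mongodb_reachable are hypothetical.
MONGO_URI = "mongodb://localhost:27017"


def mongodb_reachable(uri: str = MONGO_URI, timeout_ms: int = 2000) -> bool:
    """Return True if a MongoDB server responds to a ping at `uri`."""
    try:
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:
        return False  # pymongo is not installed in this environment
    client = None
    try:
        # serverSelectionTimeoutMS bounds how long the ping waits for a server.
        client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False  # bad URI, or no server answered in time
    finally:
        if client is not None:
            client.close()


if __name__ == "__main__":
    if mongodb_reachable():
        print("MongoDB is reachable.")
    else:
        print("MongoDB not reachable; saving results to disk may be simpler to start.")
```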

JupyterLab

Jupyter notebooks are a convenient way to create, run, and document code. Once you have a Jupyter notebook for running an experiment, you can make a copy of that notebook to quickly configure a similar experiment: you only need to change the parameters you want to modify, without retyping all the necessary code. We also plan to offer several example notebooks to get you started.
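One way to make copied notebooks easy to adapt is to gather every setting in a single cell near the top. In the illustrative sketch below, the parameter names and values are hypothetical and do not correspond to DaCapo configuration options; the copied notebook overrides only the settings that differ and prints what changed.

```python
# Illustrative sketch: parameter names and values are hypothetical, not DaCapo options.
base_experiment = {
    "name": "mito_baseline",
    "learning_rate": 1e-4,
    "iterations": 100_000,
    "validation_interval": 5_000,
}

# In the copied notebook, change only what differs from the original experiment.
variant = {**base_experiment, "name": "mito_lower_lr", "learning_rate": 1e-5}

# List the settings that differ, as a record of how the two experiments compare.
for key in base_experiment:
    if base_experiment[key] != variant[key]:
        print(f"{key}: {base_experiment[key]} -> {variant[key]}")
```

Because the original dictionary is unpacked into a new one, the base settings stay untouched and every unchanged parameter carries over automatically.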