Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ExternalImageSeries #462

Draft
wants to merge 9 commits into
base: dev
Choose a base branch
from
Draft

Add ExternalImageSeries #462

wants to merge 9 commits into from

Conversation

rly
Copy link
Contributor

@rly rly commented Aug 7, 2020

The ImageSeries neurodata_type is a subtype of TimeSeries. TimeSeries requires the dataset data; however, ImageSeries makes data optional because external_file may be provided in place of data. This breaks the rules of inheritance -- a child must have all required fields of a parent, and this inconsistency creates challenges in NeurodataWithoutBorders/pynwb#1274

This PR makes ImageSeries.data required and updates the documentation. If external_file is provided, then data should be set to an empty 3D array.

See also discussion here: NeurodataWithoutBorders/pynwb#1220

Seeking your input @oruebel @ajtritt @bendichter

@rly rly requested review from ajtritt, oruebel and bendichter August 7, 2020 01:51
oruebel
oruebel previously approved these changes Aug 7, 2020
@bendichter
Copy link
Contributor

This has always been annoying. Maybe we should just create a separate data type for external videos

@rly
Copy link
Contributor Author

rly commented Aug 7, 2020

This has always been annoying. Maybe we should just create a separate data type for external videos

I am also in favor of this solution and would prefer it. The new data type ExternalImageSeries (?) could inherit from a base Events type https://github.com/rly/ndx-events since it would consist of just timestamps and a dataset of filenames.

@oruebel
Copy link
Contributor

oruebel commented Aug 7, 2020

he new data type ExternalImageSeries (?) could inherit from a base Events

I'm fine with creating a separate type, but I don't fully understand why this would be treated as Events? Isn't this just a regular timeseries (with timestamps, data, unit etc.) where the only difference is that the data is a 1D array of strings indicating the file that corresponds to the respective image frame.

@rly
Copy link
Contributor Author

rly commented Aug 7, 2020

he new data type ExternalImageSeries (?) could inherit from a base Events

I'm fine with creating a separate type, but I don't fully understand why this would be treated as Events? Isn't this just a regular timeseries (with timestamps, data, unit etc.) where the only difference is that the data is a 1D array of strings indicating the file that corresponds to the respective image frame.

So if a single tiff file is the external image file for 300 timestamps, then data would just be an array consisting of that filename string 300 times?

That would be fine with me. It uses a bit more space than it needs to, but it is clear and explicit.

@bendichter
Copy link
Contributor

Do we need to replicate the filename 300 times? That's even worse than the current solution.

@rly rly added this to the NWB 2.4.0 milestone Aug 17, 2020
@bendichter
Copy link
Contributor

need to test creating an empty array in MatNWB

@rly rly changed the title Make data required for ImageSeries, update docs Add ExternalImageSeries, require ImageSeries.data Aug 2, 2021
@rly
Copy link
Contributor Author

rly commented Aug 2, 2021

After discussion with @bendichter, we decided to:

  1. Add a new type ExternalImageSeries which inherits from NWBDataInterface. It contains a dataset of external file paths and an attribute indicating the frame within the image series at which each external file begins (copied from the old ImageSeries, but I improved the documentation).
    • Q: Unlike all other types that end in Series, this is not a TimeSeries. I am OK with this exception, but it should be noted. What do you think?
    • This does not inherit from the Events type because we want to allow users to specify a starting time and sampling rate, which the Events type does not support. We could add that option for the Events type, but it does not seem useful for the vast majority of use cases of the Events type. Thus, this type inherits from NWBDataInterface and has both a timestamps dataset and a starting_time dataset, just like TimeSeries.
  2. Make ImageSeries/data required
  3. Discourage the use of ImageSeries/external_file and setting ImageSeries/format to something other than the default value of 'raw' with the long-term goal of deprecating it.

@rly rly requested review from oruebel, ajtritt and bendichter and removed request for bendichter and ajtritt August 2, 2021 22:48
core/nwb.image.yaml Show resolved Hide resolved
@rly
Copy link
Contributor Author

rly commented Aug 3, 2021

@bendichter How are labs using storing time series of multi-channel image data in the NWB file, if at all? According to the dims and doc for ImageSeries, the data of an ImageSeries should either be N single-channel images (e.g., grayscale) or N single-channel volumes. Is anyone storing N multi-channel images (e.g., RGB or RGBA) in an ImageSeries? If so, that is not the documented use of this type, and it would not be possible for a user or machine to distinguish between a volume and a multi-channel image. AFAIK, there is no type that supports storing N multi-channel images, except perhaps IndexSeries.

Would it be better to have an ImageSeries type for images and a VolumeSeries type for volumes?

@rly
Copy link
Contributor Author

rly commented Aug 3, 2021

See also #316 where we changed OpticalSeries to have dimensions (frame, x, y, rgb)

@bendichter
Copy link
Contributor

We could call it an ExternalImageStack if we want to maintain consistency that a "Series" is always a TimeSeries

@rly
Copy link
Contributor Author

rly commented Aug 3, 2021

Also worth noting with this change is that all types that extend ImageSeries and use external files will need to have a corresponding new External type. E.g. ExternalTwoPhotonSeries. That's doable but not ideal...

@rly rly closed this Aug 3, 2021
@rly rly reopened this Aug 3, 2021
@rly
Copy link
Contributor Author

rly commented Aug 3, 2021

To summarize this issue and PR:

Problem:
The ImageSeries neurodata_type is a subtype of TimeSeries. TimeSeries requires the dataset 'data'; however, ImageSeries makes 'data' optional because 'external_file' may be provided in place of data. This breaks the rules of inheritance -- a child must have all required fields of a parent, and this inconsistency creates challenges in the API and validation.

The main use cases of ImageSeries (AFAIK) are:

  • user wants to store a time series of images acquired from the brain (e.g., two-photon images, which are often single multi-image/page TIFF files or an ordered set of single-image image files), either in the original image format to maximize interoperability with other tools or embedded as raw values in the file to maximize portability
  • user wants to store a time series of image stimuli presented to the animal, either in the original image format to maximize interoperability with other tools or embedded as raw values in the file to maximize portability
  • user wants to store video recordings of the animal / environment, in the original video format (e.g., MP4) because HDF5 video compression is poor
  • user/developer wants to display images acquired from the brain, video recordings, or image stimuli individually or as an animated movie
  • user/developer wants to perform computation on the images individually or all together (e.g., mean, filter, ROI extraction, pose estimation)

In the majority of cases, the user wants to store the raw image/video data outside of the NWB file to maximize interoperability with other tools.

Some possible solutions:

Option 1: Make ImageSeries/data required. If 'external_file' is provided, then 'data' should be set to an empty 3D array.
Pros:

  • Space-efficient

Cons:

  • If a starting time & rate are provided, then there is no way to know how many images were presented without counting the number of frames of the individual external files.

Option 2: Make ImageSeries/data required. If 'external_file' is provided, then 'data' should be set to a 1D array with the same length as timestamps where each element is the filename of the external file corresponding to that time. For example, if 'external_file' has 1 file with 300 frames, then 'data' is a string array containing 'filename.tiff' repeated 300 times. If 'external_file' has 10 files each with 1 frame, then 'data' has the 10 filenames, i.e., the same as 'external_file'.
Pros:

  • Clear format

Cons:

  • A little harder to enter the data correctly (e.g., in MatNWB)
  • Not as space-efficient but usually there aren't that many image files

Option 3: Make ImageSeries/data required. Aim to deprecate the use of ImageSeries and types that extend it for storing external image files. Create a new type ExternalImageSeries and corresponding types that extend it for storing external image files. Other types that link to an ImageSeries will now also need to link to an ExternalImageSeries (or a new common base type).
Pros:

  • Clear delineation between the two types of image storing methods
  • No rules within ImageSeries or new types that require understanding the documentation (e.g., if field1 is x, then field2 is required and field3 should be set this way) -- these are confusing and not easily validatable
  • Space-efficient

Cons:

  • More types, confusing

Option 4: Make new ImageObjectSeries where 'data' is a dataset of references to Image or ExternalImage objects (is reftype: A or B possible? or do we need to make a BaseImage type?). Add new ExternalImage object that is a reference to a single file and an optional index into that file for multi-image files. Aim to deprecate the use of ImageSeries in favor of this type.
Pros:

  • Reduces duplication of how image data are represented across types

Cons:

  • Datasets of references are hard to inspect outside of the APIs

@bendichter
Copy link
Contributor

r How are labs using storing time series of multi-channel image data in the NWB file, if at all? According to the dims and doc for ImageSeries, the data of an ImageSeries should either be N single-channel images (e.g., grayscale) or N single-channel volumes. Is anyone storing N multi-channel images (e.g., RGB or RGBA) in an ImageSeries? If so, that is not the documented use of this type, and it would not be possible for a user or machine to distinguish between a volume and a multi-channel image. AFAIK, there is no type that supports storing N multi-channel images, except perhaps IndexSeries.

Would it be better to have an ImageSeries type for images and a VolumeSeries type for volumes?

Labs with multiple channels are storing them as multiple ImageSeries or TwoPhotonSeries objects

@rly rly marked this pull request as draft August 5, 2021 18:25
@rly rly changed the title Add ExternalImageSeries, require ImageSeries.data Add ExternalImageSeries Aug 5, 2021
@rly rly modified the milestones: NWB 2.4.0, NWB 2.5.0 Aug 5, 2021
@rly
Copy link
Contributor Author

rly commented Aug 5, 2021

After discussion with @oruebel, we have decided to go with Option 1 for now and put the other changes in this PR regarding handling of external files on hold for NWB 2.5.0. This will give us more time to solicit input from relevant parties and decide on the best course of action.

Additionally, we had a few more thoughts:
Option 5. Make ImageSeries/data required. Change external_file to be a list of filenames, one per timestamp. If external_file is provided, then data should be a 1-D array of frame numbers of the corresponding file in external_file.

Option 4 is not ideal for image data stored in the file because it could result in 100s or 1000s of Image objects stored non-contiguously with each other. This results in inefficient data access for reading a stack, computing across images in a stack, and compressing image data in a stack.

Option 3 is not ideal because all extensions of ImageSeries would have to define an extension of ExternalImageSeries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants