Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring spot finding #1450

Closed
shanaxel42 opened this issue Jul 16, 2019 · 6 comments
Closed

Refactoring spot finding #1450

shanaxel42 opened this issue Jul 16, 2019 · 6 comments
Assignees
Labels
Milestone

Comments

@shanaxel42
Copy link
Collaborator

shanaxel42 commented Jul 16, 2019

Design Proposal

The approach that seqFish takes to spot finding breaks some of our current processing modals. The existing spot finding ecosystem was built up pretty independently for each assay and makes reusability difficult. This proposal refactors our current spot finders into a few small packages that are hopefully modular enough to support seqFIsh and any other future spot finding approach.

starfish.spots would include the following 3 packages:

  • LocateSpots (find the locations of spots in an image, or imagestack)
  • MeasureSpots (measure the intensity of spots at a list of locations)
  • DecodeSpots (decode the value of spots with intensities)

To support each step the following three data models are required

-Spot Attributes (a dataframe representing spots and their attributes x,y,z,radius, and optional info: passes threshold, target, intensity)

-IntensityTable (aka spot traces, a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value. )

-DecodedIntensityTable(aka decdoded spot traces. a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value AND their decoded target value)

The inputs and outputs of these packages would be:

  • imagestack & -> LocateSpots -> SpotAttributes
    spot finding algorithm

  • spotAttributes & imagestack -> MeasureSpots -> IntensityTable

  • IntensityTable & -> DecodeSpots -> DecodedIntensityTable
    Codebook

This introduces the concept of a DecodedIntensityTable which would just be a subclass of IntensityTable that includes a 'targets' dimension. This is a minor change that I think will reduce a lot of confusion around the outputs of each step.

In order to incorporate methods that use local spot finding like SeqFish. We also need to add the concept of a "spot id" to the SpotAttributes Dataframe. This will allow us to use the same spot in multiple possible groupings and track it.

So what does this mean for current spotFinders:

ex. osmFISH
osmFish currently uses a LocalMaxPeakFinder with no reference image. It's new workflow would be:

lmp = LocateSpots.LocalMaxPeakFinder(
    min_distance=6,
    stringency=0,
    min_obj_area=6,
    max_obj_area=600,
)
spot_locations = lmp.run(imagestack)

ms = MeasureSpots.MeasureSpotsAlgorithm()
intensityTable = ms.run(spot_locations, imagestack)

ex. ISS
ISS currently uses a blob detector on a blobs images. It's new workflow would be:

p = LocateSpots.BlobDetector(
    min_sigma=1,
    max_sigma=10,
    num_sigma=30,
    threshold=0.01,
    measurement_type='mean',
)
spot_locations = p.run(image=dots)
m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=spot_locations)

ex. BaristaSeq
baristaSeq currently uses PixelSpotDecoding. Pixel spot decoding will remain it's own module that does both locating and decoding.

ex. StarMap
Starmap currently uses LocalSearchBlobDetector. It's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)

smFish
smFish currently used TrackpyLocalMaxPeakFinder with no reference image, it's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)
@shanaxel42 shanaxel42 added this to the 0.2.0 milestone Jul 16, 2019
@ambrosejcarr ambrosejcarr changed the title Harmonize of spot finding Harmonization of spot finding Jul 18, 2019
@shanaxel42 shanaxel42 changed the title Harmonization of spot finding Refactoring spot finding Jul 23, 2019
@ambrosejcarr
Copy link
Member

ambrosejcarr commented Jul 24, 2019

I think you're on to something here. :-)

I have some clarifying questions.


LocateSpots (find the locations of spots in an image)

is "an image" here an arbitrary series of 2d or 3d planes?


For the ISS example, should

p = DetectSpots.BlobDetector(

be p = LocateSpots.BlobDetector(?


A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this
output, and won't be a good substrate for MeasureSpots.


MeasureSpots (measure the intensity of spots at a list of locations)

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Also, MeasureSpots is interesting. There are lots of options here such as taking the max/average intensity of a gaussian blob, ellipse, or ellipsoid. Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

It's also worth noting that the skimage.feature.peak_local_max is used in skimage.feature.blob_log; we could potentially simplify our code a bit if we're breaking measurement out of spot finding.


decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

@ttung
Copy link
Collaborator

ttung commented Jul 24, 2019

My opinions (and therefore @shanaxel42 should chime in with her opinion).

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this
output, and won't be a good substrate for MeasureSpots.

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c? Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

@ambrosejcarr
Copy link
Member

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c?

I think we would, because...

Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

This would decrease precision of the outcome by averaging over regions with signal and no signal, or in cases with crowded signal, produce the wrong result.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

👍

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

I was trying to articulate that I'm not confident that simply reporting (z, y, x, r, c, radius-or-equivalent) is always adequate to determine the spot intensity. I suspect we can do this, but as I understand the proposal, we'll need to dig into blob_log and trackpy.locate to understand what they're doing to measure intensity and factor that out.

@shanaxel42
Copy link
Collaborator Author

shanaxel42 commented Jul 24, 2019

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

Yes SpotLocate will always accept a 5d tensor and will work the same as the is_volume does now. So it will either apply the function to 2d or 3d tiles.

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

trackpy currently creates SpotAttribute with 'y', 'x', 'total_intensity', 'radius', 'eccentricity', 'intensity', 'raw_mass', 'ep' then uses the regular 'measure_spots` method the rest of the algorithms do so I see this fitting pretty nicely into the rewrite. I'm fine with adding more attributes to SpotAttribtues as long as we maintain that it's the output of LocatSpots and that MeasureSpots is a separate step.

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Depends on the SpotLocate method used, as above. But always a SpotAttributes dataframe with at least x,y positions.

decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

Since the output of LocateSpots will always be a SpotAttibutes dataframe, a simple step is left in the Pixel scenario to turn that into a DecodedIntensityTable, Decode.Lookup() would just do that last bit of mapping the key values of each spot to it's corresponding gene value. For the rest of the spot finding examples the last output is a regular IntensityTable, so the last step is just normal decoding to go from IntensityTable -> DecodedIntensityTable. The same as things work now.

For the ISS example, should

p = DetectSpots.BlobDetector(
be p = LocateSpots.BlobDetector(?

yuuup my bad.

@ttung
Copy link
Collaborator

ttung commented Jul 29, 2019

I strongly suspect that a table with a feature_id column and a spot_id column might work well as a data structure that accommodates all our decoding methodologies. It may complicate how users configure the spot detection methods, but might make it possible to have one data structure for everything.

@shanaxel42
Copy link
Collaborator Author

Closing in favor of new direction detailed here: #1500

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants