Refactoring spot finding #1450

shanaxel42 · 2019-07-16T16:31:51Z

Design Proposal

The approach that seqFish takes to spot finding breaks some of our current processing modals. The existing spot finding ecosystem was built up pretty independently for each assay and makes reusability difficult. This proposal refactors our current spot finders into a few small packages that are hopefully modular enough to support seqFIsh and any other future spot finding approach.

starfish.spots would include the following 3 packages:

LocateSpots (find the locations of spots in an image, or imagestack)
MeasureSpots (measure the intensity of spots at a list of locations)
DecodeSpots (decode the value of spots with intensities)

To support each step the following three data models are required

-Spot Attributes (a dataframe representing spots and their attributes x,y,z,radius, and optional info: passes threshold, target, intensity)

-IntensityTable (aka spot traces, a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value. )

-DecodedIntensityTable(aka decdoded spot traces. a representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value AND their decoded target value)

The inputs and outputs of these packages would be:

imagestack & -> LocateSpots -> SpotAttributes
spot finding algorithm
spotAttributes & imagestack -> MeasureSpots -> IntensityTable
IntensityTable & -> DecodeSpots -> DecodedIntensityTable
Codebook

This introduces the concept of a DecodedIntensityTable which would just be a subclass of IntensityTable that includes a 'targets' dimension. This is a minor change that I think will reduce a lot of confusion around the outputs of each step.

In order to incorporate methods that use local spot finding like SeqFish. We also need to add the concept of a "spot id" to the SpotAttributes Dataframe. This will allow us to use the same spot in multiple possible groupings and track it.

So what does this mean for current spotFinders:

ex. osmFISH
osmFish currently uses a LocalMaxPeakFinder with no reference image. It's new workflow would be:

lmp = LocateSpots.LocalMaxPeakFinder(
    min_distance=6,
    stringency=0,
    min_obj_area=6,
    max_obj_area=600,
)
spot_locations = lmp.run(imagestack)

ms = MeasureSpots.MeasureSpotsAlgorithm()
intensityTable = ms.run(spot_locations, imagestack)

ex. ISS
ISS currently uses a blob detector on a blobs images. It's new workflow would be:

p = LocateSpots.BlobDetector(
    min_sigma=1,
    max_sigma=10,
    num_sigma=30,
    threshold=0.01,
    measurement_type='mean',
)
spot_locations = p.run(image=dots)
m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=spot_locations)

ex. BaristaSeq
baristaSeq currently uses PixelSpotDecoding. Pixel spot decoding will remain it's own module that does both locating and decoding.

ex. StarMap
Starmap currently uses LocalSearchBlobDetector. It's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)

smFish
smFish currently used TrackpyLocalMaxPeakFinder with no reference image, it's new workflow would be:

spot_locater = LocateSpots.LocalSearchBlob()
spot_locations = spot_locater.run(imagestack)

m = MeasureSpots.MeasureSpotsAlgorithm(measurement_function=np.max, radius_is_gyration=False)
intensities = m.run(data_image=imagestack, spot_attributes=reduced_spot_locations)

The text was updated successfully, but these errors were encountered:

ambrosejcarr · 2019-07-24T00:46:21Z

I think you're on to something here. :-)

I have some clarifying questions.

LocateSpots (find the locations of spots in an image)

is "an image" here an arbitrary series of 2d or 3d planes?

For the ISS example, should

p = DetectSpots.BlobDetector(

be p = LocateSpots.BlobDetector(?

A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this
output, and won't be a good substrate for MeasureSpots.

MeasureSpots (measure the intensity of spots at a list of locations)

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Also, MeasureSpots is interesting. There are lots of options here such as taking the max/average intensity of a gaussian blob, ellipse, or ellipsoid. Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

It's also worth noting that the skimage.feature.peak_local_max is used in skimage.feature.blob_log; we could potentially simplify our code a bit if we're breaking measurement out of spot finding.

decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

ttung · 2019-07-24T05:49:04Z

My opinions (and therefore @shanaxel42 should chime in with her opinion).

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

A hiccup in GroupSpots is that in cases where we engage in fuzzy matching (e.g. LocalBlobDetector, the position of the matched spots won't be the same across rounds and channels, and therefore our existing IntensityTable approach won't be flexible enough to hold this
output, and won't be a good substrate for MeasureSpots.

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c? Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

ambrosejcarr · 2019-07-24T14:21:45Z

is "an image" here an arbitrary series of 2d or 3d planes?

It should be a 5D tensor, with axes that we group by.

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

If we group a bunch of spots with pixel coordinates that are not identical, would we need the coordinates of each spot in each r·c?

I think we would, because...

Would the average location not suffice? MeasureSpots could have an implementation that tolerates some fuzz around where it's doing the measurement.

This would decrease precision of the outcome by averaging over regions with signal and no signal, or in cases with crowded signal, produce the wrong result.

MeasureSpots (measure the intensity of spots at a list of locations)

I think this needs an "axes to measure across". Sequential assays should be doing this across C or Z (not mutually exclusive).

👍

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

are you saying these do more than one thing?

I was trying to articulate that I'm not confident that simply reporting (z, y, x, r, c, radius-or-equivalent) is always adequate to determine the spot intensity. I suspect we can do this, but as I understand the proposal, we'll need to dig into blob_log and trackpy.locate to understand what they're doing to measure intensity and factor that out.

shanaxel42 · 2019-07-24T15:43:04Z

Spot finders really only work on 2d and 3d images, though -- grouping over ([r|c], z, y, x) would likely produce weird discontinuities in the (r, c) dimensions since we don't expect coherent spot locations across rounds or channels. So, refining this, it should be a 5d tensor, and accept grouping over (y, x) or (z, y, x), similar to the is_volume flag we have now?

Yes SpotLocate will always accept a 5d tensor and will work the same as the is_volume does now. So it will either apply the function to 2d or 3d tiles.

Some spot detectors (like the trackpy one) estimate the area in which a fraction of the mass (brightness) of the spot lies. These methods could be tricky to disentangle from their implementations and rewrite in starfish.

trackpy currently creates SpotAttribute with 'y', 'x', 'total_intensity', 'radius', 'eccentricity', 'intensity', 'raw_mass', 'ep' then uses the regular 'measure_spots` method the rest of the algorithms do so I see this fitting pretty nicely into the rewrite. I'm fine with adding more attributes to SpotAttribtues as long as we maintain that it's the output of LocatSpots and that MeasureSpots is a separate step.

Could you flesh out the contract more explicitly? What indices would be passed in? (x, y, z)? (x, y, z, r)? How would these be matched to the images that would be measured.

Depends on the SpotLocate method used, as above. But always a SpotAttributes dataframe with at least x,y positions.

decoded_intensities = starfish.spots.Decode.Lookup()

could you explain why this is needed, and perhaps how decoding would work in the remainder of the spot finding examples?

Since the output of LocateSpots will always be a SpotAttibutes dataframe, a simple step is left in the Pixel scenario to turn that into a DecodedIntensityTable, Decode.Lookup() would just do that last bit of mapping the key values of each spot to it's corresponding gene value. For the rest of the spot finding examples the last output is a regular IntensityTable, so the last step is just normal decoding to go from IntensityTable -> DecodedIntensityTable. The same as things work now.

For the ISS example, should

p = DetectSpots.BlobDetector(
be p = LocateSpots.BlobDetector(?

yuuup my bad.

ttung · 2019-07-29T20:43:24Z

I strongly suspect that a table with a feature_id column and a spot_id column might work well as a data structure that accommodates all our decoding methodologies. It may complicate how users configure the spot detection methods, but might make it possible to have one data structure for everything.

shanaxel42 · 2019-08-26T19:09:49Z

Closing in favor of new direction detailed here: #1500

shanaxel42 added this to the 0.2.0 milestone Jul 16, 2019

shanaxel42 added the Epic label Jul 16, 2019

ambrosejcarr changed the title ~~Harmonize of spot finding~~ Harmonization of spot finding Jul 18, 2019

shanaxel42 changed the title ~~Harmonization of spot finding~~ Refactoring spot finding Jul 23, 2019

shanaxel42 mentioned this issue Aug 5, 2019

Spot_ID missing in IntensityTable using LocalSearchBlobDetector #1479

Closed

ambrosejcarr assigned shanaxel42 Aug 9, 2019

ambrosejcarr mentioned this issue Aug 15, 2019

decoding method in seqFISH #1485

Closed

shanaxel42 mentioned this issue Aug 20, 2019

Refactoring Spot Finding TAKE 2 #1500

Closed

shanaxel42 closed this as completed Aug 26, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring spot finding #1450

Refactoring spot finding #1450

shanaxel42 commented Jul 16, 2019 •

edited

Loading

ambrosejcarr commented Jul 24, 2019 •

edited

Loading

ttung commented Jul 24, 2019

ambrosejcarr commented Jul 24, 2019

shanaxel42 commented Jul 24, 2019 •

edited

Loading

ttung commented Jul 29, 2019

shanaxel42 commented Aug 26, 2019

Refactoring spot finding #1450

Refactoring spot finding #1450

Comments

shanaxel42 commented Jul 16, 2019 • edited Loading

Design Proposal

ambrosejcarr commented Jul 24, 2019 • edited Loading

ttung commented Jul 24, 2019

ambrosejcarr commented Jul 24, 2019

shanaxel42 commented Jul 24, 2019 • edited Loading

ttung commented Jul 29, 2019

shanaxel42 commented Aug 26, 2019

shanaxel42 commented Jul 16, 2019 •

edited

Loading

ambrosejcarr commented Jul 24, 2019 •

edited

Loading

shanaxel42 commented Jul 24, 2019 •

edited

Loading