Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring Spot Finding TAKE 2 #1500

Closed
shanaxel42 opened this issue Aug 20, 2019 · 10 comments
Closed

Refactoring Spot Finding TAKE 2 #1500

shanaxel42 opened this issue Aug 20, 2019 · 10 comments
Assignees

Comments

@shanaxel42
Copy link
Collaborator

shanaxel42 commented Aug 20, 2019

Spot Finding

As I started to execute on the original small module design proposal #1450 I realized each method didn't fit quite as nicely as I'd hoped. The outputs of each spot finding method were two varied and it would have caused too much confusion around the difference between a located spot and a spot trace. After some more speculation I've been coming around to the idea that what spot finding needs is not smaller modules but instead more abstraction. This proposal represents the "more abstraction" approach:

starfish.spots would include the following two packages:

  • FindSpots
  • DecodeSpots

Their base contracts would be:

ImageStack -> FindSpots -> SpotFindingResults

codebook,
SpotFindingResults -> DecodeSpots -> DecodedIntensityTable

FindSpots will handle
-locating and measuring spots in every r/ch pair

DecodeSpots will handle
-building spot traces
-decoding the spots traces

To support each step the following two new data models are required:

  • SpotFindingResults : a dictionary of round/ch indices and their corresponding SpotAttribtues

  • DecodedIntensityTable : aka decoded spot traces. A representation of spots, their attributes and their corresponding traces, for mulitplexed methods this trace is a vector. For non multiplexed methods the trace is a singe value. Each spot also has a decoded target value

With this refactor we are abstracting the idea of an IntensityTable from the user. Instead spot traces will be built when needed as part of the decoding step. From a user perspective they will go straight from finding spots to decoding them.

The new workflows for each notebook would be:

osmFISH*
osmFish currently uses a LocalMaxPeakFinder with no reference image. It's new workflow would be:

lmp = FindSpots.LocalMaxPeakFinder(
    min_distance=6,
    stringency=0,
    min_obj_area=6,
    max_obj_area=600,
)
spot_results= lmp.run(imagestack)

decoder = Decode.PerRoundMaxChannel(codebook)
decoded_intensites = decoder.run(spot_results)

ISS
ISS currently uses a blob detector on a blobs images. It's new workflow would be:

p = FindSpots.BlobDetector(
    min_sigma=1,
    max_sigma=10,
    num_sigma=30,
    threshold=0.01,
    measurement_type='mean',
)
spot_results = p.run(reference_image=dots, image=imagestack)

decoder = Decode.PerRoundMaxChannel(codebook)
decoded_intensites = decoder.run(spot_results)

BaristaSeq
Uses PixelSpotDecoder. I think trying to rework PixelSpotDecoding into this modal is too much work for it's worth. But I do think that explaining it as just a Decoder that goes directly from an ImageStack to DecdodedIntensityTable will be clearer then what we currently have.

starMap
Starmap currently uses LocalSearchBlobDetector. It's new workflow would be:
We'll actually be able to delete LocalSearchBlobDetector after this refactor and instead use a combination of regular BlobDetector and trace building using a 'nearest_neighbors' method.

spot_finder = FindSpots.BlobDetector()
spot_results = spot_finder.run(imagestack)

decoder = Decode.PerRoundMaxChannel(codebook)
decoded_intensites = decoder.run(spot_locations, build_traces='nearest_neighbor')

smFISH
smFish currently used TrackpyLocalMaxPeakFinder with no reference image, it's new workflow would be:

spot_locater = FindSpots.LocalSearchBlob()
spot_results = spot_locater.run(imagestack)

decoder = Decode.PerRoundMaxChannel(codebook)
decoded_intensites = decoder.run(spot_results)

The high level plan of attack for the refactor is:

  • Create DecodedIntensityTable (already done Adding DecodedIntensityTable class #1489)
  • Create FindSpots module with contract ImageStack -> SpotFindingResults
  • Create DecodeSpots module with contract SpotFindingResults -> DecodedIntensityTable
  • Fix provenance logging with new modules
  • Move measure_spot_intensity_code to a utility file in the FindSpots module
  • Move existing decoders to DecodeSpots module.
  • Refactor existing spot finding methods to always take in an imagestack and use and output SpotFindingResults (I'll probably first just make copies of each one under the new FindSpots module so I don't break everything)
  • Refactor existing notebooks to use this new framework
  • Delete the DetectSpots package.

By abstracting the step of building spot traces to decoding we allow for more flexibility in the data structures and process used. For example Gabriel's graph based approach #1482

For SeqFish the workflow would be LocateSpots.LocalSearchBlob() to get the spot locations. And then a new Decoder that would handle creating the possible trace groupings and decoding.

@ttung
Copy link
Collaborator

ttung commented Aug 20, 2019

I have this notion that there's an assay that uses the max projected image to locate spots, but gets the intensities from original image. How would that work in this model?

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Aug 20, 2019

This looks great! I have a couple questions but overall the flow of the refactor looks solid.

Create LocateSpots module with contract ImageStack -> SpotAttributes

This is more properly a rename operation on DetectSpots, followed by exchanging IntensityTable construction with a container-like SpotAttributes class that mimics dict[Indices, DataFrame] right?

Move measure_spot_intensity_code to a utility file in the DecodeSpots module

Can you remind me why DecodeSpots needs the "measure" operation instead of simply putting it with LocateSpots? I get that the now defunct GroupSpots needs to be in DecodeSpots. If MeasureSpots stayed in LocateSpots, the ImageStack would always be present for measurement, and the SpotAttributes would always have an intensity field. This would address Tony's question too, I think.

[...] I think trying to rework PixelSpotDecoding into this modal is too much work for it's worth. But I do think that explaining it as just a Decoder that goes directly from an ImageStack to DecdodedIntensityTable will be clearer then what we currently have.

Agree, but the PixelSpotDecoder use the same decoders as the other spot finders. If you're modifying DecodeSpots to make it emit a 2D IntensityTable with shape (features, r * c), this will need to be ported over to the Pixel approach soon after this refactor.

Create DecodedIntensityTable (already done #1489)

Semantic question: If we're deleting the user-facing IntensityTable in favor of SpotAttributes, could we just use the existing IntensityTable as the output of decode (in lieu of the more complex name DecodedIntensityTable), but require it always have a gene field?

Raising for later: the same confusion that prompted you to address IntensityTable -> DecodedIntensityTable will make you want to create DecodedIntensityTableWithCellAssignments. Not sure what the right solution here is. :-)

@shanaxel42
Copy link
Collaborator Author

I have this notion that there's an assay that uses the max projected image to locate spots, but gets the intensities from original image. How would that work in this model?

So you'd LocateSpots on the max projected image to get the spot locations then pass the imagestack into Decode for the measure_intensities step. It should work the same as if they did the spot finding on a reference image.

Can you remind me why DecodeSpots needs the "measure" operation instead of simply putting it with LocateSpots? I get that the now defunct GroupSpots needs to be in DecodeSpots. If MeasureSpots stayed in LocateSpots, the ImageStack would always be present for measurement, and the SpotAttributes would always have an intensity field. This would address Tony's question too, I think.

I wanted to keep LocateSpots completely in individual spot space (not traces). by adding the measure spot intensities step there you have spot traces as a potential output of LocateSpots, but not always (like in the case of LocalSearchBlobDetector). This is essentially the model we have now with IntensityTable being the output of Detect.

Semantic question: If we're deleting the user-facing IntensityTable in favor of SpotAttributes, could we just use the existing IntensityTable as the output of decode (in lieu of the more complex name DecodedIntensityTable), but require it always have a gene field?

We could. What I'd really like to do is actually just rename DecodedIntensityTable to DecodedSpotTraces to make things more clear. But I'm also wary of introducing too many changes, so open to just having IntensityTable.

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Aug 20, 2019

I wanted to keep LocateSpots completely in individual spot space (not traces). by adding the measure spot intensities step there you have spot traces as a potential output of LocateSpots, but not always (like in the case of LocalSearchBlobDetector). This is essentially the model we have now with IntensityTable being the output of Detect.

I either disagree or don't understand. The relationship between spot and the measured intensity is 1:1. There's no efficiency gained by building traces at the time of measurement; measurement and trace building should be safely separable. It should be possible for MeasureSpots to live in LocateSpots and GroupSpots to live in DecodeSpots unless I'm missing something important.

@shanaxel42
Copy link
Collaborator Author

shanaxel42 commented Aug 20, 2019

The relationship between spot and the measured intensity is 1:1

This is not always true. There's a difference between a spot found in a reference image and a spot found in an imagestack using local search. For the later, there's a need to treat these spots as unique entities you can group in varying ways to build your traces (seqfish). For the first I agree the spot and measured trace are 1:1. Maybe the confusion is around what the measure spot intensities code is doing?

@ambrosejcarr
Copy link
Member

This is not always true. There's a difference between a spot found in a reference image and a spot found in an imagestack using local search. For the later, there's a need to treat these spots as unique entities you can group in varying ways to build your traces (seqfish)

Maybe! I think measure spot intensities is identifying the intensity of a spot given a set of indices.

Concretely I see it as a method that is run on each SpotAttributes table in cases where the Intensity is not generated by LocateSpots:

detect_spots(SpotAttributes[-intensity], ImageStack) -> SpotAttributes[+intensity]

And, I'm arguing this should be in LocateSpots because you've defined the chain of operations as:

def LocateSpots(ImageStack) -> SpotAttributes[?intensity]
    pass
def DecodeSpots(SpotAttributes, Optional[ImageStack]) -> IntensityTable
    pass

But if my suggestion is feasible, we'd instead have:

def LocateAndMeasureSpots(ImageStack) -> SpotAttributes
    pass
def DecodeSpots(SpotAttributes) -> IntensityTable
    pass

Addressing your question, I think that In both of your cases, the intensity of each spot can be identified during the localization. I agree that merging into traces is different, but don't see why that must be related to measuring the spot.

I recall during our earlier discussions we decided that the spot could be measured anywhere in the chain of operations because SeqFISH doesn't really use the spot intensity in its approach.

@shanaxel42
Copy link
Collaborator Author

shanaxel42 commented Aug 20, 2019

ah ok yes we're thinking of two different things. So the existing code within measure_spot_intensities builds your spot traces based off locations found in a reference or max projected image then outputs an intensityTable. The method should really be called build_spot_traces or something. And I'm arguing that anything related to building traces or grouping spots into traces should happen in the decoding step. So I think we agree, and yea in that scenario I could move the measuring of the individual spot intensities to the Locate step but it's slightly more work.

@ambrosejcarr
Copy link
Member

Ok, great. I agree -- "trace" building should all go to DetectSpots.

I feel, based on how it simplifies the contracts, that the refactor to put MeasureSpots into LocateSpots is worthwhile, and in the long run will result in simpler, cleaner code.

@ttung
Copy link
Collaborator

ttung commented Aug 20, 2019

I'm generally confused. This seems pretty similar to the original proposal in #1450 except there's no MeasureSpots.

Also, what is the distinction between SpotAttributes and IntensityTable?

@ttung
Copy link
Collaborator

ttung commented Aug 20, 2019

I have to dig out my notes for what we talked about in Toronto...

@ambrosejcarr
Copy link
Member

ambrosejcarr commented Aug 20, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants