decoding method in seqFISH #1485

Closed
pdichiaro opened this issue Aug 6, 2019 · 7 comments · Fixed by #1602
Labels: bug (An issue with an existing feature)

@pdichiaro

After generating an IntensityTable with LocalSearchBlobDetector, I use codebook.decode_per_round_max(intensities) to decode each spot. It does not work (error: "All-NaN slice encountered").
[Screenshot of the traceback ending in the "All-NaN slice encountered" error]

@ambrosejcarr
Member

Thank you for the bug report. We should make this more user-friendly. What's happening is that with this spot finder, sometimes no spot is detected within the search radius for a given round, but those features are still stored in the IntensityTable. The decode_per_round_max method expects one spot per round, and breaks as you observe. However, you probably want to decode the remainder of the spots!

For the time being, the following work-around should suffice. It will decode the valid spots; features that don't meet the one-spot-per-round expectation will simply fail to decode.

# replace NaN entries (rounds with no detected spot) with zeros before decoding
intensities = intensities.fillna(0)
decoded = exp.codebook.decode_per_round_max(intensities)

@pdichiaro
Author

Thanks for your reply.
I was wondering whether the decode_per_round_max method implements any error correction during decoding.
In seqFISH it is necessary to generate a barcode scheme that can tolerate the loss of a single round of hybridization. For example, to decode 100 genes using 4 rounds of hybridization, it should be possible to uniquely assign barcodes to genes even with only 3 signals out of 4.
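For concreteness, here is a minimal sketch (plain numpy, not part of starfish; the function name and array layout are assumptions) of what "tolerates the loss of a single round" means: every pair of codewords must remain distinguishable after any one round is dropped.

import numpy as np

def tolerates_single_round_dropout(codes: np.ndarray) -> bool:
    """codes: hypothetical one-hot codebook of shape (n_genes, n_rounds, n_channels)."""
    n_genes, n_rounds, _ = codes.shape
    for dropped in range(n_rounds):
        # barcodes restricted to the rounds that remain after dropping one
        kept = np.delete(codes, dropped, axis=1).reshape(n_genes, -1)
        # if two genes collapse onto the same restricted barcode,
        # they cannot be distinguished when round `dropped` is lost
        if len({row.tobytes() for row in kept}) < n_genes:
            return False
    return True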

@ambrosejcarr
Member

ambrosejcarr commented Aug 15, 2019

> I was wondering whether the decode_per_round_max method implements any error correction during decoding.
> In seqFISH it is necessary to generate a barcode scheme that can tolerate the loss of a single round of hybridization. For example, to decode 100 genes using 4 rounds of hybridization, it should be possible to uniquely assign barcodes to genes even with only 3 signals out of 4.

No, not yet. We have an open PR from @gapartel (#1482) that implements graph-based distance-aware decoding and provides a nice theoretical solution for this problem. We've also drafted work to implement this decoding in #1450 which @shanaxel42 is actively working on. I'd hesitate to give a tight ETA for these features, though. The spot finding refactor is a bit tricky. :-)

#1311 chronicles the initial work to implement the SeqFISH spot finding and decoding, if you are interested.

@shanaxel42 shanaxel42 added this to the 0.2.0 milestone Aug 19, 2019
@berl
Collaborator

berl commented Sep 21, 2019

I'm going to tack on here since this is still open: I'm seeing some weird behavior with the per-round-max decoding. I start with a DataArray (after the fillna(0) as above); here I've included just 2 particles:

<xarray.DataArray (features: 2, c: 3, r: 13)>
array([[[0.00075 , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ]],

       [[0.001359, 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ]]])
Coordinates:
  * c                (c) int64 0 1 2
  * r                (r) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
    eccentricity     (features) float64 nan nan
    ep               (features) float64 0.0006958 0.0003738
    radius           (features) float64 1.384 1.393
    raw_mass         (features) float64 0.0112 0.02086
    total_intensity  (features) float64 0.01049 0.0201
    x                (features) float64 1.39e+03 1.284e+03
    y                (features) float64 5.179 18.18
    z                (features) float64 1.714 2.009
  * features         (features) int64 0 1
    xc               (features) float64 -4.117e+03 -4.128e+03
    yc               (features) float64 3.665e+03 3.666e+03
    zc               (features) float64 -9.086 -8.997

I apply per-round-max decoding and get this:

<xarray.DecodedIntensityTable (features: 2, c: 3, r: 13)>
array([[[0.00075 , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ]],

       [[0.001359, 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ],
        [0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
         0.      ]]])
Coordinates:
  * c                  (c) int64 0 1 2
  * r                  (r) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
    eccentricity       (features) float64 nan nan
    ep                 (features) float64 0.0006958 0.0003738
    radius             (features) float64 1.384 1.393
    raw_mass           (features) float64 0.0112 0.02086
    total_intensity    (features) float64 0.01049 0.0201
    x                  (features) float64 1.39e+03 1.284e+03
    y                  (features) float64 5.179 18.18
    z                  (features) float64 1.714 2.009
  * features           (features) int64 0 1
    xc                 (features) float64 -4.117e+03 -4.128e+03
    yc                 (features) float64 3.665e+03 3.666e+03
    zc                 (features) float64 -9.086 -8.997
    target             (features) <U2 'Th' 'Th'
    distance           (features) float64 0.0 0.0
    passes_thresholds  (features) bool True True

So both of these features map to the same codeword, Th in this case, and with zero distance.

However, if you go look at that codeword, you find that it looks like this:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
Coordinates:
    target   <U2 'Th'
  * c        (c) int64 0 1 2
  * r        (r) int64 0 1 2 3 4 5 6 7 8 9 10 11 12

which is nowhere near the "per round max" of the input IntensityTables, unless I'm misunderstanding how the per round max is supposed to work (very possible).

Based on the documentation, I expect the above IntensityTable features to decode to NaN, or at least to a codeword that has a non-zero value at the (round, channel) position where these features have non-zero intensity.

@berl
Collaborator

berl commented Sep 21, 2019

this is with

starfish.__version__

'0.1.6'

@ambrosejcarr ambrosejcarr added the bug An issue with an existing feature label Sep 22, 2019
@gapartel

I noticed this behavior too. If all the channels of a given round have 0 intensity (which is the case after fillna(0)), then decode_per_round_max as it stands decodes the first channel as the max, ending up with erroneous sequences if those are present in the codebook. I worked around it by filtering out all the spots with NaN values from the IntensityTable in the graph-based decoding filter that I am developing.
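A tiny numpy illustration of that failure mode (not starfish code): np.argmax returns the first index when all values are equal, so an all-zero round "decodes" to channel 0.

import numpy as np

round_intensities = np.zeros(3)      # three channels, all zero after fillna(0)
print(np.argmax(round_intensities))  # prints 0: channel 0 "wins" the tie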

@ambrosejcarr
Member

Thanks @berl @gapartel and @pdichiaro for surfacing this bug. We're going to tackle this next.

ttung pushed a commit that referenced this issue Oct 8, 2019
1. When the entire row is nan, the decoder chokes.  This is remedied by decoding on an array where the nan values are replaced with 0s.
2. When the entire row is of equal intensity, the `np.argmax` arbitrarily picks the first column as the winner.  That erroneously decodes as ch=0 having the max intensity.  This code detects that scenario, and rewrites the ch to an impossible value in that situation.

Test plan: Wrote tests that failed with the existing code, applied fixes and verified that they now work.
Fixes #1485
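For readers following along, here is a rough sketch of the approach the commit describes (illustrative only, not the actual starfish implementation; the function name and the (features, channels, rounds) layout are assumptions):

import numpy as np

def per_round_argmax(data):
    """data: (features, channels, rounds) intensity array, possibly containing NaNs."""
    filled = np.nan_to_num(data, nan=0.0)                 # remedy 1: decode on a NaN-free copy
    max_channel = filled.argmax(axis=1)                   # winning channel per (feature, round)
    # remedy 2: detect rounds where every channel is equal (e.g. all zero) ...
    all_equal = (filled == filled.max(axis=1, keepdims=True)).all(axis=1)
    # ... and rewrite the winning channel to an impossible index,
    # so that round can never match a real codeword
    max_channel[all_equal] = data.shape[1]
    return max_channel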