Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File hangs both apps #92

Closed
AlistairNWard opened this issue Jul 31, 2018 · 11 comments
Closed

File hangs both apps #92

AlistairNWard opened this issue Jul 31, 2018 · 11 comments
Assignees

Comments

@AlistairNWard
Copy link
Member

This file kills everything. Not sure why. Can't even get the inspector open.

https://s3.amazonaws.com/iobio/samples/test_files/Horizon-HD728_Panel1385_Hyb11_HFW2KBBXX.bam

@anderspitman
Copy link
Member

@AlistairNWard on my machine if I wait long enough it does eventually become responsive again, though none of the visualizations show anything. The root problem seems to be that the backend bamReadDepther service is returning a lot more data for this file, and I think we have a processing algorithm with time requirements that increase exponentially with the size of the input. Maybe not O(n^2), but definitely worse than O(n). It appears to be spending all its time in here. My initial inspection makes me think that this loop is the culprit. From what I can tell it shouldn't need to re-loop over keys it's already processed. However, my first hack to bypass that doesn't seem to have fixed the slowdown, so I need to look at it some more. Regardless, even if I find a fix for the slowdown we'll need to be very careful to make sure the results are the same. We should probably extract this algorithm into a testable unit before changing the way it works.

It might be more obvious to @chmille4 what the problem could be.

@AlistairNWard
Copy link
Member Author

We should definitely make sure that the algorithm isn't getting hung up based on the file. We have thrown around the idea of modifying how we handle files of different sizes. For example, #89 deals with handling small files. When files are small enough, there should be no need to sample from the file, rather we can process the whole file. But we'd need something up front to make a decision on what constitutes small enough etc.

@anderspitman
Copy link
Member

anderspitman commented Aug 8, 2018

I think I've identified the attribute of the file that makes it slow, as compared to the demo data. There's a callback invocation for every sequence. It looks like the file here has many more reads than the demo data, even though the readDepther file sizes are similar. Does that sound right?

Anyway, I'll make sure I understand what the read depth stuff is doing and see if I can work out a simpler algorithm.

Feel free to interrupt me if something else is higher priority.

@anderspitman
Copy link
Member

To be more specific, both files are similar in structure for about 90% of the file, corresponding to the first 21 sequences. The demo data ends there, but the hanging file proceeds to fill the last 10% of the file with an additional 3000 sequences, each of which adds a callback invocation.

@anderspitman
Copy link
Member

@AlistairNWard after further investigation, I've identified that all the time is being spent sorting the key names for the dropdown menu on the left. We're using the third party library javascript-natural-sort for the sorting. It does all sorts of crazy stuff internally that we probably don't need. Changing .sort(naturalSort) to .sort() fixes the hanging problem on my system, and still seems to be a reasonable sort order. Do you happen to know whether we really need to use natural sort here?

@AlistairNWard
Copy link
Member Author

Would .sort() yield the chromosome order 1, 10, 11, ..., 2, 20, 21, 22, 3, 4, 5...? If so, this is probably why we do the natural sort so that the chromosomes appear in an order that makes sense to a human. That said, the most common chromosomes will appear in the wheel, so maybe this is less of a concern?

@anderspitman
Copy link
Member

Honestly neither the normal sort or the natural sort really seemed to yield a very good result. For example I noticed chromosomes starting with "ch6" were spread throughout the list, I think because they are different lengths.

@anderspitman
Copy link
Member

Turns out natural sort is pretty close to what we want. Fortunately, using modern browser API's for this is actually pretty fast (see https://stackoverflow.com/a/38641281/943814). It seems to be quick enough using that in place of javascript-natural-sort. Rendering is still taking too long when trying to display all chromosomes. It looks like that's been a problem in the past, since there's a text banner that pops up to inform the user they need to use the dropdown to filter the chromosome they want. I just need to get that working better then I'll create a PR for this.

@AlistairNWard
Copy link
Member Author

This is probably an edge case that we should put aside and deal with after release.

@anderspitman
Copy link
Member

@AlistairNWard are you sure? #103 is pretty much ready to go once I fix the merge conflicts. @chmille4 already took a look at it. That said, it is true that the change is slightly involved and may have broken something unintentionally, so if you want to wait that wouldn't be too much trouble.

@anderspitman
Copy link
Member

Fixed in #103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants