lzma package is missing for cram files which were compressed with the use_lzma samtools flag #129

Closed
Imoteph opened this issue Oct 27, 2023 · 15 comments


@Imoteph

Imoteph commented Oct 27, 2023

Dear @cmdcolin

We found another bug: CRAM files that were additionally compressed with lzma cannot be read, because lzma decompression is not available.


Maybe this is similar to the bzip2 issue, and the library just needs to be added? (https://github.com/LZMA-JS/LZMA-JS)

@Imoteph
Author

Imoteph commented Oct 27, 2023

Okay, I just saw on the main page that this is actually a known limitation.

Is it possible to get this feature?

@cmdcolin
Contributor

are you seeing these "in the wild"? I haven't looked into lzma, but it could potentially be added if we can find a way to do so. Bzip2 was kind of lucky because I had just found a library on npm for it. I see there might be one for lzma too, but it would have to be tested

@Imoteph
Author

Imoteph commented Oct 27, 2023

Yes. It is not so common, but in our case all our alignments (genomic, single-cell, and transcriptomic) were compressed with both flags to reduce the footprint on our limited disk space.
In CRAM, lzma is enabled automatically at compression levels above 8.

@cmdcolin
Contributor

unfortunately, it looks like lzma-js might not work as a drop-in solution. I created this branch to test: https://github.com/GMOD/cram-js/tree/lzma_not_working-lzmajs

you can clone the repo, check out that branch, and run "yarn; yarn test -t lzma" to see it report "corrupted input". the cram spec says it operates on the 'stream format' of xz, and I'm not sure how to get lzma-js to read that: https://samtools.github.io/hts-specs/CRAMv3.pdf
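
One way to narrow down a "corrupted input" error like this is to check which container the compressed block actually uses. The sketch below only tests for the 6-byte magic number that opens an xz stream (FD 37 7A 58 5A 00). The function name `hasXzMagic` is hypothetical, and the idea that lzma-js fails because it expects a different container (for example the legacy .lzma header) is an assumption that has not been verified in this thread.

```typescript
// Magic bytes at the start of every xz stream: 0xFD '7' 'z' 'X' 'Z' 0x00.
const XZ_MAGIC: number[] = [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00]

// Hypothetical helper: returns true if the buffer starts with the xz
// stream magic. It does not validate the rest of the stream.
function hasXzMagic(bytes: Uint8Array): boolean {
  if (bytes.length < XZ_MAGIC.length) {
    return false
  }
  return XZ_MAGIC.every((expected, i) => bytes[i] === expected)
}
```

If the check returns true for the block contents, the data is in the xz container, and a decoder that only understands another LZMA framing would be expected to reject it.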

note that lzma-native does potentially work, but native dependencies are generally a big problem to add to the browser, so it's a bit of a non-starter: https://github.com/GMOD/cram-js/tree/lzma_working_with_lzma-native

@Imoteph
Author

Imoteph commented Oct 30, 2023

@cmdcolin thanks for your response and help!

My colleague found https://github.com/robey/node-xz. Maybe it's a better fit with regard to the streaming requirement?
We are not sure, as neither of us has any experience with JavaScript.

If we use the branch with lzma-native, how could we configure things so that JBrowseR/JBrowse picks it up?

@cmdcolin
Contributor

I would be surprised if it's even reasonably possible (nothing is impossible lol) to get it working in JBrowseR. there is no information at https://github.com/addaleax/lzma-native about compiling for the browser, which is part of why I say it's a non-starter. it would at the very least involve spinning up WebAssembly toolchains, but that is not documented in lzma-native, so I don't know if it can be done.

@cmdcolin
Contributor

node-xz probably has similar challenges: it uses native C++ code and there is no documented way of using it in the browser. example thread here: https://github.com/robey/node-xz/issues/17

@cmdcolin
Contributor

cmdcolin commented Oct 30, 2023

the most realistic way forward is finding a pure js lzma decompressor that works, which could involve debugging the lzma-js situation (you're welcome to try this if you want to dig into the bits and bytes) or finding some other pure js lzma library

@cmdcolin
Contributor

I went on a little exploration to see if this could be done with available packages, and found the package xz-decompress. It is interesting because it uses native code compiled to wasm, so it is not pure js, but it worked in a simple test case
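
A minimal sketch of how a stream-based decompressor could be applied to an in-memory block, assuming a runtime with the web `ReadableStream` and `Response` globals (browsers, Node 18+). The names `decompressWith` and `StreamWrapper` are hypothetical. The `XzReadableStream` class mentioned in the comment is how I understand the xz-decompress README, so treat it as an assumption rather than a confirmed API, and this is not necessarily how the sample branch does it.

```typescript
// A function that takes a compressed byte stream and returns the
// decompressed byte stream. With xz-decompress this would presumably be
//   (s) => new XzReadableStream(s)
// (assumed from that package's README, not verified here).
type StreamWrapper = (
  compressed: ReadableStream<Uint8Array>,
) => ReadableStream<Uint8Array>

// Hypothetical helper: feeds one in-memory block through a stream-based
// decompressor and collects the output into a single Uint8Array.
async function decompressWith(
  block: Uint8Array,
  wrap: StreamWrapper,
): Promise<Uint8Array> {
  // Expose the block as a one-chunk ReadableStream.
  const source = new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(block)
      controller.close()
    },
  })
  // Response is used only as a convenient way to drain the stream.
  const drained = new Response(wrap(source))
  return new Uint8Array(await drained.arrayBuffer())
}
```

Because the decompressor is passed in as a parameter, the same helper can be exercised with a pass-through wrapper in tests and with a real xz stream wrapper in the browser.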

I made a sample branch in JBrowseR that uses it if you want to try it out @Imoteph

GMOD/JBrowseR#29

you can use devtools::install_github with that branch (I think something like devtools::install_github('GMOD/JBrowseR', ref='lzma_example'))

I will have to do a little more research into this but I thought you might be interested

@Imoteph
Author

Imoteph commented Nov 1, 2023

Yes, this looks really good: GMOD/JBrowseR#29 !!

Thanks for your exploration!

@Imoteph
Author

Imoteph commented Nov 14, 2023

Just wondering whether you consider the solution you found sound enough to be merged into master?

@cmdcolin
Contributor

I think it can probably be merged. I might do a little more testing, but I'm pretty happy with how it looks so far

@cmdcolin
Contributor

linked PR here #130

@cmdcolin
Contributor

cmdcolin commented Dec 1, 2023

I published a new version of cram-js, v2.0.0, with lzma support. it's a major version since it includes some WebAssembly. It will take some time to get this added to (step 1) @jbrowse/react-linear-genome-view and then (step 2) JBrowseR from there

@cmdcolin
Contributor

cmdcolin commented Dec 21, 2023

@Imoteph this is now fully released in @gmod/cram@v2.0.0, then @jbrowse/react-linear-genome-view@v2.10.0, and finally bundled into JBrowseR@v0.10.2, which was released to CRAN yesterday (our first JBrowseR release in a long time!)

hope that helps :)
