calculating memory requirements based on bam file size #86

Open
avilella opened this issue Aug 23, 2018 · 7 comments

Comments

@avilella

Is it possible to estimate the memory requirements of EPIC from the size of the bam files given as input, or from the number of reads they contain? The aim is to give each job the smallest possible instance that will not run out of memory.

@endrebak
Member

endrebak commented Aug 23, 2018

No, I have not thought much about memory usage; I have just done the obvious things to keep it from being a memory hog. But epic is memory-intensive, which is what allows it to create all those nice bigwigs/matrices at the end.
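
In the absence of a formula, one rough way to size an instance is to record the peak memory of a few real runs and fit a line against bam file size. The following is only a minimal sketch of that idea, assuming that peak memory grows roughly linearly with input size (an assumption that has not been verified for epic). The function names `fit_line` and `estimate_memory_gb` and the `safety_factor` parameter are hypothetical and are not part of epic.

```python
def fit_line(sizes_gb, peak_mem_gb):
    """Least-squares fit of peak memory (GB) against bam size (GB).

    Both arguments are measurements from your own earlier runs.
    Returns (slope, intercept).
    """
    n = len(sizes_gb)
    if n < 2 or n != len(peak_mem_gb):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(sizes_gb) / n
    mean_y = sum(peak_mem_gb) / n
    # Sum of squared deviations of the file sizes.
    sxx = sum((x - mean_x) ** 2 for x in sizes_gb)
    if sxx == 0:
        raise ValueError("file sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_gb, peak_mem_gb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_memory_gb(bam_size_gb, slope, intercept, safety_factor=1.2):
    """Predict peak memory for a new bam file, padded by a safety margin.

    safety_factor is a hypothetical knob: 1.2 means 20% headroom.
    """
    return (slope * bam_size_gb + intercept) * safety_factor
```

The measurements have to come from your own data. On Linux, `/usr/bin/time -v` reports the maximum resident set size of a finished run, which can serve as the peak-memory value.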

@endrebak
Member

Having two scripts, one to produce the enriched regions and another one (using the original input files and the enriched regions) to produce bigwigs and matrixes would make the first much more memory efficient and even faster. But as I do not see a paper coming out of epic, I do not have the resources to prioritize it :/

@avilella
Author

avilella commented Aug 28, 2018 via email

@endrebak
Member

It would require some modifications to the current code.

@endrebak
Member

Sorry for being unclear. It was half a note to self. I could create an epic-light that does not preserve any bin info other than what is required to find the enriched regions. But I do not see that coming out anytime soon due to resource constraints.
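
To illustrate what keeping only the minimal bin information could look like, here is a small sketch. It is not epic's code: it assumes reads arrive as `(chromosome, position)` pairs and stores nothing per bin except a read count. The function name `count_reads_per_bin` and the default `bin_size` of 200 are hypothetical choices made for this illustration.

```python
from collections import Counter


def count_reads_per_bin(reads, bin_size=200):
    """Count reads per fixed-width bin, keeping only the counts.

    reads: iterable of (chromosome, position) pairs; it is consumed
    one read at a time, so it can be a generator over a large file.
    Returns a Counter mapping (chromosome, bin_start) to read count.
    """
    counts = Counter()
    for chrom, pos in reads:
        # Round the position down to the start of its bin.
        bin_start = pos - pos % bin_size
        counts[(chrom, bin_start)] += 1
    return counts
```

Because only one integer is stored per occupied bin, memory in this sketch scales with the number of non-empty bins rather than with the number of reads.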

@endrebak
Member

I have realized that in order to get epic accepted as an application note I should reduce the memory requirements greatly. This is not possible for the bigwig/matrix-producing parts, but should be possible when just calling islands. My first priority is getting pyranges out there though, since I see it as a foundational library for genomics/bioinformatics, while epic is just a piece of software for a very specific task. It is in my backlog though.

Unfortunately I cannot guarantee that it will ever happen :/

@endrebak
Member

@avilella See SICER2 at https://github.com/endrebak/SICER2

[Image: memory_sicer2_vs_sicer_no_bigwig]
[Image: speed_sicer2_vs_sicer_no_bigwig]
