
Deterioration of IlluminaBasecallsToSam performance and memory usage over the past year #1771

alecw opened this issue Jan 18, 2022 · 6 comments

@alecw (Contributor) commented Jan 18, 2022

Bug Report

Affected tool(s)

IlluminaBasecallsToSam

Affected version(s)

  • 2.25.0 .. 2.26.2

Description

IlluminaBasecallsToSam has become slower, and its memory usage has increased, across releases over the past year.

Steps to reproduce

I ran a small job on various versions of Picard:

java -Djava.io.tmpdir=/broad/hptmp/alecw/HFNTVAFX3 \
     -XX:+UseParallelOldGC -XX:ParallelGCThreads=1 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 \
     -Xmx29384m -jar picard.jar IlluminaBasecallsToSam \
     TMP_DIR=/broad/hptmp/alecw/HFNTVAFX3 \
     VALIDATION_STRINGENCY=SILENT \
     BASECALLS_DIR=/broad/mccarroll/dropulation_census/raw_data/2022-01-04/225552339/Data/Intensities/BaseCalls \
     LANE=1 \
     RUN_BARCODE=HFNTVAFX3 \
     NUM_PROCESSORS=1 \
     READ_STRUCTURE=83T83T \
     INCLUDE_NON_PF_READS=false \
     APPLY_EAMSS_FILTER=false \
     MAX_RECORDS_IN_RAM=600000 \
     IGNORE_UNEXPECTED_BARCODES=false \
     SEQUENCING_CENTER=BI \
     OUTPUT=test.bam \
     SAMPLE_ALIAS=xxx \
     LIBRARY_NAME=xxx \
     ADAPTERS_TO_CHECK=null \
     RUN_START_DATE=01/04/2022

Expected behavior

Successful completion in ~15 minutes.

Actual behavior

  • 2.24.2: succeeds in 15 minutes
  • 2.25.0: succeeds in 31 minutes
  • 2.25.6: runs out of memory
  • 2.26.0: runs out of memory
  • 2.26.10: runs out of memory

Let me know if you'd like me to try other versions, or if there is anything else I can do.
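For anyone repeating this comparison, the per-version runs above can be scripted. This is a minimal sketch, not the reporter's actual harness: the jar filenames, the `build_cmd` helper, and the abbreviated argument list are all assumptions, and the commented-out `/usr/bin/time -v` line relies on GNU time (Linux) to report wall-clock time and peak RSS ("Maximum resident set size").

```shell
#!/bin/sh
# Sketch: time the same IlluminaBasecallsToSam job under several Picard releases.

build_cmd() {
    # Hypothetical helper: assemble the java invocation for a given picard jar.
    # Only a few arguments are shown; the full set is in the bug report above.
    jar="$1"
    printf 'java -Xmx29384m -jar %s IlluminaBasecallsToSam LANE=1 NUM_PROCESSORS=1 READ_STRUCTURE=83T83T' "$jar"
}

for version in 2.24.2 2.25.0 2.25.6 2.26.0 2.26.10; do
    cmd=$(build_cmd "picard-${version}.jar")
    echo "== picard ${version} =="
    # Uncomment to actually run and measure (GNU time's -v prints elapsed
    # wall-clock time and "Maximum resident set size", i.e. peak RSS):
    # /usr/bin/time -v $cmd BASECALLS_DIR=... OUTPUT="test-${version}.bam"
done
```

Comparing both wall-clock time and peak RSS per version makes it easy to distinguish a pure slowdown (as in 2.25.0) from the out-of-memory failures seen in 2.25.6 and later.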

@gbggrant (Contributor) commented Feb 2, 2022

Thanks @alecw for this summary of the performance degradation. We have also seen some significant performance degradation for some of these tools.

The degradation may be due to some pull requests that have been contributed to Picard. @tfenne @jacarey do you have any input here?

@jacarey (Collaborator) commented Feb 2, 2022

@gbggrant I'm happy to take a look. Would it be possible to share this data with me as a test set?

@gbggrant (Contributor) commented Feb 2, 2022

@alecw - over to you - is it okay to share this data set with @jacarey?

@alecw (Contributor, Author) commented Feb 2, 2022

The run folder is 21G. Where should I copy it so that Jay can access it?

@alecw (Contributor, Author) commented Feb 3, 2022

@gbggrant, @jacarey: still waiting to hear from you.

@gbggrant (Contributor) commented

@jacarey have you been able to look at the data that was sent along for this?
