Handle SRA experiments with multiple lanes mapped on distinct runs #94

Open
4 tasks
arteymix opened this issue Jul 3, 2024 · 1 comment

Comments

Member

arteymix commented Jul 3, 2024

  • Check the runinfo to make sure the runs can be combined (i.e. same pairedness, insert size, and read length).
  • Make sure we don't process the same file twice.
  • Concatenate the per-run output files into a single file for the experiment.
  • Extract batch information for each run.

Example: https://www.ncbi.nlm.nih.gov/sra/?term=SRX19303543

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR23362510,2023-03-04 00:13:53,2023-02-07 15:21:34,33757271,4287173417,0,127,1642,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362510/SRR23362510.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,E1D7908479DB68AC5BF2D02363843723,74BBC66805FB82F94EB3452E14DF9B20
SRR23362511,2023-03-04 00:13:55,2023-02-07 15:07:32,33582785,4265013695,0,127,1645,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362511/SRR23362511.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,F9EBCCF2EE039048F7DF04362D0B9A7B,1EE5D4D071EDB9940102EC1A47C2012E
SRR23362512,2023-03-04 00:13:55,2023-02-07 15:13:12,33586989,4265547603,0,127,1635,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362512/SRR23362512.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,C6294D9108B441B431D0A79E3FD38AB1,F76192753E8D0311E3BD7E2919B72AF1
SRR23362513,2023-03-04 00:13:55,2023-02-07 15:17:01,33427011,4245230397,0,127,1631,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362513/SRR23362513.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,3967C30F1B279DC47ABA8FFBBA9ADF75,D62A8FB4F31C612652A6E4F8B28A56E2
SRR23362514,2023-03-04 00:13:55,2023-02-07 15:24:06,57117283,7253894941,0,127,2899,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362514/SRR23362514.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,2AC482F669A1A6473F9F344E3A2C240F,342BE1E8386C569711B667C03A3D1184
SRR23362515,2023-03-04 00:13:55,2023-02-07 15:29:43,57138159,7256546193,0,127,2932,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362515/SRR23362515.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,27DECC599ED381BF772E39872B0639F7,66155FA1849CF725DA6A2C2F1EB9D81E
SRR23362516,2023-03-04 00:13:55,2023-02-07 15:28:47,57071250,7248048750,0,127,2914,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362516/SRR23362516.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,2C3181C595DAA7686DB8AD836B9FD9E7,539051CB6A33D15E2F4BBC766FFBA7C2
SRR23362517,2023-03-04 00:13:55,2023-02-07 15:31:39,56810433,7214924991,0,127,2891,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-3/SRR023/23362/SRR23362517/SRR23362517.lite.1,SRX19303543,GSM7031205,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP421366,PRJNA932339,,932339,SRS16703502,SAMN33190761,simple,9606,Homo sapiens,GSM7031205,,,,,,,no,,,,,"MOLECULAR NEUROGENETICS, WALLENBERG NEUROSCIENCE CENTER, LUND UNIVERSITY",SRA1586479,,public,FBA981C516A70A4F42107649F7056FC1,89575037DD543FD8F09124BF8E6DCFAF
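
The first two tasks could be sketched against a runinfo table like the one above. This is only an illustration, not the project's implementation: the function names `group_runs` and `can_combine` are hypothetical, and comparing `LibraryLayout`, `InsertSize`, and `avgLength` is an assumed reading of "same pairedness, insert size, read length". The sample data is a trimmed, five-column extract of the runinfo above.

```python
import csv
import io
from collections import defaultdict

# Columns that must agree across the runs of one experiment before merging.
# Assumption: these three runinfo columns stand in for pairedness,
# insert size, and read length from the checklist.
COMPARED_COLUMNS = ("LibraryLayout", "InsertSize", "avgLength")


def group_runs(runinfo_csv):
    """Group runinfo rows by Experiment, skipping repeated Run accessions."""
    groups = defaultdict(list)
    seen = set()
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        if row["Run"] in seen:
            continue  # never queue the same run twice
        seen.add(row["Run"])
        groups[row["Experiment"]].append(row)
    return dict(groups)


def can_combine(rows):
    """Return True if every row has identical values in COMPARED_COLUMNS."""
    signatures = {tuple(row[col] for col in COMPARED_COLUMNS) for row in rows}
    return len(signatures) <= 1


if __name__ == "__main__":
    # Trimmed extract of the runinfo above (only the columns used here).
    sample = (
        "Run,avgLength,Experiment,LibraryLayout,InsertSize\n"
        "SRR23362510,127,SRX19303543,PAIRED,0\n"
        "SRR23362511,127,SRX19303543,PAIRED,0\n"
        "SRR23362514,127,SRX19303543,PAIRED,0\n"
    )
    for experiment, rows in group_runs(sample).items():
        runs = [row["Run"] for row in rows]
        print(experiment, runs, "combinable:", can_combine(rows))
```

An exact match on `avgLength` may be too strict for runs with trimmed or variable-length reads, so a tolerance might be needed in practice.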
Member Author

arteymix commented Jul 3, 2024

These runs can generally be concatenated safely (even as gzip, since concatenated gzip members still form a valid gzip stream) before being processed further.
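
A minimal sketch of byte-level concatenation of gzip files, to show why no decompression step is needed. The helper names `concatenate_files` and `demo`, the file names, and the two tiny FASTQ records are all made up for illustration; real per-run files would come from the download step.

```python
import gzip
import os
import shutil
import tempfile


def concatenate_files(inputs, output):
    """Append the raw bytes of each input file to output, in the given order."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def demo():
    """Write two gzipped FASTQ files, concatenate them, and read the result."""
    records = [
        ("run1", b"@r1\nACGT\n+\nIIII\n"),  # hypothetical reads
        ("run2", b"@r2\nTTGA\n+\nIIII\n"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for name, record in records:
            path = os.path.join(tmp, name + ".fastq.gz")
            with gzip.open(path, "wb") as fh:
                fh.write(record)
            parts.append(path)
        merged = os.path.join(tmp, "experiment.fastq.gz")
        concatenate_files(parts, merged)
        # The gzip module reads every member of a multi-member file in sequence.
        with gzip.open(merged, "rb") as fh:
            return fh.read()


if __name__ == "__main__":
    print(demo().decode())
```

For paired-end experiments, the mate files would presumably need to be concatenated in the same run order so that read pairs stay aligned between the two merged files.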
