
Samplesheet (see Sample file format ) or a list of all sample BAM/FASTA(gz)/FASTQ(gz) files (wildcard * accepted). #134

Open
Umair1441 opened this issue Aug 19, 2023 · 15 comments


@Umair1441

Hi,
I have 20 GB of data stored in subdirectories: for example, folder A1 has two subfolders A1-1 and A1-2, and so on. I want to add all the files as input, but I can't understand how to use a sample sheet for that.

Thanks

@t-neumann
Owner

t-neumann commented Aug 21, 2023

Does a wildcard not work? Like */*/*fq.gz?
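For nested layouts like the A1/A1-1, A1-2 folders described above, this is roughly how such a wildcard expands (the directory and file names below are made up for illustration; the glob is expanded by the shell before slamdunk ever sees it):

```shell
# Build a toy layout mirroring the nested folders from the question
# (A1 with subfolders A1-1 and A1-2; names are hypothetical)
mkdir -p data/A1/A1-1 data/A1/A1-2
touch data/A1/A1-1/sample1.fq.gz data/A1/A1-2/sample2.fq.gz

# A two-level wildcard matches files exactly two directories deep
ls data/*/*/*.fq.gz

# In bash, globstar makes ** match any nesting depth
shopt -s globstar
ls data/**/*.fq.gz
```

Because the shell does the expansion, the tool simply receives a flat list of file paths, the same as if they had been typed out one by one.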

@Umair1441
Author

slamdunk all -r hg19.fa -b Hg.bed -o output -rl 100 -ss data/*.fq.gz
I wrote this ...

@Umair1441
Author

Hi.
slamdunk all -r hg19.fa -b Hg.bed -o output -rl 100 -ss data/ *.fq.gz
This command runs for me.
I have 16 files totaling 20 GB of data. The slamdunk command has been running on my server for the last 24 hours; it created one BAM file in 24 hours and then got stuck.
Kindly guide me on this.

@t-neumann
Owner

Is the process itself also stuck or still running? What does top say, does it still use CPU?

@Umair1441
Author

Now I ran the command again, and top shows %CPU at 1466.

I use 16 threads, and I have 16 .fq files totaling 20 GB. Could you guide me on how much time it should take to run on all 16 files?
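A quick way to check whether the mapper is actually using multiple cores is to look at per-thread CPU usage. A sketch, assuming a Linux server with procps; the pgrep pattern is a guess and should be adjusted to the actual process name (e.g. ngm for NextGenMap):

```shell
# Find the PID of the running process (pattern is hypothetical; adjust as needed)
pid=$(pgrep -f slamdunk | head -n 1)

# Show per-thread CPU usage once, in batch mode
top -b -n 1 -H -p "$pid" | head -n 20

# NLWP = number of threads; %CPU near 1600 means roughly 16 cores busy
ps -o pid,nlwp,%cpu,comm -p "$pid"
```

A %CPU figure of 1466 in top therefore corresponds to about 14-15 cores being kept busy out of the 16 requested.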

@t-neumann
Owner

Hi - do you have 20 GB per file or in total?
It shouldn't really run much longer than 1 hour per file, so it should definitely be done within 24 hours.

@Umair1441
Author

20 GB in total, across 16 files.

Thank you.

@Umair1441
Author

Umair1441 commented Sep 4, 2023

Hello.
I have 16 FASTQ files totaling 64 GB in size, and I ran slamdunk all on the server with 16 threads.
It has been running for the last 13 days and has mapped only 14 files so far.
Please tell me why it is taking so much time.

@t-neumann
Owner

Hi - that indeed sounds unreasonably slow. What command did you use, what's your memory size, and did you make sure that NextGenMap is running with 16 cores (e.g. with top)?

Worst case, I can run it myself to check what's going on, if you are willing to supply the dataset to me.

@Umair1441
Author

Umair1441 commented Sep 5, 2023

[attached screenshot: WhatsApp Image 2023-09-05 at 11:30:25 AM]
Hi, I used the following command.

slamdunk all -r hg19.fa -b Hg.bed -o output -t 16 -rl 100 -ss data/*.fq.gz

The server has a total of 49 threads, and 16 are running when I check with top -H -p.

The server has 191891 total memory.

@t-neumann
Owner

Oh sorry, now I think I see what's going on: it seems to be running with only 1 core per process. What happens if you use -t 256 and then check with top again? How much %CPU is utilized?

@Umair1441
Author

So can you please guide me on how I can now increase the threads for the running process?

@t-neumann
Owner

Yes, try slamdunk all -r hg19.fa -b Hg.bed -o output -t 256 -rl 100 -ss data/*.fq.gz


@Umair1441
Author

Yes, I applied the same command on the last file, but it is still slow.
Any other suggestions? Can I increase the number of threads to 1000 or higher?

@t-neumann
Owner

What does the CPU utilization in top say? You can increase the number of threads, but at some point the communication overhead outweighs the gain in speed.
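As a rule of thumb (my generalization, not a slamdunk-specific rule), there is little point requesting far more threads than the machine has logical CPUs, since the extra threads just time-slice the same cores. A sketch for capping -t at the hardware limit:

```shell
# nproc reports the number of logical CPUs available to this shell
threads=$(nproc)
echo "capping at -t ${threads}"

# Hypothetical invocation using that cap (paths taken from the thread above):
# slamdunk all -r hg19.fa -b Hg.bed -o output -t "${threads}" -rl 100 -ss data/*.fq.gz
```

On the 49-thread server described above, values much beyond roughly that count would mostly add scheduling overhead rather than speed.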
