CUDA error: out of memory #905

Open
bestz123 opened this issue Nov 26, 2024 · 7 comments

Comments


bestz123 commented Nov 26, 2024

Expected Behavior

Hello, when I run the MMseqs2 GPU search on my own dataset, this error is reported. Is there a command-line option I can set to prevent this error?

Context

stdout:
search /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB /tmp/tmp4r6v014n -a --alignment-mode 2 --min-aln-len 10 -s 8 -e 0.1 --max-seqs 10000 --gpu 1

MMseqs Version: 562a47f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.1
Seq. id. threshold 0
Min alignment length 10
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 128
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 8
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

ungappedprefilter /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 0 --prefilter-mode 3 --threads 128 --compressed 0 -v 3

CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276
Error: Alignment died

stderr:

Your Environment


  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 562a47f
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Statically-compiled
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): AMD 9654, 768 GB RAM, A100 (80 GB)
  • Operating system and version: Ubuntu 20.04
@milot-mirdita
Member

What GPU are you using, and how large is the target database? Can you please list all the commands you ran?

@bestz123
Author

Hi @milot-mirdita,
The relevant information has been updated. I used an A100 (80 GB) to build the MSAs against the uniprot_all_2021_04.fa (101 GB) database used by AlphaFold 3.

@milot-mirdita
Member

CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276

This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please:
https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz
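
For reference, fetching and switching to it can look roughly like this (the mmseqs/bin path is an assumption about the tarball layout):

  wget https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz
  tar xzf mmseqs-linux-gpu.tar.gz
  export PATH="$(pwd)/mmseqs/bin:$PATH"   # assumes the archive extracts to mmseqs/bin
  mmseqs version                          # check which binary is now picked up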

@bestz123
Author

> This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please:
> https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

Yes, I am already using this precompiled version, and sometimes the error does not appear when I re-run the same search.

@milot-mirdita
Member

Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?
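
For example, forcing everything onto GPU 0 (the database and tmp paths below are just placeholders):

  CUDA_VISIBLE_DEVICES=0 mmseqs search queryDB targetDB resultDB tmp --gpu 1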

@milot-mirdita
Member

Also, can you check whether you previously started an MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?
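
For example:

  ps aux | grep gpuserver
  kill <PID>   # only if a stale gpuserver process shows up; replace <PID> with its process id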


bestz123 commented Nov 29, 2024

> Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?

I have checked that the GPU is not occupied by other tasks, and I set CUDA_VISIBLE_DEVICES=0.

> Also, can you check whether you previously started an MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?

Before I start the MMseqs2 GPU server, the GPU memory usage is normal.
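
A quick way to confirm that is, for example:

  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv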

Could it be related to the length of the sequences I am aligning? The longer the sequence, the more likely the CUDA error seems to be. Or could it be related to the options I set? When I use --num-iterations 3, this error occurs frequently.
