CUDA error: out of memory #905

Open
bestz123 opened this issue Nov 26, 2024 · 7 comments

Comments


bestz123 commented Nov 26, 2024

Expected Behavior

Hello, when I run the MMseqs2 GPU search on my own dataset, this error is reported. Is there a command-line option I can set to prevent this error?

Context

stdout:
search /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB /tmp/tmp4r6v014n -a --alignment-mode 2 --min-aln-len 10 -s 8 -e 0.1 --max-seqs 10000 --gpu 1

MMseqs Version: 562a47f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.1
Seq. id. threshold 0
Min alignment length 10
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 128
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 8
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

ungappedprefilter /tmp/tmpzvdqhge1/query_DB /home/inspur/zyz/alphafold3/database/uniprot_all_2021_04_gpu/uni /tmp/tmpzvdqhge1/resultDB --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 0 --prefilter-mode 3 --threads 128 --compressed 0 -v 3

CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276
Error: Alignment died

stderr:

Your Environment


  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 562a47f
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Statically-compiled
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): AMD 9654, 768 GB RAM, A100 (80 GB)
  • Operating system and version: Ubuntu 20.04
@milot-mirdita
Member

What GPU are you using, and how large is the target database? Can you please list all the commands you ran?

@bestz123
Author

Hi @milot-mirdita,
The relevant information has been updated. I used an A100 (80 GB) to build the MSAs against the uniprot_all_2021_04.fa (101 GB) database used by AlphaFold 3.

@milot-mirdita
Member

CUDA error: out of memory : /home/vsts/work/1/s/lib/libmarv/src/cudasw4.cuh, line 1276

This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please:
https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz
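
For reference, fetching and switching to it can look roughly like this (the mmseqs/bin path is an assumption about the tarball layout):

  wget https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz
  tar xzf mmseqs-linux-gpu.tar.gz
  export PATH="$(pwd)/mmseqs/bin:$PATH"   # assumes the archive extracts to mmseqs/bin
  mmseqs version                          # check which binary is now picked up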

@bestz123
Author

> This is a very odd place for a crash. How did you compile MMseqs2? Can you try with the precompiled binary instead please:
> https://github.com/soedinglab/MMseqs2/releases/download/16-747c6/mmseqs-linux-gpu.tar.gz

Yes, I am already using this precompiled version, and sometimes the error does not appear when I re-run the same search.

@milot-mirdita
Member

Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?
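
For example, forcing everything onto GPU 0 (the database and tmp paths below are just placeholders):

  CUDA_VISIBLE_DEVICES=0 mmseqs search queryDB targetDB resultDB tmp --gpu 1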

@milot-mirdita
Member

Also, can you check whether you previously started an MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?
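
For example:

  ps aux | grep gpuserver
  kill <PID>   # only if a stale gpuserver process shows up; replace <PID> with its process id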


bestz123 commented Nov 29, 2024

> Are the GPUs already in use by other processes? Can you try to explicitly set CUDA_VISIBLE_DEVICES to the GPU you want to use?

I have checked that the GPU is not occupied by other tasks, and I set CUDA_VISIBLE_DEVICES=0.

> Also, can you check whether you previously started an MMseqs2 GPU server and didn't clean it up (ps aux | grep gpuserver)?

Before I start the MMseqs2 GPU server, the GPU memory usage is normal.
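
A quick way to confirm that is, for example:

  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv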

Could it be related to the length of the sequences I am aligning? The longer the sequence, the more likely the CUDA error seems to be. Or could it be related to the options I set? When I use --num-iterations 3, this error occurs frequently.
