error running test profile with conda #551

Open
AnotherSimon opened this issue Nov 20, 2024 · 11 comments

@AnotherSimon

Description of the bug

While setting up taxprofiler on my local machine, I encountered an issue when using conda as the executor. There seems to be a reproducible failure for the process "NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE", stemming from an upstream error in the sample "2613_db3.centrifuge".

This error does not occur when using singularity as the executor on the same machine. Not sure if this is a Nextflow, conda or taxprofiler issue.

Command used and terminal output

nextflow run nf-core/taxprofiler -r 1.2.0 -profile test,conda --outdir ./taxprofiler_test_conda

Relevant files

nextflow.conda.log
nextflow.singularity.log

System information

Nextflow version 24.10.0 build 5928
conda 24.9.2
Hardware: 21 CPU threads, 30 GB RAM
Ubuntu 22.04 LTS
Taxprofiler release 1.2.0

@AnotherSimon AnotherSimon added the bug Something isn't working label Nov 20, 2024
@jfy133
Member

jfy133 commented Nov 21, 2024

The corresponding error

Nov-20 11:53:58.454 [TaskFinalizer-2] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE (centrifuge|db3)'

Caused by:
  Process `NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE (centrifuge|db3)` terminated with an error exit status (1)


Command executed:

  taxpasta merge \
      --profiler centrifuge \
      --output centrifuge_db3.tsv \
       \
       \
       \
      2613_db3.centrifuge.txt 2612_db3.centrifuge.txt
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE":
      taxpasta: $(taxpasta --version)
  END_VERSIONS

Command exit status:
  1

Command output:
  [11:53:57] CRITICAL Error in sample '2613_db3.centrifuge' with profile '2613_db3.centrifuge.txt'.                                                                                               merge.py:419
             CRITICAL   schema_context column         check check_number failure_case  index                                                                                                      merge.py:424
                      0         Column   name  not_nullable         None          NaN      1                                                                                                                  

Command error:
  [11:53:57] CRITICAL Error in sample '2613_db3.centrifuge' with profile '2613_db3.centrifuge.txt'.                                                                                               merge.py:419
             CRITICAL   schema_context column         check check_number failure_case  index                                                                                                      merge.py:424
                      0         Column   name  not_nullable         None          NaN      1                                                                                                                  

Work dir:
  /home/simon/Documents/work/0a/975f9fc06ff33f526fa0cc153fb477

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I also see other errors such as:

Nov-20 11:53:23.754 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:KRAKENTOOLS_COMBINEKREPORTS_CENTRIFUGE (1); work-dir=/home/simon/Documents/work/e5/a5db9df68893383c4711a60b64b39d
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:KRAKENTOOLS_COMBINEKREPORTS_CENTRIFUGE (1)` terminated with an error exit status (1)

Could you maybe go into

cd /home/simon/Documents/work/c4/978646<...autocomplete>

and send the contents of .command.log? I have a feeling centrifuge failed for some reason, producing an empty file, and the failure wasn't picked up.

@jfy133
Member

jfy133 commented Nov 21, 2024

OK actually I was able to replicate the conda error, I'm going to look into this now :)

The two centrifuge report files that get generated are:

$ cat *
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    9313    -       1
 99.90  434053  434053  U       0       unclassified
  0.10  440     440     -       1

So I guess this might be a taxprofiler bug - @Midnighter, this is because there is no taxon name there, right?
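The `not_nullable` failure on the `name` column matches the rows above that have no taxon name. A minimal sketch for spotting such rows in a report file, assuming the file is tab-separated with six columns (percentage, clade reads, direct reads, rank, taxonomy ID, name); `rows_missing_name` is a hypothetical helper name and not part of taxpasta or centrifuge:

```python
def rows_missing_name(report_text):
    """Return (line_number, taxid) for report rows that have no taxon name.

    Assumes tab-separated rows with the taxonomy ID in column 5 and the
    taxon name in column 6 (an assumption about the kreport-style layout).
    """
    missing = []
    for lineno, line in enumerate(report_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        taxid = fields[4].strip() if len(fields) > 4 else ""
        # The name column may be absent entirely or present but empty
        name = fields[5].strip() if len(fields) > 5 else ""
        if not name:
            missing.append((lineno, taxid))
    return missing
```

Calling it on the text of one of the conda-generated reports should flag the row for taxonomy ID 1, while the Docker-generated reports should return an empty list.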

Uhh strange, tax id 9313 is 😅

Image

@jfy133
Member

jfy133 commented Nov 21, 2024

Docker produces:

 cat *.txt
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    0       -       1       root
  0.51  9313    0       -       131567    cellular organisms
  0.51  9313    0       D       2759        Eukaryota
  0.50  8972    0       K       33090         Viridiplantae
  0.50  8972    0       P       35493           Streptophyta
  0.50  8972    0       -       131221            Streptophytina
  0.50  8972    0       -       3193                Embryophyta
  0.50  8972    0       -       58023                 Tracheophyta
  0.50  8972    0       -       78536                   Euphyllophyta
  0.50  8972    0       -       58024                     Spermatophyta
  0.50  8972    0       C       3398                        Magnoliopsida
  0.50  8972    0       -       1437183                       Mesangiospermae
  0.50  8972    0       -       71240                           eudicotyledons
  0.50  8972    0       -       91827                             Gunneridae
  0.50  8972    0       -       1437201                             Pentapetalae
  0.50  8972    0       -       71274                                 asterids
  0.50  8972    0       -       91888                                   lamiids
  0.50  8972    0       O       91889                                     Garryales
  0.50  8972    0       F       4390                                        Eucommiaceae
  0.50  8972    0       G       4391                                          Eucommia
  0.50  8972    8972    S       4392                                            Eucommia ulmoides
  0.02  341     0       -       33154         Opisthokonta
  0.02  341     0       K       33208           Metazoa
  0.02  341     0       -       6072              Eumetazoa
  0.02  341     0       -       33213               Bilateria
  0.02  341     0       -       33511                 Deuterostomia
  0.02  341     0       P       7711                    Chordata
  0.02  341     0       -       89593                     Craniata
  0.02  341     0       -       7742                        Vertebrata
  0.02  341     0       -       7776                          Gnathostomata
  0.02  341     0       -       117570                          Teleostomi
  0.02  341     0       -       117571                            Euteleostomi
  0.02  341     0       -       8287                                Sarcopterygii
  0.02  341     0       -       1338369                               Dipnotetrapodomorpha
  0.02  341     0       -       32523                                   Tetrapoda
  0.02  341     0       -       32524                                     Amniota
  0.02  341     0       C       40674                                       Mammalia
  0.02  341     0       -       32525                                         Theria
  0.02  341     0       -       9347                                            Eutheria
  0.02  341     0       -       1437010                                           Boreoeutheria
  0.02  341     0       -       314146                                              Euarchontoglires
  0.02  341     0       O       9443                                                  Primates
  0.02  341     0       -       376913                                                  Haplorrhini
  0.02  341     0       -       314293                                                    Simiiformes
  0.02  341     0       -       9526                                                        Catarrhini
  0.02  341     0       -       314295                                                        Hominoidea
  0.02  341     0       F       9604                                                            Hominidae
  0.02  341     0       -       207598                                                            Homininae
  0.02  341     0       G       9605                                                                Homo
  0.02  341     341     S       9606                                                                  Homo sapiens
 99.90  434053  434053  U       0       unclassified
  0.10  440     0       -       1       root
  0.10  440     0       -       131567    cellular organisms
  0.10  440     0       D       2759        Eukaryota
  0.08  364     0       K       33090         Viridiplantae
  0.08  364     0       P       35493           Streptophyta
  0.08  364     0       -       131221            Streptophytina
  0.08  364     0       -       3193                Embryophyta
  0.08  364     0       -       58023                 Tracheophyta
  0.08  364     0       -       78536                   Euphyllophyta
  0.08  364     0       -       58024                     Spermatophyta
  0.08  364     0       C       3398                        Magnoliopsida
  0.08  364     0       -       1437183                       Mesangiospermae
  0.08  364     0       -       71240                           eudicotyledons
  0.08  364     0       -       91827                             Gunneridae
  0.08  364     0       -       1437201                             Pentapetalae
  0.08  364     0       -       71274                                 asterids
  0.08  364     0       -       91888                                   lamiids
  0.08  364     0       O       91889                                     Garryales
  0.08  364     0       F       4390                                        Eucommiaceae
  0.08  364     0       G       4391                                          Eucommia
  0.08  364     364     S       4392                                            Eucommia ulmoides
  0.02  76      0       -       33154         Opisthokonta
  0.02  76      0       K       33208           Metazoa
  0.02  76      0       -       6072              Eumetazoa
  0.02  76      0       -       33213               Bilateria
  0.02  76      0       -       33511                 Deuterostomia
  0.02  76      0       P       7711                    Chordata
  0.02  76      0       -       89593                     Craniata
  0.02  76      0       -       7742                        Vertebrata
  0.02  76      0       -       7776                          Gnathostomata
  0.02  76      0       -       117570                          Teleostomi
  0.02  76      0       -       117571                            Euteleostomi
  0.02  76      0       -       8287                                Sarcopterygii
  0.02  76      0       -       1338369                               Dipnotetrapodomorpha
  0.02  76      0       -       32523                                   Tetrapoda
  0.02  76      0       -       32524                                     Amniota
  0.02  76      0       C       40674                                       Mammalia
  0.02  76      0       -       32525                                         Theria
  0.02  76      0       -       9347                                            Eutheria
  0.02  76      0       -       1437010                                           Boreoeutheria
  0.02  76      0       -       314146                                              Euarchontoglires
  0.02  76      0       O       9443                                                  Primates
  0.02  76      0       -       376913                                                  Haplorrhini
  0.02  76      0       -       314293                                                    Simiiformes
  0.02  76      0       -       9526                                                        Catarrhini
  0.02  76      0       -       314295                                                        Hominoidea
  0.02  76      0       F       9604                                                            Hominidae
  0.02  76      0       -       207598                                                            Homininae
  0.02  76      0       G       9605                                                                Homo
  0.02  76      76      S       9606                                                                  Homo sapiens

With the upstream step having the following script and log:

$ cat .command.sh 
#!/bin/bash -euo pipefail
db_name=`find -L test-db-centrifuge -name "*.1.cf" -not -name "._*"  | sed 's/\.1.cf$//'`
centrifuge-kreport -x $db_name 2612_db3.centrifuge.results.txt > 2612_db3.centrifuge.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_KREPORT":
    centrifuge: $( centrifuge --version  | sed -n 1p | sed 's/^.*centrifuge-class version //')
END_VERSIONS
gitpod /workspace/taxprofiler/testing/work/e2/f913bcb9291d065de0883c86f03299 (master) $ cat .command.log 
Loading taxonomy ...
Loading names file ...
Loading nodes file ...

@jfy133
Member

jfy133 commented Nov 21, 2024

I've run out of time today, unfortunately.

But I need to run the two test profiles and compare what the centrifuge process itself reports.

@jfy133
Member

jfy133 commented Dec 12, 2024

OK back at investigating this now!

@jfy133
Member

jfy133 commented Dec 12, 2024

So the output from CENTRIFUGE_CENTRIFUGE appears to be identical

Top (c7) is conda, (a6) is docker

(nf-core) james@bionb103:~/git/nf-core/taxprofiler/testing/work/a6/25e9b2169e9af5a9a4e0b6deeac240 (dev)$ cat 2613_ERR5766181_db3.centrifuge.report.txt 
name    taxID   taxRank genomeSize      numReads        numUniqueReads  abundance
Eucommia ulmoides       4392    species 12157105        78966   38245   0.0
Homo sapiens    9606    species 16569   81      81      1
(nf-core) james@bionb103:~/git/nf-core/taxprofiler/testing/work/c7/98ae6a407d84527c1d100b7991ca99 (dev)$ cat 2613_ERR5766181_db3.centrifuge.report.txt 
name    taxID   taxRank genomeSize      numReads        numUniqueReads  abundance
Eucommia ulmoides       4392    species 12157105        78966   38245   0.0
Homo sapiens    9606    species 16569   81      81      1
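A byte-for-byte check is one way to confirm that two such reports really are identical rather than comparing them by eye. A minimal sketch, assuming Python is available on the machine; `same_content` is a hypothetical helper name, and the two arguments would be the report paths in the conda and docker work directories:

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of relying on
    # os.stat() metadata such as size and modification time
    return filecmp.cmp(path_a, path_b, shallow=False)
```

If this returns True for the two CENTRIFUGE_CENTRIFUGE reports, the difference has to be introduced by a later step.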

The issue is in CENTRIFUGE_KREPORT

e5 is docker (top)

(nf-core) james@bionb103:~/git/nf-core/taxprofiler/testing/work/e5/8b9cb14e6de6158901a6b89028a086 (dev)$ cat .command.log 
Loading taxonomy ...
Loading names file ...
Loading nodes file ...
(nf-core) james@bionb103:~/git/nf-core/taxprofiler/testing/work/a6/79e32090f2b3a18451615afe13d30d (dev)$ head .command.log 
Loading taxonomy ...
Loading names file ...
Traceback (most recent call last):
  File "/home/james/cache/conda/env-8778d98cc9a2cc48-6e91f0d1f67aabb2a2b8a0425835d8e6/bin/centrifuge-inspect", line 24, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'
Loading nodes file ...
Traceback (most recent call last):
  File "/home/james/cache/conda/env-8778d98cc9a2cc48-6e91f0d1f67aabb2a2b8a0425835d8e6/bin/centrifuge-inspect", line 24, in <module>
    import imp
Couldn't find parent of taxID 4392 - directly assigned to root.
<AND MANY MORE LINES OF ERROR>
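Since the failure only surfaced in a downstream process, it can help to scan a task's `.command.log` for Python errors directly. A minimal sketch, assuming the log has been read into a string; `find_log_errors` and the pattern are illustrative choices, not part of Nextflow or centrifuge:

```python
import re

# Matches the start of a Python traceback or a line such as
# "ModuleNotFoundError: No module named 'imp'"
ERROR_PATTERN = re.compile(r"Traceback \(most recent call last\)|\w+Error: ")


def find_log_errors(log_text):
    """Return (line_number, line) pairs for log lines that look like Python errors."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

On the conda log above this would report the traceback headers and the `ModuleNotFoundError` line, while the docker log would yield an empty list.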

@jfy133
Member

jfy133 commented Dec 12, 2024

OK, I think the conda recipe is slightly broken:

https://docs.python.org/3.11/library/imp.html

imp was removed in Python 3.12 (after a long deprecation), but the centrifuge conda environment is using Python 3.13

This needs a fix to the conda recipe!
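To check whether a given environment is affected, a small script can be run with that environment's Python interpreter. A minimal sketch using only the standard library; `imp_available` is a hypothetical helper name:

```python
import importlib.util
import sys


def imp_available():
    """Return True if the standard-library `imp` module can be found."""
    # find_spec only locates the module; it does not import it
    return importlib.util.find_spec("imp") is not None


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}: imp available = {imp_available()}")
```

On Python 3.12 or newer this is expected to report that `imp` is not available, which is the condition that breaks the `import imp` line in `centrifuge-inspect`.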

@jfy133
Member

jfy133 commented Dec 12, 2024

Which in fact has already been solved with a new version of centrifuge!

https://github.com/DaehwanKimLab/centrifuge/releases/tag/v1.0.4.2

So just need to update the nf-core module :)

@jfy133 jfy133 self-assigned this Dec 12, 2024
@jfy133
Member

jfy133 commented Dec 12, 2024

In the meantime, if you're in a rush (although you've already waited patiently for 3 weeks, so I assume it's not pressing):

I think if you make a custom config, you should be able to get it to work. The contents of the custom.config should look something like this:

process {
    // Override the conda package for both centrifuge steps
    withName: CENTRIFUGE_CENTRIFUGE {
        conda = "bioconda::centrifuge=1.0.4.2"
    }
    withName: CENTRIFUGE_KREPORT {
        conda = "bioconda::centrifuge=1.0.4.2"
    }
}

And run nextflow run nf-core/taxprofiler -r 1.2.0 -profile test,conda --outdir ./taxprofiler_test_conda -c custom.config

It should work

@jfy133
Member

jfy133 commented Dec 12, 2024

nf-core/modules#7205

@AnotherSimon
Author

No indeed, if it was a really pressing issue I would have been more active here ;)
Thanks for tracking it down though! It will prove valuable (to me at least) come January. I was testing the pipeline in preparation for an anticipated project.
