Tadbit Parse Error #345

eng3001 · 2021-03-16T18:48:06Z

Hello, sorry for the troubles.
I am trying to run tadbit parse using the conda installed version of the tool and am unsure how to resolve this error.
Command: tadbit parse -w MAP_DIR --genome assembly.fasta
Error: Exception: ERROR: genome_seq should be given
There is no --genome_seq option shown in the manual or the help page. Is there also an example workflow for the bash command line tools available? Thank you for the help!

david-castillo · 2021-03-17T07:44:04Z

Hi,

That seems correct. Is assembly.fasta the file you used to generate the index?
genome_seq is a parameter of the internal function that does the parsing, so the error means TADbit failed to process your FASTA file. It needs to create temporary files; could this be a disk space problem?

Here I paste a workflow that we used in the past for one of the courses:

    ### yeast replica 1
    # map first end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep1 --fastq FASTQs/yeast_1.fastq.dsrc --read 1 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # map other end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep1 --fastq FASTQs/yeast_2.fastq.dsrc --read 2 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # parse mapped reads data into a new BED-like file. Estimated time: 18 min
    tadbit parse -w yeast_rep1 --genome genome/R64-1-1/R64-1-1.fa --compress_input
    # compute the intersection of the mapping of the two ends, and filter reads. Estimated time: 15 min
    tadbit filter -w yeast_rep1 --apply 1 2 3 4 5 6 7 8 9 10
    # normalize Hi-C data. Estimated time: 1 min
    tadbit normalize -w yeast_rep1 -r 20000 --min_count 10
    
    ### yeast replica 2
    # map first end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep2 --fastq FASTQs/SRR5077821_1.fastq.dsrc --read 1 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # map other end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep2 --fastq FASTQs/SRR5077821_2.fastq.dsrc --read 2 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # parse mapped reads data into a new BED-like file. Estimated time: 18 min
    tadbit parse -w yeast_rep2 --genome genome/R64-1-1/R64-1-1.fa --compress_input
    # compute the intersection of the mapping of the two ends, and filter reads. Estimated time: 15 min
    tadbit filter -w yeast_rep2 --apply 1 2 3 4 5 6 7 8 9 10
    # normalize Hi-C data. Estimated time: 1 min
    tadbit normalize -w yeast_rep2 -r 20000 --min_count 10
    
    #### merge replicas. Estimated time: 3 min
    tadbit merge -w yeast -w1 yeast_rep1 -w2 yeast_rep2 -r 20000 --norm
    # normalize Hi-C data at different resolutions. Estimated time: 1 min
    tadbit normalize -w yeast -r 40000 --min_count 10
    tadbit normalize -w yeast -r 20000 --min_count 10
    # search for TAD and compartments. Estimated time: 20 sec
    tadbit segment -w yeast -r 20000 -C 8 -j 3
    # bin Hi-C data. Estimated time: 20 sec
    tadbit bin -w yeast --norm norm --plot -r 20000 -c chrIV
    tadbit bin -w yeast --norm raw norm --plot -r 20000
    
    # Chromosomes I, II and III
    # modelling: parameter optimization. Estimated time: 6 min
    tadbit model -w yeast --optimize --beg 0 --end 1360022 --reso 20000 --maxdist 400:600:100 --upfreq=-0.2:0.2:0.1 --lowfreq=-0.4:-0.2:0.1 --nmodels 20 --nkeep 20 -j 6 --cpu 8
    # modelling: model generation. Estimated time: 2 min
    tadbit model -w yeast --model --project 3DAROC --species 'Saccharomyces cerevisiae' --assembly 'R64-1-1' --beg 0 --end 1360022 --reso 20000 --nmodels 200 --nkeep 200 -j 6 --cpu 8
    # modelling: model analysis. Estimated time: 2 min
    tadbit model --analyze -w yeast --fig_format png -j 8

David

david-castillo · 2021-03-17T09:17:31Z

If the fasta file is correct and you are using conda, can you send me your configuration (conda env export > environment.yml)?
David

eng3001 · 2021-03-17T17:19:53Z

The fasta file is correct. Thank you for your time and sending over the workflow! Here is my environment.yml:

name: tadbit
channels:
  - conda-forge
  - bioconda
  - etetoolkit
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.1=h7f98852_1
  - ca-certificates=2020.12.5=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2020.12.5=py37h89c1867_1
  - curl=7.75.0=h979ede3_0
  - cycler=0.10.0=py_2
  - freetype=2.10.4=h0708190_1
  - future=0.18.2=py37h89c1867_3
  - gem3-mapper=3.6.1=h2f06484_8
  - h5py=3.1.0=nompi_py37h1e651dc_100
  - hdf5=1.10.6=nompi_h6a2412b_1114
  - jpeg=9d=h36c2ea0_0
  - kiwisolver=1.3.1=py37h2527ec5_1
  - krb5=1.17.2=h926e7f8_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.35.1=hea4e1c9_2
  - libblas=3.9.0=8_openblas
  - libcblas=3.9.0=8_openblas
  - libcurl=7.75.0=hc4aaa36_0
  - libdeflate=1.0=h14c3975_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.3=h58526e2_2
  - libgcc=7.2.0=h69d50b8_2
  - libgcc-ng=9.3.0=h2828fa1_18
  - libgfortran-ng=9.3.0=hff62375_18
  - libgfortran5=9.3.0=hff62375_18
  - libgomp=9.3.0=h2828fa1_18
  - liblapack=3.9.0=8_openblas
  - libnghttp2=1.43.0=h812cca2_0
  - libopenblas=0.3.12=pthreads_h4812303_1
  - libpng=1.6.37=h21135ba_2
  - libssh2=1.9.0=ha56f1ee_6
  - libstdcxx-ng=9.3.0=h6de172a_18
  - libtiff=4.2.0=hdc55705_0
  - libwebp-base=1.2.0=h7f98852_0
  - lz4-c=1.9.3=h9c3ff4c_0
  - matplotlib-base=3.3.4=py37h0c9df89_0
  - mcl=14.137=h470a237_3
  - ncurses=6.2=h58526e2_4
  - numpy=1.20.1=py37haa41c4c_0
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1j=h7f98852_0
  - perl=5.32.0=h36c2ea0_0
  - pillow=8.1.2=py37h4600e1f_0
  - pip=21.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysam=0.15.3=py37hda2845c_1
  - python=3.7.10=hffdb5ce_100_cpython
  - python-dateutil=2.8.1=py_0
  - python_abi=3.7=1_cp37m
  - readline=8.0=he28a2e2_2
  - samtools=1.7=1
  - scipy=1.6.0=py37h14a347d_0
  - setuptools=49.6.0=py37h89c1867_3
  - six=1.15.0=pyh9f0ad1d_0
  - sqlite=3.34.0=h74cdb3f_0
  - tadbit=1.0.1=py37hfa133b6_0
  - tk=8.6.10=h21135ba_1
  - tornado=6.1=py37h5e8e339_1
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h516909a_1010
  - zstd=1.4.9=ha95c52a_0
prefix: /home/wyatte/.conda/envs/tadbit

eng3001 · 2021-03-17T20:02:40Z

Does this issue stem from my use of --mapper hisat2 --index S_obliquus in the tadbit map command?

david-castillo · 2021-03-20T07:54:18Z

Hi,

After the mapping, all the files produced should have the same format, so using hisat2 should not affect the parsing.
I tried with your conda environment and had no problem using hisat2 (which I guess you installed outside conda). Another thing you can try is to delete a temporary file that TADbit generates from your fasta. The file should be in the same directory as your fasta, and its name ends with '_genome.TADbit'. If you delete it, TADbit will generate it again; maybe it failed to do so the first time. After it is regenerated, check that the file is not empty, as an empty file may be the cause of the problem.
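
The check described above could be sketched in the shell roughly as follows. This is only an illustration: the function name `check_tadbit_cache` is made up, and it assumes the cache file sits next to the fasta (possibly as a hidden file) with a name ending in `_genome.TADbit`.

```shell
# Hypothetical helper (not part of TADbit): report the state of cache files
# located in the same directory as the given fasta file.
# Assumption: the cache file name ends in "_genome.TADbit".
check_tadbit_cache() {
    fasta_dir=$(dirname "$1")
    found=0
    # second glob also covers a hidden (dot-prefixed) cache file
    for f in "$fasta_dir"/*_genome.TADbit "$fasta_dir"/.*_genome.TADbit; do
        [ -e "$f" ] || continue      # glob matched nothing
        found=1
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "empty: $f"         # candidate for deletion so it gets regenerated
        fi
    done
    [ "$found" -eq 1 ] || echo "missing: no *_genome.TADbit file in $fasta_dir"
}

# Example call (path is a placeholder):
# check_tadbit_cache GENOME_Dir/genome.fasta
```

A file reported as empty can be removed with `rm` before running tadbit parse again.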

Regards

David

eng3001 · 2021-03-24T18:42:45Z

Hi David,
Thank you so much for your help! TADbit map created two temp directories, MAP_DIR_tmp_r1_2a7289b7fa and MAP_DIR_tmp_r2_60fad5d92c, and both are empty. TADbit map also did not produce a '_genome.TADbit' file, either in the directory containing the fasta file or in any other directory. That might be the issue.

Mapping commands:
/usr/bin/time -v tadbit map -w MAP_DIR --fastq FASTQ_Files/s_obliquus_S3HiC_R1_clean.fastq --mapper hisat2 --index S_obliquus_hisat2_index --read 1 --cpus 7 --renz Sau3AI
/usr/bin/time -v tadbit map -w MAP_DIR --fastq FASTQ_Files/s_obliquus_S3HiC_R2_clean.fastq --mapper hisat2 --index S_obliquus_hisat2_index --read 2 --cpus 7 --renz Sau3AI

MAP_DIR Contents:

01_mapped_r1
01_mapped_r2
process.log
s_obliquus_S3HiC_R1_clean.fastq_Sau3AI_2a7289b7fa.png
s_obliquus_S3HiC_R2_clean.fastq_Sau3AI_60fad5d92c.png
TADbit_and_dependencies_versions.log
trace.db
trace.log

Contents of 01_mapped_r1 in MAP_DIR

s_obliquus_S3HiC_R1_clean_frag_1-end_2a7289b7fa.map
s_obliquus_S3HiC_R1_clean_full_1-end_2a7289b7fa.map

Parse Command:
tadbit parse -w MAP_DIR --genome GENOME_Dir/genome.fasta --compress_input
Error:
Exception: ERROR: genome_seq should be given

pollicipes · 2023-06-09T11:24:20Z

Hi,

Not sure if this is solved, but for future reference: this gets solved if you add the option --filter_chroms "<a regular expression matching your chromosome names>". For example, if the names of the chromosomes in your fasta file start with "chr", you can do:

--filter_chroms "chr*"

That should do it.
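
To see in advance which sequence names a given pattern would keep, something like the sketch below may help. It is only an approximation: `preview_chrom_filter` is a made-up helper, it uses `grep -E` (POSIX extended regex) as a stand-in, and it assumes the filter is applied to the first word of each fasta header; TADbit's own matching may differ in detail.

```shell
# Hypothetical helper (not part of TADbit): print the fasta sequence names
# that match a pattern, using grep -E as a stand-in for the real filter.
preview_chrom_filter() {
    fasta=$1
    pattern=$2
    # keep header lines, drop the leading ">" and anything after the first space
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | grep -E -- "$pattern"
}

# Example call (file name and pattern are placeholders):
# preview_chrom_filter GENOME_Dir/genome.fasta '^chr'
```

If this prints nothing, no sequence name matches the pattern, which would be consistent with the parse step ending up without any genome sequence.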

Cheers,
J
