Tadbit Parse Error #345

Open
eng3001 opened this issue Mar 16, 2021 · 7 comments

Comments

@eng3001

eng3001 commented Mar 16, 2021

Hello, sorry for the trouble.
I am trying to run tadbit parse with the conda-installed version of the tool and am unsure how to resolve this error.
Command: tadbit parse -w MAP_DIR --genome assembly.fasta
Error: Exception: ERROR: genome_seq should be given
There is no --genome_seq option in the manual or on the help page. Is there also an example workflow available for the bash command-line tools? Thank you for the help!

@david-castillo
Contributor

Hi,

That seems correct. Is assembly.fasta the file you used to generate the index?
genome_seq is a parameter of the internal function that does the parsing, so the error means TADbit failed to process your FASTA file. TADbit needs to create temporary files while doing so; could it be a disk-space problem?
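
As a quick sanity check on the FASTA file itself, the sequence names can be listed directly. This is only a minimal sketch: `fasta_headers` is a hypothetical helper, not part of TADbit, and it assumes a plain-text (uncompressed) FASTA file.

```shell
# Hypothetical helper (not a TADbit command): print the name of every
# sequence in a plain-text FASTA file, one per line.
fasta_headers() {
    # Header lines start with '>'; keep only the first word after it.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}
```

For example, `fasta_headers assembly.fasta` should print one name per sequence; empty output would suggest the file is not being read as FASTA. Running `df -h .` in the working directory shows the free disk space.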

Here I paste a workflow that we used in the past for one of the courses:

    ### yeast replica 1
    # map first end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep1 --fastq FASTQs/yeast_1.fastq.dsrc --read 1 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # map other end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep1 --fastq FASTQs/yeast_2.fastq.dsrc --read 2 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # parse mapped reads data into a new BED-like file. Estimated time: 18 min
    tadbit parse -w yeast_rep1 --genome genome/R64-1-1/R64-1-1.fa --compress_input
    # compute the intersection of the mapping of the two ends, and filter reads. Estimated time: 15 min
    tadbit filter -w yeast_rep1 --apply 1 2 3 4 5 6 7 8 9 10
    # normalize Hi-C data. Estimated time: 1 min
    tadbit normalize -w yeast_rep1 -r 20000 --min_count 10
    
    ### yeast replica 2
    # map first end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep2 --fastq FASTQs/SRR5077821_1.fastq.dsrc --read 1 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # map other end of the read to yeast reference genome (fragment based mapping). Estimated time: 7.5 min
    tadbit map -w yeast_rep2 --fastq FASTQs/SRR5077821_2.fastq.dsrc --read 2 --index genome/R64-1-1/GEM/R64-1-1.gem --renz DpnII -C 8
    # parse mapped reads data into a new BED-like file. Estimated time: 18 min
    tadbit parse -w yeast_rep2 --genome genome/R64-1-1/R64-1-1.fa --compress_input
    # compute the intersection of the mapping of the two ends, and filter reads. Estimated time: 15 min
    tadbit filter -w yeast_rep2 --apply 1 2 3 4 5 6 7 8 9 10
    # normalize Hi-C data. Estimated time: 1 min
    tadbit normalize -w yeast_rep2 -r 20000 --min_count 10
    
    #### merge replicas. Estimated time: 3 min
    tadbit merge -w yeast -w1 yeast_rep1 -w2 yeast_rep2 -r 20000 --norm
    # normalize Hi-C data at different resolutions. Estimated time: 1 min
    tadbit normalize -w yeast -r 40000 --min_count 10
    tadbit normalize -w yeast -r 20000 --min_count 10
    # search for TAD and compartments. Estimated time: 20 sec
    tadbit segment -w yeast -r 20000 -C 8 -j 3
    # bin Hi-C data. Estimated time: 20 sec
    tadbit bin -w yeast --norm norm --plot -r 20000 -c chrIV
    tadbit bin -w yeast --norm raw norm --plot -r 20000
    
    # Chromosomes I, II and III
    # modelling: parameter optimization. Estimated time: 6 min
    tadbit model -w yeast --optimize --beg 0 --end 1360022 --reso 20000 --maxdist 400:600:100 --upfreq=-0.2:0.2:0.1 --lowfreq=-0.4:-0.2:0.1 --nmodels 20 --nkeep 20 -j 6 --cpu 8
    # modelling: model generation. Estimated time: 2 min
    tadbit model -w yeast --model --project 3DAROC --species 'Saccharomyces cerevisiae' --assembly 'R64-1-1' --beg 0 --end 1360022 --reso 20000 --nmodels 200 --nkeep 200 -j 6 --cpu 8
    # modelling: model analysis.  Estimated time: 2 min
    tadbit model --analyze -w yeast --fig_format png -j 8

David

@david-castillo
Contributor

If the FASTA file is correct and you are using conda, can you send me your configuration (conda env export > environment.yml)?
David

@eng3001
Author

eng3001 commented Mar 17, 2021

The fasta file is correct. Thank you for your time and sending over the workflow! Here is my environment.yml:

name: tadbit
channels:
  - conda-forge
  - bioconda
  - etetoolkit
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.1=h7f98852_1
  - ca-certificates=2020.12.5=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2020.12.5=py37h89c1867_1
  - curl=7.75.0=h979ede3_0
  - cycler=0.10.0=py_2
  - freetype=2.10.4=h0708190_1
  - future=0.18.2=py37h89c1867_3
  - gem3-mapper=3.6.1=h2f06484_8
  - h5py=3.1.0=nompi_py37h1e651dc_100
  - hdf5=1.10.6=nompi_h6a2412b_1114
  - jpeg=9d=h36c2ea0_0
  - kiwisolver=1.3.1=py37h2527ec5_1
  - krb5=1.17.2=h926e7f8_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.35.1=hea4e1c9_2
  - libblas=3.9.0=8_openblas
  - libcblas=3.9.0=8_openblas
  - libcurl=7.75.0=hc4aaa36_0
  - libdeflate=1.0=h14c3975_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.3=h58526e2_2
  - libgcc=7.2.0=h69d50b8_2
  - libgcc-ng=9.3.0=h2828fa1_18
  - libgfortran-ng=9.3.0=hff62375_18
  - libgfortran5=9.3.0=hff62375_18
  - libgomp=9.3.0=h2828fa1_18
  - liblapack=3.9.0=8_openblas
  - libnghttp2=1.43.0=h812cca2_0
  - libopenblas=0.3.12=pthreads_h4812303_1
  - libpng=1.6.37=h21135ba_2
  - libssh2=1.9.0=ha56f1ee_6
  - libstdcxx-ng=9.3.0=h6de172a_18
  - libtiff=4.2.0=hdc55705_0
  - libwebp-base=1.2.0=h7f98852_0
  - lz4-c=1.9.3=h9c3ff4c_0
  - matplotlib-base=3.3.4=py37h0c9df89_0
  - mcl=14.137=h470a237_3
  - ncurses=6.2=h58526e2_4
  - numpy=1.20.1=py37haa41c4c_0
  - olefile=0.46=pyh9f0ad1d_1
  - openssl=1.1.1j=h7f98852_0
  - perl=5.32.0=h36c2ea0_0
  - pillow=8.1.2=py37h4600e1f_0
  - pip=21.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pysam=0.15.3=py37hda2845c_1
  - python=3.7.10=hffdb5ce_100_cpython
  - python-dateutil=2.8.1=py_0
  - python_abi=3.7=1_cp37m
  - readline=8.0=he28a2e2_2
  - samtools=1.7=1
  - scipy=1.6.0=py37h14a347d_0
  - setuptools=49.6.0=py37h89c1867_3
  - six=1.15.0=pyh9f0ad1d_0
  - sqlite=3.34.0=h74cdb3f_0
  - tadbit=1.0.1=py37hfa133b6_0
  - tk=8.6.10=h21135ba_1
  - tornado=6.1=py37h5e8e339_1
  - wheel=0.36.2=pyhd3deb0d_0
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h516909a_1010
  - zstd=1.4.9=ha95c52a_0
prefix: /home/wyatte/.conda/envs/tadbit

@eng3001
Author

eng3001 commented Mar 17, 2021

Does this issue stem from my use of --mapper hisat2 --index S_obliquus in the tadbit map command?

@david-castillo
Contributor

Hi,

After mapping, all the files produced should have the same format, so using hisat2 should not affect the parsing.
I tried with your conda environment and had no problem using hisat2 (which I guess you installed outside conda). Another thing you can try is deleting a temporary file that TADbit generates from your FASTA. The file should be in the same directory as your FASTA, and its name ends with '_genome.TADbit'. If you delete it, TADbit will generate it again; maybe it failed to do so the first time. After regeneration, check that the file is not empty, as that may be the cause of the problem.
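
The check described above could be scripted roughly as follows. This is a sketch under assumptions: `check_genome_cache` is a hypothetical helper, and since only the '_genome.TADbit' suffix is known here, it matches any file with that suffix in the FASTA's directory rather than guessing the full file name.

```shell
# Hypothetical helper (not a TADbit command): look next to a FASTA file for
# cache files whose names end in '_genome.TADbit'.
# Prints "ok <path>" or "empty <path>" per match, or "missing" if none exists.
check_genome_cache() {
    dir=$(dirname "$1")
    found=0
    for f in "$dir"/*_genome.TADbit; do
        [ -e "$f" ] || continue   # the glob matched nothing
        found=1
        if [ -s "$f" ]; then
            echo "ok $f"
        else
            echo "empty $f"
        fi
    done
    [ "$found" -eq 1 ] || echo "missing"
}
```

Calling `check_genome_cache GENOME_Dir/genome.fasta` before and after rerunning tadbit parse shows whether the file was created and whether it has any content. A file reported as "empty" can be removed with `rm` so that it is regenerated.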

Regards

David

@eng3001
Author

eng3001 commented Mar 24, 2021

Hi David,
Thank you so much for your help! TADbit map created two temp directories, both empty: MAP_DIR_tmp_r1_2a7289b7fa and MAP_DIR_tmp_r2_60fad5d92c. It also did not produce a '_genome.TADbit' file, either in the directory containing the FASTA file or anywhere else. That might be the issue.

Mapping commands:
/usr/bin/time -v tadbit map -w MAP_DIR --fastq FASTQ_Files/s_obliquus_S3HiC_R1_clean.fastq --mapper hisat2 --index S_obliquus_hisat2_index --read 1 --cpus 7 --renz Sau3AI
/usr/bin/time -v tadbit map -w MAP_DIR --fastq FASTQ_Files/s_obliquus_S3HiC_R2_clean.fastq --mapper hisat2 --index S_obliquus_hisat2_index --read 2 --cpus 7 --renz Sau3AI

MAP_DIR Contents:

01_mapped_r1
01_mapped_r2
process.log
s_obliquus_S3HiC_R1_clean.fastq_Sau3AI_2a7289b7fa.png
s_obliquus_S3HiC_R2_clean.fastq_Sau3AI_60fad5d92c.png
TADbit_and_dependencies_versions.log
trace.db
trace.log

Contents of 01_mapped_r1 in MAP_DIR

s_obliquus_S3HiC_R1_clean_frag_1-end_2a7289b7fa.map
s_obliquus_S3HiC_R1_clean_full_1-end_2a7289b7fa.map

Parse Command:
tadbit parse -w MAP_DIR --genome GENOME_Dir/genome.fasta --compress_input
Error:
Exception: ERROR: genome_seq should be given
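
Since only the contents of 01_mapped_r1 are listed above, it may be worth confirming that both mapping directories hold non-empty .map files before parsing. A minimal sketch, with `count_nonempty_maps` as a hypothetical helper that is not part of TADbit:

```shell
# Hypothetical helper (not a TADbit command): count the non-empty *.map
# files under a directory.
count_nonempty_maps() {
    # -size +0 keeps only files that contain at least one byte.
    find "$1" -type f -name '*.map' -size +0 | wc -l | tr -d ' '
}
```

Running `count_nonempty_maps MAP_DIR/01_mapped_r1` and `count_nonempty_maps MAP_DIR/01_mapped_r2` should each report at least one file; a count of 0 would point to a mapping problem rather than a parsing one.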

@pollicipes

Hi,

Not sure if this is solved, but for future reference: the error goes away if you add the option --filter_chroms "<a regular expression matching your chromosome names>". For example, if the chromosome names in your FASTA file start with "chr", you can use:

--filter_chroms "chr*"

That should do it.
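
To preview which sequence names a given regular expression would select, something like the following could be used. It is a sketch under assumptions: `preview_filter` is a hypothetical helper, it uses `grep -E` rather than TADbit's own matching, and how TADbit applies the pattern internally is not confirmed here.

```shell
# Hypothetical helper (not a TADbit command): list the FASTA sequence names
# that match an extended regular expression.
preview_filter() {
    pattern=$1
    fasta=$2
    # Extract the first word of each header line, then keep matching names.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta" | grep -E -- "$pattern"
}
```

Note that in regular-expression syntax "chr*" means "ch" followed by zero or more "r", not "chr followed by anything" as it would in a shell glob. An anchored pattern such as "^chr" states the intent more explicitly, for instance `preview_filter '^chr' GENOME_Dir/genome.fasta`.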

Cheers,
J
