HAL alignments and chaining: post processing TOGA outputs #143
Hi, sorry to hear that the lastz pipe is not working. I asked @kirilenkobm to have a look, as I don't know this code too well. In principle, aligning genomes <1 GB should be no problem. To the questions.
Best
I am sorry for the delay; I'm a bit overwhelmed with my primary job. Please let me know if it works for you. Theoretically, the flow should be pretty much the same in v1 and v2, but it seems it's not...
Thank you @MichaelHiller, your answers really helped me understand some of my questions. Yes, every one of my genomes is soft-masked, and I first did the HAL projection with my Ref genome and only later started the chaining with the other query genomes. I will also use make_lastz_chains to check the accuracy of my chains, and I will write back to you.
Hi @kirilenkobm, thank you for your response. As per your suggestion, I tried the older version of the make_lastz_chains pipeline, but on my HPC server this version somehow fails to install the UCSC tools (which are installed through conda):
Acquiring packages necessary to run
Acquiring axtChain
Conda available: trying this channel...
In the latest version, I did not face any installation issues.
Hi @kirilenkobm, greetings. As per your suggestion I tried the
[d1/4aa5ff] NOTE: Process
Caused by:
Command executed: /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/runRepeatFiller.sh /scrfs/storage/vlamba/home/make_lastz_chains-1.0.0/test_out2/TEMP_run.fillChain/jobs/infillChain_158
Command exit status:
Command output:
Command error:
Work dir:
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named
/home/vlamba/python3.14/lib/python3.9/site-packages/py_nf/py_nf.py:404: UserWarning: Nextflow pipeline fillChain_targetquery failed! Execute function returns 1.
Hello Dr. Hiller and @kirilenkobm,
First, I wanted to let you both know that make_lastz worked for me. As @kirilenkobm suggested, I first ran make_lastz on a single chromosome, and it worked. When I then ran make_lastz on the whole genome, I got chain alignments for my genome data. Second, I wanted to update you on my comparison of TOGA orthology inference using HAL-derived chains versus make_lastz alignments:
HAL chaining, orthology.tsv, Sp A:
make_lastz chaining, orthology.tsv, Sp A:
When I compared gene loss between the two, make_lastz gave comparatively fewer lost genes. For example, for the same species, the HAL alignment gave 4289 lost genes and make_lastz gave 3788. I am going to use make_lastz for the rest of my data. Thank you
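To see *which* genes drive the 4289 vs 3788 difference, not just the totals, one option is to intersect the lost-gene sets of the two TOGA runs. The sketch below assumes that `loss_summ_data.tsv` is a headerless, tab-separated file with three columns (level such as `PROJECTION`/`TRANSCRIPT`/`GENE`, identifier, loss class) and that lost genes carry the class `L` at the `GENE` level; both are assumptions about the file layout, so check them against your own output. The file paths in the usage line are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple


def parse_lost_genes(lines: Iterable[str], lost_classes: Tuple[str, ...] = ("L",)) -> Set[str]:
    """Collect gene IDs whose GENE-level row carries one of lost_classes.

    Assumes headerless, tab-separated rows: level, identifier, class.
    """
    lost: Set[str] = set()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        level, identifier, loss_class = row[:3]
        if level == "GENE" and loss_class in lost_classes:
            lost.add(identifier)
    return lost


def lost_genes(path: str, lost_classes: Tuple[str, ...] = ("L",)) -> Set[str]:
    """Read one (assumed-format) loss summary file from disk."""
    with Path(path).open(newline="") as handle:
        return parse_lost_genes(handle, lost_classes)


def compare_loss(lost_a: Set[str], lost_b: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets of lost genes into run-specific and shared calls."""
    return {
        "only_a": lost_a - lost_b,
        "only_b": lost_b - lost_a,
        "shared": lost_a & lost_b,
    }
```

For example, `compare_loss(lost_genes("hal_run/loss_summ_data.tsv"), lost_genes("lastz_run/loss_summ_data.tsv"))` (hypothetical paths) returns the genes called lost only with HAL chains, only with make_lastz chains, and in both. Passing a wider `lost_classes` tuple (e.g. adding an uncertain-loss code, if your TOGA version uses one) shows how much of the difference sits in borderline calls.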
Hi Vinita, what I find interesting is that the HAL chains give more 1:1 and more 1:0 orthologs, which is unexpected. We have now also succeeded in running Cactus and will run a few tests on mammals in the coming weeks. Thx
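One way to put the 1:1 and 1:0 counts from the two runs side by side is to tally distinct reference genes per orthology class in each run's orthology table. This is a minimal sketch that assumes a tab-separated file with a header row containing the columns `t_gene` and `orthology_class` (values such as `one2one`, `one2many`, `one2zero`); those column names are assumptions, so pass the names from your own file's header if they differ.

```python
import csv
from collections import Counter
from typing import Iterable


def orthology_class_counts(lines: Iterable[str],
                           gene_col: str = "t_gene",
                           class_col: str = "orthology_class") -> Counter:
    """Count distinct reference genes per orthology class.

    Column names are assumed defaults; override them to match the header.
    """
    seen = set()
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row[gene_col], row[class_col])
        if key not in seen:  # a one2many gene appears on several rows
            seen.add(key)
            counts[row[class_col]] += 1
    return counts
```

Running this on both orthology files and printing the counts per class makes it easy to see whether the extra HAL-based 1:1 calls come with a matching drop in another class.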
Thank you, Dr. Hiller, for pointing out the orthology differences. Yes, I do have annotations for my 2 query genomes, and I tried to map the
I am sorry, I am just at the very basics of bioinformatics, and I don't know how to map these annotations to the query genomes or how to handle such complex data at the whole-genome level.
Apologies for so many queries, but trust me, your suggestions, hints, or links to any discussion will help me take the right path in analysing this data further. Best regards
Hi Vinita, To the questions
Thank you very much, Dr. Hiller, for answering all my queries. I am looking forward to seeing whether there are significant differences in orthology between Cactus and make_lastz.
Hi Dr. @MichaelHiller
I have a couple of queries regarding my TOGA outputs. I am working with vertebrate genomes (sizes ranging from 550 Mb to 1 Gb; all the genomes I considered are of very good quality, BUSCO > 90%). I want to explore gene loss among my selected group of species (PS: a phylogenetic figure of my data is attached). Currently I use only one species as the reference, marked as Ref in the tree. I want to see how many genes, and which ones, are lost in each of my query genomes.
For this I used a HAL alignment, because your make_lastz_chains pipeline is NOT working with the test data (hillerlab/make_lastz_chains#46 and hillerlab/make_lastz_chains#45). I also raised the issues there, but I guess because of busy schedules they are not yet resolved.
To convert HAL into chains I used the following commands:
- `hal2fasta` and then `faToTwoBit` to get `.2bit` files;
- `halStats` to get `.bed` sequences;
- `halLiftover`, for all query genomes, to get `.psl` between ref and target;
- `axtChain` to get the `.chain` alignment between each genome pair.

This HAL-to-chain process took 3-4 hrs for most of the genomes, but for some it took 6-7 hrs. I then used these `.chain` files to run TOGA. Because my reference genome is very well annotated, with the isoform file I got the list of lost genes (and the other categories as well) in `loss_summ_data.tsv`. I verified this against some already published genomes that reported lost genes for my focal/query species, and I also got these genes as 'lost' in my loss_summ file.
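The HAL-to-chain steps described above can be sketched as a small command builder, which makes the exact arguments easy to review before a multi-hour run. This is a sketch under assumptions, not the poster's actual commands: the output file names are hypothetical, `outdir` is assumed to exist, `-linearGap=loose` is one common `axtChain` choice for more distant species, and whether the PSL written by `halLiftover --outPSL` has the reference as target (`tName`, column 14) or as query (`qName`, column 10) should be checked; TOGA reads the reference as the chain target, so swap the two `.2bit` arguments if needed. By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# (argv, path that stdout should be written to, or None)
Step = Tuple[List[str], Optional[Path]]


def hal_to_chain_steps(hal: str, ref: str, query: str, outdir: str) -> List[Step]:
    """Build (but do not run) the per-query HAL -> chain commands.

    File names under outdir are hypothetical; outdir must already exist.
    """
    out = Path(outdir)
    ref_fa, qry_fa = out / f"{ref}.fa", out / f"{query}.fa"
    ref_2bit, qry_2bit = out / f"{ref}.2bit", out / f"{query}.2bit"
    ref_bed = out / f"{ref}.bed"
    psl = out / f"{ref}_to_{query}.psl"
    chain = out / f"{ref}_to_{query}.chain"
    return [
        # 1. extract each genome from the HAL file as FASTA (captured from stdout)
        (["hal2fasta", hal, ref], ref_fa),
        (["hal2fasta", hal, query], qry_fa),
        # 2. convert FASTA to .2bit for axtChain
        (["faToTwoBit", str(ref_fa), str(ref_2bit)], None),
        (["faToTwoBit", str(qry_fa), str(qry_2bit)], None),
        # 3. BED intervals covering every reference sequence (captured from stdout)
        (["halStats", "--bedSequences", ref, hal], ref_bed),
        # 4. project the reference intervals onto the query, written as PSL
        (["halLiftover", "--outPSL", hal, ref, str(ref_bed), query, str(psl)], None),
        # 5. chain the PSL blocks; verify PSL target/query order first (see above)
        (["axtChain", "-psl", "-linearGap=loose", str(psl),
          str(ref_2bit), str(qry_2bit), str(chain)], None),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> None:
    """Print the commands (dry_run) or execute them, stopping at the first failure."""
    for argv, stdout_path in steps:
        if dry_run:
            suffix = f" > {stdout_path}" if stdout_path is not None else ""
            print(" ".join(argv) + suffix)
        elif stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

For example, `run_steps(hal_to_chain_steps("alignment.hal", "Ref", "SpA", "work"))` (hypothetical names) prints the seven commands; once they look right, `dry_run=False` runs them in order, and `check=True` stops at the first failing tool.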
Some questions regarding my outputs:
- Out of 1000 losses, 453 are not "novel genes"; the rest are "novel genes" in the Ref species. Among these 453, many genes have identical gene descriptions but still have different gene IDs.

I would really appreciate any suggestion from your end.
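To find the lost genes that share a description but carry different gene IDs, one option is to group the lost IDs by description. Genes with the same description and different IDs may, for example, be separate members of a gene family annotated at different loci, so listing them together helps decide whether they are true separate losses. This sketch assumes you already have a dictionary mapping gene ID to description (e.g. built from a hypothetical annotation export of the reference); the skip label `"novel gene"` is also an assumption about how unnamed genes are described.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_descriptions(lost_ids: Iterable[str],
                        descriptions: Dict[str, str],
                        skip: Tuple[str, ...] = ("novel gene",)) -> Dict[str, List[str]]:
    """Group lost gene IDs by description, keeping descriptions shared by >1 ID.

    Genes without a description, or whose lowercased description is in skip,
    are ignored.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for gene_id in lost_ids:
        desc = descriptions.get(gene_id, "").strip()
        if not desc or desc.lower() in skip:
            continue
        groups[desc].append(gene_id)
    return {desc: sorted(ids) for desc, ids in groups.items() if len(ids) > 1}
```

Each key of the result is a description shared by several lost genes, with the sorted IDs as its value, which gives a short list to inspect (for instance in a genome browser) instead of the whole set of 453.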
Thank you very much