Question: best way to merge resulting a3m msa DBs #900
Comments
EDIT: I think I screwed something up somewhere as I was able to get mergedbs to work for this use case.
Reopening because it is REALLY slow, e.g.
Note that this is but a subset of the files I will eventually be merging. Each final.a3m has ~150k MSAs in it. I recognize that this is a sh*t ton of data, but I am not sure why this particular step would take so long - shouldn't it basically be just a concatenation of the a3m files followed by recomputing the .index? Thanks for any wisdom.
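To make the "concatenate, then recompute the .index" idea concrete, here is a minimal Python sketch. It is not how mergedbs or concatdbs are implemented. It assumes the usual MMseqs2 on-disk layout: a data file of null-terminated entries plus a tab-separated `.index` with `key`, `offset`, `length` per line. It also assumes the keys are already unique across inputs and that each DB is a single, unsplit, uncompressed data file. The function name `concat_dbs` is hypothetical, and the `.dbtype` file is not handled.

```python
import os
import shutil


def concat_dbs(inputs, output):
    """Append the data files in order and shift each index offset accordingly.

    Assumption: every index line is "key<TAB>offset<TAB>length" and keys do not
    collide between inputs. The resulting index is not re-sorted by key.
    """
    shift = 0  # number of bytes already written to the output data file
    with open(output, "wb") as data_out, open(output + ".index", "w") as index_out:
        for db in inputs:
            # Copy the raw data bytes unchanged.
            with open(db, "rb") as data_in:
                shutil.copyfileobj(data_in, data_out)
            # Rewrite the index, moving each offset past the earlier inputs.
            with open(db + ".index") as index_in:
                for line in index_in:
                    if not line.strip():
                        continue
                    key, offset, length = line.split()[:3]
                    index_out.write(f"{key}\t{int(offset) + shift}\t{length}\n")
            shift += os.path.getsize(db)
```

Calling `concat_dbs(["part0/final.a3m", "part1/final.a3m"], "merged.a3m")` (paths made up for illustration) would do a single sequential pass over the inputs, which is roughly the cost one would expect if the step were a plain concatenation.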
This looks correct; if you have root you can install/use
Expected Behavior
mmseqs concatdbs ... --preserve-keys with a3m MSA db inputs should produce a single file that is effectively a bash concat of them.
Current Behavior
"Empty msa1! Skipping entry" is reported for every MSA in the first file other than the first entry,
e.g.
`>1
MI>6
MLAGLLLAGPALTPMASATPGPLYRNPHASVSSRVDDLLKRMSLDDKVGQMTQAERGAVTPDQAAALKLGSLLSGGGSVPAGNTPNGWADMVDSYQKAAVSTPLGIPTIYGVDAVHGHNNVYGATIFPHNIGLGAANNPRLVEKIGRATALEVAGTGPQWDFSPCLCVARDDRWGRTYESFGESPRDAVANASAITGLQGHGLGEKPGSVLATAKHYVGDGGTTNGVDQGNTEISERELRQIHLPPFREAIDRGVGSVMISFSSFQGVRMHAQKYLITDVLKKELRFSGLVISDYNAINQIDGQEGFTPEEVRLSVNAGIDMFMVPWDAPQFIAYLKAEVEAGRVPTARIDDANRRILAEKFKLGLFEHPYTDRSLQKTFGSKEHRELARQAVRESQVLLKNDGVLPLAKKNNKIFVAGKNANDIGNQAGGWTLTWQGQSGPVIPGTTILDGLKSGAGKGTTVTYDRAGDGIDGSYQVAVAVVGETPYAEGQGDRPNGFGLDAEDLATIAKLKSSGVPVVVVTVSGRPLDIAAQLPQFDGLVAAWLPGSEGAGVADVLYGDYNPTGKLTFSWPASATQEPVNVGDGKKALYPYGFGLRYRR
Is the output.
I have confirmed that the MSAs are present in the input files and have more than one sequence.
Any guidance appreciated.
Ultimately I have hundreds of a3ms to combine... a result of splitdbs and running jobs on separate nodes.
mergedbs requires a query input, an output, and then the list of dbs to merge - this is contrived for this use case because I have many query files, unless I am (likely) misunderstanding.