
Feature request: Print output instead of writing to file #33

Closed
tmaklin opened this issue Aug 2, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@tmaklin
Contributor

tmaklin commented Aug 2, 2024

Hi, would it be possible to support printing the output to stdout or stderr (preferably stdout so it can be piped easily) instead of requiring writes to a file? This would enable processing the results immediately which is particularly helpful if the plaintext output is very large.

At least on Linux it's possible to partly work around this by supplying /dev/stdout or /dev/stderr as the argument to -o, but Fulgor currently prints status messages to both std::cerr and std::cout, and those messages then end up in the output, so it's not ideal. It would be nice if Fulgor supported this directly, e.g. by printing to stdout when the -o argument is not supplied (and using only std::cerr for status messages), or by adding some kind of --quiet flag that silences the status messages.

I quickly put together something along these lines in https://github.com/tmaklin/fulgor but my implementation needs C++20 and requires changing the logger nested inside sshash -> pthash -> essentials so I didn't want to turn it into a pull request.

@rob-p
Collaborator

rob-p commented Aug 2, 2024

Hi @tmaklin,

Thanks for this suggestion. We also have other uses for a --quiet mode in piscem and so I think this is something that's on the short list of features.

I think it makes sense to allow writing to stdout, but I'd also like to take this opportunity to re-raise the idea of potentially implementing a binary output format based on the RAD container specification. I'm calling it a "container" specification because the idea is that there are very few requirements or restrictions for something to be considered a RAD format file, and it's designed so that different tools taking advantage of it can customize their output records in different ways. Regardless, I think what would make a lot of sense for the output here is something like an alignment format that just specifies the subset of references (present in the header, and referenced by index) that are "compatible" with the query. Alternatively or additionally, one could also add score information to each alignment, but that may not be necessary. Anyway, perhaps this is an orthogonal feature request, but it relates to the I/O format used by fulgor and so may be relevant here.

I'm also cc'ing @jamshed, who has a prototype C++ library for multithreaded writing of the RAD format (though I believe that repo is still private).

--Rob

@tmaklin
Contributor Author

tmaklin commented Aug 2, 2024

Thanks! I actually did some work related to the file format + compressor in June and ended up with something that seems to resemble the RAD format very closely. Mine is currently aimed more at command-line use than at library use, but it can also convert data written in the format to/from themisto, fulgor, metagraph, bifrost and .sam files.

I'll read the specification more carefully and continue the discussion here next week.

@jermp jermp added the enhancement New feature or request label Aug 6, 2024
@jermp
Owner

jermp commented Aug 6, 2024

Hi @tmaklin,
thank you for your feature request. It makes perfect sense to allow for this functionality and I'll implement it as soon as I get back from holidays :)
As @rob-p mentioned, we could eventually think of agreeing on some pseudo-alignment output format. Also, as discussed with Jarno Alanko, we could put together a Bioinformatics application note to try to standardize input formats for colored (compacted) dBGs (we discussed this in #28). Ideally, the output of ./fulgor dump should already offer something to start with (in textual format).

To sum up, standardizing input/output formats for colored compacted dBGs would be a nice collaborative effort. Let me know if this plan sounds good to you!

Best,
-Giulio

@jermp
Owner

jermp commented Aug 7, 2024

It's a rainy day here, so I introduced the requested quiet mode for pseudoalignment (see a651de0).

Now we can do something like:

./fulgor pseudoalign -i ../test_data/index.fur -q ~/Downloads/queries.fastq -o /dev/stdout -t 4 --quiet 2> errors.log

where /dev/stdout sends the pseudoalignment results directly to stdout and the --quiet option silences status messages. Possible warnings are still printed to std::cerr, so they are captured by the 2> errors.log redirection.
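With results on stdout, a downstream program can process them as they are produced, which is the use case from the original request. The sketch below assumes each output line is tab-separated as: read name, number of matching references, then the reference ids; that layout is an assumption and should be checked against Fulgor's actual output. `parse_line` and `count_hits` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed (unverified) line layout:
//   <read_name> \t <num_refs> \t <ref_id> \t <ref_id> ...
struct hit_line {
    std::string read_name;
    std::vector<std::uint32_t> ref_ids;
};

// Parses one line; returns false if it does not match the assumed layout.
// Fields are split on any whitespace, so read names must not contain spaces.
bool parse_line(const std::string& line, hit_line& out) {
    std::istringstream fields(line);
    std::size_t n = 0;
    if (!(fields >> out.read_name >> n)) return false;
    out.ref_ids.assign(n, 0);
    for (auto& id : out.ref_ids)
        if (!(fields >> id)) return false;
    return true;
}

// Consumes results line by line as they arrive (e.g. from std::cin in a
// pipe) and tallies how many reads are compatible with each reference id.
std::map<std::uint32_t, std::uint64_t> count_hits(std::istream& in) {
    std::map<std::uint32_t, std::uint64_t> counts;
    std::string line;
    hit_line h;
    while (std::getline(in, line)) {
        if (parse_line(line, h))
            for (auto id : h.ref_ids) ++counts[id];
    }
    return counts;
}
```

Wrapped in a small `main` that calls `count_hits(std::cin)`, this could sit at the end of a pipeline such as `./fulgor pseudoalign ... -o /dev/stdout --quiet 2> errors.log | ./count_hits`, where `count_hits` is the hypothetical compiled consumer; nothing is written to disk in between.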
(I did not have to touch the essentials dependency.)

Please @tmaklin, let me know if this works well for you.

Best,
-Giulio

@jermp
Owner

jermp commented Aug 14, 2024

Closing this for now.

@jermp jermp closed this as completed Aug 14, 2024
@tmaklin
Contributor Author

tmaklin commented Aug 19, 2024

Thanks, this works for my case! Implemented like that, there's no need to change the essentials dependency :)

@jermp
Owner

jermp commented Aug 19, 2024

Yup! :)
