Add possibility to append to existing file for DataWriter / ctapipe-process #2663

Open
maxnoe opened this issue Nov 28, 2024 · 2 comments

Comments

@maxnoe
Member

maxnoe commented Nov 28, 2024

Please describe the use case that requires this feature.

The current processing on the GRID runs multiple ctapipe-process processes for multiple input files and then merges the resulting small files. This is done because the jobs would otherwise be too short to be efficient with the GRID job submission system.

It would be more efficient to not process and then merge but to directly append to a single output file.

Describe the solution you'd like

Add the possibility for DataWriter and ctapipe-process to append to an existing output file.

Alternatives considered

Create a tool / modify ctapipe-process to directly run on multiple input files.

@kosack
Contributor

kosack commented Nov 28, 2024

To be clear: I assume you don't mean simultaneous writing from many jobs to one, as that is what we do on the grid, but rather within one grid job allowing multiple input files.

Really this has nothing to do with the grid processing; it is more about allowing multiple EventSources to be chained (like itertools.chain) and having DataWriter correctly write the configuration data when the input changes. Mainly, that just means re-running DataWriter._setup_outputfile() when the obs_id changes (which I think is the minimal way to detect a new EventSource in the stream). DataWriter on its own doesn't know about input files, only the "event" structure in memory, but it currently assumes that no "header" info changes event-to-event.
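
A minimal sketch of that idea, assuming nothing about the real ctapipe classes: `Event`, `SketchWriter`, `fake_source` and `process` are hypothetical stand-ins invented for illustration, and `_setup_header` only marks the place where something like `DataWriter._setup_outputfile()` would be re-run. It shows sources being chained with `itertools.chain` and the header step being triggered by a change in `obs_id`.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, Iterator, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for an in-memory event, not the ctapipe container."""

    obs_id: int
    event_id: int


class SketchWriter:
    """Hypothetical writer that redoes its header step whenever obs_id changes."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._current_obs_id: Optional[int] = None

    def _setup_header(self, event: Event) -> None:
        # Placeholder for writing per-input configuration/header data.
        self.log.append(f"header obs_id={event.obs_id}")

    def write(self, event: Event) -> None:
        # A new obs_id is taken as the signal that a new input has started.
        if event.obs_id != self._current_obs_id:
            self._setup_header(event)
            self._current_obs_id = event.obs_id
        self.log.append(f"event {event.obs_id}:{event.event_id}")


def fake_source(obs_id: int, n_events: int) -> Iterator[Event]:
    """Hypothetical event source yielding n_events events for one obs_id."""
    for event_id in range(n_events):
        yield Event(obs_id=obs_id, event_id=event_id)


def process(sources: Iterable[Iterable[Event]]) -> List[str]:
    """Feed all sources, one after the other, through a single writer."""
    writer = SketchWriter()
    for event in chain.from_iterable(sources):
        writer.write(event)
    return writer.log


log = process([fake_source(1, 2), fake_source(2, 1)])
```

The writer never sees the input files themselves, only the events, which matches the constraint described above. The sketch would miss a new input that happens to reuse the previous `obs_id`.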

@maxnoe
Member Author

maxnoe commented Nov 28, 2024

> Really this has nothing to do with the grid processing,

Technically no, but this is where the motivation comes from.

> DataWriter on its own doesn't know about input files, only the "event" structure in memory, but it currently assumes that no "header" info changes event-to-event.

Yes, which is why it might be easier to just support running ctapipe-process multiple times with the same output file, and to introduce an --append option so that the tool neither overwrites the output file nor requires that it not exist yet.
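
As a rough sketch of the decision such an option implies, assuming a plain `argparse` command line purely for illustration (the actual ctapipe tools are configured differently, and none of this is existing ctapipe code): only the `--append` name comes from the proposal above, while `--overwrite`, `build_parser` and `resolve_write_mode` are hypothetical.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command line; only the --append name is from the proposal."""
    parser = argparse.ArgumentParser(prog="process-sketch")
    parser.add_argument("--output", required=True)
    # Overwriting and appending contradict each other, so allow at most one.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--overwrite", action="store_true")
    group.add_argument("--append", action="store_true")
    return parser


def resolve_write_mode(path: str, overwrite: bool = False, append: bool = False) -> str:
    """Return 'a' to append to the output or 'w' to create/replace it."""
    if overwrite and append:
        raise ValueError("overwrite and append are mutually exclusive")
    if append:
        # Mode 'a' keeps existing content and creates the file if it is missing.
        return "a"
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass --overwrite or --append")
    return "w"


args = build_parser().parse_args(["--output", "events.h5", "--append"])
mode = resolve_write_mode(args.output, overwrite=args.overwrite, append=args.append)
```

With this shape, the first run and every later run can use the same command line, since appending to a file that does not exist yet simply creates it.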


2 participants