Please describe the use case that requires this feature.
The current processing on the GRID runs multiple ctapipe-process processes for multiple input files and then merges the resulting small files. This is done because the jobs would otherwise be too short to be efficient with the GRID job submission system.
It would be more efficient to skip the separate merge step and append directly to a single output file.
Describe the solution you'd like
Add an option to DataWriter and ctapipe-process to append to an existing output file.
Alternatives considered
Create a tool, or modify ctapipe-process, to run directly on multiple input files.
To be clear: I assume you don't mean simultaneous writing from many jobs to one file, as that is what we do on the grid, but rather allowing multiple input files within one grid job.
Really, this has nothing to do with the grid processing; it is more about allowing multiple EventSources to be chained (like itertools.chain) and having DataWriter correctly write the configuration data when the input changes. Mainly that means re-running DataWriter._setup_outputfile() when the obs_id changes (which I think is the minimal way to detect a new EventSource in the stream). DataWriter on its own doesn't know about input files, only the "event" structure in memory, and it currently assumes that no "header" info changes from event to event.
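A minimal sketch of the chaining idea, in plain Python without any ctapipe imports. `Event`, `write_chained`, `setup_output` and `write_event` are hypothetical names: `setup_output` only stands in for whatever `DataWriter._setup_outputfile()` does, and neither the real ctapipe event containers nor the real writer API are used here. The obs_id-change check is the detection heuristic proposed in the comment above.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Callable, Iterable, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a ctapipe event; only obs_id matters here."""
    obs_id: int
    event_id: int


def write_chained(
    sources: Iterable[Iterable[Event]],
    setup_output: Callable[[int], None],
    write_event: Callable[[Event], None],
) -> None:
    """Chain several event streams and re-run the output setup on each obs_id change.

    setup_output is a hypothetical hook playing the role of
    DataWriter._setup_outputfile(); its real signature is not assumed.
    """
    current_obs_id: Optional[int] = None
    for event in chain.from_iterable(sources):
        if event.obs_id != current_obs_id:
            # New obs_id: assume a new input started, so (re)write its header/config data
            setup_output(event.obs_id)
            current_obs_id = event.obs_id
        write_event(event)


log = []
run1 = [Event(1, 0), Event(1, 1)]
run2 = [Event(2, 0)]
write_chained(
    [run1, run2],
    setup_output=lambda obs_id: log.append(f"setup {obs_id}"),
    write_event=lambda e: log.append(f"event {e.obs_id}/{e.event_id}"),
)
print(log)  # ['setup 1', 'event 1/0', 'event 1/1', 'setup 2', 'event 2/0']
```

Note that this heuristic misses a file boundary if two consecutive input files share the same obs_id; in that case the header info would not be rewritten.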
> Really this has nothing to do with the grid processing,
Technically no, but this is where the motivation comes from.
> DataWriter on its own doesn't know about input files, only the "event" structure in memory, but it currently assumes that no "header" info changes event-to-event.

Yes, which is why it might be easier to just support running ctapipe-process multiple times with the same output file, and to introduce an --append option that neither overwrites the output file nor requires that it does not exist yet.
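A rough sketch of the --append semantics suggested above, using a plain text file as a stand-in for the real HDF5 output. `open_output`, `process` and the `append`/`overwrite` parameters are hypothetical and do not reflect ctapipe-process's actual options; a real implementation would also have to check that the configuration/header data already in the file matches the new input.

```python
from pathlib import Path


def open_output(path: Path, append: bool = False, overwrite: bool = False):
    """Open the output file, honouring hypothetical --append / --overwrite flags."""
    if path.exists():
        if append:
            # Keep existing content and add new rows at the end
            return path.open("a")
        if not overwrite:
            # Default: refuse to clobber an existing output file
            raise FileExistsError(f"{path} exists; pass --append or --overwrite")
    # New file, or explicit overwrite: start from scratch
    return path.open("w")


def process(input_name: str, output: Path, append: bool, overwrite: bool) -> None:
    """Stand-in for one ctapipe-process run: writes one line per 'event'."""
    with open_output(output, append=append, overwrite=overwrite) as f:
        for i in range(2):
            f.write(f"{input_name} event {i}\n")
```

Running `process("run1", out, append=False, overwrite=False)` followed by `process("run2", out, append=True, overwrite=False)` leaves the events of both runs in one file, which is what the separate merge step currently produces.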