-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for corrupted / zombie files in ldmx-sw? #1406
Comments
We actually already check for existence of both the file and the event tree in the EventFile constructor. ldmx-sw/Framework/src/Framework/EventFile.cxx Lines 53 to 69 in d74f8cc
Perhaps we include a check for Personally, I really don't like the idea of continuing smoothly past a corrupted file. I think the Framework should end the program if a corrupted file is encountered in order to inform the user of this corruption. Maybe we can get around the failing-on-the-last-file annoyance by making sure to |
Well for the use case I ran into this, this would be not great. I had to ask LK to delete the input file and until they did it, I couldn't move on. (My use case was making a BDT skim, where I have 50 input files per output file) So continuing on the bad file, and reporting that it was bad in the end of the workflow would be my preferred solution. |
This is showing my own ignorance around Rucio and LDCS - is it possible to mark files as bad and re-run a workflow so that bad files are skipped? I'm just speaking from my own (non-Rucio, non-LDCS) experience where I make a file list and just remove the bad files from the list as I encounter them (if I can't personally delete them). In any case, I'm comfortable with incorporating this skipping behavior, but I would like it to be designed with new users in mind as well. i.e. just making sure the error message is very clear and we don't accidentally silence the other error messages (no file or no tree). This might be a case where we add another flag to the constructor about "strictness" 🤔 |
My understanding from @bryngemark is that this cannot be done.
Yes another flag could work, so in default we'd exit if the file is problematic, but if the person doing this knows about the flag they can let the code skip files (the pool for people who use LDCS is anyway pretty limited) |
That's right, there is currently no smooth way out of this with LDCS. What we do is specify an input data set, and a number of files per job, and then A flag sounds like a sensible solution to me, and would be hugely useful when we run harsh skims. For book-keeping, though, it would be great if the number of successfully opened files could be reported somewhere the metadata harvesting for |
Is your feature request related to a problem? Please describe.
This LDMX-Software/LDCS#23 made me think if we should have a built-in check for input files being corrupted / zombie.
Describe the solution you'd like
I'd think here
https://github.com/LDMX-Software/ldmx-sw/blob/trunk/Framework/src/Framework/Process.cxx#L279
is where this should go, but I dont know enough about the
EventFile
class, and a quick look almost suggests that it's like aTFile
wrapper. So this makes me think if we could doThe text was updated successfully, but these errors were encountered: