Improve architecture for HUGE shoots #36
Currently, the transfer lambda deals with whole shoots. It should instead deal with packaged-up sub-shoots, and something upstream of it should chop shoots into packages. This can be done even before restoration, as we only need the S3 folder listing and the file sizes as reported in ObjectSummary. Both of these are still available while the actual data is squirrelled away in Glacier.
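A rough sketch of what the splitting step could look like, working only from keys and sizes. Everything here is an assumption for illustration: the names `FileInfo`, `Package` and `split_shoot`, the `max_bytes` and `max_files` limits, and the greedy packing strategy are hypothetical, not taken from the existing code.

```python
from typing import Iterable, List, NamedTuple


class FileInfo(NamedTuple):
    key: str   # object key, as found in the S3 listing
    size: int  # size in bytes, as reported by the listing


class Package(NamedTuple):
    package_id: str  # e.g. "CP_1234_001"
    keys: List[str]


def split_shoot(shoot_id: str, files: Iterable[FileInfo],
                max_bytes: int, max_files: int) -> List[Package]:
    """Greedily pack files, in key order, into packages within the given limits.

    A single file larger than max_bytes still gets a package of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for f in sorted(files, key=lambda item: item.key):
        over_size = current_bytes + f.size > max_bytes
        over_count = len(current) >= max_files
        if current and (over_size or over_count):
            # Close the current package and start a new one.
            groups.append(current)
            current, current_bytes = [], 0
        current.append(f.key)
        current_bytes += f.size
    if current:
        groups.append(current)
    # Number the packages from 001, matching the CP_1234_001 style of ID.
    return [Package(f"{shoot_id}_{n:03d}", keys)
            for n, keys in enumerate(groups, start=1)]
```

If the listing comes from boto3, the input could be built from `bucket.objects.filter(Prefix=...)`, whose `ObjectSummary` items expose `key` and `size` without needing the objects to be restored first.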
So the flow should look something like this:

    flowchart LR
        Start -- CP_1234 --> Splitter
        Splitter -- CP_1234_001: a.tif,b.tif --> Restorer
        Splitter -- CP_1234_002: c.tif,d.tif,e.tif --> Restorer
        Splitter -- CP_1234_003: f.tif --> Restorer
        Restorer -- CP_1234_001: a.tif,b.tif --> Transferrer
        Restorer -- CP_1234_002: c.tif,d.tif,e.tif --> Transferrer
        Restorer -- CP_1234_003: f.tif --> Transferrer

That would make all the nodes much more predictable. Currently, if a lambda fails because there are too many shoots, it can be retried, because the transferrer will ignore anything that has already been transferred. However, this is an inefficient manual process. It also requires the lambda to be scaled to cope with the largest possible shoots.
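To illustrate why per-package messages are more predictable, here is a minimal sketch in which each package is an independent unit of work, so a failure only affects that one package and only that package needs retrying. The `Package` type and the `run_stage` helper are hypothetical; in practice the retry would more likely be handled by the queue or Lambda configuration than by a loop like this.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Package:
    package_id: str          # e.g. "CP_1234_001"
    files: Tuple[str, ...]   # the files belonging to this package only


def run_stage(stage: Callable[[Package], object],
              packages: List[Package]) -> Tuple[list, List[Package]]:
    """Apply one stage (restore or transfer) to each package independently.

    Returns the successful results and the packages that failed, so that
    only the failures need to be sent round again.
    """
    done, failed = [], []
    for package in packages:
        try:
            done.append(stage(package))
        except Exception:
            # One bad package does not stop the others in the same shoot.
            failed.append(package)
    return done, failed
```

Because each invocation handles one bounded package, the lambda can be sized for the package limit rather than for the largest shoot.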
Some of the last remaining shoots have been so large they need to be broken into more than just a few bundles - one consists of at least seven bundles. This cannot be handled in a single Lambda execution, regardless of how much memory, time, and threading we make available.
Treating a huge shoot as a single unit is also deceptive with respect to the amount left to transfer, and it undermines the assumption that Archivematica can cope with (roughly) a certain number of shoots per day.
Assuming it can normally cope with receiving 20 shoots, and somewhere up to 30 is likely to mostly succeed, we would ask for 20 shoots to be transferred.
If we ask it to transfer 20 shoots, mostly with enough photos to make one or two packages, then we are probably looking at 20-30 bundles going to Archivematica, and that will be fine.
If, however, most of them result in three packages and some of them five or even seven, then it's going to result in over 70 reaching the target system, and most of those will fail.
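The arithmetic above can be sketched as a budget on packages rather than on shoots. The mix of shoot sizes below is made up purely to match the "mostly three, some five or seven" case, and `select_shoots` is a hypothetical helper, not existing code.

```python
from typing import Iterable, List, Tuple


def select_shoots(package_counts: Iterable[Tuple[str, int]],
                  max_packages: int) -> Tuple[List[str], int]:
    """Take shoots in order until the next one would exceed the package budget."""
    selected: List[str] = []
    total = 0
    for shoot_id, count in package_counts:
        if total + count > max_packages:
            break
        selected.append(shoot_id)
        total += count
    return selected, total


# Illustrative batch of 20 shoots: 14 need three packages, 4 need five, 2 need seven.
counts = [3] * 14 + [5] * 4 + [7] * 2
batch = [(f"shoot_{n:02d}", c) for n, c in enumerate(counts, start=1)]

# Asking for all 20 shoots would send this many packages to Archivematica:
total_if_counting_shoots = sum(c for _, c in batch)

# Budgeting for 20 packages instead picks only as many shoots as will fit:
selected, total_packages = select_shoots(batch, max_packages=20)
```

With this made-up mix, requesting 20 shoots would produce 76 packages, whereas a budget of 20 packages selects the first six shoots (18 packages).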