End to end scheduling #32
def post_messages(session, shoot_numbers):
    sns = session.resource("sns")
    topic = sns.Topic(f"arn:aws:sns:eu-west-1:760097843905:restore_shoots-production")
No point in making the env configurable here, if only for testing?
Could do, but there is no staging transfer throttle, which is the only way in which this matters.
If we want to test things going to staging we can do that in steps using restore.py and start_transfers.py locally.
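If the environment were made configurable as suggested, one possible shape is sketched below. The `topic_arn` helper and its `environment` parameter are hypothetical and not part of this PR; the region, account ID and topic prefix are copied from the diff above. `post_messages` here takes an already-resolved topic object rather than a session so it can be exercised without AWS access, and publishing one message per shoot number is an assumption, since the real payload is not shown in the diff.

```python
REGION = "eu-west-1"
ACCOUNT_ID = "760097843905"


def topic_arn(environment="production"):
    """Build the restore_shoots topic ARN for an environment (hypothetical helper)."""
    return f"arn:aws:sns:{REGION}:{ACCOUNT_ID}:restore_shoots-{environment}"


def post_messages(topic, shoot_numbers):
    """Publish one message per shoot number (assumed payload shape).

    `topic` is any object with a boto3-style publish(Message=...) method,
    for example session.resource("sns").Topic(topic_arn()).
    """
    for shoot_number in shoot_numbers:
        topic.publish(Message=shoot_number)
```

As noted in the reply above, this would only matter once a staging transfer throttle exists.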
* shifts them onto the transfer queue

The transferrer then transfers everything on its queue
`` ```mermaid ``
Ditto
* Notifies the transfer throttle queue.

Restoration takes a nondeterministic amount of time up to 12 hours
`` ```mermaid ``
Very nice! 👍
@@ -14,7 +14,7 @@ module "input_queue" {

   queue_name = "${var.action_name}-${var.environment}"

-  topic_arns = [module.notification_topic.arn]
+  topic_arns = concat(var.extra_topics, [module.notification_topic.arn])
It's not super clear why there's a "notification_topic" and also "extra_topics"
This notification queue module creates an SNS/SQS pair, so the SQS is fed by the SNS (notification topic).
The Restoration->Transfer transition requires something to happen on one account and result in queue messages on the other.
It seemed to be easier and clearer (as well as the Right Thing to Do, semantically) for the source to notify its own topic, and for SQS to listen to that across the account boundary, rather than for the source to notify across the account boundary.
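To make the "SQS listens to several topics" idea concrete, here is a minimal sketch of the kind of queue access policy that lets more than one SNS topic deliver to a single queue. It illustrates the general AWS pattern only; whether the module builds its policy exactly this way is not shown in the diff, and the ARNs and account IDs below are placeholders.

```python
import json


def queue_policy(queue_arn, topic_arns):
    """Return an SQS access policy allowing each listed SNS topic to send to the queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
            for topic_arn in topic_arns
        ],
    }


# Placeholder ARNs: account IDs and names are made up for illustration.
own_topic = "arn:aws:sns:eu-west-1:111111111111:transfer-production"
extra_topics = ["arn:aws:sns:eu-west-1:222222222222:restored-production"]

# Mirrors concat(var.extra_topics, [module.notification_topic.arn]) from the diff.
topic_arns = extra_topics + [own_topic]
policy = queue_policy("arn:aws:sqs:eu-west-1:111111111111:transfer-production", topic_arns)
print(json.dumps(policy, indent=2))
```

The queue stays in the consuming account and simply gains one statement per topic, so the source account only ever publishes to its own topic.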
Empty file
module "transfer_scheduler" {
  source = "../lambda_scheduler"
  cron = "cron(30 7,9,11,13,15,16 ? * MON-FRI *)"
  description = "Restore a batch of shoots in the evening so they are ready to be transferred in the morning"
"Moves batches of shoots to the transferrer at a rate Archivematica can handle"?
whoops CTRL-C CTRL-V
@@ -0,0 +1,7 @@
terraform {
I think the provider can be declared once in the top-level provider.tf
This is because this whole TF operates over two accounts. This allows us to have both accounts at the top level and pass the right one down into each module
A few nit-picky comments and questions but LGTM 👍 Nice piece of automated optimisation!
I forgot: can you add something in a prominent place to explain how to turn the scheduling on and off again once everything has been transferred? I assume we don't want it to run all year round
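One way the on/off switch could look, assuming the `lambda_scheduler` module creates an EventBridge (CloudWatch Events) rule for each schedule, is sketched below. That assumption is not confirmed by the diff, and the rule name in the usage note is hypothetical. `enable_rule` and `disable_rule` are the standard boto3 `events` client calls.

```python
def set_schedule_enabled(events_client, rule_name, enabled):
    """Enable or disable a scheduled rule using a boto3 "events" client."""
    if enabled:
        events_client.enable_rule(Name=rule_name)
    else:
        events_client.disable_rule(Name=rule_name)
```

For example, `set_schedule_enabled(session.client("events"), "transfer_scheduler-production", enabled=False)` would pause the transfer schedule. If the rule is managed by Terraform, a later apply may restore its configured state, so a variable in the Terraform itself may be the more durable switch.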
What does this change?
Resolves #27
You can now kick off an end to end restore and transfer by placing shoots on the restorer queue.
This restores the images from Glacier on day one, then spreads their transfer across day two, at a rate of 60 shoots per day.
This also adds some Makefile features to check what is yet to be done, so that outstanding shoots can be placed onto the right list.
How to test
There are currently 31 shoots on the restore_shoots_production queue. This number should go to zero tonight, and across tomorrow, all of them should be run through the transferrer (caveat).
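A quick way to watch that number fall is to read the queue's approximate depth. This is a minimal sketch using the standard boto3 SQS `get_queue_attributes` call; the queue URL is left as a parameter because it is not given here.

```python
def approximate_queue_depth(sqs_client, queue_url):
    """Return the approximate number of visible messages on an SQS queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessages"])
```

Called with `session.client("sqs")` and the restorer queue's URL, it should report roughly 31 before tonight's run and 0 afterwards. The figure is approximate by design, so treat small discrepancies as noise.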
How can we measure success?
Future transfers of editorial photography should be a one-step process, with perhaps a little mopping up of errors afterwards.
Have we considered potential risks?
The point of a lot of this is to mitigate the risk of Archivematica falling over. The two relevant lambdas are run on a schedule so that the shoots are processed at a rate that the target system can cope with.
The model relies on the restorer and transferrer being in step with one another - i.e. that on the evening of day one, objects are restored and the transferrer queue populated, and across day two, that queue is emptied.
Currently, the two values are not linked in the definitions, partly because the cron expression is written by hand into the TF (i.e. one does 60 once and the other does 10 six times, evenly spaced across the available hours).
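For illustration, the two values could be derived from a single definition along these lines. This is only a sketch of the arithmetic, not something this PR implements: the function name and parameters are hypothetical, and the hours and daily total are taken from the cron expression and description above.

```python
def transfer_schedule(daily_total, hours, minute=30):
    """Derive the transfer cron expression and per-run batch size from one definition."""
    if daily_total % len(hours):
        raise ValueError("daily_total must divide evenly across the scheduled runs")
    batch_size = daily_total // len(hours)
    hour_list = ",".join(str(hour) for hour in hours)
    cron = f"cron({minute} {hour_list} ? * MON-FRI *)"
    return cron, batch_size


cron, batch_size = transfer_schedule(60, [7, 9, 11, 13, 15, 16])
print(cron)        # cron(30 7,9,11,13,15,16 ? * MON-FRI *)
print(batch_size)  # 10
```

With the daily total and the run hours as the only inputs, changing either one keeps the restorer's 60 and the transferrer's 6 x 10 in step.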