Merge pull request #24 from wellcomecollection/document-photography-i…
…ngest-commands

adds to documentation
agnesgaroux authored Aug 2, 2024
2 parents 4581660 + a70acd6 commit 60b1c29
Showing 3 changed files with 41 additions and 22 deletions.
20 changes: 10 additions & 10 deletions Makefile
@@ -7,9 +7,16 @@ shoots/clean:
rm shoots/*transferred
rm shoots/*slice*

# Slice a given input file into manageable chunks, so that you can run them through the
# transfer process separately without overwhelming the target system.
# The right number for archivematica is probably about 20.

%.sliced: %
split -l 20 $< $<.

# Request the Glacier restoration of the shoots in the given file
# The file is expected to contain one shoot identifier per line.
# In order to run this, set your AWS profile to one with authority in the workflow account.
# In order to run this, set your AWS profile to one with authority in the platform account.
%.restored : %
cat $< | python src/restore.py
cp $< $@
@@ -33,14 +40,7 @@ shoots/clean:
cat $< | python src/start_transfers.py production
cp $< $@

# Slice a given input file into manageable chunks, so that you can run them through the
# transfer process separately without overwhelming the target system.
# The right number for archivematica is probably about 20.

%.sliced: %
split -l 20 $< $<.

# Touch the files already on AWS. This will stimulate the corresponding transfer lambdas.
# Touch the files already in Archivematica source bucket. This will stimulate the corresponding transfer lambdas.
# The target system can sometimes be unexpectedly unavailable or overwhelmed,
# resulting in failures.
# This allows us to invoke the process from just before the failure
@@ -49,4 +49,4 @@ shoots/clean:
cat $< | python src/touch.py staging

%.touched.production: %
cat % | python src/touch.py production
cat $< | python src/touch.py production
34 changes: 27 additions & 7 deletions README.md
@@ -1,6 +1,6 @@
# editorial-photography-ingest

Tool for transferring editorial photography from Glacier Storage to Archivematica.
Tool for transferring editorial photography from Glacier Storage to Archivematica, so that it can then be ingested into the storage-service.

## Background

@@ -12,12 +12,32 @@ Each batch is provided as a list of identifiers for the shoot.
Shoots consist of the photographs themselves, and some metadata which is irrelevant for this purpose.

## Procedure

1. Restore the shoots from Glacier using batch restore
2. When they have been successfully restored, run the tool to transfer from the editorial photography bucket to the ingest bucket
3. The receiving system may be down, so step 2 may need to be retried, without having to restore again from step 1

Starting with a file containing one shoot identifier per line:

1. Slice the input file into chunks small enough for the downstream system to handle.
The chunk size is currently hardcoded as 20 lines in the Makefile `sliced` rule.
```
make path_to/your_shoot_identifiers_file.sliced
```
2. Restore the shoots from Glacier using batch restore.
The files are restored for 1 day by default.
It will take several hours to restore the files.
```
AWS_PROFILE=platform-developer make path_to/a_slice_file.restored
```
3. When they have been successfully restored, run the tool to transfer from the editorial photography bucket to the ingest bucket.
```
AWS_PROFILE=digitisation-developer make path_to/a_slice_file.transferred.production
```
4. The receiving system may be down, in which case step 5 can be run without having to restore again from step 2.
You can use this [dashboard](https://c783b93d8b0b4b11900b5793cb2a1865.eu-west-1.aws.found.io:9243/s/storage-service/app/dashboards#/view/04532600-2dfc-11ed-8fbf-7d74cdf8bbb4?_g=(filters:!(),refreshInterval:(pause:!t,value:60000),time:(from:now-1d,to:now))) to check whether the files have been successfully ingested into the storage-service. Filter by `bag.info.externalIdentifier`; the bag's `status.id` should be `succeeded`.

5. (Optional) If you cannot find your bag in the above dashboard, or if its status is `failed`, use the `touch.py` script to re-trigger the Archivematica workflow. Feed it a file containing one S3 key to touch per line.
```
AWS_PROFILE=digitisation-developer make path_to/your_list_of_S3_keys_to_touch.touched.production
```
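
The slicing in step 1 can be sketched in Python as follows. This is only an illustration of what `split -l 20` does to the identifier file, not code from this repository: the function names are hypothetical, the numeric suffixes differ from the `aa`, `ab`, ... suffixes that `split` produces, and blank lines are skipped, which `split` does not do.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def slice_lines(lines: Iterable[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_slices(input_path: str, size: int = 20) -> List[Path]:
    """Write each chunk next to the input file as <input>.0000, <input>.0001, ...

    Hypothetical helper: the Makefile uses `split`, which names its
    output files with alphabetic suffixes instead.
    """
    source = Path(input_path)
    # One shoot identifier per line; blank lines are ignored here.
    identifiers = [line for line in source.read_text().splitlines() if line.strip()]
    written = []
    for index, chunk in enumerate(slice_lines(identifiers, size)):
        target = source.with_name(f"{source.name}.{index:04d}")
        target.write_text("\n".join(chunk) + "\n")
        written.append(target)
    return written
```

With a 45-line input this gives three slices of 20, 20 and 5 identifiers, each of which can then go through the restore and transfer steps separately.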

## How

See `Makefile` for more detail about each step
1. Restoration is asynchronous and can be triggered from a command line.
2. Once restored, the transfer should be triggered. This will run on Lambda, driven by a queue.
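
To illustrate why restoration is asynchronous, here is a minimal sketch of requesting a Glacier restore and later checking whether it has finished. It is not taken from `src/restore.py`, whose contents are not shown in this commit. The function names, the `Bulk` tier, and the idea of passing explicit object keys are all assumptions; how a shoot identifier maps to S3 keys is not covered here. The `client` argument is expected to behave like a boto3 S3 client.

```python
from typing import Any, Dict, Iterable, List


def build_restore_request(days: int = 1, tier: str = "Bulk") -> Dict[str, Any]:
    """Build the RestoreRequest payload for S3's restore_object call.

    `days` matches the 1-day default mentioned in the README;
    the retrieval tier is an assumption.
    """
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def request_restores(client: Any, bucket: str, keys: Iterable[str], days: int = 1) -> List[str]:
    """Ask S3 to restore each key. The call returns before the restore completes."""
    requested = []
    for key in keys:
        client.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest=build_restore_request(days),
        )
        requested.append(key)
    return requested


def is_restored(client: Any, bucket: str, key: str) -> bool:
    """Check the Restore field of head_object to see whether a restore has finished."""
    response = client.head_object(Bucket=bucket, Key=key)
    return 'ongoing-request="false"' in response.get("Restore", "")


# Hypothetical usage (bucket and key are placeholders, not real names):
#   import boto3
#   s3 = boto3.client("s3")
#   request_restores(s3, "example-bucket", ["example/prefix/photo.tif"])
#   ... several hours later ...
#   is_restored(s3, "example-bucket", "example/prefix/photo.tif")
```

Because the request only starts the restore, the transfer step has to wait until the objects are readable again, which is why the procedure treats restoring and transferring as separate `make` targets.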
9 changes: 4 additions & 5 deletions src/touch.py
@@ -1,14 +1,13 @@
"""
Like unix touch, this updates objects in s3 without substantive change.
Call with a newline-separated list of keys, thus
printf 'born-digital-accessions/2754_CP000179.zip\nborn-digital-accessions/2754_CP000181.zip\n' | touch.py
This is intended for use when Archivematica has ephemerally failed to accept some of the shoot zips.
A fortnightly list of failed zips is published to Slack, but you could also generate it by looking elsewhere.
A fortnightly list of failed zips is published to Slack (TBC), but you could also generate it by looking elsewhere,
e.g. the storage-service's ingests dashboard in the reporting cluster.
"""

import sys

import boto3
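
The rest of `touch.py` is collapsed in this view. As a rough sketch of how an S3 "touch" can work, one common approach is to copy each object onto itself with replaced metadata, which rewrites the object without changing its bytes. Whether `touch.py` actually does this, and whether the transfer lambdas react to copy events, is an assumption; the function name and the `touched-at` metadata key are hypothetical. Note that `MetadataDirective="REPLACE"` discards any existing user metadata on the object.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, List


def touch_keys(client: Any, bucket: str, keys: Iterable[str]) -> List[str]:
    """Copy each object onto itself so that S3 records a fresh write.

    `client` is expected to behave like a boto3 S3 client.
    S3 rejects a self-copy unless something changes, so the user
    metadata is replaced with a timestamp (hypothetical key name).
    """
    touched = []
    stamp = datetime.now(timezone.utc).isoformat()
    for raw_key in keys:
        key = raw_key.strip()
        if not key:
            continue  # skip blank lines in the input
        client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            Metadata={"touched-at": stamp},
            MetadataDirective="REPLACE",
        )
        touched.append(key)
    return touched


# Hypothetical usage, reading one key per line from standard input:
#   import sys
#   import boto3
#   touch_keys(boto3.client("s3"), "example-bucket", sys.stdin)
```

Stripping each line means trailing newlines and stray spaces in the input do not end up in the key.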