AVATAR: Bentley A/V dAtabase To ARchivesspace
Image Credit: Melissa Hernández-Durán
Creates or updates ArchivesSpace `<dsc>` archival and digital object elements using data output from the A/V Database.
This CLI, which supports the Bentley's A/V Database --> ArchivesSpace workflow, takes as input a spreadsheet with the following columns:
- resource id (not from A/V Database)*
- object id (not from A/V Database)*
- Type of obj id (not from A/V Database)*: Parent, Item, or Part
- CollItem No*
- DigFile Calc*
- AVType::ExtentType*
- AVType::Avtype*
- ItemTitle*
- ItemPartTitle
- ItemDate*
- MiVideoID
- NoteContent
- NoteTechnical
- AUDIO_ITEMCHAR::Fidleity
- AUDIO_ITEMCHAR::ReelSize
- AUDIO_ITEMCHAR::TapeSpeed
- AUDIO_ITEMCHAR::ItemSourceLength
- ItemPolarity
- ItemColor
- ItemSound
- ItemLength
- ItemTime
Note: Required columns are designated with an asterisk (*).
You will need to do a little cleanup on the source .XLSX file. Convert it to a UTF-8 encoded CSV, and clean up any character encoding issues, e.g., fractions in AUDIO_ITEMCHAR::TapeSpeed and formatting in ItemTime.
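The fraction cleanup can be scripted once the sheet's rows are loaded as dictionaries, for example by exporting from Excel or with a library such as openpyxl. The sketch below is a minimal illustration under two assumptions. First, the Unicode fractions (e.g., "7½ ips") should become plain ASCII ("7 1/2 ips"). Second, the helper names `normalize_tape_speed`, `clean_rows` and `write_utf8_csv` are hypothetical and not part of AVATAR. ItemTime formatting is not covered because the expected format is not documented here.

```python
import csv
import re

# Unicode vulgar fractions that may appear in AUDIO_ITEMCHAR::TapeSpeed.
# The "7 1/2"-style target format is an assumption.
FRACTIONS = {
    "¼": "1/4", "½": "1/2", "¾": "3/4",
    "⅛": "1/8", "⅜": "3/8", "⅝": "5/8", "⅞": "7/8",
}


def normalize_tape_speed(value):
    """Turn e.g. '7½ ips' into '7 1/2 ips'."""
    for char, ascii_fraction in FRACTIONS.items():
        value = value.replace(char, " " + ascii_fraction)
    # Collapse any doubled spaces introduced above and trim the ends.
    return re.sub(r"\s+", " ", value).strip()


def clean_rows(rows):
    """Return copies of the spreadsheet rows with TapeSpeed fractions normalized."""
    cleaned = []
    for row in rows:
        row = dict(row)
        speed = row.get("AUDIO_ITEMCHAR::TapeSpeed")
        if speed:
            row["AUDIO_ITEMCHAR::TapeSpeed"] = normalize_tape_speed(speed)
        cleaned.append(row)
    return cleaned


def write_utf8_csv(rows, path):
    """Write cleaned rows (dicts keyed by column name) as a UTF-8 CSV."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```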
It also assumes a configuration file describing both the DEV and PROD ArchivesSpace instances, and optionally a SANDBOX or other instances.
Finally, it assumes as input an export from Kaltura with the following columns. Running `utils/create_access_profile_pickle.py` (against a CSV whose path is hard-coded into the script) converts this export to a pickle file, `access_profiles.p`, saved in the "avatar" directory:
- entry_id*
- accessControlId*
Note: The accessControlId values map as follows: "876301" is reading room, "1694751" is public, and "2227181" is U-M campus.
AVATAR reads the pickle file and determines the appropriate Conditions Governing Access note.
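The lookup can be sketched as a dictionary from Kaltura entry to access profile. This is a minimal sketch with several assumptions:

- The pickle holds a plain `{entry_id: accessControlId}` dictionary.
- A row's MiVideoID is the same value as the Kaltura entry_id.
- Public entries get no Conditions Governing Access note.

The two note texts are copied from the archival object mapping later in this README. The function names and the sample entry IDs are hypothetical.

```python
import pickle

# accessControlId values listed in the note above.
READING_ROOM = "876301"
PUBLIC = "1694751"
UM_CAMPUS = "2227181"

# Note texts from the archival object mapping; PUBLIC deliberately has no entry,
# on the assumption that public recordings get no access note.
ACCESS_NOTES = {
    READING_ROOM: ("Access to this material is restricted to the reading room "
                   "of the Bentley Historical Library."),
    UM_CAMPUS: ("Access to digitized content is enabled for users who are able "
                "to authenticate via the University of Michigan weblogin."),
}


def build_access_profiles(kaltura_rows):
    """Map each Kaltura entry_id to its accessControlId (assumed pickle layout)."""
    return {row["entry_id"]: str(row["accessControlId"]) for row in kaltura_rows}


def save_access_profiles(profiles, path="access_profiles.p"):
    """Pickle the mapping, e.g. into the "avatar" directory."""
    with open(path, "wb") as f:
        pickle.dump(profiles, f)


def access_note_for(mivideo_id, profiles):
    """Return the note text for a MiVideoID, or None if no note applies."""
    return ACCESS_NOTES.get(profiles.get(mivideo_id))
```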
With the `-c` (or `--coll_info`) argument, AVATAR adds the following fields to the collection-level resource in ArchivesSpace:
- Extents: An extent statement is added with "x digital audio files" or "x digital video files," accordingly. Only items that are in MiVideo are counted toward x. AVATAR also ensures that every extent portion has the value "Part."
- "Processing Information" note: Adds a `processinfo` note with "In preparing digital material for long-term preservation and access, the Bentley Historical Library adheres to professional best practices and standards to ensure that content will retain its authenticity and integrity. For more information on procedures for the ingest and processing of digital materials, please see Bentley Historical Library Digital Processing Note. Access to digital material may be provided either as a direct link to an individual file or as a downloadable package of files bundled in a zip file."
- Revision Statements: Four revision statements are added with the date...
  - "Revised Extent Note, Processing Information Note and Existence and Location of Copies Note."
  - "Added links to digitized content."
  - "Added Conditions Governing notes for digitized content."
  - "Added audio recording genre." OR "Added video recording genre."
- "Existence and Location of Copies" note: Adds an `altformavail` note with "Digitization: A number of recordings within this collection have been digitized. The resulting files are available for playback online or in the Bentley Library Reading Room according to rights. Original media are only available for staff use."
- "Conditions Governing Access" note: Adds an `accessrestrict` note with "Select recordings within this collection have been digitized. Original sound recordings are only available for staff use."
- Genre/Form: Adds the ArchivesSpace subject for "digital file formats" as well as the appropriate subject(s) for "sound recording" and/or "video recording."
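The extent part of `--coll_info` can be sketched against a resource's JSON, using the ArchivesSpace extent fields `portion`, `number` and `extent_type`. This is a minimal sketch with three assumptions:

- "In MiVideo" means the row has a non-blank MiVideoID.
- A row is audio when its DigFile Calc contains "SR", as in the `--dsc` characterization.
- "digital audio files" and "digital video files" exist as extent_type values in the target repository.

The function name `add_digital_extents` is hypothetical.

```python
AUDIO_EXTENT = "digital audio files"  # assumed extent_type enumeration values
VIDEO_EXTENT = "digital video files"


def add_digital_extents(resource, rows):
    """Add '(x) digital audio/video files' extents to a resource JSON dict.

    Counts only rows with a MiVideoID, treats a row as audio when its
    DigFile Calc contains "SR", and sets every extent portion to "part".
    """
    counts = {AUDIO_EXTENT: 0, VIDEO_EXTENT: 0}
    for row in rows:
        if not (row.get("MiVideoID") or "").strip():
            continue  # not in MiVideo, so not counted
        kind = AUDIO_EXTENT if "SR" in row["DigFile Calc"] else VIDEO_EXTENT
        counts[kind] += 1
    extents = resource.setdefault("extents", [])
    for extent_type, number in counts.items():
        if number:
            extents.append({
                "jsonmodel_type": "extent",
                "portion": "part",
                "number": str(number),
                "extent_type": extent_type,
            })
    # Existing extents (e.g. linear feet) are switched to "part" as well.
    for extent in extents:
        extent["portion"] = "part"
    return resource
```

Whether a collection with both audio and video should receive two statements, as this sketch does, is a design choice not spelled out above.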
With the `-d` (or `--dsc`) argument, the following occurs...
First, AVATAR characterizes each row in the spreadsheet to determine:
- whether the ArchivesSpace archival object corresponding to the row is a parent, item, or part, using the "Type of obj id" column;
- whether the row is an item ONLY or an item with parts, using the "DigFile Calc," "CollItem No," and "AVType::ExtentType" columns (i.e., if DigFile Calc and CollItem No match, or if the extent type is "videocassettes," "videotapes," "film reels," or "video recordings," it assumes the row is an "item only"; otherwise, it assumes the row is an "item with parts"); and
- whether the row is audio or moving image, using the "DigFile Calc" column (i.e., if it contains "SR," the row is audio; otherwise, it is moving image).
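The three checks above can be sketched as one function over a CSV row. This is a minimal sketch, not AVATAR's code, with three assumptions:

- DigFile Calc and CollItem No are compared as exact strings after trimming whitespace.
- Extent types are matched case-insensitively.
- `characterize` is a hypothetical name.

```python
VIDEO_EXTENT_TYPES = {"videocassettes", "videotapes", "film reels", "video recordings"}


def characterize(row):
    """Return (ao_type, shape, media) for one spreadsheet row."""
    # "Parent", "Item", or "Part" -> lowercase for easy comparison.
    ao_type = row["Type of obj id"].strip().lower()
    digfile = row["DigFile Calc"].strip()
    extent_type = row["AVType::ExtentType"].strip().lower()
    if digfile == row["CollItem No"].strip() or extent_type in VIDEO_EXTENT_TYPES:
        shape = "item only"
    else:
        shape = "item with parts"
    media = "audio" if "SR" in digfile else "moving image"
    return ao_type, shape, media
```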
The basic logic for creating or updating archival objects, and for creating and linking digital objects, in ArchivesSpace is, then:
Condition | Action |
---|---|
If the corresponding ArchivesSpace archival object is a parent and the row is an item only... | ...create a child archival object (including an instance with a top container); if the row is not a duplicate, create and link a digital object (preservation) to the child archival object; and, if it exists, create and link a digital object (access) to the child archival object. |
Else if the corresponding ArchivesSpace archival object is an item and the row is an item only... | ...update the archival object; if the row is not a duplicate, create and link a digital object (preservation) to the archival object; and, if it exists, create and link a digital object (access) to the archival object. |
Else if the corresponding ArchivesSpace archival object is a part and the row is an item only... | NOT APPLICABLE |
Else if the corresponding ArchivesSpace archival object is a parent and the row is an item with parts... | ...create a child archival object for the item (including an instance with a top container), create and link a digital object (preservation) to that child archival object, create a child archival object for the part beneath the item's archival object, and, if it exists, create and link a digital object (access) to the part's archival object. |
Else if the corresponding ArchivesSpace archival object is an item and the row is an item with parts... | ...update the archival object for the item, create and link a digital object (preservation) to it, create a child archival object for the part, and, if it exists, create and link a digital object (access) to the part's archival object. |
Else if the corresponding ArchivesSpace archival object is a part and the row is an item with parts... | ...update the parent archival object for the item; if one does not already exist, create and link a digital object (preservation) to the parent archival object; update the archival object for the part; and, if it exists, create and link a digital object (access) to the part's archival object. |
Note: If a row is an item only but has no MiVideoID, AVATAR treats it as a duplicate recording that was not digitized but still needs to be tracked for collections management purposes.
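The table and the duplicate note can be read as a small dispatch function that returns the steps for one row. This is an illustrative sketch only, with these assumptions:

- "If it exists" in the table refers to the row's MiVideoID.
- The step labels are hypothetical strings, not AVATAR functions or ArchivesSpace API calls.
- `plan_actions` is a hypothetical name.

```python
def plan_actions(ao_type, shape, mivideo_id=""):
    """Return the table's steps for one row (illustrative labels only).

    ao_type: "parent", "item", or "part"; shape: "item only" or "item with parts".
    """
    has_access = bool(mivideo_id and mivideo_id.strip())
    if shape == "item only":
        if ao_type == "parent":
            steps = ["create child item AO (with top container instance)"]
        elif ao_type == "item":
            steps = ["update item AO"]
        elif ao_type == "part":
            raise ValueError("not applicable: part AO with an item-only row")
        else:
            raise ValueError(f"unknown archival object type: {ao_type!r}")
        # Item only without a MiVideoID = undigitized duplicate: no digital objects.
        if has_access:
            steps += ["create/link preservation DO on item AO",
                      "create/link access DO on item AO"]
        return steps

    # Item with parts.
    if ao_type == "parent":
        steps = ["create child item AO (with top container instance)",
                 "create/link preservation DO on item AO",
                 "create child part AO under item AO"]
    elif ao_type == "item":
        steps = ["update item AO",
                 "create/link preservation DO on item AO",
                 "create child part AO under item AO"]
    elif ao_type == "part":
        steps = ["update parent item AO",
                 "create/link preservation DO on item AO if none exists",
                 "update part AO"]
    else:
        raise ValueError(f"unknown archival object type: {ao_type!r}")
    if has_access:
        steps.append("create/link access DO on part AO")
    return steps
```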
Key:
- "Quotation marks": Hard-coded
- Italicized: From the A/V Database export
- `Consolas`: From the ArchivesSpace API
Archival Object (Item Only):
- Title = ItemTitle OR (ItemTitle + " " + ItemPartTitle (optional))
- Component Unique Identifier = DigFile Calc
- Level of Description = "File"
- Dates
  - Label = "Creation"
  - Expression = ItemDate
  - Type = "Inclusive Dates"
- Extents
  - Portion = "Whole"
  - Number = "1"
  - Type = AVType::ExtentType
  - Physical Details = ", ".join(AVType::Avtype, ItemColor (optional), ItemPolarity (optional), ItemSound (optional), AUDIO_ITEMCHAR::Fidleity (optional), AUDIO_ITEMCHAR::TapeSpeed (optional))
  - Dimensions = ", ".join(AUDIO_ITEMCHAR::ReelSize (optional), ItemLength (optional), AUDIO_ITEMCHAR::ItemSourceLength (optional))
- Notes
  - Note (Optional)
    - Type = "Abstract"
    - Text = NoteContent
  - Note (Optional)
    - Type = "General"
    - Content = ItemTime (optional)
  - Note (Optional)
    - Type = "Conditions Governing Access"
    - Text = "Access to this material is restricted to the reading room of the Bentley Historical Library." OR "Access to digitized content is enabled for users who are able to authenticate via the University of Michigan weblogin."
  - Note (Optional)
    - Type = "General"
    - Publish = False
    - Text = "Internal Technical Note: " + NoteTechnical
- Instances
  - Top Container
    - Indicator = `indicator`
    - Container Type = `type`
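The `", ".join(...)` expressions in the mapping skip optional values that are empty. A minimal Python sketch of the whole extent follows, assuming blank strings and `None` both count as "missing". It also assumes the raw AVType::ExtentType value can be used directly as an ArchivesSpace `extent_type`; a real repository may need a mapping to its enumeration. `join_optional` and `item_extent` are hypothetical names. The field names `portion`, `number`, `extent_type`, `physical_details` and `dimensions` are those of the ArchivesSpace extent record.

```python
def join_optional(*values):
    """Comma-join the values that are present, skipping None and blanks."""
    return ", ".join(v.strip() for v in values if v and v.strip())


def item_extent(row):
    """Build an ArchivesSpace extent dict for one row, per the mapping above."""
    return {
        "jsonmodel_type": "extent",
        "portion": "whole",
        "number": "1",
        "extent_type": row["AVType::ExtentType"],
        "physical_details": join_optional(
            row.get("AVType::Avtype"), row.get("ItemColor"), row.get("ItemPolarity"),
            row.get("ItemSound"), row.get("AUDIO_ITEMCHAR::Fidleity"),
            row.get("AUDIO_ITEMCHAR::TapeSpeed")),
        "dimensions": join_optional(
            row.get("AUDIO_ITEMCHAR::ReelSize"), row.get("ItemLength"),
            row.get("AUDIO_ITEMCHAR::ItemSourceLength")),
    }
```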
Archival Object (Item):
- Title = ItemTitle
- Component Unique Identifier = CollItem No
- Level of Description = "Other Level"
- Other Level = "item-main"
- Dates
  - Label = "Creation"
  - Expression = ItemDate
  - Type = "Inclusive Dates"
- Extents
  - Portion = "Whole"
  - Number = "1"
  - Type = AVType::ExtentType
  - Physical Details = ", ".join(AVType::Avtype, ItemColor (optional), ItemPolarity (optional), ItemSound (optional), AUDIO_ITEMCHAR::Fidleity (optional), AUDIO_ITEMCHAR::TapeSpeed (optional))
  - Dimensions = ", ".join(AUDIO_ITEMCHAR::ReelSize (optional), ItemLength (optional), AUDIO_ITEMCHAR::ItemSourceLength (optional))
- Instances
  - Top Container
    - Indicator = `indicator`
    - Container Type = `type`
Archival Object (Part):
- Title = ItemPartTitle
- Component Unique Identifier = DigFile Calc
- Level of Description = "Other Level"
- Other Level = "item-part"
- Dates
  - Label = "Creation"
  - Expression = ItemDate
  - Type = "Inclusive Dates"
- Parent = Archival Object (Item) `uri`
- Notes
  - Note (Optional)
    - Type = "Abstract"
    - Text = NoteContent
  - Note (Optional)
    - Type = "General"
    - Content = ItemTime (optional)
  - Note (Optional)
    - Type = "Conditions Governing Access"
    - Text = "Access to this material is restricted to the reading room of the Bentley Historical Library." OR "Access to digitized content is enabled for users who are able to authenticate via the University of Michigan weblogin."
  - Note (Optional)
    - Type = "General"
    - Publish = False
    - Text = "Internal Technical Note: " + NoteTechnical
Digital Object (Preservation):
- Title = Archival Object (Item) `display_string` + " (Preservation)"
- Identifier = DigFile Calc (Item) (e.g., "07143-70" from "07143-70-1" or "07143-SR-63" from "07143-SR-63-1")
- Publish? = False
- File Versions
  - File URI = "\\bhl-digitalarchive.m.storage.umich.edu\bhl-digitalarchive/AV Collections/" + ("Audio" or "Moving Image") + "/" + Collection ID (e.g., "9841" from "9841 Bimu 2" or "umich-bhl-9841") + "/" + CollItem No (e.g., "07143-70" or "07143-SR-63")
Digital Object (Access):
- Title = Archival Object (Item) `display_string` + " " + Archival Object (Part) `display_string` + " (Access)"
- Identifier = MiVideoID
- File Versions
  - File URI = "https://bentley.mivideo.it.umich.edu/media/t/" + MiVideoID
  - XLink Actuate Attribute = "onRequest"
  - XLink Show Attribute = "new"
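The two digital object mappings can be sketched as dictionaries using the ArchivesSpace JSONModel field names `digital_object_id`, `file_versions`, `file_uri`, `xlink_actuate_attribute` and `xlink_show_attribute`. This is a minimal sketch, not AVATAR's implementation, with these assumptions:

- A resource identifier only ever takes the two documented forms ("9841 Bimu 2" or "umich-bhl-9841").
- A part's item-level DigFile Calc is its own DigFile Calc with the last hyphenated segment removed.
- The preservation base is written as a UNC path, which begins with two backslashes.
- The helper names are hypothetical.

In practice the base may come from `BasePreservationPath` in config.ini instead of being hard-coded.

```python
PRESERVATION_BASE = r"\\bhl-digitalarchive.m.storage.umich.edu\bhl-digitalarchive/AV Collections/"
ACCESS_BASE = "https://bentley.mivideo.it.umich.edu/media/t/"


def collection_id(resource_identifier):
    """'9841 Bimu 2' -> '9841'; 'umich-bhl-9841' -> '9841' (only these forms handled)."""
    if resource_identifier.startswith("umich-bhl-"):
        return resource_identifier[len("umich-bhl-"):]
    return resource_identifier.split()[0]


def item_digfile_calc(digfile_calc, is_part):
    """Drop a part's trailing number: '07143-SR-63-1' -> '07143-SR-63'."""
    return digfile_calc.rsplit("-", 1)[0] if is_part else digfile_calc


def preservation_do(item_display_string, digfile_calc, is_part, media,
                    resource_identifier, coll_item_no, base=PRESERVATION_BASE):
    """Unpublished preservation digital object pointing at the storage path."""
    folder = "Audio" if media == "audio" else "Moving Image"
    file_uri = f"{base}{folder}/{collection_id(resource_identifier)}/{coll_item_no}"
    return {
        "jsonmodel_type": "digital_object",
        "title": item_display_string + " (Preservation)",
        "digital_object_id": item_digfile_calc(digfile_calc, is_part),
        "publish": False,
        "file_versions": [{"jsonmodel_type": "file_version", "file_uri": file_uri}],
    }


def access_do(item_display_string, mivideo_id, part_display_string=""):
    """Access digital object linking to the MiVideo player page."""
    title = " ".join(t for t in (item_display_string, part_display_string) if t)
    return {
        "jsonmodel_type": "digital_object",
        "title": title + " (Access)",
        "digital_object_id": mivideo_id,
        "file_versions": [{
            "jsonmodel_type": "file_version",
            "file_uri": ACCESS_BASE + mivideo_id,
            "xlink_actuate_attribute": "onRequest",
            "xlink_show_attribute": "new",
        }],
    }
```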
In order to configure the baseline preservation path, authenticate to ArchivesSpace, and use the ArchivesSpace API, supply a "config.ini" file in the "avatar" directory that looks like this:
[PRESERVATION]
BasePreservationPath = ``
# These are configurations for ArchivesSpace instances
[DEV]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.
[PROD]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.
[SANDBOX]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.
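Reading this file with Python's standard `configparser` has two quirks. By default it keeps the quotes around the values, and it treats the trailing `# Note: ...` as part of the value. The sketch below handles both and falls back to repository ID 2 when none is set. It is an assumption about how such a file could be read, not AVATAR's own loader, and `load_instance` is a hypothetical name.

```python
import configparser


def load_instance(config_text, instance):
    """Return connection settings for one instance ("dev", "prod", "sandbox")."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # drop "# Note: ..." after a value
        interpolation=None,              # keep "%" in passwords literal
    )
    parser.read_string(config_text)
    section = parser[instance.upper()]

    def clean(key):
        # The example wraps values in quotes; configparser keeps them, so strip.
        return section.get(key, "").strip().strip("'\"`")

    return {
        "base_url": clean("BaseURL"),
        "user": clean("User"),
        "password": clean("Password"),
        "repository_id": clean("RepositoryID") or "2",  # documented default
    }
```

Typical use would be `load_instance(open("config.ini", encoding="utf-8").read(), "dev")`.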
usage: avatar.py [-h] [-c] [-d] [-r] [-o /path/to/output/directory] /path/to/project/csv.csv {dev,prod,sandbox}
Creates or updates ArchivesSpace `<dsc>` archival and digital object elements using data output from the A/V Database
positional arguments:
/path/to/project/csv.csv
Path to a project CSV
{dev,prod,sandbox} Choose configuration for DEV, PROD, or SANDBOX ArchivesSpace instance
optional arguments:
-h, --help show this help message and exit
-c, --coll_info Updates collection-level-information
-d, --dsc Updates container list
-r, --revert_back Undoes collection- and container-level updates
-o /path/to/output/directory, --output /path/to/output/directory
Path to output directory for results
AVATAR outputs a CSV file with the DigFile Calc (for the Item or Part, depending on whether the row is an "item ONLY" or an "item with parts," respectively) and the corresponding `archival_object_id`. This can be used to update the A/V Database. The optional `--output` argument specifies a destination directory.
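A results file of this shape can be written and read back with the standard `csv` module, for example before loading the IDs into the A/V Database. This is a minimal sketch. The file name `avatar_results.csv`, the header row and the helper names are hypothetical, since the exact output format is not documented here.

```python
import csv
import os


def write_results(results, output_dir="."):
    """Write (DigFile Calc, archival_object_id) pairs; file name is hypothetical."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "avatar_results.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["DigFile Calc", "archival_object_id"])
        writer.writerows(results)
    return path


def read_results(path):
    """Load the results back as {DigFile Calc: archival_object_id}."""
    with open(path, encoding="utf-8", newline="") as f:
        return {row["DigFile Calc"]: row["archival_object_id"] for row in csv.DictReader(f)}
```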
AVATAR creates a cache of the resources it updates, as well as of the archival objects and digital objects it creates (ID only) or updates (JSON) for individual media files (`digfile_calcs`). To initiate the cache, use `utils/create_digfile_calcs_pickle.py` and ensure that there is a "cache" directory with "resources" and "digfile_calcs" subdirectories in the home folder. Resources are stored as cached JSON representations in files named `[resource_id].json`. Media files, however, are stored in a pickle structured like this:
[{
'85242-1': [{
'type': 'archival_object',
'id': '371206',
'status': 'updated'
}, {
'type': 'digital_object',
'id': '43062',
'status': 'created'
}, {
'type': 'digital_object',
'id': '43063',
'status': 'created'
}]
}]
Each updated archival object is stored as JSON in a file named "[archival_object_id].json".
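The cache layout above can be maintained with a few helpers. This is a minimal sketch with three assumptions:

- "Home folder" is passed in explicitly as `home`.
- The pickle file name (`digfile_calcs.p` in the usage) is hypothetical.
- New entries for an existing DigFile Calc are appended to its list.

`ensure_cache_dirs`, `record`, `save_cache` and `load_cache` are hypothetical names, not the contents of `utils/create_digfile_calcs_pickle.py`.

```python
import os
import pickle


def ensure_cache_dirs(home):
    """Create cache/resources and cache/digfile_calcs under `home` if missing."""
    for sub in ("resources", "digfile_calcs"):
        os.makedirs(os.path.join(home, "cache", sub), exist_ok=True)


def record(cache, digfile_calc, obj_type, obj_id, status):
    """Append one {'type', 'id', 'status'} entry under digfile_calc."""
    entry = {"type": obj_type, "id": obj_id, "status": status}
    for item in cache:
        if digfile_calc in item:
            item[digfile_calc].append(entry)
            return cache
    cache.append({digfile_calc: [entry]})
    return cache


def save_cache(cache, path):
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def load_cache(path):
    """Return the cached list, or an empty one if nothing was saved yet."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)
```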
AVATAR can use the cache it creates to "revert back" to a previous state, i.e., to undo collection- and container-level updates. This should only be used in a non-PROD environment.
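Reverting can be planned directly from the cached structure: delete whatever was created and re-post the cached JSON for whatever was updated. This is a minimal sketch that only builds the request list and does not call the API. It assumes the standard ArchivesSpace endpoints `DELETE /repositories/:repo_id/digital_objects/:id` (and the `archival_objects` equivalent) and `POST /repositories/:repo_id/archival_objects/:id`. Within each DigFile Calc, entries are undone newest first, which is a design choice of this sketch. Reverting cached resources is not covered, and `revert_plan` is a hypothetical name.

```python
def revert_plan(cache, repo_id="2"):
    """Return (method, path, json_file) tuples that would undo the cached changes."""
    plan = []
    for entry in cache:
        for digfile_calc, records in entry.items():
            # Undo the most recent change for this media file first.
            for rec in reversed(records):
                path = f"/repositories/{repo_id}/{rec['type']}s/{rec['id']}"
                if rec["status"] == "created":
                    plan.append(("DELETE", path, None))
                else:
                    # Updated objects are restored from their cached JSON.
                    plan.append(("POST", path, f"{rec['id']}.json"))
    return plan
```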