bentley-historical-library/avatar

avatar

AVATAR: Bentley A/V dAtabase To ARchivesspace

AVATAR logo

Image Credit: Melissa Hernández-Durán

Description

Creates or updates ArchivesSpace <dsc> archival and digital object elements using data output from the A/V Database

Input

A/V Database Export

This CLI, which supports the Bentley's A/V Database --> ArchivesSpace workflow, assumes a spreadsheet with the following columns as an input:

  • resource id (not from A/V Database)*
  • object id (not from A/V Database)*
  • Type of obj id (not from A/V Database)* <-- Parent | Item | Part
  • CollItem No*
  • DigFile Calc*
  • AVType::ExtentType*
  • AVType::Avtype*
  • ItemTitle*
  • ItemPartTitle
  • ItemDate*
  • MiVideoID
  • NoteContent
  • NoteTechnical
  • AUDIO_ITEMCHAR::Fidleity
  • AUDIO_ITEMCHAR::ReelSize
  • AUDIO_ITEMCHAR::TapeSpeed
  • AUDIO_ITEMCHAR::ItemSourceLength
  • ItemPolarity
  • ItemColor
  • ItemSound
  • ItemLength
  • ItemTime

Note: Required columns are designated with an asterisk (*).

You will need to do a little cleanup on the source .XLSX file. Convert it to a UTF-8 encoded CSV, and clean up any character encoding issues, e.g., fractions in AUDIO_ITEMCHAR::TapeSpeed and formatting in ItemTime.
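As a minimal sketch of a pre-flight check on the cleaned-up CSV, the snippet below reads the file as UTF-8 and reports missing required columns or empty required values. The function names are hypothetical, and the header spellings are assumed to match the column list above exactly (without the parenthetical remarks); adjust them to the actual export.

```python
import csv

# Required columns, taken from the asterisked entries in the list above.
# Assumption: the CSV headers are spelled exactly like this.
REQUIRED_COLUMNS = [
    "resource id", "object id", "Type of obj id", "CollItem No",
    "DigFile Calc", "AVType::ExtentType", "AVType::Avtype",
    "ItemTitle", "ItemDate",
]


def load_rows(csv_path):
    """Read a UTF-8 CSV and fail early if a required column is missing."""
    # "utf-8-sig" also tolerates the byte-order mark that Excel may write.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        headers = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in headers]
        if missing:
            raise ValueError("Missing required columns: " + ", ".join(missing))
        return list(reader)


def rows_missing_required_values(rows):
    """Return 1-based data row numbers where a required column is empty."""
    return [
        i for i, row in enumerate(rows, start=1)
        if any(not (row.get(c) or "").strip() for c in REQUIRED_COLUMNS)
    ]
```

Running a check like this before calling AVATAR surfaces encoding or header problems before anything is written to ArchivesSpace.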

Configurations for ArchivesSpace Instances

AVATAR also assumes a configuration file detailing both DEV and PROD ArchivesSpace instances, and optionally a SANDBOX or other instances as well.

Kaltura Export and Conditions Governing Access Notes

Finally, it assumes an export from Kaltura with the columns listed below as an input. Running the export through the script in utils/create_access_profile_pickle.py (the path to the CSV is hard-coded into the script) converts it to a pickle file saved as access_profiles.p in the "avatar" directory. The export needs the following columns:

  • entry_id*
  • accessControlId*

Note: "876301" is reading room, "1694751" is public, and "2227181" is U-M campus.

AVATAR reads the pickle file and determines the appropriate Conditions Governing Access note.
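The lookup described above could be sketched as follows. This is not the code in utils/create_access_profile_pickle.py. The pickle layout (a dict of entry_id to accessControlId), the function names, and the choice to return no note for public recordings are all assumptions; the note texts are the ones given in the crosswalk in this README.

```python
import csv
import pickle

# Kaltura access control IDs, as listed in the note above.
READING_ROOM, PUBLIC, UM_CAMPUS = "876301", "1694751", "2227181"

ACCESS_NOTES = {
    READING_ROOM: "Access to this material is restricted to the reading room "
                  "of the Bentley Historical Library.",
    UM_CAMPUS: "Access to digitized content is enabled for users who are able "
               "to authenticate via the University of Michigan weblogin.",
    PUBLIC: None,  # Assumption: public recordings get no restriction note.
}


def build_access_profiles(kaltura_csv, pickle_path="access_profiles.p"):
    """Convert the Kaltura export to a pickled {entry_id: accessControlId} dict."""
    with open(kaltura_csv, newline="", encoding="utf-8-sig") as f:
        profiles = {r["entry_id"]: r["accessControlId"] for r in csv.DictReader(f)}
    with open(pickle_path, "wb") as f:
        pickle.dump(profiles, f)
    return profiles


def access_note(profiles, entry_id):
    """Return the Conditions Governing Access text, or None if there is none."""
    return ACCESS_NOTES.get(profiles.get(entry_id))
```

In this sketch the Kaltura entry_id is assumed to be the same value as the MiVideoID column in the A/V Database export.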

Update Collection-Level Information

With the -c (or --coll_info) argument, AVATAR adds the following fields to the collection-level resource in ArchivesSpace:

  • Extents: An extent statement is added with "x digital audio files" or "x digital video files," accordingly. It only counts extents for items that are in MiVideo. It also ensures that all extent portions have the value of "Part."
  • "Processing Information" note: Adds a processinfo note with "In preparing digital material for long-term preservation and access, the Bentley Historical Library adheres to professional best practices and standards to ensure that content will retain its authenticity and integrity. For more information on procedures for the ingest and processing of digital materials, please see Bentley Historical Library Digital Processing Note. Access to digital material may be provided either as a direct link to an individual file or as a downloadable package of files bundled in a zip file."
  • Revision Statements: Four revision statements are added with the date...
    • "Revised Extent Note, Processing Information Note and Existence and Location of Copies Note."
    • "Added links to digitized content."
    • "Added Conditions Governing notes for digitized content."
    • "Added audio recording genre." OR "Added video recording genre."
  • "Existence and Location of Copies" note: Adds an altformavail note with "Digitization: A number of recordings within this collection have been digitized. The resulting files are available for playback online or in the Bentley Library Reading Room according to rights. Original media are only available for staff use."
  • "Conditions Governing Access" note: Adds an accessrestrict note with "Select recordings within this collection have been digitized. Original sound recordings are only available for staff use."
  • Genre/Form: Adds the ArchivesSpace subject for "digital file formats" as well as the appropriate subjects for "sound recording" and "video recording."
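To illustrate the extent count described in the list above, here is a small sketch that counts only rows with a MiVideoID and splits them into audio and video. The function name is hypothetical, and the audio test (an "SR" in DigFile Calc) is borrowed from the row characterization rules later in this README.

```python
def digital_extent_statements(rows):
    """Build "x digital audio files" / "x digital video files" statements.

    Only rows that are in MiVideo (i.e., that have a MiVideoID) are counted.
    """
    audio = video = 0
    for row in rows:
        if not (row.get("MiVideoID") or "").strip():
            continue  # not in MiVideo, so not counted
        if "SR" in row["DigFile Calc"]:
            audio += 1
        else:
            video += 1
    statements = []
    if audio:
        statements.append(f"{audio} digital audio files")
    if video:
        statements.append(f"{video} digital video files")
    return statements
```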

Update Container List

With the -d (or --dsc) argument, the following occurs...

Basic Logic

First, AVATAR characterizes each row in the spreadsheet to determine:

  • whether the corresponding ArchivesSpace archival object is a parent, item, or part of the row using the "Type of obj id" column;
  • whether the row is an item ONLY or an item with parts using the "DigFile Calc," "CollItem No," and "AVType::ExtentType" columns (i.e., if the "DigFile Calc" and "CollItem No" values match, or if the row has an extent type of "videocassettes," "videotapes," "film reels," or "video recordings," it assumes it is an "item only"; otherwise it assumes it is an "item with parts"); and
  • whether the row is audio or moving image using the "DigFile Calc" column (i.e., if there is an "SR" it is audio).
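The three checks above can be sketched as small predicates over one CSV row (a dict keyed by column name). The function names are hypothetical, and reading "if they match" as "DigFile Calc equals CollItem No" is an interpretation of the rule rather than confirmed behavior.

```python
# Extent types that are always treated as "item only", per the list above.
ITEM_ONLY_EXTENT_TYPES = {
    "videocassettes", "videotapes", "film reels", "video recordings",
}


def object_type(row):
    """Return "parent", "item", or "part" from the "Type of obj id" column."""
    return row["Type of obj id"].strip().lower()


def is_item_only(row):
    """True for an "item only" row, False for an "item with parts" row."""
    same_id = row["DigFile Calc"].strip() == row["CollItem No"].strip()
    extent = row["AVType::ExtentType"].strip().lower()
    return same_id or extent in ITEM_ONLY_EXTENT_TYPES


def is_audio(row):
    """True if the row is audio, i.e., there is an "SR" in DigFile Calc."""
    return "SR" in row["DigFile Calc"]
```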

The basic logic for creating or updating archival objects and creating and linking digital objects in ArchivesSpace, is, then:

  • If the corresponding ArchivesSpace archival object is a parent and the row is an item only: create a child archival object (including instance with top container); if not a duplicate, create and link a digital object (preservation) to the child archival object; and, if it exists, create and link a digital object (access) to the child archival object.
  • Else if the corresponding ArchivesSpace archival object is an item and the row is an item only: update the archival object; if not a duplicate, create and link a digital object (preservation) to the archival object; and, if it exists, create and link a digital object (access) to the archival object.
  • Else if the corresponding ArchivesSpace archival object is a part and the row is an item only: NOT APPLICABLE.
  • Else if the corresponding ArchivesSpace archival object is a parent and the row is an item with parts: create a child archival object for the item (including instance with top container); create and link a digital object (preservation) to the child archival object; create a child archival object of that child archival object for the part; and, if it exists, create and link a digital object (access) to the child archival object for the part.
  • Else if the corresponding ArchivesSpace archival object is an item and the row is an item with parts: update the archival object for the item; create and link a digital object (preservation) to the archival object; create a child archival object for the part; and, if it exists, create and link a digital object (access) to the child archival object for the part.
  • Else if the corresponding ArchivesSpace archival object is a part and the row is an item with parts: update the parent archival object for the item; if one does not already exist, create and link a digital object (preservation) to the parent archival object; update the archival object for the part; and, if it exists, create and link a digital object (access) to the archival object for the part.

Note: AVATAR treats a row as a duplicate recording that was not digitized (but that still needs to be tracked for collections management purposes) when the row is an item only and has no MiVideoID.
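The branching above can be sketched as a function that returns the planned steps for one row as plain strings, without touching ArchivesSpace. The function name and step wording are hypothetical, and the sketch assumes that "if it exists" for the access digital object means "if the row has a MiVideoID."

```python
def plan_actions(object_type, item_only, has_mivideo_id):
    """Return the ordered steps for one row; an empty list means not applicable."""
    if item_only:
        if object_type == "part":
            return []  # a part cannot correspond to an "item only" row
        steps = [
            "create child archival object (with top container instance)"
            if object_type == "parent" else "update archival object"
        ]
        # An item-only row without a MiVideoID is an undigitized duplicate,
        # so no digital objects are created for it.
        if has_mivideo_id:
            steps += ["create and link digital object (preservation)",
                      "create and link digital object (access)"]
        return steps

    # Item with parts.
    if object_type == "parent":
        steps = ["create child archival object for item (with top container instance)",
                 "create and link digital object (preservation) to item",
                 "create child archival object for part"]
    elif object_type == "item":
        steps = ["update archival object for item",
                 "create and link digital object (preservation) to item",
                 "create child archival object for part"]
    else:  # part
        steps = ["update parent archival object for item",
                 "create and link digital object (preservation) to item if none exists",
                 "update archival object for part"]
    if has_mivideo_id:
        steps.append("create and link digital object (access) to part")
    return steps
```

Returning a plan instead of performing the calls makes the six cases easy to check against a spreadsheet before any update is sent.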

Crosswalk: A/V Database --> ArchivesSpace

Key:

  • "Quotation marks": Hard-coded
  • Italicized: From the A/V Database export
  • Consolas: From the ArchivesSpace API

Archival Objects

Items ONLY
  • Title = ItemTitle OR (ItemTitle + " " + ItemPartTitle (optional))
  • Component Unique Identifier = DigFile Calc
  • Level of Description = "File"
  • Dates
    • Label = "Creation"
    • Expression = ItemDate
    • Type = "Inclusive Dates"
  • Extents
    • Portion = "Whole"
    • Number = "1"
    • Type = AVType::ExtentType
    • Physical Details = ", ".join(AVType::Avtype, ItemColor (optional), ItemPolarity (optional), ItemSound (optional), AUDIO_ITEMCHAR::Fidleity (optional), AUDIO_ITEMCHAR::TapeSpeed (optional))
    • Dimensions = ", ".join(AUDIO_ITEMCHAR::ReelSize (optional), ItemLength (optional), AUDIO_ITEMCHAR::ItemSourceLength (optional))
  • Notes
    • Note (Optional)
      • Type = "Abstract"
      • Text = NoteContent
    • Note (Optional)
      • Type = "General"
      • Content = ItemTime (optional)
    • Note (Optional)
      • Type = "Conditions Governing Access"
      • Text = "Access to this material is restricted to the reading room of the Bentley Historical Library." OR "Access to digitized content is enabled for users who are able to authenticate via the University of Michigan weblogin."
    • Note (Optional)
      • Type = "General"
      • Publish = False
      • Text = "Internal Technical Note: " + NoteTechnical
  • Instances
    • Top Container
      • Indicator = indicator
      • Container Type = type
Items with Parts
Item
  • Title = ItemTitle
  • Component Unique Identifier = CollItem No
  • Level of Description = "Other Level"
  • Other Level = "item-main"
  • Dates
    • Label = "Creation"
    • Expression = ItemDate
    • Type = "Inclusive Dates"
  • Extents
    • Portion = "Whole"
    • Number = "1"
    • Type = AVType::ExtentType
    • Physical Details = ", ".join(AVType::Avtype, ItemColor (optional), ItemPolarity (optional), ItemSound (optional), AUDIO_ITEMCHAR::Fidleity (optional), AUDIO_ITEMCHAR::TapeSpeed (optional))
    • Dimensions = ", ".join(AUDIO_ITEMCHAR::ReelSize (optional), ItemLength (optional), AUDIO_ITEMCHAR::ItemSourceLength (optional))
  • Instances
    • Top Container
      • Indicator = indicator
      • Container Type = type
Parts
  • Title = ItemPartTitle
  • Component Unique Identifier = DigFile Calc
  • Level of Description = "Other Level"
  • Other Level = "item-part"
  • Dates
    • Label = "Creation"
    • Expression = ItemDate
    • Type = "Inclusive Dates"
  • parent = Archival Object (Item) uri
  • Notes
    • Note (Optional)
      • Type = "Abstract"
      • Text = NoteContent
    • Note (Optional)
      • Type = "General"
      • Content = ItemTime (optional)
    • Note (Optional)
      • Type = "Conditions Governing Access"
      • Text = "Access to this material is restricted to the reading room of the Bentley Historical Library." OR "Access to digitized content is enabled for users who are able to authenticate via the University of Michigan weblogin."
    • Note (Optional)
      • Type = "General"
      • Publish = False
      • Text = "Internal Technical Note: " + NoteTechnical
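The ", ".join(...) expressions for Physical Details and Dimensions in the crosswalk above skip optional columns that are empty. A minimal sketch of that behavior follows; the helper names are hypothetical, and the dictionary keys (portion, number, extent_type, physical_details, dimensions) are assumed to follow the ArchivesSpace extent record, so verify them against the API of your instance.

```python
PHYSICAL_DETAILS_COLUMNS = [
    "AVType::Avtype", "ItemColor", "ItemPolarity", "ItemSound",
    "AUDIO_ITEMCHAR::Fidleity", "AUDIO_ITEMCHAR::TapeSpeed",
]
DIMENSIONS_COLUMNS = [
    "AUDIO_ITEMCHAR::ReelSize", "ItemLength", "AUDIO_ITEMCHAR::ItemSourceLength",
]


def join_present(row, columns):
    """Join the non-empty values of the given columns with ", "."""
    values = ((row.get(c) or "").strip() for c in columns)
    return ", ".join(v for v in values if v)


def build_extent(row):
    """Build an extent dict for an item from one CSV row."""
    extent = {
        "portion": "whole",
        "number": "1",
        "extent_type": row["AVType::ExtentType"],
        "physical_details": join_present(row, PHYSICAL_DETAILS_COLUMNS),
    }
    dimensions = join_present(row, DIMENSIONS_COLUMNS)
    if dimensions:  # every dimensions column is optional
        extent["dimensions"] = dimensions
    return extent
```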

Digital Objects

Preservation
  • Title = Archival Object (Item) display_string + " (Preservation)"
  • Identifier = DigFile Calc (Item) (i.e., "07143-70" in "07143-70-1" or "07143-SR-63" in "07143-SR-63-1")
  • Publish? = False
  • File Versions
    • File URI = "\bhl-digitalarchive.m.storage.umich.edu\bhl-digitalarchive/AV Collections/" + ("Audio" or "Moving Image") + "/" + Collection ID (i.e., "9841" in "9841 Bimu 2" or "umich-bhl-9841") + "/" + CollItem No (i.e., "07143-70" or "07143-SR-63")
Access
  • Title = Archival Object (Item) display_string + " " + Archival Object (Part) display_string + " (Access)"
  • Identifier = MiVideoID
  • File Versions
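As a rough sketch of how the preservation identifier and File URI in the crosswalk above could be derived, the helpers below work on plain strings. All function names are hypothetical, the base path is passed in (for example, BasePreservationPath from the configuration file) and is assumed to end with a slash, and the rule for pulling the collection ID out of a resource identifier (the first run of digits) is an assumption based only on the two examples given.

```python
import re


def item_identifier(digfile_calc, has_parts):
    """Return the item-level identifier, e.g., "07143-70" from "07143-70-1"."""
    # For an item with parts, drop the trailing part number; otherwise the
    # DigFile Calc already identifies the item.
    return digfile_calc.rsplit("-", 1)[0] if has_parts else digfile_calc


def collection_id(resource_identifier):
    """Return e.g. "9841" from "9841 Bimu 2" or "umich-bhl-9841" (assumed rule)."""
    match = re.search(r"\d+", resource_identifier)
    if match is None:
        raise ValueError(f"No collection ID found in {resource_identifier!r}")
    return match.group(0)


def preservation_file_uri(base_path, audio, coll_id, coll_item_no):
    """Concatenate base path, media type folder, collection ID, and CollItem No."""
    media_dir = "Audio" if audio else "Moving Image"
    return f"{base_path}{media_dir}/{coll_id}/{coll_item_no}"
```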

Configuration File

In order to configure the baseline preservation path, authenticate to ArchivesSpace, and use the ArchivesSpace API, supply a "config.ini" file in the "avatar" directory that looks like this:

[PRESERVATION]
BasePreservationPath = ''

# These are configurations for ArchivesSpace instances
[DEV]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.

[PROD]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.

[SANDBOX]
BaseURL = ''
User = ''
Password = ''
RepositoryID = '' # Note: AVATAR assumes a default ArchivesSpace repository ID of 2.
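One way to read a file shaped like the example above is Python's standard configparser. This is a sketch, not AVATAR's actual loader: the function name and returned keys are hypothetical. Because the sample quotes its values and puts comments on the same line as RepositoryID, the sketch enables inline comments and strips the quotes itself.

```python
import configparser


def load_instance(config_path, instance):
    """Return settings for one instance ("dev", "prod", or "sandbox")."""
    # inline_comment_prefixes lets "RepositoryID = '' # Note: ..." parse cleanly.
    # Caveat: a value containing " #" would be cut off at that point.
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",), interpolation=None
    )
    parser.read(config_path, encoding="utf-8")
    section = parser[instance.upper()]  # KeyError if the section is missing

    def clean(value):
        # The sample file wraps values in quotes; remove them.
        return value.strip().strip("'\"`")

    return {
        "base_preservation_path": clean(
            parser.get("PRESERVATION", "BasePreservationPath", fallback="")
        ),
        "base_url": clean(section.get("BaseURL", "")),
        "user": clean(section.get("User", "")),
        "password": clean(section.get("Password", "")),
        # Fall back to the default repository ID of 2 when left empty.
        "repository_id": int(clean(section.get("RepositoryID", "")) or 2),
    }
```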

Usage

usage: avatar.py [-h] [-c] [-d] [-r] [-o /path/to/output/directory] /path/to/project/csv.csv {dev,prod,sandbox}

Creates or updates ArchivesSpace `<dsc>` archival and digital object elements using data output from the A/V Database

positional arguments:
  /path/to/project/csv.csv
                        Path to a project CSV
  {dev,prod,sandbox}    Choose configuration for DEV, PROD, or SANDBOX ArchivesSpace instance

optional arguments:
  -h, --help            show this help message and exit
  -c, --coll_info       Updates collection-level-information
  -d, --dsc             Updates container list
  -r, --revert_back     Undoes collection- and container-level updates
  -o /path/to/output/directory, --output /path/to/output/directory
                        Path to output directory for results
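For reference, an argparse parser along the following lines would accept the same arguments as the usage text above. It is a reconstruction from the help output, not the parser in avatar.py, and the attribute names project_csv and instance are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the arguments shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="avatar.py",
        description="Creates or updates ArchivesSpace `<dsc>` archival and "
                    "digital object elements using data output from the A/V Database",
    )
    parser.add_argument("project_csv", metavar="/path/to/project/csv.csv",
                        help="Path to a project CSV")
    parser.add_argument("instance", choices=["dev", "prod", "sandbox"],
                        help="Choose configuration for DEV, PROD, or SANDBOX "
                             "ArchivesSpace instance")
    parser.add_argument("-c", "--coll_info", action="store_true",
                        help="Updates collection-level-information")
    parser.add_argument("-d", "--dsc", action="store_true",
                        help="Updates container list")
    parser.add_argument("-r", "--revert_back", action="store_true",
                        help="Undoes collection- and container-level updates")
    parser.add_argument("-o", "--output", metavar="/path/to/output/directory",
                        help="Path to output directory for results")
    return parser
```

For example, build_parser().parse_args(["-d", "project.csv", "dev"]) corresponds to updating the container list against the DEV instance.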

Output

AVATAR outputs a CSV file with the DigFile Calc (for the Item or Part, depending on whether it's an "item ONLY" or "item with parts," respectively) and the corresponding archival_object_id. This can be used to update the A/V Database. The optional --output argument can be used to specify a destination directory.
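A results file of this kind could be written as in the sketch below. The function name, the file name avatar_results.csv, and the header labels are hypothetical; only the two values per row come from the description above.

```python
import csv
import os


def write_results(results, output_dir="."):
    """Write (DigFile Calc, archival_object_id) pairs to a CSV and return its path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "avatar_results.csv")  # hypothetical name
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["DigFile Calc", "archival_object_id"])
        writer.writerows(results)
    return path
```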

Cache

AVATAR creates a cache of resources it updates as well as archival objects and digital objects it creates (ID only) or updates (JSON) for individual media files (digfile_calcs). To initiate the cache, use utils/create_digfile_calcs_pickle.py and ensure that there is a "cache" directory with "resources" and "digfile_calcs" subdirectories in the home folder. Resources are simply stored as a cached JSON representation of the resource in a file named [resource_id].json. Media files, however, are stored in a pickle structured like:

[{
	'85242-1': [{
		'type': 'archival_object',
		'id': '371206',
		'status': 'updated'
	}, {
		'type': 'digital_object',
		'id': '43062',
		'status': 'created'
	}, {
		'type': 'digital_object',
		'id': '43063',
		'status': 'created'
	}]
}]

Any updated archival objects are stored in a file named "[archival_object_id].json."
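To inspect a cache with the structure shown above, a small helper can group the cached objects by status. The function name is hypothetical, and the sketch works on the already-loaded list (for example, the result of pickle.load on the cache file) rather than assuming a file name.

```python
from collections import defaultdict


def objects_by_status(cache_entries):
    """Group cached objects as {status: [(digfile_calc, type, id), ...]}."""
    grouped = defaultdict(list)
    for entry in cache_entries:  # each entry maps a DigFile Calc to its objects
        for digfile_calc, objects in entry.items():
            for obj in objects:
                grouped[obj["status"]].append(
                    (digfile_calc, obj["type"], obj["id"])
                )
    return dict(grouped)
```

Applied to the example above, this yields one "updated" archival object and two "created" digital objects for "85242-1".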

Revert Back (In Development)

AVATAR can use the cache it creates to "revert back" to a previous state, i.e., to undo collection- and container-level updates. This should only be used in a non-PROD environment.
