Skip to content

laissezfarrell/rl-bitcurator-scripts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

rl-bitcurator-scripts

Some resources (bash, python) developed to use in BitCurator. Simple (too simple) bash scripts, mostly.

bash/append.sh

  • appends text to a set of Kryoflux log filenames.
  • used to differentiate different image logs when we create multiple disk images from the same stream files

bash/batch_brunn.sh and batch_brunn-noclam.sh

  • runs Brunnhilde.py against all directories in a given directory with standard RL options
  • -no-clam runs Brunnhilde.py without generating hashes or virus scans

bash/batchmkdir.sh

  • simply makes a directory in the administrative_metadata folder for every directory in a given directory without recursing further.
  • assumes you are following the basic dir structure for RL AIPs

bash/sha1batch.sh

  • runs sha1deep over every subdirectory of a given directory, creating an output file at the subdirectory level in the administrative_metadata folder.
  • assumes you are following the basic dir structure for RL AIPs

bash/multifiwalk.sh

  • borrowed whole cloth from the [BitCurator Users Google Group (thanks, Kam!)] (https://goo.gl/n8kke7)
  • output dumped into directory storing the disk images
  • TO-DO:
    • once script for running bitcurator-reports in batch is created, modify to ensure outputs are what an be expected by bc-reports as input.

bash/multi_be.sh

  • modified from multifiwalk.sh (thanks, Kam!)
  • output dumped into subdirectories storing the disk image
  • TO-DO:
    • once script for running bitcurator-reports in batch is created, modify to ensure outputs are what an be expected by bc-reports as input.
    • modify to include the regex group of regular expressions to search.
    • modify to include other customizations run in RL

bash/multi_annofeatures.sh

  • modified from multifiwalk.sh (thanks, Kam!)
  • output dumped into subdirectories storing the disk image
  • TO-DO:
    • once script for running bitcurator-reports in batch is created, modify to ensure outputs are what an be expected by bc-reports as input.

be_regex/uaregex.txt

  • starting list of regexes to run against all UA collections, looking for some types of FERPA and other sensitive data.
  • not exhaustive. Needs updating.

Deprecated or otherwise abandoned

bash/simgen_prod.sh -- Should be considered deprecated at this point (December 2018). May return to this at some point

  • works with image files feat. ISO and E01 extensions
  • works with disk images featuring an ISO9660 filesystem
  • output currently dumped to desktop
  • outputs:
    • virus scan report (clamTK)
    • sdhash similarity digest
    • contents list
  • TO-DO:
    • compatibility with IMG, DD file extension disk images
    • compatibility with FAT, NTFS, HFS+ filesystems
    • ability to work with disk image files that have been split or at least document that the user must stitch these together prior to running
    • remove virus scan function (handled by Brunnhilde)

regex/rlist.txt

  • an evolving list of regular expressions common enough to RL that all bulk_extractor searches should search them
  • could serve as useful for testing, but otherwise, deprecated in favor of lists that can be used in different collection profiles.

About

Some scripts (bash, python) developed to automate processes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published