Releases: DataONEorg/hashstore

1.1.0

02 Oct 00:34
c11d5bc

HashStore 1.1.0 🎉

Release date: 2024-10-01

Release Notes

This minor HashStore release refactors data object storage from persistent identifier-based hashes to content identifier-based hashes with a tagging system, and improves thread safety, synchronization, readability, and logging.

Overview of Major Changes ⚙️

  • Data objects are now stored under their content identifier and are managed with reference files that establish the relationship between a data object and its authority-based or persistent identifiers (pids) I-73
  • Clients can now store a data object without an identifier. They are then expected to call tag_object separately to create the connection between the data object and its identifier. Additionally, we recommend calling delete_if_invalid_object afterwards, which removes a data object that is determined to be invalid
  • Refactored delete_object to also remove all associated metadata for a given identifier, and improved the atomicity of the process by renaming the files first and then deleting them I-87
  • Metadata (e.g. sysmeta, annotations) is now stored under a document name formed from the hash of the pid+format_id, in a hashstore directory formed from the hash of the pid I-99
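
The layout described above can be illustrated with a toy model. This is a minimal sketch and not HashStore's implementation: the function names (hex_digest, toy_store, toy_tag, metadata_path), the choice of SHA-256, and the flat directory names are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def hex_digest(data: bytes) -> str:
    """Return a hex digest. SHA-256 is an assumption, not a HashStore default."""
    return hashlib.sha256(data).hexdigest()


def toy_store(root: Path, data: bytes) -> str:
    """Store bytes under their content identifier (cid); no pid is needed."""
    cid = hex_digest(data)
    obj_path = root / "objects" / cid
    obj_path.parent.mkdir(parents=True, exist_ok=True)
    if not obj_path.exists():  # identical content is stored only once
        obj_path.write_bytes(data)
    return cid


def toy_tag(root: Path, pid: str, cid: str) -> Path:
    """Write a reference file that links a pid to a cid (hypothetical layout)."""
    ref_path = root / "refs" / hex_digest(pid.encode("utf-8"))
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(cid)
    return ref_path


def metadata_path(root: Path, pid: str, format_id: str) -> Path:
    """Directory from hash(pid), document name from hash(pid + format_id)."""
    pid_dir = hex_digest(pid.encode("utf-8"))
    doc_name = hex_digest((pid + format_id).encode("utf-8"))
    return root / "metadata" / pid_dir / doc_name
```

In this model, storing the same bytes twice yields the same cid and a single file on disk, while any number of pids can be tagged to that cid through separate reference files.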

New Features & Enhancements 🛠️

  • New Public API methods: tag_object, delete_if_invalid_object and supporting methods & processes. tag_object creates reference files linking an identifier (e.g. pid) to its content identifier. I-124, I-75, I-76, I-81, I-97, I-101, I-109, I-111, I-113, I-114, I-122
  • The keys and values in the hashstore.yaml config file are now written with a YAML library to ensure the written content is reliable I-138
  • Misc. improvements to the hashstore client, including a script entry point, set up as part of the poetry install process, that simplifies client syntax/usage I-92, I-82, I-94, I-106
  • Enhanced thread/process synchronization with specific threading and multiprocessing locks to address race conditions (improved pytest time to less than 2s!) I-98
  • Added thread safety to all public API calls when working with metadata objects I-99
  • Refactored ObjectMetadata to be a @dataclass I-126
  • Various bug fixes and optimizations across the codebase to improve readability and clarity and to resolve linting warnings I-139, I-136, I-72, I-112, I-119, I-121, I-125, I-85
  • Revised Python docstrings into reStructuredText (Sphinx autodocumentation-compatible format) and added type hints I-70, I-137
  • Cleaned up logging statements, which now use a logger object I-93, I-140, I-90
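
Two of the items above, the @dataclass refactor and per-identifier synchronization, can be sketched as follows. This is an illustrative sketch only: ObjectInfo and its fields, and the IdentifierGuard class, are hypothetical names and do not reproduce HashStore's actual ObjectMetadata or locking code.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ObjectInfo:
    """Hypothetical record returned after storing an object."""
    cid: str
    obj_size: int
    hex_digests: dict = field(default_factory=dict)


class IdentifierGuard:
    """Allow only one thread at a time to work on a given identifier."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_use = set()

    @contextmanager
    def hold(self, identifier: str):
        with self._cond:
            # Wait until no other thread is working on this identifier.
            while identifier in self._in_use:
                self._cond.wait()
            self._in_use.add(identifier)
        try:
            yield
        finally:
            with self._cond:
                self._in_use.discard(identifier)
                self._cond.notify_all()  # wake threads waiting on any identifier
```

Threads working on different identifiers proceed in parallel, while threads that target the same identifier are serialized. The sketch covers threads only; coordinating separate processes would need multiprocessing primitives instead.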

1.0.0

19 Oct 17:47
294b4fe

HashStore 1.0.0 🎉

We are excited to announce the first major release of HashStore (1.0.0). To start using HashStore, include it in your project using your preferred package manager or download the source code from our GitHub repository. To see code/usage examples, please refer to our documentation.

Key Features

  • Public API:
    • store_object
    • store_metadata
    • delete_object
    • delete_metadata
    • retrieve_object
    • retrieve_metadata
    • get_hex_digest
  • Command Line Tool
    • Create a new HashStore or interact directly with an existing HashStore from the command line/terminal

Developer Notes:

  • HashStore has been extensively tested with Python's multiprocessing and threading standard libraries. Please see issue-32 for more details.
  • Interrupting store_object abruptly (for example, with a forceful volume unmount or a keyboard interrupt) will leave temporary files behind. To manage these files, we recommend implementing a separate monitor/watchdog that tracks them and cleans them up.
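
One possible shape for such a monitor is a periodic sweep that removes temporary files older than a cutoff. This is a minimal sketch under stated assumptions: the function name remove_stale_tmp_files is hypothetical, the location of the temporary directory is not given here and must be supplied by the caller, and the age threshold should be long enough that files belonging to in-progress store_object calls are never touched.

```python
import time
from pathlib import Path


def remove_stale_tmp_files(tmp_dir: Path, max_age_seconds: float, now=None) -> list:
    """Delete regular files in tmp_dir last modified more than max_age_seconds ago.

    Returns the list of paths that were removed. The directory to sweep is an
    assumption supplied by the caller, not a documented HashStore location.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in tmp_dir.iterdir():
        if not entry.is_file():
            continue  # leave subdirectories and other entries alone
        if now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry)
    return removed
```

A sweep like this could be run from a scheduler (cron, a systemd timer, or a background thread) at an interval that suits the deployment.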

Feedback and Contributions:

  • HashStore is an open source project, and we welcome full participation in the project. Contributions are reviewed and suggestions are made to increase the value of HashStore to the community. We strive to incorporate code, documentation, and other useful contributions quickly and efficiently while maintaining a high-quality software product.
  • We would appreciate any feedback! If you encounter any issues, have suggestions, or want to contribute to the project, please create an issue or submit a pull request on our GitHub repository.