iRODS
-
Technology Description
-
The Integrated Rule-Oriented Data System (iRODS) is open source data management software used by research, commercial, and governmental organizations worldwide. iRODS is released as a production-level distribution aimed at deployment in mission-critical environments. It virtualizes data storage resources, so users can take control of their data regardless of where, and on what device, the data is stored. The development infrastructure supports exhaustive testing on supported platforms. The plugin architecture supports microservices, storage systems, authentication, networking, databases, rule engines, and an extensible API.
-
iRODS can be considered a programmable virtual filesystem. Every operation within the system can be configured to trigger actions that help enforce an organization's data management policy.
-
-
Learn from an expert
- A recorded presentation by Nirav Merchant: https://youtu.be/BIAxUzWDM_g
- Slides: https://docs.google.com/presentation/d/1pONfAfrM8hf0vyXJoIsSJb9Al3PbEtwiDxrcm7jwHdo
- Many additional videos: https://www.youtube.com/irodsconsortium
-
More Information
- The iRODS official website: https://irods.org/
- Past User Group Meetings, slides, papers, recordings: https://irods.org/ugm
-
Cost to set up
- The software is free under the 3-clause BSD License.
- Setup and maintenance involve a learning curve.
-
Pros
- Access to data collections through various iRODS clients, possibly including custom ones
- User-defined metadata can be assigned to data in addition to traditional system metadata
- Rule-based data and metadata management
- Secure collaboration through Tickets (controlled public access to data collections), Permissions (analogous to UNIX file system permissions), and Federation (Data federation among iRODS zones)
-
Cons
- iRODS is geared towards storage and analysis of very large datasets (flat files) by collaborative groups. It is likely not suitable for smaller datasets, for highly structured data (e.g. from databases), or for discovery of open (published) data. Also, setup and maintenance likely require dedicated, trained staff.
-
Use Case
- A consortium generates large-scale plant phenotyping datasets and needs to automate ingest, provide granular access to the data for different user types, and associate detailed provenance information with data processing, storage, and access.
-
Findability - Metadata and data should be easy to find for both humans and computers.
- F1 - (Meta)data are assigned a globally unique and persistent identifier
iRODS allows the assignment of persistent identifiers through its publishing capability. https://irods.org/images/capability_publishing.png
This could also be achieved through a policy enforcement point (PEP) to generate / retrieve a new DOI and assign it within iRODS metadata (AVUs).
- F2 - Data are described with rich metadata (defined by R1 below)
iRODS allows users to define custom metadata, which could follow ontologies or data dictionaries defined either internally or by third parties. Metadata are defined as tuples of strings called AVUs (attribute-value-units), where the units are optional. These strings have meaning to humans and/or scripts. They can represent the content of the data within the file, the history of the file, the state of the file, and/or the future of the file (or collection).
- F3 - Metadata clearly and explicitly include the identifier of the data they describe
The catalog database that stores the metadata also stores the link between each metadata entry and the data element it is attached to. Making this link explicit is a matter of querying the database via SQL. This capability is built in and available.
- F4 - (Meta)data are registered or indexed in a searchable resource
iRODS supports metadata search via the iRODS API, which is exposed through two command-line tools, imeta and iquest. Many other iRODS clients can access and expose the query endpoint of the iRODS API. iRODS also provides indexing capabilities to outsource search to a more specialized search technology (Elasticsearch, etc.). https://irods.org/images/capability_indexing.png
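The AVU model described under F2 and the metadata search described under F4 can be sketched in plain Python. This is a minimal illustration, not the iRODS API: the `AVU` class and the `build_iquest_query` helper are hypothetical names, the example attribute and value are made up, and the GenQuery column names (`COLL_NAME`, `DATA_NAME`, `META_DATA_ATTR_NAME`, `META_DATA_ATTR_VALUE`, `META_DATA_ATTR_UNITS`) should be checked against the documentation for your iRODS version.

```python
from typing import NamedTuple, Optional


class AVU(NamedTuple):
    """Hypothetical stand-in for an iRODS attribute-value-unit triple."""
    attribute: str
    value: str
    units: Optional[str] = None  # units are optional in the AVU model


def build_iquest_query(avu: AVU) -> str:
    """Build a GenQuery string that looks up data objects carrying this AVU.

    Single quotes are rejected rather than escaped, to keep the sketch simple.
    """
    for part in (avu.attribute, avu.value, avu.units or ""):
        if "'" in part:
            raise ValueError("single quotes are not supported in this sketch")
    conditions = [
        f"META_DATA_ATTR_NAME = '{avu.attribute}'",
        f"META_DATA_ATTR_VALUE = '{avu.value}'",
    ]
    if avu.units is not None:
        conditions.append(f"META_DATA_ATTR_UNITS = '{avu.units}'")
    return "SELECT COLL_NAME, DATA_NAME WHERE " + " AND ".join(conditions)


if __name__ == "__main__":
    # The attribute and value below are invented for illustration only.
    print(build_iquest_query(AVU("species", "Zea mays")))
```

The resulting string is meant to be passed as the query argument to iquest. Building it in one place keeps the mapping between an AVU and its search conditions explicit.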
-
Accessibility - Once the user finds the required data, it should be clear how the data can be fully accessed.
- A1 - (Meta)data are retrievable by their identifier using a standardized communications protocol
Data and metadata are available through various client applications. See https://irods.org/clients/.
- A1.1 - The protocol is open, free, and universally implementable
iRODS is open source and released under the 3-clause BSD License.
- A1.2 - The protocol allows for an authentication and authorization procedure, where necessary
Every API call into the iRODS zone is authenticated. iRODS allows for custom data permissions and access via Tickets, Permissions, and Federation.
- A2 - Metadata are accessible, even when the data are no longer available
iRODS stores metadata in the same catalog that records where the data lives. Exporting this metadata to another system or to a static representation can easily be performed with a query, either at a particular time or periodically.
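The metadata export mentioned under A2 could look roughly like the sketch below, which turns query results into a static JSON snapshot. It assumes the rows have already been retrieved from the catalog by some client. The row layout, the `snapshot_metadata` name, and the JSON structure are all assumptions made for illustration, not an iRODS format.

```python
import json
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Assumed row layout: (logical path, attribute, value, units)
Row = Tuple[str, str, str, Optional[str]]


def snapshot_metadata(rows: Iterable[Row], taken_at: str) -> str:
    """Group AVU rows by logical path and serialize them as a JSON document."""
    by_path = defaultdict(list)
    for path, attribute, value, units in rows:
        by_path[path].append(
            {"attribute": attribute, "value": value, "units": units}
        )
    document = {"taken_at": taken_at, "objects": dict(by_path)}
    return json.dumps(document, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Paths and values below are invented for illustration only.
    example_rows = [
        ("/exampleZone/home/lab/plot1.csv", "species", "Zea mays", None),
        ("/exampleZone/home/lab/plot1.csv", "height", "12", "cm"),
    ]
    print(snapshot_metadata(example_rows, "2024-01-01T00:00:00Z"))
```

A snapshot like this remains readable after the underlying data objects are removed, which is the point of A2. Running it on a schedule would give the periodic export described above.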
-
Interoperability - The data should easily interoperate with other data, as well as applications for analysis, storage, and processing.
- I1 - (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Metadata can be exported to any desired representation. There are no iRODS Consortium-maintained exporters for any particular format at this time, although such exporters could be developed with attention and time.
- I2 - (Meta)data use vocabularies that follow FAIR principles
iRODS allows users to define custom metadata, so it can support metadata that use vocabularies following FAIR principles.
- I3 - (Meta)data include qualified references to other (meta)data
Including qualified references to other metadata is possible, but up to the user / administrator defining the metadata schema.
-
Reusability - Metadata and data should be well-described so that they can be replicated and/or combined in different settings.
- R1 - (Meta)data are richly described with a plurality of accurate and relevant attributes
Rich description is possible, and up to the user / administrator. The administrator can also require a minimum level of metadata, which can be enforced up front or checked against existing metadata.
- R1.1 - (Meta)data are released with a clear and accessible data usage license
Associating a data usage license is possible, and up to the user / administrator.
- R1.2 - (Meta)data are associated with detailed provenance
Detailed provenance description is possible, and up to the user / administrator. This information could be associated within the metadata itself, or attached to an affiliated file.
- R1.3 - (Meta)data meet domain-relevant community standards
Metadata adherence to domain-relevant community standards is possible, and up to the user / administrator.
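The minimum-metadata requirement mentioned under R1 can be illustrated with a small client-side check. This is only a sketch in plain Python and not an iRODS rule or policy enforcement point. The `missing_attributes` helper and the required attribute names (`license`, `provenance`) are hypothetical, chosen to echo R1.1 and R1.2.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumed AVU layout: (attribute, value, units)
AVUTuple = Tuple[str, str, Optional[str]]


def missing_attributes(avus: Iterable[AVUTuple], required: Set[str]) -> List[str]:
    """Return the required attribute names that have no non-empty value."""
    present = {attribute for attribute, value, _units in avus if value.strip()}
    return sorted(required - present)


if __name__ == "__main__":
    # Hypothetical policy: every object needs a license and provenance.
    required = {"license", "provenance"}
    avus = [("license", "CC-BY-4.0", None), ("species", "Zea mays", None)]
    print(missing_attributes(avus, required))
```

The same check could be applied before ingest to reject incomplete submissions, or run over existing objects to report gaps.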
Created by the AgBioData Data Federation Training Working Group