Software Metadata Extraction and Curation Software (SMECS)

A web application to extract and curate research software metadata following the CodeMeta software metadata standard.

SMECS facilitates the extraction of research software metadata from repositories on GitHub/GitLab. It offers a user-friendly graphical user interface for visualizing the retrieved metadata. This empowers researchers to create good metadata for their research software without re-entering data that is already available elsewhere. Ultimately, SMECS delivers the curated metadata in JSON format, enhancing usability and accessibility.


Authors: Stephan Ferenz, Aida Jafarbigloo

Key Stages in SMECS

The figure below illustrates the sequential processes and data flows within SMECS. First, users input data, triggering the tool to extract metadata associated with specific URLs. This metadata is then visualized, allowing users to review and interact with it. Users can curate, modify, and finalize the metadata according to their needs. Once satisfied, they can download the curated metadata in JSON format, providing an interoperable output for further use.


SMECS Workflow Visualization


  1. Metadata Extraction Stage
    • Metadata Extraction
      • SMECS extracts metadata from GitHub and GitLab repositories. For details on the specific metadata that SMECS can extract, please refer to Metadata Terms in SMECS.
    • API Interactions: Use GitHub and GitLab APIs to fetch relevant metadata.
    • Data Parsing: Analyze the retrieved metadata and translate it into CodeMeta metadata for further processing.
    • Cross-Walk and Metadata Mapping
      • Standardization: Align metadata fields from GitHub and GitLab to a common dictionary.
      • Field Matching: Map equivalent fields between GitHub and GitLab. For example, mapping GitHub "topics" to GitLab "keywords".
  2. Visualization and Curation Stage
    • Visualization: Extracted metadata is displayed in a structured form.
    • User Interface: Interactive and simple UI for exploring the extracted and curated metadata.
    • Metadata Curation: Refine the extracted metadata based on user preferences.
    • Missing Metadata Identification: Identify and highlight fields where metadata is absent.
    • User Input for Missing Metadata: Enable users to add missing metadata directly via the user interface.
    • Real-Time Metadata Curation: Show the CodeMeta JSON representation of the metadata in real time. Changes flow in one direction only, from the form to the JSON view, so the JSON always reflects the current state of curation.
  3. Export Stage
    • Export Formats: Save extracted and curated metadata in JSON format.
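
The three stages above can be illustrated with a minimal Python sketch. It is not taken from the SMECS code base: the function names, the `GITHUB_TO_CODEMETA` table, and the choice to drop empty fields on export are assumptions made for illustration, and the actual crosswalk used by SMECS may differ. The REST endpoints (`api.github.com/repos/...` and `gitlab.com/api/v4/projects/...`) and the CodeMeta 2.0 context URL are the publicly documented ones. The sketch assumes the input URL points at the repository root on github.com or gitlab.com.

```python
import json
from urllib.parse import quote, urlparse

CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical crosswalk: GitHub REST API field -> CodeMeta term.
GITHUB_TO_CODEMETA = {
    "name": "name",
    "description": "description",
    "html_url": "codeRepository",
    "topics": "keywords",
    "language": "programmingLanguage",
    "created_at": "dateCreated",
    "updated_at": "dateModified",
}


def api_endpoint(repo_url: str) -> str:
    """Derive the REST endpoint for a repository URL (extraction stage)."""
    parsed = urlparse(repo_url)
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if parsed.netloc == "github.com":
        return f"https://api.github.com/repos/{path}"
    if parsed.netloc == "gitlab.com":
        # GitLab expects the full project path URL-encoded, slashes included.
        return f"https://gitlab.com/api/v4/projects/{quote(path, safe='')}"
    raise ValueError(f"Unsupported host: {parsed.netloc}")


def to_codemeta(github_repo: dict) -> dict:
    """Translate a GitHub repository payload into a flat CodeMeta-style record."""
    record = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for source_key, target_key in GITHUB_TO_CODEMETA.items():
        record[target_key] = github_repo.get(source_key)
    return record


def missing_fields(record: dict) -> list:
    """List the fields a user would still have to fill in (curation stage)."""
    return [key for key, value in record.items() if value in (None, "", [], {})]


def export_json(record: dict) -> str:
    """Serialize the record as JSON, dropping fields that are still empty."""
    empty = set(missing_fields(record))
    filled = {key: value for key, value in record.items() if key not in empty}
    return json.dumps(filled, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hand-written sample payload; a real one would come from the API endpoint.
    sample = {"name": "SMECS", "html_url": "https://github.com/NFDI4Energy/SMECS", "topics": []}
    record = to_codemeta(sample)
    print(api_endpoint(sample["html_url"]))
    print("missing:", missing_fields(record))
    print(export_json(record))
```

Keeping the crosswalk in a plain dictionary means a second table for GitLab fields could reuse `to_codemeta` unchanged, which is one way to read the "common dictionary" idea in the Cross-Walk step.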


Installation and Usage

Install from GitHub

  1. Cloning the repository
    • Copy URL of the project from Clone with HTTPS.
    • Change the current working directory to the desired location.
    • Run git clone <URL> in the command prompt (Git Bash can be used as well).
  2. Creating virtual environment
    • Make sure Python is installed.

    • Ensure you can run Python from command prompt.
      • On Windows: Run py --version.
      • On Unix/MacOS: Run python3 --version.
    • Create the virtual environment by running this code in the command prompt.
      • On Windows: Run py -m venv <name-of-virtual-environment>.
      • On Unix/MacOS: Run python3 -m venv <name-of-virtual-environment>.
      For more details, see Creation of virtual environments.
    • Activate virtual environment.
      • On Windows: Run env\Scripts\activate.
      • On Unix/MacOS: Run source env/bin/activate.

      Here, env is the name chosen for the virtual environment; replace it with the name used in the previous step. Note that activating the virtual environment changes the shell's prompt to show which virtual environment is in use.

  3. Managing Packages with pip
    • Ensure you can run pip from command prompt.
      • On Windows: Run py -m pip --version.
      • On Unix/MacOS: Run python3 -m pip --version.
    • Install the requirements listed in requirements.txt.
      • On Windows: Run py -m pip install -r requirements.txt.
      • On Unix/MacOS: Run python3 -m pip install -r requirements.txt.
    For more details, see Installing Packages.
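
After these steps, a quick sanity check can confirm that the virtual environment is active and the requirements were installed into it. The following is a minimal sketch, not part of SMECS: the helper names are hypothetical, and checking for `django` assumes that requirements.txt installs Django, which the use of manage.py suggests.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def is_installed(package: str) -> bool:
    """True when a top-level package can be imported by this interpreter."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("django importable:", is_installed("django"))
```

Save it under any file name and run it with the same interpreter command used above (py on Windows, python3 on Unix/MacOS).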


Running the project

  • Open the project in an editor (e.g. VS Code).
  • Run the project.
    • On Windows: Run py manage.py runserver.
    • On Unix/MacOS: Run python3 manage.py runserver.
  • To view the application in the browser, follow the link shown in the terminal (e.g. http://127.0.0.1:8000/).


Install through Docker

To get started with SMECS using Docker, follow the steps below:

  • Prerequisites
    • Make sure Docker is installed on your local machine.
  • Cloning the Repository
git clone https://github.com/NFDI4Energy/SMECS.git
  • Navigate to the Project Directory
cd SMECS
  • Building the Docker Images
docker-compose build
  • Starting the Services
docker-compose up
  • Accessing the Application
    • Navigate to http://localhost:8000 in your web browser.
  • Stopping the Services
docker-compose down
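
Whichever installation route is used, it can be useful to check from a script that the application is answering on port 8000 before opening the browser. The sketch below uses only the Python standard library; the function name `is_reachable` is hypothetical and not part of SMECS. Proxies are disabled on purpose because the target is a local address.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    for address in ("http://127.0.0.1:8000/", "http://localhost:8000"):
        print(address, "reachable:", is_reachable(address))
```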

Setting Up GitLab/GitHub Personal Token
To enhance the functionality of this program and ensure secure interactions with the GitLab/GitHub API, users are required to provide their personal access token. Follow these steps to integrate your token:

Tip for developers
If the page does not refresh correctly, clear the browser cache. You can force Chrome to pull in new data and ignore the saved ("cached") data by using the keyboard shortcut Cmd+Shift+R on Mac, and Ctrl+F5 or Ctrl+Shift+R on Windows.

Collaboration

We believe in the power of collaboration and welcome contributions from the community to enhance the SMECS workflow. Whether you have found a bug, have a feature idea, or want to share feedback, your contribution matters. Feel free to submit a pull request, open up an issue, or reach out with any questions or concerns.

To see upcoming features, please refer to our open issues.


License and Citation

The code is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
See LICENSE.txt for further information.

Acknowledgements

We would like to thank meta_tool for providing the foundational framework upon which this project is built.
