Commit

Update READMEs with the suggested edits
NicholasSynovic committed Jan 25, 2023
1 parent e4573ff commit 316cb08
Showing 7 changed files with 24 additions and 27 deletions.
30 changes: 14 additions & 16 deletions README.md
@@ -18,21 +18,17 @@
- [From Source](#from-source)
- [How to Run](#how-to-run)
- [As Individual Scripts](#as-individual-scripts)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Pre-Packaged Dataset](#pre-packaged-dataset)
- [How to Cite](#how-to-cite)
- [References](#references)

## About

This repository contains the scripts to generate the *PTMTorrent* dataset.

*PTMTorrent* is a dataset created to be submitted to the
[2023 Mining Software Repositories (MSR) Conference Data and Tool Showcase Track](https://conf.researchr.org/track/msr-2023/msr-2023-data-showcase).
The dataset contains either the partial or entire set of pre-trained machine
learning models (PTM) repositories hosted on popular model hubs.

The list of currently supported model hubs can be found
[here](#supported-model-hubs).
The dataset contains the [`git`](https://git-scm.com) repositories of pre-trained
machine learning models (PTMs) hosted on popular model hubs.
Supporting metadata from each model hub, as well as standardized metadata conforming to [this JSON Schema](ptm_torrent/utils/schemas/ptmtorrent.json), is also included.
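As a rough illustration of what "conforming to a JSON Schema" involves, the sketch below checks only that every top-level key listed in a schema's `required` array is present in a metadata record. The schema fragment, the record, and the `missing_required_keys` helper are hypothetical and are not taken from `ptmtorrent.json`; full validation (types, nested objects) would call for a dedicated JSON Schema validator.

```python
import json
from typing import Any


def missing_required_keys(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return the top-level keys required by `schema` that are absent from `record`.

    This only inspects the schema's top-level "required" array. It is NOT a
    full JSON Schema validator: types, nested objects, etc. are ignored.
    """
    required = schema.get("required", [])
    return [key for key in required if key not in record]


# Hypothetical schema fragment and record, for illustration only.
schema = json.loads('{"type": "object", "required": ["id", "model_hub", "repo_url"]}')
record = json.loads('{"id": 1, "model_hub": "huggingface"}')

print(missing_required_keys(schema, record))  # ['repo_url']
```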

### Supported Model Hubs

@@ -50,9 +46,7 @@ This project is dependent upon the following software:

- [`Python 3.10.9`](https://www.python.org/downloads/release/python-3109/)

> Python dependencies and packaging are handled by
> [`pip`](https://pip.pypa.io/en/stable/) and
> [`poetry`](https://python-poetry.org/)
> Package dependencies are given in [`pyproject.toml`](pyproject.toml) and handled by [`poetry`](https://python-poetry.org/).
- [`Git`](https://git-scm.com)
- [`Git LFS`](https://git-lfs.com/)
@@ -81,7 +75,7 @@ The package can either be installed from our
1. Create a `Python 3.10` virtual environment: `python3.10 -m venv env`
1. Activate virtual environment: `source env/bin/activate`
1. Upgrade `pip`: `python -m pip install --upgrade pip`
1. Install `poetry`: `python -m pip install -r requirements.txt`
1. Install `poetry`: `python -m pip install poetry`
1. Install `Python` dependencies through `poetry`: `python -m poetry install`
1. Build with `poetry`: `python -m poetry build`
1. Install with `pip`: `python -m pip install dist/ptm_torrent*.tar.gz`
@@ -113,8 +107,7 @@ which to run these scripts (should the `__main__.py` file be insufficient) is
described in each model hub's `README.md` file within the scripts folder.

> NOTE: Hugging Face's `__main__.py` can be parameterized to allow for a
> specific percentage of the model hub to be downloaded. By default, it is 0.1
> (10%).
> specific percentage of the model hub to be downloaded. By default, it downloads the first 0.1 (10%) of models, sorted by download count in descending order.
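
The default selection described in the note can be sketched as follows. The `select_top_fraction` name, the `downloads` key, and rounding to the nearest whole model are assumptions made for illustration; this is not the code in `ptm_torrent/huggingface/__main__.py`.

```python
from typing import Any


def select_top_fraction(models: list[dict[str, Any]], fraction: float = 0.1) -> list[dict[str, Any]]:
    """Return the first `fraction` of `models`, ordered by download count (descending).

    Rounding to the nearest whole number of models (Python's round()) is an
    assumption of this sketch; it also absorbs float error such as 30 * 0.1.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ranked = sorted(models, key=lambda m: m["downloads"], reverse=True)
    count = round(len(ranked) * fraction)
    return ranked[:count]


# Hypothetical model records with download counts.
models = [
    {"id": f"model-{i}", "downloads": d}
    for i, d in enumerate([5, 120, 42, 7, 300, 1, 88, 15, 60, 9])
]
print([m["id"] for m in select_top_fraction(models)])  # ['model-4']
```

Sorting the full list before slicing keeps the selection deterministic for a given snapshot of download counts.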
To run any of the scripts, execute the following command pattern:

@@ -124,7 +117,7 @@ For example, to run Hugging Face's scripts:

- `python ptm_torrent/huggingface/__main__.py`
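
Because every hub follows the same `python ptm_torrent/<hub>/__main__.py` pattern, running all of them can be scripted. The sketch below is hypothetical: the hub folder names come from the READMEs touched by this commit, while `build_command` and `run_all` are illustrative helpers that run the scripts one after another from the repository root.

```python
import subprocess
import sys
from pathlib import Path

# Hub folder names taken from the READMEs touched by this commit.
HUBS = ["huggingface", "modelhub", "modelzoo", "onnxmodelzoo", "pytorchhub"]


def build_command(hub: str, package_dir: str = "ptm_torrent") -> list[str]:
    """Return the command that runs a single hub's __main__.py script."""
    script = Path(package_dir) / hub / "__main__.py"
    # sys.executable reuses the interpreter of the active virtual environment.
    return [sys.executable, str(script)]


def run_all(hubs: list[str], package_dir: str = "ptm_torrent") -> dict[str, int]:
    """Run each hub's script sequentially and return its exit code by hub name."""
    results: dict[str, int] = {}
    for hub in hubs:
        # check=False: record failures instead of stopping at the first one.
        completed = subprocess.run(build_command(hub, package_dir), check=False)
        results[hub] = completed.returncode
    return results
```

Calling `run_all(HUBS)` from the repository root would run every hub in turn; running them sequentially is a simplicity choice for this sketch.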

## Data Storage
## Data Representation

Each model hub script generates the following directory structure **per model
hub**:
@@ -163,6 +156,11 @@ or concurrently.
Specifics about the types of metadata files and content that are produced by the
scripts can be found in each model hub's script folder's `README.md` file.

## Pre-Packaged Dataset

A pre-packaged copy of the dataset is available on [this Purdue University Globus share](https://app.globus.org/file-manager?origin_id=d1db77ac-9b53-11ed-a84b-256017f36728&origin_path=%2F%7E%2F).
It is currently 99.79 GB, stored as compressed `tar.gz` archives.
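
Once downloaded, the archives can be unpacked with any `tar` tool. The sketch below uses the standard library's `tarfile` module to extract every `*.tar.gz` file in a folder; the `extract_archives` helper and the folder names are placeholders, and nothing is assumed about the archives' internal layout.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir: Path, output_dir: Path) -> list[Path]:
    """Extract every *.tar.gz archive in archive_dir into output_dir.

    Returns the archives that were extracted, in sorted order.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        with tarfile.open(archive, mode="r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # The "data" filter rejects absolute paths and ".." members.
                tar.extractall(path=output_dir, filter="data")
            else:
                # Older interpreters (e.g. 3.10.9) lack filters: trusted archives only.
                tar.extractall(path=output_dir)
        extracted.append(archive)
    return extracted
```

For example, `extract_archives(Path("downloads"), Path("data"))` would unpack everything in a hypothetical `downloads` folder into `data`.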

## How to Cite

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7570357.svg)](https://doi.org/10.5281/zenodo.7570357)
4 changes: 2 additions & 2 deletions ptm_torrent/huggingface/README.md
@@ -11,7 +11,7 @@
- [How to Run](#how-to-run)
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/huggingface/html`](#datahuggingfacehtml)
- [`data/huggingface/json`](#datahuggingfacejson)
@@ -71,7 +71,7 @@ dependencies must first be installed. See this project's root
1. `python downloadJSON.py`
1. `python downloadRepos.py`

## Data Storage
## Data Representation

> The following directory structure was taken on 1/25/2023.
4 changes: 2 additions & 2 deletions ptm_torrent/modelhub/README.md
@@ -10,7 +10,7 @@
- [How to Run](#how-to-run)
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/modelhub/html`](#datamodelhubhtml)
- [`data/modelhub/json`](#datamodelhubjson)
@@ -49,7 +49,7 @@ dependencies must first be installed. See this project's root
1. `python downloadRepos.py`
1. `python createSchema.py`

## Data Storage
## Data Representation

> The following directory structure was taken on 1/25/2023. Files within the
> `data/modelhub/json/metadata/models` directory have been removed from the
4 changes: 2 additions & 2 deletions ptm_torrent/modelzoo/README.md
@@ -10,7 +10,7 @@
- [How to Run](#how-to-run)
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/modelzoo/html`](#datamodelzoohtml)
- [`data/modelzoo/json`](#datamodelzoojson)
@@ -49,7 +49,7 @@ dependencies must first be installed. See this project's root
1. `python downloadRepos.py`
1. `python createSchema.py`

## Data Storage
## Data Representation

> The following directory structure was taken on 1/25/2023. Files within the
> `data/modelzoo/json/metadata/models` directory have been removed from the
4 changes: 2 additions & 2 deletions ptm_torrent/onnxmodelzoo/README.md
@@ -11,7 +11,7 @@
- [How to Run](#how-to-run)
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/onnxmodelhub/html/metadata`](#dataonnxmodelhubhtmlmetadata)
- [`data/onnxmodelhub/json`](#dataonnxmodelhubjson)
@@ -51,7 +51,7 @@ dependencies must first be installed. See this project's root
1. `python parseHubHTML.py`
1. `python parseModelHTML.py`

## Data Storage
## Data Representation

> The following directory structure was taken on 1/25/2023.
4 changes: 2 additions & 2 deletions ptm_torrent/pytorchhub/README.md
@@ -10,7 +10,7 @@
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Representation](#data-representation)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/pytorchhub/html/metadata`](#datapytorchhubhtmlmetadata)
- [`data/pytorchhub/json`](#datapytorchhubjson)
@@ -48,7 +48,7 @@ dependencies must first be installed. See this project's root
1. `python parseModelMetadata.py`
1. `python downloadRepos.py`

## Data Storage
## Data Representation

> The following directory structure was taken on 1/25/2023.
1 change: 0 additions & 1 deletion requirements.txt

This file was deleted.
