Merge pull request #13 from NicholasSynovic/main
Add HuggingFace schema script
NicholasSynovic committed Jan 25, 2023
2 parents c4c7cb0 + 9406017 commit 392e664
Showing 1 changed file with 41 additions and 0 deletions.
41 changes: 41 additions & 0 deletions ptm_torrent/huggingface/README.md
@@ -12,6 +12,11 @@
- [Through `__main__.py`](#through-__main__py)
- [As Individual Files](#as-individual-files)
- [Data Storage](#data-storage)
- [Data Directory Specifics](#data-directory-specifics)
- [`data/huggingface/html`](#datahuggingfacehtml)
- [`data/huggingface/json`](#datahuggingfacejson)
- [`data/huggingface/json/metadata`](#datahuggingfacejsonmetadata)
- [`data/huggingface/repos`](#datahuggingfacerepos)
- [References](#references)

## About
@@ -68,13 +73,16 @@ dependencies must first be installed. See this project's root

## Data Storage

> The following directory structure was taken on 1/25/2023.
```shell
📦data
┗ 📂huggingface
┃ ┣ 📂html
┃ ┃ ┗ 📂metadata
┃ ┃ ┃ ┗ 📂models
┃ ┣ 📂json
┃ ┃ ┣ 📜huggingface.json
┃ ┃ ┗ 📂metadata
┃ ┃ ┃ ┣ 📂models
┃ ┃ ┃ ┗ 📜hf_metadata.json
```
@@ -98,6 +106,39 @@ Model hub scripts do not overwrite the folder. In other words, it is a safe
operation to run multiple model hub scripts from the same directory sequentially
or concurrently.
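
A minimal sketch of how a script could create this layout without overwriting existing data. It assumes nothing about the project's actual implementation: the helper name `ensure_data_dirs` and the `HF_SUBDIRS` constant are hypothetical, and only the directory names come from this README.

```python
from pathlib import Path

# Subdirectories named in this README (hypothetical constant, for illustration).
HF_SUBDIRS = (
    "html/metadata/models",
    "json/metadata/models",
    "repos",
)


def ensure_data_dirs(root: Path) -> list:
    """Create data/huggingface/* under root, leaving existing content untouched."""
    created = []
    for sub in HF_SUBDIRS:
        path = root / "data" / "huggingface" / sub
        # exist_ok=True makes the call a no-op when the directory already
        # exists, so repeated or overlapping runs do not fail or delete files.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_data_dirs(Path("."))` a second time leaves any files already inside these directories in place.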

### Data Directory Specifics

#### `data/huggingface/html`

This directory is not currently utilized by our scripts. It is kept within the
program for consistency across model hubs.

#### `data/huggingface/json`

This directory contains JSON files formatted to fit specific schemas.

The top-level file (`huggingface.json`) is formatted to work with the general
[PTMTorrent JSON Schema](../utils/schemas/onnxmodelhubModelMetadata.json).

#### `data/huggingface/json/metadata`

This directory contains JSON files formatted to fit specific schemas.

The file (`hf_metadata.json`) is formatted to work with the
[HuggingFace hub metadata JSON Schema](../utils/schemas/huggingfaceMetadata.json).

#### `data/huggingface/repos`

This directory contains the repositories downloaded from the model hub.

Repositories are downloaded into this directory in the format `AUTHOR/REPO`.

Repository paths are generated by taking the `git`-compatible clone URL and
parsing it for the model *author* and *repository* name.

> Example: <https://github.com/SoftwareSystemsLaboratory/PTM-Torrent> ->
> SoftwareSystemsLaboratory/PTM-Torrent
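
The URL-to-path mapping described above can be sketched as follows. This is an illustration only, not the project's actual code: the function name `repo_path_from_clone_url` is hypothetical, and it assumes an HTTP(S) clone URL whose last two path segments are the author and the repository.

```python
from urllib.parse import urlparse


def repo_path_from_clone_url(url: str) -> str:
    """Return "AUTHOR/REPO" for a git-compatible HTTP(S) clone URL."""
    # Keep only the path component, e.g. "AUTHOR/REPO.git".
    path = urlparse(url).path.strip("/")
    # Drop an optional ".git" suffix.
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    # Assumption: the final two segments are the author and the repository.
    if len(parts) < 2 or not all(parts[-2:]):
        raise ValueError(f"cannot derive AUTHOR/REPO from {url!r}")
    author, repo = parts[-2], parts[-1]
    return f"{author}/{repo}"
```

For the example URL above, `repo_path_from_clone_url("https://github.com/SoftwareSystemsLaboratory/PTM-Torrent")` returns `SoftwareSystemsLaboratory/PTM-Torrent`. URLs with only one path segment raise `ValueError` in this sketch.
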
## References

> References are sorted by alphabetical order and not how they appear in this