From 94060172e93f57820779ad5ae891be1a49a1f1cd Mon Sep 17 00:00:00 2001
From: "Nicholas M. Synovic"
Date: Wed, 25 Jan 2023 16:25:16 -0600
Subject: [PATCH] Add readme content

---
 ptm_torrent/huggingface/README.md | 41 +++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/ptm_torrent/huggingface/README.md b/ptm_torrent/huggingface/README.md
index 53d0345..f84b454 100644
--- a/ptm_torrent/huggingface/README.md
+++ b/ptm_torrent/huggingface/README.md
@@ -12,6 +12,11 @@
   - [Through `__main__.py`](#through-__main__py)
   - [As Individual Files](#as-individual-files)
 - [Data Storage](#data-storage)
+  - [Data Directory Specifics](#data-directory-specifics)
+    - [`data/huggingface/html`](#datahuggingfacehtml)
+    - [`data/huggingface/json`](#datahuggingfacejson)
+    - [`data/huggingface/json/metadata`](#datahuggingfacejsonmetadata)
+    - [`data/huggingface/repos`](#datahuggingfacerepos)
 - [References](#references)
 
 ## About
@@ -68,6 +73,8 @@ dependencies must first be installed. See this project's root
 
 ## Data Storage
 
+> The following directory structure was taken on 1/25/2023.
+
 ```shell
 📦data
  ┗ 📂huggingface
@@ -75,6 +82,7 @@ dependencies must first be installed. See this project's root
  ┃ ┃ ┗ 📂metadata
  ┃ ┃ ┃ ┗ 📂models
  ┃ ┣ 📂json
+ ┃ ┃ ┣ 📜huggingface.json
  ┃ ┃ ┗ 📂metadata
  ┃ ┃ ┃ ┣ 📂models
  ┃ ┃ ┃ ┗ 📜hf_metadata.json
@@ -98,6 +106,39 @@ Model hub scripts do not overwrite the folder. In other words, it is a safe
 operation to run multiple model hub scripts from the same directory sequentially
 or concurrently.
 
+### Data Directory Specifics
+
+#### `data/huggingface/html`
+
+This directory is not currently utilized by our scripts. It is kept within the
+program for consistency across model hubs.
+
+#### `data/huggingface/json`
+
+This directory contains JSON files formatted to fit specific schemas.
+
+The top level file (`huggingface.json`) is formatted to work with the general
+[PTMTorrent JSON Schema](../utils/schemas/onnxmodelhubModelMetadata.json).
+
+#### `data/huggingface/json/metadata`
+
+This directory contains JSON files formatted to fit specific schemas.
+
+The file (`hf_metadata.json`) is formatted to work with the
+[Model Zoo hub metadata JSON Schema](../utils/schemas/huggingfaceMetadata.json).
+
+#### `data/huggingface/repos`
+
+This directory contains the repositories downloaded from the model hub.
+
+Repositories are downloaded into this directory in the format `AUTHOR/REPO`.
+
+Repository paths are generated by taking the `git` compatible cloning URL and
+parsing it for the model *author* and *repository name*.
+
+> Example: ->
+> SoftwareSystemsLaboratory/PTM-Torrent
+
 ## References
 
 > References are sorted by alphabetical order and not how they appear in this