From 122b8440c16286b76301c706406461e52a11e034 Mon Sep 17 00:00:00 2001
From: Adam Narozniak <51029327+adam-narozniak@users.noreply.github.com>
Date: Fri, 29 Nov 2024 15:12:09 +0100
Subject: [PATCH] docs(datasets) Add dataset contribution guide (#4601)

Co-authored-by: jafermarq
---
 .../contributor-how-to-contribute-dataset.rst | 56 +++++++++++++++++++
 datasets/doc/source/index.rst                 |  7 +++
 2 files changed, 63 insertions(+)
 create mode 100644 datasets/doc/source/contributor-how-to-contribute-dataset.rst

diff --git a/datasets/doc/source/contributor-how-to-contribute-dataset.rst b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
new file mode 100644
index 000000000000..07a6ba6378f7
--- /dev/null
+++ b/datasets/doc/source/contributor-how-to-contribute-dataset.rst
@@ -0,0 +1,56 @@
+How to contribute a dataset
+===========================
+
+To make a dataset available in Flower Datasets (``flwr-datasets``), you need to add the dataset to `HuggingFace Hub `_.
+
+This guide explains the best practices we found when adding datasets ourselves and points to the relevant HF guides. To see the datasets added by Flower, visit https://huggingface.co/flwrlabs.
+
+Dataset contribution process
+----------------------------
+The contribution consists of three steps: first, on your development machine, transform your dataset into a ``datasets.Dataset`` object, the preferred format for datasets in HF Hub; second, upload the dataset to HuggingFace Hub and describe in its readme how it can be used with Flower Datasets; third, share your dataset with us and we will add it to the `recommended FL dataset list `_.
+
+Creating a dataset locally
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+You can create a local dataset directly using the ``datasets`` library, or load it in any custom way and convert it to a ``datasets.Dataset`` from other Python objects.
+To complete this step, we recommend reading our :doc:`how-to-use-with-local-data` guide and/or the `Create a dataset `_ guide from HF.
+
+.. tip::
+    We recommend that you do not upload custom scripts to HuggingFace Hub; instead, create the dataset locally and upload the data, which speeds up processing each time the dataset is downloaded.
+
+Contribution to the HuggingFace Hub
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Each dataset in the HF Hub is a Git repository with a specific structure and a readme file. HuggingFace provides an API to push the dataset and, alternatively, a user interface on the website to populate the information in the readme file.
+
+Contributing to the HuggingFace Hub comes down to:
+
+1. Creating an HF repository for the dataset.
+2. Uploading the dataset.
+3. Filling in the information in the readme file.
+
+To complete this step, follow the HF guide `Share dataset to the Hub `_.
+
+Pushing the dataset is straightforward; here is what it could look like:
+
+.. code-block:: python
+
+    from datasets import Dataset
+
+    # Example dataset
+    data = {
+        'column1': [1, 2, 3],
+        'column2': ['a', 'b', 'c']
+    }
+
+    # Create a Dataset object
+    dataset = Dataset.from_dict(data)
+
+    # Push the dataset to the HuggingFace Hub (requires being logged in to your HF account)
+    dataset.push_to_hub("your-hf-username/your-ds-name")
+
+To make the dataset easily accessible in FL, we recommend adding a "Use in FL" section to the readme. Here's an example of how it is done in `one of our repos `_ for the cinic10 dataset.
+
+Increasing visibility of the dataset
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If you want the dataset listed in our `recommended FL dataset list `_, please send a PR or ping us in the #contributions channel on `Slack `_.
+
+That's it! You have successfully contributed a dataset to the HuggingFace Hub and made it available to the FL community. Thank you for your contribution!
\ No newline at end of file
diff --git a/datasets/doc/source/index.rst b/datasets/doc/source/index.rst
index 6f7c47bf2416..422d93582a02 100644
--- a/datasets/doc/source/index.rst
+++ b/datasets/doc/source/index.rst
@@ -66,6 +66,13 @@ Information-oriented API reference and other reference material.
    recommended-fl-datasets
    ref-telemetry
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Contributor tutorials
+
+   contributor-how-to-contribute-dataset
+
+
 Main features
 -------------
 Flower Datasets library supports: