We want to make contributing to this project as easy and transparent as possible.
We appreciate all contributions. If you are interested in contributing to TorchData, there are many ways to help out. Your contributions may fall into the following categories:
- It helps the project if you can:
  - Report issues that you're facing
  - Give a 👍 on issues that others reported and that are relevant to you
- Answering questions on the issue tracker and investigating bugs are very valuable contributions to the project.
- You would like to improve the documentation. This is no less important than improving the library itself! If you find a typo in the documentation, do not hesitate to submit a GitHub pull request.
- If you would like to fix a bug:
  - comment on the issue to say that you want to work on it
  - send a PR with your fix, see below.
- If you plan to contribute new features, utility functions or extensions, please first open an issue and discuss the feature with us.
  - We have a checklist of things to go through while adding a new DataPipe. See below.
- If you would like to feature a usage example in our documentation, discuss that with us in an issue.
We use GitHub issues to track public bugs. Please follow the existing templates if possible and ensure that the description is clear and has sufficient instructions to be able to reproduce the issue.
For questions related to the usage of this library, please post on the PyTorch forum, under the "data" category.
# Install the PyTorch nightly build with conda
conda install pytorch -c pytorch-nightly
# or with pip (see https://pytorch.org/get-started/locally/)
# pip install numpy
# pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
# Clone TorchData and install it in development mode
git clone https://github.com/pytorch/data.git
cd data
python setup.py develop
# Install the linting and test requirements
pip install flake8 typing mypy pytest expecttest
We actively welcome your pull requests.
- Fork the repo and create your branch from main.
- If you've added code that should be tested, add tests.
- If you've changed APIs, update the documentation and examples.
- Ensure the test suite passes.
- If you haven't already, complete the Contributor License Agreement ("CLA").
torchdata enforces a fairly strict code format through pre-commit. You can install it with
pip install pre-commit
or
conda install -c conda-forge pre-commit
To check and in most cases fix the code format, stage all your changes (git add) and run pre-commit run. To perform the checks automatically before every git commit, you can install them with pre-commit install.
When adding a new DataPipe, there are a few things that need to be done to ensure it is working and documented properly.
- Naming - please follow the naming convention as described here.
- Testing - please add unit tests to ensure that the DataPipe is functioning properly. Here are the test requirements that we have.
  - One test that is commonly missed is the serialization test. Please add the new DataPipe to test_serialization.py.
  - If your test requires interacting with files in the file system (e.g. opening a csv or tar file), we prefer those files to be generated during the test (see test_local_io.py). If the file is on a remote server, see test_remote_io.py.
- Documentation - ensure that the DataPipe has a docstring and a usage example, and that it is added to the right category of the right RST file to be rendered.
  - If your DataPipe has a functional form (i.e. @functional_datapipe(...)), include it at the end of the first sentence of your docstring. This will make sure it correctly shows up in the summary table of our documentation.
- Import - import the DataPipe in the correct __init__.py file.
- Interface - if the DataPipe has a functional form, make sure that it is generated properly by gen_pyi.py into the relevant interface file.
  - You can re-generate the pyi files by re-running python setup.py develop, then you can examine the new outputs.
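The testing items in the checklist above can be sketched roughly as follows. This is a minimal illustration, not the project's actual test code: CSVRowReader, its functional name read_csv_rows, run_checks, and sample.csv are all hypothetical, and the class is a plain-Python stand-in so that the example runs without torch installed. A real DataPipe would subclass IterDataPipe and register its functional form with @functional_datapipe(...). The sketch also assumes that a pickle round-trip is a reasonable proxy for what the serialization test checks.

```python
import csv
import os
import pickle
import tempfile


class CSVRowReader:
    """Yields each row of a CSV file as a list of strings (functional name: ``read_csv_rows``).

    Hypothetical stand-in for a DataPipe. The functional name sits at the end
    of the first docstring sentence, as the documentation item asks.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Open the file lazily, so the object itself only holds a path
        # and stays picklable.
        with open(self.path, newline="") as f:
            yield from csv.reader(f)


def run_checks():
    expected = [["a", "1"], ["b", "2"]]

    # Generate the input file during the test instead of committing a fixture.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(expected)

        dp = CSVRowReader(path)
        assert list(dp) == expected

        # Serialization check: round-trip through pickle, then confirm the
        # restored object yields the same rows as the original.
        restored = pickle.loads(pickle.dumps(dp))
        rows = list(restored)
        assert rows == expected
        return rows


if __name__ == "__main__":
    run_checks()
```

Keeping file handles out of the object's state and opening the file inside __iter__ is one way to keep such a class serializable. Whether that design suits a given DataPipe is a judgment call for the contributor and reviewers.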
In order to accept your pull request, we need you to submit a CLA. You only need to do this once to work on any of Facebook's open source projects.
Complete your CLA here: https://code.facebook.com/cla
By contributing to TorchData, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.