All our notebooks are stored here. Read the sections below to acquaint yourself with the workflow. Remember to read the Workflow section!
- Install pipenv with `pip3 install pipenv`
- Clone this repository
- Run `pipenv install` to pull dependencies (a consolidated example follows this list)
- (Windows only) Special module for Windows users:
  - Enter the shell with `pipenv shell`
  - Run `pip install pywin32`
- Set up Git LFS
  - Follow step 1 of Getting Started on the Git LFS website if you have not installed Git LFS on your computer
  - That's it! I have already initialized Git LFS in this repo, so no further setup is required. Use `git lfs track <file(s)>` so that the specified file(s) are handled by LFS. After that, just push and pull as per normal.
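Putting the steps above together, a first-time setup might look like this (the repository URL and folder name are placeholders, not the actual values):

```sh
# Install pipenv, then clone and enter the repo (URL is a placeholder)
pip3 install pipenv
git clone <repo-url>
cd <repo-folder>

# Pull dependencies into a fresh virtual environment
pipenv install

# (Windows only) install the extra module inside the virtualenv shell
pipenv shell
pip install pywin32
```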
- Install dependencies for this repository with `pipenv install XXX`, and inform everybody of any new dependencies you have pushed into the repo
- When pulling new commits from the repo, it's good practice to run `pipenv update` to ensure you pull any new dependencies
- Remove dependencies with `pipenv uninstall` (make sure you push the changes)
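In practice the dependency workflow might look like this; the package name is just an illustration:

```sh
# Add a new dependency (updates Pipfile and Pipfile.lock; push both)
pipenv install requests

# After pulling new commits, sync your environment with the lockfile
pipenv update

# Remove a dependency you no longer need, then push the Pipfile changes
pipenv uninstall requests
```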
- Run the code using `pipenv run python XXX.py`, or enter the shell with `pipenv shell` and then execute the file
  - This runs the code in the virtual environment into which all these dependencies were installed (you would have seen its location when you first ran `pipenv install` in Getting Started)
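For example (the script name is a placeholder):

```sh
# Option 1: run directly inside the virtual environment
pipenv run python train_model.py

# Option 2: enter the shell first, then run as usual
pipenv shell
python train_model.py
```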
- Following the previous section, you run JupyterLab like so:
  - Enter shell mode with `pipenv shell`
  - Run `jupyter lab`
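Equivalently, you can skip the shell step and launch it in one go:

```sh
# Runs JupyterLab inside the project's virtual environment directly
pipenv run jupyter lab
```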
Instructions based on this guide.
When you pull the dependencies you should have everything you need to run the notebooks. I have created folders for each part of the competition (CV and NLP); save your notebooks there.
I am not going to push the dataset to GitHub. For the sake of standardizing paths, create a folder called `data` and extract the contents of the Download All zip file there. I have also created a `saved_models` folder within each of the two parts (CV and NLP) to save the models we have trained locally. Refer to the next section for more info.
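For example, assuming the downloaded archive is called `download-all.zip` (the actual file name may differ):

```sh
# Create the standardized data folder and unpack the dataset into it
mkdir data
unzip download-all.zip -d data/
```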
Here is how the overall directory looks on my PC. Folders that are not pushed to the GitHub repo are marked with *:
```
.
├── CV
│   ├── CV-Notebooks-go-here!
│   ├── ravyu_RESNET50_defaulttemplate.ipynb
│   └── saved_models
├── data*
│   ├── NLP_submission_example.csv
│   ├── resnet50_weights_tf_dim_ordering_tf_kernels.h5
│   ├── resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
│   ├── TIL_NLP_test_dataset.csv
│   ├── TIL_NLP_train_dataset.csv
│   ├── train
│   ├── train.json
│   ├── train.p
│   ├── val
│   ├── val.json
│   ├── val.p
│   └── word_embeddings.pkl
├── docs
│   └── TIL 2020 Qualifier Info Pack v1 0.pdf
├── NLP
│   └── NLP-Notebooks-go-here!
├── Pipfile
├── Pipfile.lock
└── README.md

9 directories, 15 files
```
If you want to save any trained models, you can do so. These models are usually large (mine was 90+ MB). If you followed the Getting Started section, you will have installed Git LFS, which lets you upload binary files larger than 100 MB; use it for any binary file above that size.
- Use `git lfs track <file(s)>` so that the specified file(s) are handled by LFS (reflected in the .gitattributes file). After that, just push and pull as per normal, as in the example below.
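For instance, to track a trained model before committing it (the file path is just an illustration):

```sh
# Track the model with LFS; the rule is saved in .gitattributes
git lfs track "CV/saved_models/my_model.h5"

# Commit both the tracking rule and the model, then push as per normal
git add .gitattributes CV/saved_models/my_model.h5
git commit -m "Add trained CV model (LFS)"
git push
```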
Please read this to understand how quotas are filled. Basically:
- Only push using LFS once I (Ravyu) have agreed to let you do so (since it counts against my quota).
- Try to restrict pushing with LFS to critical stuff like finalized models, etc.
- If there is something less critical but still worth sharing with the team, we will use something else (GDrive, my cloud server, etc.)
- Remember, anything less than 100 MB works just fine without LFS.
- Don't just push anyway, or you pay me $5 so that I can purchase more quota!