Storing the NODE_FEATURES in the annotation graph once for speed-up #277

Closed
wants to merge 0 commits into from

Conversation

KristinaUlicna
Collaborator

@KristinaUlicna KristinaUlicna commented Sep 29, 2023

PR contribution summary

Why is this PR useful / good for? Please describe the problem(s) you're trying to address.

  • Adds a script that appends a nodes.parquet file holding the NODE_FEATURES to every .grace folder in the given path
  • GraphAttrs.NODE_FEATURES are now stored in the graph, so the feature extraction doesn't have to be repeated at every run
  • If you don't want to re-run the feature extraction, set config.extractor_fn to None (the default)

List of proposed changes / linked issues & discussions

What should a reviewer concentrate their feedback on?

  • 🏃 Scripts to run

## Storing the node features directly in the graph:
If you want to run a hyperparameter grid search for GNN training and you know that the (node) features of your graph dataset won't change, you can run this script to append the ResNet-extracted features to your dataset graphs once and for all. Feature extraction takes ~30-40 seconds per image, so this saves significant time when launching multiple runs on an (otherwise constant) dataset.
```sh
python3 grace/simulator/store_features.py --data_path=/Users/kulicna/Desktop/dataset/playground/infer/ --extractor_fn=/Users/kulicna/Desktop/classifier/extractor/resnet152.pt
```
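
To get a feel for the saving, here is a rough back-of-the-envelope estimate based on the ~30-40 seconds per image quoted above. The dataset size (100 images) and the number of runs (12) are made-up values for illustration, and `extraction_time_saved` is a hypothetical helper, not part of grace.

```python
def extraction_time_saved(n_images, n_runs, seconds_per_image):
    """Seconds saved by extracting features once instead of once per run."""
    per_run = n_images * seconds_per_image
    # One extraction pass is still needed up front, so only the
    # remaining (n_runs - 1) passes are saved.
    return per_run * (n_runs - 1)


# Hypothetical grid search: 100 images, 12 hyperparameter combinations.
low = extraction_time_saved(100, 12, 30)
high = extraction_time_saved(100, 12, 40)
print(f"Saved: {low / 3600:.1f} to {high / 3600:.1f} hours")
# prints "Saved: 9.2 to 12.2 hours"
```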

  • 📝 Everything looks OK?

What type of PR is this? (check all applicable)

  • 🪄 Feature
  • #️⃣ Documentation / code annotation
  • 🧑‍💻 Code refactor / style

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need some help

PR review summary

Describe what this PR does & how you reviewed the individual items, where needed:

Some helper checks to tick off:

  • Focus on image annotation
  • Focus on model training
  • Could any optimization be applied?
  • Is there any redundant code?
  • Are there any spelling errors?

In conclusion, after my review, I'd like to:

  • 🙋 ask some clarifying questions
  • 🙅 suggest some specific changes

@KristinaUlicna KristinaUlicna changed the base branch from main to development September 29, 2023 16:40
@KristinaUlicna KristinaUlicna self-assigned this Sep 29, 2023
@KristinaUlicna KristinaUlicna added documentation Improvements or additions to documentation enhancement New feature or request methodology Building functional & diverse pipeline and removed documentation Improvements or additions to documentation labels Sep 29, 2023
Collaborator

I wonder why we prefer to do this in a separate script. Wouldn't it be easier if, on the first run (when no graph features are provided as input), the features were extracted and saved for the following runs?
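
For reference, the extract-on-first-run idea raised in this comment could look roughly like the sketch below. It is only an illustration of the caching pattern: `load_or_extract_features` and its arguments are hypothetical names, and a JSON file stands in for the nodes.parquet storage so the example stays dependency-free.

```python
import json
from pathlib import Path


def load_or_extract_features(cache_file, extract_fn):
    """Return cached features if present; otherwise extract and cache them."""
    cache_file = Path(cache_file)
    if cache_file.is_file():
        # A previous run already stored the features: reuse them.
        return json.loads(cache_file.read_text())
    # First run: compute the features and save them for later runs.
    features = extract_fn()
    cache_file.write_text(json.dumps(features))
    return features
```

Note that a cache like this returns the same features on every run, so it would not pick up any on-the-fly patch or graph augmentations.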

Collaborator Author

I think this choice limits our ability to do patch / graph augmentations on the go. I think the entire approach to how I create the synthetic dataset needs to be rebuilt to allow some flexibility (PR incoming soon). This is a temporary hack that works for now, but I'll need to think about how not to jeopardise the option to include augmentations in the process.

grace/run.py Outdated
image_aug, graph_aug = img_graph_augs(image, graph)
return feature_extractor(image_aug, graph_aug)
# Prepare the feature extractor:
if config.extractor_fn is not None:
Collaborator

So the logic here is that the features are part of the loaded graph, and if we don't want to compute new features, we don't provide extractor_fn in the config file?

Collaborator Author

Exactly - just temporarily though, because none of my experiments include patch / graph augmentations (yet).
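
A minimal sketch of the branching discussed in this thread, under simplifying assumptions: `Config` is a hypothetical stand-in for the run config, `build_transform` is an illustrative name rather than the function in grace/run.py, and `extractor_fn` is treated as a callable here for brevity, whereas the script above is given a path to a saved model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Config:
    # Hypothetical config: None (the default) means "reuse stored features".
    extractor_fn: Optional[Callable] = None


def build_transform(config):
    """Return a transform(image, graph) -> (image, graph)."""
    if config.extractor_fn is not None:
        # An extractor was supplied: recompute the node features with it.
        return config.extractor_fn

    def passthrough(image, graph):
        # No extractor: the graph is expected to already hold its features.
        return image, graph

    return passthrough
```

With the default `Config()`, the returned transform hands the image and graph back unchanged, which is why the stored features must already be present in the loaded graph.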

@crangelsmith
Collaborator

OK, so I think it is fine to merge as is, since it will allow you to iterate quickly on model improvements. However, we should keep an eye on how we want the pipeline to look for an external user, and on what to keep inside or outside of it.

Labels
enhancement New feature or request methodology Building functional & diverse pipeline
Development

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] Append node features to the graph once and for all
2 participants