Storing the `NODE_FEATURES` in the annotation graph once for speed-up #277

KristinaUlicna · 2023-09-29T16:37:39Z

PR contribution summary

Why is this PR useful / good for? Please describe the problem(s) you're trying to address.

Adds a script that appends a `nodes.parquet` file, which holds the `NODE_FEATURES`, to every `.grace` folder under the given path
`GraphAttrs.NODE_FEATURES` are now stored in the graph, so the feature extraction doesn't have to be repeated on every run
If you don't want to re-run the feature extraction, supply `None` (the default) as `config.extractor_fn`
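
To check which dataset folders have already been processed, something like the following could be used. This is a minimal sketch, not code from this PR: the helper name `split_by_stored_features` is hypothetical, and it assumes that each annotation lives in a directory whose name ends in `.grace` and that the stored features sit directly inside it as `nodes.parquet`.

```python
from pathlib import Path


def split_by_stored_features(data_path, features_name="nodes.parquet"):
    """Split `.grace` folders into those with / without a stored features file.

    Hypothetical helper: the folder layout (features file directly inside
    each `.grace` directory) is an assumption, not taken from the PR code.
    """
    done, todo = [], []
    for folder in sorted(Path(data_path).glob("*.grace")):
        if not folder.is_dir():
            continue  # skip stray files that merely end in `.grace`
        if (folder / features_name).is_file():
            done.append(folder)
        else:
            todo.append(folder)
    return done, todo
```

Folders in `todo` would still need the feature extraction script to be run on them; folders in `done` could be used with `config.extractor_fn` left at `None`.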

List of proposed changes / linked issues & discussions

✅ Resolves [FEATURE REQUEST] Append node features to the graph once and for all #276

What should a reviewer concentrate their feedback on?

🏃 Scripts to run

grace/grace/simulator/README.md

Lines 32 to 38 in 4feae20

    
    ## Storing the node features directly in the graph:

    If you want to perform a hyperparameter grid search for GNN training and you know that the (node) features of your graph dataset won't change, you can run this script to make sure you append the resnet-extracted features to your dataset graphs once and for all. It takes ~30-40 seconds per single image to get processed, so this significantly saves time if launching multiple runs on your (otherwise constant) dataset.

    ```sh
    python3 grace/simulator/store_features.py --data_path=/Users/kulicna/Desktop/dataset/playground/infer/ --extractor_fn=/Users/kulicna/Desktop/classifier/extractor/resnet152.pt
    ```

📝 Everything looks OK?

What type of PR is this? (check all applicable)

🪄 Feature
#️⃣ Documentation / code annotation
🧑‍💻 Code refactor / style

Added tests?

👍 yes
🙅 no, because they aren't needed
🙋 no, because I need some help

PR review summary

Describe what this PR does & how you reviewed the individual items, where needed:

Some helper checks to tick off:

Focus on image annotation
Focus on model training
Could any optimization be applied?
Is there any redundant code?
Are there any spelling errors?

In conclusion, after my review, I'd like to:

🙋 ask some clarifying questions
🙅 suggest some specific changes

crangelsmith · 2023-10-03T09:48:22Z

grace/simulator/store_features.py

I wonder why we prefer to do this in a separate script. Wouldn't it be easier if, on the first run (when no graph features are provided as input), the features were extracted and saved for the following runs?

KristinaUlicna · 2023-10-03

I think this choice limits our ability to do patch / graph augmentations on the go. I think the entire approach to how I create the synthetic dataset needs to be rebuilt to allow some flexibility (PR incoming soon). This is a temporary hack that works for now, but I'll need to think about how not to jeopardise the option to include augmentations in the process.

crangelsmith · 2023-10-03T09:50:05Z

grace/run.py

-            image_aug, graph_aug = img_graph_augs(image, graph)
-            return feature_extractor(image_aug, graph_aug)
+        # Prepare the feature extractor:
+        if config.extractor_fn is not None:


So the logic here is that the features are part of the loaded graph, and if we don't want to compute new features we don't provide `extractor_fn` in the config file?

KristinaUlicna · 2023-10-03

Exactly - just temporarily though, because none of my experiments include patch / graph augmentations (yet).
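
The branching discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `grace/run.py`: the graph is modelled as a plain dict of node attributes, `NODE_FEATURES` stands in for `GraphAttrs.NODE_FEATURES`, and `Config`, `make_transform` and `load_extractor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for GraphAttrs.NODE_FEATURES.
NODE_FEATURES = "node_features"


@dataclass
class Config:
    # Path to the extractor weights, or None to reuse stored features.
    extractor_fn: Optional[str] = None


def make_transform(config, load_extractor):
    """Return a function (image, graph) -> (image, graph).

    `graph` is assumed to be a dict mapping node id -> attribute dict,
    and `load_extractor` a callable that turns a path into an extractor.
    """
    if config.extractor_fn is None:
        # No extractor given: rely on features already stored in the graph.
        def use_stored(image, graph):
            missing = [n for n, attrs in graph.items() if NODE_FEATURES not in attrs]
            if missing:
                raise ValueError(f"nodes without stored features: {missing}")
            return image, graph

        return use_stored

    # Extractor given: recompute the features for every node.
    extractor = load_extractor(config.extractor_fn)

    def recompute(image, graph):
        for node, attrs in graph.items():
            attrs[NODE_FEATURES] = extractor(image, node)
        return image, graph

    return recompute
```

Failing loudly when stored features are missing is a design choice made for this sketch; it avoids silently training on graphs that were never processed by the storage script.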

crangelsmith · 2023-10-03T10:47:04Z

OK, so I think it is fine to merge as it is, since it will allow you to iterate quickly on model improvements. However, we should keep an eye on how we want the pipeline to look for an external user, and on what to keep inside or outside of it.

KristinaUlicna changed the base branch from main to development September 29, 2023 16:40

KristinaUlicna requested a review from crangelsmith September 29, 2023 16:54

KristinaUlicna self-assigned this Sep 29, 2023

KristinaUlicna added the documentation, enhancement and methodology labels and removed the documentation label Sep 29, 2023

crangelsmith reviewed Oct 3, 2023


KristinaUlicna force-pushed the storage branch from d2abf14 to 99c1042 Compare October 3, 2023 10:40

KristinaUlicna closed this Oct 3, 2023

KristinaUlicna force-pushed the storage branch from 670afc0 to 340cd1f Compare October 3, 2023 11:10

crangelsmith mentioned this pull request Oct 4, 2023

Separate hidden layers for graph convolutions + dense Linear layers #283

Merged

2 tasks

KristinaUlicna deleted the storage branch October 4, 2023 15:38
