Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NodeGroup.grouping_node #11613 #11614

Open
wants to merge 19 commits into
base: dev/8.0.x
Choose a base branch
from

Conversation

jacobtylerwalls
Copy link
Member

@jacobtylerwalls jacobtylerwalls commented Nov 11, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

Before
From a node, to find the grouping node of that node's nodegroup, you would need to prefetch the sibling nodes and compare the results to the id of the nodegroup, or else just issue another Node.objects.get() to individually fetch the wanted node.

After
There is now a self-referencing foreign key to the other node that serves as the nodegroup root.
Per discussion, there is now a OneToOneField on the NodeGroup model pointing to the root node for the nodegroup.

Example query:

In [22]: from django.db import connection, reset_queries

In [23]: reset_queries()

In [24]: ngs = NodeGroup.objects.filter(grouping_node__isnull=False).select_related("grouping_node").distinct()

In [25]: for ng in ngs:
    ...:     print(ng.grouping_node.alias)
    ...: 
app_names
basic_search_settings
sparql_endpoint_providers
system_settings
map_settings
project_extent
map_zoom
saved_searches
mapbox
temporal_search
analytics
search_results_grid

In [26]: len(connection.queries)
Out[26]: 1

Issues Solved

Closes #11613

Checklist

  • I targeted one of these branches:
    • dev/8.0.x (under development): features, bugfixes not covered below
    • dev/7.6.x (main support): regressions, crashing bugs, security issues, major bugs in new features
    • dev/6.2.x (extended support): major security issues, data loss issues
  • I added a changelog in arches/releases
  • I submitted a PR to arches-docs (if appropriate)
  • Unit tests pass locally with my changes
  • I added tests that prove my fix is effective or that my feature works
  • My test fails on the target branch

Ticket Background

@jacobtylerwalls jacobtylerwalls changed the title Add Node.nodegroup_root #11613 Add Node.grouping_node #11613 Nov 14, 2024
@@ -907,15 +915,21 @@ def __init__(self, *args, **kwargs):
def clean(self):
if not self.alias:
Graph.objects.get(pk=self.graph_id).create_node_alias(self)
self.grouping_node_id = self.nodegroup_id
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our conversation yesterday I decided to run this unconditionally instead of when empty. That would make the changes in a9b2f32 not strictly necessary, but I left them in. (They'll probably change anyway after #11609.)

@chrabyrd chrabyrd self-requested a review November 14, 2024 17:17
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm 🚀 ! One question in the body of code.

On a more meta note, if it's possible to add the Node relation to NodeGroup instead of this change and still power what you're trying to do I think there'd be an additional win gained there. There have been quite a few times I've needed to get the node from the nodegroup and end up running to the db again 💀

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

arches/app/models/graph.py Outdated Show resolved Hide resolved
@jacobtylerwalls
Copy link
Member Author

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?


If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

@chrabyrd
Copy link
Contributor

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

Cool cool! One thing re slack convo: with_grouping_node_alias() -> with_grouping_node() ? What if I wanted other info from the nodegroup node besides alias? And if you're going that route would it be a big lift to add with_node() to the nodegroup? For the same win mentioned earlier 😄

And ++ for having the same thoughts about naming. Hopefully that makes it in 🤞

@jacobtylerwalls
Copy link
Member Author

Yeah I think if we want other stuff from the node then it should go back to an FK, but I can definitely explore putting the column on the nodegroup. I'll give it some thought.

@chiatt
Copy link
Member

chiatt commented Nov 15, 2024

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

I don't have a strong opinion on this. My preference is grouping_node because it describes the node's role in the group. It's also a term already used in the label based graph tests. nodegroup_node would be fine, but it seems a bit awkward to me. I realize that's pretty subjective, so maybe others would like to weigh in (@apeters, @robgaston?)

@jacobtylerwalls
Copy link
Member Author

Also, root could be confused with the root node of the graph or the grouping node of the root nodegroup (in graph.py, this is defined as "(the first nodegroup that doesn't have a parentnodegroup)"

@jacobtylerwalls
Copy link
Member Author

On a more meta note, if it's possible to add the Node relation to NodeGroup instead of this change and still power what you're trying to do I think there'd be an additional win gained there. There have been quite a few times I've needed to get the node from the nodegroup and end up running to the db again 💀

I gave it some thought, and although it makes intuitive sense to put it on NodeGroup, I think there are some considerations going the other way:

  • the relation would have to be nullable given the order we create nodegroups before nodes, plus the fact that nodegroups without nodes are allowed at a db level (although we do have a manage.py validate command to delete them). I'd rather it be not nullable.
  • with the information in a different table, you can no longer have a database check constraint to enforce that the FK has a certain kind of relationship (e.g. you could enforce that the grouping_node pk == the pk of the nodegroup, but I don't think you could enforce that every node has a grouping node, e.g. that every not-root-node has a nodegroup with a grouping node set).
  • In a simpler application, one could use django signals to keep tables in sync, but you know how much code doesn't go through django these days.

get the node from the nodegroup and end up running to the db again 💀

This is actually my main motivation, and now there's a way to do that when fetching the data. On the other proposal (column on NodeGroup) you'd have to select_related() the grouping node when fetching data, so it's not like the developer gets a permission slip to not think about this ;)

# Make sure to get grouping nodes before you need them
In [49]: with_grouping_nodes = NodeGroup.objects.filter(node__gt=0).prefetch_related('node_set__grouping_node').distinct()

In [50]: for ng in with_grouping_nodes:
    ...:     print(ng.node_set.all()[0].grouping_node.alias)

@apeters
Copy link
Member

apeters commented Nov 19, 2024

I don't have a strong opinion on this. My preference is grouping_node because it describes the node's role in the group. It's also a term already used in the label based graph tests. nodegroup_node would be fine, but it seems a bit awkward to me. I realize that's pretty subjective, so maybe others would like to weigh in (@apeters, @robgaston?)

I agree with @chiatt. I think grouping_node describes its function nicely.

@apeters
Copy link
Member

apeters commented Nov 19, 2024

I would also say that from a "what makes the most sense" perspective (and not to throw a monkey wrench 🐵 🔧 into this discussion) that having this on the "Nodegroup" table makes the most sense. So, in that case you'd end with something like this Node.nodegroup.rootnode to get to the "grouping_node" (but in this case, I think "rootnode" makes most sense when it's hanging off of the Nodegroup table).

Now pragmatically speaking this may not net us less db calls if that's what we're trying to achieve with this, but it does feel very natural.

The system check in GraphModel.check() depends on the migrated
model.
@jacobtylerwalls jacobtylerwalls changed the title Add Node.grouping_node #11613 Add NodeGroup.root_node #11613 Nov 19, 2024
@jacobtylerwalls
Copy link
Member Author

Yeah, it makes good sense to move it. I can handle the missing safety with a Django system check.

@jacobtylerwalls
Copy link
Member Author

@chiatt ready for another look :-)

If the test database is not set up, this check used to crash when
running manage.py test.
arches/app/models/models.py Outdated Show resolved Hide resolved
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OOO today but saw these interesting updates and wanted to stick my nose in 👃

operations = [
migrations.AddField(
model_name="nodegroup",
name="root_node",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue just node here. root_node is less clear and can have different connotations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to hold this until everyone's happy with the name, but node is the query name for a related item from node_set, so I don't think we can overload that. I think we need to land on something.

@jacobtylerwalls jacobtylerwalls changed the title Add NodeGroup.root_node #11613 Add NodeGroup.grouping_node #11613 Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add grouping_node column to NodeGroup
4 participants