Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use set instead of list for dags' tags #41695

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

Avihais12344
Copy link

@Avihais12344 Avihais12344 commented Aug 23, 2024

Closes issue 41420.

Thank you for viewing this PR :)


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented Aug 23, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@shahar1 shahar1 added airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:core labels Aug 23, 2024
@shahar1 shahar1 added this to the Airflow 3.0.0 milestone Aug 23, 2024
Copy link
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I agree with the idea that set is more appropriate in this case, as tags should be unique. I'd be happy for more opinions whether it's worth making it as a breaking chage.

@@ -767,7 +767,7 @@ def __init__(

self.doc_md = self.get_doc_md(doc_md)

self.tags = tags or []
self.tags: abc.MutableSet[str] = set(tags or [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just use Collection. We don’t really expect users to add more tags to a DAG after it’s created.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's true. In cluster policies, we may want to add custome tags to DAGs (for orginization hierarcy for example).
And I don't think it would be correct to make it an immutable collection that we would need to create every time we want to change something.

If we would go with Collection, we would go with a different programing paradigram, and I am not sure everyone would like it, so I would keep the door open and let everyone chose what they want to.

airflow/models/dag.py Outdated Show resolved Hide resolved
Comment on lines +3737 to +3744
def test__tags_mutable():
expected_tags = {"6", "7"}
test_dag = DAG("test-dag")
test_dag.tags.add("6")
test_dag.tags.add("7")
test_dag.tags.add("8")
test_dag.tags.remove("8")
assert test_dag.tags == expected_tags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to support this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I want to check behaviour, if we change the behaviour, we should know about it.
For example, changing the implementation of the tags from set to list would break this change, and we would know about it.

That's why I think we should have this test, but let's keep the conversation here, as we talk about the same thing.

Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup! Especially as is is "non-breaking" for existing DAGs, so no effect on DAG authors but better data types. Just small proposals to pin the typing to strings

airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests fail, otherwise OK for me. Small nit as comments

newsfragments/41420.significant.rst Outdated Show resolved Hide resolved
newsfragments/41420.significant.rst Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
Avihais12344 and others added 2 commits September 8, 2024 19:47
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
@uranusjr
Copy link
Member

uranusjr commented Sep 11, 2024

I do not like this, but feel free to merge this without me if others feel more strongly this i beneficial.

…it raises the error: Subscripted generics cannot be used with class and instance checks.
@Avihais12344
Copy link
Author

Avihais12344 commented Sep 12, 2024

@uranusjr why you don't like this?

@uranusjr
Copy link
Member

I mostly outlined the reasons above, but to summarise, you want to keep this mutable so you can call add after a DAG is created, but by changing this from a list to set, you are also breaking everyone that’s already adding tags after a DAG is created. The two things you are trying to do (use set instead of list, and keep the attribute mutable) have directly conflicted reasoning. IMO it is simply not worthwhile to break everyone so you can more easily handle duplicated tags. You can just do that easily in your own code.

With that said, 3.0 is the exact time to break everyone, so I am still open for this to be merged if people feel fine about it. But I will not approve nor merge this change myself.

@jscheffl
Copy link
Contributor

I mostly outlined the reasons above, but to summarise, you want to keep this mutable so you can call add after a DAG is created, but by changing this from a list to set, you are also breaking everyone that’s already adding tags after a DAG is created. The two things you are trying to do (use set instead of list, and keep the attribute mutable) have directly conflicted reasoning. IMO it is simply not worthwhile to break everyone so you can more easily handle duplicated tags. You can just do that easily in your own code.

With that said, 3.0 is the exact time to break everyone, so I am still open for this to be merged if people feel fine about it. But I will not approve nor merge this change myself.

Would it make you more happy if we keep the interface as a list (so: non breaking) but during setting/init it is temporarily converted to a set to ensure no duplicates are in the list?

@Avihais12344
Copy link
Author

@uranusjr I am sorry to hear you don't like my change. I think the solution of @jscheffl is reasonable to handle your conflict, even though I think we should break the interface to a set. Due to the release of Airflow 3, it's a great time (that may not come back) to break interfaces.

But can you (@uranusjr) at least close your conversations? As it blockes the merge request (that was approved by other people).
Thank you for your understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
airflow3.0:breaking Candidates for Airflow 3.0 that contain breaking changes airflow3.0:candidate Potential candidates for Airflow 3.0 area:core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants