Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use set instead of list for dags' tags #41695

Open
wants to merge 29 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
3881820
Started working on dag tags, moved the tags to set, and added atest t…
Avihais12344 Aug 23, 2024
079ebb2
Fixed more areas of the c'tor.
Avihais12344 Aug 23, 2024
cfaa915
Fixed test of dag tags.
Avihais12344 Aug 23, 2024
c4534de
Added a check to see if the tags are mutable.
Avihais12344 Aug 23, 2024
a03c055
Added newsfragment.
Avihais12344 Aug 26, 2024
b31dcca
Removed unecessary check.
Avihais12344 Aug 26, 2024
bd24f30
Merge remote-tracking branch 'upstream/main'
Avihais12344 Sep 2, 2024
a20eeb7
Removed str specification at the type, for compatability with python …
Avihais12344 Sep 4, 2024
c5c2e02
Removed more type specification as part of compatability with python 3.8
Avihais12344 Sep 4, 2024
49c7743
Fixed the newsfragment.
Avihais12344 Sep 4, 2024
8a611d2
Added missing word.
Avihais12344 Sep 4, 2024
9432103
Used `` for code segemnts at the rst file.
Avihais12344 Sep 6, 2024
659d75b
Reformatted the file.
Avihais12344 Sep 6, 2024
816d73e
Fixed wrong method for adding tag.
Avihais12344 Sep 6, 2024
1e8af76
Added type hinting at the dag bag.
Avihais12344 Sep 6, 2024
ce824cb
Deserialized the tags to set.
Avihais12344 Sep 6, 2024
40605cd
Adjusted the tests for the set type.
Avihais12344 Sep 6, 2024
557ced7
Added type hinting.
Avihais12344 Sep 6, 2024
41e5112
Sorting the tags by name.
Avihais12344 Sep 6, 2024
035e50a
Merge branch 'main' of github.com:Avihais12344/airflow
Avihais12344 Sep 6, 2024
6a433ef
Changed to typing.
Avihais12344 Sep 6, 2024
669ede1
Update newsfragments/41420.significant.rst
Avihais12344 Sep 8, 2024
fd652b3
Update newsfragments/41420.significant.rst
Avihais12344 Sep 8, 2024
08a5689
Removed the generic specification at the dag args expected types, as …
Avihais12344 Sep 12, 2024
eb20c7b
Added tags to the expected serialized DAG.
Avihais12344 Sep 20, 2024
d43f204
Added sorting the tags keys by the name key.
Avihais12344 Sep 20, 2024
64dcb34
Merge remote-tracking branch 'upstream/main'
Avihais12344 Sep 20, 2024
bae6471
Fixed sorting tags by name to use `sorted` instead of `.sort`
Avihais12344 Sep 20, 2024
18773a7
Fixed tags comparesion, as it's now a set, and not a list.
Avihais12344 Sep 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions airflow/models/dag.py
Original file line number Diff line number Diff line change
Expand Up @@ -371,7 +371,7 @@ def _create_orm_dagrun(
"doc_md": str,
"is_paused_upon_creation": bool,
"render_template_as_native_obj": bool,
"tags": list,
"tags": abc.Collection[str],
"auto_register": bool,
"fail_stop": bool,
"dag_display_name": str,
Expand Down Expand Up @@ -552,7 +552,7 @@ def __init__(
is_paused_upon_creation: bool | None = None,
jinja_environment_kwargs: dict | None = None,
render_template_as_native_obj: bool = False,
tags: list[str] | None = None,
tags: abc.Collection[str] | None = None,
owner_links: dict[str, str] | None = None,
auto_register: bool = True,
fail_stop: bool = False,
Expand Down Expand Up @@ -767,7 +767,7 @@ def __init__(

self.doc_md = self.get_doc_md(doc_md)

self.tags = tags or []
self.tags: abc.MutableSet[str] = set(tags or [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just use Collection. We don’t really expect users to add more tags to a DAG after it’s created.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's true. In cluster policies, we may want to add custome tags to DAGs (for orginization hierarcy for example).
And I don't think it would be correct to make it an immutable collection that we would need to create every time we want to change something.

If we would go with Collection, we would go with a different programing paradigram, and I am not sure everyone would like it, so I would keep the door open and let everyone chose what they want to.

self._task_group = TaskGroup.create_root(self)
self.validate_schedule_and_params()
wrong_links = dict(self.iter_invalid_owner_links())
Expand Down Expand Up @@ -2948,7 +2948,7 @@ def bulk_write_to_db(
else:
orm_dag.calculate_dagrun_date_fields(dag, last_automated_data_interval)

dag_tags = set(dag.tags or {})
dag_tags = dag.tags or set()
uranusjr marked this conversation as resolved.
Show resolved Hide resolved
orm_dag_tags = list(orm_dag.tags or [])
for orm_tag in orm_dag_tags:
if orm_tag.name not in dag_tags:
Expand Down Expand Up @@ -3825,7 +3825,7 @@ def dag(
is_paused_upon_creation: bool | None = None,
jinja_environment_kwargs: dict | None = None,
render_template_as_native_obj: bool = False,
tags: list[str] | None = None,
tags: abc.Collection[str] | None = None,
owner_links: dict[str, str] | None = None,
auto_register: bool = True,
fail_stop: bool = False,
Expand Down
1 change: 1 addition & 0 deletions newsfragments/41420.significant.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Replaced Python's `list` with `collections.abc.MutableSet` for the `Dag` tags.
romsharon98 marked this conversation as resolved.
Show resolved Hide resolved
26 changes: 25 additions & 1 deletion tests/models/test_dag.py
Original file line number Diff line number Diff line change
Expand Up @@ -1060,7 +1060,7 @@ def test_bulk_write_to_db(self):

# Removing all tags
for dag in dags:
dag.tags = None
dag.tags = set()
with assert_queries_count(9):
DAG.bulk_write_to_db(dags)
with create_session() as session:
Expand Down Expand Up @@ -3720,6 +3720,30 @@ def test__tags_length(tags: list[str], should_pass: bool):
DAG("test-dag", schedule=None, tags=tags)


@pytest.mark.parametrize(
"input_tags, expected_result",
[
pytest.param([], set(), id="empty tags"),
pytest.param(["a normal tag"], {"a normal tag",}, id="one tag"),
pytest.param(["a normal tag", "another normal tag"], {"a normal tag", "another normal tag"}, id="two different tags"),
pytest.param(["a", "a"], {"a",}, id="two same tags"),
],
)
def test__tags_duplicates(input_tags: list[str], expected_result: set[str]):
result = DAG("test-dag", tags=input_tags)
assert result.tags == expected_result


def test__tags_mutable():
expected_tags = {"6", "7"}
test_dag = DAG("test-dag")
test_dag.tags.add("6")
test_dag.tags.add("7")
test_dag.tags.add("8")
test_dag.tags.remove("8")
assert test_dag.tags == expected_tags
Comment on lines +3412 to +3419
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to support this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I want to check behaviour, if we change the behaviour, we should know about it.
For example, changing the implementation of the tags from set to list would break this change, and we would know about it.

That's why I think we should have this test, but let's keep the conversation here, as we talk about the same thing.



@pytest.mark.need_serialized_dag
def test_get_dataset_triggered_next_run_info(dag_maker, clear_datasets):
dataset1 = Dataset(uri="ds1")
Expand Down