✨ Add support for sorting data in insert_assert based on previous data (e.g. from a previous run) to minimize the diff #148

Open
wants to merge 8 commits into
base: main

@tiangolo tiangolo commented Nov 26, 2023

✨ Add support for sorting data in insert_assert based on previous data (e.g. from a previous run) to minimize the diff.

Motivation

One of the main use cases for me, and where insert_assert shines the most, is updating the assert for a big OpenAPI output from FastAPI (in FastAPI tests and SQLModel tests).

However, the existing tests were written against Pydantic 1.x, and the output generated by Pydantic v2 has some slight changes.

Pydantic v2 also outputs some JSON Schema keys in a different order than v1. That is fine in itself: dict equality ignores key order, so the tests would still pass. But the resulting diff between the previous data and the newly inserted data is quite big just because of these ordering differences (e.g. title now comes before the rest). That makes it harder to see the actual changes (e.g. values with str | None now have a schema of "any of string or null").
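To make the equality-versus-text point concrete, here is a minimal sketch. The two schema fragments are made up for illustration and are not actual Pydantic output.

```python
import json

# Hypothetical schema fragments: same content, different key order.
old_schema = {"type": "string", "title": "Name"}
new_schema = {"title": "Name", "type": "string"}

# Dict equality ignores key order, so an existing assert keeps passing.
same_content = old_schema == new_schema

# The serialized text does differ, and that is what shows up in a git diff.
same_text = json.dumps(old_schema) == json.dumps(new_schema)

print(same_content, same_text)  # True False
```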

For the FastAPI tests, during the migration to Pydantic v2, I manually updated all those differences one by one to check the actual content change.

Now I'm updating SQLModel, and having a local version of this helps a lot: the git diff shows only what actually changed, and I can verify and update anything necessary much more quickly.

Problem Example

Imagine you have a test that looks like:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

But now get_data() was updated and returns:

{
    "bar": [
        {
            "description": "Data validation library",
            "tags": ["validation", "json"],
            "name": "Pydantic",
        },
        {"name": "FastAPI", "description": "Web API framework in Python"},
        {"description": "DBs and Python", "name": "SQLModel"},
        {"name": "ARQ"},
    ],
    "baz": 6,
    "foo": 12,
}

If you just run insert_assert as before:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data)

You would normally get this:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "bar": [
            {
                "description": "Data validation library",
                "tags": ["validation", "json"],
                "name": "Pydantic",
            },
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"description": "DBs and Python", "name": "SQLModel"},
            {"name": "ARQ"},
        ],
        "baz": 6,
        "foo": 12,
    }

This produces a large diff, even though the actual differences are small:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
-        "foo": 1,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"description": "Data validation library", "tags": ["validation", "json"], "name": "Pydantic"},
            {"name": "FastAPI", "description": "Web API framework in Python"},
-            {"name": "SQLModel"},
+            {"description": "DBs and Python", "name": "SQLModel"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
+        "foo": 12,
    }

Solution

Now let's start with the same original example:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

When you update it to run insert_assert again, you can pass the old data as the second argument:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    })

Now when you run it, the result has the same new data, but with the keys in the new dicts ordered like the old data, minimizing the git diff:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
-        "foo": 1,
+        "foo": 12,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"name": "Pydantic", "tags": ["validation", "json"], "description": "Data validation library"},
            {"name": "FastAPI", "description": "Web API framework in Python"},
-            {"name": "SQLModel"},
+            {"name": "SQLModel", "description": "DBs and Python"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
    })

Notice, for example, how "foo" was kept at the top of the dict, so the diff shows only its changed value in place, instead of a removal at the top and an addition at the bottom.

And the dict for FastAPI shows no changes in the diff.
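The reordering idea could be sketched roughly as follows. This is not the implementation in this PR: sort_like is a hypothetical helper, and matching list items by position is an assumption made for the sketch.

```python
from typing import Any


def sort_like(new: Any, old: Any) -> Any:
    """Return a copy of `new` with dict keys ordered like `old` where possible."""
    if isinstance(new, dict) and isinstance(old, dict):
        # Keys present in both come first, in the old order;
        # keys that only exist in the new data follow in their own order.
        shared = [k for k in old if k in new]
        added = [k for k in new if k not in old]
        return {k: sort_like(new[k], old.get(k)) for k in shared + added}
    if isinstance(new, list) and isinstance(old, list):
        # Pair list items by position; extra new items are kept unchanged.
        return [
            sort_like(item, old[i]) if i < len(old) else item
            for i, item in enumerate(new)
        ]
    # Scalars, or values whose type changed, are returned as they are.
    return new


old = {
    "foo": 1,
    "bar": [
        {"name": "Pydantic", "tags": ["validation", "json"]},
        {"name": "FastAPI", "description": "Web API framework in Python"},
        {"name": "SQLModel"},
    ],
    "baz": 3,
}
new = {
    "bar": [
        {
            "description": "Data validation library",
            "tags": ["validation", "json"],
            "name": "Pydantic",
        },
        {"name": "FastAPI", "description": "Web API framework in Python"},
        {"description": "DBs and Python", "name": "SQLModel"},
        {"name": "ARQ"},
    ],
    "baz": 6,
    "foo": 12,
}

result = sort_like(new, old)
print(list(result))            # ['foo', 'bar', 'baz']
print(list(result["bar"][0]))  # ['name', 'tags', 'description']
```

The values are untouched, so result == new still holds; only the iteration order of the keys changes, which is what determines the text that ends up in the test file.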

codecov bot commented Nov 26, 2023

Codecov Report

Merging #148 (d32dd60) into main (ec406ff) will decrease coverage by 0.02%.
The diff coverage is 96.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
- Coverage   96.29%   96.27%   -0.02%     
==========================================
  Files           8        8              
  Lines         729      752      +23     
  Branches      111      120       +9     
==========================================
+ Hits          702      724      +22     
  Misses         21       21              
- Partials        6        7       +1     
Files Coverage Δ
devtools/pytest_plugin.py 89.00% <96.00%> (+0.86%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@tiangolo tiangolo marked this pull request as ready for review November 26, 2023 12:26