✨ Add support for sorting data in `insert_assert` based on previous data (e.g. from a previous run) to minimize the diff #148

tiangolo · 2023-11-26T11:46:30Z

✨ Add support for sorting data in insert_assert based on previous data (e.g. from a previous run) to minimize the diff.

Motivation

One of the main use cases for me, and where insert_assert shines the most, is updating the assert for a big OpenAPI output from FastAPI (in FastAPI tests and SQLModel tests).

Nevertheless, as the previous tests used Pydantic 1.x, the output generated by Pydantic v2 has some slight changes.

But Pydantic v2 outputs some keys in the JSON Schema in a different order than v1, which is fine: dict equality ignores key order, so the tests would still pass. However, the resulting diff between the previous data and the newly inserted data is quite big just because of these differences (e.g. title now comes before the rest). That makes it more difficult to see the actual changes (e.g. values with str | None now have a schema of "any between string and null").
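A small sketch of why the tests pass while the diff grows. The two schema fragments here are made up for illustration and are not actual Pydantic output; they only mimic the "title now comes first" reordering:

```python
import json

# Hypothetical schema fragments: same content, different key order.
v1_style = {"type": "string", "title": "Name"}
v2_style = {"title": "Name", "type": "string"}

# Dict equality ignores key order, so an assert comparing them still passes.
assert v1_style == v2_style

# The rendered text follows insertion order, so a line-based diff sees a change.
print(json.dumps(v1_style) == json.dumps(v2_style))  # prints False
```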

For the FastAPI tests, during the migration to Pydantic v2, I manually updated all those differences one by one to check the actual content change.

Now I'm updating SQLModel, and having a local version of this helps a lot: the git diff shows only what actually changed, and I can verify and update anything necessary much more quickly.

Problem Example

Imagine you have a test that looks like:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

But now get_data() was updated and returns:

{
    "bar": [
        {
            "description": "Data validation library",
            "tags": ["validation", "json"],
            "name": "Pydantic",
        },
        {"description": "Web API framework in Python", "name": "FastAPI"},
        {"description": "DBs and Python", "name": "SQLModel"},
        {"name": "ARQ"},
    ],
    "baz": 6,
    "foo": 1,
}

If you just run insert_assert as before:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data)

You would normally get this:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "bar": [
            {
                "description": "Data validation library",
                "tags": ["validation", "json"],
                "name": "Pydantic",
            },
            {"description": "Web API framework in Python", "name": "FastAPI"},
            {"description": "DBs and Python", "name": "SQLModel"},
            {"name": "ARQ"},
        ],
        "baz": 6,
        "foo": 1,
    }

This has a larger diff, although the differences are not that big:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
-        "foo": 1,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"description": "Data validation library", "tags": ["validation", "json"], "name": "Pydantic"},
-            {"name": "FastAPI", "description": "Web API framework in Python"},
+            {"description": "Web API framework in Python", "name": "FastAPI"},
-            {"name": "SQLModel"},
+            {"description": "DBs and Python", "name": "SQLModel"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
+        "foo": 1,
    }

Solution

Now let's start with the same original example:

def test_dict(insert_assert):
    data = get_data()
    # insert_assert(data)
    assert data == {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    }

When updating it to run insert_assert again, you can pass the old data as the second argument:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
        "foo": 1,
        "bar": [
            {"name": "Pydantic", "tags": ["validation", "json"]},
            {"name": "FastAPI", "description": "Web API framework in Python"},
            {"name": "SQLModel"},
        ],
        "baz": 3,
    })

And now when you run it, it will have the same new data, but with the keys in the new dicts sorted based on the order of the older data, minimizing the git diff:

def test_dict(insert_assert):
    data = get_data()
    insert_assert(data, {
        "foo": 1,
        "bar": [
-            {"name": "Pydantic", "tags": ["validation", "json"]},
+            {"name": "Pydantic", "tags": ["validation", "json"], "description": "Data validation library"},
            {"name": "FastAPI", "description": "Web API framework in Python"},
-            {"name": "SQLModel"},
+            {"name": "SQLModel", "description": "DBs and Python"},
+            {"name": "ARQ"},
        ],
-        "baz": 3,
+        "baz": 6,
    })

Notice, for example, how "foo" was kept at the top of the dict, so there's no diff for "foo" now (which didn't change).

And the dict for FastAPI produces no diff at all.
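The reordering idea can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper name `sort_like` is hypothetical, and pairing list items by position is an assumption made here to keep the sketch short. The data is a trimmed, made-up variant of the example above.

```python
from typing import Any


def sort_like(new: Any, old: Any) -> Any:
    """Return a copy of `new` with dict keys ordered as in `old` where possible.

    Hypothetical helper for illustration only.
    """
    if isinstance(new, dict) and isinstance(old, dict):
        ordered = {}
        # Keys present in both come first, in the old order.
        for key in old:
            if key in new:
                ordered[key] = sort_like(new[key], old[key])
        # Keys that exist only in the new data keep their new relative order.
        for key in new:
            if key not in ordered:
                ordered[key] = new[key]
        return ordered
    if isinstance(new, list) and isinstance(old, list):
        # Pair items by position (an assumption); extra new items are kept as-is.
        paired = [sort_like(n, o) for n, o in zip(new, old)]
        return paired + new[len(old):]
    # Scalars and mismatched types: always take the new value unchanged.
    return new


old = {"foo": 1, "bar": [{"name": "SQLModel"}], "baz": 3}
new = {
    "bar": [{"description": "DBs and Python", "name": "SQLModel"}, {"name": "ARQ"}],
    "baz": 6,
    "foo": 1,
}

result = sort_like(new, old)
print(list(result))            # ['foo', 'bar', 'baz']
print(list(result["bar"][0]))  # ['name', 'description']
```

The values always come from the new data; only the key order is borrowed from the old data, so `result == new` still holds and the assert stays correct.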

codecov · 2023-11-26T11:47:45Z

Codecov Report

Merging #148 (d32dd60) into main (ec406ff) will decrease coverage by 0.02%.
The diff coverage is 96.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #148      +/-   ##
==========================================
- Coverage   96.29%   96.27%   -0.02%     
==========================================
  Files           8        8              
  Lines         729      752      +23     
  Branches      111      120       +9     
==========================================
+ Hits          702      724      +22     
  Misses         21       21              
- Partials        6        7       +1

Files	Coverage Δ
devtools/pytest_plugin.py	`89.00% <96.00%> (+0.86%)`	⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec406ff...d32dd60. Read the comment docs.

tiangolo added 3 commits November 26, 2023 12:11

✨ Sort data in insert_assert based on previous data (e.g. from a previous run) to minimize the diff

b84d16f

✅ Add tests for insert_assert including old data

5c40f8a

✅ Update and simplify test to highlight how data is not changed unnecessarily

7d9bf19

tiangolo added 5 commits November 26, 2023 12:52

🐛 Fix test after local modifications

d037a0a

🐛 Fix format in test result

7b26b87

🐛 Fix format in output, double to single quotes and single lining

ebbbb97

🐛 Fix types for sort_data

1cac84c

🐛 Fix format

d32dd60

tiangolo marked this pull request as ready for review November 26, 2023 12:26
