Background
hypothesis-jsonschema is basically just a function which maps a json schema to a Hypothesis strategy for generating instances which conform to that schema... and a few internal helpers.
The basic problem is that there are very many equivalent ways to express the same set of allowed objects, including many where the obvious translation to a strategy is terribly inefficient. We therefore start by "canonicalizing" the schema: taking the intersection or union of overlapping parts (as appropriate), and generally transforming the schema so that it expresses the same constraints but is as easy to convert to an efficient strategy as possible.
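To make this concrete, here is a toy illustration (the function name and the single rewrite rule are my own for this sketch, not the library's actual pass list): two schemas that admit exactly the same values, and a tiny normalization that rewrites one into the other.

```python
# Toy canonicalization example: these two schemas accept exactly the
# same instances, but the second is much easier to generate from.
verbose = {"anyOf": [{"const": 1}, {"const": 2}, {"const": 3}]}
canonical = {"enum": [1, 2, 3]}


def collapse_anyof_of_consts(schema: dict) -> dict:
    """Rewrite an anyOf of bare const schemas into a single enum.

    A single toy pass; a real canonicalizer runs many such rewrites
    until it reaches a fixed point.
    """
    branches = schema.get("anyOf")
    if branches and all(set(b) == {"const"} for b in branches):
        return {"enum": [b["const"] for b in branches]}
    return schema  # rule doesn't apply: leave the schema unchanged


assert collapse_anyof_of_consts(verbose) == canonical
```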
So what needs to change?
I wrote most of hypothesis-jsonschema about four years ago, and it never graduated from beta. There are some fundamental design flaws in how we deal with both recursive references and schema versioning, as well as some implementation issues where the organic growth of the code has left it slower and harder to understand than it ought to be. We'll also want to support schema versions newer than draft-07, which are now in common use.
I think this basically requires a from-scratch rewrite of the canonicalization logic. Happily, I learned a lot about what (not) to do last time around, so the next version can be substantially cleaner and I don't expect that we'd need to do this again. The rewrite could be in Python; or it could easily be extracted to Rust - performance challenges led to the omission of several useful rewrite passes, and the interface is "(serialized?) json schema in, (serialized?) json schema out" with no complicated control-flow or handoff.
A sketch of the current design
https://github.com/python-jsonschema/hypothesis-jsonschema/blob/master/src/hypothesis_jsonschema/_canonicalise.py contains:
- canonicalish(), which takes a schema and runs many imperative modifications to handle various subtypes of schema
- merged(), which takes n schemas and returns their intersection (or None, if infeasible)
- some helpers to compute the numeric bounds expressed by a schema
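A minimal sketch of what merged() does for the easy numeric case (the function name and structure here are illustrative, not the library's internals): intersect minimum/maximum bounds across schemas, and report None when no value can satisfy all of them.

```python
from typing import Optional


def merged_numeric(*schemas: dict) -> Optional[dict]:
    """Intersect the numeric bounds of several schemas.

    Returns the combined schema, or None if the intersection is empty.
    A toy version of merged(); the real function must handle every
    keyword, not just minimum/maximum.
    """
    lo = max((s["minimum"] for s in schemas if "minimum" in s), default=None)
    hi = min((s["maximum"] for s in schemas if "maximum" in s), default=None)
    if lo is not None and hi is not None and lo > hi:
        return None  # infeasible: no number satisfies both bounds
    out: dict = {"type": "number"}
    if lo is not None:
        out["minimum"] = lo
    if hi is not None:
        out["maximum"] = hi
    return out


assert merged_numeric({"minimum": 0}, {"minimum": 5, "maximum": 10}) == {
    "type": "number", "minimum": 5, "maximum": 10
}
assert merged_numeric({"minimum": 3}, {"maximum": 2}) is None
```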
The design I want
Represent everything with objects!
A schema (or subschema) is represented as an immutable object. Along with the contents of the (sub)schema, this should contain a reference to the top-level schema (possibly self) to allow for resolution of references. The pair of (root schema, json pointer) stably and uniquely identifies each Schema object - use memoization for improved performance.
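One way this could look (a hypothetical sketch, not proposed API: the class shape, attribute names, and cache keying are all my own): memoize construction on the (root document, JSON pointer) pair so that the same subschema location always yields the same object.

```python
class Schema:
    """An immutable view of a (sub)schema within a root document.

    Hypothetical sketch: each instance is identified by the pair
    (root document, JSON pointer), so construction is memoized on
    that key and repeated lookups return the identical object.
    """

    _cache: dict = {}  # (id of root document, pointer) -> Schema

    def __new__(cls, root_doc, pointer: str = ""):
        key = (id(root_doc), pointer)
        if key not in cls._cache:
            self = super().__new__(cls)
            self.root_doc = root_doc  # top-level document, for $ref resolution
            self.pointer = pointer
            # Resolve the pointer to this subschema's contents
            # (ignoring JSON Pointer's ~0/~1 escapes for brevity).
            contents = root_doc
            for part in filter(None, pointer.split("/")):
                contents = (
                    contents[part] if isinstance(contents, dict) else contents[int(part)]
                )
            self.contents = contents
            cls._cache[key] = self
        return cls._cache[key]


doc = {"properties": {"name": {"type": "string"}}}
a = Schema(doc, "/properties/name")
b = Schema(doc, "/properties/name")
assert a is b  # memoized: same identity for the same (root, pointer) pair
assert a.contents == {"type": "string"}
```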
Create subclasses of Schema for each major subtype of schema - types, all_of, any_of, etc.; and version-specific variants where needed. Parsing a schema which permits multiple types should automatically convert to anyOf over those types and allOf over any other constraints. This "splitting" step is really important!
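The splitting step can be sketched on plain dicts (a toy version under my own naming; it skips the per-type distribution of keywords that a real pass would also do):

```python
def split_types(schema: dict) -> dict:
    """Split a multi-type schema into anyOf-over-types plus the rest.

    Toy sketch of the "splitting" step: a schema permitting several
    types becomes an allOf of (a) an anyOf with one single-type branch
    per type, and (b) the remaining constraints, unchanged.
    """
    types = schema.get("type")
    if not isinstance(types, list):
        return schema  # already a single type (or untyped): nothing to split
    rest = {k: v for k, v in schema.items() if k != "type"}
    branches = {"anyOf": [{"type": t} for t in types]}
    return {"allOf": [branches, rest]} if rest else branches


assert split_types({"type": ["string", "integer"], "minimum": 0}) == {
    "allOf": [
        {"anyOf": [{"type": "string"}, {"type": "integer"}]},
        {"minimum": 0},
    ]
}
```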
Identify all referenced subschemas, i.e. locations which are pointed-to from elsewhere. After a first pass at canonicalization, inline any pointed-to subschemas which do not themselves contain references. Repeat until there are no such subschemas - now, any reference left must be recursive and we can convert to a recursive strategy with st.deferred().
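The inlining loop above might look like this on plain dicts (a simplified sketch of my own: it handles only local "#/..." pointers and uses a crude serialization test for "contains no references"):

```python
import json


def inline_nonrecursive_refs(root: dict) -> dict:
    """Repeatedly inline any $ref whose target contains no $ref itself.

    Sketch of the fixed-point pass described above, for local "#/..."
    pointers only. Whatever survives must be part of a recursive cycle -
    exactly the references we'd hand to st.deferred().
    """

    def resolve(pointer: str):
        target = root
        for part in pointer.lstrip("#/").split("/"):
            target = target[part]
        return target

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if ref is not None:
                target = resolve(ref)
                if "$ref" not in json.dumps(target):  # leaf: safe to inline
                    return walk(target)
                return node  # part of a recursive cycle: keep the reference
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    previous = None
    while previous != root:  # iterate until nothing more can be inlined
        previous, root = root, walk(root)
    return root


doc = {
    "definitions": {"name": {"type": "string"}},
    "properties": {"a": {"$ref": "#/definitions/name"}},
}
assert inline_nonrecursive_refs(doc)["properties"]["a"] == {"type": "string"}
```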
Give our schema objects explicit .intersection() and .union() methods. Also implement the other set methods in terms of these; we often want (e.g.) subtraction in practice.
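For instance (a hypothetical wrapper class of my own, not proposed API), subtraction falls out of intersection plus JSON Schema's "not" keyword; the results here are deliberately un-canonicalized, since simplifying them is the canonicalizer's job:

```python
class SetSchema:
    """Hypothetical wrapper showing how derived set operations can be
    built from intersection/union plus JSON Schema's "not" keyword."""

    def __init__(self, schema: dict):
        self.schema = schema

    def intersection(self, other: "SetSchema") -> "SetSchema":
        return SetSchema({"allOf": [self.schema, other.schema]})

    def union(self, other: "SetSchema") -> "SetSchema":
        return SetSchema({"anyOf": [self.schema, other.schema]})

    def complement(self) -> "SetSchema":
        return SetSchema({"not": self.schema})

    def difference(self, other: "SetSchema") -> "SetSchema":
        # A - B == A ∩ ¬B, so subtraction needs no new primitive.
        return self.intersection(other.complement())


ints = SetSchema({"type": "integer"})
evens = SetSchema({"multipleOf": 2})
odd_ints = ints.difference(evens)
assert odd_ints.schema == {
    "allOf": [{"type": "integer"}, {"not": {"multipleOf": 2}}]
}
```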