Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersection schema generation is order-dependent #603

Merged
merged 5 commits into from
Jul 22, 2024

Conversation

altearius
Copy link
Contributor

Addresses #602 and #597

Copy link
Owner

@bcherny bcherny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

Your approach looks good to me. I tried a more surgical fix in https://github.com/bcherny/json-schema-to-typescript/pull/605/files, but like that your approach is more generic and cleaner.

A few minor things before we merge this:

  1. Would you mind splitting your PR into two commits: one to add the new test cases, and one to make code changes and change the output of your test cases? That would make it more clear how your changes change the output of the test cases.
  2. Would you mind simplifying the naming in your cases (eg. A instead of Level2A)? (example: 1252285)

schema.$types = types

if (types.length > 1) {
schema.$intersection = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$intersection and $types may conflict with schema properties (in JSON-Schema, users can define custom properties). Instead, can you use Symbols (similar to Parent)? This name collision could also be good to add as a test case.

export const Parent = Symbol('Parent')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion significantly improved the code quality, thank you.


delete schema.$id
delete schema.description
delete schema.name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about title?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed title as it was not handled by the original maybeStripNameHints function, though I have now added it.

src/parser.ts Outdated
@@ -54,13 +55,13 @@ export function parse(

// Be careful to first process the intersection before processing its params,
// so that it gets first pick for standalone name.
const intersectionSchema = schema.$intersection
if (!intersectionSchema) {
throw new Error('Expected intersection schema')
Copy link
Owner

@bcherny bcherny Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new Error('Expected intersection schema')
throw new ReferenceError('Expected intersection schema. Please file an issue on GitHub.')

@@ -71,6 +72,8 @@ export interface LinkedJSONSchema extends JSONSchema {
}

export interface NormalizedJSONSchema extends LinkedJSONSchema {
$intersection?: NormalizedJSONSchema
$types: ReturnType<typeof typesOfSchema>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please type this explicitly to keep it simple.

Suggested change
$types: ReturnType<typeof typesOfSchema>
$types: readonly [SchemaType, ...SchemaType[]]

@bcherny bcherny mentioned this pull request Jun 24, 2024
Closed
- Given a schema that contains a named definition (`B`),
- And that named definition is referenced in multiple locations,
- And that named schema is also an intersection type (`allOf` in this
  example),

Then when parsed, the generated TypeScript will contain the correct
reference only for the _first_ location in which the named schema is
encountered, during a depth-first traversal.

Subsequent references to the same schema will be generated as though
they were only the intersection type, and not the named schema.

Example

Given the following schema:

```yaml
$id: Intersection
type: object
oneOf:
  - $ref: '#/definitions/A'
  - $ref: '#/definitions/B'

definitions:
  A:
    type: object
    additionalProperties: false
    allOf: [$ref: '#/definitions/Base']
    properties:
      b: {$ref: '#/definitions/B'}
  B:
    type: object
    additionalProperties: false,
    allOf: [$ref: '#/definitions/Base']
    properties:
      x: {type: string}
  Base:
    type: object
    additionalProperties: false,
    properties:
      y: {type: string}
```

The current resulting TypeScript will be (comments adjusted
for clarity):

```ts
// Incorrect: should be `type Intersection = A | B`
// Note that the B type at this location is the _second_ reference to
// B during a depth-first traversal.
export type Intersection = A | B1;
export type A = A1 & {
  b?: B;
};
export type A1 = Base;
export type B = B1 & {
  x?: string;
};
export type B1 = Base;

export interface Base {
  y?: string;
}
```

Root Cause

In `parser.ts`, [lines 57 - 75][1], when schema that matches multiple
"types" is encountered, the parser generates a new `ALL_OF` intersection
schema to contain each sub-type, then adds each sub-type to the new
`ALL_OF` schema.

Each sub-type is then parsed sequentially. During this process,
`maybeStripNameHints` is called, which mutates the schema by removing
the `$id`, `description`, and `name` properties.

Notably, these properties are used by `typesOfSchema` to detect the
`NAMED_SCHEMA` type. As a result, this schema object will never again be
detected as a `NAMED_SCHEMA` type.

Therefore, the _first_ instance of the schema object is correctly
handled as an intersection schema **and** a named schema, but all
subsequent instances are treated as though they are **only** an
intersection schema.

Proposed Solution

- The call to `typesOfSchema` is moved from `parser.ts` to
  `normalizer.ts`, with the goal of avoiding confusion due to a mutated
  schema object. The resulting list of schema types is persisted on the
  schema using a newly-introduced `Types` symbol.

- The generated intersection schema is _also_ moved from `parser.ts` to
  `normalizer.ts`. This is because it is advantageous to let the
  generated intersection schema participate in the caching mechanism
  (which it could not previously do, since it was generated dynamically
  during each encounter). Without this, multiple instances of the same
  schema are generated.

Related Issues

- bcherny#597

[1]: https://github.com/bcherny/json-schema-to-typescript/blob/31993def993b610ba238d3024260129e31ddc371/src/parser.ts#L57-L75 'parser.ts, lines 57 - 75'
@altearius
Copy link
Contributor Author

Thank you for the code review! This is ready for a second pass.

Copy link
Owner

@bcherny bcherny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better!

The output is a nice improvement across a number of tests. I wonder if there might be a way to improve the output more for the test cases you added.

To help debug, try adding export const only = true to your test case, then running VERBOSE=1 npm test to enable debug tracing.

*/␊
export type Intersection = A | B;␊
export type A = A1 & {␊
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed this earlier -- this is a case where the output in #605 is better than #603. I wonder if there's a way to improve this output, so we don't generate A1 and B1, and instead just use Base directly.

Like I did here: https://github.com/bcherny/json-schema-to-typescript/pull/605/files#diff-34615f003eb49f90c3044b8fafebe3fe6b162e290b9b724082a6d72f58cfa4afR1876

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I took a try at that in 7d3f495 by pulling schema's pre-existing allOf rule into the intersection. Would you mind taking a third pass and let me know if I missed anything?

*/␊
export type Intersection = Car | Truck;␊
export type Car = Car1 & {␊
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -448968,8 +448968,7 @@ Generated by [AVA](https://avajs.dev).
* and run json-schema-to-typescript to regenerate this file.␊
*/␊
export type UnionWithProps = UnionWithProps1 & UnionWithProps2;␊
export type UnionWithProps1 =␊
export type UnionWithProps = (␊
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

/**␊
* Any property starting with _ is valid.␊
*␊
* This interface was referenced by \`JSONSchemaForNPMPackageJsonFiles2\`'s JSON-Schema definition␊
* This interface was referenced by \`undefined\`'s JSON-Schema definition␊
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have regressed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up: I haven't lost interest in moving this forward. I've had some deadlines come up, but I plan to return to this over the weekend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your patience. I've taken a look at this regression and I'm not sure what the expected behavior should be. Could you advise?

This is what I observe, simplified to the relevant bits.

Given this schema:

title: JSON schema for NPM package.json files
type: object
patternProperties:
  '^_':
    description: Any property starting with _ is valid.
properties:
  # etc, etc, etc, actual properties clipped

The original code produced this structure:

export type JSONSchemaForNPMPackageJsonFiles =
  JSONSchemaForNPMPackageJsonFiles1 & JSONSchemaForNPMPackageJsonFiles2;

export type JSONSchemaForNPMPackageJsonFiles1 = { [k: string]: unknown; };

export interface JSONSchemaForNPMPackageJsonFiles2 {
  /**
   * Any property starting with _ is valid.
   * 
   * This interface was referenced by \`JSONSchemaForNPMPackageJsonFiles2\`'s
   * JSON-Schema definition via the \`patternProperty\` "^_".
   */
  [k: string]: any;
}

This branch produces a different structure:

export type JSONSchemaForNPMPackageJsonFiles =
  { [k: string]: unknown; } &
  {
    /**
     * Any property starting with _ is valid.
     * 
     * This interface was referenced by \`undefined\`'s JSON-Schema definition
     * via the \`patternProperty\` "^_".
     */
    [k: string]: any;
  };

While I agree that the change in the comment from JSONSchemaForNPMPackageJsonFiles2 to undefined is unfortunate, this behavior is in line with other instances in which this comment is generated due to the presence of patternProperties on the root schema.

For instance, here is the current output of the intersection.2.ts test on the master branch, which has a similar scenario:

export type Intersection = (A & B) & {
  c: string;
  d: string;
  /**
   * This interface was referenced by \`undefined\`'s JSON-Schema definition
   * via the \`patternProperty\` "^x-".
   */
  [k: string]: unknown;
};

export interface A {
  a?: string;
  [k: string]: unknown;
}

export interface B {
  b?: string;
  [k: string]: unknown;
}

I would argue that neither behavior is ideal, and an improvement might be to suppress the comment entirely if the interface is not named.

I can make this change, but I worry that it may not be what you have in mind, and that it feels like something that should be a separate PR.

Could you please provide direction on how this might be resolved?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable -- we can improve this as a follow-up.

@@ -225147,7 +225124,7 @@ Generated by [AVA](https://avajs.dev).
/**␊
* The list of custom alert time-window rules.␊
*/␊
timeWindowRules?: ((ActiveConnectionsNotInAllowedRange | AmqpC2DMessagesNotInAllowedRange | MqttC2DMessagesNotInAllowedRange | HttpC2DMessagesNotInAllowedRange | AmqpC2DRejectedMessagesNotInAllowedRange | MqttC2DRejectedMessagesNotInAllowedRange | HttpC2DRejectedMessagesNotInAllowedRange | AmqpD2CMessagesNotInAllowedRange | MqttD2CMessagesNotInAllowedRange | HttpD2CMessagesNotInAllowedRange | DirectMethodInvokesNotInAllowedRange | FailedLocalLoginsNotInAllowedRange | FileUploadsNotInAllowedRange | QueuePurgesNotInAllowedRange | TwinUpdatesNotInAllowedRange | UnauthorizedOperationsNotInAllowedRange)[] | string)␊
timeWindowRules?: (TimeWindowCustomAlertRule[] | string)␊
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a regression. Asked about it here to understand the expected behavior: json-schema-org/json-schema-spec#1529

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this schema:

type: object
required: [ruleType, timeWindowSize]

oneOf:
  - $ref: '#/definitions/ActiveConnectionsNotInAllowedRange'
  # ...etc...
  - $ref: '#/definitions/UnauthorizedOperationsNotInAllowedRange'

properties:
  ruleType:
    type: string
    enum: [ TimeWindowCustomAlertRule ]

  timeWindowSize:
    type: string

The old behavior is:

export interface DeviceSecurityGroupProperties {
  // Note: does **not** include the ruleType or timeWindowSize properties
  timeWindowRules?:
    | (
        | ActiveConnectionsNotInAllowedRange
        // ...etc...
        | UnauthorizedOperationsNotInAllowedRange
      )[]
    | string;
  [k: string]: unknown;
}

export type TimeWindowCustomAlertRule = {
  ruleType: "TimeWindowCustomAlertRule";
  timeWindowSize: string;
  [k: string]: unknown;
} & (
  | ActiveConnectionsNotInAllowedRange
  // ...etc...
  | UnauthorizedOperationsNotInAllowedRange
);

The new behavior is:

export interface DeviceSecurityGroupProperties {
  // Note: Now includes ruleType and timeWindowSize.
  timeWindowRules?: TimeWindowCustomAlertRule[] | string;
}

export type TimeWindowCustomAlertRule = {
  ruleType: "TimeWindowCustomAlertRule";
  timeWindowSize: string;
  [k: string]: unknown;
} & (
  | ActiveConnectionsNotInAllowedRange
  // ...etc...
  | UnauthorizedOperationsNotInAllowedRange
);

I would argue that the new behavior is more nearly correct. My reasoning is based on the definition of the oneOf keyword in the JSON Schema spec:

10.2.1.3. oneOf

This keyword's value MUST be a non-empty array. Each item of the array MUST be a valid JSON Schema.

An instance validates successfully against this keyword if it validates successfully against exactly one schema defined by this keyword's value.

This definition does not address the possibility that oneOf may be combined with additional keywords, such as properties. However, one reasonable interpretation is that the properties keyword should be applied in addition to the oneOf keyword. In such a case, both oneOf and properties (or other keywords) would all apply.

This interpretation aligns with section 10.1:

10.1 Keyword Independence

Schema keywords typically operate independently, without affecting each other's outcomes.

This suggests that oneOf and properties should both affect the output.

This interpretation is also in line with the Factoring Schemas example, which demonstrates how one schema:

oneOf:
  - { type: number, multipleOf: 5 }
  - { type: number, multipleOf: 3 }

Can be factored into a schema that combines oneOf with additional keywords (type, in this case, but I presume an example with properties would be equally valid):

type: number
oneOf:
  - { multipleOf: 5 }
  - { multipleOf: 3 }

I realize that other interpretations of the oneOf schema are possible, but this is what feels most intuitive to me. Please let me know if other changes need to be made.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable interpretation, but as you said, there may be other interpretations. Implicit intersection schemas like this make me scratch my head sometimes. I think your interpretation is correct, and the new behavior is correct, but since it's a subtle breaking change that will affect the largest consumers of JSTT, I want to double check with JSON-Schema maintainers first.

I bumped the Github issue to see is someone is willing to clarify.

@bcherny bcherny merged commit 62cc052 into bcherny:master Jul 22, 2024
18 checks passed
@bcherny
Copy link
Owner

bcherny commented Jul 22, 2024

Looking good. Thank you for the contribution, and for iterating!

@bcherny
Copy link
Owner

bcherny commented Jul 22, 2024

Published as v15.0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants