This tool can normalize given JSON schemas and check them for recursion. It is based on Draft04.
- Normalization:
To normalize your schemas, some additional parameters are needed:
  - For `-repositorytype`, choose between `-normal`, `-testsuite` and `-corpus`, depending on where your schemas come from.
  - For `-allowDistributedSchemas`, choose between `-true` and `-false`, depending on whether you want to allow distributed schemas.
  - For `-fetchSchemasOnline`, choose between `-true` and `-false`, depending on whether you want to download references from the internet. If `-false` is chosen, a referenced URI is only looked up in the file `UriOfFiles.csv`. In `UriOfFiles.csv`, each line should look like `file, URI`. This means that the schema in `file` is used whenever `URI` is referenced. `file` should be stored in a directory called `Store`. If the referenced URI is not in `UriOfFiles.csv`, a `StoreException` is thrown. If `-true` is chosen instead, the tool afterwards tries to load the schema from `URI`.
  - Finally, `"pathToDir"` should be the path to the directory in which the schemas are stored. The path should be in quotation marks.
java -jar jarfile -normalize -repositorytype -allowDistributedSchemas -fetchSchemasOnline "pathToDir"
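The lookup behind `-fetchSchemasOnline` can be pictured with a small Python sketch. It is an illustration only, not the tool's Java implementation: the helper names `load_uri_table` and `resolve` and the example URI are hypothetical, and it assumes that with `-true` the table is consulted first and the URI itself is only used as a fallback.

```python
import csv
import io


class StoreException(Exception):
    """Raised when a referenced URI is not listed in UriOfFiles.csv."""


def load_uri_table(csv_text):
    """Parse lines of the form 'file, URI' into a {URI: file} mapping."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) >= 2:
            file_name, uri = row[0].strip(), row[1].strip()
            table[uri] = file_name
    return table


def resolve(uri, table, fetch_online=False):
    """Return the location the schema for `uri` would be read from."""
    if uri in table:
        # Local copies are assumed to live in the Store directory.
        return "Store/" + table[uri]
    if fetch_online:
        # Assumption: fall back to the URI itself (download not shown).
        return uri
    raise StoreException(uri)


table = load_uri_table("address.json, http://example.com/address.json\n")
print(resolve("http://example.com/address.json", table))
```

With `fetch_online=False`, any URI missing from the table raises `StoreException`, which mirrors the `-false` behavior described above.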
If `-corpus` was chosen for `-repositorytype`, an additional parameter with the path to the file repos_fullpath.csv (`"pathToReposFullpath"`) is needed.
java -jar jarfile -normalize -corpus -allowDistributedSchemas -fetchSchemasOnline "pathToDir" "pathToReposFullpath"
`"linksToPermalinks"` should be the path to a CSV file in which a link prefix maps to a permalink prefix, so that all web references in a schema are loaded using the permalink instead of the original link. This parameter is optional and can be omitted if no permalinks should be specified.
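As an illustration of the prefix mapping, the following Python sketch rewrites a web reference using a table of link prefixes and permalink prefixes. The function name `to_permalink`, the example URLs, and the choice to prefer the longest matching prefix are assumptions; the exact CSV layout and matching rules of the tool are not stated here.

```python
def to_permalink(uri, prefix_map):
    """Replace a known link prefix of `uri` with its permalink prefix."""
    # Assumption: if several prefixes match, the longest one wins.
    for prefix in sorted(prefix_map, key=len, reverse=True):
        if uri.startswith(prefix):
            return prefix_map[prefix] + uri[len(prefix):]
    return uri  # no mapping: keep the original reference


# Hypothetical mapping, as it might be read from the CSV file.
prefix_map = {
    "https://example.com/schemas/": "https://example.com/archive/abc123/schemas/",
}
print(to_permalink("https://example.com/schemas/person.json", prefix_map))
```

References that match no prefix are returned unchanged, which corresponds to omitting the parameter.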
- Recursion checking:
See here for an explanation.
Again, `pathToDir` should be the path to the directory in which the schemas are stored.
java -jar jarfile -recursion "pathToDir"
- Statistics:
Statistics are collected about the distribution of single-file and distributed schemas and the frequency of recursion in them. Additionally, the change in lines of code from the unnormalized to the normalized schemas is recorded, and an overall overview is created.
Again, `pathToDir` should be the path to the directory in which the schemas are stored. `pathToNormalizedDir` should be the path to the directory in which the normalized schemas are stored.
java -jar jarfile -stats "pathToDir" "pathToNormalizedDir"
A Dockerfile can be found here. It normalizes the schemas of the TestSuite (commit 0c223de), the SchemaStore (commit 2ad0b3d) and the SchemaCorpus (commit 9c0e796) and afterwards collects the statistics. To keep this process reproducible, all external references have already been downloaded, and these downloaded copies are used.
In a normalized schema, all references should follow the JSON Pointer Syntax and all of them should point to direct children of the definitions-section or to the top-level schema. Therefore distributed schemas are consolidated in one file.
{
  "properties": {
    "name": {"type": "string"},
    "surname": {"$ref": "#/properties/name"},
    "children": {
      "type": "array",
      "items": {"$ref": "#"}
    }
  }
}
All references are in JSON Pointer Syntax, but "#/properties/name" does not point to a direct child of the definitions-section. Therefore this reference is resolved and its target is copied to the definitions-section. The normalized version of the above schema:
{
  "properties": {
    "name": {"type": "string"},
    "surname": {"$ref": "#/definitions/properties_name"},
    "children": {
      "type": "array",
      "items": {"$ref": "#"}
    }
  },
  "definitions": {
    "properties_name": {"type": "string"}
  }
}
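The rewrite shown above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's Java implementation: the function names are hypothetical, the naming rule (drop `#/`, replace `/` with `_`) is inferred from the example, and references inside the copied targets are not followed.

```python
import copy


def pointer_get(doc, pointer):
    """Resolve a JSON Pointer fragment such as '#/properties/name'."""
    node = doc
    for token in pointer.lstrip("#").split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def is_normalized(ref):
    """True for '#' and for direct children of the definitions-section."""
    parts = ref.split("/")
    return ref == "#" or (len(parts) == 3 and parts[:2] == ["#", "definitions"])


def normalize_local_refs(schema):
    result = copy.deepcopy(schema)
    moved = {}  # new definitions, keyed by their derived name

    def visit(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/") and not is_normalized(ref):
                name = ref[2:].replace("/", "_")
                moved[name] = copy.deepcopy(pointer_get(schema, ref))
                node["$ref"] = "#/definitions/" + name
            for value in list(node.values()):
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    if moved:
        result.setdefault("definitions", {}).update(moved)
    return result


schema = {
    "properties": {
        "name": {"type": "string"},
        "surname": {"$ref": "#/properties/name"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    }
}
normalized = normalize_local_refs(schema)
print(normalized["properties"]["surname"])
```

The reference to `#` is left alone because it already points to the top-level schema.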
Schemas can also be distributed across separate files, as in the following example:
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "surname": {"type": "string"},
    "address": {"$ref": "folder/locations.json#/definitions/address"}
  }
}
File: schema.json
{
  "definitions": {
    "address": {
      "street": {"type": "string"},
      "number": {"type": "integer"},
      "city": {"type": "string"},
      "country": {"type": "string"}
    }
  }
}
File: folder/locations.json
The schema in schema.json has a reference to a child in folder/locations.json. Therefore the content of the reference is copied to the definitions-section. The normalized version of the schema in schema.json:
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "surname": {"type": "string"},
    "address": {"$ref": "#/definitions/folder_locations.json_definitions_address"}
  },
  "definitions": {
    "folder_locations.json_definitions_address": {
      "street": {"type": "string"},
      "number": {"type": "integer"},
      "city": {"type": "string"},
      "country": {"type": "string"}
    }
  }
}
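Consolidating a distributed schema can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tool's Java implementation: the function name `inline_external_refs` is hypothetical, the `files` dictionary stands in for reading files from disk, and the naming rule (drop `#`, replace `/` with `_`) is inferred from the example.

```python
import copy


def inline_external_refs(schema, files):
    """Copy targets of file references into the definitions-section.

    `files` maps a relative path to its parsed JSON document.
    """
    result = copy.deepcopy(schema)
    moved = {}  # new definitions, keyed by their derived name

    def visit(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and not ref.startswith("#"):
                path, _, fragment = ref.partition("#")
                target = files[path]
                for token in fragment.split("/")[1:]:
                    target = target[token]
                name = (path + fragment).replace("/", "_")
                moved[name] = copy.deepcopy(target)
                node["$ref"] = "#/definitions/" + name
            for value in list(node.values()):
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    if moved:
        result.setdefault("definitions", {}).update(moved)
    return result


files = {
    "folder/locations.json": {
        "definitions": {"address": {"street": {"type": "string"}}}
    }
}
schema = {
    "type": "object",
    "properties": {
        "address": {"$ref": "folder/locations.json#/definitions/address"}
    },
}
print(inline_external_refs(schema, files)["properties"]["address"])
```

A full implementation would also have to resolve relative paths against the referencing file and normalize references inside the copied content, which this sketch leaves out.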
A distinction is made between guarded and unguarded recursion. The difference is that the behavior of unguarded recursive schemas during validation is not defined, so validation may not terminate in finite time. The tool only guarantees correct output for normalized schemas.
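To make the distinction concrete, here is a minimal Python sketch of a recursion check on a normalized schema. It is not the tool's algorithm. It assumes that a reference counts as guarded when it sits below one of the Draft04 keywords listed in `GUARDS`, and that a schema is unguarded recursive if some reference cycle uses no guarded reference at all. All function names are hypothetical.

```python
# Assumption: these keywords descend into the instance and therefore guard a reference.
GUARDS = {"properties", "patternProperties", "additionalProperties",
          "items", "additionalItems"}


def collect_refs(node, guarded=False, skip=None):
    """Yield (target, guarded) for every $ref found below `node`."""
    if isinstance(node, dict):
        if isinstance(node.get("$ref"), str):
            yield node["$ref"], guarded
        for key, value in node.items():
            if key != skip:
                yield from collect_refs(value, guarded or key in GUARDS)
    elif isinstance(node, list):
        for item in node:
            yield from collect_refs(item, guarded)


def build_graph(schema):
    """One node for the top-level schema and one per definition."""
    graph = {"#": list(collect_refs(schema, skip="definitions"))}
    for name, body in schema.get("definitions", {}).items():
        graph["#/definitions/" + name] = list(collect_refs(body))
    return graph


def has_cycle(graph, unguarded_only):
    state = {}  # node -> "active" (on the DFS stack) or "done"

    def dfs(node):
        state[node] = "active"
        for target, guarded in graph.get(node, []):
            if unguarded_only and guarded:
                continue
            if state.get(target) == "active":
                return True
            if target not in state and dfs(target):
                return True
        state[node] = "done"
        return False

    return any(node not in state and dfs(node) for node in graph)


def classify(schema):
    graph = build_graph(schema)
    if has_cycle(graph, unguarded_only=True):
        return "unguarded recursive"
    if has_cycle(graph, unguarded_only=False):
        return "guarded recursive"
    return "not recursive"


print(classify({"properties": {"children": {"items": {"$ref": "#"}}}}))
print(classify({"allOf": [{"$ref": "#"}]}))
```

In the second schema, validating an instance requires validating it against the same schema again without consuming any part of the instance, which is why such cycles may never terminate.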
This tool is based on Draft04 and therefore uses its specific keywords. There are two major problems when normalizing schemas using a higher draft.
One is that "id" was replaced with "$id". To keep things working, before normalization the tool scans for the keyword "$id" in the schema. If one is found, "$id" will be used for base-URI resolution.
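The scan for "$id" could look roughly like the following Python sketch. The function names are hypothetical, and the sketch makes the simplifying assumption that any occurrence of the key "$id" anywhere in the document counts, even if it is only a property name.

```python
def contains_key(node, key):
    """Return True if `key` occurs as an object key anywhere in `node`."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def id_keyword(schema):
    """Pick the keyword used for base-URI resolution."""
    return "$id" if contains_key(schema, "$id") else "id"


print(id_keyword({"$id": "http://example.com/root.json", "type": "object"}))
print(id_keyword({"id": "http://example.com/root.json", "type": "object"}))
```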
Another is that in Draft06 and above unknown keywords should be explicitly ignored, which is not the case in Draft04. Therefore, under Draft04 rules, references can point to ids that appear under unknown keywords. This causes no problem unless an id is used more than once, which should never be the case.
Keep in mind that schemas using higher drafts are still normalized using Draft04-specific keywords.
This work is licensed under the Apache 2.0 License.