JSON streams
JSON is a very convenient format for exchanging data that can be used by many different programming languages.
A slight disadvantage of JSON is that it doesn't come with a stream format. If you need to send a large number of JSON objects from one place to another, you don't want to put them all in one JSON list, because then you have to parse the entire list before you can get at any of the objects.
However, it's really easy to impose a stream format on top of JSON. This format is not standardized, but it's obvious enough to have become common. You could call it "line-delimited JSON", "streaming JSON", or "JSON streams".
- A JSON stream is a UTF-8 encoded text file.
- Each line of the file contains a single JSON object. (Here "object" means the structure that is surrounded by {curly braces}, which Python would call a dictionary.)
- These objects are interpreted according to the json.org standard, using existing JSON tools. The only difference from the standard is that line breaks within an object are not allowed.
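The rules above can be sketched in a few lines of Python using only the standard `json` module. This is a minimal illustration, not the code from ConceptNet's own module; the function names `write_json_stream` and `read_json_stream` are hypothetical and chosen only for this example.

```python
import json


def write_json_stream(objects, path):
    """Write an iterable of dicts to `path`, one JSON object per line.

    Hypothetical helper for illustration only.
    """
    with open(path, "w", encoding="utf-8") as out:
        for obj in objects:
            # Without `indent`, json.dumps emits no raw line breaks;
            # newlines inside string values are escaped as \n.
            out.write(json.dumps(obj, ensure_ascii=False) + "\n")


def read_json_stream(path):
    """Yield one parsed object per non-empty line of `path`.

    Hypothetical helper for illustration only.
    """
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because `read_json_stream` is a generator, each object is available as soon as its line has been read, without loading the rest of the file.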
Some straightforward tools for working with JSON streams appear in the conceptnet5.formats.json_stream module.
It appears to be common to use the extension .json on a file containing a JSON stream. This seems problematic, because the file as a whole will not parse as JSON. We recommend the extension .jsons instead, and use it throughout ConceptNet.
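To see why a JSON stream does not parse as a single JSON document, here is a small sketch using Python's standard `json` module. The sample text and the helper name `parses_as_single_json` are made up for this illustration.

```python
import json

# A made-up two-line JSON stream held in memory.
stream_text = '{"id": 1}\n{"id": 2}\n'


def parses_as_single_json(text):
    """Return True if `text` is one valid JSON document, else False."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Parsing the whole text fails: after the first object, the parser
# finds extra data instead of the end of the document.
whole_file_ok = parses_as_single_json(stream_text)

# Parsing line by line succeeds, because each line is valid JSON by itself.
objects = [json.loads(line) for line in stream_text.split("\n") if line.strip()]
```

Here `whole_file_ok` ends up `False`, while `objects` holds the two parsed dictionaries.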
This is a toy example of a JSON stream with three objects:
{"one": "this is the first object"}
{"two": "this is the second object", "empty_list": []}
{"three": 3}
Here are a few actual objects from ConceptNet:
{"id": "/e/3a1115850d0ebb92409af473df2cfee63477f28a", "uri": "/a/[/r/DefinedAs/,/c/en/skylight/,/c/en/window_in_roof/]", "rel": "/r/DefinedAs", "weight": 1, "features": ["/c/en/skylight /r/DefinedAs -", "/c/en/skylight - /c/en/window_in_roof", "- /r/DefinedAs /c/en/window_in_roof"], "context": "/ctx/all", "sources": "/and/[/s/activity/omcs/omcs1,_possibly_free_text/,/s/contributor/omcs/mindpixel/]", "surfaceText": "*[[windows in the roof]] are called [[skylights]]", "dataset": "/d/conceptnet/4/en", "end": "/c/en/window_in_roof", "license": "/l/CC/By", "start": "/c/en/skylight"}
{"id": "/e/11532550e7bca8efa06411b916a8ebb9bef1eb0c", "uri": "/a/[/r/DefinedAs/,/c/en/skylight/,/c/en/window_in_roof/]", "rel": "/r/DefinedAs", "weight": 1, "features": ["/c/en/skylight /r/DefinedAs -", "/c/en/skylight - /c/en/window_in_roof", "- /r/DefinedAs /c/en/window_in_roof"], "context": "/ctx/all", "sources": "/and/[/s/activity/omcs/vote/,/s/contributor/omcs/mindpixel/]", "surfaceText": "*[[windows in the roof]] are called [[skylights]]", "dataset": "/d/conceptnet/4/en", "end": "/c/en/window_in_roof", "license": "/l/CC/By", "start": "/c/en/skylight"}
{"id": "/e/8a72233924f5bb7d9be2c5248e0428b9d2a46834", "uri": "/a/[/r/IsA/,/c/en/island/,/c/en/land_mass/]", "rel": "/r/IsA", "weight": 1, "features": ["/c/en/island /r/IsA -", "/c/en/island - /c/en/land_mass", "- /r/IsA /c/en/land_mass"], "context": "/ctx/all", "sources": "/and/[/s/activity/omcs/omcs1,_possibly_free_text/,/s/contributor/omcs/skoerber/]", "surfaceText": "[[An island]] is [[a land mass]]", "dataset": "/d/conceptnet/4/en", "end": "/c/en/land_mass", "license": "/l/CC/By", "start": "/c/en/island"}
{"id": "/e/cb9b74d2601ff183a55a0b27d1a66861e847afec", "uri": "/a/[/r/IsA/,/c/en/island/,/c/en/land_mass/]", "rel": "/r/IsA", "weight": 1, "features": ["/c/en/island /r/IsA -", "/c/en/island - /c/en/land_mass", "- /r/IsA /c/en/land_mass"], "context": "/ctx/all", "sources": "/and/[/s/activity/omcs/vote/,/s/contributor/omcs/skoerber/]", "surfaceText": "[[An island]] is [[a land mass]]", "dataset": "/d/conceptnet/4/en", "end": "/c/en/land_mass", "license": "/l/CC/By", "start": "/c/en/island"}