encoding/json/v2 #63397
-
It is imperative that v1 and v2 interoperate well to provide a gradual migration from v1 to v2. Any code using v1 today must continue to function the same today and into the future. The key to v1-to-v2 interoperability lies in the API for composable options. Across the "jsontext" packages and v2 and v1 "json" packages, we have:
package jsontext // "encoding/json/jsontext"
type Options = jsonopts.Options
func AllowDuplicateNames(v bool) Options // affects encode and decode
func AllowInvalidUTF8(v bool) Options // affects encode and decode
func EscapeForHTML(v bool) Options // affects encode only
func EscapeForJS(v bool) Options // affects encode only
func WithIndent(indent string) Options // affects encode only
func WithIndentPrefix(prefix string) Options // affects encode only
func Expand(v bool) Options // affects encode only
package json // "encoding/json/v2"
type Options = jsonopts.Options
// DefaultOptionsV2 is the full set of all options that define v2 semantics.
func DefaultOptionsV2() Options
func StringifyNumbers(v bool) Options // affects marshal and unmarshal
func Deterministic(v bool) Options // affects marshal only
func FormatNilMapAsNull(v bool) Options // affects marshal only
func FormatNilSliceAsNull(v bool) Options // affects marshal only
func MatchCaseInsensitiveNames(v bool) Options // affects marshal and unmarshal
func DiscardUnknownMembers(v bool) Options // affects marshal only
func RejectUnknownMembers(v bool) Options // affects unmarshal only
func WithMarshalers(v *Marshalers) Options // affects marshal only
func WithUnmarshalers(v *Unmarshalers) Options // affects unmarshal only
package json // "encoding/json"
type Options = jsonopts.Options
// DefaultOptionsV1 is the full set of all options that define v1 semantics.
func DefaultOptionsV1() Options
func FormatByteArrayAsArray(v bool) Options // affects marshal and unmarshal
func FormatTimeDurationAsNanosecond(v bool) Options // affects marshal and unmarshal
func IgnoreStructErrors(v bool) Options // affects marshal and unmarshal
func MatchCaseSensitiveDelimiter(v bool) Options // affects marshal and unmarshal
func MergeWithLegacySemantics(v bool) Options // affects unmarshal only
func OmitEmptyWithLegacyDefinition(v bool) Options // affects marshal only
func RejectFloatOverflow(v bool) Options // affects unmarshal only
func ReportLegacyErrorValues(v bool) Options // affects marshal and unmarshal
func SkipUnaddressableMethods(v bool) Options // affects marshal and unmarshal
func StringifyWithLegacySemantics(v bool) Options // affects marshal and unmarshal
func UnmarshalArrayFromAnyLength(v bool) Options // affects unmarshal only
For brevity, we will use There are several things to note:
Thus, these options can be composed together to obtain behavior that is identical to v1, identical to v2, or anywhere in between. For example:
The implementation of There are several advantages to implementing v1 in terms of v2:
In the long-term, we do not plan on ever deprecating the v1 "encoding/json" package. Rather, we will declare the high-level functions and types in that package and point users to the v2 equivalents. Deprecation of v1 functionality would not happen until at least two releases (i.e. 1 year) after v2 is available.
-
This is further discussion on the behavior of unmarshaling into a non-empty value. In v2, we aim to provide consistent merge semantics and recommend in There are many reasonable semantics for merging, but we should have a consistent approach to how inputs are merged. The merge semantics in v2 takes inspiration from JSON Merge Patch (RFC 7386). At a high level:
For examples of differences between v1 and v2, see this behavior difference test.
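As a small illustration of what "unmarshaling into a non-empty value" means, here is a sketch that uses the existing v1 "encoding/json" package, not the proposed v2 API. The helper name mergeIntoMap is made up for this example. In v1, unmarshaling a JSON object into a non-nil Go map keeps the existing entries and adds or overwrites the ones present in the input:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeIntoMap is a hypothetical helper: it unmarshals input into an
// already-populated map using v1 "encoding/json".
func mergeIntoMap(dst map[string]int, input string) (map[string]int, error) {
	err := json.Unmarshal([]byte(input), &dst)
	return dst, err
}

func main() {
	m, err := mergeIntoMap(map[string]int{"a": 1, "b": 2}, `{"b":20,"c":30}`)
	if err != nil {
		panic(err)
	}
	// "a" survives, "b" is overwritten, "c" is added.
	fmt.Println(m["a"], m["b"], m["c"])
}
```

Other kinds (slices, structs, pointers, interfaces) each follow their own rules in v1, which is the inconsistency this comment is about.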
-
This is further discussion on the marshaling order of Go map entries. The proposed v2 behavior is for Go maps to marshal in a non-deterministic order, matching the order (or lack thereof) provided by Go map iteration. Non-deterministic output can be made deterministic by setting the In contrast, the v1 behavior is deterministic marshaling of maps. Non-deterministic marshaling is more performant since maps can be marshaled in a truly streaming manner. Any form of deterministic ordering would require sorting the map, which incurs O(n⋅log(n)) runtime and O(n) memory costs. Performance is important in RPC protocols where the ordering of JSON objects does not matter. However, non-deterministic ordering is detrimental to any use-case that assumes that the serialized output is stable, such as in tests, in the detection of changed configuration files, or in caching. There are several sources of instability in the JSON grammar:
Fortunately, RFC 8785 provides guidance for how JSON numbers and strings are to be formatted and v2 complies with that specification. The whitespace is non-existent by default or at least well-specified under It comes down to a tradeoff between performance and convenience, with neither benefit clearly outweighing the other. Vote on this comment for the default Go map ordering:
Voting is no guarantee that the most popular behavior will be adopted if compelling arguments for a different approach present themselves.
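For reference, the deterministic v1 behavior mentioned above can be observed with a short sketch against the existing "encoding/json" package. The helper name marshalMap is only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalMap marshals a map with v1 "encoding/json", which sorts the
// object members by key before writing them.
func marshalMap(m map[string]int) string {
	b, err := json.Marshal(m)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Go map iteration order is randomized, but the v1 output is stable.
	fmt.Println(marshalMap(map[string]int{"b": 1, "c": 3, "a": 2}))
}
```

The sorting step is exactly the O(n⋅log(n)) cost that the proposed v2 default would avoid.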
-
This is further discussion on how to marshal nil slices and maps. It is clear from #27589 and #37711 that users need the ability to control whether nil Go slices and maps marshal as either JSON null or empty JSON arrays or objects. However, what nil Go slices and maps should marshal as by default is less clear. The proposed default in v2 is to marshal nil Go slices and maps as empty JSON arrays and objects, while providing a Benefit:
Detriment:
Vote on this comment for what the default nil Go slice or map representation should be:
Voting is no guarantee that the most popular behavior will be adopted if compelling arguments for a different approach present themselves.
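To make the difference concrete, the sketch below shows how the existing v1 "encoding/json" package treats nil versus empty slices and maps. The payload type and marshalPayload helper are hypothetical names used only here:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload contrasts nil and empty (non-nil) slices and maps.
type payload struct {
	NilSlice   []int
	EmptySlice []int
	NilMap     map[string]int
	EmptyMap   map[string]int
}

// marshalPayload marshals a payload whose "Nil" fields are left unset.
func marshalPayload() string {
	b, err := json.Marshal(payload{
		EmptySlice: []int{},
		EmptyMap:   map[string]int{},
	})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// In v1, nil slices and maps become JSON null; empty ones become [] or {}.
	fmt.Println(marshalPayload())
}
```

Under the proposed v2 default, the nil fields would instead marshal as empty JSON arrays and objects unless an option restores the null representation.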
-
This is further discussion on how to omit fields during marshal. Being able to control which fields are omitted is a reasonable feature. However, the challenge is providing the most flexible API for deciding when to omit a field. Broadly speaking, there are two dimensions that omission can be determined by:
Which approach is the best? Both have legitimate usages and neither covers all of the common use-cases by itself. For that reason, v2 proposes support for both Omission with
-
Thanks for the really excellent detailed proposal. While servers often use JSON as both input and output, there are a number of programs that generate JSON without reading it, and there are a number that read JSON without generating it. The JSON encoding and decoding functions seem largely distinct.
-
Thanks, @dsnet. I've been looking forward to this discussion! One minor thing I want to bring up is the naming of the V2 MarshalJSON and UnmarshalJSON methods:
type MarshalerV2 interface {
	MarshalJSONV2(*jsontext.Encoder, Options) error
}
type UnmarshalerV2 interface {
	UnmarshalJSONV2(*jsontext.Decoder, Options) error
}
I'd like to suggest methods without "V2" in their names. Two years down the line, if someone writes a type implementing only the V2 methods, I don't think "V2" should be present in their signatures. That's a small degree of noise that'll exist solely for historical reasons. I understand that being able to implement both V1 and V2 interfaces is a hard requirement so we cannot re-use the name MarshalJSON, but we can probably find something else. To start the conversation on the interface method names, how about EncodeJSON and DecodeJSON?
type MarshalerV2 interface {
	EncodeJSON(*jsontext.Encoder, Options) error
}
type UnmarshalerV2 interface {
	DecodeJSON(*jsontext.Decoder, Options) error
}
-
@dsnet Are you planning to use some form of the pattern known as "functional options" for configuring
More of my thoughts about how to make the most of the functional-options pattern are available online in video format. Interesting that we agree on declaring the option type as an opaque interface and that we both (ab)use type aliases! 😉
-
If we're thinking to the next decade, I'd like to make sure it will be easy to add support for HuJSON/JWCC. (I know Go isn't a trend-setter, but I'd even love for it to be available from the beginning.)
-
What do you think about having an inverted control API as well (i.e. instead of writing JSON, you get a reader)? Relevant historical discussion: https://gophers.slack.com/archives/C0VP8EF3R/p1655155603882039, #51092 (comment). There's a general sense that the Right Fix is something general purpose involving io.Pipe, but error handling remains an unsolved issue. I'm bringing it up again here because generating JSON for use as an HTTP POST body is so common.
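For readers unfamiliar with the io.Pipe approach mentioned above, here is a minimal sketch using the existing v1 "encoding/json" package. The helpers jsonReader and readAll are hypothetical names, not part of any proposed API, and this is only one way to propagate the encoding error to the reader side:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// jsonReader is a hypothetical helper that returns a reader yielding the
// JSON encoding of v. Encoding runs in a goroutine that feeds an io.Pipe.
func jsonReader(v any) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		// CloseWithError(nil) behaves like Close, so the reader sees io.EOF
		// on success and the encoding error otherwise.
		pw.CloseWithError(json.NewEncoder(pw).Encode(v))
	}()
	return pr
}

// readAll drains r so the example can inspect the result.
func readAll(r io.Reader) (string, error) {
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	s, err := readAll(jsonReader(map[string]int{"a": 1}))
	fmt.Printf("%q %v\n", s, err)

	// Channels cannot be marshaled; the error surfaces on the read side.
	_, err = readAll(jsonReader(make(chan int)))
	fmt.Println(err != nil)
}
```

One caveat of this sketch: if the consumer stops reading early without closing the read side, the goroutine stays blocked on the pipe, which is part of why the error-handling story is considered unsolved.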
-
Dealing with union-typed JSON (e.g. a given field is either a |
-
Is it possible to specify a max stack/object depth and reject the JSON on hitting the limit? I don't see an |
-
Thank you for a very detailed proposal. I read through it but maybe I missed it, but it seems to me that current proposal does not provide access to struct tags for I see that there is new Or, in other words, I think |
-
The current implementation of json.Unmarshal requires creating a variable to unmarshal the data into. Since Go has generics, that step could be wrapped in a helper function:
func UnmarshallTo[T any](data []byte) (T, error) {
	var result T
	err := json.Unmarshal(data, &result)
	return result, err
}
-
Thanks for this proposal, the level of detail and thoughtfulness is impressive. I've read over everything and I couldn't see any discussion of this point, so I'll raise it here. I suspect the answer will be "out of scope for initial release", but since everything else is in here I figure it's worth noting. As far as I can tell there are 3 options for how JSON names map to struct names:
In most systems/ecosystems, there is a standard convention for naming of keys, often I believe it would be useful to be able to configure Marshal or Unmarshal with a strict mapping that uses user-specified functions to perform the translation. This would allow the caller of Marshal/Unmarshal to enforce the naming convention without requiring a tag on every struct field.
-
I've encountered a timestamp output from an external API that I've been unable to handle with the current format tags. The timestamp is sent as a JSON number, not a string. Would it be possible for time.Time values with format:unix, unixnano, etc. tags to parse from both a string and a number?
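Until something like this exists in the format tags, one possible workaround with the existing v1 "encoding/json" package is a small wrapper type. The sketch below is only an illustration: UnixTime, event, and parseTS are hypothetical names, and it assumes the timestamp is in whole Unix seconds.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// UnixTime is a hypothetical wrapper that accepts Unix seconds encoded
// either as a JSON number (1700000000) or a JSON string ("1700000000").
type UnixTime struct{ time.Time }

// UnmarshalJSON implements the v1 json.Unmarshaler interface.
func (t *UnixTime) UnmarshalJSON(b []byte) error {
	s := string(b)
	if s == "null" {
		return nil // leave the zero time in place
	}
	if len(b) > 0 && b[0] == '"' {
		// Decode the JSON string to get at the digits inside the quotes.
		if err := json.Unmarshal(b, &s); err != nil {
			return err
		}
	}
	secs, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return fmt.Errorf("UnixTime: %w", err)
	}
	t.Time = time.Unix(secs, 0).UTC()
	return nil
}

type event struct {
	TS UnixTime `json:"ts"`
}

// parseTS returns the decoded timestamp as Unix seconds.
func parseTS(input string) (int64, error) {
	var e event
	if err := json.Unmarshal([]byte(input), &e); err != nil {
		return 0, err
	}
	return e.TS.Unix(), nil
}

func main() {
	fmt.Println(parseTS(`{"ts":1700000000}`))
	fmt.Println(parseTS(`{"ts":"1700000000"}`))
}
```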
-
I have a question similar to (#63397 (comment)) about limiting the size of the JSON input. What do you think about limiting the number of leaf elements in a JSON value and rejecting the input once the limit is reached? Did you already consider this possibility? If not, what is your opinion on adding it to the new release?
-
Not sure if this was mentioned (as far as I could see it was not, but the thread is too big): there should be an option to add a mapping function for struct field name -> JSON name and vice versa. A simple example: Go forces a field to be capitalized for it to be unmarshaled (it has to be exported), while JSON usually does not follow that pattern (names are either lowerCamelCase or snake_case, almost never UpperCamelCase). You can override the name in the json tag, but you end up writing a tag for every single field in a big codebase, and then writing devops tooling that checks tags and generates code as well... You end up with a more complex situation that is much harder to maintain.
-
Is there any hope that encoding/json/v2 could include the ability to mark fields as required? One issue that I consistently run into is that I want to enforce that a field must be explicitly declared in the JSON, even if it contains a zero value (the purpose of this is usually detecting accidental misuse of an API or changes to the structure of data returned from a downstream API). I can kind of achieve this right now by using a pointer field and checking if it is nil, but this is tedious and error prone. Also I find using pointers for values that should actually never be nil kind of dirty. What I would really like to have is something like this:
And then unmarshalling a JSON object that does not have this field (e.g. |
-
On Jul 9, 2024 at 9:44:42 AM, Juhan Oskar Hennoste ***@***.***> wrote:
I think validating the existence of a required field is kind of a special
case that should be handled by the unmarshalling logic, because the
unmarshalling logic is the last place where field presence is explicitly
known. After that any validation has to depend on various assumptions about
the unmarshalling logic, such as assuming that a missing field in the JSON
results in a zero value in the struct
I find this argument very convincing. Requiring the presence of a field
for successful unmarshaling feels like such a basic, obvious thing.
-
I'm looking for a way to define a context-aware options which change the behavior of custom |
-
Has there been any discussion about context support to enable a caller to interrupt a potentially long running encode / decode?
-
Is this discussion going anywhere? The discussion here seems to have died down, and I hope almost a year is more than enough time for it to reach the proposal phase.
-
Concerning A more logical default would be the number of seconds (you didn't mention it, but I'm assuming a floating-point number of seconds).
-
Urgh. I would be shocked if golang serialized timestamps and durations in a
way that wasn't RFC3339 (and thus ISO8601) conformant.
…On Mon, Oct 14, 2024, 6:23 AM Mitar ***@***.***> wrote:
rfc3339 has a very similar duration format, so not sure if it is really Go
specific? I have also seen it elsewhere in various HTTP protocols.
-
My impression is that one of the goals of this package is to better handle the common 3 variable state such as:
These states can be elegantly modeled by a double-pointer but it seems like that is not supported as detailed below. Aside on double-pointer:
To demonstrate this concept, I extended the omit fields example with relevant excerpts below. The gist of it is that this package's 'omitzero' tag allows the conversion to JSON string to behave as I'd hoped: specifically, a field with a zero-value double pointer is omitted from the output. Extended example:
ptrStringNil := (*string)(nil)
alpha := "alpha"
ptrStringVal := &alpha
// Demonstrate behavior of "omitzero".
b, err := json.Marshal(struct {
	// ...
	DblPtrNotExist  **string `json:",omitzero"`
	DblPtrExistNull **string `json:",omitzero"`
	DblPtrExistVal  **string `json:",omitzero"`
	// ...
}{
	// ...
	DblPtrNotExist:  nil,           // omitted=true, Like PATCH with missing field
	DblPtrExistNull: &ptrStringNil, // omitted=false, Like PATCH with field value: null
	DblPtrExistVal:  &ptrStringVal, // omitted=false, Like PATCH with field value: "alpha"
})
err = (*jsontext.Value)(&b).Indent("", "\t") // indent for readability
require.NoError(t, err)
const expected = `{
	"Struct": {},
	"Slice": [],
	"Map": {},
	"Pointer": "",
	"Interface": null,
	"DblPtrExistNull": null,
	"DblPtrExistVal": "alpha"
}`
require.Equal(t, expected, string(b))
However, I found that the conversion from JSON string to struct does not behave as I'd like for the case of a *nil (i.e. a null value): the "null" string value results in a zero-value double pointer, whereas I'd hoped for a *nil. Please let me know if I'm missing something here. Unit test verifying behavior ("inner nil" case fails from JSON to struct):
func Test_Json_2way(t *testing.T) {
	alpha := "alpha"
	fieldVal := &alpha
	var fieldNil *string
	// This works as expected, the field is omitted in the json string
	verifyExamplePatch(t, "outer nil", true, &ExamplePatch{Field: nil})
	// This works as expected since all values are set
	verifyExamplePatch(t, "inner non-null", true, &ExamplePatch{Field: &fieldVal})
	// The interesting case:
	// * v1 allows 3 field states (exists, null, non-null) but 'json.Marshal' ignores the concept of exists as modeled
	//   by an outer/double ptr and writes the value as "null" in the same way as an inner/single ptr. It would
	//   be better if a nil outer ptr caused the field to be omitted in the json string. The 'json.Marshal' behavior
	//   loses state (both !exists and null yield the same result) and the receiver always sees the field as null or non-null
	// * v2 solves the case where v1 fails via 'omitzero' but when reading from string the inner ptr is ignored
	//   and only the outer ptr is set to nil
	verifyExamplePatch(t, "inner nil", true, &ExamplePatch{Field: &fieldNil})
}
func verifyExamplePatch(t *testing.T, scenario string, expectPass bool, input *ExamplePatch) {
	// Serialize type to string
	str, err := ToJson2String(input)
	require.NoError(t, err, scenario, input)
	require.NotEmpty(t, str, str, scenario, input)
	// Deserialize string to type
	out, err := FromJson2String[ExamplePatch](str)
	require.NoError(t, err, str, scenario, input)
	require.Equal(t, expectPass, input.Equal(out), scenario)
}
type ExamplePatch struct {
	Field **string `json:"field,omitzero"` // 3 states of (exists, null, non-null)
}
func (a *ExamplePatch) Equal(b *ExamplePatch) bool {
	if (a.Field == nil) != (b.Field == nil) {
		return false // 1 of 2 ptrs is nil
	}
	if a.Field == nil && b.Field == nil {
		return true // Both outer ptrs nil
	}
	if (*a.Field == nil) != (*b.Field == nil) {
		return false // 1 of 2 inner ptrs is nil
	}
	if *a.Field == nil && *b.Field == nil {
		return true // Both inner ptrs nil
	}
	return **a.Field == **b.Field // Compare inner values
}
func ToJson2String[T any](input T) (string, error) {
	jsonBytes, err := json.Marshal(input)
	if err != nil {
		return "", err
	}
	return string(jsonBytes), nil
}
func FromJson2String[T any](input string) (*T, error) {
	var item T
	err := json.Unmarshal([]byte(input), &item)
	if err != nil {
		return nil, err
	}
	return &item, nil
}
Update: Combining this package with a generic Null type 🙁 allows the 3 states to flow through struct -> JSON -> struct.
-
On Nov 13, 2024 at 2:36:46 PM, Joe Tsai ***@***.***> wrote:
Unfortunately, TC39 does not define the length of units like "years",
"months", "weeks", etc., so we're still left with a problem in Go where we
might need to only support a subset of the grammar up to days.
This only applies to durations, right? -T
-
Hey, I wanted to mention an issue I have had a lot when working with JSON. When unmarshaling, I want to know whether I received all the keys, or be able to mark keys as mandatory and treat it as an error if some keys are missing. Silently getting zero values is not OK. I mean:
Does this make sense?
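For context, the workaround commonly used today with the v1 "encoding/json" package is to declare mandatory fields as pointers and check them after unmarshaling. The sketch below is only an illustration of that workaround; config and decodeConfig are hypothetical names:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// config uses pointer fields so that "absent" can be told apart from a
// zero value after unmarshaling.
type config struct {
	Name    *string `json:"name"`
	Retries *int    `json:"retries"`
}

// decodeConfig rejects inputs in which a mandatory key is missing.
// Note that an explicit JSON null also leaves the pointer nil, so this
// sketch treats null the same as a missing key.
func decodeConfig(input string) (name string, retries int, err error) {
	var c config
	if err := json.Unmarshal([]byte(input), &c); err != nil {
		return "", 0, err
	}
	if c.Name == nil {
		return "", 0, errors.New(`missing required field "name"`)
	}
	if c.Retries == nil {
		return "", 0, errors.New(`missing required field "retries"`)
	}
	return *c.Name, *c.Retries, nil
}

func main() {
	fmt.Println(decodeConfig(`{"name":"x","retries":0}`)) // zero value is accepted
	fmt.Println(decodeConfig(`{"name":"x"}`))             // missing key is an error
}
```

The extra nil checks and dereferences are the boilerplate that a built-in "required" mechanism would remove.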
-
This is a discussion intended to lead to a formal proposal.
This was written with input from @mvdan, @johanbrandhorst, @rogpeppe, @chrishines, @rsc.
Background
The widely-used "encoding/json" package is over a decade old and has served the Go community well. Over time, we have learned much about what works well and what does not. Its ability to marshal from and unmarshal into native Go types, the ability to customize the representation of struct fields using Go struct tags, and the ability of Go types to customize their own representation have proven to be highly flexible.
However, many flaws and shortcomings have also been identified over time. Addressing each issue in isolation will likely lead to surprising behavior when non-orthogonal features interact poorly. This discussion aims to take a cohesive and comprehensive look at "json" and to propose a solution to support JSON in Go for the next decade and beyond.
Improvements may be delivered either by adding new functionality to the existing "json" package and/or by introducing a new major version of the package. To guide this decision, let us evaluate the existing "json" package in the following categories:
Missing functionality
There are quite a number of open proposals, where the most prominent feature requests are:
time.Time (#21990)
Most feature requests could be added to the existing "json" package in a backwards compatible way.
API deficiencies
The API can be sharp or restrictive:
There is no easy way to correctly unmarshal from an io.Reader. Users often write json.NewDecoder(r).Decode(v), which is incorrect since it does not reject trailing junk at the end of the payload (#36225).
Options can be set on the Encoder and Decoder types, but cannot be used with the Marshal and Unmarshal functions. Similarly, types implementing the Marshaler and Unmarshaler interfaces cannot make use of the options. There is no way to plumb options down the call stack (#41144).
The functions Compact, Indent, and HTMLEscape write to a bytes.Buffer instead of something more flexible like a []byte or io.Writer. This limits the usability of those functions.
These deficiencies could be fixed by introducing new API to the existing "json" package in a backwards compatible way at the cost of introducing multiple different ways of accomplishing the same tasks in the same package.
Performance limitations
The performance of the standard "json" package leaves much to be desired. Setting aside internal implementation details, there are externally visible APIs and behaviors that fundamentally limit performance:
1. MarshalJSON: The MarshalJSON interface method forces the implementation to allocate the returned []byte. Also, the semantics require that the "json" package parse the result both to verify that it is valid JSON and also to reformat it to match the specified indentation.
2. UnmarshalJSON: The UnmarshalJSON interface method requires that a complete JSON value be provided (without any trailing data). This forces the "json" package to parse the JSON value to be unmarshaled in its entirety to determine when it ends before it can call UnmarshalJSON. Afterwards, the UnmarshalJSON method itself must parse the provided JSON value again. If the UnmarshalJSON implementation recursively calls Unmarshal, this leads to quadratic behavior. As an example, this is the source of dramatic performance degradation when unmarshaling into spec.Swagger (kubernetes/kube-openapi#315).
3. Encoder.WriteToken: There is no streaming encoder API. A proposal has been accepted but not implemented (#40127). The proposed API symmetrically matches Decoder.Token, but suffers from the same performance problems (see next point).
4. Decoder.Token: The Token type is an interface, which can hold one of multiple types: Delim, bool, float64, Number, string, or nil. This unfortunately allocates frequently when boxing a number or string within the Token interface type (#40128).
5. Lack of streaming: Even though the Encoder.Encode and Decoder.Decode methods operate on an io.Writer or io.Reader, they buffer the entire JSON value in memory. This hurts performance since it requires a second pass through the JSON. In theory, only the largest JSON token (i.e., a JSON string or number) should ever need to be buffered (#33714, #7872, #11046).
Limitations 1 and 2 can be resolved by defining new interface methods that operate on a streaming encoder or decoder.
However, type-defined streaming methods are blocked on limitations 3 and 4, which require having an efficient, streaming encoder and decoder API (#40127, #40128).
Even if an efficient streaming API is provided, the "json" package itself would still be constrained by limitation 5, where it does not operate in a truly streaming manner under the hood (#33714, #7872, #11046).
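For readers who have not used it, the existing v1 Decoder.Token API referred to above looks roughly like this in practice. The sketch only shows the shape of the API and which dynamic types a Token can hold; countTokens is a hypothetical helper and the example makes no attempt to measure allocations:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countTokens walks the input with the v1 Decoder.Token API and tallies
// tokens by the dynamic Go type stored in the Token interface.
func countTokens(input string) (map[string]int, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	counts := make(map[string]int)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, err
		}
		switch tok.(type) {
		case json.Delim: // one of [ ] { }
			counts["delim"]++
		case string: // object names and string values
			counts["string"]++
		case float64: // numbers, unless Decoder.UseNumber was called
			counts["number"]++
		case bool:
			counts["bool"]++
		case nil: // JSON null
			counts["null"]++
		}
	}
}

func main() {
	counts, err := countTokens(`{"a":[1,"x",null,true]}`)
	fmt.Println(counts, err)
}
```

Every string or number passing through this loop is boxed into the Token interface, which is the allocation cost described in limitation 4.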
The "json" package should operate in a truly streaming manner by default when writing to or reading from an io.Writer or io.Reader. Buffering the entire JSON value defeats the point of using an io.Reader or io.Writer. Use cases that want to avoid outputting JSON in the event of an error should call Marshal instead and only write the output if the error is nil. Unfortunately, the "json" package cannot switch to streaming by default since this would be a breaking behavioral change, suggesting that a v2 "json" package is needed to accomplish this goal.
Behavioral flaws
Various behavioral flaws have been identified with the "json" package:
Improper handling of JSON syntax: Over the years, JSON has seen increased amounts of standardization (RFC 4627, RFC 7159, RFC 7493, and RFC 8259) in order for JSON-based protocols to properly communicate. Generally speaking, the specifications have gotten more strict over time since loose guarantees lead to implementations disagreeing about the meaning of a particular JSON value.
The "json" package currently allows invalid UTF-8, while the latest internet standard (RFC 8259) requires valid UTF-8. The default behavior should at least be compliant with RFC 8259, which would require that invalid UTF-8 be rejected as an error.
The "json" package currently allows for duplicate object member names. RFC 8259 specifies that duplicate object names result in unspecified behavior (e.g., an implementation may take the first value, last value, ignore it, reject it, etc.). Fundamentally, the presence of a duplicate object name results in a JSON value without any universally agreed upon semantic (#43664). This could be exploited by attackers in security applications and has been exploited in practice with severe consequences. The default behavior should err on the side of safety and reject duplicate names as recommended by RFC 7493.
While the default behavior should be more strict, we should also provide an option for backwards compatibility to opt-in to the prior behavior of allowing invalid UTF-8 and/or allowing duplicate names.
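Both v1 leniencies described above can be reproduced with a few lines against the existing "encoding/json" package. This is only a sketch of current v1 behavior; the helper names lastDuplicateWins and invalidUTF8Replaced are made up for the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastDuplicateWins unmarshals an object that repeats the name "a".
// v1 accepts the input and the later member silently overwrites the earlier one.
func lastDuplicateWins() (int, error) {
	var v struct {
		A int `json:"a"`
	}
	err := json.Unmarshal([]byte(`{"a":1,"a":2}`), &v)
	return v.A, err
}

// invalidUTF8Replaced unmarshals a JSON string containing the invalid
// byte 0xff. v1 accepts it and substitutes the Unicode replacement
// character U+FFFD instead of reporting an error.
func invalidUTF8Replaced() (string, error) {
	var s string
	err := json.Unmarshal([]byte("\"a\xffb\""), &s)
	return s, err
}

func main() {
	fmt.Println(lastDuplicateWins())
	s, err := invalidUTF8Replaced()
	fmt.Printf("%+q %v\n", s, err)
}
```

Under the proposed v2 defaults, both inputs would be rejected unless the caller opts back in to the v1 behavior.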
Case-insensitive unmarshaling: When unmarshaling, JSON object names are paired with Go struct field names using a case-insensitive match (#14750). This is a surprising default, a potential security vulnerability, and a performance limitation. It may be a security vulnerability when an attacker provides an alternate encoding that a security tool does not know to check for. It is also a performance limitation since matching upon a case-insensitive name cannot be performed using a trivial Go map lookup.
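The case-insensitive matching described above is easy to observe with the existing v1 package. In this sketch, the user type and decodeAdmin helper are hypothetical; the point is that a differently-cased name still reaches the field, even when a later duplicate uses unusual casing:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type user struct {
	Admin bool `json:"admin"`
}

// decodeAdmin unmarshals input with v1 "encoding/json" and reports the
// resulting value of the Admin field.
func decodeAdmin(input string) (bool, error) {
	var u user
	err := json.Unmarshal([]byte(input), &u)
	return u.Admin, err
}

func main() {
	// The name does not match the tag exactly, yet the field is set.
	fmt.Println(decodeAdmin(`{"ADMIN":true}`))
	// A tool that only inspects the exact name "admin" would see false here.
	fmt.Println(decodeAdmin(`{"admin":false,"AdMiN":true}`))
}
```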
Inconsistent calling of type-defined methods: Due to "json" and its use of Go reflection, the MarshalJSON and UnmarshalJSON methods cannot be called if the underlying value is not addressable (#22967, #27722, #33993, #55890). This is surprising to users when their declared MarshalJSON and UnmarshalJSON methods are not called when the underlying Go value was retrieved through a Go map, interface, or another non-addressable value. The "json" package should consistently and always call the user-defined methods regardless of addressability. As an implementation detail, non-addressable values can always be made addressable by temporarily boxing them on the heap. This could arguably be considered a bug and be fixed in the current "json" package. However, previous attempts at fixing this resulted in the changes being reverted because it broke too many targets implicitly depending on the inconsistent calling behavior.
Inconsistent merge semantics: When unmarshaling into a non-empty Go value, the behavior is inconsistent about whether it clears the target, resets but reuses the target memory, and/or whether it merges into the target (#27172, #31924, #26946). Most oddly, when unmarshaling into a non-nil Go slice, the unused elements between the length and capacity are merged into without being zeroed first (#21092). The merge semantics of "json" came about organically without much thought given to a systematic approach to merging, leading to fragmented and inconsistent behavior.
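The inconsistent calling of type-defined methods described above can be reproduced with the existing v1 package. In this sketch, celsius and marshal are hypothetical names; the type declares MarshalJSON on a pointer receiver, and whether v1 calls it depends on how the value is reached:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type celsius struct{ Deg int }

// MarshalJSON has a pointer receiver, so v1 can only call it when it has a
// pointer or an addressable value.
func (c *celsius) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf(`"%dC"`, c.Deg)), nil
}

// marshal returns the v1 JSON encoding of v as a string.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(&celsius{21}))                     // pointer: method is called
	fmt.Println(marshal([]celsius{{21}}))                  // slice element: addressable, method is called
	fmt.Println(marshal(celsius{21}))                      // bare value: method is skipped
	fmt.Println(marshal(map[string]celsius{"t": {21}}))    // map value: method is skipped
}
```

The same type therefore serializes in two different shapes depending on the container, which is the surprise that v2 aims to remove.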
Inconsistent error values: There are three classes of errors that can occur when handling JSON:
io.Writer or io.Reader. This class of errors never occurs when marshaling to or unmarshaling from a []byte.
The "json" package is currently inconsistent about whether it returns structured or unstructured errors. It is currently impossible to reliably detect each class of error.
These behavioral flaws of "json" cannot be changed without being a breaking change. Options could be added to specify different behavior, but that would be unfortunate since the desired behavior is not the default behavior. Changing the default behavior suggests the need for a v2 "json" package.
Proposal
The analysis above suggests that a new major version of the "json" package is necessary and worthwhile. In this section, we propose a rough draft of what a new major version could look like. Henceforth, we will refer to the existing "encoding/json" package as v1, and a hypothetical new major version as v2. This is a draft proposal as the proposed API and behavior are subject to change based on community discussion.
Goals
Let us define some goals for v2:
Mostly backwards compatible: If possible, v2 should aim to be mostly compatible with v1 in terms of both API and default behavior to ease migration. For example, the Marshal and Unmarshal functions are the most widely used declarations in v1. It is sensible for equivalent functionality in v2 to be named the same and have mostly the same signature. Behaviorally, we should aim for 95% to 99% backwards compatibility. We do not aim for 100% compatibility since we want the freedom to break certain behaviors that are now considered to have been a mistake.
More correct: JSON standardization has become increasingly more strict over time due to interoperability issues. The default serialization should prioritize correctness.
More performant: JSON serialization is widely used and performance gains translate to real-world resource savings. However, performance is secondary to correctness. For example, rejecting duplicate object names will hurt performance, but is the more correct behavior to have.
More flexible: We should aim to provide the most flexible features that address most usages. We do not want to overfit v2 to handle every possible use case. The provided features should be orthogonal in nature such that any combination of features results in as few surprising edge cases as possible.
Easy to use (hard to misuse): The API should aim to make the common case easy and the less common case at least possible. The API should avoid behavior that goes contrary to user expectation, which may result in subtle bugs.
Avoid unsafe: JSON serialization is used by many internet-facing Go services. It is paramount that untrusted JSON inputs cannot result in memory corruption. Consequently, standard library packages generally avoid the use of package "unsafe" even if it could provide a performance boost. We aim to preserve this property.
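To make the correctness goal concrete, the following is a minimal sketch of how the existing v1 package treats duplicate object names today. It uses only the v1 "encoding/json" API; the helper name lastValue is ours and exists only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastValue decodes input with the v1 "encoding/json" package and returns
// the value stored for the "id" member. The helper name is hypothetical.
func lastValue(input string) (int, error) {
	var v struct {
		ID int `json:"id"`
	}
	err := json.Unmarshal([]byte(input), &v)
	return v.ID, err
}

func main() {
	// The object repeats the name "id". v1 reports no error and silently
	// keeps the value from the last occurrence.
	id, err := lastValue(`{"id": 1, "id": 2}`)
	fmt.Println(id, err) // 2 <nil>
}
```

Rejecting such input costs extra bookkeeping per object, which is the performance trade-off the goals above accept in favor of correctness.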
Overview
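JSON serialization can be broken down into two primary components:
syntactic functionality that is concerned with processing JSON based on its grammar, and
semantic functionality that determines the meaning of JSON values as Go values and vice versa.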
We use the terms "encode" and "decode" to describe syntactic functionality and the terms "marshal" and "unmarshal" to describe semantic functionality.
We aim to provide a clear distinction between functionality that is purely concerned with encoding versus that of marshaling. For example, it should be possible to encode a stream of JSON tokens without needing to marshal a concrete Go value representing them. Similarly, it should be possible to decode a stream of JSON tokens without needing to unmarshal them into a concrete Go value.
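As a rough illustration of purely syntactic processing, the sketch below walks a JSON document token by token without ever unmarshaling it into a concrete Go value. It uses the v1 Decoder.Token method as a stand-in, since we do not assume the proposed "jsontext" package is available; the helper name countTokens is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countTokens reads the input one token at a time with the v1 Decoder.Token
// method and reports how many tokens it saw. No Go value representing the
// document as a whole is ever built.
func countTokens(input string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	n := 0
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return n, nil // end of the token stream
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	// Delimiters, object names, and string values each count as one token.
	n, err := countTokens(`{"name":"gopher","tags":["a","b"]}`)
	fmt.Println(n, err) // 9 <nil>
}
```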
In v2, we propose that there be two packages: "jsontext" and "json". The "jsontext" package is concerned with syntactic functionality, while the "json" package is concerned with semantic functionality. The "json" package will be implemented in terms of the "jsontext" package. In order for "json" to marshal from and unmarshal into arbitrary Go values, it must have a dependency on the "reflect" package. In contrast, the "jsontext" package will have a relatively light dependency tree and be suitable for applications (e.g., TinyGo, GopherJS, WASI, etc.) where binary bloat is a concern.
This diagram provides a high-level overview of the v2 API. Purple blocks represent types, while blue blocks represent functions or methods. The direction of the arrows represents the approximate flow of data. The bottom half (as implemented by the "jsontext" package) of the diagram contains functionality that is only concerned with syntax, while the upper half (as implemented by the "json" package) contains functionality that assigns semantic meaning to syntactic data handled by the bottom half.
The "jsontext" package
The jsontext package provides functionality to process JSON purely according to the grammar. This package will have a small dependency tree such that it results in minimal binary bloat. Most notably, it does not depend on Go reflection.
Overview
The basic API consists of the following:
Values and Tokens
The primary data types for interacting with JSON are Kind, Value, and Token.
The Kind is an enumeration that describes the kind of a value or token.
A Value is the raw representation of a single JSON value, which can represent entire array or object values. It is analogous to the v1 RawMessage type.
The Compact and Indent methods operate similarly to the v1 Compact and Indent functions.
The Canonicalize method canonicalizes the JSON value according to the JSON Canonicalization Scheme as defined in RFC 8785.
A Token represents a lexical JSON token, which cannot represent entire array or object values. It is analogous to the v1 Token type, but is designed to be allocation-free by being an opaque struct type.
Encoder and Decoder
The Encoder and Decoder types provide the functionality for encoding to or decoding from an io.Writer or an io.Reader. An Encoder or Decoder can be constructed with NewEncoder or NewDecoder using default options.
The Encoder is a streaming encoder from raw JSON tokens and values. It is used to write a stream of top-level JSON values, each terminated with a newline character.
The Decoder is a streaming decoder for raw JSON tokens and values. It is used to read a stream of top-level JSON values, each separated by optional whitespace characters.
Some methods common to both Encoder and Decoder report information about the current automaton state.
Options
The behavior of Encoder and Decoder may be altered by passing options to NewEncoder and NewDecoder, which take in a variadic list of options.
The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" packages.
Later options specified in the variadic list passed to NewEncoder and NewDecoder take precedence over prior option values. For example, NewEncoder(AllowInvalidUTF8(false), AllowInvalidUTF8(true)) results in AllowInvalidUTF8(true) taking precedence.
Options that do not affect the operation in question are ignored. For example, passing Expand to NewDecoder does nothing.
The WithIndent and WithIndentPrefix flags configure the appearance of whitespace in the output. Their semantics are identical to the v1 Encoder.SetIndent method.
Errors
Errors due to non-compliance with the JSON grammar are reported as SyntacticError.
Errors due to I/O are returned as an opaque error that unwraps to the original error returned by the failing io.Reader.Read or io.Writer.Write call.
The v2 "json" package
The v2 "json" package provides functionality to marshal or unmarshal JSON data from or into Go value types. This package depends on "jsontext" to process JSON text and the "reflect" package to dynamically introspect Go values at runtime.
Overview
The basic API consists of the following:
The Marshal and Unmarshal functions mostly match the signature of the same functions in v1; however, their behavior differs.
The MarshalWrite and UnmarshalRead functions are equivalent functionality that operate on an io.Writer and io.Reader instead of []byte. The UnmarshalRead function consumes the entire input until io.EOF and reports an error if any invalid tokens appear after the end of the JSON value (#36225).
The MarshalEncode and UnmarshalDecode functions are equivalent functionality that operate on a *jsontext.Encoder and *jsontext.Decoder instead of []byte.
Default behavior
The marshal and unmarshal logic in v2 is mostly identical to v1 with the following changes:
| v1 | v2 |
| --- | --- |
| A field with omitempty is omitted if the field value is an empty Go value, which is defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string. | A field with omitempty is omitted if the field value would encode as an empty JSON value, which is defined as a JSON null, or an empty JSON string, object, or array (more discussion). |
| The string option does affect Go bools and strings. | The string option does not affect Go bools and strings. |
| The string option does not recursively affect sub-values of the Go field value. | The string option does recursively affect sub-values of the Go field value. |
| The string option sometimes accepts a JSON null escaped within a JSON string. | The string option never accepts a JSON null escaped within a JSON string. |
| MarshalJSON and UnmarshalJSON methods declared on a pointer receiver are inconsistently called. | MarshalJSON and UnmarshalJSON methods declared on a pointer receiver are consistently called. |
| time.Duration is represented as a JSON number containing the decimal number of nanoseconds. | time.Duration is represented as a JSON string containing the formatted duration (e.g., "1h2m3.456s"). |
See here for details about every change.
Every behavior change will be configurable through options, which will be a critical part of how we achieve v1-to-v2 interoperability.
See here for more discussion.
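One further difference noted earlier is that UnmarshalRead consumes the entire input and reports invalid tokens after the JSON value (#36225). For contrast, this minimal sketch shows the v1 behavior that motivates it: the v1 streaming Decoder stops after the first value, whereas v1 Unmarshal validates the whole input. The helper names decodeErr and unmarshalErr are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// input is one valid JSON object followed by data that is not valid JSON.
const input = `{"a":1} trailing`

// decodeErr uses the v1 streaming Decoder, which stops reading once the
// first top-level value is complete.
func decodeErr() error {
	var v map[string]int
	return json.NewDecoder(strings.NewReader(input)).Decode(&v)
}

// unmarshalErr uses v1 Unmarshal, which checks the entire input.
func unmarshalErr() error {
	var v map[string]int
	return json.Unmarshal([]byte(input), &v)
}

func main() {
	fmt.Println("Decoder.Decode error:", decodeErr())            // <nil>
	fmt.Println("Unmarshal reports error:", unmarshalErr() != nil) // true
}
```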
Struct tag options
Similar to v1, v2 also supports customized representation of Go struct fields through the use of struct tags. As before, the json tag will be used. The following tag options are supported:
omitzero: When marshaling, the "omitzero" option specifies that the struct field should be omitted if the field value is zero, as determined by the "IsZero() bool" method, if present, otherwise based on whether the field is the zero Go value (per reflect.Value.IsZero). This option has no effect when unmarshaling. (example)
omitempty: When marshaling, the "omitempty" option specifies that the struct field should be omitted if the field value would have been encoded as a JSON null, empty string, empty object, or empty array. This option has no effect when unmarshaling. (example)
string: The "string" option specifies that StringifyNumbers be set when marshaling or unmarshaling a struct field value. This causes numeric types to be encoded as a JSON number within a JSON string, and to be decoded from either a JSON number or a JSON string containing a JSON number. This extra level of encoding is often necessary since many JSON parsers cannot precisely represent 64-bit integers.
nocase: When unmarshaling, the "nocase" option specifies that if the JSON object name does not exactly match the JSON name for any of the struct fields, then it attempts to match the struct field using a case-insensitive match that also ignores dashes and underscores. (example)
inline: The "inline" option specifies that the JSON object representation of this field is to be promoted as if it were specified in the parent struct. It is the JSON equivalent of Go struct embedding. A Go embedded field is implicitly inlined unless an explicit JSON name is specified. The inlined field must be a Go struct that does not implement Marshaler or Unmarshaler. Inlined fields of type jsontext.Value and map[string]T are called “inlined fallbacks”, as they can represent all possible JSON object members not directly handled by the parent struct. Only one inlined fallback field may be specified in a struct, while many non-fallback fields may be specified. This option must not be specified with any other tag option. (example; see also #6213)
unknown: The "unknown" option is a specialized variant of the inlined fallback to indicate that this Go struct field contains any number of “unknown” JSON object members. The field type must be a jsontext.Value or map[string]T. If DiscardUnknownMembers is specified when marshaling, the contents of this field are ignored. If RejectUnknownMembers is specified when unmarshaling, any unknown object members are rejected even if a field exists with the "unknown" option. This option must not be specified with any other tag option. (example)
format: The "format" option specifies a format flag used to specialize the formatting of the field value. The option is a key-value pair specified as "format:value" where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type. (example)
New in v2. The "format" option provides a general way to customize formatting of arbitrary types.
[]byte and [N]byte types accept "format" values of either "base64", "base64url", "base32", "base32hex", "base16", or "hex", where it represents the binary bytes as a JSON string encoded using the specified format in RFC 4648. It may also be "array" to treat the slice or array as a JSON array of numbers. The "array" format exists for backwards compatibility since the default representation of an array of bytes now uses Base-64.
float32 and float64 types accept a "format" value of "nonfinite", where NaN and infinity are represented as JSON strings.
Slice types accept a "format" value of "emitnull" to marshal a nil slice as a JSON null instead of an empty JSON array. (more discussion).
Map types accept a "format" value of "emitnull" to marshal a nil map as a JSON null instead of an empty JSON object. (more discussion).
The time.Time type accepts a "format" value which may either be a Go identifier for one of the format constants (e.g., "RFC3339") or the format string itself to use with time.Time.Format or time.Parse (#21990). It can also be "unix", "unixmilli", "unixmicro", or "unixnano" to be represented as a decimal number reporting the number of seconds (or milliseconds, etc.) since the Unix epoch.
The time.Duration type accepts a "format" value of "sec", "milli", "micro", or "nano" to represent it as the number of seconds (or milliseconds, etc.) formatted as a JSON number. This exists for backwards compatibility since the default representation now uses a string representation (e.g., "53.241s"). If the format is "base60", it is encoded as a JSON string using the "H:MM:SS.SSSSSSSSS" representation.
The "omitzero" and "omitempty" options are similar. The former is defined in terms of the Go type system, while the latter is defined in terms of the JSON type system. Consequently, they behave differently in some circumstances. For example, only a nil slice or map is omitted under "omitzero", while an empty slice or map is omitted under "omitempty" regardless of nilness. The "omitzero" option is useful for types that have a well-defined zero value (e.g., netip.Addr) or that have an IsZero method (e.g., time.Time).
Type-specified customization
Go types may customize their own JSON representation by implementing certain interfaces that the "json" package knows to look for:
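As a minimal sketch of type-specified customization, the example below implements the existing v1 method signatures (MarshalJSON and UnmarshalJSON) on a small type. Only the v1 interfaces are shown because they can be run against today's "encoding/json"; the Level type and the roundTrip helper are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Level is a hypothetical type that chooses its own JSON representation.
type Level int

const (
	Low Level = iota
	High
)

// MarshalJSON implements the v1 json.Marshaler interface.
func (l Level) MarshalJSON() ([]byte, error) {
	switch l {
	case Low:
		return []byte(`"low"`), nil
	case High:
		return []byte(`"high"`), nil
	}
	return nil, fmt.Errorf("unknown level %d", int(l))
}

// UnmarshalJSON implements the v1 json.Unmarshaler interface.
func (l *Level) UnmarshalJSON(b []byte) error {
	switch string(b) {
	case `"low"`:
		*l = Low
	case `"high"`:
		*l = High
	default:
		return fmt.Errorf("unknown level %s", b)
	}
	return nil
}

// roundTrip marshals l and unmarshals the result back into a Level.
func roundTrip(l Level) (string, Level, error) {
	b, err := json.Marshal(l)
	if err != nil {
		return "", 0, err
	}
	var back Level
	err = json.Unmarshal(b, &back)
	return string(b), back, err
}

func main() {
	text, back, err := roundTrip(High)
	fmt.Println(text, back == High, err) // "high" true <nil>
}
```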
The v1 interfaces are supported in v2 to provide greater degrees of backward compatibility. If a type implements both v1 and v2 interfaces, the v2 variant takes precedence. The v2 interfaces operate in a purely streaming manner. This API can provide dramatic performance improvements. For example, switching from UnmarshalJSON to UnmarshalJSONV2 for spec.Swagger resulted in an ~40x performance improvement.
Caller-specified customization
In addition to Go types being able to specify their own JSON representation, the caller of the marshal or unmarshal functionality can also specify their own JSON representation for specific Go types (#5901). Caller-specified customization takes precedence over type-specified customization.
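The precedence rule can be pictured with a small sketch. This is not the proposed API: the registry type below is a hypothetical stand-in built on v1 "encoding/json" and "reflect", and it only handles top-level values. It shows the idea that a caller's per-type function is consulted first, and the type's own MarshalJSON is used only when the caller supplied nothing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// registry is a hypothetical stand-in for caller-specified marshalers:
// it maps a Go type to a function chosen by the caller.
type registry map[reflect.Type]func(v any) ([]byte, error)

// marshal consults the caller's registry first and falls back to the
// type's own behavior (here, v1 json.Marshal) when no override exists.
func (r registry) marshal(v any) ([]byte, error) {
	if fn, ok := r[reflect.TypeOf(v)]; ok {
		return fn(v)
	}
	return json.Marshal(v)
}

// Celsius specifies its own representation through MarshalJSON.
type Celsius float64

func (c Celsius) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf(`"%.1fC"`, float64(c))), nil
}

// render marshals 21.5 degrees with and without a caller override.
func render() (typeSpecified, callerSpecified string) {
	override := registry{
		reflect.TypeOf(Celsius(0)): func(v any) ([]byte, error) {
			// The caller prefers a plain JSON number.
			return json.Marshal(float64(v.(Celsius)))
		},
	}
	a, _ := registry{}.marshal(Celsius(21.5))
	b, _ := override.marshal(Celsius(21.5))
	return string(a), string(b)
}

func main() {
	a, b := render()
	fmt.Println(a, b) // "21.5C" 21.5
}
```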
The MarshalFuncV1 and UnmarshalFuncV1 functions can always be implemented in terms of the v2 variants, which calls into question their utility. There are several reasons for providing them:
To maintain symmetry and consistency with the method interfaces (which must provide both v1 and v2 variants).
To make it interoperate well with existing functionality that operates on the v1 signature. For example, to integrate the v2 "json" package with proper JSON serialization of protocol buffers, one could construct a type-specific marshaler using json.MarshalFuncV1(protojson.Marshal), where protojson.Marshal provides the JSON representation for all types that implement proto.Message (example).
Caller-specified customization is a powerful feature.
Options
Options may be specified that configure how marshal and unmarshal operates:
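To make the idea of a composable options value concrete, here is a purely illustrative sketch in which a single value stands for a whole set of options and later settings override earlier ones. Every identifier below (options, join, get, allowInvalidUTF8, deterministic) is hypothetical and is not the API of the "json" or "jsontext" packages; the real Options type is opaque.

```go
package main

import "fmt"

// options is a hypothetical set of named boolean settings.
type options map[string]bool

// join merges option sets from left to right so that a later value for
// the same name overrides an earlier one.
func join(sets ...options) options {
	out := options{}
	for _, s := range sets {
		for name, v := range s {
			out[name] = v
		}
	}
	return out
}

// get reports the value of a named option and whether it was set at all.
func (o options) get(name string) (value, ok bool) {
	value, ok = o[name]
	return value, ok
}

// The constructors below only mimic the shape of option constructors.
func allowInvalidUTF8(v bool) options { return options{"AllowInvalidUTF8": v} }
func deterministic(v bool) options    { return options{"Deterministic": v} }

func main() {
	merged := join(allowInvalidUTF8(false), deterministic(true), allowInvalidUTF8(true))

	v, ok := merged.get("AllowInvalidUTF8")
	fmt.Println(v, ok) // true true (the later setting wins)

	_, ok = merged.get("EscapeForHTML")
	fmt.Println(ok) // false (never specified)
}
```

Because the merged value is itself a set, a function that receives it needs to inspect only one value, which mirrors the ergonomic argument made for passing a single options value to user-specified methods.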
The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" packages. This is exactly the same Options type as the one in the "jsontext" package.
The same Options type is used for both Marshal and Unmarshal as some options affect both operations.
The MarshalJSONV2, UnmarshalJSONV2, MarshalFuncV2, and UnmarshalFuncV2 methods and functions take in a singular Options value instead of a variadic list because the Options type can represent a set of options. The caller (which is the "json" package) can coalesce a list of options before calling the user-specified method or function. Being given a single Options value is more ergonomic for the user as there is only one options value to introspect with GetOption.
While the JoinOptions constructor technically removes the need for NewEncoder, NewDecoder, Marshal, and Unmarshal to take in a variadic list of options, it is more ergonomic for them to be variadic as the user can more readily specify a list of options without needing to call JoinOptions first.
Errors
Errors due to the inability to correlate JSON data with Go data are reported as SemanticError.
Experimental implementation
The draft proposal has been implemented by the github.com/go-json-experiment/json module.
Stability
We have confidence in the correctness and performance of the module as it has been used internally at Tailscale in various production services. However, the module is an experiment and breaking changes are expected to occur based on feedback in this discussion. It should not be depended upon by publicly available code; otherwise, we can run into situations where large programs fail to build.
Consider the following situation:
Open source program P depends on modules A and B.
Module A depends on go-json-experiment/json@v0.5.0.
Module B depends on go-json-experiment/json@v0.8.0.
Breaking changes occurred between v0.5.0 and v0.8.0.
Go's minimal version selection dictates that v0.8.0 be selected to build program P.
Using v0.8.0 breaks module A since it is using the API for v0.5.0, which is not compatible.
If open source code does use go-json-experiment, we recommend that use of it be guarded by a build tag or the entire module be forked and vendored as a dependency.
Performance
Due to a combination of both a more efficient implementation and also changes to the external API to better support performance, the experimental v2 implementation is generally as fast or slightly faster for marshaling and dramatically faster for unmarshaling.
See the benchmarks for results.