Mistakes in Go: JSON (un)marshalling
Published on — Filed under golang, protip
Disclaimer: The do's and don'ts in this article strike me as obvious advice but, from time to time, one does run into these...
Suppose you want to decode two JSON documents and do something with them. Here's how it'd look (without error checking for the sake of brevity but you should always check your errors):
var alice Person
var bob Person
json.Unmarshal(aliceBytes, &alice)
doSomethingWithPerson(alice)
json.Unmarshal(bobBytes, &bob)
doSomethingWithPerson(bob)
Clean. One struct address per unmarshal, as the universal law mandates.
A not-so-unreasonable behavior
Now suppose, for whatever reason — cosmic rays, Mercury retrograde, the prohibitive cost of memory these days — you decided to write this instead:
var p Person
json.Unmarshal(aliceBytes, &p)
doSomethingWithPerson(p)
json.Unmarshal(bobBytes, &p)
doSomethingWithPerson(p)
Still seems correct, right? The answer lies somewhere between "why the hell would you do that" and "maybe it'll do what you want".
Any information read into p will be overwritten on the second call to json.Unmarshal. The problem is the information *not* read into p by the second call.
If Person is this struct:
type Person struct {
Name string `json:"name"`
Age int `json:"age"`
}
... then, decoding from two JSON documents that look like this:
{"name": "alice", "age": 30}
{"name": "bob"}
... while reusing the same struct address &p, the age in the first document will "leak" into the second: age is not present in bob's JSON, so the Age field is never written to.
A more familiar face
The famous myth that declaring variables outside of the loop is more performant than per-iteration allocation can cause someone to write this:
var p Person
people := []Person{}
for _, r := range jsonArray {
if err := json.Unmarshal(r, &p); err == nil {
people = append(people, p)
}
}
Using the two documents above, Bob is now also 30 years old:
// fmt.Println(people)
[{alice 30} {bob 30}]
The lesson is simple: narrow down the scopes of your variables as much as possible and don't try to outsmart the compiler.
But it gets worse...
The insidious nature of pointers
Suppose now we have three JSON records, exactly like this, in this order:
{"name": "alice", "age": 30}
{"age": 10, "name": 3} /* Can't be converted to Person */
{"name": "bob"}
Let's go back to that reading loop:
jsonArray := [][]byte{
[]byte(aliceJSON),
[]byte(brokenJSON),
[]byte(bobJSON),
}
// ...
for _, r := range jsonArray {
if err := json.Unmarshal(r, &p); err == nil {
people = append(people, p)
}
}
fmt.Println(people)
Care to take a wild guess at the output? How old is Bob going to be?
[{alice 30} {bob 10}]
That's right. Even though the second record is broken JSON and Unmarshal returns an error, and even though we handle the error and skip the record, the side effects of its partial unmarshal still "leak" into the next iteration of the loop.
Note
The quotes around leak are intentional; the code is doing exactly what it should, this is a case of PEBKAC.
Long story short
- Don't reuse the same pointer for multiple JSON unmarshal passes
- Don't try to outsmart the compiler
- Narrow the scope of your variables as much as possible
- (Tangential but I can't stress this enough so I just drop it whenever I can:) Don't use pointers unless you really need to — immutability is your friend