Handling of duplicate keys in JSON #20949

issue laanwj opened this issue on January 16, 2021
  1. laanwj commented at 11:47 AM on January 16, 2021: member

    Our handling of duplicate keys in JSON dictionaries differs from that of other languages such as Python and JavaScript (and most others I know of).

    Those languages parse the JSON into hash tables, where the last assignment to a key takes effect, e.g.:

    >>> json.loads('{"a": "first", "a": "second"}')
    {'a': 'second'}
    

    Our univalue, though, stores every (key,value) pair in an array, and when a key is queried it returns the first associated value.

    #include <iostream>
    #include <univalue.h>
    
    int main() {
        UniValue val;
        val.read("{\"a\": \"first\", \"a\": \"second\"}");
        std::cout << val["a"].write() << std::endl;
        return 0;
    }
    
    $ g++ unitest.cpp -o unitest -Iunivalue/include -L src/univalue/.libs -lunivalue
    $ ./unitest 
    "first"
    

    This is not generally a problem, but it can create subtle interoperability issues. For example, a proxy that validates allowed JSON-RPC commands and only lets through certain command/argument combinations (without taking this into account) could be misled in this way.

    Either reversing this order or adding a "strict mode" that rejects duplicate keys might make sense. I don't know.

    (issue originally reported by Florian Mathieu)

  2. laanwj added the label RPC/REST/ZMQ on Jan 16, 2021
  3. prusnak referenced this in commit a715d489dc on Jan 16, 2021
  4. prusnak referenced this in commit c79576dd75 on Jan 16, 2021
  5. prusnak referenced this in commit 14c735bde4 on Jan 16, 2021
  6. prusnak referenced this in commit 8e7fc4f871 on Jan 16, 2021
  7. prusnak commented at 6:36 PM on January 16, 2021: contributor
  8. prusnak referenced this in commit d492dd4eee on Jan 16, 2021
  9. luke-jr commented at 5:13 PM on January 17, 2021: member

    I think it would be best to error if duplicate keys are found where not supported.

  10. laanwj commented at 2:50 PM on January 19, 2021: member

    Right, detecting and failing on duplicated keys seems to be the preferred way forward here.

    Changing the behaviour in univalue is technically the easiest option, but I doubt there will ever be agreement on what the right behavior is, and it would break software that makes the (currently correct) assumption.

  11. prusnak commented at 2:58 PM on January 19, 2021: contributor

    Wouldn't it make sense to drop UniValue completely if duplicate keys are unwanted? Even simple std::map with std::variant could suffice in that case, no?

  12. laanwj commented at 3:02 PM on January 19, 2021: member

    No, there are plenty of places where the JSON type makes a difference. Dropping our JSON parsing library over this seems bizarre.

  13. ubbabeck commented at 4:32 PM on January 27, 2026: none

    Hi,

    I was investigating whether this issue still requires work and wanted to share my findings. I am fairly new to the bitcoin core codebase so excuse my ignorance.

    Current State

    Commit 6bd1d20b (PR #26628) added duplicate detection at the RPC layer in https://github.com/bitcoin/bitcoin/blob/27aeeff630147cca35488752c01aaf5750b69fbc/src/rpc/server.cpp#L380.

    However, UniValue itself still:

    • Accepts duplicate keys during parsing
    • Stores both values internally but only returns the first
    • Makes duplicate values inaccessible (silent data loss)

    Where Should Detection Happen?

    Option 1: Parser Level (UniValue)

    • Pros: Complete protection everywhere, prevents data loss, consistent behavior
    • Cons: Breaks the test at rpc_tests.cpp:96-97 (which expects an RPC-specific error), generic error messages, performance cost

    Option 2: RPC Layer (Current)

    • Pros: Already implemented, specific error messages, non-breaking
    • Cons: Doesn't protect nested objects like {"outer": {"key": 1, "key": 2}}, inconsistent behavior across contexts
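    The nested-object gap can be illustrated with Python's json module. The find_duplicate_keys hook below is an illustrative helper (not Bitcoin Core code); it uses the real object_pairs_hook parameter of json.loads, which is invoked for every object at every nesting level:

```python
import json

def find_duplicate_keys(pairs):
    # json.loads calls this hook for every object it decodes, at every
    # nesting level, so duplicates inside nested objects are caught too.
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen.add(key)
    return dict(pairs)

nested = '{"outer": {"key": 1, "key": 2}}'

# Default parsing silently keeps the last value (silent data loss),
# and a check on top-level keys alone cannot fire: "outer" is unique.
print(json.loads(nested))  # {'outer': {'key': 2}}

# Parser-level detection catches the nested duplicate:
try:
    json.loads(nested, object_pairs_hook=find_duplicate_keys)
except ValueError as exc:
    print(exc)  # duplicate key: key
```

    The same trade-off noted under Option 1 applies here: the error is generic and carries no RPC-specific context.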

    Questions for Discussion

    1. Should UniValue (general JSON library) enforce unique keys, or is this a consumer policy?
    2. Does the nested object vulnerability need parser-level protection?
    3. Should we add a strict parsing mode option to UniValue, as suggested in the issue description by @laanwj?

    Proposed Solutions

    I am not aware of an existing plan for the UniValue implementation, but here are two possible solutions I see.

    Short term: Document current behavior and limitations

    Long term: Consider adding optional strict mode

    val.read(json, UniValue::STRICT_MODE);  // Rejects duplicates
    

    This allows RPC to use strict mode while maintaining backward compatibility for other uses.

    Happy to work on whichever approach maintainers prefer. I have a working parser-level implementation but it breaks the existing test expectations, suggesting RPC-layer detection was the intended design.

  14. maflcko commented at 5:11 PM on January 27, 2026: member

    Is there a use-case for silently continuing anywhere? I haven't looked deeper, but I think univalue should just reject duplicates at the lowest level while parsing or constructing any dict.

  15. ubbabeck commented at 7:00 PM on January 27, 2026: none

    Whether Bitcoin Core has any use-cases for silently continuing, I do not know; my assumption is no, but I will look further into this.

    When it comes to JSON, even RFC 8259 isn't clear on what is correct and only states: "The names within an object SHOULD be unique." Though from how I understand and work with JSON it's mentally mapped as a MUST, so I agree with @maflcko's statement that univalue should reject duplicates.

    One use-case I found for duplicate keys is for merging of objects, but I have never encountered this in the wild during my career.
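    That merge use-case can be emulated in Python with json.loads's object_pairs_hook; merge_duplicates is a hypothetical helper written for this sketch, not an established convention:

```python
import json

def merge_duplicates(pairs):
    # Treat a repeated key as a merge request: later objects are merged
    # into earlier ones, while later scalars simply overwrite.
    out = {}
    for key, value in pairs:
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key].update(value)
        else:
            out[key] = value
    return out

doc = '{"cfg": {"a": 1}, "cfg": {"b": 2}}'
print(json.loads(doc, object_pairs_hook=merge_duplicates))
# {'cfg': {'a': 1, 'b': 2}}
```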

    Considering how to implement this: do you think it makes more sense to throw a runtime error, as is done in univalue_get.cpp? https://github.com/bitcoin/bitcoin/blob/c0e6556e4f51b454c3f6e9069044761c34e99f81/src/univalue/lib/univalue_get.cpp#L56

    This would allow developers to identify and handle exceptions effectively, instead of returning a bool as is done today.

  16. maflcko commented at 7:58 PM on January 27, 2026: member

    I would have recommended starting by adding sanity checks to pushKV (and friends), because they are only used internally and, I think, assumed/expected not to add duplicate keys, so they could throw a runtime error on a violation.

    However, this does not seem so trivial, because duplicate checking could bring the calls back to O(n^2), which apparently makes a difference in some cases: 710a7136f93133bf256d37dc8c8faf5a6b9ba89d
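    One way to avoid that quadratic cost would be to keep a hash-based index of keys alongside the ordered storage. The DictSketch class below is a stand-in sketch of the idea in Python, not the real UniValue API: the list preserves insertion order (as UniValue's internal vector does), while the auxiliary set makes each duplicate check O(1) on average, at the cost of extra memory for the key index.

```python
class DictSketch:
    """Stand-in for UniValue's object storage (illustrative only)."""

    def __init__(self):
        self.entries = []   # (key, value) pairs in insertion order
        self._seen = set()  # auxiliary index for O(1) duplicate checks

    def push_kv(self, key, value):
        # Without the set, each insert would scan self.entries, making
        # n inserts O(n^2); the set keeps the whole run near O(n).
        if key in self._seen:
            raise RuntimeError(f"duplicate key: {key}")
        self._seen.add(key)
        self.entries.append((key, value))

d = DictSketch()
d.push_kv("a", "first")
try:
    d.push_kv("a", "second")   # rejected instead of silently stored
except RuntimeError as exc:
    print(exc)  # duplicate key: a
```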


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me