UniValue should only parse the first JSON construct of a line #6558

domob1812 commented at 12:36 PM on August 15, 2015: contributor

The current parser UniValue::read continues until it gets to the end of the parsed string (JTOK_NONE) or until an error occurs while lexing (JTOK_ERR). If the string contains, for instance, multiple top-level objects or arrays, they are all parsed and only the last is saved in the UniValue object. On the other hand, parsing fails if a valid object is followed by garbage. (While parsing does not fail, surprisingly, if the string starts with a JTOK_ERR.)

I think this behaviour is highly confusing and should be changed. I'm willing to work on this myself, but would like to discuss it before starting. My suggestion is the following:

Reading from a string should only parse until a full JSON construct is found (object or array as now, or a single number/string/keyword). Afterwards, parsing should stop and discard any trailing parts of the string completely. I want to add also the option to return (in an optional output argument) the position where parsing stopped, so that the caller may get everything in there or check that no unparsed content remains; I do not think that Bitcoin currently needs this functionality, but it is almost "for free" and may be useful in the future.

Does this sound like a good idea, or is there a problem with my plan that I missed?

jonasschnelli commented at 3:25 PM on August 17, 2015: contributor

I agree. IMO UniValue::read should only read the first top level object/array and if another object opens ([ or {) (or garbage), it should abort the parsing and response with an error. This would be conform with the JSON standard.

Would be nice if you could try to PR a fix for this (please also provide unit tests).

domob1812 commented at 4:15 PM on August 17, 2015: contributor

Yes, of course I will write the patch (as promised), including proper testing.

I personally think that a recursive implementation of UniValue::read would be easier to read than the current one with explicit stack. Is it ok if I try to rewrite it as I think it is best, or should I try to keep the changes to a minimum? I. e., basically just change the stopping criterion of the loop while keeping the explicit stack?

dcousens commented at 1:03 AM on August 18, 2015: contributor

Why do we have our own JSON writer/parser anyway? Why not just use https://github.com/nlohmann/json, it is only a header?

jonasschnelli commented at 7:28 AM on August 18, 2015: contributor

UniValue is a very flexible and lightweight JSON parser/encoder. We just moved from JSON Spirit to UniValue because it's more lightweight and reduces compile time.

jonasschnelli commented at 7:32 AM on August 18, 2015: contributor

@domob1812 I think keeping the changes at a minimum would be good. If you have a bigger changeset in mind, you might open a PR for that at the UniValue upstream (https://github.com/jgarzik/univalue).

domob1812 commented at 7:47 AM on August 18, 2015: contributor

Alright, was just a thought. I'll submit a minimal PR for Bitcoin and will think about upstream patches later.

jgarzik commented at 1:48 PM on August 18, 2015: contributor

Related: PRs that sync upstream with latest Bitcoin Core appreciated...

domob1812 commented at 9:36 AM on August 19, 2015: contributor

@jgarzik: Is there some "defined procedure" for merging upstream patches for UniValue down to Bitcoin Core? E. g., do you do that regularly or should all patches be submitted against both repositories? If there is some procedure, I can very well submit my work to your repository first and have it "flow down" from there.

jgarzik commented at 12:17 PM on August 19, 2015: contributor

@domob1812 No defined procedure. Currently Bitcoin Core is ahead of "upstream", which needs to be remedied.

jonasschnelli commented at 7:10 AM on August 20, 2015: contributor

@domob1812: one way could be to improve/fix in this repository with focus on a small diff. Once merged, open a PR upstream and sync bitcoins univalue with upstream. Then you could make more conceptual changes upstream, once it's stable, open a PR in this repository with the upstream changes.

domob1812 commented at 7:13 AM on August 20, 2015: contributor

Thanks for the input, this is actually what I have been thinking as well. I'll do that.

domob1812 referenced this in commit e938122b7b on Aug 20, 2015

sipa commented at 3:05 PM on August 20, 2015: member

@Jeff after fixing and synchronizing upstream, we can maybe start treating univalue as a subtree like leveldb and secp256k1? There are utility scripts now to verify that the included code matches the specifies commit in the upstream repo etc.

laanwj commented at 9:00 AM on August 21, 2015: member

I personally think that a recursive implementation of UniValue::read would be easier to read than the current one with explicit stack

An explicit stack is preferred here. Easier to keep bounds on memory usage (we could, for example, limit the max depth easily). Also: execution stacks can be shallow on some architectures especially those of threads.

jgarzik commented at 1:34 PM on August 21, 2015: contributor

+1 @sipa RE subtree +1 @laanwj RE explicit stack

jonasschnelli commented at 1:41 PM on August 21, 2015: contributor

subtree make sense. I just synced upstream (https://github.com/jgarzik/univalue/commits/master). As soon as this is merged, i try to PR including UniValue over a subtree. +1 for explicit stack... but maybe another PR, maybe upstream.

laanwj closed this on Aug 24, 2015

jonasschnelli referenced this in commit 7e42d051f4 on Sep 2, 2015

MarcoFalke locked this on Sep 8, 2021