getJsonToken assumes underlying string is null-terminated but requires end pointer #28260

issue Brotcrunsher openend this issue on August 11, 2023
  1. Brotcrunsher commented at 7:32 pm on August 11, 2023: contributor

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    When a function requires a const char *end, one might assume that the string doesn’t need to be null terminated. getJsonToken requires this parameter:

    0enum jtokentype getJsonToken(std::string& tokenVal, unsigned int& consumed,
    1                            const char *raw, const char *end)
    

    However, the function does assume that the underlying memory block is null-terminated as is visible here:

    0        if (!strncmp(raw, "null", 4)) {
    1            //...
    2        } else if (!strncmp(raw, "true", 4)) {
    3            //...
    4        } else if (!strncmp(raw, "false", 5)) {
    5            //...
    

    and here:

    0if ((*firstDigit == '0') && json_isdigit(firstDigit[1]))
    

    and possibly others.

    If the string isn’t null terminated, then we run risk of reading uninitialized memory, which could lead to a crash.

    In practice this currently does not seem to lead to any issues, as all usages that I could find used underlying null-terminated blocks. It is however an uncomfortable pitfall for future development.

    Expected behaviour

    The function should either make sure to never access anything beyond or at the end pointer, or should document this requirement properly.

    Steps to reproduce

    0char innocentChar = 'n';
    1getJsonToken(/*...*/, &innocentChar, (&innocentChar) + 1);
    

    Relevant log output

    No response

    How did you obtain Bitcoin Core

    Other

    What version of Bitcoin Core are you using?

    master@3654d84

    Operating system and version

    Common code issue, os irrelevant

    Machine specifications

    No response

  2. Brotcrunsher renamed this:
    getJsonToken assumes underlying string is null terminated
    getJsonToken assumes underlying string is null-terminated but requires end pointer
    on Aug 11, 2023
  3. willcl-ark added the label Utils/log/libs on Jan 24, 2024
  4. daniel-btc-nym commented at 2:38 am on February 2, 2024: none
    I’m working on it. Plan is to harden the function to not read at end and after. I have some initial small draft changes, and want to add testing. Maybe, we can even highlight the issue via additional fuzz+msan tests.
  5. maflcko commented at 12:09 pm on February 5, 2024: member
    I guess this can be fixed by using std::string_view::starts_with
  6. daniel-btc-nym commented at 3:18 am on February 8, 2024: none

    The bug here isn’t actually that bad:

    • It uses strncmp(…, 4) and strncmp(…, 5) respectively: so, in worst case it will read 3 or 5 bytes after end.
    • Similar for the number parsing.

    In any case, I have a change that fixes both (adding guards for reading past end, and making the code pretty / easy to read again). I have unit tests for the keywords (true, false, null) but want to add some for the numbers.

    Haven’t made any progress on MSAN. But if we would poison data around begin and end, we could write some exhaustive tests for small inputs. If we restrict to (printable characters, or these mentioned in the code), we could then be pretty certain the new code is correct.

    Touching code is always a risk, so I am trying to minimize it.

    FWIW, my code so far: https://github.com/bitcoin/bitcoin/commit/b77afc50b14ea1af61ea3b32e75d2aa974b07844

  7. maflcko commented at 8:37 am on February 8, 2024: member

    Haven’t made any progress on MSAN.

    You can copy the bytes into a vector first and call shrink_to_fit, which should be enough to tell msan where the data really ends?

  8. daniel-btc-nym commented at 1:22 pm on February 8, 2024: none
    I still need to look into how MSAN is integrated into the current unit test suite. Questions like: how can I run the unit tests I added under MSAN? Are all unit tests run under MSAN with a certain make target … the basics. I am very new to this repository here :)
  9. maflcko commented at 1:25 pm on February 8, 2024: member

    It is probably not worth it to try to setup msan, as it requires building llvm from scratch.

    It is easier to just use valgrind. For example: valgrind ./src/test/test_bitcoin -t vector_tests

  10. daniel-btc-nym commented at 3:40 am on February 9, 2024: none

    Yeah, I ran some valgrind tests before and after my changes and it clearly showed the “uninitialized” accesses when we copy the data first into a tight vector as you suggested.

    I added more test coverage and think this should be enough for this issue.

    What do you think about the changes?

    If they look OK, I can merge them into a single commit – (I suppose by forking again?) – and then making a pull-request?

  11. maflcko commented at 8:45 am on February 9, 2024: member

    I like bugfixes the most, when they are touching only a single line of code. But if more changes are required, then that is fine, too.

    I checked that this is only a theoretical issue in Bitcoin Core, and can not be hit in production code, but it may still be good to fix it.

    It may be good to update the fuzz target to catch the issue here: src/test/fuzz/parse_univalue.cpp

  12. daniel-btc-nym commented at 4:50 pm on February 17, 2024: none

    I like bugfixes the most, when they are touching only a single line of code. But if more changes are required, then that is fine, too.

    My changes have a few orthogonal, but related fixes and test additions. We could make different pull-requests for them. Let me know what you prefer.

    For example:

    1. Check strncmp will not go out of bound. This is lines 99-115 in univalue_read.cpp ; unit tests that call no_reading_beyond_end_test
    2. Change parsing of numbers, where there were two places that might read past the end. This is lines 126-143 in univalue_read.cpp ; unit test that call: parsing_numbers_beyond_end_test.
    3. Added exhaustive_short_string_test unit test, which was previously causing errors when run in valgrind; now runs cleanly.
    4. Added a couple more unit tests such as parsing the empty buffer.
    5. Fixed: fail18.json which is a test where the nesting is too deep.
    6. Added pass5.json which is a test where the nesting is just shallow enough to pass.

    It may be good to update the fuzz target to catch the issue here: src/test/fuzz/parse_univalue.cpp

    I added it in, and ran the fuzzer using HONGGFUZZ. Since this isn’t a “runaway” overflow, we don’t get any crashes. When running the entire fuzzer under valgrind, we seem to get all sorts of (seemingly unrelated) problems.

    I thus added a mini-exhaustive test that runs quickly and cleanly under valgrind now, but didn’t before.

    A few more questions:

    • Are we adhering to any line lengths conventions?
    • Are we adhering to conventions for variable names and function names?
    • Do we have an auto-formatter that takes care of basic formatting such as where open-braces go, how many spaces to indent, and maybe even the line length?
    • My changes do change the semantics for some corner-cases where we were reading past the end of the buffer before: For example, previously: char buf[] = "true"; getJsonToken(tokenVal, consumed, buf, buf + 1); would parse as JTOK_KW_TRUE and now returns JTOK_ERR. There are also the opposite: previously, char buf[] = "01"; getJsonToken(tokenVal, consumed, buf, buf + 1); would error since we do not allow leading 0s, but now, this would parse as “0”.
    • Do we have some place where we could dump outputs from valgrind that I could link to in the commit messages or pull-request explanation? I would like to show before / after? Or basically, anywhere that creates durable data that can easily be linked to?
    • What do you suggest my next steps would be? I am fairly happy with the code.
  13. maflcko commented at 9:30 am on February 19, 2024: member

    My changes have a few orthogonal, but related fixes and test additions. We could make different pull-requests for them. Let me know what you prefer.

    According to the dev notes, they should be different commits. Ideally bugfixes and features in a different PR. See https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches (The doc, and it’s links to the dev notes and productivity notes should explain all your questions)


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-31 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me