listtransactions returns invalid JSON when comment contain non-UTF8 special chars #9166

issue c0deright opened this issue on November 15, 2016
  1. c0deright commented at 6:11 PM on November 15, 2016: none

    Can you reliably reproduce the issue?

    Yes

    If so, please list the steps to reproduce below:

    % bitcoin-cli listtransactions '*' 2 error: couldn't parse reply from server

    Expected behaviour

    JSON being printed on console

    Actual behaviour

    JSON-PARSER in bitcoin-cli complains about invalid JSON received from bitcoind

    What version of bitcoin-core are you using?

    % bitcoin-cli --version Bitcoin Core RPC client version v0.13.1.0-g03422e5`

    What is the last known version where listtransactions was working

    The problem was discovered after upgrading nodes from 0.11.2 to 0.13.1

    Why does this happen?

    The wallet.dat was used since the 0.3.XX days of Bitcoin Core and upgraded over time. Some old TXs in that wallet.dat contain comments with non-ASCII chars encoded as latin1.

    In this case it's the German Umlaut ü encoded as 0xFC (latin1) instead of 0xC3BC (UTF8).

    bitcoind passes the comment in it's JSON response to bitcoin-cli without transformation, so the parser in bitcoin-cli complains about the latin1 representation of that character.

    JSON must to be used with UTF8 encoding.

    tcpdump confirms that the problem only occurs with TXs with non-utf8 special chars in their comments.

    Skipping the TXs with the problematic comments works fine:

    % bitcoin-cli listtransactions '*' 1 2 (...)

    Does dumping wallet.dat with db_dump, modifying the dump file and db_load work as a workaround?

    Yes.

    Stop bitcoind, make a backup of your wallet.dat file.

    db_dump -p -f dump.txt wallet.dat
    grep -F comment dump.txt | perl -p -e 's#^.*comment(.*?)\\0b.*$#$1#'
    # this will output all your TX comments to inspect visually before modifying
    vim dump.txt
    db_load -f dump.txt new.dat
    mv wallet.dat wallet.dat.orig
    mv new.dat wallet.dat
    

    Problem: grep from above doesn't find all your comments. So there might still be comments left with latin1-encoded chars.

    Make 100% sure to replace the special chars only in your comments. Do not use search/replace! Replace each 1 byte char with exactly 1 byte ASCI char.

    In my case I replaced \fc with u (German ü --> u)

    If you change the length of the string by replacing one special char with more than 1 char, you will get errors like the following on next start of bitcoind:

    2016-11-15 17:29:55 init message: Loading wallet... Warning: Error reading wallet.dat! All keys read correctly, but transaction data or address book entries might be missing or incorrect.

    What's the best way to handle this?

  2. ryanofsky commented at 8:32 PM on November 15, 2016: member

    On the db_dump/db_load error, I think this would have to be caused by an accidental \fc -> _ replacement in a part of the dump that wasn't an address name. I don't know how many replacements you had to make, but if it was not too many, I might try visually inspecting a diff of the dump file before and after the replacements, or maybe just starting with a few replacements and then adding more so you can identify which one(s) cause the warning. Also if there aren't too many transactions, it might be easier to use the wallet gui interface to change the address labels rather than work with dump/load.

    On the underlying bug which caused latin1 to be stored in the database rather than utf8, it looks like it was fixed in 2011 with the setCodecForCStrings call added here:

    https://github.com/bitcoin/bitcoin/commit/3f0816e3d926e0ea78ac7b6cd43efe62355885c8#diff-8c9d79ba40bf702f01685008550ac100R116

    One improvement that might be a good idea to make now would be to have UniValue::setStr validate that strings passed to it are utf8. There actually already is utf8 validation code in the UniValue library (used during parsing), but it just isn't called right now by UniValue::setStr. Changing this could ensure that bitcoin never outputs invalid JSON, and make bugs like this one easier to track down and fix.

  3. laanwj added the label RPC/REST/ZMQ on Nov 16, 2016
  4. laanwj commented at 1:01 PM on November 16, 2016: member

    One improvement that might be a good idea to make now would be to have UniValue::setStr validate that strings passed to it are utf8.

    This would make sense, and I considered it at the time - however, it's not quite clear what to do in the case where it is not. It could raise an exception, but this would still make it impossible to list those transactions. And it'd fail server-side instead of client-side :(

  5. jonasschnelli commented at 1:16 PM on November 16, 2016: contributor

    One improvement that might be a good idea to make now would be to have UniValue::setStr validate that strings passed to it are utf8.

    This would make sense, and I considered it at the time - however, it's not quite clear what to do in the case where it is not. It could raise an exception, but this would still make it impossible to list those transactions. And it'd fail server-side instead of client-side :(

    Would it be possible to sanitize a string in such cases (resulting in a string with stripped out [or similar] chars)?

  6. laanwj commented at 1:25 PM on November 16, 2016: member

    Would it be possible to sanitize a string in such cases (resulting in a string with stripped out [or similar] chars)?

    Sure! But please don't do that in univalue. It shouldn't do its own hidden munging of data. In this case it may make sense to have a two-pronged approach:

    • Make univalue validate strings as unicode, raise an exception if not.
    • Make sure bitcoind never feeds it invalid unicode. This means strings read from the wallet database will need to be sanitized. I don't think we have any other un-sanitized sources of strings that are reported on RPC (e.g. "user agent" strings from the network are already sanitized in an extremely strict way.)
      • It should already be impossible for new non-unicode strings to get into the database, as unicode is checked on parsing the incoming JSON. So at least there will be no new cases.
  7. c0deright commented at 1:30 PM on November 16, 2016: none

    @ryanofsky

    It's not about the labels (well, maybe?) but about the TX comments in my case.

    I used db_dump -p to dump the wallet.dat. I then used vim and searched for the string comment. I found the TX comments with the comment being there in plain text. The ü chracter in this case was written as the 3 chars \fc. I just replaced these 3 chars with ue, saved the file, reimported it via db_load and fired up bitcoind with the above mentioned error message.

    I did not do a search/replace for \fc. I did only edit the TX comments by hand.

    But I guess bitcoind saves checksums over all the TXs in the wallet so this might have triggered the error message.

    At the moment we have no solution so far.

  8. laanwj commented at 1:36 PM on November 16, 2016: member

    At the moment we have no solution so far.

    Quick fix would be to delete the transaction from the database (using the db_dump/db_load trick) and do a rescan. It will be back without metadata.

  9. ryanofsky commented at 6:40 PM on November 16, 2016: member

    @adminblogger, I think the problem is replacing a single byte \fc with two bytes ue. The boost serialization in bitcoin uses length-prefixed strings, so you would need to find and update the length fields before the strings if you wanted to add the extra bytes.

  10. c0deright commented at 8:38 AM on November 17, 2016: none

    @ryanofsky Thanks for that hint. I should have thought of this myself.

    I'll try to do the dump/load thing again while replacing the 1byte special char with 1 char instead of 2 chars. That might work.

    EDIT

    I can confirm that this is working. I just modified 2 production wallets that way. Will edit first post to reflect this.

    EDIT

    With a third wallet the JSON parser still complained. Debugging this showed that one doesn't find all the comments through grepping for comment in the dump file.

    The preferable way still would be that bitcoind would handle this somehow.

  11. laanwj commented at 8:11 AM on November 21, 2016: member

    The preferable way still would be that bitcoind would handle this somehow.

    Yes. But I'm afraid you're going to have to do that yourself (have given some suggestions how above), this comes up extremely rarely. Wallets grandfathered in from that time are rare in themselves, and having non-unicode stuff in the database is doubly rare.

  12. MarcoFalke commented at 9:15 PM on April 26, 2020: member

    The feature request didn't seem to attract much attention in the past. Also, the issue seems not important enough right now to keep it sitting around idle in the list of open issues.

    Closing due to lack of interest. Pull requests with improvements are always welcome.

  13. MarcoFalke closed this on Apr 26, 2020

  14. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me