Add full UTF-8 support to RPC #7892

pull laanwj wants to merge 5 commits into bitcoin:master from laanwj:2016_04_i18n_unicode_rpc changing 14 files +209 −44
  1. laanwj commented at 1:50 PM on April 16, 2016: member

    (the univalue patch should be taken upstream - the RPC test and doc change should end up here. Posting it here for visibility and because this fixes an ancient issue: #2127. I think this is important to many non-English users)

    This adds full UTF-8 support for both on input and output through JSON.

    bitcoin-cli usage

    Before:

    $ src/bitcoin-cli -datadir=/store/tmp/testbtc getnewaddress "рыба"
    1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
    $ src/bitcoin-cli -datadir=/store/tmp/testbtc listaccounts
    {
    ...  "\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u0080\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u008b\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b1\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b0": 0.00000000
    }
    $ src/bitcoin-cli -datadir=/store/tmp/testbtc getaccount 1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
    рыба
    

    After:

    $ src/bitcoin-cli -regtest getaccountaddress "рыба"
    mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2
    $ src/bitcoin-cli -regtest listaccounts
    {
    ...
      "рыба": 0.00000000
    }
    $ src/bitcoin-cli -regtest getaccount mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2                                       
    рыба
    
    

    GUI

    This also affects the GUI debug console.

    Before:

    i18n_before

    When strings are passed directly (such as for getaccount's return argument) it works fine, but when they go through JSON formatting/parsing, it fails.

    After:

    i18n_after

    RPC test

    A test for this new functionality has been added to the wallet.py test.

    See the commit message of the first commit for details on what exactly had to be changed in univalue.

  2. laanwj added the label RPC/REST/ZMQ on Apr 16, 2016
  3. jonasschnelli commented at 5:51 PM on April 16, 2016: contributor

    Nice! Concept ACK.

    Not sure how we should treat the UniValue changes. Should we try to keep the "upstream" repository bitcoin/univalue in sync?

  4. MarcoFalke commented at 5:59 PM on April 16, 2016: member

    Concept ACK. Tests look good.

    "upstream" repository bitcoin/univalue in sync

    The patch goes to jgarzik/univalue as bitcoin/univalue is only for bitcoin related patches. (Still in the end we need to merge jgarzik/univalue into bitcoin/univalue because we use this as subtree.)

  5. laanwj force-pushed on Apr 18, 2016
  6. laanwj force-pushed on Apr 28, 2016
  7. gmaxwell commented at 7:48 AM on May 17, 2016: contributor

    Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

  8. in qa/rpc-tests/wallet.py:None in 0ef6abf32d outdated
       0 | @@ -1,4 +1,6 @@
       1 |  #!/usr/bin/env python2
       2 | +# coding=utf-8
       3 | +# ^^^^^^^^^^^^ TODO remove when supporting only Python3
    


    MarcoFalke commented at 8:05 AM on May 17, 2016:

    Needs rebase.

  9. laanwj commented at 10:25 AM on May 18, 2016: member

    Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

    I don't think that's possible with just unicode?

    ANSI escape sequences are a whole different story. In JSON objects they'll be escaped, but when printing strings (say, for help, when there is a single string result) everything is let through. Not sure whether this should be changed, but this would be a change local to bitcoin-cli, not the unicode framework. I'd suggest doing so in a separate pull.

    Nothing new here though: It's always been possible to pass any kind of crap characters from the server (just look at my OP) to bitcoin-cli, this just makes the round-trip sane and conform to UTF-8.

  10. dcousens commented at 1:03 PM on May 19, 2016: contributor

    concept ACK, once-over utACK (did not cross-verify UTF-8 impl.).

    Could probably use more tests for that JSONUTF8Writer...

  11. laanwj commented at 8:49 AM on May 20, 2016: member

    Could probably use more tests for that JSONUTF8Writer... ing

    There are few direct tests for it, but all the other tests that parses JSON with strings in it is testing it. What would be nice is doing some fuzzing using https://github.com/laanwj/univalue/tree/2015_11_unifuzz and see that everything is accepted is valid utf-8.

  12. gmaxwell commented at 4:10 PM on May 25, 2016: contributor

    @pstratem Any interest in fuzzing this?

  13. sipa commented at 4:20 PM on May 25, 2016: member

    Needs rebase.

  14. laanwj force-pushed on Jun 9, 2016
  15. laanwj commented at 6:21 AM on June 9, 2016: member

    Rebased, done and removed the Python3 TODOs

  16. Squashed 'src/univalue/' changes from 2740c4f..f32df99
    f32df99 Merge branch '2016_04_unicode' into bitcoin
    280b191 Merge remote-tracking branch 'jgarzik/master' into bitcoin
    c9a716c Handle UTF-8
    bed8dd9 Version 1.0.2.
    5e7985a Merge pull request #14 from laanwj/2015_11_escape_plan
    
    git-subtree-dir: src/univalue
    git-subtree-split: f32df99e96d99ab49e5eeda16cac93747d388245
    60ab9b2006
  17. Merge commit '60ab9b200654ef0914459711cf2b22be16be3dc2' 63151521fd
  18. test: add ensure_ascii setting to AuthServiceProxy
    Add a setting ensure_ascii to AuthServiceProxy. This setting,
    defaulting to True (backwards compatible),
    is passed through to json.dumps. If set to False, non-ASCII characters
    >0x80 are not escaped. This is useful for testing server
    input processing, as well as slightly more bandwidth friendly in case of
    heavy unicode usage.
    a406fcb6ca
  19. test: test utf-8 for labels in wallet 6bbb4ef399
  20. doc: Mention full UTF-8 support in release notes 7982fce64c
  21. laanwj force-pushed on Jun 10, 2016
  22. laanwj commented at 1:23 PM on June 10, 2016: member

    Ok:

    This should be ready now.

  23. gmaxwell commented at 10:26 AM on June 15, 2016: contributor

    ACK

  24. laanwj merged this on Jun 16, 2016
  25. laanwj closed this on Jun 16, 2016

  26. laanwj referenced this in commit 9c3d0fab36 on Jun 16, 2016
  27. laanwj commented at 10:23 AM on June 16, 2016: member

    It looks like the build system doesn't detect changes to univalue source files and will not automatically rebuild the library: if you get RPC test failures concerning unicode, please build from a clean tree

  28. MarcoFalke commented at 10:26 AM on June 16, 2016: member
    make clean
    make distclean
    

    should take care of this, I assume.

  29. laanwj commented at 10:29 AM on June 16, 2016: member

    Yes, make clean should be enough.

  30. codablock referenced this in commit adfca782a0 on Sep 16, 2017
  31. codablock referenced this in commit d2d4dfda2e on Sep 19, 2017
  32. codablock referenced this in commit 297aa47f0b on Dec 27, 2017
  33. codablock referenced this in commit 79ad5f7682 on Dec 28, 2017
  34. andvgal referenced this in commit 25dd406c61 on Jan 6, 2019
  35. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me