refactor: Use our own implementation of urlDecode #29904

pull fjahr wants to merge 5 commits into bitcoin:master from fjahr:2024-04-libevent-urldecode changing 13 files +110 βˆ’31
  1. fjahr commented at 1:06 pm on April 18, 2024: contributor

    Fixes #29654 (as a side-effect)

    Removing dependencies is a general goal of the project and the xz backdoor has been an additional wake up call recently. Libevent shows many of the same symptoms, few maintainers and slow releases. While libevent can not be removed completely over night we should start removing it’s usage where it’s possible, ideally with the end goal to removing it completely.

    This is a pretty easy win in that direction. The evhttp_uridecode function from libevent we were using in urlDecode could be easily emulated in fewer LOC. This also ports the applicable test vectors over from libevent.

  2. DrahtBot commented at 1:06 pm on April 18, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK theStack, maflcko, stickies-v, achow101
    Concept NACK luke-jr
    Concept ACK TheCharlatan
    Approach ACK hebasto
    Stale ACK Sjors, laanwj

    If your review is incorrectly listed, please react with πŸ‘Ž to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Refactoring on Apr 18, 2024
  4. laanwj commented at 1:11 pm on April 18, 2024: member

    Concept ACK yes please… the hooking we do for URL_DECODE and UrlDecodeFn made me kind of scared, to be honest (in that context), i think can we get rid of that at the same time?

    This will also fix @luke-jr’s #29654.

  5. laanwj added the label RPC/REST/ZMQ on Apr 18, 2024
  6. TheCharlatan commented at 1:22 pm on April 18, 2024: contributor
    Concept ACK
  7. stickies-v commented at 1:54 pm on April 18, 2024: contributor
    Concept ACK
  8. hebasto commented at 3:18 pm on April 18, 2024: member
    Concept ACK.
  9. in src/test/common_url_tests.cpp:43 in 2447bc7774 outdated
    38+
    39+BOOST_AUTO_TEST_CASE(decode_internal_nuls_test) {
    40+    std::string result{"\0\0x\0\0", 5};
    41+    std::string encoded{"%00%00x%00%00"};
    42+    BOOST_CHECK_EQUAL(urlDecode(encoded).size(), 5);
    43+    BOOST_CHECK(urlDecode(encoded) == result);
    


    stickies-v commented at 4:12 pm on April 18, 2024:

    nit

    0    BOOST_CHECK_EQUAL(urlDecode(encoded), result);
    

    fjahr commented at 7:19 pm on April 18, 2024:
    done
  10. in src/common/url.cpp:11 in 2447bc7774 outdated
    10 #include <string>
    11+#include <stdexcept>
    12+#include <charconv>
    13 
    14-std::string urlDecode(const std::string &urlEncoded) {
    15+std::string urlDecode(const std::string& urlEncoded) {
    


    stickies-v commented at 4:29 pm on April 18, 2024:
    We didn’t use plus decoding with evhttp_uridecode, so this implementation keeps functionality the same. Since urlDecode is a generic utility function, I think there should be a docstring that highlights that this implementation doesn’t decode + to in case it’s used for other purposes in the future.

    fjahr commented at 7:19 pm on April 18, 2024:
    Added a short docstring.
  11. in src/common/url.cpp:28 in 2447bc7774 outdated
    31+            if (ec == std::errc()) {
    32+                res += static_cast<char>(decoded_value);
    33+                // Next two characters are part of the percent encoding
    34+                i += 2;
    35+            } else {
    36+                // In case of invalid percent encoding, add the '%' and continue
    


    stickies-v commented at 4:33 pm on April 18, 2024:
    What’s the rationale for continuing here? Should we accept invalid URLs? At first sight, I think I’d prefer having this return a util::Result and have the function fail in case of missing hex digits or invalid hex?

    fjahr commented at 6:44 pm on April 18, 2024:
    I also found this a bit strange but it’s the behavior of libevent and they even have an explicit test case for this (which failed when I wrote my initial implementation). I didn’t investigate this further yet but I think the safest option is to keep this a pure refactor for now.

    laanwj commented at 8:25 am on April 22, 2024:

    For better or worse, this silent accepting behavior for invalid hex values in URL decoding is really common. See for example Python:

    0>>> import urllib.parse
    1>>> urllib.parse.unquote('%12%%xx%0a')
    2'\x12%%xx\n'
    

    It’s the web philopsophy of “in case of doubt, just do something!” πŸ˜… . Anyhow, in this refactor it doesn’t make sense imo to add error handling here.

  12. stickies-v commented at 4:46 pm on April 18, 2024: contributor

    Approach ACK

    Since urlDecode no longer needs libevent, we can clean up the workaround introduced in #18504 too (edit, oh that’s basically what @laanwj said a few comments up):

      0diff --git a/src/bitcoin-cli.cpp b/src/bitcoin-cli.cpp
      1index 129deeec60..830171f63a 100644
      2--- a/src/bitcoin-cli.cpp
      3+++ b/src/bitcoin-cli.cpp
      4@@ -11,7 +11,6 @@
      5 #include <clientversion.h>
      6 #include <common/args.h>
      7 #include <common/system.h>
      8-#include <common/url.h>
      9 #include <compat/compat.h>
     10 #include <compat/stdin.h>
     11 #include <policy/feerate.h>
     12@@ -51,7 +50,6 @@
     13 using CliClock = std::chrono::system_clock;
     14 
     15 const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
     16-UrlDecodeFn* const URL_DECODE = urlDecode;
     17 
     18 static const char DEFAULT_RPCCONNECT[] = "127.0.0.1";
     19 static const int DEFAULT_HTTP_CLIENT_TIMEOUT=900;
     20diff --git a/src/bitcoin-wallet.cpp b/src/bitcoin-wallet.cpp
     21index d5dfbbec27..fe90958a5f 100644
     22--- a/src/bitcoin-wallet.cpp
     23+++ b/src/bitcoin-wallet.cpp
     24@@ -11,7 +11,6 @@
     25 #include <clientversion.h>
     26 #include <common/args.h>
     27 #include <common/system.h>
     28-#include <common/url.h>
     29 #include <compat/compat.h>
     30 #include <interfaces/init.h>
     31 #include <key.h>
     32@@ -28,7 +27,6 @@
     33 #include <tuple>
     34 
     35 const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
     36-UrlDecodeFn* const URL_DECODE = nullptr;
     37 
     38 static void SetupWalletToolArgs(ArgsManager& argsman)
     39 {
     40diff --git a/src/bitcoind.cpp b/src/bitcoind.cpp
     41index 4f0a816388..54796c5abb 100644
     42--- a/src/bitcoind.cpp
     43+++ b/src/bitcoind.cpp
     44@@ -12,7 +12,6 @@
     45 #include <common/args.h>
     46 #include <common/init.h>
     47 #include <common/system.h>
     48-#include <common/url.h>
     49 #include <compat/compat.h>
     50 #include <init.h>
     51 #include <interfaces/chain.h>
     52@@ -35,7 +34,6 @@
     53 using node::NodeContext;
     54 
     55 const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
     56-UrlDecodeFn* const URL_DECODE = urlDecode;
     57 
     58 #if HAVE_DECL_FORK
     59 
     60diff --git a/src/common/url.h b/src/common/url.h
     61index b16b8241af..0283f471bc 100644
     62--- a/src/common/url.h
     63+++ b/src/common/url.h
     64@@ -7,8 +7,6 @@
     65 
     66 #include <string>
     67 
     68-using UrlDecodeFn = std::string(const std::string& url_encoded);
     69-UrlDecodeFn urlDecode;
     70-extern UrlDecodeFn* const URL_DECODE;
     71+std::string urlDecode(const std::string& urlEncoded);
     72 
     73 #endif // BITCOIN_COMMON_URL_H
     74diff --git a/src/qt/main.cpp b/src/qt/main.cpp
     75index ded057dbfa..16befd99e8 100644
     76--- a/src/qt/main.cpp
     77+++ b/src/qt/main.cpp
     78@@ -4,7 +4,6 @@
     79 
     80 #include <qt/bitcoin.h>
     81 
     82-#include <common/url.h>
     83 #include <compat/compat.h>
     84 #include <util/translation.h>
     85 
     86@@ -17,7 +16,6 @@
     87 extern const std::function<std::string(const char*)> G_TRANSLATION_FUN = [](const char* psz) {
     88     return QCoreApplication::translate("bitcoin-core", psz).toStdString();
     89 };
     90-UrlDecodeFn* const URL_DECODE = urlDecode;
     91 
     92 const std::function<std::string()> G_TEST_GET_FULL_NAME{};
     93 
     94diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
     95index 2c18184261..38350b33cc 100644
     96--- a/src/test/util/setup_common.cpp
     97+++ b/src/test/util/setup_common.cpp
     98@@ -14,7 +14,6 @@
     99 #include <banman.h>
    100 #include <chainparams.h>
    101 #include <common/system.h>
    102-#include <common/url.h>
    103 #include <consensus/consensus.h>
    104 #include <consensus/params.h>
    105 #include <consensus/validation.h>
    106@@ -81,7 +80,6 @@ using node::RegenerateCommitments;
    107 using node::VerifyLoadedChainstate;
    108 
    109 const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
    110-UrlDecodeFn* const URL_DECODE = nullptr;
    111 
    112 /** Random context to get unique temp data dirs. Separate from g_insecure_rand_ctx, which can be seeded from a const env var */
    113 static FastRandomContext g_insecure_rand_ctx_temp_path;
    114diff --git a/src/wallet/rpc/util.cpp b/src/wallet/rpc/util.cpp
    115index 06ec7db1bc..4e93b0eac5 100644
    116--- a/src/wallet/rpc/util.cpp
    117+++ b/src/wallet/rpc/util.cpp
    118@@ -61,9 +61,9 @@ bool ParseIncludeWatchonly(const UniValue& include_watchonly, const CWallet& wal
    119 
    120 bool GetWalletNameFromJSONRPCRequest(const JSONRPCRequest& request, std::string& wallet_name)
    121 {
    122-    if (URL_DECODE && request.URI.substr(0, WALLET_ENDPOINT_BASE.size()) == WALLET_ENDPOINT_BASE) {
    123+    if (request.URI.substr(0, WALLET_ENDPOINT_BASE.size()) == WALLET_ENDPOINT_BASE) {
    124         // wallet endpoint was used
    125-        wallet_name = URL_DECODE(request.URI.substr(WALLET_ENDPOINT_BASE.size()));
    126+        wallet_name = urlDecode(request.URI.substr(WALLET_ENDPOINT_BASE.size()));
    127         return true;
    128     }
    129     return false;
    

    nit: Would then update the fn signature to std::string UrlDecode(const std::string& url_encoded)

  13. fjahr commented at 6:37 pm on April 18, 2024: contributor

    Addressed @laanwj ’s comment and removed the hooking.

    FWIW, I wondered for a moment why this is in common and not util but that was a conscious decision, see #26294.

    yes please… the hooking we do for URL_DECODE and UrlDecodeFn made me kind of scared, to be honest (in that context), i think can we get rid of that at the same time?

    Absolutely, I hadn’t even seen that yet. For reviewers that wonder as well, the point of this was to not have the wallet depend on libevent, see #18504.

    This will also fix @luke-jr’s #29654.

    Nice, I also missed that one!

  14. fjahr force-pushed on Apr 18, 2024
  15. fjahr commented at 7:20 pm on April 18, 2024: contributor
    Addressed @stickies-v ’s feedback as well, thanks a lot!
  16. fjahr force-pushed on Apr 18, 2024
  17. in src/Makefile.test.include:88 in f037e67fb7 outdated
    84@@ -85,6 +85,7 @@ BITCOIN_TESTS =\
    85   test/checkqueue_tests.cpp \
    86   test/coins_tests.cpp \
    87   test/coinstatsindex_tests.cpp \
    88+	test/common_url_tests.cpp \
    


    laanwj commented at 7:42 pm on April 18, 2024:
    Indentation isn’t right here

    fjahr commented at 7:45 pm on April 18, 2024:
    fixed
  18. fjahr force-pushed on Apr 18, 2024
  19. in src/common/url.cpp:21 in caa2040edc outdated
    25+        char c = urlEncoded[i];
    26+        // Special handling for percent which should be followed by two hex digits
    27+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    28+        if (c == '%' && i + 2 < urlEncoded.size()) {
    29+            int decoded_value = 0;
    30+            auto [ptr, ec] = std::from_chars(&urlEncoded[i + 1], &urlEncoded[i + 3], decoded_value, 16);
    


    Sjors commented at 8:36 am on April 19, 2024:
    caa2040edc46291b3f032c64018eda2f4b31df39 nit: auto [_, ec]

    fjahr commented at 5:13 pm on April 20, 2024:
    done
  20. in src/bitcoin-cli.cpp:54 in 68bf6c82fe outdated
    49@@ -51,7 +50,6 @@
    50 using CliClock = std::chrono::system_clock;
    51 
    52 const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
    53-UrlDecodeFn* const URL_DECODE = urlDecode;
    


    Sjors commented at 8:45 am on April 19, 2024:
    Was this doing anything, or did we forget to remove it earlier?

    fjahr commented at 5:13 pm on April 20, 2024:
    Not sure if I understand your question, do you mean this line in particular or the whole hooking thing? This line was added the with the rest of hooking code in #18504 and I don’t think anything was changed about that until now, so I don’t know where we might have forgotten to remove it.
  21. Sjors commented at 8:46 am on April 19, 2024: member

    ACK 68bf6c82fee917756b367db4e99a8f6bcb6476bf

    Link to the code as of v2.2.12 which we use in depends:

    I was also happy to see that test/fuzz/string.cpp already covers this function.

  22. DrahtBot requested review from stickies-v on Apr 19, 2024
  23. DrahtBot requested review from TheCharlatan on Apr 19, 2024
  24. DrahtBot requested review from hebasto on Apr 19, 2024
  25. DrahtBot requested review from laanwj on Apr 19, 2024
  26. in src/common/url.cpp:21 in 68bf6c82fe outdated
    23+
    24+    for (size_t i = 0; i < url_encoded.size(); ++i) {
    25+        char c = url_encoded[i];
    26+        // Special handling for percent which should be followed by two hex digits
    27+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    28+        if (c == '%' && i + 2 < url_encoded.size()) {
    


    l0rinc commented at 8:50 am on April 19, 2024:

    it seems to me that && i + 2 < url_encoded.size() part isn’t checked by the tests, consider adding a few more corner cases, e.g. this is what I tested it with:

     0BOOST_AUTO_TEST_CASE(decode_malformed_test) {
     1    BOOST_CHECK_EQUAL(urlDecode("%%xhello th+ere \xff"), "%%xhello th+ere \xff");
     2
     3    BOOST_CHECK_EQUAL(urlDecode("%"), "%");
     4    BOOST_CHECK_EQUAL(urlDecode("%%"), "%%");
     5    BOOST_CHECK_EQUAL(urlDecode("%%%"), "%%%");
     6    BOOST_CHECK_EQUAL(urlDecode("%%%%"), "%%%%");
     7
     8    BOOST_CHECK_EQUAL(urlDecode("+"), "+");
     9    BOOST_CHECK_EQUAL(urlDecode("++"), "++");
    10
    11    BOOST_CHECK_EQUAL(urlDecode("?"), "?");
    12    BOOST_CHECK_EQUAL(urlDecode("??"), "??");
    13
    14    BOOST_CHECK_EQUAL(urlDecode("%G1"), "%G1");
    15    BOOST_CHECK_EQUAL(urlDecode("%2"), "%2");
    16    BOOST_CHECK_EQUAL(urlDecode("%ZX"), "%ZX");
    17
    18    BOOST_CHECK_EQUAL(urlDecode("valid%20string%G1"), "valid string%G1");
    19    BOOST_CHECK_EQUAL(urlDecode("%20invalid%ZX"), " invalid%ZX");
    20    BOOST_CHECK_EQUAL(urlDecode("%20%G1%ZX"), " %G1%ZX");
    21}
    

    Since the original implementation seems to include a case for handling ? and + as well, I’ve added tests to make sure they’re kept.


    fjahr commented at 5:13 pm on April 20, 2024:
    Thanks, added.
  27. in src/common/url.cpp:21 in 68bf6c82fe outdated
    25+        char c = url_encoded[i];
    26+        // Special handling for percent which should be followed by two hex digits
    27+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    28+        if (c == '%' && i + 2 < url_encoded.size()) {
    29+            int decoded_value = 0;
    30+            auto [ptr, ec] = std::from_chars(&url_encoded[i + 1], &url_encoded[i + 3], decoded_value, 16);
    


    l0rinc commented at 9:06 am on April 19, 2024:

    ptr seems unused, we might as well inline it to:

    0std::from_chars(&url_encoded[i + 1], &url_encoded[i + 3], decoded_value, 16).ec == std::errc{})
    

    (note, I’ve used std::errc{} instead, seems we’re using that more often in the codebase)


    fjahr commented at 5:15 pm on April 20, 2024:
    I think this is less readable so I will keep it as is but I did take @Sjors ’s suggestion to change it to change ptr to _.
  28. in src/common/url.cpp:29 in 68bf6c82fe outdated
    32+                res += static_cast<char>(decoded_value);
    33+                // Next two characters are part of the percent encoding
    34+                i += 2;
    35+            } else {
    36+                // In case of invalid percent encoding, add the '%' and continue
    37+                res += c;
    


    l0rinc commented at 9:07 am on April 19, 2024:

    we could unify these two res += c cases by merging the two conditions:

     0std::string UrlDecode(const std::string& url_encoded) {
     1    std::string res;
     2    res.reserve(url_encoded.size());
     3
     4    for (size_t i = 0; i < url_encoded.size(); ++i) {
     5        int decoded_value = 0;
     6        if (url_encoded[i] == '%' &&
     7            i + 2 < url_encoded.size() &&
     8            std::from_chars(&url_encoded[i + 1], &url_encoded[i + 3], decoded_value, 16).ec == std::errc{}) {
     9            res += static_cast<char>(decoded_value);
    10            i += 2;
    11        } else {
    12            res += url_encoded[i];
    13        }
    14    }
    15
    16    return res;
    17}
    

    fjahr commented at 5:15 pm on April 20, 2024:
    Like above, I think this is a bit less readable so I will keep it as is.
  29. in src/test/common_url_tests.cpp:58 in 68bf6c82fe outdated
    34+
    35+BOOST_AUTO_TEST_CASE(decode_lowercase_hex_test) {
    36+    BOOST_CHECK_EQUAL(UrlDecode("%f0%a0%b0"), "\xf0\xa0\xb0");
    37+}
    38+
    39+BOOST_AUTO_TEST_CASE(decode_internal_nuls_test) {
    


    l0rinc commented at 9:23 am on April 19, 2024:
    is this passing for the old implementation? Isn’t this a change in behavior?

    fjahr commented at 5:24 pm on April 20, 2024:
    You’re right, good catch! I was focussing too much on the behavior inside of libevent and didn’t take into account that the result was cast to std::string afterwards, which lead to the behavior that a null character terminated the string. I have changed this to keep behavior consistent with the libevent version.

    l0rinc commented at 7:54 pm on April 21, 2024:
    Thanks for adding me as a co-author, if you force push again, please change the email to LΕ‘rinc <pap.lorinc@gmail.com>

    fjahr commented at 7:56 pm on April 21, 2024:
    The behavior change is now re-introduced but explicitly documented as such in a separate commit.

    fjahr commented at 8:09 pm on April 21, 2024:

    Thanks for adding me as a co-author, if you force push again, please change the email to LΕ‘rinc <pap.lorinc@gmail.com>

    fixed the email

  30. in src/test/common_url_tests.cpp:14 in 68bf6c82fe outdated
    10+#include <boost/test/unit_test.hpp>
    11+
    12+BOOST_AUTO_TEST_SUITE(common_url_tests)
    13+
    14+// These test vectors were ported from test/regress.c in the libevent library
    15+// which used to be a dependency of the UrlDecode function.
    


    l0rinc commented at 9:28 am on April 19, 2024:
    nit: it used to be a dependency of urlDecode, not UrlDecode, but I guess it’s fine

    fjahr commented at 5:16 pm on April 20, 2024:
    Either way, someone might get confused, but I think it’s better to keep it as is.
  31. l0rinc changes_requested
  32. fjahr force-pushed on Apr 20, 2024
  33. fjahr force-pushed on Apr 20, 2024
  34. DrahtBot added the label CI failed on Apr 20, 2024
  35. fjahr commented at 5:28 pm on April 20, 2024: contributor

    Addressed feedback by @paplorinc and @Sjors , thanks a lot!

    I also reorganized the commits a bit because they were doing more than one thing and it was hard to address feedback cleanly. The tests are now split from the implementation, which makes it easier to cherry-pick the unit test commit and run it against master. And I split off the renaming into a scripted diff.

  36. fjahr force-pushed on Apr 20, 2024
  37. fjahr force-pushed on Apr 20, 2024
  38. fjahr commented at 11:33 pm on April 20, 2024: contributor
    CI failure can be ignored, it was introduced by #29827 and should hopefully be fixed by #29926
  39. luke-jr commented at 2:59 am on April 21, 2024: member
    Concept NACK. Embedding copies of shared code just makes things more fragile, not safer. It’s far more likely there’s a bugfix that we miss picking up on, than that there’s malicious code inserted later.
  40. laanwj commented at 8:17 am on April 21, 2024: member

    It’s far more likely there’s a bugfix that we miss picking up on, than that there’s malicious code inserted later.

    Sometimes i’d agree with that reasoning. But let’s face it, urlDecode is not more complex than say, leftpad, it’s unlikely to need maintainance or bug fixes, and thus seems unnecessary to carry a dependency for.

    (Also this isn’t a direct copy of the C code! It’s a C++ reimplementation that does std::string to std::string, so should handle embedded NULL bytes without truncation)

  41. fjahr commented at 2:59 pm on April 21, 2024: contributor

    Embedding copies of shared code just makes things more fragile, not safer. It’s far more likely there’s a bugfix that we miss picking up on, than that there’s malicious code inserted later.

    In addition to what @laanwj said, it’s also a much simpler re-implementation. The libevent function provided further options that we didn’t care about and hardcoded, that’s how we could simplify. And as was discussed in the last IRC meeting, there is also interest in removing libevent in general, at which point this would be a necessary change anyway. We are also removing that hooking code which is a big improvement in itself. If you subtract the test code, we are almost removing as much code as we are adding.

    It’s a C++ reimplementation that does std::string to std::string, so should handle embedded NULL bytes without truncation

    So, to be clear, your opinion is for this part I should remove the patch and introduce a change of behavior, right?

  42. laanwj commented at 6:16 pm on April 21, 2024: member

    So, to be clear, your opinion is for this part I should #29904 (review) and introduce a change of behavior, right?

    Yes. There’s no use in emulating broken C string behavior. Edit: maybe for one commit when you want to prove both implementations are the same. But it can be removed quickly after that, imo.

  43. fjahr force-pushed on Apr 21, 2024
  44. l0rinc approved
  45. fjahr commented at 7:55 pm on April 21, 2024: contributor
    I have added the behavior change in a separate commit.
  46. in src/common/url.h:13 in 21ad006443 outdated
     6@@ -7,8 +7,10 @@
     7 
     8 #include <string>
     9 
    10-using UrlDecodeFn = std::string(const std::string& url_encoded);
    11-UrlDecodeFn urlDecode;
    12-extern UrlDecodeFn* const URL_DECODE;
    13+/* Decode a URL.
    14+ *
    15+ * Notably this implementation does not decode a '+' to a ' '.
    


    l0rinc commented at 7:58 pm on April 21, 2024:
    would it make sense to mention the & as well?

    fjahr commented at 8:06 pm on April 21, 2024:
    Are these commonly decoded as a space as well? I couldn’t find evidence for this but I didn’t spend much time on it.


    fjahr commented at 8:14 pm on April 21, 2024:
    Hm, the code you are linking to is part of how libevent handles the +. Or did you want to refer to ?? I think if it’s not very common for & to be decoded as space then we don’t need to mention it.

    l0rinc commented at 8:18 pm on April 21, 2024:
    ? seems to be a control char, If -1, when true we transform plus to space only after we've seen a ?. -1 is deprecated., no need to mention it, thanks for checking.
  47. fjahr force-pushed on Apr 21, 2024
  48. DrahtBot removed the label CI failed on Apr 21, 2024
  49. laanwj approved
  50. laanwj commented at 8:29 am on April 22, 2024: member
    Concept and code review ACK 175d57ba953cdc1da3a54ae0208dad813f7ea4cf
  51. DrahtBot requested review from Sjors on Apr 22, 2024
  52. hebasto commented at 8:40 am on April 22, 2024: member
  53. DrahtBot requested review from hebasto on Apr 22, 2024
  54. laanwj commented at 8:42 am on April 22, 2024: member
    No, it isn’t. Nor is piling on the EVENT_CFLAGS. Good catch @hebasto.
  55. fjahr force-pushed on Apr 22, 2024
  56. fjahr commented at 11:27 pm on April 22, 2024: contributor

    No, it isn’t. Nor is piling on the EVENT_CFLAGS.

    Removed both, thanks @hebasto and @laanwj !

  57. in src/Makefile.am:715 in b7a3f8e375 outdated
    711@@ -712,10 +712,7 @@ libbitcoin_common_a_SOURCES = \
    712   warnings.cpp \
    713   $(BITCOIN_CORE_H)
    714 
    715-if USE_LIBEVENT
    716-libbitcoin_common_a_CPPFLAGS += $(EVENT_CFLAGS)
    717 libbitcoin_common_a_SOURCES += common/url.cpp
    


    stickies-v commented at 10:36 am on April 23, 2024:

    nit: this should move up a bit:

     0diff --git a/src/Makefile.am b/src/Makefile.am
     1index 239113a056..7387eb1eaa 100644
     2--- a/src/Makefile.am
     3+++ b/src/Makefile.am
     4@@ -679,6 +679,7 @@ libbitcoin_common_a_SOURCES = \
     5   common/run_command.cpp \
     6   common/settings.cpp \
     7   common/system.cpp \
     8+  common/url.cpp \
     9   compressor.cpp \
    10   core_read.cpp \
    11   core_write.cpp \
    12@@ -711,8 +712,6 @@ libbitcoin_common_a_SOURCES = \
    13   script/solver.cpp \
    14   warnings.cpp \
    15   $(BITCOIN_CORE_H)
    16-
    17-libbitcoin_common_a_SOURCES += common/url.cpp
    18 #
    19 
    20 # util #
    

    fjahr commented at 1:59 pm on April 23, 2024:
    done
  58. in src/common/url.cpp:8 in b7a3f8e375 outdated
     3@@ -4,19 +4,33 @@
     4 
     5 #include <common/url.h>
     6 
     7-#include <event2/http.h>
     8-
     9-#include <cstdlib>
    10 #include <string>
    11+#include <stdexcept>
    


    stickies-v commented at 10:41 am on April 23, 2024:
    nit: std::errc is part of system_error

    stickies-v commented at 10:55 am on April 23, 2024:
    nit: includes not sorted

    fjahr commented at 1:59 pm on April 23, 2024:
    both done
  59. in src/test/common_url_tests.cpp:3 in b7a3f8e375 outdated
    0@@ -0,0 +1,65 @@
    1+// Copyright (c) 2024 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    


    hebasto commented at 10:43 am on April 23, 2024:

    nit: For new source files we can use a copyright header with HTTPS link and future-proof dating:

    0// Copyright (c) 2024-present The Bitcoin Core developers
    1// Distributed under the MIT software license, see the accompanying
    2// file COPYING or https://opensource.org/license/mit/.
    

    fjahr commented at 1:59 pm on April 23, 2024:
    done
  60. in src/test/common_url_tests.cpp:5 in b7a3f8e375 outdated
    0@@ -0,0 +1,65 @@
    1+// Copyright (c) 2024 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <test/util/setup_common.h>
    


    hebasto commented at 10:44 am on April 23, 2024:
    nit: It seems no symbols from that header are used in the source file.

    fjahr commented at 1:59 pm on April 23, 2024:
    done
  61. hebasto commented at 10:44 am on April 23, 2024: member
    Approach ACK b7a3f8e375b87e4f3ccfa7994905dfbf46a23bf2.
  62. in src/test/common_url_tests.cpp:58 in b7a3f8e375 outdated
    53+
    54+BOOST_AUTO_TEST_CASE(decode_lowercase_hex_test) {
    55+    BOOST_CHECK_EQUAL(UrlDecode("%f0%a0%b0"), "\xf0\xa0\xb0");
    56+}
    57+
    58+BOOST_AUTO_TEST_CASE(decode_internal_nuls_test) {
    


    stickies-v commented at 10:49 am on April 23, 2024:

    typo nit

    0BOOST_AUTO_TEST_CASE(decode_internal_nulls_test) {
    

    fjahr commented at 1:59 pm on April 23, 2024:
    done, I took this naming from libevent exactly fwiw but since we change the behavior and tests anyway, it’s not that relevant anymore to keep it consistent
  63. stickies-v approved
  64. stickies-v commented at 11:15 am on April 23, 2024: contributor

    ACK b7a3f8e375b87e4f3ccfa7994905dfbf46a23bf2

    Good to remove this unnecessary libevent dependency, and nice work on restructuring the commits etc. Easy to review PR. Left some nits and happy to quickly review if you decide to address them.


    Not essential for this PR, but I noticed we’re doing two unnecessary string copies (through std::string::substr()) in code touched by this PR:

     0diff --git a/src/common/url.cpp b/src/common/url.cpp
     1index cd2c38dc44..fddbd6ab15 100644
     2--- a/src/common/url.cpp
     3+++ b/src/common/url.cpp
     4@@ -5,10 +5,11 @@
     5 #include <common/url.h>
     6 
     7 #include <string>
     8+#include <string_view>
     9 #include <stdexcept>
    10 #include <charconv>
    11 
    12-std::string UrlDecode(const std::string& url_encoded) {
    13+std::string UrlDecode(std::string_view url_encoded) {
    14     std::string res;
    15     res.reserve(url_encoded.size());
    16 
    17diff --git a/src/common/url.h b/src/common/url.h
    18index c1df626142..203f41c70f 100644
    19--- a/src/common/url.h
    20+++ b/src/common/url.h
    21@@ -6,11 +6,12 @@
    22 #define BITCOIN_COMMON_URL_H
    23 
    24 #include <string>
    25+#include <string_view>
    26 
    27 /* Decode a URL.
    28  *
    29  * Notably this implementation does not decode a '+' to a ' '.
    30  */
    31-std::string UrlDecode(const std::string& url_encoded);
    32+std::string UrlDecode(std::string_view url_encoded);
    33 
    34 #endif // BITCOIN_COMMON_URL_H
    35diff --git a/src/wallet/rpc/util.cpp b/src/wallet/rpc/util.cpp
    36index 2e7b1449bc..f1fb9db3d2 100644
    37--- a/src/wallet/rpc/util.cpp
    38+++ b/src/wallet/rpc/util.cpp
    39@@ -14,6 +14,7 @@
    40 #include <univalue.h>
    41 
    42 #include <boost/date_time/posix_time/posix_time.hpp>
    43+#include <string_view>
    44 
    45 namespace wallet {
    46 static const std::string WALLET_ENDPOINT_BASE = "/wallet/";
    47@@ -61,9 +62,9 @@ bool ParseIncludeWatchonly(const UniValue& include_watchonly, const CWallet& wal
    48 
    49 bool GetWalletNameFromJSONRPCRequest(const JSONRPCRequest& request, std::string& wallet_name)
    50 {
    51-    if (request.URI.substr(0, WALLET_ENDPOINT_BASE.size()) == WALLET_ENDPOINT_BASE) {
    52+    if (request.URI.starts_with(WALLET_ENDPOINT_BASE)) {
    53         // wallet endpoint was used
    54-        wallet_name = UrlDecode(request.URI.substr(WALLET_ENDPOINT_BASE.size()));
    55+        wallet_name = UrlDecode(std::string_view{request.URI}.substr(WALLET_ENDPOINT_BASE.size()));
    56         return true;
    57     }
    58     return false;
    
  65. DrahtBot requested review from laanwj on Apr 23, 2024
  66. DrahtBot requested review from hebasto on Apr 23, 2024
  67. fjahr force-pushed on Apr 23, 2024
  68. fjahr commented at 2:00 pm on April 23, 2024: contributor
    Addressed latest feedback from @hebasto and @stickies-v , thank you!
  69. in src/Makefile.am:682 in c8779aa751 outdated
    678@@ -679,6 +679,7 @@ libbitcoin_common_a_SOURCES = \
    679   common/run_command.cpp \
    680   common/settings.cpp \
    681   common/system.cpp \
    682+	common/url.cpp \
    


    hebasto commented at 2:03 pm on April 23, 2024:
    Indentation?

    fjahr commented at 2:18 pm on April 23, 2024:
    should be fixed now, weird, I must have some plugin that inserts these tabs :/
  70. fjahr force-pushed on Apr 23, 2024
  71. fjahr force-pushed on Apr 23, 2024
  72. DrahtBot added the label CI failed on Apr 23, 2024
  73. DrahtBot commented at 3:37 pm on April 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24156835469

  74. in src/common/url.cpp:22 in 47c472df6a outdated
    26+        char c = url_encoded[i];
    27+        // Special handling for percent which should be followed by two hex digits
    28+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    29+        if (c == '%' && i + 2 < url_encoded.size()) {
    30+            int decoded_value{0};
    31+            auto [_, ec] = std::from_chars(&url_encoded[i + 1], &url_encoded[i + 3], decoded_value, 16);
    


    maflcko commented at 4:08 pm on April 23, 2024:

    I guess i+3 fails, because you access the memory past the end of the string(view) as a reference and then take a pointer to that.

    A possible fix could be to use iterators instead of memory-access. Something like this, but not sure if it compiles, or is the best solution.

    0            auto [_, ec] = std::from_chars(url_encoded.begin() + (i + 1), url_encoded.begin() + (i + 3), decoded_value, 16);
    

    fjahr commented at 4:09 pm on April 23, 2024:
    Yeah, I just pushed this.
  75. fjahr force-pushed on Apr 23, 2024
  76. in src/common/url.cpp:22 in 89ffa56b63 outdated
    26+        char c = url_encoded[i];
    27+        // Special handling for percent which should be followed by two hex digits
    28+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    29+        if (c == '%' && i + 2 < url_encoded.size()) {
    30+            int decoded_value{0};
    31+            auto [_, ec] = std::from_chars(url_encoded.data() + i + 1, url_encoded.data() + i + 1, decoded_value, 16);
    


    maflcko commented at 4:14 pm on April 23, 2024:
    0            auto [_, ec] = std::from_chars(url_encoded.data() + i + 1, url_encoded.data() + i + 3, decoded_value, 16);
    

    Probably a wrong push?


    fjahr commented at 4:17 pm on April 23, 2024:
    fixed, thanks
  77. hebasto added the label Needs backport to CMake on Apr 23, 2024
  78. fjahr force-pushed on Apr 23, 2024
  79. fjahr force-pushed on Apr 23, 2024
  80. theStack commented at 5:48 pm on April 23, 2024: contributor
    Concept ACK
  81. DrahtBot removed the label CI failed on Apr 23, 2024
  82. stickies-v approved
  83. stickies-v commented at 6:17 pm on April 23, 2024: contributor

    ACK 477c03b42825084ac344050d97ea206a82ba0eb6

    Apologies for my UB string_view code suggestion, I didn’t realize operator[pos] behaves differently for string_view vs string when pos==size(). What you pushed looks like an elegant fix while keeping the copying minimal, nice.

  84. DrahtBot requested review from theStack on Apr 23, 2024
  85. DrahtBot requested review from hebasto on Apr 23, 2024
  86. in configure.ac:1709 in 477c03b428 outdated
    1705@@ -1706,7 +1706,6 @@ AM_CONDITIONAL([ENABLE_QT_TESTS], [test "$BUILD_TEST_QT" = "yes"])
    1706 AM_CONDITIONAL([ENABLE_BENCH], [test "$use_bench" = "yes"])
    1707 AM_CONDITIONAL([USE_QRCODE], [test "$use_qr" = "yes"])
    1708 AM_CONDITIONAL([USE_LCOV], [test "$use_lcov" = "yes"])
    1709-AM_CONDITIONAL([USE_LIBEVENT], [test "$use_libevent" = "yes"])
    


    laanwj commented at 0:36 am on April 24, 2024:
    Even this can go? It’s fascinating to see how this little function was such a wedge issue and now we can drop half the build system πŸ˜„
  87. DrahtBot requested review from laanwj on Apr 24, 2024
  88. laanwj approved
  89. laanwj commented at 0:37 am on April 24, 2024: member
    Code review ACK 477c03b42825084ac344050d97ea206a82ba0eb6
  90. maflcko added the label DrahtBot Guix build requested on Apr 24, 2024
  91. maflcko added the label Wallet on Apr 24, 2024
  92. in src/test/common_url_tests.cpp:33 in 477c03b428 outdated
    28+}
    29+
    30+BOOST_AUTO_TEST_CASE(decode_malformed_test) {
    31+    BOOST_CHECK_EQUAL(UrlDecode("%%xhello th+ere \xff"), "%%xhello th+ere \xff");
    32+
    33+    BOOST_CHECK_EQUAL(UrlDecode("%"), "%");
    


    maflcko commented at 9:56 am on April 24, 2024:
    How important is it to preserve invalid encodings? For example the decode of %1 (percent, 1, space) differs between this implementation and the one of libevent.

    fjahr commented at 2:36 pm on April 24, 2024:
    Hm, ok, that’s another behavior change I didn’t anticipate. Let me first describe my understanding of happens: In the libevent version each of the two characters following the % are looked at individually if they are valid hex characters. The space is not so the if fails and the fallback case takes over, which just adds the % to res and then the next two chars are processed individually and since they are not special cases by themselves, they are also just added to res, so the result is %1 . In the current version here we look at both characters together with std::from_chars. The white space seems to be ignored, no error raised, and so the decoded_value is integer 1. Then the static_cast<char>(decoded_value) converts the 1 to a non-printable character and so the result visually appears to be an empty string. It seems to me now that it’s safer to check both digits individually if they are hex, so I have changed the code in this regard and added a few more tests with spaces. Let me know if you found other cases where the behavior is still changing.

    l0rinc commented at 10:07 am on April 25, 2024:

    There’s still one sneaky difference that I’ve found: %00. Previously this prematurely terminated the string, now it’s kept:

    0BOOST_CHECK_EQUAL(urlDecode("a%00b"), std::string("a\u0000b", 3));
    

    this failed previously, passes now.


    maflcko commented at 10:15 am on April 25, 2024:

    %00

    Have you seen 992c714451676cee33d3dff49f36329423270c1c ?


    l0rinc commented at 10:18 am on April 25, 2024:

    l0rinc commented at 10:20 am on April 25, 2024:
    Those test cases were added, I see, thanks. (i.e. 0 before decoding and 0 after decoding)

    maflcko commented at 10:20 am on April 25, 2024:
    This change is intentional, is what I am trying to say.
  93. fjahr force-pushed on Apr 24, 2024
  94. in src/common/url.cpp:34 in d4c7863a3f outdated
    43+            i += 2;
    44+        } else {
    45+            // In case of an invalid percent encoding, we add the '%' and
    46+            // continue just like in the base case
    47+            res += c;
    48         }
    


    maflcko commented at 2:45 pm on April 24, 2024:

    The hex checking is done inside of from_chars (locale independent), so you can re-use that. Something like this may work:

     0        if (c == '%' && i + 2 < url_encoded.size()  
     1                 
     2                ) {
     3            int decoded_value{0};
     4            auto [p, ec] = std::from_chars(url_encoded.data() + i + 1, url_encoded.data() + i + 3, decoded_value, 16);
     5
     6            if(ec == std::errc{} && p==url_encoded.data() + i + 3) { // hex
     7
     8            res += static_cast<char>(decoded_value);
     9            // Next two characters are part of the percent encoding
    10            i += 2;
    11            continue;
    12            }
    13        }  
    14        res += c;
    

    fjahr commented at 3:12 pm on April 24, 2024:
    Yepp, looks good, thanks!
  95. fjahr force-pushed on Apr 24, 2024
  96. DrahtBot added the label CI failed on Apr 24, 2024
  97. DrahtBot commented at 3:11 pm on April 24, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24207110287

  98. in src/common/url.cpp:23 in fcee97601f outdated
    27+    for (size_t i = 0; i < urlEncoded.size(); ++i) {
    28+        char c = urlEncoded[i];
    29+        // Special handling for percent which should be followed by two hex digits
    30+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    31+        if (c == '%' && i + 2 < urlEncoded.size()) {
    32+            int decoded_value{0};
    


    maflcko commented at 3:53 pm on April 24, 2024:
    This must be unsinged, otherwise a sign (-) is treated as valid hex

    maflcko commented at 3:54 pm on April 24, 2024:
    %-1

    fjahr commented at 8:40 pm on April 24, 2024:
    Changed and added a test for this case as well
  99. in src/common/url.cpp:24 in fcee97601f outdated
    28+        char c = urlEncoded[i];
    29+        // Special handling for percent which should be followed by two hex digits
    30+        // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    31+        if (c == '%' && i + 2 < urlEncoded.size()) {
    32+            int decoded_value{0};
    33+            auto [p, ec] = std::from_chars(urlEncoded.data() + i + 1, urlEncoded.data() + i + 3, decoded_value, 16);
    


    maflcko commented at 3:59 pm on April 24, 2024:

    style nit: The verbose member calls can be avoided by turning i into an iterator:

     0 std::string urlDecode(std::string_view urlEncoded) {
     1     std::string res;
     2     res.reserve(urlEncoded.size());
     3 
     4-    for (size_t i = 0; i < urlEncoded.size(); ++i) {
     5-        char c = urlEncoded[i];
     6+    for (auto i = urlEncoded.begin(); i < urlEncoded.end(); ++i) {
     7+        char c = *i;
     8         // Special handling for percent which should be followed by two hex digits
     9         // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    10-        if (c == '%' && i + 2 < urlEncoded.size()) {
    11-            int decoded_value{0};
    12-            auto [p, ec] = std::from_chars(urlEncoded.data() + i + 1, urlEncoded.data() + i + 3, decoded_value, 16);
    13+        if (c == '%' && i + 2 < urlEncoded.end()) {
    14+            unsigned int decoded_value{0};
    15+            auto [p, ec] = std::from_chars( i + 1,  i + 3, decoded_value, 16);
    16 
    17             // Only if there is no error and the pointer is set to the end of
    18             // the string, we can be sure both characters were valid hex
    19-            if(ec == std::errc{} && p == urlEncoded.data() + i + 3) {
    20+            if(ec == std::errc{} && p ==  i + 3) {
    21                 // A null character terminates the string
    22                 if (decoded_value == 0) {
    23                     return res;
    

    fjahr commented at 8:40 pm on April 24, 2024:
    neat, I applied this change

    fjahr commented at 9:20 pm on April 24, 2024:
    Ugh, looks like MSVC has a problem with this, I am reverting it for now (see https://github.com/bitcoin/bitcoin/actions/runs/8823024109/job/24222527982?pr=29904)

    fanquake commented at 2:24 am on April 25, 2024:
    Another MSVC bug for us to report upstream I guess. @hebasto.

    maflcko commented at 5:22 am on April 25, 2024:
    No, it is not a msvc bug. from_chars actually accepts a raw pointer, not an iterator, so my draft diff was wrong (type-wise)

    fjahr commented at 8:46 am on April 25, 2024:
    I didn’t see compiler errors locally and the other CI jobs also didn’t seem to fail on compilation. So I assumed clang does some implicit conversion at that point and MSVC doesn’t.

    maflcko commented at 8:56 am on April 25, 2024:
    Yes, IIRC a standard library is free to define an iterator as a pointer, or allow iterators to convert to pointers implicitly, but there is no requirement for it do so.
  100. in src/common/url.cpp:14 in fcee97601f outdated
    13+#include <string_view>
    14+#include <system_error>
    15+#include <util/strencodings.h>
    16 
    17-std::string urlDecode(const std::string &urlEncoded) {
    18+std::string urlDecode(std::string_view urlEncoded) {
    


    maflcko commented at 4:00 pm on April 24, 2024:
    style nit: snake_case for new code, and newline before {, according to clang-format

    fjahr commented at 8:39 pm on April 24, 2024:
    The snake case is already applied in the scripted diff but I applied the clang format suggestions

    maflcko commented at 5:24 am on April 25, 2024:
    As the function body is new code, I’d say the url_encoded can be done correctly when the code is added, but no big deal
  101. maflcko approved
  102. maflcko commented at 4:02 pm on April 24, 2024: member
    lgtm, but unsigned may be better
  103. DrahtBot commented at 5:59 pm on April 24, 2024: contributor

    Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

    File commit 19722e3e728933e629eb8e407075fb193b2908e2(master) commit 3db54d655deb2e57cafe9512f1319971d6bba463(master and this pull)
    SHA256SUMS.part dd1712328d25bcaf... 1243f9e62f4a358e...
    *-aarch64-linux-gnu-debug.tar.gz fbc72cad71f5f74b... fdc5596c5b2d1535...
    *-aarch64-linux-gnu.tar.gz 44b0f030a753f1e8... b2424b849fec0ec5...
    *-arm-linux-gnueabihf-debug.tar.gz f74c72fe6e94b042... 16279e80f5ec1d2d...
    *-arm-linux-gnueabihf.tar.gz ec25b11c12af5bd9... 2ed1a728ce6df93c...
    *-arm64-apple-darwin-unsigned.tar.gz 5737f85f0414f80b... 79deb92dd6ee804c...
    *-arm64-apple-darwin-unsigned.zip 89397807079e9e09... b624036147bc56d2...
    *-arm64-apple-darwin.tar.gz da23abaa516c746d... 9a9d2bedc06f7185...
    *-powerpc64-linux-gnu-debug.tar.gz 8e020a133752d536... 5dc045731387cb03...
    *-powerpc64-linux-gnu.tar.gz 51263f27eb622313... 9eab562db7f148a3...
    *-riscv64-linux-gnu-debug.tar.gz 4ffa90e40623f251... 2835f73da39e68d8...
    *-riscv64-linux-gnu.tar.gz ca585d1223f81eb6... 7d605eeef3f0ef43...
    *-x86_64-apple-darwin-unsigned.tar.gz 7c94bed0e764a97a... c4d8fd1eaf22f970...
    *-x86_64-apple-darwin-unsigned.zip 6dd1358fbb025b8c... f8a9538b001bf98e...
    *-x86_64-apple-darwin.tar.gz c8e2062db3a8e478... 6cf25aa80cdd02c1...
    *-x86_64-linux-gnu-debug.tar.gz b9f95fcb53d4a9b6... f1b24f30dd8b4f3e...
    *-x86_64-linux-gnu.tar.gz 993a1a976d07cacf... b8ea47e235056555...
    *.tar.gz 9f7d684e6dda4ef7... 4bae0d613217622d...
    guix_build.log 3f47151e14a51431... 2018dd41c187e857...
    guix_build.log.diff f1c9d3754d446530...
  104. DrahtBot removed the label DrahtBot Guix build requested on Apr 24, 2024
  105. DrahtBot removed the label CI failed on Apr 24, 2024
  106. test: Add unit tests for urlDecode
    Co-authored-by: LΕ‘rinc <pap.lorinc@gmail.com>
    46bc6c2aaa
  107. fjahr force-pushed on Apr 24, 2024
  108. fjahr commented at 8:41 pm on April 24, 2024: contributor

    unsigned may be better

    No idea what this means ;) but I addressed the feedback, thanks a lot!

  109. refactor: Replace libevent use in urlDecode with our own code 650d43ec15
  110. refactor: Remove hooking code for urlDecode
    The point of this was to be able to build bitcoin-tx and bitcoin-wallet without libevent, see #18504.
    
    Now that we use our own implementation of urlDecode this is not needed anymore.
    
    Co-authored-by: stickies-v <stickies-v@protonmail.com>
    8f39aaae41
  111. scripted-diff: Modernize name of urlDecode function and param
    -BEGIN VERIFY SCRIPT-
    sed -i 's/urlDecode/UrlDecode/g' $(git grep -l 'urlDecode' ./src)
    sed -i 's/urlEncoded/url_encoded/g' $(git grep -l 'urlEncoded' ./src)
    -END VERIFY SCRIPT-
    099fa57151
  112. common: Don't terminate on null character in UrlDecode
    The previous behavior was the result of casting the result returned from the libevent function evhttp_uridecode to std:string but this was probably not intended.
    992c714451
  113. fjahr force-pushed on Apr 24, 2024
  114. theStack approved
  115. theStack commented at 9:17 am on April 25, 2024: contributor
    Code-review ACK 992c714451676cee33d3dff49f36329423270c1c
  116. DrahtBot requested review from stickies-v on Apr 25, 2024
  117. DrahtBot requested review from laanwj on Apr 25, 2024
  118. in src/test/common_url_tests.cpp:52 in 992c714451
    47+
    48+    BOOST_CHECK_EQUAL(UrlDecode("valid%20string%G1"), "valid string%G1");
    49+    BOOST_CHECK_EQUAL(UrlDecode("%20invalid%ZX"), " invalid%ZX");
    50+    BOOST_CHECK_EQUAL(UrlDecode("%20%G1%ZX"), " %G1%ZX");
    51+
    52+    BOOST_CHECK_EQUAL(UrlDecode("%1 "), "%1 ");
    


    maflcko commented at 10:03 am on April 25, 2024:

    Some more tests:

    %%ffg %fg


    fjahr commented at 2:19 pm on April 25, 2024:
    thanks, I will add them when I have to retouch, if not I will make a small follow-up

    maflcko commented at 6:52 am on April 26, 2024:
  119. maflcko approved
  120. maflcko commented at 10:06 am on April 25, 2024: member

    Did some differential fuzzing, as this is an ideal fit for it.

    ACK 992c714451676cee33d3dff49f36329423270c1c πŸ‘ˆ

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: ACK 992c714451676cee33d3dff49f36329423270c1c πŸ‘ˆ
    3uoUAes8qa1RR+FiK2t8YFFjOnn0fR1ewcF3JZcxlbFYRLH099vwrEWYq5uCUNGpAtidxMgd/deCwvPSTZbvgDQ==
    
  121. l0rinc commented at 10:09 am on April 25, 2024: contributor

    I think we should document at least that we thought of the %00 corner case, e.g. via a test case or a change. (already added)

    I have verified that the behavior is otherwise the same before and after via:

     0std::string GenerateRandomString() {
     1    std::random_device rd;
     2    std::mt19937 gen(rd());
     3    std::uniform_int_distribution<> length_dis(1, 10);
     4    std::uniform_int_distribution<> char_dis(1, 255);
     5
     6    size_t length = length_dis(gen);
     7    std::string res;
     8    res.reserve(length);
     9
    10    for (size_t i = 0; i < length; ++i) {
    11        char c = static_cast<char>(char_dis(gen));
    12        res.push_back(c);
    13    }
    14
    15    return res;
    16}
    17
    18std::string urlDecode_old(const std::string &urlEncoded) {
    19    std::string res;
    20    if (!urlEncoded.empty()) {
    21        char *decoded = evhttp_uridecode(urlEncoded.c_str(), false, nullptr);
    22        if (decoded) {
    23            res = std::string(decoded);
    24            free(decoded);
    25        }
    26    }
    27    return res;
    28}
    29
    30BOOST_AUTO_TEST_CASE(randomized_url_decode_length_test) {
    31    for (int i = 0; i < 100'000'000; ++i) {
    32        std::string random_encoded = GenerateRandomString();
    33
    34        std::string result1 = urlDecode_old(random_encoded);
    35        std::string result2 = urlDecode(random_encoded);
    36
    37        if (result1 != result2) {
    38            std::cerr
    39                << "urlDecode discrepancy found, random_encoded = `" + random_encoded + "`, urlDecode_old = `" +
    40                   result1 + "`, urlDecode (new) = `" + result2 + "`" << std::endl;
    41        }
    42
    43        BOOST_CHECK_LE(result1.size(), random_encoded.size());
    44    }
    45}
    
  122. l0rinc approved
  123. stickies-v approved
  124. stickies-v commented at 2:17 pm on April 25, 2024: contributor

    ACK 992c714451676cee33d3dff49f36329423270c1c

    nit: definitely not worth more back-and-forth, but I think (untested) maflcko’s suggestion should work with MSVC when explicitly converting iterators to pointers:

     0diff --git a/src/common/url.cpp b/src/common/url.cpp
     1index ecf88d07ea..c901ecb5dc 100644
     2--- a/src/common/url.cpp
     3+++ b/src/common/url.cpp
     4@@ -14,25 +14,24 @@ std::string UrlDecode(std::string_view url_encoded)
     5     std::string res;
     6     res.reserve(url_encoded.size());
     7 
     8-    for (size_t i = 0; i < url_encoded.size(); ++i) {
     9-        char c = url_encoded[i];
    10+    for (auto it = url_encoded.begin(); it < url_encoded.end(); ++it) {
    11         // Special handling for percent which should be followed by two hex digits
    12         // representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
    13-        if (c == '%' && i + 2 < url_encoded.size()) {
    14+        if (*it == '%' && std::distance(it, url_encoded.end()) > 2) {
    15             unsigned int decoded_value{0};
    16-            auto [p, ec] = std::from_chars(url_encoded.data() + i + 1, url_encoded.data() + i + 3, decoded_value, 16);
    17+            auto [p, ec] = std::from_chars(&*(it + 1), &*(it + 3), decoded_value, 16);
    18 
    19             // Only if there is no error and the pointer is set to the end of
    20             // the string, we can be sure both characters were valid hex
    21-            if (ec == std::errc{} && p == url_encoded.data() + i + 3) {
    22+            if (ec == std::errc{} && p == &*(it + 3)) {
    23                 res += static_cast<char>(decoded_value);
    24                 // Next two characters are part of the percent encoding
    25-                i += 2;
    26+                it += 2;
    27                 continue;
    28             }
    29             // In case of invalid percent encoding, add the '%' and continue
    30         }
    31-        res += c;
    32+        res += *it;
    33     }
    34 
    35     return res;
    
  125. fjahr commented at 2:22 pm on April 25, 2024: contributor

    I think (untested) maflcko’s suggestion should work with MSVC when explicitly converting iterators to pointers

    Yes, it’s certainly possible and I briefly explored it myself, but it didn’t seem like much of an improvement over the current version anymore. And the current version had already received some review at that point, so I decided it would be better to just revert since it was a style-nit anyway.

  126. achow101 commented at 4:58 pm on April 25, 2024: member
    ACK 992c714451676cee33d3dff49f36329423270c1c
  127. DrahtBot requested review from achow101 on Apr 25, 2024
  128. ryanofsky merged this on Apr 25, 2024
  129. ryanofsky closed this on Apr 25, 2024

  130. ryanofsky commented at 5:08 pm on April 25, 2024: contributor
    I went ahead and merged this, despite NACK from luke #29904 (comment) about costs of forking shared code. It seems like the NACK was adequately responded to, with responses acknowledging the tradeoff and giving reasons why it seemed justified in this case (even fixing a bug luke reported #29654)
  131. fanquake referenced this in commit 7973a67091 on Apr 26, 2024
  132. hebasto referenced this in commit 97a4ad5713 on Apr 26, 2024
  133. fanquake referenced this in commit 7fee0ca014 on Apr 27, 2024
  134. hebasto commented at 1:28 pm on May 1, 2024: member
    Ported to the CMake-based build system in https://github.com/hebasto/bitcoin/pull/181.
  135. hebasto removed the label Needs CMake port on May 1, 2024
  136. hebasto referenced this in commit 6e4f2e07b2 on May 3, 2024
  137. janus referenced this in commit d23351fa91 on Jun 6, 2024
  138. in src/common/url.cpp:33 in 992c714451
    37+                res += static_cast<char>(decoded_value);
    38+                // Next two characters are part of the percent encoding
    39+                i += 2;
    40+                continue;
    41+            }
    42+            // In case of invalid percent encoding, add the '%' and continue
    


    pinheadmz commented at 8:58 pm on December 15, 2024:

    I’m curious about the discrepancy between this and #27468. They relate to libevent decode uri and parse uri methods respectively and the latter is more sensitive to invalid % sequences (ie that aren’t exactly two hexadecimal characters).

    I believe it’s an example of lenient acceptance from users but strict sending to other machines. However in both of our use cases (wallet init args, and the REST interface) strict RFC3689 adherence doesn’t seem necessary.

    As I’m reimplementing HTTP without libevent it seems like this is better way to go, and relax the test added by #27468.

    Thoughts ?

  139. Guy790 commented at 9:09 pm on December 15, 2024: none
    Duplicate of #

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me