refactor: Preallocate result in TryParseHex to avoid resizing #29458

pull paplorinc wants to merge 2 commits into bitcoin:master from paplorinc:paplorinc/optimize-hex-parsing changing 3 files +39 −0
  1. paplorinc commented at 3:07 pm on February 20, 2024: none

    This pull request introduces optimizations to the TryParseHex function, focusing primarily on the ideal case (valid hexadecimal input without spaces). A new benchmark, HexParse was introduced in a separate commit.

    The main optimization preallocates the result vector based on the input string’s length. This aims to completely avoid costly dynamic reallocations when no spaces are present.


    Before:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                1.60 |      623,238,893.11 |    0.3% |      0.01 | `HexParse`
    3|                1.65 |      606,747,566.34 |    0.6% |      0.01 | `HexParse`
    4|                1.60 |      626,149,544.07 |    0.3% |      0.01 | `HexParse`
    

    After:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.68 |    1,465,555,976.27 |    0.8% |      0.01 | `HexParse`
    3|                0.68 |    1,472,962,920.18 |    0.3% |      0.01 | `HexParse`
    4|                0.68 |    1,476,159,423.00 |    0.3% |      0.01 | `HexParse`
    
  2. DrahtBot commented at 3:07 pm on February 20, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hebasto, andrewtoth, Empact, achow101
    Stale ACK maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. paplorinc renamed this:
    benchmark/speedup: optimize TryParseHex calls considerably
    optimization: Speed up TryParseHex by 300%
    on Feb 20, 2024
  4. paplorinc marked this as ready for review on Feb 20, 2024
  5. in src/util/strencodings.cpp:86 in a37a5b11bf outdated
    80@@ -81,16 +81,21 @@ template <typename Byte>
    81 std::optional<std::vector<Byte>> TryParseHex(std::string_view str)
    82 {
    83     std::vector<Byte> vch;
    84-    auto it = str.begin();
    85-    while (it != str.end()) {
    86-        if (IsSpace(*it)) {
    


    paplorinc commented at 5:55 pm on February 20, 2024:
    we don’t expect most hex strings to contain spaces so let’s fall back to this when we can’t parse the string instead

    maflcko commented at 4:48 pm on February 23, 2024:
    Is it really worth it to change this? preallocation seems fine, given that it buys most of the speedup. However, the other reordering didn’t change performance much, did it? Also, with modern compilers I’d be surprised if this made a large difference in any case?

    paplorinc commented at 5:10 pm on February 23, 2024:

    Valid question, that’s why I put this change in a separate commit with before/after measurements, resulting in an additional 30% speedup on my machine: Before:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.69 |    1,445,181,075.98 |    0.5% |      0.01 | `TryParseHexBenchmark`
    3|                0.68 |    1,459,941,407.03 |    0.2% |      0.01 | `TryParseHexBenchmark`
    4|                0.69 |    1,457,349,650.60 |    1.1% |      0.01 | `TryParseHexBenchmark`
    

    After:

    0|           ns/base16 |            base16/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.54 |    1,866,216,380.08 |    0.9% |      0.01 | `TryParseHexBenchmark`
    3|                0.53 |    1,871,755,872.12 |    0.5% |      0.01 | `TryParseHexBenchmark`
    4|                0.53 |    1,875,106,332.14 |    0.1% |      0.01 | `TryParseHexBenchmark`
    

    andrewtoth commented at 5:16 pm on February 23, 2024:

    Since I’m here, tried this as well. On my machine the speedup is not so severe, but still measurable:

    Preallocation only (e467437dea8b78e87ba16af306304c7740d5815c)

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    0.84 1,186,108,418.00 0.3% 15.21 1.86 8.172 3.37 0.0% 0.53 TryParseHexBenchmark

    Skipping whitespace check as well (a37a5b11bfba8e8cd2c5f9e940770b6b844ecedc)

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    0.76 1,320,747,604.66 0.1% 14.24 1.67 8.521 2.87 0.0% 0.54 TryParseHexBenchmark

    andrewtoth commented at 4:01 pm on February 25, 2024:
    Hmm thinking about this more, maybe saving 0.08ns is not worth this churn in the codebase. I think I agree with @maflcko that this isn’t necessary and preallocation on its own is a simpler change with bigger benefit.

    paplorinc commented at 4:38 pm on February 25, 2024:
    Thanks guys, the code is indeed cleaner without this, I’ve rebased the PR, please check it again

    paplorinc commented at 10:23 pm on March 9, 2024:
    Since we’ve left the IsSpace calls here (which were checking the ASCII whitespaces one-by-one for each converted character), I’ve optimized it in a separate PR to short circuit for non-whitespace cases, see: https://github.com/bitcoin/bitcoin/pull/29602
  6. in src/util/strencodings.cpp:83 in a37a5b11bf outdated
    80@@ -81,16 +81,21 @@ template <typename Byte>
    81 std::optional<std::vector<Byte>> TryParseHex(std::string_view str)
    82 {
    83     std::vector<Byte> vch;
    


    paplorinc commented at 5:58 pm on February 20, 2024:
    Ideally (when no spaces present) two characters will form a single byte - and when spaces are present, it will be less, so we can avoid reallocation by reserving str.size() / 2

    Empact commented at 5:40 pm on February 27, 2024:
    nit: may be unnecessary / obvious, but an inline comment saying // two hex characters form a single byte will be more helpful to readers of the code than the comment you’ve added above.

    paplorinc commented at 5:24 pm on February 28, 2024:
    Done, thanks
  7. in src/util/strencodings.cpp:87 in a37a5b11bf outdated
    92-        auto c2 = HexDigit(*(it++));
    93-        if (c1 < 0 || c2 < 0) return std::nullopt;
    94+    vch.reserve(str.size() / 2);
    95+
    96+    for (auto it = str.begin(); it != str.end(); ++it) {
    97+        auto c1 = HexDigit(*it);
    


    andrewtoth commented at 4:35 pm on February 23, 2024:
    nit: const auto c1{HexDigit(*it)};

    paplorinc commented at 5:18 pm on February 23, 2024:
    Thanks, done
  8. in src/util/strencodings.cpp:96 in a37a5b11bf outdated
    101+            } else {
    102+                return std::nullopt;
    103+            }
    104+        } else if (++it == str.end()) return std::nullopt;
    105+
    106+        auto c2 = HexDigit(*it);
    


    andrewtoth commented at 4:35 pm on February 23, 2024:
    nit: const auto c2{HexDigit(*it)};
  9. in src/bench/parse_hex.cpp:1 in a37a5b11bf outdated
    0@@ -0,0 +1,33 @@
    1+// Copyright (c) 2009-2024 The Bitcoin Core developers
    


    andrewtoth commented at 4:35 pm on February 23, 2024:
    0// Copyright (c) 2024- The Bitcoin Core developers
    
  10. andrewtoth approved
  11. andrewtoth commented at 4:38 pm on February 23, 2024: contributor

    ACK a37a5b11bfba8e8cd2c5f9e940770b6b844ecedc

    Bench before:

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    1.88 532,130,184.93 1.4% 29.43 4.13 7.118 6.40 0.0% 0.53 TryParseHexBenchmark

    After:

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    0.82 1,226,337,041.45 2.5% 14.24 1.80 7.931 2.87 0.0% 0.54 TryParseHexBenchmark
  12. in src/util/strencodings.cpp:94 in c4cbcbd13a outdated
     99+            if (IsSpace(*it)) {
    100+                continue;
    101+            } else {
    102+                return std::nullopt;
    103+            }
    104+        } else if (++it == str.end()) return std::nullopt;
    


    paplorinc commented at 5:26 pm on February 23, 2024:
    The style is a bit all over the place (I tried to adhere to the previous style), should I add braces everywhere in this method?

    andrewtoth commented at 7:12 pm on February 23, 2024:

    It took me a minute to figure out where it was being incremented when you put it in the else if. Not a big deal but maybe add a break here and move this check above where we define c2 so it’s more clear that is where we are moving it forward?

    0        }
    1        
    2        if (++it == str.end()) return std::nullopt;
    

    andrewtoth commented at 7:13 pm on February 23, 2024:
    I don’t think you need braces for a return early though.

    paplorinc commented at 7:19 pm on February 23, 2024:
    Rebased the last commit to keep the history clean. Thanks for taking the time to review this PR thoroughly, appreciate it.
  13. in src/util/strencodings.cpp:2 in c4cbcbd13a outdated
    0@@ -1,5 +1,5 @@
    1 // Copyright (c) 2009-2010 Satoshi Nakamoto
    2-// Copyright (c) 2009-2022 The Bitcoin Core developers
    3+// Copyright (c) 2024- The Bitcoin Core developers
    


    andrewtoth commented at 7:15 pm on February 23, 2024:
    I think you changed this copyright notice in the wrong file. I don’t think this one should be changed, only the one in the new file parse_hex.cpp.

    paplorinc commented at 7:18 pm on February 23, 2024:
    Got it, fixed, thanks
  14. paplorinc force-pushed on Feb 23, 2024
  15. paplorinc force-pushed on Feb 23, 2024
  16. DrahtBot added the label CI failed on Feb 23, 2024
  17. DrahtBot commented at 7:22 pm on February 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/21921664428

  18. paplorinc force-pushed on Feb 23, 2024
  19. in src/test/span_tests.cpp:56 in ad4021b10c outdated
    52@@ -53,7 +53,7 @@ BOOST_AUTO_TEST_SUITE(span_tests)
    53 // aren't compatible with Spans at compile time.
    54 //
    55 // Previously there was a bug where writing a SFINAE check for vector<bool> was
    56-// not possible, because in libstdc++ vector<bool> has a data() memeber
    57+// not possible, because in libstdc++ vector<bool> has a data() member
    


    paplorinc commented at 7:28 pm on February 23, 2024:
    completely unrelated, but got this warning from the CI

    andrewtoth commented at 8:37 pm on February 23, 2024:
    Hmm not sure we should have these changes in an unrelated PR. The CI failure was for a whitespace issue in the hex parsing code.

    paplorinc commented at 10:03 pm on February 23, 2024:
  20. DrahtBot removed the label CI failed on Feb 23, 2024
  21. paplorinc force-pushed on Feb 23, 2024
  22. paplorinc requested review from maflcko on Feb 24, 2024
  23. paplorinc requested review from andrewtoth on Feb 24, 2024
  24. andrewtoth commented at 4:02 pm on February 25, 2024: contributor
    Your last commit contains changes that should instead should be applied to the first and third commit and then rebased.
  25. paplorinc force-pushed on Feb 25, 2024
  26. paplorinc renamed this:
    optimization: Speed up TryParseHex by 300%
    optimization: Speed up TryParseHex by 200%
    on Feb 25, 2024
  27. in src/util/strencodings.cpp:84 in 589d0a4958 outdated
    80@@ -81,16 +81,19 @@ template <typename Byte>
    81 std::optional<std::vector<Byte>> TryParseHex(std::string_view str)
    82 {
    83     std::vector<Byte> vch;
    84-    auto it = str.begin();
    85-    while (it != str.end()) {
    86+    vch.reserve(str.size() / 2);
    


    andrewtoth commented at 4:47 pm on February 25, 2024:
    I think for the cleanest diff the second commit could only have just this one line, no?

    paplorinc commented at 5:00 pm on February 25, 2024:
    Done, thanks
  28. paplorinc force-pushed on Feb 25, 2024
  29. paplorinc force-pushed on Feb 25, 2024
  30. fanquake referenced this in commit 19b7f2b908 on Feb 26, 2024
  31. Empact commented at 5:23 am on February 27, 2024: member
    Concept ACK - I think you could do without the 3rd commit.
  32. paplorinc commented at 10:16 am on February 27, 2024: none
    Thanks for the review @Empact. We could do without, that’s why it’s in a separate commit. It’s very low risk however and makes the code more readable. If you think maintainability is risky, I can add more tests - what do you think?
  33. maflcko commented at 10:47 am on February 27, 2024: member

    It’s very low risk however and makes the code more readable.

    Why is that? According to https://corecheck.dev/bitcoin/bitcoin/pulls/29458 this commit is adding a bunch of sonarcloud warnings. I am not saying that those are accurate, but that “readability” is subjective. The general guideline on style changes applies:

    Thank you for your contribution. While this stylistic change makes sense on its own, it comes at a cost and risk for the project as a whole. The weak motivation for the change does not justify the burden that it places on the project. A burden could be any of the following:

    • Time spent on review
    • Accidental introduction of bugs
    • (Silent) merge conflicts, either in the branch or a backport branch. Those conflicts demand further developer and reviewer time or introduce bugs.

    For more information about refactoring changes and stylistic cleanup, see

    Generally, if the style is not mentioned nor enforced by the developer notes, we leave it up to the original author to pick whatever fits them best personally and then leave it that way until the line is touched for other reasons.

    Leave a comment below, if you have any questions.

  34. paplorinc force-pushed on Feb 27, 2024
  35. paplorinc commented at 11:03 am on February 27, 2024: none
    I have received contradictory comments, so I’ve removed the last commit to simplify this change by avoiding the resize, hoping this satisfies everyone.
  36. maflcko commented at 11:07 am on February 27, 2024: member

    The speedup depends on the input and stdlib, as well as hardware. I think it would be clearer to change the title to the commit title:

    0refactor: Preallocate result in TryParseHex to avoid resizing
    
  37. paplorinc renamed this:
    optimization: Speed up TryParseHex by 200%
    refactor: Preallocate result in TryParseHex to avoid resizing
    on Feb 27, 2024
  38. DrahtBot added the label Refactoring on Feb 27, 2024
  39. Empact commented at 5:39 pm on February 27, 2024: member

    We could do without, that’s why it’s in a separate commit. It’s very low risk however and makes the code more readable. If you think maintainability is risky, I can add more tests - what do you think? @paplorinc note that Bitcoin Core is under different constraints than a typical software project. We are review-constrained and change-skeptical in part because we have to be on guard against adversarial changes and also because we are appreciative of the importance of the codebase to the Bitcoin ecosystem.

    That said, refactoring and cleanup are important to the long-term health of the project. One way to be respectful of review time is to make focused, well-motivated pull requests, that do not mix various motivations in the same PR.

    See also: https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#refactoring

  40. Empact approved
  41. DrahtBot requested review from andrewtoth on Feb 27, 2024
  42. andrewtoth approved
  43. andrewtoth commented at 7:09 pm on February 27, 2024: contributor

    ACK 0d127ca5ebedb5a1eee7b183be06545d05236090

    Before:

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    1.76 567,923,390.61 0.6% 29.43 3.88 7.577 6.40 0.0% 10.95 HexParse

    After:

    ns/base16 base16/s err% ins/base16 cyc/base16 IPC bra/base16 miss% total benchmark
    0.81 1,234,838,738.87 0.5% 15.21 1.78 8.534 3.37 0.0% 10.84 HexParse
  44. maflcko commented at 7:29 pm on February 27, 2024: member
    lgtm ACK 0d127ca5ebedb5a1eee7b183be06545d05236090
  45. DrahtBot removed review request from maflcko on Feb 27, 2024
  46. in src/bench/parse_hex.cpp:9 in 0d127ca5eb outdated
    0@@ -0,0 +1,33 @@
    1+// Copyright (c) 2024- The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <bench/bench.h>
    6+#include <util/strencodings.h>
    7+#include <random.h>
    8+#include <cassert>
    


    hebasto commented at 5:14 pm on February 28, 2024:

    nit: https://api.cirrus-ci.com/v1/task/5912720797597696/logs/ci.log:

     0bench/parse_hex.cpp should add these lines:
     1#include <stddef.h>             // for size_t
     2#include <optional>             // for operator==, nullopt, optional
     3#include <vector>               // for vector
     4
     5bench/parse_hex.cpp should remove these lines:
     6
     7The full include-list for bench/parse_hex.cpp:
     8#include <bench/bench.h>        // for Bench, doNotOptimizeAway, BENCHMARK, PriorityLevel
     9#include <random.h>             // for FastRandomContext
    10#include <stddef.h>             // for size_t
    11#include <util/strencodings.h>  // for allocator, string, TryParseHex, char_traits
    12#include <cassert>              // for assert
    13#include <optional>             // for operator==, nullopt, optional
    14#include <vector>               // for vector
    15---
    

    paplorinc commented at 5:24 pm on February 28, 2024:
    Done, thanks
  47. hebasto commented at 5:14 pm on February 28, 2024: member

    Concept ACK 0d127ca5ebedb5a1eee7b183be06545d05236090.

    Observing ~x2 speed up on my machine.

  48. Add benchmark for TryParseHex
    Running `make && ./src/bench/bench_bitcoin -filter=HexParse` a few times results in:
    ```
    |           ns/base16 |            base16/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                1.60 |      623,238,893.11 |    0.3% |      0.01 | `HexParse`
    |                1.65 |      606,747,566.34 |    0.6% |      0.01 | `HexParse`
    |                1.60 |      626,149,544.07 |    0.3% |      0.01 | `HexParse`
    ```
    b7489ecb52
  49. Preallocate result in `TryParseHex` to avoid resizing
    Running `make && ./src/bench/bench_bitcoin -filter=HexParse` a few times results in:
    ```
    |           ns/base16 |            base16/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                0.68 |    1,465,555,976.27 |    0.8% |      0.01 | `HexParse`
    |                0.68 |    1,472,962,920.18 |    0.3% |      0.01 | `HexParse`
    |                0.68 |    1,476,159,423.00 |    0.3% |      0.01 | `HexParse`
    ```
    a19235c14b
  50. paplorinc force-pushed on Feb 28, 2024
  51. hebasto approved
  52. hebasto commented at 5:26 pm on February 28, 2024: member
    ACK a19235c14b3dc02de30b5d769de29d1752c23dbd.
  53. DrahtBot requested review from maflcko on Feb 28, 2024
  54. DrahtBot requested review from andrewtoth on Feb 28, 2024
  55. DrahtBot requested review from Empact on Feb 28, 2024
  56. andrewtoth approved
  57. andrewtoth commented at 6:30 pm on February 28, 2024: contributor
    Re-ACK a19235c14b3dc02de30b5d769de29d1752c23dbd
  58. DrahtBot removed review request from Empact on Feb 29, 2024
  59. Empact approved
  60. achow101 commented at 1:33 pm on March 11, 2024: member
    ACK a19235c14b3dc02de30b5d769de29d1752c23dbd
  61. achow101 merged this on Mar 11, 2024
  62. achow101 closed this on Mar 11, 2024

  63. paplorinc deleted the branch on Mar 11, 2024
  64. achow101 referenced this in commit c38157b9b9 on Mar 13, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me