util: faster HexStr => 13% faster blockToJSON #21173

pull martinus wants to merge 1 commits into bitcoin:master from martinus:2021-02-faster-hexstr changing 1 files +6 −5
  1. martinus commented at 11:18 AM on February 14, 2021: contributor

    std::string's push_back is rather slow because it needs to check & update the string size. For HexStr the output string size is already easily know, so we can initially create the string with the correct size and then just assign the data.

    HexStr is heavily usd in blockToJSON, so this change is a noticeable benefit. Benchmark on an i7-8700 @3.2GHz:

    • 71,315,461.00 ns/op master
    • 62,842,490.00 ns/op this commit

    So this little change makes blockToJSON about ~13% faster.

  2. fanquake added the label Utils/log/libs on Feb 14, 2021
  3. theStack approved
  4. theStack commented at 1:12 AM on February 15, 2021: member

    ACK 987192d5fc2e1b5f390c909e9798ed13362afe47 :speedboat:

    Nice idea! I have observed a ~14% speedup on my machine.

    master branch: $ src/bench/bench_bitcoin -filter=BlockToJsonVerbose

    | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 125,727,939.00 | 7.95 | 2.9% | 580,539,338.00 | 263,182,728.00 | 2.206 | 104,811,890.00 | 0.5% | 1.39 | BlockToJsonVerbose

    PR branch: $ src/bench/bench_bitcoin -filter=BlockToJsonVerbose

    | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 110,007,254.00 | 9.09 | 1.7% | 505,937,562.00 | 232,096,584.00 | 2.180 | 92,680,595.00 | 0.5% | 1.33 | BlockToJsonVerbose

  5. practicalswift commented at 10:52 AM on February 15, 2021: contributor

    One nice property of the current version in master is that it doesn't use pointer arithmetic.

    From the C++ Core Guidelines:

    Pointer arithmetic is fragile and easy to get wrong, the source of many, many bad bugs and security violations.

  6. MarcoFalke commented at 11:08 AM on February 15, 2021: member

    The compiler should be able to produce the same binary with the pointer arithmetic replaced by std::string::operator[](unsinged), no?

  7. martinus commented at 11:49 AM on February 15, 2021: contributor

    Ah, I didn't know there was no pointer arithmetic in master! I've played with a few variants:

    Variant 1

    // 63,621,136.00 ns/op
    auto it = rv.begin();
    for (uint8_t v: s) {
        *it++ = hexmap[v >> 4];
        *it++ = hexmap[v & 15];
    }
    assert(it == rv.end());
    

    This has basically has the same speed for me. it is basically a pointer in disguise, so I'm not sure anything is gained here. Having a simple assert after the loop is nice though.

    Variant 2

    Or this, but it's a bit slower and not better in my opinion:

    // 65,479,162.00 ns/op
    size_t i = 0;
    for (uint8_t v: s) {
        rv[i++] = hexmap[v >> 4];
        rv[i++] = hexmap[v & 15];
    }
    assert(i == rv.size());
    

    Variant 3

    Or another variant:

    // 65,187,171.00 ns/op
    for (size_t i = 0; i < rv.size(); i += 2) {
        auto v = s[i / 2];
        rv[i] = hexmap[v >> 4];
        rv[i + 1] = hexmap[v & 15];
    };
    
  8. in src/util/strencodings.cpp:585 in 987192d5fc outdated
     578 | @@ -579,13 +579,13 @@ std::string Capitalize(std::string str)
     579 |  
     580 |  std::string HexStr(const Span<const uint8_t> s)
     581 |  {
     582 | -    std::string rv;
     583 | +    std::string rv(s.size() * 2, '\0');
     584 |      static constexpr char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7',
     585 |                                           '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
     586 | -    rv.reserve(s.size() * 2);
     587 | +    auto* out = rv.data();
    


    laanwj commented at 5:58 PM on February 15, 2021:

    Is it allowed to write to a std::string's backing memory? I vaguely remember it was not, but maybe that was pre C++11.


    martinus commented at 7:39 PM on February 15, 2021:

    Before C++11 it was not guaranteed by the standard that std::string's memory is contiguous, but as far as I know all implementations did that anyways. It is certainly allowed since C++11

  9. laanwj commented at 5:59 PM on February 15, 2021: member

    I'm surprised this is so much faster. I mean, theoretically, reserve should effectively do this right?

    Edit: If I have to chose I think I like Variant 1 above most. Yes, an iterator is basically pointer arithmetic, but it looks cleaner than the pointer to data(), and "proves" it is allowed behavior.

  10. martinus commented at 7:58 PM on February 15, 2021: contributor

    push_back is actually really slow, even when reserve already prepared the capacity. For each call it needs to check capacity, increase the size, set the data, and maybe also a check if it is small-string optimization. With an pointer / iterator this is much cheaper.

    See the generated assembler code here: https://godbolt.org/z/zv1aP6 For HexStr_variant1 the compiler can generate a very tight loop, only a single branch, and it looks to me like clang++ even unrolls it. The push_back variant produces a lot more code

  11. laanwj commented at 8:55 PM on February 15, 2021: member

    push_back is actually really slow, even when reserve already prepared the capacity.

    Thanks. Yes I believe you, it's just counter-intuitive to me. So much for zero-cost abstraction. Oh wait, that was rust.

  12. theStack commented at 10:56 PM on February 15, 2021: member

    One nice property of the current version in master is that it doesn't use pointer arithmetic.

    Good point, I didn't consider that in my review.

    I've played with a few variants: ...

    I'd prefer Variant 1.

  13. faster HexStr => 13% faster blockToJSON
    `std::string`'s push_back is rather slow because it needs to check & update the string size. For
    `HexStr` the output string size is already easily know, so we can initially create the string with
    the correct size and then just assign the data.
    
    `HexStr` is heavily usd in `blockToJSON`, so this change is a noticeable benefit. Benchmark on an i7-8700 @3.2GHz:
    
    * 71,315,461.00 ns/op master
    * 62,842,490.00 ns/op this commit
    
    So this little change makes `blockToJSON` about ~13% faster.
    74bf850ac4
  14. martinus force-pushed on Feb 16, 2021
  15. martinus commented at 7:22 AM on February 16, 2021: contributor

    Rebased and changed the code to variant 1 (iterator & assert)

  16. 0xB10C commented at 9:17 AM on February 16, 2021: member

    Thanks for working on all these performance improvements martinus!

    master: | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 69,630,868.00 | 14.36 | 0.6% | 575,779,102.00 | 138,122,873.00 | 4.169 | 107,462,304.00 | 0.5% | 0.81 | BlockToJsonVerbose

    this PR (variant 1): | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 66,411,979.00 | 15.06 | 0.2% | 509,925,986.00 | 126,353,971.00 | 4.036 | 96,559,744.00 | 0.5% | 0.76 | BlockToJsonVerbose

    Which is a 4.6% performance improvement (when looking at ns/op) for me - however the benchmark warns me that these results might be unstable (probably because I'm not plugged into power).

    EDIT: As @martinus pointed out below, there is a 11,4% improvement when looking at ins/op.

  17. martinus commented at 9:52 AM on February 16, 2021: contributor

    @0xB10C if runtime is not so stable than the number of instructions per operation might be a better way to see the performance change. 575,779,102 -> 509,925,986 is relatively similar to what @theStack wrote for the pointer version: 580,539,338 -> 505,937,562. So it's a bit slower but still quite a bit better

  18. theStack approved
  19. theStack commented at 8:40 PM on February 16, 2021: member

    re-ACK 74bf850ac47735f2ef4306059d3e664d40cac85e

    This performs even slightly better than the pointer version on my machine: 504,789,607.00 ins/op (PR branch) vs. 580,535,928.00 ins/op (master branch)

  20. Escobar181 approved
  21. Escobar181 commented at 8:58 PM on February 16, 2021: none

    Nice

  22. ViralTaco commented at 2:33 AM on February 17, 2021: none

    Couple of things in mind. Since there's no documentation in the code I'll just say what it does: "This function returns a string containing the hexadecimal representation of every byte passed to it; The string is a continuous array that is twice the size of the input." Ok, not the best doc I've wrote but it does the job.

    1. Has anyone measured performance of std::foreach(s.begin(), s.end(), string_concat_lambda);?
    2. The function isn't const correct (for loop makes non const copies for some reason). The copy is unnecessary (of the Span). To further digress: can't the function be noexcept (do you expect it to throw? I know std::string can throw, that has little to do with the function you define. ) Do we really need to initialize the std::string with s.size() * 2 0s?

    Otherwise it's pretty clean. I'm not a fan of *it++ it does the job but it, in my opinion, adds some needless complexity. I rather see the increment at the end, or after each assignment. Might just be me. (Often is, tbh)

  23. martinus commented at 6:16 AM on February 17, 2021: contributor

    Has anyone measured performance of std::foreach(s.begin(), s.end(), string_concat_lambda);?

    I've benchmarked std::for_each with variant 1:

    auto it = rv.begin();
    std::for_each(s.begin(), s.end(), [&](uint8_t v) {
        *it++ = hexmap[v >> 4];
        *it++ = hexmap[v & 15];
    });
    assert(it == rv.end());
    

    Unsurprisingly it's exactly the same performance as the for (uint8_t v : s) version. Most likely it produces the same assembler code (the difference for ins/op is 0.0005%...)

    The function isn't const correct (for loop makes non const copies for some reason).

    Two reasons:

    1. Its simple bytes, so copies are cheap.
    2. The type needs to be unsigned so v>>4 does the right thing, so I can't just use for (auto const& c : s), because then c would be of type char. Using uint8_t converts char to unsigned (which generates no code, since its just a byte).

    I'm not a fan of *it++ it does the job but it

    Maybe I'm old school, but copy loops like like while (*out++ = *in++); still look natural to me :smile:

    EDIT: I forgot to mention, it can't be noexcept because std::string allocation is not noexcept (it can throw std::bad_alloc). I'm no fan of noexcept because it is very difficult to really get it right.

    Also, the std::string constructor always initializes its element. There is no constructor that just gets a size. You could do

    std::string rv;
    rv.resize(s.size() * 2);
    

    But that seems to be a tad slower. It does basically the same (resize() zero initializes), but requires a bit more code.

  24. ViralTaco commented at 7:18 AM on February 18, 2021: none

    First off: I can't help myself reviewing code. Everything I pointed out was 100% nitpicking. The code is fine.

    1. Its simple bytes, so copies are cheap.

    I know, that's why I understand you make a copy of it. (Might have expressed that poorly) What I don't understand is why it's mutable (not const) since you don't (or aren't supposed to) change it. Agree it's cheap. It's a copy either way, here it's not an optimization for the compiler, just the reader (ie: we don't need to change that byte).

    1. […]

    Addressed, no reference needed. auto is cute but counter productive, I agree.

    still look natural to me 😄

    D:

    I forgot to mention, it can't be noexcept because std::string allocation is not noexcept (it can throw std::bad_alloc).

    This is the part I was waiting for because now I'm sure I need to read the standard again… First I have to say I agree, it's not needed here, probably a future bug if used, doesn't add much. It's just punctuation to me, at this point. And I know, allocation can fail, that's fine std::terminate is called… We're deep into the rabbit hole here but I'll still cite the standard (C++17) since I had to look it up 😞

    Non-throwing functions are permitted to call potentially-throwing functions. Whenever an exception is thrown and the search for a handler encounters the outermost block of a non-throwing function, the function std::terminate […] is called

    cf: eel.is Unless you don't want to terminate, but then this function should throw, shouldn't it? 🥺 (Java has had the better of me, didn't it?) I need to read the standard anyway, will look into it :|

  25. laanwj commented at 7:55 AM on May 19, 2021: member

    Code review ACK 74bf850ac47735f2ef4306059d3e664d40cac85e

  26. laanwj merged this on May 19, 2021
  27. laanwj closed this on May 19, 2021

  28. martinus deleted the branch on May 19, 2021
  29. sidhujag referenced this in commit 2e275ce3bc on May 19, 2021
  30. luke-jr referenced this in commit a6492a347a on Jun 27, 2021
  31. PastaPastaPasta referenced this in commit 116c8f3d88 on Jun 27, 2021
  32. PastaPastaPasta referenced this in commit 5753c5cf9f on Jun 28, 2021
  33. PastaPastaPasta referenced this in commit 3e0095449a on Jun 29, 2021
  34. PastaPastaPasta referenced this in commit bd697ec288 on Jul 1, 2021
  35. PastaPastaPasta referenced this in commit aeb6d8f37c on Jul 1, 2021
  36. PastaPastaPasta referenced this in commit 0d8f4fae3b on Jul 15, 2021
  37. gwillen referenced this in commit eb4b376770 on Jun 1, 2022
  38. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me