util: optimize HexStr #24852

martinus commented at 5:23 AM on April 14, 2022: contributor

In my benchmark, this rewrite improves runtime 27% (g++) to 46% (clang++) for the benchmark HexStrBench:

g++ 11.2.0 | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.94 | 1,061,381,310.36 | 0.7% | 12.00 | 3.01 | 3.990 | 1.00 | 0.0% | 0.01 | HexStrBench master | 0.68 | 1,465,366,544.25 | 1.7% | 6.00 | 2.16 | 2.778 | 1.00 | 0.0% | 0.01 | HexStrBench branch

clang++ 13.0.1 | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.80 | 1,244,713,415.92 | 0.9% | 10.00 | 2.56 | 3.913 | 0.50 | 0.0% | 0.01 | HexStrBench master | 0.43 | 2,324,188,940.72 | 0.2% | 4.00 | 1.37 | 2.914 | 0.25 | 0.0% | 0.01 | HexStrBench branch

Note that the idea for this change comes from denis2342 in #23364. This is a rewrite so no unaligned accesses occur.

Also, the lookup table is now calculated at compile time, which hopefully makes the code a bit easier to review.

DrahtBot added the label Build system on Apr 14, 2022

DrahtBot added the label Utils/log/libs on Apr 14, 2022

in src/util/strencodings.cpp:520 in 4d7b193af5 outdated

 515 | +
 516 | +constexpr std::array<ByteAsHex, 256> CreateByteToHexMap()
 517 | +{
 518 | +    std::array<ByteAsHex, 256> byte_to_hex{};
 519 | +    for (size_t i = 0; i < byte_to_hex.size(); ++i) {
 520 | +        char const* hex = "0123456789abcdef";

laanwj commented at 6:13 AM on April 14, 2022:

Any reason for not using the static constexpr array as removed below?

    static constexpr char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

It clearly can be static, and that makes more sense than a string (which will be zero-terminated) to me.

martinus commented at 12:10 PM on April 17, 2022:

It can't be static here because I'm using that function in a constexpr way. With static constexpr char... I get that error message:

util/strencodings.cpp:520:31: error: static variable not permitted in a constexpr function
        static constexpr char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
                              ^

I can use constexpr char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' }; though. I just used a string though because it's less typing :slightly_smiling_face:

laanwj commented at 1:04 PM on April 17, 2022:

Huh, TIL

I can use constexpr char hexmap[16] = { '0', '

I prefer that, then. I mean it's not used as a C string so it shouldn't be a C string. The amount of typing shouldn't be a consideration :stuck_out_tongue:

in src/test/strencodings_tests.cpp:16 in 4d7b193af5 outdated

  11 | +#include <string>
  12 | +#include <string_view>
  13 | +
  14 | +BOOST_AUTO_TEST_SUITE(strencodings_tests)
  15 | +
  16 | +BOOST_AUTO_TEST_CASE(util_HexStr)

laanwj commented at 6:17 AM on April 14, 2022:

Mind that we already have a test for HexStr in src/test/util_tests.cpp. Might want to remove that one, or at least move it into here.

martinus commented at 12:11 PM on April 17, 2022:

ah, I didn't see that! I'll move the test over that checks each character and remove the rest

laanwj commented at 6:18 AM on April 14, 2022: member

Concept ACK. This doesn't have the problems that #23364 has.

in src/Makefile.test.include:140 in 4d7b193af5 outdated

 136 | @@ -137,6 +137,7 @@ BITCOIN_TESTS =\
 137 |    test/skiplist_tests.cpp \
 138 |    test/sock_tests.cpp \
 139 |    test/streams_tests.cpp \
 140 | +  test/strencodings_tests.cpp \

laanwj commented at 7:34 AM on April 14, 2022:

Linter complains about trailing whitespace on this line

test: Adds a test for HexStr that checks all 256 bytes

This makes sure the whole HexStr mapping table is checked.

67c8411c37

bench: Adds a benchmark for HexStr

Benchmarks conversion of a full binary block into hex, like it is done in rest.cpp.

4e2b99f72a

util: optimizes HexStr

In my benchmark, this rewrite improves runtime 27% (g++) to 46% (clang++) for the benchmark `HexStrBench`:

g++ 11.2.0
|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                0.94 |    1,061,381,310.36 |    0.7% |           12.00 |            3.01 |  3.990 |           1.00 |    0.0% |      0.01 | `HexStrBench` master
|                0.68 |    1,465,366,544.25 |    1.7% |            6.00 |            2.16 |  2.778 |           1.00 |    0.0% |      0.01 | `HexStrBench` branch

clang++ 13.0.1
|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                0.80 |    1,244,713,415.92 |    0.9% |           10.00 |            2.56 |  3.913 |           0.50 |    0.0% |      0.01 | `HexStrBench` master
|                0.43 |    2,324,188,940.72 |    0.2% |            4.00 |            1.37 |  2.914 |           0.25 |    0.0% |      0.01 | `HexStrBench` branch

Note that the idea for this change comes from denis2342 in PR 23364. This is a rewrite so no unaligned accesses occur.

Also, the lookup table is now calculated at compile time, which hopefully makes the code a bit easier to review.

5e61532e72

in src/util/strencodings.cpp:539 in 4d7b193af5 outdated

 538 | +    char* it = rv.data();
 539 |      for (uint8_t v : s) {
 540 | -        *it++ = hexmap[v >> 4];
 541 | -        *it++ = hexmap[v & 15];
 542 | +        std::memcpy(it, byte_to_hex[v].data(), 2);
 543 | +        it += 2;

MarcoFalke commented at 10:46 AM on April 15, 2022:

        const auto& h{byte_to_hex[v]};
        it = std::copy(h.begin(),h.end(), it);

Side-note: Looks like it requires gcc-12 to get the same binary if this is written with std::copy.

martinus commented at 3:09 PM on April 17, 2022:

I also find it interesting that clang unrolls the loop, but gcc doesn't. I tried to force unrollment with #pragma GCC unroll 8 and that does the job, but still the code that comes out is not faster in my benchmark. https://godbolt.org/z/GEfYze9ob (I've replaced Span with string_view but it should be practically the same)

martinus force-pushed on Apr 17, 2022

laanwj renamed this:
~~util: optimizes HexStr~~
util: optimize HexStr
on Apr 18, 2022

laanwj commented at 1:38 PM on April 18, 2022: member

Code review ACK 5e61532e72c1021fda9c7b213bd9cf397cb3a802

in src/test/util_tests.cpp:204 in 5e61532e72

 197 | @@ -198,6 +198,24 @@ BOOST_AUTO_TEST_CASE(util_HexStr)
 198 |          BOOST_CHECK_EQUAL(HexStr(in_s), out_exp);
 199 |          BOOST_CHECK_EQUAL(HexStr(in_b), out_exp);
 200 |      }
 201 | +
 202 | +    {
 203 | +        auto input = std::string();
 204 | +        for (size_t i=0; i<256; ++i) {

aureleoules commented at 7:09 PM on April 19, 2022:

        for (size_t i = 0; i < 256; ++i) {

nit

in src/test/util_tests.cpp:216 in 5e61532e72

 211 | +        for (size_t i = 0; i < 256; ++i) {
 212 | +            auto upper = hexmap.find(hex[i * 2]);
 213 | +            auto lower = hexmap.find(hex[i * 2 + 1]);
 214 | +            BOOST_TEST_REQUIRE(upper != std::string_view::npos);
 215 | +            BOOST_TEST_REQUIRE(lower != std::string_view::npos);
 216 | +            BOOST_TEST_REQUIRE(i == upper*16 + lower);

aureleoules commented at 7:09 PM on April 19, 2022:

            BOOST_TEST_REQUIRE(i == upper * 16 + lower);

nit

aureleoules commented at 7:32 PM on April 19, 2022: member

tACK 5e61532e72c1021fda9c7b213bd9cf397cb3a802. Here are my benchmark results:

g++ (GCC) 10.3.0:
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.83 | 1,208,873,408.19 | 2.3% | 12.00 | 1.89 | 6.351 | 1.00 | 0.0% | 0.01 | master HexStrBench
| 0.47 | 2,128,159,476.75 | 1.4% | 6.00 | 1.07 | 5.588 | 1.00 | 0.0% | 0.01 | PR HexStrBench

clang version 14.0.1:
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.74 | 1,360,194,312.11 | 0.5% | 12.00 | 1.69 | 7.103 | 1.00 | 0.0% | 0.01 | master HexStrBench | 0.43 | 2,307,843,399.75 | 0.9% | 6.00 | 0.99 | 6.033 | 1.00 | 0.0% | 0.01 | PR HexStrBench

theStack approved

theStack commented at 2:03 PM on May 4, 2022: member

ACK 5e61532e72c1021fda9c7b213bd9cf397cb3a802 🚤

Tested on OpenBSD 7.1 (amd64) with clang 13.0.0:

4e2b99f72a90b956f3050095abed4949aff9b516 (this PR without the latest commit, i.e. runs the HexStr version in master)

| ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 3.23 | 309,609,411.46 | 3.2% | 0.04 | HexStrBench

5e61532e72c1021fda9c7b213bd9cf397cb3a802 (PR's top commit)

| ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 1.73 | 577,521,701.39 | 2.6% | 0.02 | HexStrBench

laanwj merged this on May 4, 2022

laanwj closed this on May 4, 2022

martinus deleted the branch on May 4, 2022

sidhujag referenced this in commit 3f7e7668f7 on May 4, 2022

luke-jr referenced this in commit 2350a9cfbd on May 21, 2022

luke-jr referenced this in commit c98be4400c on May 21, 2022

luke-jr referenced this in commit 6a7045f2cb on May 21, 2022

DrahtBot locked this on May 4, 2023