util: Pass size to ParseHex to assist preallocation #18061

pull elichai wants to merge 2 commits into bitcoin:master from elichai:2020-02-remove-strlen changing 4 files +70 −7
  1. elichai commented at 7:02 pm on February 3, 2020: contributor

    Hi, This is kinda chasing Concept ACK if it’s worth the review effort.* (noticed it while reviewing a PR)

    Right now both DecodeBase32 and DecodeBase64 are only ever being called with std::string but still process via char* and so they run strlen(O(n)) twice, once to validate it doesn’t contain random zeros(ValidAsCString) another to get the length.

    Instead we can iterate it as a std::string and then if we see 0 in the middle of the string we can fail the process.

    * if it is, there’s also DecodeBase58 .

    And as a by-product saw 2 other places where we can pre-allocate memory.

    EDIT: So the by-product turned out to be the only interesting part of this PR :) (see #18061 (comment)) According to the benchmarks this can cut the time it takes to parse a hex by half.

  2. elichai renamed this:
    util: Reimplement DecodeBase32/64 with std::string to remove `strlen`
    util: Change DecodeBase32/64 to use std::string to not call `strlen` twice
    on Feb 3, 2020
  3. sanjaykdragon commented at 7:13 pm on February 3, 2020: contributor
    ut: i think the compiler would optimize strlen being called twice into once
  4. elichai commented at 7:15 pm on February 3, 2020: contributor

    @sanjaykdragon good question. I’ll check godbolt later. But even if you’re right it’s A. compiler dependent. B. still one time more than this.

    EDIT: even with simplification, and force inlining everything it still calls strlen twice https://godbolt.org/z/ZBpdWa

  5. DrahtBot added the label GUI on Feb 3, 2020
  6. DrahtBot added the label P2P on Feb 3, 2020
  7. DrahtBot added the label RPC/REST/ZMQ on Feb 3, 2020
  8. DrahtBot added the label Utils/log/libs on Feb 3, 2020
  9. in src/util/strencodings.cpp:109 in 3fad849f2f outdated
    105@@ -104,7 +106,7 @@ std::vector<unsigned char> ParseHex(const char* psz)
    106 
    107 std::vector<unsigned char> ParseHex(const std::string& str)
    108 {
    109-    return ParseHex(str.c_str());
    110+    return ParseHex(str.c_str(), str.size());
    


    promag commented at 10:22 pm on February 3, 2020:

    3fad849f2f9d4c250e44fb7c329e1f93c14e48c7

    NACK this change. Instead you can drop ParseHex(const char*) and move the code here with the proper changes - I’ve done it and make check ran fine.


    elichai commented at 9:28 am on February 4, 2020:
    Well we have places where it’s called with const char*. probably by dropping that it just calls the std::string constructor

    laanwj commented at 3:26 pm on February 5, 2020:
    Same suggestion: can’t we remove the std::vector<unsigned char> ParseHex(const char* psz); variant completely? Does this cause any actual issues if we did? I’d like to move away from C-strings where possible and it seems more of a simplification.

    elichai commented at 3:36 pm on February 5, 2020:
    The only “problem” is a degregation in speed for places where we call this with C strings which I think is mostly(if not only?) tests(their raw strings usage) It’s probably worth the simplicity though.

    MarcoFalke commented at 5:57 pm on February 10, 2020:
    Is copying a string actually relevant performance wise here? Even if it was, parsing hex should only happen on user or rpc inputs, so I don’t think we need to over-optimize this.

    elichai commented at 2:15 pm on February 11, 2020:

    @MarcoFalke in my benchmarks if you have char* and you convert it into std::string it spends almost the same time as the code before this PR (X2 slower). I think it’s not about copying, it’s about heap allocation. But if you prefer I’ll be fine with removing the raw char* support because we probably only ever benefit from it in tests.

    I agree that there’s no need to over-optimize this but this is basically free optimization.


    practicalswift commented at 10:17 pm on February 11, 2020:

    Echoing the request of simply dropping ParseHex(const char* psz).

    Generally I think we should try move away from C-strings where possible.


    MarcoFalke commented at 10:23 pm on February 11, 2020:
    @elichai Is there a single call site that needs the raw string interface? If not, and you have performance concerns, you may explicitly delete the interface, and thus force all callers to explicitly call ParseHex(std::string{psz}) and be aware of the performance impact.

    elichai commented at 3:44 pm on February 12, 2020:

    Checked now, it seems like ParseHex(const char* psz) is only used in tests except for 2 places: https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L55 https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2101

    Both should almost never really run, so I’ll just drop it

  10. promag commented at 10:26 pm on February 3, 2020: member

    But what about the check done in ValidAsCString?

    I think you need to show improvements otherwise it doesn’t pay off. FWIW having ValidAsCString avoids .reserve().

  11. DrahtBot commented at 11:43 pm on February 3, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21328 (net, refactor: pass uint16 CService::port as uint16 by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  12. MarcoFalke removed the label GUI on Feb 4, 2020
  13. MarcoFalke removed the label P2P on Feb 4, 2020
  14. MarcoFalke removed the label RPC/REST/ZMQ on Feb 4, 2020
  15. MarcoFalke added the label Refactoring on Feb 4, 2020
  16. elichai force-pushed on Feb 4, 2020
  17. Add benchmarks for ParseHex and Base64 a7d2c6327c
  18. elichai commented at 3:37 pm on February 4, 2020: contributor

    So added some benchmarks, and it seems to be within the margin of noise except for ParseHex.(which is twice as fast for the 32 byte case) so I think I’ll rebase to only do the ParseHex pre-allocation Before:

     0$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     1# Benchmark,    evals,   iterations,  total,      min,         max,       median
     2Base64Decode,    5,       1000000,   1.77748,  3.49936e-07, 3.60382e-07, 3.55597e-07
     3Base64DecodeVec, 5,       1000000,   1.69569,  3.35597e-07, 3.41911e-07, 3.39301e-07
     4HexParse,        5,       1000000,   0.599282, 1.1813e-07,  1.2082e-07,  1.19942e-07
     5
     6$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     7# Benchmark,    evals,   iterations,  total,      min,         max,       median
     8Base64Decode,    5,       1000000,   1.84349,  3.6175e-07,  3.79702e-07, 3.65207e-07
     9Base64DecodeVec, 5,       1000000,   1.69212,  3.35303e-07, 3.43094e-07, 3.36332e-07
    10HexParse,        5,       1000000,   0.592378, 1.17877e-07, 1.19108e-07, 1.18541e-07
    11
    12$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
    13# Benchmark,    evals,   iterations,  total,      min,         max,       median
    14Base64Decode,    5,       1000000,   1.84158,  3.59822e-07, 3.83746e-07, 3.67486e-07
    15Base64DecodeVec, 5,       1000000,   1.7276,   3.37973e-07, 3.51833e-07, 3.46255e-07
    16HexParse,        5,       1000000,   0.614073, 1.18379e-07, 1.26827e-07, 1.22456e-07
    

    After:

     0# Benchmark,    evals,   iterations,  total,      min,         max,       median
     1Base64Decode,    5,       1000000,   1.81012,  3.55877e-07, 3.68592e-07, 3.63113e-07
     2Base64DecodeVec, 5,       1000000,   1.84868,  3.66254e-07, 3.72895e-07, 3.69377e-07
     3HexParse,        5,       1000000,   0.281846, 5.55183e-08, 5.71883e-08, 5.63974e-08
     4
     5$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     6# Benchmark,    evals,   iterations,  total,      min,         max,       median
     7Base64Decode,    5,       1000000,   1.82559, 3.58871e-07, 3.68576e-07, 3.65803e-07
     8Base64DecodeVec, 5,       1000000,   1.84521, 3.6735e-07,  3.71235e-07, 3.68715e-07
     9HexParse,        5,       1000000,   0.29672, 5.62958e-08, 6.46866e-08, 5.86538e-08
    10
    11$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
    12# Benchmark,    evals,   iterations,  total,      min,         max,       median
    13Base64Decode,    5,       1000000,   1.89398,  3.7271e-07,  3.83227e-07, 3.78387e-07
    14Base64DecodeVec, 5,       1000000,   1.90284,  3.73173e-07, 3.89818e-07, 3.81565e-07
    15HexParse,        5,       1000000,   0.288767, 5.70536e-08, 5.92714e-08, 5.73934e-08
    
  19. elichai force-pushed on Feb 4, 2020
  20. elichai renamed this:
    util: Change DecodeBase32/64 to use std::string to not call `strlen` twice
    util: Pass size to ParseHex to assist preallocation
    on Feb 4, 2020
  21. sanjaykdragon commented at 5:05 pm on February 4, 2020: contributor

    So added some benchmarks, and it seems to be within the margin of noise except for ParseHex.(which is twice as fast for the 32 byte case) so I think I’ll rebase to only do the ParseHex pre-allocation Before:

     0$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     1# Benchmark,    evals,   iterations,  total,      min,         max,       median
     2Base64Decode,    5,       1000000,   1.77748,  3.49936e-07, 3.60382e-07, 3.55597e-07
     3Base64DecodeVec, 5,       1000000,   1.69569,  3.35597e-07, 3.41911e-07, 3.39301e-07
     4HexParse,        5,       1000000,   0.599282, 1.1813e-07,  1.2082e-07,  1.19942e-07
     5
     6$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     7# Benchmark,    evals,   iterations,  total,      min,         max,       median
     8Base64Decode,    5,       1000000,   1.84349,  3.6175e-07,  3.79702e-07, 3.65207e-07
     9Base64DecodeVec, 5,       1000000,   1.69212,  3.35303e-07, 3.43094e-07, 3.36332e-07
    10HexParse,        5,       1000000,   0.592378, 1.17877e-07, 1.19108e-07, 1.18541e-07
    11
    12$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
    13# Benchmark,    evals,   iterations,  total,      min,         max,       median
    14Base64Decode,    5,       1000000,   1.84158,  3.59822e-07, 3.83746e-07, 3.67486e-07
    15Base64DecodeVec, 5,       1000000,   1.7276,   3.37973e-07, 3.51833e-07, 3.46255e-07
    16HexParse,        5,       1000000,   0.614073, 1.18379e-07, 1.26827e-07, 1.22456e-07
    

    After:

     0# Benchmark,    evals,   iterations,  total,      min,         max,       median
     1Base64Decode,    5,       1000000,   1.81012,  3.55877e-07, 3.68592e-07, 3.63113e-07
     2Base64DecodeVec, 5,       1000000,   1.84868,  3.66254e-07, 3.72895e-07, 3.69377e-07
     3HexParse,        5,       1000000,   0.281846, 5.55183e-08, 5.71883e-08, 5.63974e-08
     4
     5$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
     6# Benchmark,    evals,   iterations,  total,      min,         max,       median
     7Base64Decode,    5,       1000000,   1.82559, 3.58871e-07, 3.68576e-07, 3.65803e-07
     8Base64DecodeVec, 5,       1000000,   1.84521, 3.6735e-07,  3.71235e-07, 3.68715e-07
     9HexParse,        5,       1000000,   0.29672, 5.62958e-08, 6.46866e-08, 5.86538e-08
    10
    11$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
    12# Benchmark,    evals,   iterations,  total,      min,         max,       median
    13Base64Decode,    5,       1000000,   1.89398,  3.7271e-07,  3.83227e-07, 3.78387e-07
    14Base64DecodeVec, 5,       1000000,   1.90284,  3.73173e-07, 3.89818e-07, 3.81565e-07
    15HexParse,        5,       1000000,   0.288767, 5.70536e-08, 5.92714e-08, 5.73934e-08
    

    utACK: if these numbers are real and true, this is a nice improvement

  22. Add memory reserving in ParseHex and SanitizeString 40aa0368dc
  23. elichai force-pushed on Feb 12, 2020
  24. DrahtBot added the label Needs rebase on Mar 19, 2021
  25. DrahtBot commented at 9:15 pm on March 19, 2021: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  26. DrahtBot commented at 11:22 am on December 15, 2021: member
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  27. DrahtBot commented at 1:07 pm on March 21, 2022: member
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  28. MarcoFalke added the label Up for grabs on Mar 21, 2022
  29. MarcoFalke commented at 1:55 pm on March 21, 2022: member
    Added “up for grabs”, see also https://github.com/bitcoin/bitcoin/pull/23595/commits
  30. MarcoFalke commented at 6:41 am on April 4, 2022: member
    I’ve split up the common changes (base refactoring work) into https://github.com/bitcoin/bitcoin/pull/24751
  31. fanquake commented at 12:21 pm on April 26, 2022: member
    Closing this in favour of #24764 for now.
  32. fanquake closed this on Apr 26, 2022

  33. fanquake removed the label Up for grabs on Apr 26, 2022
  34. MarcoFalke commented at 9:05 am on April 29, 2022: member
    I don’t think this was fixed by https://github.com/bitcoin/bitcoin/pull/24764
  35. MarcoFalke reopened this on Apr 29, 2022

  36. fanquake commented at 10:23 am on April 29, 2022: member

    I don’t think this was fixed by #24764

    I closed it because the PR has been inactive for a long time, the up for grabs label had already been applied, and it was just generating noise by conflicting with on-going changes. If someone wants to continue these changes they could rebase, and open a new PR, with the changes that are still relevant.

  37. fanquake closed this on Apr 29, 2022

  38. DrahtBot locked this on Apr 29, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me