refactor: introduce single-separator split helper (boost::split replacement) #22953

pull theStack wants to merge 3 commits into bitcoin:master from theStack:202109-refactor-boost_split_replacement_single_sep changing 14 files +97 −71
  1. theStack commented at 12:17 pm on September 11, 2021: member

    This PR adds a simple string split helper SplitString that takes use of the spanparsing Split function that was first introduced in #13697 (commit fe8a7dcd78cfeedc9a7c705e91384f793822912b). This enables to replace most calls to boost::split, in the cases where only a single separator character is used. Note that while previous attempts to replace boost::split were controversial (e.g. #13751), this one has a trivial implementation: it merely uses an internal helper (that is unit tested and in regular use with output descriptiors) and converts its result from spans to strings. As a drawback though, not all boost::split instances can be tackled.

    As a possible optimization, one could return a vector of std::string_views (available since C++17) instead of strings, to avoid copies. This would need more carefulness on the caller sites though, to avoid potential lifetime issues, and it’s probably not worth it, considering that none of the places where strings are split are really performance-critical.

  2. fanquake added the label Refactoring on Sep 11, 2021
  3. in src/util/string.h:23 in e332fcd463 outdated
    15@@ -15,6 +16,15 @@
    16 #include <string>
    17 #include <vector>
    18 
    19+[[nodiscard]] inline std::vector<std::string> SplitString(const std::string& str, char sep)
    20+{
    21+    std::vector<Span<const char>> span_strs = spanparsing::Split(str, sep);
    22+    std::vector<std::string> result;
    23+    for (const auto& span_str : span_strs)
    


    MarcoFalke commented at 1:37 pm on September 11, 2021:

    nit: missing {. Also, could move the impl to the cpp file?

    0    for (const auto& span_str : span_strs) {
    

    theStack commented at 6:22 pm on September 11, 2021:
    Thanks for the review, done (was surprised to see that the string.cpp is actually empty on master).
  4. MarcoFalke commented at 1:38 pm on September 11, 2021: member
    Seems ok to use the already existing split function.
  5. theStack force-pushed on Sep 11, 2021
  6. theStack commented at 6:33 pm on September 11, 2021: member

    Force-pushed with changes as suggested by MarcoFalke (https://github.com/bitcoin/bitcoin/pull/22953#discussion_r706614179).

    Pinging @practicalswift as someone who has always shown interest in potential boost replacements and string processing helpers, with lots of contributions in that area. Maybe you feel like taking a look :)

  7. in src/util/string.cpp:8 in e3c7fe4418 outdated
    2@@ -3,3 +3,14 @@
    3 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4 
    5 #include <util/string.h>
    6+#include <util/spanparsing.h>
    7+
    8+[[nodiscard]] std::vector<std::string> SplitString(const std::string& str, char sep)
    


    MarcoFalke commented at 9:32 am on September 12, 2021:
    Is the annotation needed in the cpp file as well? At least lock annotations are only supposed to go to the declaration.

    theStack commented at 9:53 am on September 12, 2021:
    Good point. Quickly greping the [[nodiscard]] annotation in our codebase shows that there are barely any occurences in .cpp files (only for static functions). Removed it.
  8. theStack force-pushed on Sep 12, 2021
  9. kiminuo commented at 10:27 am on September 13, 2021: contributor

    I would still add a unit test even though there are tests for the internal helper. Maybe it’s a little bit over the top so feel free to ignore my suggestion.

    (Something like this possibly: https://github.com/kiminuo/bitcoin/commit/ac6dd5a921407c3957a7c15a6a5a82af826decb6)

  10. MarcoFalke commented at 10:34 am on September 13, 2021: member

    Any reason the tests aren’t fixed up as well?

    0src/test/fuzz/script_assets_test_minimizer.cpp:    boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));
    

    Also, if you add a unit test, might as extend the string fuzz target.

  11. kiminuo commented at 10:42 am on September 13, 2021: contributor

    There non-tests occurrences too:

     0src\rpc\server.cpp:
     1  410          std::vector<std::string> vargNames;
     2  411:         boost::algorithm::split(vargNames, argNamePattern, boost::algorithm::is_any_of("|"));
     3  412          auto fr = argsIn.end();
     4
     5src\test\transaction_tests.cpp:
     6  72      std::vector<std::string> words;
     7  73:     boost::algorithm::split(words, strFlags, boost::algorithm::is_any_of(","));
     8  74  
     9
    10src\test\fuzz\script_assets_test_minimizer.cpp:
    11  133      std::vector<std::string> words;
    12  134:     boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));
    13  135  
    

    If I’m not missing something.

    edit: Ah, I’m not aware what the difference is between boost::algorithm::split and boost::split. Seems like none?

  12. theStack force-pushed on Sep 13, 2021
  13. theStack commented at 11:19 am on September 13, 2021: member

    I would still add a unit test even though there are tests for the internal helper. Maybe it’s a little bit over the top so feel free to ignore my suggestion.

    I agree that it’s a good idea to add a unit test.

    (Something like this possibly: kiminuo@ac6dd5a)

    Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. “test: add unit tests for SplitString helper”) and put the test into src/test/util_tests.cpp, instead of creating a new file? I’m happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

    Any reason the tests aren’t fixed up as well? …

    There non-tests occurrences too: …

    Oh, I only greped for boost::split, not being aware that there is also boost::algorithm::split, which seems to be a synonym. Force-pushed with changes to tackle also the remaining occurences. Also found that there were some unnecessary boost includes in src/torcontrol.cpp that I forgot to remove in the previous push.

  14. kiminuo commented at 11:28 am on September 13, 2021: contributor

    (Something like this possibly: kiminuo@ac6dd5a)

    Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. “test: add unit tests for SplitString helper”) and put the test into src/test/util_tests.cpp, instead of creating a new file? I’m happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

    Modified here: https://github.com/kiminuo/bitcoin/commit/4516727e02feb1338285068541e869c36d8b1368. Anyway, feel free to modify the commit in whatever way you like :)

  15. MarcoFalke commented at 11:47 am on September 13, 2021: member

    The fuzz test I mentioned: (might not compile)

     0diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp
     1index 0c1b45b86c..e5a09c0013 100644
     2--- a/src/test/fuzz/string.cpp
     3+++ b/src/test/fuzz/string.cpp
     4@@ -127,6 +127,7 @@ FUZZ_TARGET(string)
     5         int64_t amount_out;
     6         (void)ParseFixedPoint(random_string_1, fuzzed_data_provider.ConsumeIntegralInRange<int>(0, 1024), &amount_out);
     7     }
     8+    (void)SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>());
     9     {
    10         (void)Untranslated(random_string_1);
    11         const bilingual_str bs1{random_string_1, random_string_2};
    
  16. theStack commented at 11:54 am on September 13, 2021: member

    The fuzz test I mentioned: (might not compile)

    Oh, I didn’t see that in your previous comment. It compiles, I added another commit 👌

  17. in src/util/string.cpp:13 in 25915e4772 outdated
     8+std::vector<std::string> SplitString(const std::string& str, char sep)
     9+{
    10+    std::vector<Span<const char>> span_strs = spanparsing::Split(str, sep);
    11+    std::vector<std::string> result;
    12+    for (const auto& span_str : span_strs) {
    13+        result.push_back(std::string(span_str.begin(), span_str.end()));
    


    martinus commented at 7:14 pm on September 13, 2021:
    You could use emplace_back(span_str.begin(), span_str.end()). That way no temporary string is created and moved. It’s also shorter :)

    theStack commented at 2:28 pm on September 14, 2021:
    Good idea, thanks! Done.
  18. in src/util/string.cpp:8 in 25915e4772 outdated
    2@@ -3,3 +3,14 @@
    3 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4 
    5 #include <util/string.h>
    6+#include <util/spanparsing.h>
    7+
    8+std::vector<std::string> SplitString(const std::string& str, char sep)
    


    martinus commented at 7:26 pm on September 13, 2021:
    Not sure if that’s nicer, but another way to implement this would be to make spanparsing::Splita templated function, where the template argument is the vector’s output type. If I’m not mistaken this should work for Span, string, string_view.Then it would be possible to directly call e.g. spanparsing::Split<std::string>(x) without the need for a wrapper function.

    theStack commented at 2:32 pm on September 14, 2021:
    Seems like a nice way to solve this! Unfortunately, I fear my C++ template skills are currently too lousy to come up with this. Will give it a try on the weekend. That said, I’m also happy to close this PR for another one with a better solution :)

    martinus commented at 5:28 pm on September 14, 2021:

    I think moving the code into the header and changing it into something like that should work:

     0template <typename T = Span<const char>>
     1std::vector<T> Split(const Span<const char>& sp, char sep)
     2{
     3    std::vector<T> ret;
     4    auto it = sp.begin();
     5    auto start = it;
     6    while (it != sp.end()) {
     7        if (*it == sep) {
     8            ret.emplace_back(start, it);
     9            start = it + 1;
    10        }
    11        ++it;
    12    }
    13    ret.emplace_back(start, it);
    14    return ret;
    15}
    

    So the code is practically identical, only the return take is now a template argument. Note that I’m typing this on my phone because I don’t have access to a computer the next few days so this probably won’t compile :)


    MarcoFalke commented at 9:41 am on September 20, 2021:

    If you decide to do that, it would be possible to define an alias like so:

    0constexpr auto SplitString = &spanparsing::Split<std::string>;
    

    theStack commented at 12:46 pm on September 20, 2021:
    @martinus: Thanks, I will try that out in a bit! @MarcoFalke: That’s a good idea with the alias, then I only need to change the implementation and keep the rest as it is now.
  19. theStack force-pushed on Sep 14, 2021
  20. fanquake commented at 2:39 am on September 16, 2021: member
    Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can’t also nuke anything from test/lint/lint-includes.sh
  21. laanwj commented at 7:41 pm on September 16, 2021: member

    Concept ACK. I don’t think this boost use is very bad (it’s a header-only library) but if we can replace it with functionality we already have internally, who not.

    Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can’t also nuke anything from test/lint/lint-includes.sh

    Well, split is still used in some places (httprpc.cpp for example), this only replaces the single-separator case. It will take some more work to get rid of all boost/algorithm/string.hpp use, if that’s worthwhile.

  22. theStack force-pushed on Sep 20, 2021
  23. theStack commented at 2:42 pm on September 20, 2021: member
    Force-pushed the alternative variant as suggested by @martinus, changing spanparsing::Split to a templated function and defining an alias SplitString as suggested by @MarcoFalke (see #22953 (review) and #22953 (review)).
  24. in src/test/util_tests.cpp:2397 in ed24bf5989 outdated
    2121+
    2122+    // Case-sensitivity of the separator.
    2123+    {
    2124+        std::vector<std::string> result = SplitString("AAA", 'a');
    2125+        BOOST_CHECK_EQUAL(result.size(), 1);
    2126+        BOOST_CHECK_EQUAL(result[0], "AAA");
    


    MarcoFalke commented at 3:02 pm on September 20, 2021:

    weird issue on macos:

    0test/util_tests.cpp:2087: error: in "util_tests/test_SplitString": check result[0] == "" has failed [ != ]
    1test/util_tests.cpp:2095: error: in "util_tests/test_SplitString": check result[1] == "" has failed [ != ]
    2test/util_tests.cpp:2104: error: in "util_tests/test_SplitString": check result[2] == "" has failed [ != ]
    3test/util_tests.cpp:2111: error: in "util_tests/test_SplitString": check result[0] == "abc" has failed [abc != abc]
    4test/util_tests.cpp:2119: error: in "util_tests/test_SplitString": check result[1] == "b" has failed [b != b]
    5test/util_tests.cpp:2126: error: in "util_tests/test_SplitString": check result[0] == "AAA" has failed [AAA != AAA]
    

    martinus commented at 5:02 pm on September 20, 2021:
    I think the issue here is that the conversion of the string literals to Span<const char> works so that the \0 is part of the span. So for "" the size of the span is actually 1 with content “\0”, while for std::string the size would be 0. The tests should work fine for std::string("") as the argument

    theStack commented at 5:34 pm on September 20, 2021:
    Ah, good point, that also explains why only the last element of each result vector failed the test. Thanks, fixed.
  25. theStack force-pushed on Sep 20, 2021
  26. in src/util/string.h:19 in 4eaebd78ba outdated
    15@@ -15,6 +16,8 @@
    16 #include <string>
    17 #include <vector>
    18 
    19+constexpr auto SplitString = &spanparsing::Split<std::string>;
    


    martinus commented at 5:16 am on September 21, 2021:

    I think it would be safer to use a more restrictive interface for SplitString, because it’s not really intuitive how SplitString behaves with string literals. Something like

    0inline std::vector<std::string> SplitString(std::string_view str, char sep) {
    1    return spanparsing::Split<std::string>(str, sep);
    2}
    

    Other than that, almost-ACK 8949509, code-reviewed and tested


    theStack commented at 10:45 am on September 21, 2021:
    Thanks, used your suggested interface/implementation and also add the [[nodiscard]] attribute.
  27. in src/test/util_tests.cpp:2124 in 8949509e00 outdated
    2119+        BOOST_CHECK_EQUAL(result[1], "b");
    2120+    }
    2121+
    2122+    // Case-sensitivity of the separator.
    2123+    {
    2124+        std::vector<std::string> result = SplitString(std::string("AAA"), 'a');
    


    MarcoFalke commented at 8:56 am on September 21, 2021:
    It seems dangerous to offer a function that silently does the wrong thing when passed the “wrong” type

    theStack commented at 10:46 am on September 21, 2021:
    Agree. This is fixed now by using an explicit interface instead of an alias, as suggested by martinus in #22953 (review).
  28. MarcoFalke dismissed
  29. MarcoFalke commented at 8:56 am on September 21, 2021: member
    Not a fan of forcing devs to remember to construct a std::string manually each time before calling the function.
  30. theStack force-pushed on Sep 21, 2021
  31. MarcoFalke requested review from MarcoFalke on Sep 21, 2021
  32. martinus commented at 5:51 am on September 23, 2021: contributor

    Windows build error seems unrelated, in wallet_address_types.py:

    0OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
    
  33. hebasto commented at 6:27 am on September 23, 2021: member

    See #18548

    On Thu, 23 Sep 2021 at 08:54, Martin Leitner-Ankerl < @.***> wrote:

    Windows build error seems unrelated, in wallet_address_types.py:

    OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/22953#issuecomment-925523479, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH3PXPQDTK3VEUUU4R2KEY3UDK6JVANCNFSM5D246WWQ .

    – Hennadii Stepanov

  34. theStack force-pushed on Sep 27, 2021
  35. theStack commented at 10:38 am on September 27, 2021: member
    Rebased on master to (hopefully) get rid of the Windows CI failure.
  36. practicalswift commented at 2:01 pm on September 29, 2021: contributor
    Concept ACK
  37. DrahtBot commented at 8:54 am on November 18, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24764 (Modernize util/strencodings and util/string: string_view and optional by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  38. DrahtBot added the label Needs rebase on Nov 25, 2021
  39. theStack force-pushed on Nov 25, 2021
  40. theStack commented at 1:08 pm on November 25, 2021: member
    Rebased on master.
  41. DrahtBot removed the label Needs rebase on Nov 25, 2021
  42. DrahtBot added the label Needs rebase on Dec 3, 2021
  43. theStack force-pushed on Dec 7, 2021
  44. theStack commented at 6:51 pm on December 7, 2021: member
    Rebased on master again.
  45. DrahtBot removed the label Needs rebase on Dec 7, 2021
  46. kiminuo commented at 10:08 am on January 17, 2022: contributor
    @theStack Is there a bug or does the PR need another rebase to pass all tests?
  47. theStack force-pushed on Jan 17, 2022
  48. theStack commented at 4:59 pm on January 17, 2022: member
    @kiminuo: Thanks for noticing, indeed a rebaster on master was needed since #17631 (commit ef7c8228fd5cf45526518ae2bd5ebdd483e65525) introduced two other single-separator uses of boost::split in ./src/rest.cpp. Done!
  49. fanquake added this to the milestone 24.0 on Mar 11, 2022
  50. DrahtBot added the label Needs rebase on Mar 25, 2022
  51. MarcoFalke referenced this in commit 7878c8655c on Mar 25, 2022
  52. theStack force-pushed on Mar 25, 2022
  53. theStack commented at 7:14 pm on March 25, 2022: member
    Rebased on master.
  54. DrahtBot removed the label Needs rebase on Mar 25, 2022
  55. sidhujag referenced this in commit 63c27b9e74 on Apr 2, 2022
  56. DrahtBot added the label Needs rebase on Apr 6, 2022
  57. refactor: introduce single-separator split helper `SplitString`
    This helper uses spanparsing::Split internally and enables to replace
    all calls to boost::split where only a single separator is passed.
    
    Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
    Co-authored-by: MarcoFalke <falke.marco@gmail.com>
    9cc8e876e4
  58. test: add unit tests for `SplitString` helper 4fad7e46d9
  59. fuzz: add `SplitString` fuzz target a62e84438d
  60. theStack force-pushed on Apr 11, 2022
  61. theStack commented at 9:28 pm on April 11, 2022: member
    Rebased on master again (there was a conflict in src/rest.cpp after #24098 has been merged).
  62. DrahtBot removed the label Needs rebase on Apr 11, 2022
  63. fanquake commented at 8:48 am on April 25, 2022: member
    @martinus want to take another look here?
  64. martinus commented at 5:49 am on April 26, 2022: contributor

    Code review ACK a62e84438d27ee6213219fe2c233e58814fcbb5d. Ran all tests. I also like that with boost::split it was not obvious that the resulting container was cleared, and with SplitString API that’s obvious.

    After that PR there is only one usage of boost::split left in core_read.cpp, that would be a nice followup PR :slightly_smiling_face:

  65. martinus approved
  66. fanquake merged this on Apr 26, 2022
  67. fanquake closed this on Apr 26, 2022

  68. theStack deleted the branch on Apr 26, 2022
  69. sidhujag referenced this in commit 3e0e7fb140 on Apr 26, 2022
  70. fanquake referenced this in commit bde5836f99 on May 4, 2022
  71. sidhujag referenced this in commit ed600cecaa on May 4, 2022
  72. kittywhiskers referenced this in commit 94a225686c on Dec 21, 2022
  73. kittywhiskers referenced this in commit 3133a4ceb0 on Dec 26, 2022
  74. kittywhiskers referenced this in commit 35268285ff on Jan 3, 2023
  75. kittywhiskers referenced this in commit eceda6a82f on Jan 13, 2023
  76. kittywhiskers referenced this in commit 25a35284f0 on Jan 15, 2023
  77. kittywhiskers referenced this in commit 135bfdec57 on Jan 19, 2023
  78. PastaPastaPasta referenced this in commit 2d7a448fdc on Jan 19, 2023
  79. DrahtBot locked this on Apr 26, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me