refactor: replace remaining boost::split with SplitString #25057

pull martinus wants to merge 4 commits into bitcoin:master from martinus:2022-04-boost-split-exorcism changing 7 files +49 −18
  1. martinus commented at 5:19 AM on May 3, 2022: contributor

    As a followup of #22953, this removes the remaining occurrences of boost::split and replaces them with our own SplitString. To be able to do so, this extends the function spanparsing::Split to work with multiple separators. Finally this removes 3 more files from lint-includes.py.

  2. martinus renamed this:
    refactor: replace remaining boost::split with StringSplit
    refactor: replace remaining boost::split with SplitString
    on May 3, 2022
  3. DrahtBot added the label Refactoring on May 3, 2022
  4. DrahtBot added the label RPC/REST/ZMQ on May 3, 2022
  5. DrahtBot added the label Utils/log/libs on May 3, 2022
  6. in src/httprpc.cpp:277 in 8976d15bb0 outdated
     273 | @@ -276,8 +274,10 @@ static bool InitRPCAuthentication()
     274 |          std::set<std::string>& whitelist = g_rpc_whitelist[strUser];
     275 |          if (pos != std::string::npos) {
     276 |              std::string strWhitelist = strRPCWhitelist.substr(pos + 1);
     277 | -            std::set<std::string> new_whitelist;
     278 | -            boost::split(new_whitelist, strWhitelist, boost::is_any_of(", "));
     279 | +            std::vector<std::string> whitelistSplit = SplitString(strWhitelist, ", ");
    


    MarcoFalke commented at 6:02 AM on May 3, 2022:
                std::vector<std::string> whitelist_split = SplitString(strWhitelist, ", ");
    

    nit: for new code snake_case ?

  7. in src/util/string.h:25 in 8976d15bb0 outdated
      20 | @@ -21,6 +21,11 @@
      21 |      return spanparsing::Split<std::string>(str, sep);
      22 |  }
      23 |  
      24 | +[[nodiscard]] inline std::vector<std::string> SplitString(std::string_view str, std::string_view separators)
    


    MarcoFalke commented at 6:02 AM on May 3, 2022:

    nit: Maybe also fuzz this? Untested diff:

    diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp
    index e6064d19b6..94399faf04 100644
    --- a/src/test/fuzz/string.cpp
    +++ b/src/test/fuzz/string.cpp
    @@ -224,7 +224,12 @@ FUZZ_TARGET(string)
             int64_t amount_out;
             (void)ParseFixedPoint(random_string_1, fuzzed_data_provider.ConsumeIntegralInRange<int>(0, 1024), &amount_out);
         }
    -    (void)SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>());
    +    {
    +        const auto single_split{SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>())};
    +        assert(single_split.size() >= 1);
    +        const auto any_split{SplitString(random_string_1, random_string_2)};
    +        assert(any_split.size() >= 1);
    +    }
         {
             (void)Untranslated(random_string_1);
             const bilingual_str bs1{random_string_1, random_string_2};
    

    MarcoFalke commented at 6:03 AM on May 3, 2022:

    nit: Add include for string_view


    martinus commented at 7:29 PM on May 3, 2022:

    I have no experience with fuzzing, would it make sense to add stronger asserts? E.g. check that the sum of all split string length plus number of splits minus 1 should give exactly the input string length:

    const auto single_split{SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>())};
    size_t sum = 0;
    for (auto const& split : single_split) {
        sum += split.size();
    }
    assert(sum + single_split.size() - 1 == random_string_1.size());
    
    const auto any_split{SplitString(random_string_1, random_string_2)};
    sum = 0;
    for (auto const& split : any_split) {
        sum += split.size();
    }
    assert(sum + any_split.size() - 1 == random_string_1.size());
    

    MarcoFalke commented at 7:38 AM on May 4, 2022:

    Sure, just like with other tests, there is no limit to how much can be tested

  8. MarcoFalke approved
  9. MarcoFalke commented at 6:03 AM on May 3, 2022: member

    LGTM. I was also working on this yesterday, but I like your version better.

  10. fanquake commented at 8:29 AM on May 3, 2022: member

    Concept ACK

  11. vincenzopalazzo commented at 8:22 PM on May 3, 2022: none

    Concept ACK

  12. Extend Split to work with multiple separators b7ab9db545
  13. core_read: Replace boost::split with SplitString
    Note that `SplitString` doesn't support token compression, but in this case
    it does not matter as empty strings are already skipped anyways.
    
    Also removes split.hpp and classification.hpp from expected includes
    0d7efcdf75
  14. http: replace boost::split with SplitString
    Also removes boost/algorithm/string.hpp from expected includes
    d1a9850102
  15. fuzz: SplitString with multiple separators
    Co-authored-by: MarcoFalke <falke.marco@gmail.com>
    f849e63bad
  16. martinus force-pushed on May 4, 2022
  17. martinus commented at 5:56 AM on May 4, 2022: contributor

    Rebased with fixing nits and adding fuzzing

  18. MarcoFalke approved
  19. fanquake requested review from theStack on May 4, 2022
  20. theStack approved
  21. theStack commented at 4:15 PM on May 4, 2022: member

    Code-review ACK f849e63bad963b8717d4bc45efdad9b08567a36e

    I tend to think that the single-character interface could be removed completely (especially since it uses the multi-separators version internally in the end anyways), but on the other hand this would make the diff larger as lots of calling instances had to be changed, so NBD.

    Finally this removes 3 more files from lint-includes.py

    🎉 🎉 🎉

  22. fanquake merged this on May 4, 2022
  23. fanquake closed this on May 4, 2022

  24. martinus deleted the branch on May 4, 2022
  25. sidhujag referenced this in commit ed600cecaa on May 4, 2022
  26. in src/util/spanparsing.h:54 in f849e63bad
      52 |      std::vector<T> ret;
      53 |      auto it = sp.begin();
      54 |      auto start = it;
      55 |      while (it != sp.end()) {
      56 | -        if (*it == sep) {
      57 | +        if (separators.find(*it) != std::string::npos) {
    


    vasild commented at 7:15 AM on May 5, 2022:

    This is doing full scan of the iterators for each input character, thus the overall complexity is O(input_length * number_of_separators). It can be optimized to O(input_length + number_of_separators) if a map of the separators is built beforehand that would allow checking whether a given character is a separator in a constant time.

    This is mostly theoretical because it would matter if lots of separators are used and currently we only use a few. Anyway, out of curiosity, I tried that and compared the performance - in the current variant it is 734 ns/op while with the map it is 130 ns/op for 10 separators.

    <details> <summary>optimize Split()</summary>

     template <typename T = Span<const char>>
     std::vector<T> Split(const Span<const char>& sp, std::string_view separators)
     {
    +    std::bitset<256> m;
    +    for (const auto& sep : separators) {
    +        m[static_cast<unsigned char>(sep)] = true;
    +    }
         std::vector<T> ret;
         auto it = sp.begin();
         auto start = it;
         while (it != sp.end()) {
    -        if (separators.find(*it) != std::string::npos) {
    +        if (m[static_cast<unsigned char>(*it)]) {
                 ret.emplace_back(start, it);
                 start = it + 1;
             }
             ++it;
         }
         ret.emplace_back(start, it);
    

    </details>

    <details> <summary>benchmark</summary>

    static void Split(benchmark::Bench& bench)
    {
        bench.run([&] {
            (void)spanparsing::Split(
                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                "bcdefghijk"
            );
        });
    }
    
    BENCHMARK(Split);
    

    </details>

    Feel free to ignore.

  27. vasild commented at 7:16 AM on May 5, 2022: member

    ACK f849e63bad963b8717d4bc45efdad9b08567a36e

  28. kittywhiskers referenced this in commit 3062f2fb3d on Dec 21, 2022
  29. kittywhiskers referenced this in commit 83eb1071eb on Dec 26, 2022
  30. kittywhiskers referenced this in commit b4bbf748ca on Jan 3, 2023
  31. kittywhiskers referenced this in commit 488510bb49 on Jan 13, 2023
  32. kittywhiskers referenced this in commit 3d0fe7e3e4 on Jan 15, 2023
  33. kittywhiskers referenced this in commit 43d657f321 on Jan 19, 2023
  34. PastaPastaPasta referenced this in commit 2d7a448fdc on Jan 19, 2023
  35. DrahtBot locked this on May 5, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me