As a followup of #22953, this removes the remaining occurrences of boost::split and replaces them with our own SplitString. To be able to do so, this extends the function spanparsing::Split to work with multiple separators. Finally this removes 3 more files from lint-includes.py.
refactor: replace remaining boost::split with SplitString #25057
pull martinus wants to merge 4 commits into bitcoin:master from martinus:2022-04-boost-split-exorcism changing 7 files +49 −18-
martinus commented at 5:19 AM on May 3, 2022: contributor
- martinus renamed this:
refactor: replace remaining boost::split with StringSplit
refactor: replace remaining boost::split with SplitString
on May 3, 2022 - DrahtBot added the label Refactoring on May 3, 2022
- DrahtBot added the label RPC/REST/ZMQ on May 3, 2022
- DrahtBot added the label Utils/log/libs on May 3, 2022
-
in src/httprpc.cpp:277 in 8976d15bb0 outdated
273 | @@ -276,8 +274,10 @@ static bool InitRPCAuthentication() 274 | std::set<std::string>& whitelist = g_rpc_whitelist[strUser]; 275 | if (pos != std::string::npos) { 276 | std::string strWhitelist = strRPCWhitelist.substr(pos + 1); 277 | - std::set<std::string> new_whitelist; 278 | - boost::split(new_whitelist, strWhitelist, boost::is_any_of(", ")); 279 | + std::vector<std::string> whitelistSplit = SplitString(strWhitelist, ", ");
MarcoFalke commented at 6:02 AM on May 3, 2022:std::vector<std::string> whitelist_split = SplitString(strWhitelist, ", ");nit: for new code snake_case ?
in src/util/string.h:25 in 8976d15bb0 outdated
20 | @@ -21,6 +21,11 @@ 21 | return spanparsing::Split<std::string>(str, sep); 22 | } 23 | 24 | +[[nodiscard]] inline std::vector<std::string> SplitString(std::string_view str, std::string_view separators)
MarcoFalke commented at 6:02 AM on May 3, 2022:nit: Maybe also fuzz this? Untested diff:
diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp index e6064d19b6..94399faf04 100644 --- a/src/test/fuzz/string.cpp +++ b/src/test/fuzz/string.cpp @@ -224,7 +224,12 @@ FUZZ_TARGET(string) int64_t amount_out; (void)ParseFixedPoint(random_string_1, fuzzed_data_provider.ConsumeIntegralInRange<int>(0, 1024), &amount_out); } - (void)SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>()); + { + const auto single_split{SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>())}; + assert(single_split.size() >= 1); + const auto any_split{SplitString(random_string_1, random_string_2)}; + assert(any_split.size() >= 1); + } { (void)Untranslated(random_string_1); const bilingual_str bs1{random_string_1, random_string_2};
MarcoFalke commented at 6:03 AM on May 3, 2022:nit: Add include for string_view
martinus commented at 7:29 PM on May 3, 2022:I have no experience with fuzzing, would it make sense to add stronger asserts? E.g. check that the sum of all split string length plus number of splits minus 1 should give exactly the input string length:
const auto single_split{SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>())}; size_t sum = 0; for (auto const& split : single_split) { sum += split.size(); } assert(sum + single_split.size() - 1 == random_string_1.size()); const auto any_split{SplitString(random_string_1, random_string_2)}; sum = 0; for (auto const& split : any_split) { sum += split.size(); } assert(sum + any_split.size() - 1 == random_string_1.size());
MarcoFalke commented at 7:38 AM on May 4, 2022:Sure, just like with other tests, there is no limit to how much can be tested
MarcoFalke approvedMarcoFalke commented at 6:03 AM on May 3, 2022: memberLGTM. I was also working on this yesterday, but I like your version better.
fanquake commented at 8:29 AM on May 3, 2022: memberConcept ACK
vincenzopalazzo commented at 8:22 PM on May 3, 2022: noneConcept ACK
Extend Split to work with multiple separators b7ab9db5450d7efcdf75core_read: Replace boost::split with SplitString
Note that `SplitString` doesn't support token compression, but in this case it does not matter as empty strings are already skipped anyways. Also removes split.hpp and classification.hpp from expected includes
d1a9850102http: replace boost::split with SplitString
Also removes boost/algorithm/string.hpp from expected includes
f849e63badfuzz: SplitString with multiple separators
Co-authored-by: MarcoFalke <falke.marco@gmail.com>
martinus force-pushed on May 4, 2022martinus commented at 5:56 AM on May 4, 2022: contributorRebased with fixing nits and adding fuzzing
MarcoFalke approvedfanquake requested review from theStack on May 4, 2022theStack approvedtheStack commented at 4:15 PM on May 4, 2022: memberCode-review ACK f849e63bad963b8717d4bc45efdad9b08567a36e
I tend to think that the single-character interface could be removed completely (especially since it uses the multi-separators version internally in the end anyways), but on the other hand this would make the diff larger as lots of calling instances had to be changed, so NBD.
Finally this removes 3 more files from lint-includes.py
🎉 🎉 🎉
fanquake merged this on May 4, 2022fanquake closed this on May 4, 2022martinus deleted the branch on May 4, 2022sidhujag referenced this in commit ed600cecaa on May 4, 2022in src/util/spanparsing.h:54 in f849e63bad
52 | std::vector<T> ret; 53 | auto it = sp.begin(); 54 | auto start = it; 55 | while (it != sp.end()) { 56 | - if (*it == sep) { 57 | + if (separators.find(*it) != std::string::npos) {
vasild commented at 7:15 AM on May 5, 2022:This is doing full scan of the iterators for each input character, thus the overall complexity is
O(input_length * number_of_separators). It can be optimized toO(input_length + number_of_separators)if a map of the separators is built beforehand that would allow checking whether a given character is a separator in a constant time.This is mostly theoretical because it would matter if lots of separators are used and currently we only use a few. Anyway, out of curiosity, I tried that and compared the performance - in the current variant it is 734 ns/op while with the map it is 130 ns/op for 10 separators.
<details> <summary>optimize Split()</summary>
template <typename T = Span<const char>> std::vector<T> Split(const Span<const char>& sp, std::string_view separators) { + std::bitset<256> m; + for (const auto& sep : separators) { + m[static_cast<unsigned char>(sep)] = true; + } std::vector<T> ret; auto it = sp.begin(); auto start = it; while (it != sp.end()) { - if (separators.find(*it) != std::string::npos) { + if (m[static_cast<unsigned char>(*it)]) { ret.emplace_back(start, it); start = it + 1; } ++it; } ret.emplace_back(start, it);</details>
<details> <summary>benchmark</summary>
static void Split(benchmark::Bench& bench) { bench.run([&] { (void)spanparsing::Split( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "bcdefghijk" ); }); } BENCHMARK(Split);</details>
Feel free to ignore.
vasild commented at 7:16 AM on May 5, 2022: memberACK f849e63bad963b8717d4bc45efdad9b08567a36e
kittywhiskers referenced this in commit 3062f2fb3d on Dec 21, 2022kittywhiskers referenced this in commit 83eb1071eb on Dec 26, 2022kittywhiskers referenced this in commit b4bbf748ca on Jan 3, 2023kittywhiskers referenced this in commit 488510bb49 on Jan 13, 2023kittywhiskers referenced this in commit 3d0fe7e3e4 on Jan 15, 2023kittywhiskers referenced this in commit 43d657f321 on Jan 19, 2023PastaPastaPasta referenced this in commit 2d7a448fdc on Jan 19, 2023DrahtBot locked this on May 5, 2023
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:13 UTC
More mirrored repositories can be found on mirror.b10c.me