Prepare string and net utils for future HTTP operations #34242

pull pinheadmz wants to merge 6 commits into bitcoin:master from pinheadmz:http-utils changing 11 files +369 −17
  1. pinheadmz commented at 6:11 pm on January 9, 2026: member

    This is a component of removing libevent as a dependency of the project. It is the first six commits of #32061 and provides a string-parsing utility (LineReader) that is also consumed by #34158.

    These are the functions that are added / updated for HTTP and Torcontrol:

    • GetBindAddress(): Given a socket, provides the bound address as a CService. Currently used by p2p but moved from net to netbase so other modules can call it.
    • ToIntegral(): Already used to parse numbers from strings, added new argument base = 10 so it can also be used to parse hexadecimal integers. HTTP chunked transfer-encoding uses hex-encoded integers to specify payload size: https://datatracker.ietf.org/doc/html/rfc7230.html#section-4.1
    • AsciiCaseInsensitive comparators: Needed to store HTTP headers in an unordered_map. Headers are key-value pairs that are parsed with case-insensitive keys: https://httpwg.org/specs/rfc9110.html#rfc.section.5.1
    • FormatRFC1123DateTime(): The required datetime format for HTTP headers (e.g. Fri, 31 May 2024 19:18:04 GMT)
    • LineReader: Fields in HTTP requests are newline-terminated. This struct is given an input buffer and provides methods to read lines as strings.
  2. DrahtBot commented at 6:11 pm on January 9, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34242.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK vasild, furszy, fjahr

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34158 (torcontrol: Remove libevent usage by fjahr)
    • #32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
    • #31672 (rpc: add cpu_load to getpeerinfo by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/netbase.cpp:956 in 67ba7c9302 outdated
    951+CService GetBindAddress(const Sock& sock)
    952+{
    953+    CService addr_bind;
    954+    struct sockaddr_storage sockaddr_bind;
    955+    socklen_t sockaddr_bind_len = sizeof(sockaddr_bind);
    956+    if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) {
    


    furszy commented at 6:22 pm on January 9, 2026:

    GetSockName() currently forwards the raw getsockname() return value (0 on success), which forces callers to use an inverted check (!GetSockName(...)).

    Can we normalize it so it returns true on success?

    0bool Sock::GetSockName(sockaddr* name, socklen_t* name_len) const
    1{
    2    return getsockname(m_socket, name, name_len) == 0;
    3}
    

    It would make the code slightly more readable and maintainable.


    pinheadmz commented at 6:47 pm on January 9, 2026:
    I agree with you, but this might be better addressed in a separate PR. The entire Sock class wraps POSIX socket management syscalls that return 0 on success (connect, bind, listen, close, etc…)

    vasild commented at 12:43 pm on January 12, 2026:

    The idea behind the Sock methods is that they are mockable versions of the OS functions. Zero learning curve for newcomers (assuming familiarity with the OS interface). Sock::Foo(x, y, z) is equivalent to calling the OS’s foo(x, y, z).

    inverted check

    the function returns an integer error code, not a boolean. A check like if (foo() == 0) looks better (less “inverted”).


    furszy commented at 4:33 pm on January 12, 2026:
    I see. Thanks @vasild. It would be nice to have more documentation on the return codes then. And +1 on the == 0 suggestion.

    pinheadmz commented at 7:59 pm on January 12, 2026:
    This line gets “modernized” in the next commit, including a comment and == 0, so I think it’s cool for now.

    vasild commented at 5:26 pm on January 13, 2026:

    “returns an integer error code” – oh, I meant this:

    https://www.man7.org/linux/man-pages/man2/getsockname.2.html#RETURN_VALUE

    RETURN VALUE On success, zero is returned. On error, -1 is returned, and errno is set to indicate the error.

    or the check could be if (foo() != -1).

  4. fjahr commented at 2:32 pm on January 10, 2026: contributor

    ACK ea993ff2af1faf863c106f855d3722b29a1f4483

    I reviewed the same 6 commits as part of #32061 and all my feedback from there has been addressed.

  5. in src/netbase.cpp:957 in ea993ff2af
    952+{
    953+    CService addr_bind;
    954+    sockaddr_storage storage;
    955+    socklen_t len = sizeof(storage);
    956+
    957+    auto sa = static_cast<sockaddr*>(static_cast<void*>(&storage));
    


    vasild commented at 12:48 pm on January 12, 2026:

    The inner cast is unnecessary?

    0    auto sa = static_cast<sockaddr*>(&storage);
    

    pinheadmz commented at 8:33 pm on January 12, 2026:
    This works with a reinterpret_cast because sockaddr_storage and sockaddr are technically unrelated types even though they overlap… I’ll clean it up, thanks. I think I had static_cast there before because of hints from Sonarcloud on corecheck.dev.
  6. in src/util/string.cpp:42 in ea993ff2af
    37+        // If the character we just consumed gives us a line length greater
    38+        // than max_read, and we are not at the end of the line (or buffer) yet,
    39+        // that means the line we are currently reading is too long, and we throw.
    40+        if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
    41+    }
    42+    const std::string_view untrimmed_line(reinterpret_cast<const char *>(std::to_address(line_start)), count);
    


    vasild commented at 1:03 pm on January 12, 2026:
    0    const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
    

    pinheadmz commented at 3:34 pm on January 13, 2026:
    taken, thanks
  7. vasild approved
  8. vasild commented at 1:05 pm on January 12, 2026: contributor
    ACK ea993ff2af1faf863c106f855d3722b29a1f4483
  9. in src/util/time.cpp:140 in 9423d8ae48 outdated
    135+    std::string_view month{months.at(unsigned{ymd.month()} - 1)};
    136+    const std::chrono::hh_mm_ss hms{secs - days};
    137+    // examples: Mon, 27 Jul 2009 12:28:53 GMT
    138+    //           Fri, 31 May 2024 19:18:04 GMT
    139+    return strprintf("%03s, %02u %03s %04i %02i:%02i:%02i GMT", weekday, unsigned{ymd.day()}, month, signed{ymd.year()}, hms.hours().count(), hms.minutes().count(), hms.seconds().count());
    140+}
    


    furszy commented at 6:32 pm on January 12, 2026:

    nit: I think C++20 has a short version of this (no extra arrays needed):

    0std::string FormatRFC1123DateTime(int64_t time)
    1{
    2    if (time < -62167219200 || 253402300799 < time) {
    3        // RFC7231 mandates 4-digit year, so only support years 0 to 9999
    4        return "";
    5    }
    6    const std::chrono::sys_seconds secs{std::chrono::seconds{time}};
    7    // Example: "Mon, 27 Jul 2009 12:28:53 GMT"
    8    return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
    9}
    

    maybe you already tried it and it doesn’t work on all our supported platforms?


    vasild commented at 5:38 pm on January 13, 2026:

    This should be locale-independent. I think it is:

    https://en.cppreference.com/w/cpp/chrono/system_clock/formatter.html#Format_specification

    the default “C” locale if L is not present in the format specification


    vasild commented at 6:04 pm on January 13, 2026:

    https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105398?pr=34242#step:11:1480

    0/home/admin/actions-runner/_work/_temp/src/util/time.cpp:129:17: error: no member named 'format' in namespace 'std'
    1  129 |     return std::format("{:%a, %d %b %Y %H:%M:%S} GMT", secs);
    2      |            ~~~~~^
    

    :cry:


    pinheadmz commented at 6:14 pm on January 13, 2026:
    Indeed, will revert on next push.

    furszy commented at 8:21 pm on January 13, 2026:
    too good to be true
  10. in src/util/string.cpp:36 in ea993ff2af
    31+        ++it;
    32+        ++count;
    33+        // If the character we just consumed was \n, the line is terminated
    34+        if (c == '\n') break;
    35+        // If we are at the end of the incoming buffer, the line is terminated
    36+        if (it == end) break;
    


    furszy commented at 7:06 pm on January 12, 2026:

    In ea993ff2af1faf863c106f855d3722b29a1f4483:

    Seems you can remove this last line. You already do a while (it != end)


    pinheadmz commented at 3:38 pm on January 13, 2026:
    Youre right thanks, its gone.

    vasild commented at 6:02 pm on January 13, 2026:

    Note that it is incremented inside the body of the loop, after the check in while (it != end). So after the increment it could be equal to end.

    0    const std::vector<std::byte> input{StringToBuffer("ab")};
    1    LineReader reader(input, /*max_read=*/1);
    2    std::optional<std::string> line{reader.ReadLine()};
    

    Before this change the above snippet would have set line to ab and after the change it throws std::runtime_error: max_read exceeded by LineReader. I guess the new behavior is the correct? Maybe worth adding such a test.


    pinheadmz commented at 6:41 pm on January 13, 2026:
    Thanks @vasild I added this as a test case, and am scratching my head how it slipped through. Removing the line is not only cleaner code, it is a bug fix. LineReader should not return a line longer than max_read
  11. in src/util/string.cpp:34 in ea993ff2af
    29+        // Read a character from the incoming buffer and increment the iterator
    30+        auto c = static_cast<char>(*it);
    31+        ++it;
    32+        ++count;
    33+        // If the character we just consumed was \n, the line is terminated
    34+        if (c == '\n') break;
    


    furszy commented at 7:09 pm on January 12, 2026:
    tiny nit: could make the termination character customizable.

    pinheadmz commented at 3:34 pm on January 13, 2026:
    Going to leave this for now, next dev who needs a custom EOL character can add it in the future ;-)
  12. furszy commented at 7:10 pm on January 12, 2026: member
    utACK ea993ff2af1faf863c106f855d3722b29a1f4483
  13. Make GetBindAddress() callable from outside net.cpp
    The function logic is moved-only from net.cpp to netbase.cpp
    and redeclared (as non-static) in netbase.h
    a0ca851d26
  14. Modernize GetBindAddress()
    Replace the C-style casting with C++ reinterpret_cast
    e93778dd82
  15. string: add `base` argument for ToIntegral to operate on hexadecimal 0717e882b1
  16. string: add AsciiCaseInsensitive{KeyEqual, Hash} for unordered map
    https://httpwg.org/specs/rfc9110.html#rfc.section.5.1
    Field names in HTTP headers are case-insensitive. These
    structs will be used in the headers map to search by key.
    
    In libevent field names are also converted to lowercase for comparison:
      evhttp_find_header()
      evutil_ascii_strcasecmp()
      EVUTIL_TOLOWER_()
    bbb9d95d88
  17. pinheadmz force-pushed on Jan 13, 2026
  18. time: implement and test RFC1123 timestamp string
    HTTP 1.1 responses require a timestamp header with a format
    specified (currently) by:
    https://datatracker.ietf.org/doc/html/rfc9110#section-5.6.7
    
    This specific format is defined in RFC1123:
    https://www.rfc-editor.org/rfc/rfc1123#page-55
    
    The libevent implementation can be referenced in evutil_time.c
    evutil_date_rfc1123()
    36449a1720
  19. DrahtBot added the label CI failed on Jan 13, 2026
  20. DrahtBot commented at 6:22 pm on January 13, 2026: contributor

    🚧 At least one of the CI tasks failed. Task macOS-cross to arm64: https://github.com/bitcoin/bitcoin/actions/runs/20963499370/job/60247105377 LLM reason (✨ experimental): Compilation failed: std::format is unavailable in the used standard library (missing C++20 std::format support).

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  21. pinheadmz force-pushed on Jan 13, 2026
  22. in src/test/util_string_tests.cpp:201 in 1d248166ef outdated
    196+        BOOST_CHECK(line2);
    197+        BOOST_CHECK(line3);
    198+        BOOST_CHECK(!line4);
    199+        BOOST_CHECK_EQUAL(line1.value(), "once upon a time");
    200+        BOOST_CHECK_EQUAL(line2.value(), "there was a dog");
    201+        BOOST_CHECK_EQUAL(line3.value(), "who liked food");
    


    furszy commented at 8:56 pm on January 13, 2026:

    In 1d248166efe6cba330cfb14e652289f68c894652:

    Tiny, non-blocking, idea to see if it makes sense to simplify the LineReader class. (you can reinterpret std::byte to char in order to make it work with std::istringstream)

     0// Check three lines terminated by \n, \r\n, and end of buffer, trimming whitespace
     1const std::string input = "once upon a time\n there was a dog \r\nwho liked food";
     2std::istringstream stream(input);
     3
     4std::string line1, line2, line3, line4;
     5std::getline(stream, line1);
     6std::getline(stream, line2);
     7std::getline(stream, line3);
     8BOOST_CHECK(!std::getline(stream, line4));
     9
    10BOOST_CHECK_EQUAL(line1, "once upon a time");
    11BOOST_CHECK_EQUAL(line2, "there was a dog \r"); // TODO: trim whitespaces at the end.
    12BOOST_CHECK_EQUAL(line3, "who liked food");
    
  23. DrahtBot removed the label CI failed on Jan 13, 2026
  24. vasild approved
  25. vasild commented at 7:55 am on January 14, 2026: contributor
    ACK 1d248166efe6cba330cfb14e652289f68c894652
  26. DrahtBot requested review from fjahr on Jan 14, 2026
  27. DrahtBot requested review from furszy on Jan 14, 2026
  28. pinheadmz commented at 2:50 pm on January 14, 2026: member
    switching to draft for today: Im going to see if we can just use std::istringstream and if so, remove the LineReader commit
  29. pinheadmz marked this as a draft on Jan 14, 2026
  30. pinheadmz commented at 3:38 pm on January 14, 2026: member

    Ok did some learning about std::istringstream and I think the current implementation is better, and not reinventing a wheel from the standard library. Will convert the PR to ready for review.

    Here’s my reasoning (up for debate, @furszy ):

    Raw data is copied as std::byte from the network socket into the receive buffer. LineReader is my boundary between the raw data and text-based HTTP stuff, it operates on a span of that data and returns valid, allocated strings. It only allocates if the line is <= max_read, and only once, after reading from the span.

    std::istringstream can’t operate on std::byte (only on strings) so that would either require one big copy in the beginning, or moving the “boundary” lower where we could read as char from the socket. Even if I made that change, std::istringstream::getline() appends one character at a time as it reads the input, expanding the output string. Only when getline() is done would it be possible to check the line against max_read.

  31. pinheadmz marked this as ready for review on Jan 14, 2026
  32. fjahr commented at 2:03 pm on January 15, 2026: contributor
    re-ACK 1d248166efe6cba330cfb14e652289f68c894652
  33. in src/util/string.h:291 in 1d248166ef outdated
    283+     * and advances iterator. May exceed max_read but will not read past end of buffer.
    284+     * @param[in]   len     The number of bytes to read from the buffer
    285+     * @returns a string of the expected length.
    286+     * @throws a std::runtime_error if there is not enough data in the buffer.
    287+     */
    288+    std::string ReadLength(size_t len);
    


    furszy commented at 2:59 pm on January 15, 2026:
    tiny nit: Would just call this Read(size_t len).

    pinheadmz commented at 6:39 pm on January 15, 2026:
    I’m gonna push back on this, I think in a class called “Line Reader” it’s important to distinguish explicitly between reading a line, and reading a number of bytes. A method called Read() seems ambiguous to me.
  34. in src/util/string.h:293 in 1d248166ef
    288+    std::string ReadLength(size_t len);
    289+
    290+    /**
    291+     * Returns remaining size of bytes in buffer
    292+     */
    293+    size_t Left() const;
    


    furszy commented at 3:00 pm on January 15, 2026:
    nano nit: Maybe Left() is not the best term. Could use Remaining()?. But at this point it doesn’t matter much.

    pinheadmz commented at 6:41 pm on January 15, 2026:
    Ok sure, taking this. 🙏
  35. in src/util/string.cpp:54 in 1d248166ef
    39+        if (count > max_read) throw std::runtime_error("max_read exceeded by LineReader");
    40+    }
    41+    const std::string_view untrimmed_line(reinterpret_cast<const char*>(std::to_address(line_start)), count);
    42+    const std::string_view line = TrimStringView(untrimmed_line); // delete trailing \r and/or \n
    43+    return std::string(line);
    44+}
    


    furszy commented at 3:21 pm on January 15, 2026:

    Since we advance the class member iterator it while scanning the span, throwing a max_read exception drops the bytes already read. We should advance it only at the very end to avoid loosing them (a test for this case would be good).

    Also, small extra nit: what if there only char is a \n? The returned string will be empty?

    As a separate topic, could use std::memchr to find the first instance of \n which would avoid the loop (haven’t fully tried it but seems to be an option as well - dropping it in case you want to investigate it, all good if not).


    pinheadmz commented at 6:37 pm on January 15, 2026:

    Great catch thanks. I now reset the iterator just before throwing. Tested by catching the “exceeded” error then using a more granular operation (ReadLength) to get the remainder of bytes from the buffer. The test fails on the original code because ReadLength returned bytes form whatever point it threw the error instead of from the last point of success.

    Also added test cases for what makes the most sense to me when using the tool:

    • If the buffer is only "\n" then you get one empty "" line, nullopt after that
    • If the buffer is completely empty "" you get nullopt (not an error)

    re: memchr, I played with that a bit, then I also tried out std::find and felt like neither approach produced clean code without also dropping a little efficiency. So I still think the long implementation here is best for our precise use-case.

  36. DrahtBot requested review from furszy on Jan 15, 2026
  37. pinheadmz force-pushed on Jan 15, 2026
  38. pinheadmz commented at 7:12 pm on January 15, 2026: member

    push to 9b3612beaf76189aa50ab66b334a3b258a563f95

    Address feedback from @furszy about LineReader, fix an edge case after an error is thrown and add more tests.

  39. in src/util/string.h:271 in 9b3612beaf outdated
    266+    const std::span<const std::byte>::iterator start;
    267+    const std::span<const std::byte>::iterator end;
    268+    const size_t max_read;
    269+    std::span<const std::byte>::iterator it;
    270+
    271+    explicit LineReader(std::span<const std::byte> buffer, size_t max_read);
    


    furszy commented at 7:57 pm on January 15, 2026:
    nit: As we can read pass max_read limit if max_read + 1 is a jump line, I would rename the arg and member to max_output_length.
  40. furszy commented at 7:58 pm on January 15, 2026: member
    utACK 9b3612beaf76189aa50ab66b334a3b258a563f95
  41. DrahtBot requested review from vasild on Jan 15, 2026
  42. fjahr commented at 9:41 pm on January 15, 2026: contributor
    re-ACK 9b3612beaf76189aa50ab66b334a3b258a563f95
  43. in src/util/strencodings.h:330 in 9b3612beaf outdated
    325@@ -324,6 +326,15 @@ std::string Capitalize(std::string str);
    326  */
    327 std::optional<uint64_t> ParseByteUnits(std::string_view str, ByteUnit default_multiplier);
    328 
    329+/**
    330+ * Returns a byte vector filled with data from a string. Used to test string-
    


    fjahr commented at 4:05 pm on January 19, 2026:

    super-nit complaint by drahtbot on my PR where I cherry-picked this:

    0string- encoded -> string-encoded [The word is split across the line with a stray space, making "string- encoded" incorrect; use "string-encoded" (or "string encoded") for clarity.]
    

    pinheadmz commented at 6:07 pm on January 20, 2026:
    fixed, thanks
  44. pinheadmz force-pushed on Jan 20, 2026
  45. pinheadmz commented at 6:05 pm on January 20, 2026: member

    push to 1a057b3f6a8c3b4f787f99c2a478c90addc6063f

    Remove the functional equivalence of end of buffer and \n, addressing the discussion thread in #34158 (review). I rebased both the torcontrol PR and the HTTP server PR on this new updated LineReader and confirmed its the correct behavior.

  46. string: add LineReader
    This is a helper struct to parse HTTP messages from data in buffers
    from sockets. HTTP messages begin with headers which are
    CRLF-terminated lines (\n or \r\n) followed by an arbitrary amount of
    body data. Whitespace is trimmed from the field lines but not the body.
    
    https://httpwg.org/specs/rfc9110.html#rfc.section.5
    a7bde7143e
  47. pinheadmz force-pushed on Jan 20, 2026
  48. pinheadmz commented at 6:08 pm on January 20, 2026: member

    push to a7bde7143ea7d943ce4aa0df8ef7979673b1aa10

    Fix DrahtBot lint warning: #34242 (review)

  49. DrahtBot added the label CI failed on Jan 20, 2026
  50. DrahtBot removed the label CI failed on Jan 20, 2026
  51. in src/util/strencodings.h:336 in a7bde7143e
    331+ * data from a socket like HTTP headers.
    332+ *
    333+ * @param[in] str                  the string to convert into bytes
    334+ * @returns                        byte vector
    335+ */
    336+std::vector<std::byte> StringToBuffer(const std::string& str);
    


    furszy commented at 9:47 pm on January 20, 2026:
    Based on the function’s description, this is a test-only function. So you could move it to util_string_tests.cpp?
  52. in src/util/string.cpp:54 in a7bde7143e
    49+    // End of buffer reached without finding a \n or exceeding max_read.
    50+    // Reset the iterator so the rest of the buffer can be read granularly
    51+    // with ReadLength() and return null to indicate a line was not found.
    52+    it = line_start;
    53+    return std::nullopt;
    54+}
    


    furszy commented at 10:05 pm on January 20, 2026:

    Maybe could:

     0std::optional<std::string> LineReader::ReadLine()
     1{
     2    if (it == end) return std::nullopt;
     3
     4    // Limit search to max_read bytes
     5    const auto limit = std::distance(it, end) > static_cast<long>(max_read) ? std::next(it, max_read) : end;
     6    const auto line = std::find(it, limit, std::byte{'\n'});
     7
     8    if (line == limit) {
     9        if (limit != end) // scanned max_read bytes without line
    10            throw std::runtime_error("max_read exceeded by LineReader");
    11        return std::nullopt; // reached end of buffer without line
    12    }
    13
    14    const std::string_view untrimmed{
    15        reinterpret_cast<const char*>(std::to_address(it)),
    16        static_cast<size_t>(std::distance(it, line) + 1)
    17    };
    18
    19    it = std::next(line); // advance iterator
    20    return std::string(TrimStringView(untrimmed));
    21}
    

    The only change from your code is that this version has a hard limit on max_read. Which in my view is much simpler to reason about. The max_read + 1 retrieval when the “past the limit char” is \n seems confusing to me. In networking code, the usual convention is to stop at a fixed size, without caring about what comes next.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-21 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me