tests: Expand HTTP coverage to assert libevent behavior #32408

pull pinheadmz wants to merge 1 commits into bitcoin:master from pinheadmz:http-tests changing 3 files +139 −0
  1. pinheadmz commented at 6:56 pm on May 2, 2025: member

    These commits are cherry-picked from #32061 and part of a project to remove libevent.

    This PR only adds functional tests to interface_http to cover some HTTP server behaviors we inherit from libevent, in order to maintain those behaviors when we replace libevent with our own HTTP server.

    1. Pipelining: The server must respond to requests from a client in the order in which they were received RFC 7230 6.3.2
    2. -rpcservertimeout config option which sets the amount of time the server will keep an idle client connection alive
    3. “Chunked” Transfer-Encoding: Allows a client to send a request in pieces, without the Content-Length header RFC 7230 4.1
  2. DrahtBot commented at 6:56 pm on May 2, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32408.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild
    Concept ACK fjahr
    Stale ACK polespinasa

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Tests on May 2, 2025
  4. DrahtBot added the label CI failed on May 2, 2025
  5. in test/functional/interface_http.py:223 in 39c04c7570 outdated
    205+        # Wait for response, but expect a timeout disconnection after 1 second
    206+        start = time.time()
    207+        res = sock.recv(1024)
    208+        stop = time.time()
    209+        assert res == b""
    210+        assert stop - start >= 1
    


    maflcko commented at 12:32 pm on May 4, 2025:

    https://cirrus-ci.com/task/5276299050090496?logs=ci#L3261:

    0[15:13:21.244]  node2 2025-05-02T19:13:19.879966Z [httpworker.4] [rpc/request.cpp:241] [parse] [rpc] ThreadRPCServer method=getmempoolinfo user=__cookie__ 
    1[15:13:21.244]  test  2025-05-02T19:13:19.882000Z TestFramework.node2 (DEBUG): RPC successfully started 
    2[15:13:21.244]  test  2025-05-02T19:13:20.885000Z TestFramework (ERROR): Assertion failed 
    3[15:13:21.244]                                    Traceback (most recent call last):
    4[15:13:21.244]                                      File "/ci_container_base/test/functional/test_framework/test_framework.py", line 183, in main
    5[15:13:21.244]                                        self.run_test()
    6[15:13:21.244]                                      File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/interface_http.py", line 210, in run_test
    7[15:13:21.244]                                        assert stop - start >= 1
    8[15:13:21.244]                                               ^^^^^^^^^^^^^^^^^
    9[15:13:21.244]                                    AssertionError
    

    pinheadmz commented at 1:00 pm on May 4, 2025:
    yeah i see it – gotta figure out why sock.recv(1024) is non blocking on the two failing platforms
  6. in test/functional/interface_http.py:206 in 39c04c7570 outdated
    201+        conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
    202+        conn.connect()
    203+        sock = conn.sock
    204+        sock.sendall(http_request.encode("utf-8"))
    205+        # Wait for response, but expect a timeout disconnection after 1 second
    206+        start = time.time()
    


    polespinasa commented at 2:54 pm on May 7, 2025:

    If you move the start = time.time() before sending the test passes correctly.

    CI was failing locally on my laptop. After changing this it does work.

    Note: it was just working on some executions


    pinheadmz commented at 8:38 pm on May 8, 2025:

    I noticed in the libevent’s test they don’t actually check that server waited at all, just that the connection closed within a few seconds of the configured time out. So I went that route, but also added a tiny minimum check just to make sure the server isn’t closing immediately. And this gives us better test reliability.

    So the new test is: -rpcservertimeout=2, we send a request, we time how long it takes to close, and that duration is expected to be between 1 and 4 seconds.


    polespinasa commented at 7:26 am on May 9, 2025:
    If the timeout is set to 2 wouldn’t make more sense to test between 2 and 4 seconds? Can it be less than 2 seconds?

    pinheadmz commented at 12:38 pm on May 9, 2025:

    It can be, and that was why the 1 second timeout check kept failing. I tried a few implementations of the test and it’s just impossible to reliably start the test timer in sync with the HTTP server. So sometimes it starts late and you end up with duration < expected.

    The point of the test is to ensure that the server disconnects idle clients, I think it does that even though the exact time is a little fudged.


    vasild commented at 8:22 am on May 21, 2025:

    Did you try to start the timer before connecting? Like this:

     0         # Get the underlying socket from HTTP connection so we can send something unusual
     1         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
     2+        start = time.time()
     3         conn.connect()
     4         sock = conn.sock
     5         sock.sendall(http_request.encode("utf-8"))
     6         # Wait for response, but expect a timeout disconnection after 1 second
     7-        start = time.time()
     8         res = sock.recv(1024)
     9         stop = time.time()
    10         # Server disconnected with EOF
    

    It must make it such that the timer in the test is then always >= than the server timeout of 2 seconds, no?


    polespinasa commented at 12:09 pm on May 21, 2025:
    Yeah we tried that and still was not working. For more context we got a discussion on the IRC channel: https://www.erisian.com.au/bitcoin-core-dev/log-2025-05-07.html

    vasild commented at 12:44 pm on May 21, 2025:
    In that discussion @maflcko suggested to start the timer before connect and send and sliv3r__ said that before connect it worked 15 times in a row. Do I miss something?

    polespinasa commented at 12:52 pm on May 21, 2025:
    It was “randomly” failing. The 15 times in a row was moving it before http.clinet.HTTPConn.... Moving it before conn.connect failed the 6th time https://www.erisian.com.au/bitcoin-core-dev/log-2025-05-07.html#l-115

    vasild commented at 12:57 pm on May 21, 2025:

    Ok, then this should work:

     0         # Get the underlying socket from HTTP connection so we can send something unusual
     1+        start = time.time()
     2         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
     3         conn.connect()
     4         sock = conn.sock
     5         sock.sendall(http_request.encode("utf-8"))
     6         # Wait for response, but expect a timeout disconnection after 1 second
     7-        start = time.time()
     8         res = sock.recv(1024)
     9         stop = time.time()
    10         # Server disconnected with EOF
    

    pinheadmz commented at 2:25 pm on May 21, 2025:

    That should work but the reason I didn’t leave like that is I don’t feel like its effectively testing the right thing. We could start the timer at the top of the file and it would always pass, but it wouldn’t necessarily catch any regressions. The server timeout should re-start every time it receives a packet from the client, and it’s very hard to start the test timer at the right moment.

    Another approach I tried was actually setting libevent debug logs and trying to track down messages like these:

    0event_add: event: 0x10a004210 (fd 20), EV_READ   EV_TIMEOUT call 0x1053cf1bc 
    1event_add: event 0x10a004210, timeout in 1 seconds 0 useconds, call 0x1053cf1bc 
    

    Interestingly, the original version of this test always passes after switching to Sockman


    pinheadmz commented at 2:26 pm on May 21, 2025:
    All that being said, if you want me to use the patch you wrote that’s fine by me too ;-)

    vasild commented at 7:13 am on May 22, 2025:

    I don’t see this discussion as a blocker.

    I just want to get to the bottom of it. Checked the source code of http.client.HTTPConnection() - it does not open any connections, just sets some internal member variables of the HTTPConnection class. So, starting the timer before or after http.client.HTTPConnection() shouldn’t make a difference.

    conn.connect() is what makes the connection. And the timer should be started before it. Why would not the followng test the right thing?

    • start timer in the test (A)
    • connect, server timer starts here (B)
    • send
    • try to recv, connection will be closed by the server due to server timeout (C)
    • stop the timer (D)

    Time between B and C will (must!) always be less than the time between A and D. If it is not, then I would like to know why. Maybe we are doing something completely wrong.

    My worry is that CI machines are sometimes veeery slow and any “reasonable” time we set turns out to be surprise and results in intermittent test failures. Now, if doing the above can result in the time between A and D being less than 2 sec (= server timeout, B to C) and we don’t know why is that and we set to check only for >1 sec instead, then, because we do not know the cause, how can we be sure that the test timer (A to D) will not be sometimes e.g. 0.99 sec causing the test to fail?


    pinheadmz commented at 5:32 pm on May 22, 2025:

    The server timer should start after receiving the last packet from the client, because its an idle timer not a session timer. So its more correct to have:

    send, server timer starts immediately after (B)

    I tried to to nail this in https://github.com/bitcoin/bitcoin/commit/ce9847e284a361fb050f366848cdf1a38b2b933b but that also failed.

    And then, if we start the timer too early, we will always get a duration > 1 no matter what the server actually does… maybe its the initial TCP connection attempt that takes 1 second and then the server doesn’t connect, or responds immediately, etc.

    There also could be something with libevent itself making this act funny? I haven’t had any intermittency with the original test in #32061 – so maybe as part of that PR we can tighten up the test ;-)


    vasild commented at 8:15 am on May 23, 2025:

    Ok, it could be that libevent behaves in an unexpected way. In #32061 we could tighten up the test to:

    • connect
    • start timer in the test (A)
    • send, server timer starts after last packet received from the client (B)
    • try to recv, connection will be closed by the server due to server timeout (C)
    • stop the timer in the test (D)
  7. pinheadmz force-pushed on May 8, 2025
  8. pinheadmz force-pushed on May 8, 2025
  9. DrahtBot removed the label CI failed on May 8, 2025
  10. in test/functional/interface_http.py:141 in 840dd5b6eb outdated
    136+        try:
    137+            # The server should not respond to the fast, second request
    138+            # until the (very) slow first request has been handled:
    139+            res = sock.recv(1024)
    140+            assert not res
    141+        except TimeoutError:
    


    polespinasa commented at 7:16 am on May 9, 2025:
    I’m not sure about this but shouldn’t this be socket.timeout? As we’re using sock.recv and not getresponse()?

    pinheadmz commented at 1:06 pm on May 9, 2025:

    Great catch! I found this:

    exception socket.timeout A deprecated alias of TimeoutError. Changed in version 3.10: This class was made an alias of TimeoutError.

    Since Python 3.10 is minimum version required in dependencies.md (and the test passes!) I think it’s ok to leave as-is

  11. polespinasa commented at 7:22 am on May 9, 2025: contributor

    tACK 840dd5b6eb0b2c1f35a91c36acac5d97933172dc

    Maybe the three commits could be squashed?

  12. pinheadmz force-pushed on May 9, 2025
  13. pinheadmz commented at 1:10 pm on May 9, 2025: member

    Maybe the three commits could be squashed?

    Done! thanks for reviewing

  14. fjahr commented at 1:00 pm on May 19, 2025: contributor
    Concept ACK
  15. in test/functional/interface_http.py:216 in 95d47449b2 outdated
    211+        # Server disconnected within an acceptable range of time:
    212+        # not immediately, and not too far over the configured duration.
    213+        # This allows for some jitter in the test between client and server.
    214+        duration = stop - start
    215+        assert duration <= 4, f"Server disconnected too slow: {duration} > 4"
    216+        assert duration >= 1, f"Server disconnected too fast: {duration} < 1"
    


    vasild commented at 1:32 pm on May 20, 2025:

    That should be 2, I guess since the timeout is set to 2. E.g. disconnecting after 1.5 seconds is unexpected and should be treated as an error.

     0diff --git i/test/functional/interface_http.py w/test/functional/interface_http.py
     1index 4f10b55afd..9c345c30b9 100755
     2--- i/test/functional/interface_http.py
     3+++ w/test/functional/interface_http.py
     4@@ -185,13 +185,14 @@ class HTTPBasicsTest (BitcoinTestFramework):
     5         # so low on this one node, its connection will quickly timeout and get dropped by
     6         # the server. Negating this setting will force the AuthServiceProxy
     7         # for this node to create a fresh new HTTP connection for every command
     8         # called for the remainder of this test.
     9         self.nodes[2].reuse_http_connections = False
    10 
    11-        self.restart_node(2, extra_args=["-rpcservertimeout=2"])
    12+        rpcservertimeout = 2
    13+        self.restart_node(2, extra_args=[f"-rpcservertimeout={rpcservertimeout}"])
    14         # This is the amount of time the server will wait for a client to
    15         # send a complete request. Test it by sending an incomplete but
    16         # so-far otherwise well-formed HTTP request, and never finishing it.
    17 
    18         # Copied from http_incomplete_test_() in regress_http.c in libevent.
    19         # A complete request would have an additional "\r\n" at the end.
    20@@ -199,24 +200,24 @@ class HTTPBasicsTest (BitcoinTestFramework):
    21 
    22         # Get the underlying socket from HTTP connection so we can send something unusual
    23         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
    24         conn.connect()
    25         sock = conn.sock
    26         sock.sendall(http_request.encode("utf-8"))
    27-        # Wait for response, but expect a timeout disconnection after 1 second
    28+        # Wait for response, but expect a timeout disconnection after `rpcservertimeout` seconds
    29         start = time.time()
    30         res = sock.recv(1024)
    31         stop = time.time()
    32         # Server disconnected with EOF
    33         assert res == b""
    34         # Server disconnected within an acceptable range of time:
    35         # not immediately, and not too far over the configured duration.
    36         # This allows for some jitter in the test between client and server.
    37         duration = stop - start
    38         assert duration <= 4, f"Server disconnected too slow: {duration} > 4"
    39-        assert duration >= 1, f"Server disconnected too fast: {duration} < 1"
    40+        assert duration >= rpcservertimeout, f"Server disconnected too fast: {duration} < {rpcservertimeout}"
    41         # The connection is definitely closed.
    42         try:
    43             conn.request('GET', '/')
    44             conn.getresponse()
    45         #       macos/linux           windows
    46         except (ConnectionResetError, ConnectionAbortedError):
    

    pinheadmz commented at 2:34 pm on May 20, 2025:

    The problem with this is the test becomes flakey because the timer may not start in sync with the server, so it may record less time than the server actually waited: #32408 (review)

    note also that libevent doesn’t even try to test the lower bound: #32408 (review)


    vasild commented at 8:23 am on May 21, 2025:
    continued the discussion in #32408 (review)
  16. in test/functional/interface_http.py:132 in 95d47449b2 outdated
    127+        req2 += body2
    128+        # Get the underlying socket from HTTP connection so we can send something unusual
    129+        conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
    130+        conn.connect()
    131+        sock = conn.sock
    132+        sock.settimeout(1)
    


    vasild commented at 1:34 pm on May 20, 2025:

    Repeat comment #32061 (review) here:

    1 second timeout to send or receive seems more than enough for local testing on a dev machine. However, CI virtual machines sometimes are surprisingly slow. To avoid unnecessary test failures maybe it would be better to have this be 5 or 10 seconds for the sendall() calls and then set to 1 for the recv() call which we expect to timeout.


    pinheadmz commented at 2:40 pm on May 21, 2025:

    Ok I’ll change the value here to 5 – meaning the sock.recv() on L139 will always take that much time. That will, however, protect us against false positives on a slow CI, if the server regressed and actually responded to the requests out of order, but took more than 1 second to do so.

    After we generate a block, the sock.recv() on L150 should execute very quickly, but will allow up to 5 seconds for a slow CI to respond to the two RPC requests. That applies to the sendall() as well.

  17. in test/functional/interface_http.py:140 in 95d47449b2 outdated
    135+        sock.sendall(req2.encode("utf-8"))
    136+        try:
    137+            # The server should not respond to the fast, second request
    138+            # until the (very) slow first request has been handled:
    139+            res = sock.recv(1024)
    140+            assert not res
    


    vasild commented at 1:35 pm on May 20, 2025:

    Repeat comment #32061 (review) here:

    Shouldn’t this be assert False? Here the expectation is that the recv() will throw an exception due to timeout.

    https://docs.python.org/3/library/socket.html#socket.socket.recv

    A returned empty bytes object indicates that the client has disconnected.

    An “empty bytes object” will not trigger the assert assert not res but if that happens (= disconnect) then the test should fail.

    suggestion:

    0-            assert not res
    1+            assert False
    

    pinheadmz commented at 2:41 pm on May 21, 2025:
    thanks, done
  18. in test/functional/interface_http.py:223 in 95d47449b2 outdated
    218+        try:
    219+            conn.request('GET', '/')
    220+            conn.getresponse()
    221+        #       macos/linux           windows
    222+        except (ConnectionResetError, ConnectionAbortedError):
    223+            pass
    


    vasild commented at 1:36 pm on May 20, 2025:

    Repeat comment #32061 (review) here:

    This will also pass if no exception is thrown. Either add assert False after line 214 or have a boolean variable to false before the try and set it to true inside except and assert that it is true afterwards.


    pinheadmz commented at 2:46 pm on May 21, 2025:
    yep thanks
  19. in test/functional/interface_http.py:169 in 95d47449b2 outdated
    164+            b'0A' * 1000000,
    165+            b'0B' * 1000000,
    166+            b'0C' * 1000000,
    167+            b'0D' * 1000000,
    168+            b'"]}'
    169+        ]
    


    vasild commented at 1:37 pm on May 20, 2025:

    Repeat comment #32061 (review) here:

    Is the intention here to send 8MB of data?


    pinheadmz commented at 2:41 pm on May 21, 2025:
    No, 4MB! Great catch thanks, fixed.
  20. in test/functional/interface_http.py:179 in 95d47449b2 outdated
    174+            url='/',
    175+            body=iter(body_chunked),
    176+            headers=headers_chunked,
    177+            encode_chunked=True)
    178+        out1 = conn.getresponse().read()
    179+        assert out1 == b'{"result":"high-hash","error":null}\n'
    


    vasild commented at 1:38 pm on May 20, 2025:

    Repeat comment #32061 (review) here:

    Here and elsewhere in the added tests, assert_equal() produces a better error message:

    assert (value of out1 is not printed):

    0    assert out1 == b'{"result":"high-hash","error":null}\n'
    1           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2AssertionError
    

    vs

    assert_equal():

    0AssertionError: not(b'{"result":null,"error":{"code":-32700,"message":"Parse error"},"id":null}\n' == b'{"result":"high-hash","error":null}\n')
    

    pinheadmz commented at 2:43 pm on May 21, 2025:
    awesome yes, thanks, done
  21. vasild commented at 1:39 pm on May 20, 2025: contributor

    Approach ACK 95d47449b2c069fca9859d76b2c3766868b5ec79

    Copied my review comments from https://github.com/bitcoin/bitcoin/pull/32061

  22. tests: Expand HTTP coverage to assert libevent behavior
    Covers:
    - http pipelining
    - rpcservertimeout
    
      Testing this requires adding an option to TestNode to force
      the test framework to establish a new HTTP connection for
      every RPC. Otherwise, attempting to reuse a persistent connection
      would cause framework RPCs during startup and shutdown to fail.
    
    - "chunked" Transfer-Encoding
    f16c8c67bf
  23. pinheadmz force-pushed on May 21, 2025
  24. pinheadmz commented at 2:48 pm on May 21, 2025: member
    rebase to f16c8c67bf to address comments, not touching the rpcservertimeout test just yet, discussion still ongoing
  25. vasild approved
  26. vasild commented at 9:17 am on May 22, 2025: contributor

    ACK f16c8c67bf137e64fbeea1242431baaa915a5c53

    Maybe the check for the timeout can be improved, ongoing discussion at #32408 (review) (non-blocker IMO)

  27. DrahtBot requested review from polespinasa on May 22, 2025
  28. DrahtBot requested review from fjahr on May 22, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-25 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me