tests: Expand HTTP coverage to assert libevent behavior

pinheadmz commented at 6:56 pm on May 2, 2025: member

These commits are cherry-picked from #32061 and part of a project to remove libevent.

This PR only adds functional tests to interface_http to cover some HTTP server behaviors we inherit from libevent, in order to maintain those behaviors when we replace libevent with our own HTTP server.

Pipelining: The server must respond to requests from a client in the order in which they were received RFC 7230 6.3.2
-rpcservertimeout config option which sets the amount of time the server will keep an idle client connection alive
“Chunked” Transfer-Encoding: Allows a client to send a request in pieces, without the Content-Length header RFC 7230 4.1

DrahtBot commented at 6:56 pm on May 2, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32408.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	vasild, polespinasa, fjahr, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

DrahtBot added the label Tests on May 2, 2025

DrahtBot added the label CI failed on May 2, 2025

in test/functional/interface_http.py:223 in 39c04c7570 outdated

205+        # Wait for response, but expect a timeout disconnection after 1 second
206+        start = time.time()
207+        res = sock.recv(1024)
208+        stop = time.time()
209+        assert res == b""
210+        assert stop - start >= 1

maflcko commented at 12:32 pm on May 4, 2025:

https://cirrus-ci.com/task/5276299050090496?logs=ci#L3261:

0[15:13:21.244]  node2 2025-05-02T19:13:19.879966Z [httpworker.4] [rpc/request.cpp:241] [parse] [rpc] ThreadRPCServer method=getmempoolinfo user=__cookie__ 
1[15:13:21.244]  test  2025-05-02T19:13:19.882000Z TestFramework.node2 (DEBUG): RPC successfully started 
2[15:13:21.244]  test  2025-05-02T19:13:20.885000Z TestFramework (ERROR): Assertion failed 
3[15:13:21.244]                                    Traceback (most recent call last):
4[15:13:21.244]                                      File "/ci_container_base/test/functional/test_framework/test_framework.py", line 183, in main
5[15:13:21.244]                                        self.run_test()
6[15:13:21.244]                                      File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/interface_http.py", line 210, in run_test
7[15:13:21.244]                                        assert stop - start >= 1
8[15:13:21.244]                                               ^^^^^^^^^^^^^^^^^
9[15:13:21.244]                                    AssertionError

pinheadmz commented at 1:00 pm on May 4, 2025:

yeah i see it – gotta figure out why sock.recv(1024) is non blocking on the two failing platforms

in test/functional/interface_http.py:206 in 39c04c7570 outdated

201+        conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
202+        conn.connect()
203+        sock = conn.sock
204+        sock.sendall(http_request.encode("utf-8"))
205+        # Wait for response, but expect a timeout disconnection after 1 second
206+        start = time.time()

polespinasa commented at 2:54 pm on May 7, 2025:

~~If you move the start = time.time() before sending the test passes correctly.~~

~~CI was failing locally on my laptop. After changing this it does work.~~

Note: it was just working on some executions

pinheadmz commented at 8:38 pm on May 8, 2025:

I noticed in the libevent’s test they don’t actually check that server waited at all, just that the connection closed within a few seconds of the configured time out. So I went that route, but also added a tiny minimum check just to make sure the server isn’t closing immediately. And this gives us better test reliability.

So the new test is: -rpcservertimeout=2, we send a request, we time how long it takes to close, and that duration is expected to be between 1 and 4 seconds.

polespinasa commented at 7:26 am on May 9, 2025:

If the timeout is set to 2 wouldn’t make more sense to test between 2 and 4 seconds? Can it be less than 2 seconds?

pinheadmz commented at 12:38 pm on May 9, 2025:

It can be, and that was why the 1 second timeout check kept failing. I tried a few implementations of the test and it’s just impossible to reliably start the test timer in sync with the HTTP server. So sometimes it starts late and you end up with duration < expected.

The point of the test is to ensure that the server disconnects idle clients, I think it does that even though the exact time is a little fudged.

vasild commented at 8:22 am on May 21, 2025:

Did you try to start the timer before connecting? Like this:

 0         # Get the underlying socket from HTTP connection so we can send something unusual
 1         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
 2+        start = time.time()
 3         conn.connect()
 4         sock = conn.sock
 5         sock.sendall(http_request.encode("utf-8"))
 6         # Wait for response, but expect a timeout disconnection after 1 second
 7-        start = time.time()
 8         res = sock.recv(1024)
 9         stop = time.time()
10         # Server disconnected with EOF

It must make it such that the timer in the test is then always >= than the server timeout of 2 seconds, no?

polespinasa commented at 12:09 pm on May 21, 2025:

Yeah we tried that and still was not working. For more context we got a discussion on the IRC channel: https://www.erisian.com.au/bitcoin-core-dev/log-2025-05-07.html

vasild commented at 12:44 pm on May 21, 2025:

In that discussion @maflcko suggested to start the timer before connect and send and sliv3r__ said that before connect it worked 15 times in a row. Do I miss something?

polespinasa commented at 12:52 pm on May 21, 2025:

It was “randomly” failing. The 15 times in a row was moving it before http.clinet.HTTPConn.... Moving it before conn.connect failed the 6th time https://www.erisian.com.au/bitcoin-core-dev/log-2025-05-07.html#l-115

vasild commented at 12:57 pm on May 21, 2025:

Ok, then this should work:

 0         # Get the underlying socket from HTTP connection so we can send something unusual
 1+        start = time.time()
 2         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
 3         conn.connect()
 4         sock = conn.sock
 5         sock.sendall(http_request.encode("utf-8"))
 6         # Wait for response, but expect a timeout disconnection after 1 second
 7-        start = time.time()
 8         res = sock.recv(1024)
 9         stop = time.time()
10         # Server disconnected with EOF

pinheadmz commented at 2:25 pm on May 21, 2025:

That should work but the reason I didn’t leave like that is I don’t feel like its effectively testing the right thing. We could start the timer at the top of the file and it would always pass, but it wouldn’t necessarily catch any regressions. The server timeout should re-start every time it receives a packet from the client, and it’s very hard to start the test timer at the right moment.

Another approach I tried was actually setting libevent debug logs and trying to track down messages like these:

0event_add: event: 0x10a004210 (fd 20), EV_READ   EV_TIMEOUT call 0x1053cf1bc 
1event_add: event 0x10a004210, timeout in 1 seconds 0 useconds, call 0x1053cf1bc

Interestingly, the original version of this test always passes after switching to Sockman

pinheadmz commented at 2:26 pm on May 21, 2025:

All that being said, if you want me to use the patch you wrote that’s fine by me too ;-)

vasild commented at 7:13 am on May 22, 2025:

I don’t see this discussion as a blocker.

I just want to get to the bottom of it. Checked the source code of http.client.HTTPConnection() - it does not open any connections, just sets some internal member variables of the HTTPConnection class. So, starting the timer before or after http.client.HTTPConnection() shouldn’t make a difference.

conn.connect() is what makes the connection. And the timer should be started before it. Why would not the followng test the right thing?

start timer in the test (A)
connect, server timer starts here (B)
send
try to recv, connection will be closed by the server due to server timeout (C)
stop the timer (D)

Time between B and C will (must!) always be less than the time between A and D. If it is not, then I would like to know why. Maybe we are doing something completely wrong.

My worry is that CI machines are sometimes veeery slow and any “reasonable” time we set turns out to be surprise and results in intermittent test failures. Now, if doing the above can result in the time between A and D being less than 2 sec (= server timeout, B to C) and we don’t know why is that and we set to check only for >1 sec instead, then, because we do not know the cause, how can we be sure that the test timer (A to D) will not be sometimes e.g. 0.99 sec causing the test to fail?

pinheadmz commented at 5:32 pm on May 22, 2025:

The server timer should start after receiving the last packet from the client, because its an idle timer not a session timer. So its more correct to have:

send, server timer starts immediately after (B)

I tried to to nail this in https://github.com/bitcoin/bitcoin/commit/ce9847e284a361fb050f366848cdf1a38b2b933b but that also failed.

And then, if we start the timer too early, we will always get a duration > 1 no matter what the server actually does… maybe its the initial TCP connection attempt that takes 1 second and then the server doesn’t connect, or responds immediately, etc.

There also could be something with libevent itself making this act funny? I haven’t had any intermittency with the original test in #32061 – so maybe as part of that PR we can tighten up the test ;-)

vasild commented at 8:15 am on May 23, 2025:

Ok, it could be that libevent behaves in an unexpected way. In #32061 we could tighten up the test to:

connect
start timer in the test (A)
send, server timer starts after last packet received from the client (B)
try to recv, connection will be closed by the server due to server timeout (C)
stop the timer in the test (D)

pinheadmz force-pushed on May 8, 2025

DrahtBot removed the label CI failed on May 8, 2025

in test/functional/interface_http.py:141 in 840dd5b6eb outdated

136+        try:
137+            # The server should not respond to the fast, second request
138+            # until the (very) slow first request has been handled:
139+            res = sock.recv(1024)
140+            assert not res
141+        except TimeoutError:

polespinasa commented at 7:16 am on May 9, 2025:

I’m not sure about this but shouldn’t this be socket.timeout? As we’re using sock.recv and not getresponse()?

pinheadmz commented at 1:06 pm on May 9, 2025:

Great catch! I found this:

exception socket.timeout A deprecated alias of TimeoutError. Changed in version 3.10: This class was made an alias of TimeoutError.

Since Python 3.10 is minimum version required in dependencies.md (and the test passes!) I think it’s ok to leave as-is

polespinasa commented at 7:22 am on May 9, 2025: contributor

tACK 840dd5b6eb0b2c1f35a91c36acac5d97933172dc

Maybe the three commits could be squashed?

pinheadmz force-pushed on May 9, 2025

pinheadmz commented at 1:10 pm on May 9, 2025: member

Maybe the three commits could be squashed?

Done! thanks for reviewing

fjahr commented at 1:00 pm on May 19, 2025: contributor

Concept ACK

in test/functional/interface_http.py:216 in 95d47449b2 outdated

211+        # Server disconnected within an acceptable range of time:
212+        # not immediately, and not too far over the configured duration.
213+        # This allows for some jitter in the test between client and server.
214+        duration = stop - start
215+        assert duration <= 4, f"Server disconnected too slow: {duration} > 4"
216+        assert duration >= 1, f"Server disconnected too fast: {duration} < 1"

vasild commented at 1:32 pm on May 20, 2025:

That should be 2, I guess since the timeout is set to 2. E.g. disconnecting after 1.5 seconds is unexpected and should be treated as an error.

 0diff --git i/test/functional/interface_http.py w/test/functional/interface_http.py
 1index 4f10b55afd..9c345c30b9 100755
 2--- i/test/functional/interface_http.py
 3+++ w/test/functional/interface_http.py
 4@@ -185,13 +185,14 @@ class HTTPBasicsTest (BitcoinTestFramework):
 5         # so low on this one node, its connection will quickly timeout and get dropped by
 6         # the server. Negating this setting will force the AuthServiceProxy
 7         # for this node to create a fresh new HTTP connection for every command
 8         # called for the remainder of this test.
 9         self.nodes[2].reuse_http_connections = False
10 
11-        self.restart_node(2, extra_args=["-rpcservertimeout=2"])
12+        rpcservertimeout = 2
13+        self.restart_node(2, extra_args=[f"-rpcservertimeout={rpcservertimeout}"])
14         # This is the amount of time the server will wait for a client to
15         # send a complete request. Test it by sending an incomplete but
16         # so-far otherwise well-formed HTTP request, and never finishing it.
17 
18         # Copied from http_incomplete_test_() in regress_http.c in libevent.
19         # A complete request would have an additional "\r\n" at the end.
20@@ -199,24 +200,24 @@ class HTTPBasicsTest (BitcoinTestFramework):
21 
22         # Get the underlying socket from HTTP connection so we can send something unusual
23         conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
24         conn.connect()
25         sock = conn.sock
26         sock.sendall(http_request.encode("utf-8"))
27-        # Wait for response, but expect a timeout disconnection after 1 second
28+        # Wait for response, but expect a timeout disconnection after `rpcservertimeout` seconds
29         start = time.time()
30         res = sock.recv(1024)
31         stop = time.time()
32         # Server disconnected with EOF
33         assert res == b""
34         # Server disconnected within an acceptable range of time:
35         # not immediately, and not too far over the configured duration.
36         # This allows for some jitter in the test between client and server.
37         duration = stop - start
38         assert duration <= 4, f"Server disconnected too slow: {duration} > 4"
39-        assert duration >= 1, f"Server disconnected too fast: {duration} < 1"
40+        assert duration >= rpcservertimeout, f"Server disconnected too fast: {duration} < {rpcservertimeout}"
41         # The connection is definitely closed.
42         try:
43             conn.request('GET', '/')
44             conn.getresponse()
45         #       macos/linux           windows
46         except (ConnectionResetError, ConnectionAbortedError):

pinheadmz commented at 2:34 pm on May 20, 2025:

The problem with this is the test becomes flakey because the timer may not start in sync with the server, so it may record less time than the server actually waited: #32408 (review)

note also that libevent doesn’t even try to test the lower bound: #32408 (review)

vasild commented at 8:23 am on May 21, 2025:

continued the discussion in #32408 (review)

in test/functional/interface_http.py:132 in 95d47449b2 outdated

127+        req2 += body2
128+        # Get the underlying socket from HTTP connection so we can send something unusual
129+        conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)
130+        conn.connect()
131+        sock = conn.sock
132+        sock.settimeout(1)

vasild commented at 1:34 pm on May 20, 2025:

Repeat comment #32061 (review) here:

1 second timeout to send or receive seems more than enough for local testing on a dev machine. However, CI virtual machines sometimes are surprisingly slow. To avoid unnecessary test failures maybe it would be better to have this be 5 or 10 seconds for the sendall() calls and then set to 1 for the recv() call which we expect to timeout.

pinheadmz commented at 2:40 pm on May 21, 2025:

Ok I’ll change the value here to 5 – meaning the sock.recv() on L139 will always take that much time. That will, however, protect us against false positives on a slow CI, if the server regressed and actually responded to the requests out of order, but took more than 1 second to do so.

After we generate a block, the sock.recv() on L150 should execute very quickly, but will allow up to 5 seconds for a slow CI to respond to the two RPC requests. That applies to the sendall() as well.

in test/functional/interface_http.py:140 in 95d47449b2 outdated

135+        sock.sendall(req2.encode("utf-8"))
136+        try:
137+            # The server should not respond to the fast, second request
138+            # until the (very) slow first request has been handled:
139+            res = sock.recv(1024)
140+            assert not res

vasild commented at 1:35 pm on May 20, 2025:

Repeat comment #32061 (review) here:

Shouldn’t this be assert False? Here the expectation is that the recv() will throw an exception due to timeout.

https://docs.python.org/3/library/socket.html#socket.socket.recv

A returned empty bytes object indicates that the client has disconnected.

An “empty bytes object” will not trigger the assert assert not res but if that happens (= disconnect) then the test should fail.

suggestion:

0-            assert not res
1+            assert False

pinheadmz commented at 2:41 pm on May 21, 2025:

thanks, done

in test/functional/interface_http.py:223 in 95d47449b2 outdated

218+        try:
219+            conn.request('GET', '/')
220+            conn.getresponse()
221+        #       macos/linux           windows
222+        except (ConnectionResetError, ConnectionAbortedError):
223+            pass

vasild commented at 1:36 pm on May 20, 2025:

Repeat comment #32061 (review) here:

This will also pass if no exception is thrown. Either add assert False after line 214 or have a boolean variable to false before the try and set it to true inside except and assert that it is true afterwards.

pinheadmz commented at 2:46 pm on May 21, 2025:

yep thanks

in test/functional/interface_http.py:169 in 95d47449b2 outdated

164+            b'0A' * 1000000,
165+            b'0B' * 1000000,
166+            b'0C' * 1000000,
167+            b'0D' * 1000000,
168+            b'"]}'
169+        ]

vasild commented at 1:37 pm on May 20, 2025:

Repeat comment #32061 (review) here:

Is the intention here to send 8MB of data?

pinheadmz commented at 2:41 pm on May 21, 2025:

No, 4MB! Great catch thanks, fixed.

in test/functional/interface_http.py:179 in 95d47449b2 outdated

174+            url='/',
175+            body=iter(body_chunked),
176+            headers=headers_chunked,
177+            encode_chunked=True)
178+        out1 = conn.getresponse().read()
179+        assert out1 == b'{"result":"high-hash","error":null}\n'

vasild commented at 1:38 pm on May 20, 2025:

Repeat comment #32061 (review) here:

Here and elsewhere in the added tests, assert_equal() produces a better error message:

assert (value of out1 is not printed):

0    assert out1 == b'{"result":"high-hash","error":null}\n'
1           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2AssertionError

vs

assert_equal():

0AssertionError: not(b'{"result":null,"error":{"code":-32700,"message":"Parse error"},"id":null}\n' == b'{"result":"high-hash","error":null}\n')

pinheadmz commented at 2:43 pm on May 21, 2025:

awesome yes, thanks, done

vasild commented at 1:39 pm on May 20, 2025: contributor

Approach ACK 95d47449b2c069fca9859d76b2c3766868b5ec79

Copied my review comments from https://github.com/bitcoin/bitcoin/pull/32061

tests: Expand HTTP coverage to assert libevent behavior

Covers:
- http pipelining
- rpcservertimeout

  Testing this requires adding an option to TestNode to force
  the test framework to establish a new HTTP connection for
  every RPC. Otherwise, attempting to reuse a persistent connection
  would cause framework RPCs during startup and shutdown to fail.

- "chunked" Transfer-Encoding

f16c8c67bf

pinheadmz force-pushed on May 21, 2025

pinheadmz commented at 2:48 pm on May 21, 2025: member

rebase to f16c8c67bf to address comments, not touching the rpcservertimeout test just yet, discussion still ongoing

vasild approved

vasild commented at 9:17 am on May 22, 2025: contributor

ACK f16c8c67bf137e64fbeea1242431baaa915a5c53

Maybe the check for the timeout can be improved, ongoing discussion at #32408 (review) (non-blocker IMO)

DrahtBot requested review from polespinasa on May 22, 2025

DrahtBot requested review from fjahr on May 22, 2025

pinheadmz commented at 2:09 pm on June 3, 2025: member

Friendly ping @fjahr and @polespinasa for reviewing some HTTP tests ? <3

polespinasa commented at 6:31 pm on June 3, 2025: contributor

LGTM friendly acknowledging this PR :) ACK f16c8c67bf137e64fbeea1242431baaa915a5c53

Agree on the 5sec timeout change, we avoid weird errors if the CI is lazy.

I locally rebased this on top of master (e872a566f251c73908de8b6d243c94a6679c2eac) to make sure there aren’t silent merging conflicts that make the test fail. It works correctly :)

in test/functional/interface_http.py:170 in f16c8c67bf

165+            b'1' * 1000000,
166+            b'2' * 1000000,
167+            b'3' * 1000000,
168+            b'"]}'
169+        ]
170+        conn = http.client.HTTPConnection(urlNode2.hostname, urlNode2.port)

fjahr commented at 10:51 pm on June 3, 2025:

nit: Could use a helper function to get rid of some duplication here (and even beyond the changes here across the whole test file).

0def _get_http_connection(self, url_node):
1    conn = http.client.HTTPConnection(url_node.hostname, url_node.port) 
2    conn.connect()
3    sock = conn.sock
4    return conn, sock

pinheadmz commented at 3:53 pm on June 4, 2025:

Good call thanks. I will do this if I retouch this PR, otherwise will add to #32061 on rebase

in test/functional/interface_http.py:150 in f16c8c67bf

145+        self.generate(self.nodes[2], 1, sync_fun=self.no_op)
146+
147+        # Wait for two responses to be received
148+        res = b""
149+        while res.count(b"result") != 2:
150+            res += sock.recv(1024)

fjahr commented at 10:53 pm on June 3, 2025:

nit: I haven’t looked at the actual data but this seems a bit fragile, if the second “result” is at the very end of res, the test would continue here but then fail below. Not sure how real that threat is but it could also be easily mitigated by doing one more recv I guess.

pinheadmz commented at 3:52 pm on June 4, 2025:

I don’t think that’s possible, the two RPCs are getblockcount and waitforblockheight, those two responses together including their HTTP overhead only total about 500 bytes so I don’t think recv(1024) would truncate anything. If the server is extremely slow the client may even have to call recv() twice in the while loop, reading only about 250 bytes each time.

fjahr commented at 11:11 pm on June 3, 2025: contributor

utACK f16c8c67bf137e64fbeea1242431baaa915a5c53

Looks good to me, I would consider my comments non-blocking and could potentially be addressed in a follow-up.

achow101 commented at 7:50 pm on June 9, 2025: member

ACK f16c8c67bf137e64fbeea1242431baaa915a5c53

achow101 merged this on Jun 9, 2025

achow101 closed this on Jun 9, 2025

tests: Expand HTTP coverage to assert libevent behavior #32408

Code Coverage & Benchmarks

Reviews

Conflicts