Reasons why I don’t believe this is a deadlock, based on the debug logs and the debugger thread backtraces shared in the issue:
The HTTP worker threads (b-http_pool_x) are waiting on the condition variable, not on the mutex, which signals that these threads are idle and waiting for work to be assigned to them.
The HTTP thread (b-http) is blocked in epoll_wait, which means it is waiting for a request (or part of one) to be received.
The added logs show that the first few testmempoolaccept RPCs succeeded and the next one timed out. Unlike the previous ones, however, no incoming request was logged for it, hinting that the server never received that request (at least not in full) and thus never processed it. Yet the functional test client timed out, which means that it did send the request (at least part of it).
The large orphan transactions are each 780KB in size and are sent sequentially by the test. It sends 60 of them in a loop, amounting to about 46MB of data over a single, reused HTTP connection.
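To illustrate the first point, here is a minimal Python sketch of a hypothetical work queue. It is not the node's actual C++ HTTP server code. Idle workers sit in `cond.wait()`, which releases the associated mutex while blocked, so the mutex stays acquirable even though every worker is "stuck". In a real deadlock, threads would instead be blocked trying to acquire a mutex that is never released.

```python
import threading
import time
from collections import deque


class WorkQueue:
    """Hypothetical work queue; NOT Bitcoin Core's HTTP server implementation."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.items = deque()
        self.results = []
        self.idle = 0          # workers currently parked in cond.wait()
        self.stopped = False

    def put(self, item):
        with self.cond:
            self.items.append(item)
            self.cond.notify()  # wake one idle worker

    def stop(self):
        with self.cond:
            self.stopped = True
            self.cond.notify_all()

    def worker(self):
        with self.cond:
            while True:
                while not self.items and not self.stopped:
                    self.idle += 1
                    self.cond.wait()  # atomically releases self.lock while blocked
                    self.idle -= 1
                if not self.items:    # stopped and nothing left to do
                    return
                self.results.append(self.items.popleft() * 2)


def demo(n_workers=4, n_items=10):
    q = WorkQueue()
    threads = [threading.Thread(target=q.worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Wait until every worker is parked on the condition variable.
    deadline = time.monotonic() + 5
    while True:
        with q.lock:
            if q.idle == n_workers:
                break
        if time.monotonic() > deadline:
            raise RuntimeError("workers never became idle")
        time.sleep(0.01)

    # Every worker is blocked, yet the mutex is free: idle, not deadlocked.
    lock_was_free = q.lock.acquire(timeout=1)
    if lock_was_free:
        q.lock.release()

    for i in range(n_items):
        q.put(i)
    q.stop()
    for t in threads:
        t.join(timeout=5)
    return lock_was_free, sorted(q.results), any(t.is_alive() for t in threads)
```

Once work arrives, the same workers pick it up immediately. That is the behaviour you would expect from idle threads and not from deadlocked ones.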
More details are shared in the first commit message.
This PR throttles the RPCs on the client side. I haven’t been able to reproduce this intermittent issue, so I don’t guarantee that this fixes it altogether.
Note: A previous approach in this PR instead avoided reusing the HTTP connection for the RPCs in this test. But I noticed a CI run where the affected test took around 75 minutes to complete, which led me to this approach: the HTTP connection is reused as before, but with some throttling.
DrahtBot added the label
Tests
on Mar 18, 2026
DrahtBot
commented at 8:57 am on March 18, 2026:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviews
See the guideline for information on the review process.
A summary of reviews will appear here.
rkrux
commented at 10:31 am on March 18, 2026:
contributor
maflcko
commented at 11:08 am on March 18, 2026:
member
I don’t think the issue happens in macOS, but only in the task that tests ancestor commits, but that task is skipped for pull requests with one commit. Also, reproducing requires tens of runs/commits.
hebasto
commented at 11:08 am on March 18, 2026:
member
rkrux
commented at 11:13 am on March 18, 2026:
contributor
I don’t think the issue happens in macOS, but only in the task that tests ancestor commits,
Oh interesting, I do have a second commit that I will push after this recently started job ends/succeeds.
Also, reproducing requires tens of runs/commits.
I’m not trying to reproduce the issue. Just ensuring that the tests don’t fail for some other reason on this commit. Will undraft the PR once the CI is green (or not yellow at least).
DrahtBot removed the label
CI failed
on Mar 18, 2026
rkrux force-pushed
on Mar 18, 2026
rkrux force-pushed
on Mar 18, 2026
rkrux renamed this:
test: work in progress commit
test: conditionally throttle large testmempoolaccept rpcs in p2p_orphan_handling test
on Mar 18, 2026
rkrux
commented at 2:42 pm on March 18, 2026:
contributor
A previous instance of the CI run where the test took 4536s (~75min) to complete when the HTTP connection was not reused and a fresh one was created for every RPC in the test: ASan + LSan + UBSan + integer
rkrux marked this as ready for review
on Mar 18, 2026
rkrux force-pushed
on Mar 19, 2026
in
test/functional/p2p_orphan_handling.py:633
in
b309bdb2c4
@@ -630,7 +630,8 @@ def test_maximal_package_protected(self):
         # Check to make sure these are orphans, within max standard size (to be accepted into the orphanage)
         for large_orphan in large_orphans:
-            testres = node.testmempoolaccept([large_orphan.serialize().hex()])
+            # throttle these 780KB large requests if the RPC latency is greater than 1s
Above is a log excerpt from the last successful request. It shows no such delay (going by the first timestamps), although these are only server-side logs, not client-side (test) logs. I think I will just revert to an unconditional sleep instead of sleeping conditionally, for which I don’t have any basis.
The tests are run on a fast gaming CPU,
Nice, I didn’t know this, but do we know how much load it is under (or at least was under when the intermittent issue occurred)?
A previous instance of the CI run where the test took 4536s (~75min) to complete when the HTTP connection was not reused and a fresh one was created for every RPC in the test: ASan + LSan + UBSan + integer
I like this approach for this test, but this one occurrence discouraged me. I do feel that it hints at the CI instance(s) being intermittently under load, for which an unconditional sleep can be a remedy.
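The two throttling variants discussed here can be sketched in Python as follows. Both helpers are hypothetical, and the 1s threshold and pause lengths are illustrative values, not the PR's code. One caveat of the conditional variant shows up in the sketch: it only pauses after a call has already been slow. The server-side logs showed no slowdown before the failing request, so such a pause might never trigger before the stall.

```python
import time


def call_with_latency_throttle(rpc, payload, threshold_s=1.0, pause_s=0.3):
    """Hypothetical helper: call rpc(payload); pause afterwards only if that call was slow.

    Returns (result, throttled).
    """
    start = time.monotonic()
    result = rpc(payload)
    elapsed = time.monotonic() - start
    throttled = elapsed > threshold_s
    if throttled:
        time.sleep(pause_s)  # give the server time before the next large request
    return result, throttled


def call_with_fixed_throttle(rpc, payload, pause_s=0.05):
    """Hypothetical helper: always pause before sending, regardless of observed latency."""
    time.sleep(pause_s)
    return rpc(payload)
```

The unconditional variant costs a fixed delay per call. In exchange, it does not depend on the client noticing a slowdown first.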
Nice, I didn’t know this, but do we know how much load it is under (or at least was under when the intermittent issue occurred)?
I wouldn’t expect a high load to be the issue here. This is an optimized build without any sanitizers, running on a high-end CPU. Seeing a spurious 30-second timeout for an RPC that would otherwise take milliseconds seems off.
In fact, it may be a race that is only visible because the CPU is so fast.
I’ve reworked the PR to throttle at the large-orphan level whenever one is sent over the network to the RPC. This handles both p2p tests (p2p_orphan_handling, p2p_opportunistic_1p1c) that timed out intermittently.
CyberNFT
commented at 9:18 am on March 19, 2026:
none
👍🏻
rkrux force-pushed
on Mar 19, 2026
rkrux renamed this:
test: conditionally throttle large testmempoolaccept rpcs in p2p_orphan_handling test
test: throttle large testmempoolaccept rpcs in p2p_orphan_handling test
on Mar 19, 2026
rkrux force-pushed
on Mar 19, 2026
DrahtBot added the label
CI failed
on Mar 19, 2026
DrahtBot removed the label
CI failed
on Mar 19, 2026
rkrux force-pushed
on Mar 27, 2026
rkrux marked this as a draft
on Mar 27, 2026
rkrux force-pushed
on Mar 27, 2026
rkrux force-pushed
on Mar 27, 2026
DrahtBot added the label
CI failed
on Mar 27, 2026
rkrux renamed this:
test: throttle large testmempoolaccept rpcs in p2p_orphan_handling test
test: throttle large orphan transactions while being sent in RPCs
on Mar 27, 2026
rkrux marked this as ready for review
on Mar 27, 2026
DrahtBot removed the label
CI failed
on Mar 27, 2026
maflcko
commented at 10:40 am on March 27, 2026:
member
Not sure this is the correct fix. We are not sending 1MB from somewhere outside the solar system to the earth. This is sending 780KB on a local socket from one process to another. Why should this take 30 seconds? Normally, the whole test passes in less time than that, and then suddenly a single RPC times out?
Also, you haven’t even tested if this fix is working. 12 runs/commits is not enough. It can happen after the 40th or 60th run. You’ll have to add 125 empty commits or so.
I don’t mind a temporary workaround, but at least it should be tested, and it should be explained that this is just a temporary workaround for a real underlying bug.
Otherwise, are we going to update the docs to say: “If you call an RPC with a large payload, you have to manually sleep after each call”?
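Why a dozen clean runs say little can be made concrete with a short Python calculation. It assumes, loudly, that runs are independent and that the flake has a fixed per-run failure probability of about 1/50. That rate is a loose reading of "it can happen after the 40th or 60th run", not a measured value.

```python
import math


def prob_flake_unseen(p, runs):
    """Probability that a flake with per-run failure rate p never fails in `runs` independent runs."""
    return (1.0 - p) ** runs


def runs_for_confidence(p, confidence):
    """Smallest number of runs after which a flake with rate p would have shown up with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 1 / 50  # ASSUMPTION: rough per-run failure rate, not measured
print(round(prob_flake_unseen(p, 12), 3))
print(round(prob_flake_unseen(p, 125), 3))
print(runs_for_confidence(p, 0.95))
```

Under that assumption, 12 clean runs still leave roughly a 78% chance that the flake simply did not trigger. After 125 clean runs that drops to about 8%, and about 149 runs would be needed for 95% confidence.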
rkrux
commented at 11:00 am on March 27, 2026:
contributor
This is sending 780KB on a local socket from one process to another. Why should this take 30 seconds?
It shouldn’t take 30 seconds. Since it’s all local, I don’t think this is a network latency issue. Rather, I think the server drains its TCP receive buffer more slowly than the client sends. That’s why I believe the zero TCP window occurs.
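The flow-control mechanism described here can be reproduced locally with plain TCP over loopback. This Python sketch does not involve Bitcoin Core or its HTTP server; the "server" is just a socket that stops reading. When the receiver stops draining, its buffer fills and it advertises a zero window (the `win 0` seen in tcpdump). Then the sender's buffer fills too, and a non-blocking `send()` fails with `BlockingIOError`. Once the receiver drains, the sender can write again. This shows the mechanism only, not why the server stopped draining.

```python
import select
import socket


def fill_until_blocked(sock, chunk):
    """Write to a non-blocking socket until the kernel refuses more data."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent


def demo(chunk_size=64 * 1024):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    chunk = b"x" * chunk_size
    try:
        client.setblocking(False)
        # Phase 1: the "server" does not read. Its receive buffer fills, it
        # advertises a zero window, and then the client's send buffer fills too.
        first = fill_until_blocked(client, chunk)

        # Phase 2: the server drains everything the client managed to queue.
        server.settimeout(5)
        received = 0
        while received < first:
            data = server.recv(1 << 20)
            if not data:
                break
            received += len(data)

        # The window has reopened, so the client becomes writable again.
        _, writable, _ = select.select([], [client], [], 5)
        second = fill_until_blocked(client, chunk) if writable else 0
        return first, received, second
    finally:
        for s in (client, server, listener):
            s.close()
```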
it should be explained that this is just a temporary workaround
The PR description does hint at it at the end, but I can make this explicit.
Not sure this is the correct fix.
It can happen after the 40th or 60th run. You’ll have to add 125 empty commits or so.
I can test with more commits and put it in draft until then.
for a real underlying bug.
The presence of this issue in only one CI job is what I find most confusing (and interesting).
“If you call an RPC with a large payload, you have to manually sleep after each call”
This shouldn’t be required, because this issue doesn’t happen all the time. It is intermittent and occurs in a specific CI job, which may even call the setup of that CI job into question.
rkrux marked this as a draft
on Mar 27, 2026
in
test/functional/p2p_orphan_handling.py:632
in
56898d5d5a (outdated)
@@ -630,7 +629,7 @@ def test_maximal_package_protected(self):
         # Check to make sure these are orphans, within max standard size (to be accepted into the orphanage)
         for large_orphan in large_orphans:
-            testres = node.testmempoolaccept([large_orphan.serialize().hex()])
+            testres = node.testmempoolaccept([large_orphan.to_send.serialize().hex()])
Instead of sleeping 300ms, a smaller temporary workaround would be to just quickly spin up a new TCP connection. You can do this either:
by calling .cli() (which spawns a bitcoin-cli process) in a trivial one-line patch,
or by cherry-picking fa8fc5a23752c2a590b95f62833cf013a3d6febc, which was meant for different threads, but using the new authproxy for a single RPC call should also be fine and work around the issue for now.
If you want to keep the unconditional sleep, my preference would be to inline it here again, like it was at the beginning of this pull?
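The "new TCP connection per call" idea can be sketched with Python's standard `http.client`. This is an assumption-laden stand-in: it does not use `.cli()`, the cherry-picked authproxy commit, or the test framework's RPC proxy. The helper and the toy JSON-RPC server are hypothetical. Each call opens its own connection and closes it afterwards, so no state accumulated on a long-lived, reused socket carries over to the next call.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoRPCHandler(BaseHTTPRequestHandler):
    """Toy JSON-RPC endpoint standing in for the node (hypothetical)."""

    protocol_version = "HTTP/1.1"  # keep-alive capable, so reuse would be possible
    connections = []               # one entry per accepted TCP connection

    def handle(self):
        EchoRPCHandler.connections.append(self.client_address)
        super().handle()

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = json.loads(body)
        # Toy "result": length of the first raw tx hex, to keep the demo small.
        resp = json.dumps({"result": len(req["params"][0][0]), "error": None, "id": req["id"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(resp)))
        self.end_headers()
        self.wfile.write(resp)

    def log_message(self, *args):
        pass  # silence per-request logging


def rpc_fresh_connection(host, port, method, params, call_id, timeout=30):
    """Hypothetical helper: open a new TCP connection for a single RPC call."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        payload = json.dumps({"method": method, "params": params, "id": call_id})
        conn.request("POST", "/", body=payload, headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())["result"]
    finally:
        conn.close()  # never reused, so nothing carries over to the next call


def demo(n_calls=5, raw_tx_hex="ab" * 500):
    EchoRPCHandler.connections = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoRPCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        results = [rpc_fresh_connection(host, port, "testmempoolaccept", [[raw_tx_hex]], i)
                   for i in range(n_calls)]
    finally:
        server.shutdown()
        server.server_close()
    return results, len(EchoRPCHandler.connections)
```

The trade-off is one TCP handshake per call. For reference, the earlier no-reuse attempt in this PR coincided with a roughly 75-minute sanitizer CI run, though that run alone does not prove the handshakes were the cause.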
it would be a smaller temporary workaround to just quickly spin up a new tcp connection.
A new connection for every iteration of testmempoolaccept?
If you want to keep the unconditional sleep, my preference would be to inline it here again, like it was in the beginning of this pull?
I preferred that too, but then noticed the same failure in the p2p_opportunistic_1p1c test. So I thought it better to make the sleep explicit in the code whenever a LargeOrphan is sent over the wire, by putting it in the class itself (though the sleep only has an effect when many such large orphans are sent in a burst). Otherwise, it seemed easy for a new call site to miss adding the sleep.
in
test/functional/p2p_opportunistic_1p1c.py:444
in
56898d5d5a (outdated)
maflcko
commented at 3:01 pm on March 27, 2026:
member
lgtm (assuming ci passes)
Looks like it is on track to pass …
So I guess adding a sleep to work around a timeout bug is another data point showing that there is an underlying racy bug, which is only triggered by weird timing (and can be avoided by adding weird timing/sleeps).
rkrux
commented at 3:07 pm on March 27, 2026:
contributor
Looks like it is on track to pass …
Yeah, no failure yet. But I suspect the job’s 360-minute limit will be hit before all the ancestor commits are tested, so this job might get cancelled in the end. :(
DrahtBot added the label
CI failed
on Mar 28, 2026
test: throttle large orphan transactions while being sent in RPCs
Each of these large orphan transactions is around 780KB in size, and they are
sent sequentially, without any pause, via the `testmempoolaccept` RPC. In the
`p2p_orphan_handling` and `p2p_opportunistic_1p1c` tests, which send these
large orphan transactions sequentially 50-60 times, tcpdump output from the CI
(refer: https://github.com/bitcoin/bitcoin/issues/34731#issuecomment-4133098597)
shows that the HTTP server intermittently advertises a zero TCP window (`win 0`).
As a result, such a request is never read fully, so the server never processes
it and thus never sends a response. The test client rightfully times out after
30 seconds.
Interestingly, this intermittent issue has been observed only in the "test
ancestor commits" CI job, which recently started testing all the commits in the
PR (which is more robust) instead of only the last 6 commits as it did earlier.
For each commit in the PR, this job runs 16 tests in parallel on a machine where
nproc is 8. These two are the only tests that send such large orphans to the
same node instance 50-60 times, amounting to about 45MB sent in a burst. In this
job, I have never seen the issue on the first commit being tested, only on
subsequent ones.
This commit creates a LargeOrphanTransaction class that provides two
properties: one to get the large orphan transaction for internal operations,
and the other to send the transaction over the network, which by default sleeps
50ms before returning. This ensures that the test client doesn't bombard the
server with such large transactions without giving it a cool-down period for
its TCP receive buffer to drain and the window to reopen.
One of the earlier CI runs I tested this change on had 125 commits with a 300ms
delay between each such RPC, and this issue didn't occur:
https://github.com/bitcoin/bitcoin/actions/runs/23643332720/job/68869238262?pr=34847
All the `p2p_orphan_handling` tests finished in under 60sec each and all the
`p2p_opportunistic_1p1c` tests finished in under 70sec each. Adding the delay
does increase the overall latency of these two tests but might help in avoiding
the intermittent timeouts.
08722f0a97
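The class described in the commit message above might look roughly like the following Python sketch. It is hypothetical. `to_send` matches the attribute used in the reviewed diff, but `tx`, `send_delay_s` and `FakeTx` are illustrative names, not necessarily the PR's. The sleep sits inside the property, so every call site that sends the transaction gets the pause without having to remember it.

```python
import time


class LargeOrphanTransaction:
    """Hypothetical sketch of the wrapper described in the commit message.

    `to_send` matches the attribute used in the reviewed diff; `tx` and
    `send_delay_s` are illustrative names, not necessarily the PR's.
    """

    def __init__(self, tx, send_delay_s=0.05):
        self._tx = tx
        self._send_delay_s = send_delay_s

    @property
    def tx(self):
        # For internal use within the test (e.g. building related transactions): no delay.
        return self._tx

    @property
    def to_send(self):
        # About to go over the wire in an RPC: pause first so that a burst of
        # large payloads gives the server a moment to drain its receive buffer.
        time.sleep(self._send_delay_s)
        return self._tx


class FakeTx:
    """Stand-in for the test framework's transaction object (hypothetical)."""

    def serialize(self):
        return b"\x00" * 16


orphan = LargeOrphanTransaction(FakeTx())
start = time.monotonic()
raw_hex = orphan.to_send.serialize().hex()
elapsed = time.monotonic() - start
```

On cost: with the 50ms default, 60 sends add roughly 60 × 0.05 s = 3 s per test, while the 300ms delay used in the 125-commit CI run adds roughly 18 s. A property with a side effect is somewhat surprising to readers. An explicitly named method would make the pause more visible, at the price of the call-site convenience the class is meant to provide.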
test: remove redundant setmocktime RPC
setmocktime RPC is already called in the cleanup function that runs prior to
this last test; setting it again in the test case is not needed.
dc92bb81d9
test: empty commit 1 to test ancestor commits job fa01aee063
test: empty commit 2 to test ancestor commits job c6f23553f1
test: empty commit 3 to test ancestor commits job 59ddba1511
test: empty commit 4 to test ancestor commits job d6f701f7da
test: empty commit 5 to test ancestor commits job 3d6054d85e
test: empty commit 6 to test ancestor commits job 5eaaee41c5
test: empty commit 7 to test ancestor commits job 31bb9c9811
test: empty commit 8 to test ancestor commits job 2dc526d623
test: empty commit 9 to test ancestor commits job 84796fd6c8
test: empty commit 10 to test ancestor commits job a4ac780c2e
test: empty commit 11 to test ancestor commits job f1e794a6fa
test: empty commit 12 to test ancestor commits job 3ddc58d248
test: empty commit 13 to test ancestor commits job 7cb77e6329
test: empty commit 14 to test ancestor commits job 9dd07a4b56
test: empty commit 15 to test ancestor commits job 09308c9ec0
test: empty commit 16 to test ancestor commits job 1aaa0cb7ce
test: empty commit 17 to test ancestor commits job d31cfe7386
test: empty commit 18 to test ancestor commits job da81542424
test: empty commit 19 to test ancestor commits job 2a76b669cd
test: empty commit 20 to test ancestor commits job 439b5231a1
test: empty commit 21 to test ancestor commits job 86e623615c
test: empty commit 22 to test ancestor commits job 64c735cb1f
test: empty commit 23 to test ancestor commits job 42fa580776
test: empty commit 24 to test ancestor commits job edec3511b2
test: empty commit 25 to test ancestor commits job dbdd6265d9
test: empty commit 26 to test ancestor commits job c0fc48c53d
test: empty commit 27 to test ancestor commits job e6ca5d68e0
test: empty commit 28 to test ancestor commits job 679467ddff
test: empty commit 29 to test ancestor commits job c4b32e6339
test: empty commit 30 to test ancestor commits job aaf1bf0e7a
test: empty commit 31 to test ancestor commits job 97623fc8ed
test: empty commit 32 to test ancestor commits job 7d7b59cb32
test: empty commit 33 to test ancestor commits job 6924a071fe
test: empty commit 34 to test ancestor commits job 7bffb9ef95
test: empty commit 35 to test ancestor commits job 2eadab46d9
test: empty commit 36 to test ancestor commits job 83772e1c2d
test: empty commit 37 to test ancestor commits job 56efb0d097
test: empty commit 38 to test ancestor commits job 44ef7ee6db
test: empty commit 39 to test ancestor commits job 61f716538c
test: empty commit 40 to test ancestor commits job c051b05c4b
test: empty commit 41 to test ancestor commits job 311edbf9b9
test: empty commit 42 to test ancestor commits job de5d7e50a9
test: empty commit 43 to test ancestor commits job 1b273a0a40
test: empty commit 44 to test ancestor commits job 4bdf05c7b2
test: empty commit 45 to test ancestor commits job a9ee93b5b4
test: empty commit 46 to test ancestor commits job 32106beb33
test: empty commit 47 to test ancestor commits job 22a9a22733
test: empty commit 48 to test ancestor commits job 25157737ad
test: empty commit 49 to test ancestor commits job dcc2356d61
test: empty commit 50 to test ancestor commits job 68b3945b1f
test: empty commit 51 to test ancestor commits job 9ffa812a77
test: empty commit 52 to test ancestor commits job cd1e68beeb
test: empty commit 53 to test ancestor commits job 54c4fae21a
test: empty commit 54 to test ancestor commits job 173d84d02a
test: empty commit 55 to test ancestor commits job 248233eea5
test: empty commit 56 to test ancestor commits job 536b0cc915
test: empty commit 57 to test ancestor commits job 698996a727
test: empty commit 58 to test ancestor commits job 9236f40d30
test: empty commit 59 to test ancestor commits job f05a5aeef4
test: empty commit 60 to test ancestor commits job 5494f8162e
test: empty commit 61 to test ancestor commits job 8df131ffc9
test: empty commit 62 to test ancestor commits job 41c55364dd
test: empty commit 63 to test ancestor commits job 36170deffa
test: empty commit 64 to test ancestor commits job 7bc7584346
test: empty commit 65 to test ancestor commits job 0fffaf10ef
test: empty commit 66 to test ancestor commits job 825ceb2811
test: empty commit 67 to test ancestor commits job d4ae56fe95
test: empty commit 68 to test ancestor commits job e10f1ab805
test: empty commit 69 to test ancestor commits job 146ec131fb
test: empty commit 70 to test ancestor commits job 9ea1efb6f0
test: empty commit 71 to test ancestor commits job 42095e2757
test: empty commit 72 to test ancestor commits job d9b867a8bf
test: empty commit 73 to test ancestor commits job 84ada5eee0
test: empty commit 74 to test ancestor commits job a0046e50ee
test: empty commit 75 to test ancestor commits job 1aa85ab1b9
test: empty commit 76 to test ancestor commits job ec593cbf9d
test: empty commit 77 to test ancestor commits job 44b852d57a
test: empty commit 78 to test ancestor commits job 6b9b9716c9
test: empty commit 79 to test ancestor commits job bf1e1a3d05
test: empty commit 80 to test ancestor commits job 6e9bac15ab
test: empty commit 81 to test ancestor commits job 6a7e90135a
test: empty commit 82 to test ancestor commits job cf8c0463c6
test: empty commit 83 to test ancestor commits job e52735a2d2
test: empty commit 84 to test ancestor commits job 6c27cc3180
test: empty commit 85 to test ancestor commits job 838a5cac64
test: empty commit 86 to test ancestor commits job 4aac950c43
test: empty commit 87 to test ancestor commits job 3ad387abed
test: empty commit 88 to test ancestor commits job 4a4ce69482
test: empty commit 89 to test ancestor commits job 87b6bba88e
test: empty commit 90 to test ancestor commits job 671eee10db
test: empty commit 91 to test ancestor commits job 911cbaf55c
test: empty commit 92 to test ancestor commits job 6b737a5b54
test: empty commit 93 to test ancestor commits job 7082e4c530
test: empty commit 94 to test ancestor commits job 7222c6e97e
test: empty commit 95 to test ancestor commits job 63c2169d09
test: empty commit 96 to test ancestor commits job f4a25f7849
test: empty commit 97 to test ancestor commits job a9b1516e1b
test: empty commit 98 to test ancestor commits job bf208caf02
test: empty commit 99 to test ancestor commits job 4fd70acbfc
test: empty commit 100 to test ancestor commits job caece82a70
test: empty commit 101 to test ancestor commits job 7e3944ca77
test: empty commit 102 to test ancestor commits job 2e5d25301c
test: empty commit 103 to test ancestor commits job 49cf5bd4c8
test: empty commit 104 to test ancestor commits job 5d7e2bcc89
test: empty commit 105 to test ancestor commits job aba6730938
test: empty commit 106 to test ancestor commits job 024fd01bfd
test: empty commit 107 to test ancestor commits job 19f724c49a
test: empty commit 108 to test ancestor commits job eeaa9eccaf
test: empty commit 109 to test ancestor commits job b445328f27
test: empty commit 110 to test ancestor commits job 2e5eada2c4
test: empty commit 111 to test ancestor commits job 0b5039e466
test: empty commit 112 to test ancestor commits job 2921639f50
test: empty commit 113 to test ancestor commits job d81f39b3bf
test: empty commit 114 to test ancestor commits job e832dc3b06
test: empty commit 115 to test ancestor commits job 15aaee7e4b
test: empty commit 116 to test ancestor commits job ae2e87ea6c
test: empty commit 117 to test ancestor commits job 9a25b741db
test: empty commit 118 to test ancestor commits job 7791baecba
test: empty commit 119 to test ancestor commits job f27e4e6646
test: empty commit 120 to test ancestor commits job 6e59e1e504
test: empty commit 121 to test ancestor commits job ed1d49125e
test: empty commit 122 to test ancestor commits job 04712a96fb
test: empty commit 123 to test ancestor commits job fda946d6b8
rkrux force-pushed
on Mar 28, 2026
rkrux
commented at 10:54 am on March 28, 2026:
contributor
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2026-03-30 00:13 UTC
This site is hosted by @0xB10C. More mirrored repositories can be found on mirror.b10c.me