test: intermittent issue in p2p_1p1c

fanquake commented at 10:56 am on September 5, 2025: member

Timeout failure in the TSAN job here https://github.com/bitcoin/bitcoin/actions/runs/17489459842/job/49675712585?pr=33317:

 2025-09-05T10:51:37.2200865Z  node3 2025-09-05T10:51:30.541244Z [httpworker.7] [rpc/request.cpp:243] [void JSONRPCRequest::parse(const UniValue &)] [rpc] ThreadRPCServer method=getpeerinfo user=__cookie__
 2025-09-05T10:51:37.2201033Z  test  2025-09-05T10:51:31.542469Z TestFramework (ERROR): Unexpected exception
 2025-09-05T10:51:37.2201165Z                                    Traceback (most recent call last):
 2025-09-05T10:51:37.2201494Z                                      File "/home/admin/actions-runner/_work/_temp/test/functional/test_framework/test_framework.py", line 195, in main
 2025-09-05T10:51:37.2201593Z                                        self.run_test()
 2025-09-05T10:51:37.2201851Z                                      File "/home/admin/actions-runner/_work/_temp/build/test/functional/p2p_1p1c_network.py", line 155, in run_test
 2025-09-05T10:51:37.2201964Z                                        self.sync_mempools()
 2025-09-05T10:51:37.2202246Z                                      File "/home/admin/actions-runner/_work/_temp/test/functional/test_framework/test_framework.py", line 811, in sync_mempools
 2025-09-05T10:51:37.2202421Z                                        raise AssertionError("Mempool sync timed out after {}s:{}".format(
 2025-09-05T10:51:37.2202566Z                                    AssertionError: Mempool sync timed out after 2400s:
 2025-09-05T10:51:37.2216842Z                                      {'74c951d3e1bc27437394377a48d6ff7f11b49c1e0bd05e169d54c5138deea7f2', '77708ea74fbffa47e192e5be99b03a72e830f22a8b60d02e262ba8cfbc3e9b47', '233c3760167a81cd6ffccae81bcb650bdbd9c84b72cf87bf329940d2ac97b8ee', '390ec58b4c5b017f557e30ead39f9921b3fec5729b07ff89415c5570275a87dc', 'c8ca64cb3da89905b3ccfc5423d908a17273f90c4893373af9354e61f3bd651c',

fanquake added the label Tests on Sep 5, 2025

fanquake added the label CI failed on Sep 5, 2025

instagibbs commented at 5:31 pm on October 10, 2025: member

I think I’ve nailed down what’s happening from logs:

test peer sends child to node1
node1 has parent resolved normally, it’s above minfee
test peer is picked as unique peer to schedule resolution of child during test peer’s next “turn” in networking thread
node0 advertises child by wtxid (it’s ignored currently because it’s in orphanage and has no parents, test peer is already the unique reconsideration peer so that’s fairly immaterial?)
test peer disconnects before ProcessOrphanTx is called (and we never reconsider the orphan again because all parents are known!)
now any peer can advertise the child again but in this test only node0 can, and it already has, so the transaction never makes it

The test does the disconnect earlier but from socket closing to clearing of peer state, this race can occur.

Due to #31829, node1’s orphanage will actually empty out on disconnect, so this part of the test is not testing much anymore. It can be cleaned up, and I expect the test to work better afterwards (especially if you assert that the orphanage of each node is in the expected state).

Overall, I think this is a vote in favor of forcing ProcessOrphanTx on disconnect by peer, or having the reconsiderable slot reassigned to a new peer to disallow them from playing similar games. It seems to only affect “catch up” orphanage usage, not 1p1c, and it relies on some incredible timing to disrupt it successfully.

Some discussion of this DoS strategy here: #31829 (comment)

glozow commented at 7:52 pm on October 10, 2025: member

The test does the disconnect earlier but from socket closing to clearing of peer state, this race can occur.

If it’s a race, then this should go away if we make node3 the one that pre-receives orphans?

diff --git a/test/functional/p2p_1p1c_network.py b/test/functional/p2p_1p1c_network.py
index e4d3b738c19..5567b4e7002 100755
--- a/test/functional/p2p_1p1c_network.py
+++ b/test/functional/p2p_1p1c_network.py
@@ -109,14 +109,14 @@ class PackageRelayTest(BitcoinTestFramework):
         # Assemble return results
         packages_to_submit = [package_hex_1, package_hex_2, package_hex_3, package_hex_4]
         # node0: sender
-        # node1: pre-received the children (orphan)
-        # node3: pre-received the parents (too low fee)
+        # node2: pre-received the parents (too low fee)
+        # node3: pre-received the children (orphan)
         # All nodes receive parent_31 ahead of time.
         txns_to_send = [
             [],
-            [child_1, child_2, parent_31, child_3, child_4],
             [parent_31],
-            [parent_1, parent_2, parent_31, parent_4]
+            [parent_1, parent_2, parent_31, parent_4],
+            [child_1, child_2, parent_31, child_3, child_4]
         ]
 
         return packages_to_submit, txns_to_send

instagibbs commented at 12:37 pm on October 13, 2025: member

I don’t think the pre-received child peer(s) make sense anymore, as we intentionally zero these out of the orphanage when all the announcing peers disconnect.

If we want to keep this part just to exercise that state space, we could make sure that we wait until all the node’s orphan slots are empty via rpc before continuing with the ultimate submitpackage?
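A minimal sketch of that wait, assuming each node object exposes the `getorphantxs` RPC and that it returns an empty list once the orphanage has been wiped. The helper name `wait_for_empty_orphanage` and the `FakeNode` stand-in are hypothetical and exist only so the example runs on its own; inside the functional test one would more likely use the framework's own `wait_until`.

```python
import time


def wait_for_empty_orphanage(nodes, timeout=60.0, poll_interval=0.05):
    """Poll every node until its orphanage reports no entries.

    Hypothetical helper: `nodes` is any iterable of objects with a
    getorphantxs() method returning a list (assumed RPC behavior).
    """
    deadline = time.monotonic() + timeout
    for node in nodes:
        while node.getorphantxs():
            if time.monotonic() > deadline:
                raise AssertionError(
                    "orphanage not empty after {}s".format(timeout))
            time.sleep(poll_interval)


class FakeNode:
    """Stand-in for a test node: the orphanage empties after N polls."""

    def __init__(self, polls_until_empty):
        self.polls_left = polls_until_empty
        self.calls = 0

    def getorphantxs(self):
        self.calls += 1
        if self.polls_left > 0:
            self.polls_left -= 1
            return ["placeholder-txid"]
        return []
```

In the test this would run after the pre-receiving peers disconnect and before the final submitpackage, so that a leftover orphan cannot affect the result.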

fanquake commented at 4:53 pm on November 7, 2025: member

Seen again in https://github.com/bitcoin/bitcoin/actions/runs/19173508101/job/54812101221?pr=33810.

instagibbs commented at 4:59 pm on November 18, 2025: member

I’ve also discovered that create_package_2outs has regressed since we lowered minrelay to 0.1 sat/vbyte. parent_4 now pays for itself rather than needing CPFP.
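To illustrate how a lower floor can flip a parent from "needs CPFP" to "pays for itself", here is a small sketch. The fee and size are made-up numbers, not the values used by create_package_2outs, and the check is simplified to a single static feerate floor (the real test also has to deal with a dynamic mempool minimum). Feerates are expressed in sat/kvB so the comparison stays in integers.

```python
def needs_cpfp(fee_sats, vsize, min_feerate_sat_per_kvb):
    """Return True if the transaction's own feerate is below the floor.

    Compares fee/vsize against the floor without floating point by
    cross-multiplying: fee * 1000 < floor_per_kvb * vsize.
    """
    return fee_sats * 1000 < min_feerate_sat_per_kvb * vsize


# Hypothetical parent: 150 vB paying 20 sats (about 0.13 sat/vB).
PARENT_FEE_SATS = 20
PARENT_VSIZE = 150

below_old_floor = needs_cpfp(PARENT_FEE_SATS, PARENT_VSIZE, 1000)  # 1 sat/vB
below_new_floor = needs_cpfp(PARENT_FEE_SATS, PARENT_VSIZE, 100)   # 0.1 sat/vB
zero_fee_parent = needs_cpfp(0, PARENT_VSIZE, 100)
```

With these numbers the parent is below a 1 sat/vB floor but above a 0.1 sat/vB floor, while a 0-fee parent is below any positive floor, so it does not depend on where the default happens to sit.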

I think it’d be better if we do 0-fee parents to make the test less brittle. I have a branch that I can push shortly after cluster mempool and #33892 are merged: https://github.com/instagibbs/bitcoin/commits/2025-11-1p1c-test-timeout/

As in the branch, I think just making sure the orphanage is actually wiped before doing submitpackage would be a minimum viable improvement for now.

maflcko commented at 7:51 pm on December 16, 2025: member

https://github.com/bitcoin/bitcoin/actions/runs/20277892656/job/58232592949#step:11:2724

instagibbs commented at 8:04 pm on December 16, 2025: member

Still waiting on #33892, which would make this test a lot easier to reason about (no filling the mempool and dealing with dynamic limits).

fanquake closed this on Dec 31, 2025

fanquake referenced this in commit 891aed2f75 on Dec 31, 2025

test: intermittent issue in p2p_1p1c_network.py #33318