The feature_index_prune.py test seems to be racy because connections are reestablished after restarts and the blocks are synced via the sync_blocks function. The sync_blocks function has a sanity check at the beginning to check that all nodes in the set have at least one established connection and that is not always the case.
As a solution nodes are not connected via the -connect parameter on start but instead via the connect_nodes helper.
theStack
commented at 11:11 pm on May 12, 2022:
member
Concept ACK
Is there an explanation why this race is only (or at least, much more likely, it seems) happening on Windows? If someone knows how to reproduce the fixed failure behaviour on a Linux/BSD system, would be nice to know.
fjahr
commented at 11:29 pm on May 12, 2022:
member
Concept ACK
Is there an explanation why this race is only (or at least, much more likely, it seems) happening on Windows? If someone knows how to reproduce the fixed failure behaviour on a Linux/BSD system, would be nice to know.
Agree, it would be nice to be sure :) My assumption is that the Windows build is just slower than everything else and that’s why we have only seen it there so far. Looking at https://cirrus-ci.com/github/bitcoin/bitcoin it always seems the be the job that takes the longest time.
DrahtBot added the label
Tests
on May 13, 2022
MarcoFalke
commented at 5:27 am on May 13, 2022:
member
why should addconnection be less racy than connect?
Why not use connect_nodes which has the wait_until built into? #25031 (comment)
MarcoFalke
commented at 5:28 am on May 13, 2022:
member
It should be possible to reproduce by adding a sleep in the -connect thread?
fjahr
commented at 3:29 pm on May 14, 2022:
member
I was able to reproduce the issue as well with this, thanks @MarcoFalke !
why should addconnection be less racy than connect?
From my understanding of the code addconnection only returns after the connection has been established (or failed). It doesn’t seem to use threading and since it’s only available for testing I thought something like this should be it’s primary use-case. I confirmed it works with your hint mentioned above.
Why not use connect_nodes which has the wait_until built into? #25031 (comment)
Right, I can replace addconnection here with connect_nodes, I didn’t remember connect_nodes has the wait built into it. It looks a little less efficient (calling more RPCs under the hood) but saves us another line for the ip+port in the test, so I am indifferent between the two. I just didn’t like the one-liner in that link because it’s harder to read and would require a comment to explain that this is a clunky way to protect against a potential race condition.
fjahr force-pushed
on May 14, 2022
fjahr renamed this:
test: Use addconnection for robustness in index prune test
test: Fix race condition in index prune test
on May 14, 2022
test: Fix race condition in index pruning test
Nodes are restarted and reconnected as part of the test. Afterwards
`sync_blocks` is called immediately on the nodes. `sync_blocks`
first checks that all the included nodes have at least one
connection. Since adding a connection is usually happening in a
thread, sometimes nodes could run into this check before the
connection was fully established so that it would fail the entire
test.
This fix uses the `connect_nodes` helper to make the connection the
nodes. `connect_nodes` has a wait for the connection built into it.
4faa550072
fjahr force-pushed
on May 14, 2022
fjahr
commented at 3:38 pm on May 14, 2022:
member
Updated to use connect_nodes helper instead of addconnection RPC.
MarcoFalke merged this
on May 15, 2022
MarcoFalke closed this
on May 15, 2022
MarcoFalke
commented at 7:32 am on May 15, 2022:
member
If addconnection is somehow better than the stuff in connect_nodes, it could make sense to use addconnection in the connect_nodes impl?
fjahr
commented at 12:32 pm on May 15, 2022:
member
If addconnection is somehow better than the stuff in connect_nodes, it could make sense to use addconnection in the connect_nodes impl?
Yeah, I was thinking about that as well. I will give it a try and benchmark.
sidhujag referenced this in commit
d670f9782d
on May 28, 2022
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-12-22 15:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me