test: Extends wait_for_getheaders so a specific block hash can be checked #29736

sr-gi commented at 3:12 PM on March 26, 2024: member

Previously, wait_for_getheaders would check whether a node had received any getheaders message. This implied that, if a test needed to check for a specific block hash within a headers message, it had to make sure that it was checking the desired message. This normally involved having to manually clear last_message. This method, apart from being too verbose, was error-prone, given an undesired getheaders would make tests pass.

This adds the ability to check for a specific block_hash within the last getheaders message.

DrahtBot commented at 3:12 PM on March 26, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	BrandonOdiwuor, cbergqvist, stratospher, achow101
Concept ACK	kevkevinpal
Stale ACK	brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29748 (test: Makes wait_for_getdata delete data on checks, plus allows to check the getdata message type by sr-gi)
#29575 (net_processing: make any misbehavior trigger immediate discouragement by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Tests on Mar 26, 2024

sr-gi commented at 3:14 PM on March 26, 2024: member

I've added a default case for block_hash=None, for the cases in which we may care about a message being received (for flow control), but not really about the content of it. FullBlockTest::bootstrap_p2p in feature_block.py is a good example of this.

sr-gi force-pushed on Mar 26, 2024

DrahtBot added the label CI failed on Mar 26, 2024

DrahtBot commented at 3:17 PM on March 26, 2024: contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

<sub>Debug: https://github.com/bitcoin/bitcoin/runs/23110398265</sub>

sr-gi force-pushed on Mar 26, 2024

sr-gi commented at 3:52 PM on March 26, 2024: member

Something that may be worth discussing is whether we want to assert that the desired block hash is within the hashes reported on the getheaders message (current approach), or if we want to make sure it matches the first one (i.e. the closest to the tip).

For the patched tests, I've made sure that it was the case it matched the closest to the tip, but its parent would also have made the test pass.

sr-gi force-pushed on Mar 26, 2024

sr-gi commented at 4:27 PM on March 26, 2024: member

@maflcko @stratospher

DrahtBot removed the label CI failed on Mar 26, 2024

kevkevinpal commented at 1:13 AM on March 27, 2024: contributor

Concept ACK 3520d28

just an observation, It seems like we only use block_hash=None in feature_block.py

Something that may be worth discussing is whether we want to assert that the desired block hash is within the hashes reported on the getheaders message (current approach), or if we want to make sure it matches the first one (i.e. the closest to the tip).

If we're looking for the block_hash as the tip value is there a chance we can miss the correct block_hash being the tip causing the test to fail?

in test/functional/test_framework/p2p.py:647 in 3520d28a6e outdated

 643 | @@ -644,15 +644,17 @@ def test_function():
 644 |  
 645 |          self.wait_until(test_function, timeout=timeout)
 646 |  
 647 | -    def wait_for_getheaders(self, timeout=60):
 648 | -        """Waits for a getheaders message.
 649 | +    def wait_for_getheaders(self, block_hash=None, timeout=60):

maflcko commented at 8:25 AM on March 27, 2024:

    def wait_for_getheaders(self, block_hash=None, *, timeout=60):

unrelated style nit, while touching this: Could make timeout a keyword-only argument?

sr-gi commented at 9:29 AM on March 27, 2024:

Interesting, no other wait_for_* method has it like that. What's the rationale?

maflcko commented at 9:50 AM on March 27, 2024:

Generally, I find it preferable when code is self-documenting. Forcing named arguments ensures that arguments are named at the call site, especially when they are integral literals.

For example:

peer.wait_for_getheaders(tip, 99)

may be questionable as to what it means, and even brittle, if in the future the function takes another integral parameter. However,

peer.wait_for_getheaders(tip, timeout=99)

makes it clear that 99 means timeout, with or without the function taking other parameters of the same type (or types that can silently decay into int in python).

maflcko commented at 9:52 AM on March 27, 2024:

(Another reason is that the timeout should rarely be overwritten from the default value, so forcing the code to be longer is a way to force developers to re-think whether it is needed)

sr-gi commented at 10:07 AM on March 27, 2024:

Oh, I agree with you, my point is why do we want to do it for this one but not for any of the others wait_for_* methods?

maflcko commented at 10:14 AM on March 27, 2024:

If you want to do it for the others as well, it'd be great. Make sure to put it into a separate commit, to aid review.

sr-gi commented at 11:41 AM on March 27, 2024:

Do you think this should be part of the same PR, or should I open a separate one to cover all cases?

maflcko commented at 11:58 AM on March 27, 2024:

Up to you :)

sr-gi commented at 12:42 PM on March 27, 2024:

Tackled in #29750

maflcko approved

maflcko commented at 8:28 AM on March 27, 2024: member

lgtm

Partially fixes #18614

I think this fully solves the remainder, or is something left to be done afterwards?

in test/functional/mining_basic.py:311 in 3520d28a6e outdated

 307 | @@ -308,7 +308,7 @@ def chain_tip(b_hash, *, status='headers-only', branchlen=1):
 308 |  
 309 |          # Should ask for the block from a p2p node, if they announce the header as well:
 310 |          peer = node.add_p2p_connection(P2PDataStore())
 311 | -        peer.wait_for_getheaders(timeout=5)  # Drop the first getheaders
 312 | +        peer.wait_for_getheaders(timeout=5, block_hash=block.hashPrevBlock)  # Drop the first getheaders

maflcko commented at 8:31 AM on March 27, 2024:

unrelated question: What does "drop" mean in the comment? I think nothing is dropped?

sr-gi commented at 9:31 AM on March 27, 2024:

I think that may be an old comment, given there is no dropping (there nor later by manually accessing last_message). I'll get rid of it.

maflcko commented at 9:54 AM on March 27, 2024:

According to fa091b001605c4481fb4eca415929a98d3478549, it was written by me :man_facepalming:

sr-gi commented at 9:23 AM on March 27, 2024: member

lgtm

Partially fixes #18614

I think this fully solves the remainder, or is something left to be done afterwards?

You're right, I thought there may be other methods that also needed patching (and didn't check that was not the case 😅 ).

I'm working on a cleanup PR for things that were left out after #18690, but that should be unrelated

brunoerg commented at 9:24 AM on March 27, 2024: contributor

Concept ACK

sr-gi commented at 9:39 AM on March 27, 2024: member

Concept ACK 3520d28

just an observation, It seems like we only use block_hash=None in feature_block.py

Yeah, that's done on purpose. The getheaders received in feature_block.py is part of the reconnection. The test is not really expecting that a getheaders with a specific block hash is received, but that the message is received as part of the connection flow. Trying to make it assert an specific block hash would make the test more verbose with no apparent benefits

Something that may be worth discussing is whether we want to assert that the desired block hash is within the hashes reported on the getheaders message (current approach), or if we want to make sure it matches the first one (i.e. the closest to the tip).

If we're looking for the block_hash as the tip value is there a chance we can miss the correct block_hash being the tip causing the test to fail?

That could imply that the test is wrong. The current approach is more permissive, but it means that a test could pass unexpectedly. e.g; Imagine our current chain looks like this:

Genesis - B1 - ... - BN - BN+1 - BN+2 (tip)

We wait to receive a getheaders containing BN, and we receive:

[BN+1, BN, ..., Genesis]

In this case, the test will pass because our expected hash is within the received hashes, but likely the test should have waited for BN+1.

It really depends on what we want to check, and the flexibility we want to give to the method though

sr-gi force-pushed on Mar 27, 2024

in test/functional/test_framework/p2p.py:657 in 247c5a54c0 outdated

 659 | +            last_getheaders = self.last_message.get("getheaders")
 660 | +            if block_hash is None:
 661 | +                 return last_getheaders
 662 | +            if last_getheaders is None:
 663 | +                return False
 664 | +            return block_hash in [hash for hash in last_getheaders.locator.vHave]

brunoerg commented at 12:21 PM on March 27, 2024:

In 247c5a54c028bde21d010299cd447397b25c31ba: vHave is a list so we could do:

            return block_hash in last_getheaders.locator.vHave

sr-gi commented at 12:41 PM on March 27, 2024:

Covered in fb9eb7a

sr-gi force-pushed on Mar 27, 2024

brunoerg approved

brunoerg commented at 12:47 PM on March 27, 2024: contributor

crACK fb9eb7a19d87e1a896ebfb16a09c36cdf3816228

kevkevinpal commented at 12:48 PM on March 27, 2024: contributor

I just tried with return block_hash == last_getheaders.locator.vHave[0] and the tests seemed to have run fine for me. So I have no problem just checking the last block_hash

sr-gi commented at 2:38 PM on March 27, 2024: member

I just tried with return block_hash == last_getheaders.locator.vHave[0] and the tests seemed to have run fine for me. So I have no problem just checking the last block_hash

Yeah, for all patched tests I"ve made sure that was the case. The question was more if generally that would be the expected case

in test/functional/p2p_sendheaders.py:536 in fb9eb7a19d outdated

 534 | @@ -533,15 +535,14 @@ def test_nonnull_locators(self, test_node, inv_node):
 535 |                  block_time += 1
 536 |                  height += 1
 537 |              # Send the header of the second block -> this won't connect.
 538 | -            with p2p_lock:

stratospher commented at 4:58 AM on March 30, 2024:

if we remove this, how can we ensure that we really received a new getheaders message and that it isn't the same getheaders message from a previous iteration?

sr-gi commented at 11:13 AM on April 1, 2024:

This should be safe now after 80ec483

stratospher commented at 5:09 AM on March 30, 2024: contributor

yay, nice! just 1 question, will ACK after that.

in test/functional/p2p_sendheaders.py:338 in fb9eb7a19d outdated

 334 | @@ -334,7 +335,7 @@ def test_nonnull_locators(self, test_node, inv_node):
 335 |                  if j == 0:
 336 |                      # Announce via inv
 337 |                      test_node.send_block_inv(tip)
 338 | -                    test_node.wait_for_getheaders()
 339 | +                    test_node.wait_for_getheaders(block_hash=expected_hash)

stratospher commented at 6:26 AM on March 30, 2024:

oh this is no longer true after 05f7f31. we will only receive a getheaders the first time because of m_inv_triggered_getheaders_before_sync. found this issue when i ran the tests by clearing old getheaders as mentioned in #29736 (review).

sr-gi commented at 10:17 AM on April 1, 2024:

Agreed, it looks like it was already the case that we were asserting the same getheaders message given it was never popped in this test.

I guess we can check it now as:

if i == 0:
    test.node.wait_for_getheaders(block_hash=expected_hash)
else:
    test_node.last_message.get("getheaders") is None

in test/functional/test_framework/p2p.py:652 in fb9eb7a19d outdated

 654 | -        immediately with success. TODO: change this method to take a hash value and only
 655 | -        return true if the correct block header has been requested."""
 656 | +        If no block hash is provided, checks whether any getheaders message has been received by the node."""
 657 |          def test_function():
 658 | -            return self.last_message.get("getheaders")
 659 | +            last_getheaders = self.last_message.get("getheaders")

stratospher commented at 7:19 AM on March 30, 2024:

maybe this for #29736 (review)?

            last_getheaders = self.last_message.pop("getheaders", None)

sr-gi commented at 9:49 AM on March 31, 2024:

That manually check that last_message has a certain message after waiting fail.

What if we pass a flag (remove=False) and, if set, the element is poped, otherwise only checked?

sr-gi commented at 9:49 AM on March 31, 2024:

That manually check that last_message has a certain message after waiting fail.

What if we pass a flag (remove=False) and, if set, the element is poped, otherwise only checked?

stratospher commented at 5:49 AM on April 1, 2024:

do you have any advantage/test scenario in mind where we only need to check and not pop?

i couldn't think of a situation where we still require the contents of getheaders message. possibilities:

counting bytes of message received (as in rpc_net.py) - but we don't need functions like wait_for_getheaders there.
even on_getheeaders for processing the message would happen before wait_for_getheaders

ideally we need to pop before and not after calling wait_for_getheaders function. don't think that can be done though.

(also gentle ping @theStack in case you have thoughts about this approach.)

sr-gi commented at 11:07 AM on April 1, 2024:

I don't really. I guess it depends on the expectations of what wait_for_getheaders is doing.

I don't think any current test fails if we change the behavior, so as long as we stick with that assumption it should be ok

stratospher commented at 7:25 AM on March 30, 2024: contributor

Something that may be worth discussing is whether we want to assert that the desired block hash is within the hashes reported on the getheaders message (current approach), or if we want to make sure it matches the first one (i.e. the closest to the tip).

For the patched tests, I've made sure that it was the case it matched the closest to the tip, but its parent would also have made the test pass.

i prefer the stricter option since i'd want tests written in the future to be aware of those unexpected passes. if we don't care about the exact content of header, there is the block_hash=None variant anyways.

stratospher commented at 6:03 AM on April 1, 2024: contributor

looks like this wait_until got forgotten. maybe update that to use wait_for_getheaders here too?

sr-gi force-pushed on Apr 1, 2024

DrahtBot added the label CI failed on Apr 1, 2024

DrahtBot commented at 11:11 AM on April 1, 2024: contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

<sub>Debug: https://github.com/bitcoin/bitcoin/runs/23298959570</sub>

sr-gi commented at 11:12 AM on April 1, 2024: member

Thanks for the thorough review @stratospher

looks like this wait_until got forgotten. maybe update that to use wait_for_getheaders here too?

I've done it in 80ec483. Given we are reusing the same test node throughout the whole file and the state of what block may be expected is not clean, I waited for the message to be seen, but without any expected block check (this should be safe now that we pop the content of the dict).

Also:

I replaced get by pop so we don't need to manually pop to make sure we are not checking an old message
I have strengthened the membership check so the expected block needs to match the tip of vHave instead of just being part of it

DrahtBot removed the label CI failed on Apr 1, 2024

in test/functional/p2p_compactblocks.py:390 in 80ec483a3b outdated

 386 | @@ -387,7 +387,7 @@ def test_compactblock_requests(self, test_node):
 387 |  
 388 |              if announce == "inv":
 389 |                  test_node.send_message(msg_inv([CInv(MSG_BLOCK, block.sha256)]))
 390 | -                test_node.wait_until(lambda: "getheaders" in test_node.last_message, timeout=30)
 391 | +                test_node.wait_for_getheaders()

stratospher commented at 4:15 PM on April 1, 2024:

                test_node.wait_for_getheaders(timeout=30)

sr-gi commented at 7:09 AM on April 2, 2024:

I didn't really see the point of the shorter timeout here, but fair, I guess it's better to leave it as is if this is simply a refactor

in test/functional/p2p_initial_headers_sync.py:83 in 80ec483a3b outdated

  75 | @@ -80,7 +76,6 @@ def run_test(self):
  76 |                  if "getheaders" in p.last_message:
  77 |                      count += 1
  78 |                      peer_receiving_getheaders = p
  79 | -                    p.last_message.pop("getheaders", None)

stratospher commented at 4:24 PM on April 1, 2024:

don't think we can remove this since we need to clear initial getheaders in peer2, peer3 before using wait_for_getheaders the first time so that it detects new getheaders.

sr-gi commented at 7:14 AM on April 2, 2024:

We are asserting this is clean between connecting and announcing the block, so I think this should be safe: https://github.com/bitcoin/bitcoin/blob/80ec483a3b0dfb9e43c06b099eb272813739bd54/test/functional/p2p_initial_headers_sync.py#L60-L62

stratospher commented at 2:05 PM on April 9, 2024:

inserting an assert "getheaders" not in p.last_message fails here because it received a new header after those checks. and the block hash doesn't change here.

sr-gi commented at 10:27 AM on April 10, 2024:

Sure, because "getheaders" is in last_message for that peer, but only for that one as the assert after the loop is checking.

The later wait_for_getheaders (the one on expected_peer) is checked on the other node only. I guess that is not too clear based on how expected_peer is initialized. We could just set expected_peer as follows to make it clearer:

for p in [peer2, peer3]:
    with p2p_lock:
        if "getheaders" in p.last_message:
            count += 1
            p.send_message(msg_headers()) # Send empty response, see above
        else:
            expected_peer = p

stratospher commented at 2:45 PM on April 23, 2024:

oh makes sense! wait_for_getheaders is done on the peer which we made sure didn’t receive getheaders before. i'm fine with leaving it as it is.

in test/functional/p2p_segwit.py:194 in 80ec483a3b outdated

 190 | @@ -191,14 +191,13 @@ def announce_tx_and_wait_for_getdata(self, tx, success=True, use_wtxid=False):
 191 |      def announce_block_and_wait_for_getdata(self, block, use_header, timeout=60):
 192 |          with p2p_lock:
 193 |              self.last_message.pop("getdata", None)
 194 | -            self.last_message.pop("getheaders", None)

stratospher commented at 4:32 PM on April 1, 2024:

this can't be removed either. (same reason as above)

sr-gi commented at 7:26 AM on April 2, 2024:

I don't think this is the same case as above:

Here we are using wait_for_getheaders expecting a specific block hash. Therefore, this should not be the case as long as a specific block is not announced manually and then re-announced using this method, which is not what the test is doing AFAICT

stratospher commented at 2:05 PM on April 9, 2024:

oh good point.

fanquake referenced this in commit 1d8a5f0d9b on Apr 2, 2024

DrahtBot added the label Needs rebase on Apr 2, 2024

sr-gi force-pushed on Apr 2, 2024

sr-gi commented at 10:42 AM on April 2, 2024: member

Rebased to include the changes from #29750 and cover #29736 (review)

DrahtBot removed the label Needs rebase on Apr 2, 2024

sr-gi force-pushed on Apr 2, 2024

DrahtBot added the label CI failed on Apr 2, 2024

sr-gi force-pushed on Apr 3, 2024

test: Extends wait_for_getheaders so a specific block hash can be checked

Previously, `wait_for_getheaders` would check whether a node had received **any**
getheaders message. This implied that, if a test needed to check for a specific block
hash within a headers message, it had to make sure that it was checking the desired message.
This normally involved having to manually clear `last_message`. This method, apart from being
too verbose, was error prone, given an undesired `getheaders` would make tests pass.

This adds the ability to check for a specific block_hash within the last `getheaders` message.

c4f857cc30

sr-gi force-pushed on Apr 4, 2024

DrahtBot removed the label CI failed on Apr 4, 2024

sr-gi commented at 2:12 PM on April 10, 2024: member

I wonder if it may be worth splitting the wait_for_getheaders function so the internal function can also be used externally, so we can use that to check for whether a certain message exists without waiting for them (e.g. checking for the negative case) in a similar fashion as done in bc844fd8a7d130bfb7cf598f5b1acd87acd70e58

This way we would get rid of a bunch of manual checks like:

with p2p_lock:
    assert "getheaders" not in peer.last_message

cc/ @maflcko @stratospher

BrandonOdiwuor approved

BrandonOdiwuor commented at 2:12 PM on April 10, 2024: contributor

crACK c4f857cc301d856f3c60acbe6271d3fe19441c7a

DrahtBot requested review from brunoerg on Apr 10, 2024

DrahtBot requested review from stratospher on Apr 10, 2024

cbergqvist approved

cbergqvist commented at 2:10 PM on April 21, 2024: contributor

ACK c4f857cc301d856f3c60acbe6271d3fe19441c7a

Nice to see the TODO in p2p.py resolved!

Searched through source for "getheaders" to make sure the function had been applied when suitable.

Reasoned through the modified test functions.

Modifications look good taking into account the new behavior of wait_for_getheaders() popping off the message and requiring the block hash to match the top one if provided.

Built and ran functional tests with --extended --exclude=feature_dbcrash with no failures (after cherry-picking 49c0b8b2288e60ae22fcac5d03811cf36ecec058 from #29791 on top of this PR to bump timeouts in 2 tests not touched in this PR).

Nit: Commit message subject is 73 chars long

CONTRIBUTING.md links https://chris.beams.io/posts/git-commit/ which states "consider 72 the hard limit". It further suggests using an "Imperative mood", so you could just change "Extends" -> "Extend" and be done with it.

stratospher commented at 2:46 PM on April 23, 2024: contributor

tested ACK c4f857c. went through all getheaders messages sent in the tests and checked that it's the one we want.

achow101 commented at 5:19 PM on April 25, 2024: member

ACK c4f857cc301d856f3c60acbe6271d3fe19441c7a

achow101 merged this on Apr 25, 2024

achow101 closed this on Apr 25, 2024

sr-gi commented at 6:28 PM on April 25, 2024: member

I wonder if it may be worth splitting the wait_for_getheaders function so the internal function can also be used externally, so we can use that to check for whether a certain message exists without waiting for them (e.g. checking for the negative case) in a similar fashion as done in bc844fd

This way we would get rid of a bunch of manual checks like:
with p2p_lock:
    assert "getheaders" not in peer.last_message
cc/ @maflcko @stratospher

This was never discussed btw, and I wonder if it is worth a followup

bitcoin locked this on Apr 25, 2025