This is a follow-up PR for #21008, fixes #21216.
In the course of investigating the problem with jnewbery (analyzing the Cirrus log https://cirrus-ci.com/task/4660108304056320), it turned out that the “sync up” procedure of repeatedly generating a block and waiting for a notification with timeout is too brittle in its current form, as the following scenario could happen:
- generate block A
- receive notification, timeout happens => repeat procedure
- generate block B
- node publishes block A notification
- receive notification, we receive the one caused by block A (!!!) => sync-up procedure is completed
- node publishes block B notification
- the actual test starts
- on the first notification reception, the one caused by block B is received, rather than the one actually caused by test code => assertion failure
This change in the PR ensures that after each test block generation, we wait for the notification that is actually caused by that block and ignore others from possibly earlier blocks. The matching is kind of ugly, it assumes that one out of four components in the block is contained in the notification: the block hash, the tx id, the raw block data or the raw transaction data. (Unfortunately we have to support all publisher topics.)
I’m aware that this is quite a lot of code now only for establishing a robust test setup. OTOH I wouldn’t know of a better method right now, suggestions are very welcome.
Note for potential reviewers: for both reproducing the issue on master branch and verifying on PR branch, one can simply generate two blocks in the sync-up procedure rather than one.