Fixes #20934 by using the "sync up" method described in #20538 (comment).
After improving robustness with this approach (commits 1-3), it turned out that there were still some fails, but those were unrelated to zmq: Out of 500 runs, 3 times sync_mempool() or sync_blocks() timed out, which can happen because the trickle relay time has no upper bound -- hence in rare cases, it takes longer than 60s. This is fixed by enabling immediate tx relay on node1 (commit 4), which as a nice side-effect also gives us a rough 2x speedup for the test.
For further details, also see the explanations in the commit messages.
There is no guarantee that the test is still not flaky, but it would help if potential reviewers would run the following script locally and report how many runs failed (feel free to do less than 1000 runs, as this takes quite a long if ran with --valgrind):
#!/bin/sh
OUTPUT_FILE=./zmq_results
echo ===== repeated zmq test ===== > $OUTPUT_FILE
for i in `seq 1000`; do
echo ------------------------
echo ----- test run $i -----
echo ------------------------
echo --- $i --- >> $OUTPUT_FILE
./test/functional/interface_zmq.py --valgrind
if [ $? -ne 0 ]; then
echo "FAILED. /o\\" >> $OUTPUT_FILE
else
echo "PASSED. \\o/" >> $OUTPUT_FILE
fi
done
echo Failed test runs:
grep FAILED $OUTPUT_FILE | wc -l