Fix zmq test flakiness #20934

issue MarcoFalke opened this issue on January 14, 2021
  1. MarcoFalke commented at 3:26 pm on January 14, 2021: member

    There are many reports of the test being flaky: #20672 (comment)

    Thus, it should be made more robust, as described in #20538 (comment)

    Useful skills:

    • Background in our functional test suite (python3)
    • Background in zmq

    Want to work on this issue?

    For guidance on contributing, please read CONTRIBUTING.md before opening your pull request.

  2. MarcoFalke added the label Tests on Jan 14, 2021
  3. MarcoFalke added the label good first issue on Jan 14, 2021
  4. adamjonas commented at 8:15 pm on January 14, 2021: member
    Of the last 571 failures, 22 are from the interface_zmq.py functional test (3.8%). According to the numbers, it’s the flakiest functional test we have. @domob1812 @theStack @mruddy @n-thumann are any of you willing to give this a shot?
  5. theStack commented at 5:49 pm on January 17, 2021: member

    I took some time to look at the problem; it seems to be quite tricky to solve in a solid way. I tried the suggested method of “syncing up” by repeatedly generating a block and waiting for the expected message (until it no longer times out), but generating a block seems to interfere with some of the sub-tests. It also already generates notification messages for our subscribers that are received later (even if we are not connected yet). Maybe something like this would work:

    • restart node with additional pubhashtx test publisher (on a port not used by any of the test subs)
    • repeatedly generate block and wait for expected messages from test publisher, until it doesn’t time out anymore
    • invalidate generated blocks
    • clear mempool (needed?)
    • read from our subscriber sockets until there is no data left (a “reverse flush”, so to speak)

    Maybe I’m overcomplicating this, though. Whatever the solution turns out to be, having a common test setup method should at least serve as a better basis for solving this issue: #20953
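
A minimal sketch of the "sync up, then reverse flush" idea from the list above. It is not the code from the test suite: the helper names (`make_receiver`, `sync_until_received`, `drain`) are hypothetical, and the trigger (for example, generating a block) is passed in as a plain callable so the sketch stays independent of any particular RPC or socket library. With pyzmq, one would presumably set a receive timeout on the socket and pass `zmq.Again` as `timeout_exc`.

```python
def make_receiver(socket, timeout_exc):
    """Wrap a socket-like object so that a receive timeout yields None.

    `socket` only needs a recv_multipart() method; `timeout_exc` is the
    exception type it raises when no message arrives in time.
    """
    def receive():
        try:
            return socket.recv_multipart()
        except timeout_exc:
            return None
    return receive


def sync_until_received(trigger, receive, max_attempts=20):
    """Call trigger() and then receive() until a message arrives.

    Returns the number of attempts that were needed. Raises TimeoutError
    if no message arrived after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        trigger()          # hypothetical: e.g. generate one block
        if receive() is not None:
            return attempt
    raise TimeoutError(f"no notification after {max_attempts} attempts")


def drain(receive):
    """Read and discard pending messages until receive() times out.

    Returns the number of discarded messages (the "reverse flush").
    """
    discarded = 0
    while receive() is not None:
        discarded += 1
    return discarded
```

Once `sync_until_received` returns, the subscriber is known to be connected, and `drain` can be run on each of the test's subscriber sockets to discard notifications produced during the sync-up. Invalidating the generated blocks and clearing the mempool are left out here, since they depend on the node's RPC interface.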

  6. instagibbs commented at 2:58 am on January 18, 2021: member

    but generating a block seems to interfere with some of the sub-tests

    Yes it would require making all the subtests more robust I think.

    alternative setup

    Seems pretty complicated, and with intentional block rollbacks things can get weird.

  7. fanquake referenced this in commit 3734adba39 on Jan 21, 2021
  8. sidhujag referenced this in commit 4dceb42b8b on Jan 21, 2021
  9. MarcoFalke commented at 8:06 am on January 22, 2021: member
    Could a mempool tx be used to sync up instead of a block?
  10. practicalswift commented at 11:29 am on January 26, 2021: contributor

    What about temporarily disabling interface_zmq.py in CI until this is fixed?

    It seems to me that interface_zmq.py as it is currently working is a net negative from a CI testing perspective due to its extreme flakiness :)

  11. instagibbs commented at 11:53 am on January 26, 2021: member

    How often is it failing?

  12. MarcoFalke commented at 11:57 am on January 26, 2021: member

    Of the last 571 failures, 22 are from the interface_zmq.py functional test (3.8%). According to the numbers, it’s the flakiest functional test we have.

    (quote from @adamjonas )

  13. MarcoFalke closed this on Feb 16, 2021

  14. sidhujag referenced this in commit 31ef542332 on Feb 16, 2021
  15. adamjonas reopened this on Mar 1, 2021

  16. adamjonas commented at 4:03 pm on March 1, 2021: member

    interface_zmq.py flakiness is back and I think #21008 is hurting more than helping.

    Before the merge of #21008 on 2/16 (Feb 12-15): failed 1 time on 1 PR (1,274 builds)

    Same Friday-to-Monday time period after the merge (Feb 19-22): failed 11 times across 9 different PRs (1,470 total builds)
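
For comparison, the two periods above can be turned into per-build failure rates. This is only a worked calculation from the numbers quoted in this comment; the function name `failure_rate` is made up for illustration.

```python
def failure_rate(failures, builds):
    """Fraction of builds in which the test failed."""
    return failures / builds


before = failure_rate(1, 1274)   # Feb 12-15, before the merge
after = failure_rate(11, 1470)   # Feb 19-22, after the merge

print(f"before: {before:.3%}")            # before: 0.078%
print(f"after:  {after:.3%}")             # after:  0.748%
print(f"ratio:  {after / before:.1f}x")   # ratio:  9.5x
```

With such small failure counts the ratio is noisy, but the per-build rate is roughly an order of magnitude higher after the merge.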

  17. MarcoFalke closed this on Mar 2, 2021

  18. MarcoFalke commented at 10:31 am on March 2, 2021: member
    Fixed in #21216 ?
  19. adamjonas commented at 10:59 pm on March 2, 2021: member
    ref #21310
  20. Fabcien referenced this in commit 30b874af38 on Nov 30, 2021
  21. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-04 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me