intermittent timeout in mptest unit test #33244

maflcko opened this issue on August 23, 2025
  1. maflcko commented at 9:48 am on August 23, 2025: member

    Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/48709103843

    LLM reason (✨ experimental): The CI failure is caused by a test timeout during the execution of the ‘mptest’ test.

    this failure looks real? The unit test should normally pass in a few milliseconds, so taking 40 minutes seems odd?

    https://cirrus-ci.com/task/5714850606743552?logs=ci#L2822: [23:07:33.279] 3/148 Test #3: mptest ............................... Passed 0.03 sec

    https://cirrus-ci.com/task/4911861373599744?logs=ci#L3101: [22:41:29.095] 148/148 Test #3: mptest ...............................***Timeout 2400.10 sec

    Originally posted by @maflcko in #33241 (comment)

  2. maflcko added the label CI failed on Aug 23, 2025
  3. fanquake added this to the milestone 30.0 on Aug 23, 2025
  4. ryanofsky commented at 6:09 pm on August 23, 2025: contributor

    As noted in #33241 (comment), I’m pretty sure this is caused by https://github.com/bitcoin-core/libmultiprocess/issues/189. It’s possible to reproduce the issue locally by just running mptest in a loop thousands of times until it locks up.

    https://github.com/bitcoin-core/libmultiprocess/issues/189 happens because the new “disconnecting and blocking” test introduced in https://github.com/bitcoin-core/libmultiprocess/issues/160 tests for problems with unclean disconnections that weren’t previously detected. The most common issues with unclean disconnections were fixed in https://github.com/bitcoin-core/libmultiprocess/issues/160. But two more issues with unclean disconnections that happened reliably in CI were fixed in https://github.com/bitcoin-core/libmultiprocess/pull/186, and one more unclean disconnect issue that happens more rarely and isn’t fixed yet is described in https://github.com/bitcoin-core/libmultiprocess/issues/189. The issue is debugged and I think it should not be hard to fix, but I wanted to hold off because the previous fixes had a bunch of manual testing and seemed to work well in practice, while this issue was more artificial, happening as a result of the way the test was written.
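
The local reproduction mentioned above (running mptest in a loop until it locks up) could be scripted roughly as follows. This is only a sketch and not something posted in the thread: the function name `run_until_hang`, the run count, and the 60-second per-run timeout are illustrative assumptions, and the path to the mptest binary has to be supplied on the command line because it depends on the local build directory.

```python
import subprocess
import sys


def run_until_hang(cmd, max_runs=10000, timeout_s=60.0):
    """Run `cmd` repeatedly and stop at the first run that hangs or fails.

    Returns ("timeout", n) or ("failed", n), where n is the 1-based run
    number, or None if all `max_runs` runs passed. The name and the default
    values are hypothetical choices for this sketch.
    """
    for n in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the child does not exit within timeout_s seconds.
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return ("timeout", n)
        if result.returncode != 0:
            return ("failed", n)
    return None


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # No default path: the location of mptest is build-specific.
        print("usage: loop_test.py <path-to-mptest> [args...]")
    else:
        outcome = run_until_hang(sys.argv[1:])
        print("no hang observed" if outcome is None else f"{outcome[0]} on run {outcome[1]}")
```

Since a passing run takes only a few milliseconds, a per-run timeout of a minute is far above normal and should only trigger on a genuine lockup.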

  5. Sjors commented at 8:08 am on September 1, 2025: member
    Now that the subtree was updated with #33241 and most cases are fixed, do we still want to fix the rarer https://github.com/bitcoin-core/libmultiprocess/issues/189 for the v30 milestone?
  6. maflcko commented at 8:15 am on September 1, 2025: member

    ctest doesn’t have a default timeout, so it would be a bit odd to expose users to a unit test run that never finishes, albeit rarely?

    this issue was more artificial, happening as a result of the way the test was written.

    This feature is experimental anyway, so maybe the unit test could be rewritten or removed temporarily for the 30.x release branch, if fixing it is too invasive for now?

  7. Sjors commented at 8:18 am on September 1, 2025: member
    Or we could add a timeout for this specific test. I don’t think we should remove it, because we want to catch unknown issues on platforms / circumstances that our CI doesn’t cover.
  8. maflcko commented at 8:36 am on September 1, 2025: member

    add a timeout

    Sure, but I’d say the timeout should be added in the C++ code, not in ctest, possibly with an error message explaining the known issue.

  9. ismaelsadeeq commented at 6:41 pm on September 1, 2025: member
  10. maflcko commented at 6:34 am on September 2, 2025: member
    At the same time, it seems to happen frequently in CI, so I prefer my initial suggestion to either rewrite the failing unit test or remove it temporarily.
  11. ryanofsky commented at 10:58 am on September 2, 2025: contributor

    re: #33244 (comment)

    At the same time, it seems to happen frequently in CI, so I prefer my initial suggestion to either rewrite the failing unit test or remove it temporarily.

    Sorry, I’ve been very distracted the past two weeks but I should be able to post a fix for this today.

    It’d also be completely reasonable to disable the test, though I’m not sure what a good way to disable it is because it’s in a subtree.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-02 12:13 UTC
