intermittent timeout in mptest unit test #33244

issue maflcko opened this issue on August 23, 2025
  1. maflcko commented at 9:48 am on August 23, 2025: member

    Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/48709103843

    LLM reason (✨ experimental): The CI failure is caused by a test timeout during the execution of the ‘mptest’ test.

    this failure looks real? The unit test should normally pass in a few milliseconds, so taking 40 minutes seems odd?

    https://cirrus-ci.com/task/5714850606743552?logs=ci#L2822: [23:07:33.279] 3/148 Test #3: mptest ............................... Passed 0.03 sec

    https://cirrus-ci.com/task/4911861373599744?logs=ci#L3101: [22:41:29.095] 148/148 Test #3: mptest ...............................***Timeout 2400.10 sec

    Originally posted by @maflcko in #33241 (comment)

  2. maflcko added the label CI failed on Aug 23, 2025
  3. fanquake added this to the milestone 30.0 on Aug 23, 2025
  4. ryanofsky commented at 6:09 pm on August 23, 2025: contributor

    As noted in #33241 (comment), I’m pretty sure this is caused by https://github.com/bitcoin-core/libmultiprocess/issues/189. It’s possible to reproduce the issue locally by just running mptest in a loop thousands of times until it locks up.

    https://github.com/bitcoin-core/libmultiprocess/issues/189 happens because the new “disconnecting and blocking” test introduced in https://github.com/bitcoin-core/libmultiprocess/issues/160 tests for problems with unclean disconnections that weren’t previously detected. The most common issues with unclean disconnections were fixed in https://github.com/bitcoin-core/libmultiprocess/issues/160. But two more issues with unclean disconnections that happened reliably in CI were fixed in https://github.com/bitcoin-core/libmultiprocess/pull/186, and one more unclean disconnect issue that happens more rarely and isn’t fixed yet is described in https://github.com/bitcoin-core/libmultiprocess/issues/189. The issue is debugged and I think it should not be hard to fix, but I wanted to hold off because the previous fixes had a bunch of manual testing and seemed to work well in practice, while this issue was more artificial, happening as a result of the way the test was written.
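
The local reproduction mentioned in this comment (running mptest in a loop until it locks up) could be automated along the following lines. This is a minimal, POSIX-only sketch and not code from the repository: `RunResult`, `RunOnce`, and `RepeatUntilHang` are hypothetical names, and the path to the mptest binary and the per-run time limit are assumptions to be chosen locally.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <chrono>
#include <string>
#include <thread>
#include <vector>

// Outcome of a single child-process run (hypothetical type for this sketch).
enum class RunResult { Finished, TimedOut, SpawnError };

// Runs `command` once as a child process and waits at most `limit` for it.
// A child that is still alive at the deadline is killed and reported as TimedOut.
// Note: a failed exec is reported as Finished (the child exits with code 127).
RunResult RunOnce(const std::vector<std::string>& command, std::chrono::milliseconds limit)
{
    if (command.empty()) return RunResult::SpawnError;

    // Build a null-terminated argv array for execvp before forking.
    std::vector<char*> argv;
    for (const std::string& arg : command) argv.push_back(const_cast<char*>(arg.c_str()));
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid < 0) return RunResult::SpawnError;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127); // only reached if exec failed
    }

    const auto deadline = std::chrono::steady_clock::now() + limit;
    int status = 0;
    for (;;) {
        const pid_t reaped = waitpid(pid, &status, WNOHANG);
        if (reaped == pid) return RunResult::Finished;
        if (reaped < 0) return RunResult::SpawnError;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);       // treat the run as hung
            waitpid(pid, &status, 0); // reap the killed child
            return RunResult::TimedOut;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

// Repeats the command up to `max_runs` times. Returns the zero-based index of
// the first run that timed out, or -1 if no run timed out.
int RepeatUntilHang(const std::vector<std::string>& command, int max_runs,
                    std::chrono::milliseconds limit)
{
    for (int i = 0; i < max_runs; ++i) {
        if (RunOnce(command, limit) == RunResult::TimedOut) return i;
    }
    return -1;
}
```

A caller would pass the local path of the mptest binary as the single element of `command`, together with a limit well above the normal runtime of a few milliseconds. Because the sketch kills the hung child, it only counts hangs; to inspect a hang, the kill would have to be replaced by leaving the process alive for a debugger.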

  5. Sjors commented at 8:08 am on September 1, 2025: member
    Now that the subtree was updated with #33241 and most cases are fixed, do we still want to fix the more rare https://github.com/bitcoin-core/libmultiprocess/issues/189 for the v30 milestone?
  6. maflcko commented at 8:15 am on September 1, 2025: member

    ctest doesn’t have a default timeout, so it would be a bit odd to expose users to a unit test run that never finishes, albeit rarely?

    this issue was more artificial, happening as a result of the way the test was written.

    This feature is experimental anyway, so maybe the unit test could be rewritten or removed temporarily for the 30.x release branch, if fixing it is too invasive for now?

  7. Sjors commented at 8:18 am on September 1, 2025: member
    Or we could add a timeout for this specific test. I don’t think we should remove it, because we want to catch unknown issues on platforms / circumstances that our CI doesn’t cover.
  8. maflcko commented at 8:36 am on September 1, 2025: member

    add a timeout

    Sure, but I’d say the timeout should be added in the C++ code, not in ctest, possibly with an error message explaining the known issue.
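
One possible reading of this suggestion (a timeout inside the C++ test rather than in ctest) is a small watchdog around the test body. The sketch below only illustrates that idea under stated assumptions; it is not the approach taken in libmultiprocess, and `RunWithTimeout` is a hypothetical helper name.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Runs `body` on a detached worker thread and waits at most `limit` for it.
// Returns true if the body finished in time and false if it did not.
// An exception thrown by a body that finished in time is rethrown to the caller.
bool RunWithTimeout(std::function<void()> body, std::chrono::milliseconds limit)
{
    std::packaged_task<void()> task(std::move(body));
    std::future<void> done = task.get_future();

    // The thread is detached so that a hung body cannot block the caller.
    // A future obtained from a packaged_task does not block in its destructor.
    std::thread(std::move(task)).detach();

    if (done.wait_for(limit) == std::future_status::ready) {
        done.get();
        return true;
    }
    return false; // the body is still running; the caller decides how to fail
}
```

A test using such a helper could, when it returns false, print a message pointing at the known issue (https://github.com/bitcoin-core/libmultiprocess/issues/189) and then abort, so that a hang becomes a prompt and explained failure. The worker thread is left running in that case, so this only makes sense when the process exits shortly afterwards.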

  9. ismaelsadeeq commented at 6:41 pm on September 1, 2025: member
  10. maflcko commented at 6:34 am on September 2, 2025: member
    At the same time, it seems to happen frequently in CI, so I prefer my initial suggestion to either rewrite the failing unit test or remove it temporarily.
  11. ryanofsky commented at 10:58 am on September 2, 2025: contributor

    re: #33244 (comment)

    At the same time, it seems to happen frequently in CI, so I prefer my initial suggestion to either rewrite the failing unit test or remove it temporarily.

    Sorry, I’ve been very distracted the past two weeks but I should be able to post a fix for this today.

    It’d also be completely reasonable to disable the test, though I’m not sure what a good way to disable it is because it’s in a subtree.

  12. ryanofsky referenced this in commit 2b05ed71af on Sep 2, 2025
  13. ryanofsky commented at 8:24 pm on September 2, 2025: contributor
    I think https://github.com/bitcoin-core/libmultiprocess/pull/201 should fix this. I haven’t tested it very much yet, but the bug hasn’t happened while running mptest in a loop for about an hour.
  14. ryanofsky referenced this in commit cd6d95aef8 on Sep 3, 2025
  15. ryanofsky referenced this in commit 9fe7f0c841 on Sep 5, 2025
  16. ryanofsky referenced this in commit 7a8a99c50f on Sep 5, 2025
  17. Sjors referenced this in commit 9fe5d03058 on Sep 10, 2025
  18. Sjors referenced this in commit 5b65dae936 on Sep 10, 2025
  19. ryanofsky referenced this in commit b6321bb99c on Sep 10, 2025
  20. ryanofsky referenced this in commit d86823eed7 on Sep 10, 2025
  21. ryanofsky referenced this in commit 4a269b21b8 on Sep 11, 2025
  22. ryanofsky referenced this in commit a0952c2d0f on Sep 17, 2025
  23. ryanofsky referenced this in commit 47d79db8a5 on Sep 17, 2025
  24. fanquake referenced this in commit edb871cba2 on Sep 19, 2025
  25. fanquake commented at 9:54 am on September 19, 2025: member
    Closing this, given #33412.
  26. fanquake closed this on Sep 19, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-26 15:13 UTC
