ci: add delay between commits while testing all ancestor commits #34943

pull rkrux wants to merge 1 commits into bitcoin:master from rkrux:ancestor-commits changing 1 files +4 −0
  1. rkrux commented at 11:12 am on March 28, 2026: contributor

    Addresses #34731

    It was earlier suspected that there is an intermittent deadlock in the server but following are the reasons why I didn’t believe it is a deadlock issue based on the debug logs and debugger thread backtrace output shared in the issue:

    • The HTTP worker threads (b-http_pool_x) are waiting on the condition variable and not on the mutex that signals that these threads are idle & waiting for work to be assigned to them.
    • The HTTP thread (b-http) is epoll waiting that means it is waiting for a request (or a part of it) to be received.
    • The added logs show that the first few testmempool RPCs were successful and the next one timed out. But the logs don’t show a request for it being logged unlike in the previous ones, hinting that the server never received such a request (or in full) and thus never processed it. Even then the functional test client timed out, which means that it did send it (at least a part of it).

    The large orphan transactions being sent via p2p_orphan_handling.py and p2p_opportunistic_1p1c.py tests are each 780KB in size that are sent sequentially. The test tries to send 60 of them in a loop amounting to 46MB of data over a single HTTP connection that is reused. Later, tcpdump was collected via this PR #34726 (credit maflcko) that showed the HTTP server signalled TCP zero window win 0 intermittently that led to such large requests never being read fully, so the server never processed them and thus never sent a response - the test client rightfully timed out after 30 seconds.

    Interestingly, this intermittent issue has been observed only in the “test ancestor commits” CI job that recently started testing all the commits in the PR, which is more robust, as opposed to testing only the last 6 commits like it used to do earlier. For each commit in the PR, this job runs 16 tests in parallel where the CPU nproc is 8. These two are the only tests that send such large orphans to the server in the same instance 50-60 times amounting to 45MB being sent in a burst. I’ve noticed this issue in this job never in the first commit being tested but instead only in the subsequent ones.

    Adding a 3s delay before testing each commit in the “test ancestor commits job” seems to get rid of the issue, the passing CI job is here that tests 125 commits.

    Note: The previous approach I tried in PR #34847 added a 50ms delay before sending such large orphan transaction requests but I prefer this solution over it because this change is restricted in this CI job only that fails.

  2. DrahtBot added the label Tests on Mar 28, 2026
  3. DrahtBot commented at 11:12 am on March 28, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34847 (test: throttle large orphan transactions while being sent in RPCs by rkrux)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label CI failed on Mar 28, 2026
  5. rkrux commented at 1:28 pm on March 28, 2026: contributor
  6. rkrux force-pushed on Mar 28, 2026
  7. rkrux force-pushed on Mar 29, 2026
  8. rkrux commented at 4:13 am on March 29, 2026: contributor
    With -j=nproc//2 the error didn’t occur but the tests were slow by factor of more than 3.5 that led to the job being cancelled at the end because all the commits could not be tested (only 45 could). So, this is not a viable solution.
  9. DrahtBot removed the label CI failed on Mar 29, 2026
  10. rkrux commented at 9:23 am on March 29, 2026: contributor

    So adding a 3s delay before testing each commit in the “test ancestor commits job” seems to fix the issue: https://github.com/bitcoin/bitcoin/actions/runs/23701103981/job/69044839367?pr=34943

    I suppose adding a delay before sending large orphan transactions over the wire doesn’t seem necessary as done in PR #34847.

  11. rkrux renamed this:
    ci: ancestor commits [draft, ignore]
    ci: add delay between commit while testing all ancestor commits
    on Mar 29, 2026
  12. rkrux renamed this:
    ci: add delay between commit while testing all ancestor commits
    ci: add delay between commits while testing all ancestor commits
    on Mar 29, 2026
  13. rkrux force-pushed on Mar 29, 2026
  14. rkrux marked this as ready for review on Mar 29, 2026
  15. in .github/ci-test-each-commit-exec.py:23 in 103036adc7 outdated
    19@@ -19,6 +20,7 @@ def run(cmd, **kwargs):
    20 
    21 def main():
    22     print("Running tests on commit ...")
    23+    time.sleep(3)
    


    maflcko commented at 6:13 am on March 30, 2026:
    I don’t like hard sleeps, especially if they come without any reason. This should add a comment to say it is a temporary workaround for something.

    rkrux commented at 6:31 am on March 30, 2026:
    Added comment here and reworded from “fixes” to “addresses” in PR description.
  16. DrahtBot approved
  17. DrahtBot commented at 6:17 am on March 30, 2026: contributor
    .
  18. maflcko approved
  19. maflcko commented at 6:19 am on March 30, 2026: member

    Fixes #34731

    I don’t think this is a fix. This is just hiding the issue temporarily, not fixing the underlying issue.

    lgtm either way as a temp fix

  20. ci: add delay between commits while testing ancestor commits
    This patch adds a 3s delay before starting testing the next commit.
    This is done to give the CI instance some time to clean up in this
    job that can potentially test upto 150 commits or so.
    8e3af53081
  21. rkrux force-pushed on Mar 30, 2026
  22. maflcko commented at 6:33 am on March 30, 2026: member

    lgtm ACK 8e3af53081b7d88204c09a75ecb7d00cfe64a41f

    I don’t understand how and why this even affects #34731, but if this is really a workaround, it seems simply and harmless enough.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-31 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me