qa: PermissionError in functional tests on Windows #28529

issue hebasto openend this issue on September 25, 2023
  1. hebasto commented at 12:18 pm on September 25, 2023: member

    Having an intermittent error in my local Windows:

     0> py -3 test\functional\test_runner.py --jobs 40 --timeout-factor=8
     1...
     2165/269 - feature_versionbits_warning.py failed, Duration: 59 s
     3
     4
     5stdout:
     62023-09-25T11:29:08.904000Z TestFramework (INFO): PRNG seed is: 6135880998339750841
     72023-09-25T11:29:09.019000Z TestFramework (INFO): Initializing test directory C:\Users\hebasto\AppData\Local\Temp\test_runner_β‚Ώ_πŸƒ_20230925_122333\feature_versionbits_warning_108
     82023-09-25T11:29:25.118000Z TestFramework (INFO): Check that there is no warning if previous VB_BLOCKS have <VB_THRESHOLD blocks with unknown versionbits version.
     92023-09-25T11:29:43.242000Z TestFramework (INFO): Check that there is a warning if previous VB_BLOCKS have >=VB_THRESHOLD blocks with unknown versionbits version.
    102023-09-25T11:29:50.044000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:12296 due to [WinError 10054] An existing connection was forcibly closed by the remote host
    112023-09-25T11:30:05.174000Z TestFramework (INFO): Stopping nodes
    122023-09-25T11:30:05.548000Z TestFramework (INFO): Cleaning up C:\Users\hebasto\AppData\Local\Temp\test_runner_β‚Ώ_πŸƒ_20230925_122333\feature_versionbits_warning_108 on exit
    132023-09-25T11:30:05.548000Z TestFramework (INFO): Tests successful
    14
    15
    16stderr:
    17Traceback (most recent call last):
    18  File "C:\Users\hebasto\bitcoin/test/functional/feature_versionbits_warning.py", line 103, in <module>
    19    VersionBitsWarningTest().main()
    20  File "C:\Users\hebasto\bitcoin\test\functional\test_framework\test_framework.py", line 154, in main
    21    exit_code = self.shutdown()
    22                ^^^^^^^^^^^^^^^
    23  File "C:\Users\hebasto\bitcoin\test\functional\test_framework\test_framework.py", line 361, in shutdown
    24    shutil.rmtree(self.options.tmpdir)
    25  File "C:\Python311\Lib\shutil.py", line 759, in rmtree
    26    return _rmtree_unsafe(path, onerror)
    27    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    28File "C:\Python311\Lib\shutil.py", line 617, in _rmtree_unsafe
    29    _rmtree_unsafe(fullname, onerror)
    30File "C:\Python311\Lib\shutil.py", line 617, in _rmtree_unsafe
    31    _rmtree_unsafe(fullname, onerror)
    32File "C:\Python311\Lib\shutil.py", line 622, in _rmtree_unsafe
    33    onerror(os.unlink, fullname, sys.exc_info())
    34File "C:\Python311\Lib\shutil.py", line 620, in _rmtree_unsafe
    35    os.unlink(fullname)
    36PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\hebasto\\AppData\\Local\\Temp\\test_runner_β‚Ώ_πŸƒ_20230925_122333\\feature_versionbits_warning_108\\node0\\regtest\\debug.log'
    37
    38
    39166/269 - feature_startupnotify.py passed, Duration: 30 s
    40...
    
  2. hebasto added the label Windows on Sep 25, 2023
  3. hebasto added the label Tests on Sep 25, 2023
  4. maflcko commented at 12:23 pm on September 25, 2023: member
    See also #28115
  5. hebasto commented at 9:38 pm on September 25, 2023: member
    0PermissionError: [WinError 32] The process cannot access the file because it is being used by another process...
    

    Trying to collect some details with Sysinternals handle64.exe makes the issue disappear :man_shrugging:

  6. hebasto commented at 2:29 pm on January 8, 2024: member

    https://github.com/bitcoin/bitcoin/actions/runs/7433919780/job/20227517742:

     02024-01-06T21:17:20.926000Z TestFramework (INFO): Tests successful
     1
     2
     3
     4stderr:
     5Traceback (most recent call last):
     6
     7  File "D:\a\bitcoin\bitcoin/test/functional/tool_utxo_to_sqlite.py", line 111, in <module>
     8
     9    UtxoToSqliteTest().main()
    10
    11  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 154, in main
    12
    13    exit_code = self.shutdown()
    14
    15                ^^^^^^^^^^^^^^^
    16
    17  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 363, in shutdown
    18
    19    shutil.rmtree(self.options.tmpdir)
    20
    21  File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 808, in rmtree
    22
    23    return _rmtree_unsafe(path, onexc)
    24
    25           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    26
    27  File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 636, in _rmtree_unsafe
    28
    29    onexc(os.unlink, fullname, err)
    30
    31  File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 634, in _rmtree_unsafe
    32
    33    os.unlink(fullname)
    34
    35PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\_temp\\test_runner_β‚Ώ_πŸƒ_20240106_210717\\tool_utxo_to_sqlite_113\\utxos.sqlite'
    
  7. hebasto renamed this:
    qa: No access to `debug.log` in functional tests on Windows
    qa: `PermissionError` in functional tests on Windows
    on Jan 8, 2024
  8. fanquake commented at 4:39 pm on November 11, 2024: member

    Saw a failure that matches the first portion of the issue here. https://github.com/bitcoin/bitcoin/actions/runs/11781528801/job/32814548437#step:12:69. Unclear if this should be a new issue, but couldn’t seem to match it to an existing issue.

    0 269/316 - p2p_unrequested_blocks.py failed, Duration: 1 s
    1
    2stdout:
    32024-11-11T16:26:48.484000Z TestFramework (INFO): PRNG seed is: 2650025219029138162
    4
    52024-11-11T16:26:48.518000Z TestFramework (INFO): Initializing test directory D:\a\_temp\test_runner_β‚Ώ_πŸƒ_20241111_160404\p2p_unrequested_blocks_45
    6
    72024-11-11T16:26:48.999000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:11540 due to [WinError 10054] An existing connection was forcibly closed by the remote host
    8
    92024-11-11T16:26:49.060000Z TestFramework (ERROR): Assertion failed
    
  9. maflcko commented at 5:03 pm on November 11, 2024: member
    Has anyone ever seen any of the warnings or errors locally?
  10. willcl-ark commented at 3:25 pm on October 21, 2025: member
    Since the problem isn’t easily reproducible, I’m closing this for now. If it occurs again, please feel free to open a new issue or comment here so we can reopen it.
  11. willcl-ark closed this on Oct 21, 2025

  12. willcl-ark commented at 1:23 pm on March 13, 2026: member
    I think this has happened again on the merge commit failure earlier today: https://github.com/bitcoin/bitcoin/actions/runs/23042377679/job/66923350385#step:16:585
  13. maflcko reopened this on Mar 13, 2026

  14. maflcko added the label CI failed on Mar 13, 2026
  15. willcl-ark commented at 2:03 pm on March 13, 2026: member

    We could “fix” this by adding a timeout/retry around the rmtree call in test_framework.py, but I don’t really like adding more magic sleeps and loops.

    I talked with an agent about this:

    The root cause is the alertnotify subprocess on Windows.

    1. feature_versionbits_warning.py line 37 configures -alertnotify=echo %s » “alert.txt” for the node.
    2. When the test triggers the unknown versionbits warning (line 95-96), kernel_notifications.cpp:44-45 fires AlertNotify, which spawns a detached thread calling runCommand(): std::thread t(runCommand, strCmd); t.detach(); // thread runs free
    3. On Windows, runCommand() (system.cpp:56) calls _wsystem(), which spawns cmd.exe /c “echo … » alert.txt”. On Windows, child processes inherit the parent’s file handles by default β€” including the open handle to debug.log.
    4. The test completes, stop_node() is called, which issues an RPC stop and waits for the bitcoind process to exit (wait_until_stopped). But it does not wait for the detached cmd.exe subprocess.
    5. shutdown() (test_framework.py:334) immediately calls shutil.rmtree() with no retry logic. If cmd.exe hasn’t fully exited and released the inherited debug.log handle, Windows refuses to delete the file β†’ PermissionError [WinError 32].

    Why it’s flaky: It depends on whether cmd.exe exits and releases the inherited handle before shutil.rmtree gets to debug.log. Usually it does; occasionally it doesn’t.

    Why it’s Windows-only: On POSIX, system() doesn’t cause child processes to hold open file descriptors in a way that blocks unlink. On Windows, file deletion is blocked while any process holds an open handle.

    Two contributing issues:

    1. C++ side: AlertNotify uses a detached thread with no way to join/wait before shutdown β€” fire-and-forget.
    2. Python side: shutil.rmtree has no retry/backoff for Windows PermissionError, and there’s no delay between stop_nodes() and rmtree.

    There are a few angles of attack, with different trade-offs:

    Option A β€” Prevent handle inheritance at the source (C++ side) In runCommand() on Windows, replace _wsystem() with CreateProcessW using bInheritHandles = FALSE. This prevents cmd.exe from inheriting debug.log’s handle entirely. Proper root cause fix, but more invasive β€” replaces a one-liner with ~15 lines of Win32 API.

    Option B β€” Make the log file handle non-inheritable (C++ side) Set debug.log’s file handle to non-inheritable when opening it (via SetHandleInformation or O_NOINHERIT). Simpler and more targeted than A, but requires finding where the log file is opened and may affect other legitimate handle inheritance.

    Option C β€” Retry shutil.rmtree on Windows (Python side) Add a short retry loop around shutil.rmtree in test_framework.py:334 for PermissionError. Band-aid that doesn’t fix the root cause, but is the smallest/safest change and makes the test suite resilient to any similar handle-leak issue.

    Option D β€” Join the alertnotify thread before shutdown (C++ side) Stop detaching the AlertNotify thread; track it and join before exit. Ensures _wsystem (and thus cmd.exe) finishes before bitcoind exits. Clean but changes AlertNotify semantics and requires plumbing the thread handle through shutdown.

    My recommendation: A is the cleanest root cause fix β€” child processes spawned by runCommand on Windows should never inherit all of bitcoind’s handles. C is worth doing as defense-in-depth regardless. They’re not mutually exclusive.

    So this exact bug has been reported three separate times going back to 2019, and each time it was closed without a fix:

    • #15171 (Jan 2019) β€” feature_notifications.py on Appveyor, same WinError 32 on debug.log. Closed “pending future sighting.”
    • #28115 (Jul 2023) β€” feature_versionbits_warning.py on Cirrus CI Windows. Closed.
    • #28529 (Sep 2023) β€” feature_versionbits_warning.py on Windows, exact same stack trace. Also hit tool_utxo_to_sqlite.py in Jan 2024. You closed it yourself in Oct 2025 since it wasn’t easily reproducible.

    And now it’s back (Mar 2026).

    Note that #15171 hit feature_notifications.py β€” a different test that also uses -alertnotify. This confirms the root cause: it’s the alertnotify subprocess (cmd.exe) inheriting the debug.log handle.

    Key observations:

    • It’s not specific to feature_versionbits_warning.py β€” any test using -alertnotify (or -blocknotify, -walletnotify) can trigger it on Windows
    • It’s been known for 7 years without a proper fix
    • The consensus from #15171 was that retry is preferable over a fixed sleep (laanwj, promag)

    Given the history, I’d suggest both:

    1. Retry with backoff on shutil.rmtree in the test framework (addresses all tests, not just this one) β€” this was the preferred approach from the 2019 discussion
    2. Optionally, the CreateProcessW with bInheritHandles=FALSE fix to address the root cause

    I then provisioned a new agent create a fix based on this diagnosis, which is commited (with the retry as a commit) in this branch: https://github.com/bitcoin/bitcoin/compare/master...willcl-ark:bitcoin:refs/heads/fix-win-debug-cleanup-race

    I won’t open it as a PR myself as I don’t know much about windows or its APIs (and it’s currently 100% vibe-coded), but perhaps it will be useful to someone else looking at fixing this.

  16. maflcko commented at 2:25 pm on March 13, 2026: member

    Ah nice. I think the LLM is actually correct here.

    • Option A is certainly the correct fix here, but it isn’t a trivial one-line fix, see #33063 and #32566
    • Option B: haven’t checked if this is possible
    • Option C: Seems like a temporary workaround. Not sure if this is worth it. Let’s see how often it happens … A CI failure once every 6 months seems harmless
    • Option D: May be tedious to implement, as -blocknotify is also affected, which could be one thread per block?
  17. maflcko commented at 3:35 pm on March 18, 2026: member

    Actually, on a second though, I am not sure if the LLM is 100% correct.

    The thread immediately detaches, so Option A to close the fds may not be sufficient, because in theory it can happen at any time in the future (possibly too late).

    Edit: The LLM is likely right, because the detached thread will block bitcoind.exe shutdown, until it has forked via bInheritHandles = FALSE?

    In any case, I don’t have Windows, so I can’t reproduce this, but it should be trivial to reproduce by adding a sleep in the detached thread right after running it? Maybe:

    0diff --git a/src/common/system.cpp b/src/common/system.cpp
    1index 08c0c69271..c775d4bbd9 100644
    2--- a/src/common/system.cpp
    3+++ b/src/common/system.cpp
    4@@ -55,2 +55,3 @@ void runCommand(const std::string& strCommand)
    5 #else
    6+UninterruptibleSleep(678ms);
    7     int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(strCommand).c_str());
    
  18. hebasto referenced this in commit 8dbd7e9e65 on Mar 19, 2026
  19. hebasto referenced this in commit 61c9b8aecb on Mar 19, 2026
  20. hebasto referenced this in commit 71181bf199 on Mar 19, 2026
  21. hebasto commented at 9:42 pm on March 19, 2026: member

    Actually, on a second though, I am not sure if the LLM is 100% correct.

    The thread immediately detaches, so Option A to close the fds may not be sufficient, because in theory it can happen at any time in the future (possibly too late).

    In any case, I don’t have Windows, so I can’t reproduce this, but it should be trivial to reproduce by adding a sleep in the detached thread right after running it? Maybe:

    0diff --git a/src/common/system.cpp b/src/common/system.cpp
    1index 08c0c69271..c775d4bbd9 100644
    2--- a/src/common/system.cpp
    3+++ b/src/common/system.cpp
    4@@ -55,2 +55,3 @@ void runCommand(const std::string& strCommand)
    5 #else
    6+UninterruptibleSleep(678ms);
    7     int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(strCommand).c_str());
    

    https://github.com/hebasto/bitcoin-core-nightly/pull/214

  22. maflcko removed the label Tests on Mar 20, 2026
  23. maflcko commented at 8:33 am on March 20, 2026: member

    hebasto/bitcoin-core-nightly#214

    Interesting, looks like it passed. I guess it could make more sense to try to reproduce locally, because debugging in CI usually increases the time between debug cycles.

    On a second thought, it is obvious why the sleep does not reproduce the issue: The test has a wait_until to wait for the result/side-effect of the _wsystem call.

    So to reproduce this, there are options:

    • the wait_until (and all following test logic) would have to be removed.
    • The alertnotify command string in the test would have to include a delay or sleep, after writing to the file.

    On a general note, I presume it can only happen outside of tests if the Windows user is using a dedicated folder for the notifications and then tries to delete that folder immediately after bitcoind shutdown.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-01 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me