qa: `PermissionError` in functional tests on Windows #28529

issue hebasto opened this issue on September 25, 2023
  1. hebasto commented at 12:18 PM on September 25, 2023: member

    Having an intermittent error in my local Windows:

    > py -3 test\functional\test_runner.py --jobs 40 --timeout-factor=8
    ...
    165/269 - feature_versionbits_warning.py failed, Duration: 59 s
    
    
    stdout:
    2023-09-25T11:29:08.904000Z TestFramework (INFO): PRNG seed is: 6135880998339750841
    2023-09-25T11:29:09.019000Z TestFramework (INFO): Initializing test directory C:\Users\hebasto\AppData\Local\Temp\test_runner_β‚Ώ_πŸƒ_20230925_122333\feature_versionbits_warning_108
    2023-09-25T11:29:25.118000Z TestFramework (INFO): Check that there is no warning if previous VB_BLOCKS have <VB_THRESHOLD blocks with unknown versionbits version.
    2023-09-25T11:29:43.242000Z TestFramework (INFO): Check that there is a warning if previous VB_BLOCKS have >=VB_THRESHOLD blocks with unknown versionbits version.
    2023-09-25T11:29:50.044000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:12296 due to [WinError 10054] An existing connection was forcibly closed by the remote host
    2023-09-25T11:30:05.174000Z TestFramework (INFO): Stopping nodes
    2023-09-25T11:30:05.548000Z TestFramework (INFO): Cleaning up C:\Users\hebasto\AppData\Local\Temp\test_runner_β‚Ώ_πŸƒ_20230925_122333\feature_versionbits_warning_108 on exit
    2023-09-25T11:30:05.548000Z TestFramework (INFO): Tests successful
    
    
    stderr:
    Traceback (most recent call last):
      File "C:\Users\hebasto\bitcoin/test/functional/feature_versionbits_warning.py", line 103, in <module>
        VersionBitsWarningTest().main()
      File "C:\Users\hebasto\bitcoin\test\functional\test_framework\test_framework.py", line 154, in main
        exit_code = self.shutdown()
                    ^^^^^^^^^^^^^^^
      File "C:\Users\hebasto\bitcoin\test\functional\test_framework\test_framework.py", line 361, in shutdown
        shutil.rmtree(self.options.tmpdir)
      File "C:\Python311\Lib\shutil.py", line 759, in rmtree
        return _rmtree_unsafe(path, onerror)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Python311\Lib\shutil.py", line 617, in _rmtree_unsafe
        _rmtree_unsafe(fullname, onerror)
    File "C:\Python311\Lib\shutil.py", line 617, in _rmtree_unsafe
        _rmtree_unsafe(fullname, onerror)
    File "C:\Python311\Lib\shutil.py", line 622, in _rmtree_unsafe
        onerror(os.unlink, fullname, sys.exc_info())
    File "C:\Python311\Lib\shutil.py", line 620, in _rmtree_unsafe
        os.unlink(fullname)
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\hebasto\\AppData\\Local\\Temp\\test_runner_β‚Ώ_πŸƒ_20230925_122333\\feature_versionbits_warning_108\\node0\\regtest\\debug.log'
    
    
    166/269 - feature_startupnotify.py passed, Duration: 30 s
    ...
    
  2. hebasto added the label Windows on Sep 25, 2023
  3. hebasto added the label Tests on Sep 25, 2023
  4. maflcko commented at 12:23 PM on September 25, 2023: member

    See also #28115

  5. hebasto commented at 9:38 PM on September 25, 2023: member
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process...
    

    Trying to collect some details with Sysinternals handle64.exe makes the issue disappear :man_shrugging:

  6. hebasto commented at 2:29 PM on January 8, 2024: member

    https://github.com/bitcoin/bitcoin/actions/runs/7433919780/job/20227517742:

    2024-01-06T21:17:20.926000Z TestFramework (INFO): Tests successful
    
    
    
    stderr:
    Traceback (most recent call last):
    
      File "D:\a\bitcoin\bitcoin/test/functional/tool_utxo_to_sqlite.py", line 111, in <module>
    
        UtxoToSqliteTest().main()
    
      File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 154, in main
    
        exit_code = self.shutdown()
    
                    ^^^^^^^^^^^^^^^
    
      File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 363, in shutdown
    
        shutil.rmtree(self.options.tmpdir)
    
      File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 808, in rmtree
    
        return _rmtree_unsafe(path, onexc)
    
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
      File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 636, in _rmtree_unsafe
    
        onexc(os.unlink, fullname, err)
    
      File "C:\hostedtoolcache\windows\Python\3.12.1\x64\Lib\shutil.py", line 634, in _rmtree_unsafe
    
        os.unlink(fullname)
    
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\_temp\\test_runner_β‚Ώ_πŸƒ_20240106_210717\\tool_utxo_to_sqlite_113\\utxos.sqlite'
    
  7. hebasto renamed this:
    qa: No access to `debug.log` in functional tests on Windows
    qa: `PermissionError` in functional tests on Windows
    on Jan 8, 2024
  8. fanquake commented at 4:39 PM on November 11, 2024: member

    Saw a failure that matches the first portion of the issue here. https://github.com/bitcoin/bitcoin/actions/runs/11781528801/job/32814548437#step:12:69. Unclear if this should be a new issue, but couldn't seem to match it to an existing issue.

     269/316 - p2p_unrequested_blocks.py failed, Duration: 1 s
    
    stdout:
    2024-11-11T16:26:48.484000Z TestFramework (INFO): PRNG seed is: 2650025219029138162
    
    2024-11-11T16:26:48.518000Z TestFramework (INFO): Initializing test directory D:\a\_temp\test_runner_β‚Ώ_πŸƒ_20241111_160404\p2p_unrequested_blocks_45
    
    2024-11-11T16:26:48.999000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:11540 due to [WinError 10054] An existing connection was forcibly closed by the remote host
    
    2024-11-11T16:26:49.060000Z TestFramework (ERROR): Assertion failed
    
  9. maflcko commented at 5:03 PM on November 11, 2024: member

    Has anyone ever seen any of the warnings or errors locally?

  10. willcl-ark commented at 3:25 PM on October 21, 2025: member

    Since the problem isn’t easily reproducible, I’m closing this for now. If it occurs again, please feel free to open a new issue or comment here so we can reopen it.

  11. willcl-ark closed this on Oct 21, 2025

  12. willcl-ark commented at 1:23 PM on March 13, 2026: member

    I think this has happened again on the merge commit failure earlier today: https://github.com/bitcoin/bitcoin/actions/runs/23042377679/job/66923350385#step:16:585

  13. maflcko reopened this on Mar 13, 2026

  14. maflcko added the label CI failed on Mar 13, 2026
  15. willcl-ark commented at 2:03 PM on March 13, 2026: member

    We could "fix" this by adding a timeout/retry around the rmtree call in test_framework.py, but I don't really like adding more magic sleeps and loops.

    I talked with an agent about this:

    <details><summary>Details</summary> <p>

    The root cause is the alertnotify subprocess on Windows.

    1. feature_versionbits_warning.py line 37 configures -alertnotify=echo %s >> "alert.txt" for the node.
    2. When the test triggers the unknown versionbits warning (line 95-96), kernel_notifications.cpp:44-45 fires AlertNotify, which spawns a detached thread calling runCommand(): std::thread t(runCommand, strCmd); t.detach(); // thread runs free
    3. On Windows, runCommand() (system.cpp:56) calls _wsystem(), which spawns cmd.exe /c "echo ... >> alert.txt". On Windows, child processes inherit the parent's file handles by default β€” including the open handle to debug.log.
    4. The test completes, stop_node() is called, which issues an RPC stop and waits for the bitcoind process to exit (wait_until_stopped). But it does not wait for the detached cmd.exe subprocess.
    5. shutdown() (test_framework.py:334) immediately calls shutil.rmtree() with no retry logic. If cmd.exe hasn't fully exited and released the inherited debug.log handle, Windows refuses to delete the file β†’ PermissionError [WinError 32].

    Why it's flaky: It depends on whether cmd.exe exits and releases the inherited handle before shutil.rmtree gets to debug.log. Usually it does; occasionally it doesn't.

    Why it's Windows-only: On POSIX, system() doesn't cause child processes to hold open file descriptors in a way that blocks unlink. On Windows, file deletion is blocked while any process holds an open handle.

    Two contributing issues:

    1. C++ side: AlertNotify uses a detached thread with no way to join/wait before shutdown β€” fire-and-forget.
    2. Python side: shutil.rmtree has no retry/backoff for Windows PermissionError, and there's no delay between stop_nodes() and rmtree.

    There are a few angles of attack, with different trade-offs:

    Option A β€” Prevent handle inheritance at the source (C++ side) In runCommand() on Windows, replace _wsystem() with CreateProcessW using bInheritHandles = FALSE. This prevents cmd.exe from inheriting debug.log's handle entirely. Proper root cause fix, but more invasive β€” replaces a one-liner with ~15 lines of Win32 API.

    Option B β€” Make the log file handle non-inheritable (C++ side) Set debug.log's file handle to non-inheritable when opening it (via SetHandleInformation or O_NOINHERIT). Simpler and more targeted than A, but requires finding where the log file is opened and may affect other legitimate handle inheritance.

    Option C β€” Retry shutil.rmtree on Windows (Python side) Add a short retry loop around shutil.rmtree in test_framework.py:334 for PermissionError. Band-aid that doesn't fix the root cause, but is the smallest/safest change and makes the test suite resilient to any similar handle-leak issue.

    Option D β€” Join the alertnotify thread before shutdown (C++ side) Stop detaching the AlertNotify thread; track it and join before exit. Ensures _wsystem (and thus cmd.exe) finishes before bitcoind exits. Clean but changes AlertNotify semantics and requires plumbing the thread handle through shutdown.

    My recommendation: A is the cleanest root cause fix β€” child processes spawned by runCommand on Windows should never inherit all of bitcoind's handles. C is worth doing as defense-in-depth regardless. They're not mutually exclusive.

    So this exact bug has been reported three separate times going back to 2019, and each time it was closed without a fix:

    • #15171 (Jan 2019) β€” feature_notifications.py on Appveyor, same WinError 32 on debug.log. Closed "pending future sighting."
    • #28115 (Jul 2023) β€” feature_versionbits_warning.py on Cirrus CI Windows. Closed.
    • #28529 (Sep 2023) β€” feature_versionbits_warning.py on Windows, exact same stack trace. Also hit tool_utxo_to_sqlite.py in Jan 2024. You closed it yourself in Oct 2025 since it wasn't easily reproducible.

    And now it's back (Mar 2026).

    Note that #15171 hit feature_notifications.py β€” a different test that also uses -alertnotify. This confirms the root cause: it's the alertnotify subprocess (cmd.exe) inheriting the debug.log handle.

    Key observations:

    • It's not specific to feature_versionbits_warning.py β€” any test using -alertnotify (or -blocknotify, -walletnotify) can trigger it on Windows
    • It's been known for 7 years without a proper fix
    • The consensus from #15171 was that retry is preferable over a fixed sleep (laanwj, promag)

    Given the history, I'd suggest both:

    1. Retry with backoff on shutil.rmtree in the test framework (addresses all tests, not just this one) β€” this was the preferred approach from the 2019 discussion
    2. Optionally, the CreateProcessW with bInheritHandles=FALSE fix to address the root cause

    </p> </details>

    I then provisioned a new agent create a fix based on this diagnosis, which is commited (with the retry as a commit) in this branch: https://github.com/bitcoin/bitcoin/compare/master...willcl-ark:bitcoin:refs/heads/fix-win-debug-cleanup-race

    I won't open it as a PR myself as I don't know much about windows or its APIs (and it's currently 100% vibe-coded), but perhaps it will be useful to someone else looking at fixing this.

  16. maflcko commented at 2:25 PM on March 13, 2026: member

    Ah nice. I think the LLM is actually correct here.

    • Option A is certainly the correct fix here, but it isn't a trivial one-line fix, see #33063 and #32566
    • Option B: haven't checked if this is possible
    • Option C: Seems like a temporary workaround. Not sure if this is worth it. Let's see how often it happens ... A CI failure once every 6 months seems harmless
    • Option D: May be tedious to implement, as -blocknotify is also affected, which could be one thread per block?
  17. maflcko commented at 3:35 PM on March 18, 2026: member

    <strike>Actually, on a second though, I am not sure if the LLM is 100% correct.

    The thread immediately detaches, so Option A to close the fds may not be sufficient, because in theory it can happen at any time in the future (possibly too late). </strike>

    Edit: The LLM is likely right, because the detached thread will block bitcoind.exe shutdown, until it has forked via bInheritHandles = FALSE?

    In any case, I don't have Windows, so I can't reproduce this, but it should be trivial to reproduce by adding a sleep in the detached thread right after running it? Maybe:

    diff --git a/src/common/system.cpp b/src/common/system.cpp
    index 08c0c69271..c775d4bbd9 100644
    --- a/src/common/system.cpp
    +++ b/src/common/system.cpp
    @@ -55,2 +55,3 @@ void runCommand(const std::string& strCommand)
     #else
    +UninterruptibleSleep(678ms);
         int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(strCommand).c_str());
    
  18. hebasto referenced this in commit 8dbd7e9e65 on Mar 19, 2026
  19. hebasto referenced this in commit 61c9b8aecb on Mar 19, 2026
  20. hebasto referenced this in commit 71181bf199 on Mar 19, 2026
  21. hebasto commented at 9:42 PM on March 19, 2026: member

    Actually, on a second though, I am not sure if the LLM is 100% correct.

    The thread immediately detaches, so Option A to close the fds may not be sufficient, because in theory it can happen at any time in the future (possibly too late).

    In any case, I don't have Windows, so I can't reproduce this, but it should be trivial to reproduce by adding a sleep in the detached thread right after running it? Maybe:

    diff --git a/src/common/system.cpp b/src/common/system.cpp
    index 08c0c69271..c775d4bbd9 100644
    --- a/src/common/system.cpp
    +++ b/src/common/system.cpp
    @@ -55,2 +55,3 @@ void runCommand(const std::string& strCommand)
     #else
    +UninterruptibleSleep(678ms);
         int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(strCommand).c_str());
    

    https://github.com/hebasto/bitcoin-core-nightly/pull/214

  22. maflcko removed the label Tests on Mar 20, 2026
  23. maflcko commented at 8:33 AM on March 20, 2026: member

    hebasto/bitcoin-core-nightly#214

    Interesting, looks like it passed. I guess it could make more sense to try to reproduce locally, because debugging in CI usually increases the time between debug cycles.

    On a second thought, it is obvious why the sleep does not reproduce the issue: The test has a wait_until to wait for the result/side-effect of the _wsystem call.

    So to reproduce this, there are options:

    • the wait_until (and all following test logic) would have to be removed.
    • The alertnotify command string in the test would have to include a delay or sleep, after writing to the file.

    On a general note, I presume it can only happen outside of tests if the Windows user is using a dedicated folder for the notifications and then tries to delete that folder immediately after bitcoind shutdown.

  24. maflcko commented at 5:49 AM on March 27, 2026: member

    So to clarify, a correct reproducer would be:

    diff --git a/src/common/system.cpp b/src/common/system.cpp
    index 08c0c69271..3431fb322f 100644
    --- a/src/common/system.cpp
    +++ b/src/common/system.cpp
    @@ -49,14 +49,16 @@ std::string ShellEscape(const std::string& arg)
     #if HAVE_SYSTEM
     void runCommand(const std::string& strCommand)
     {
         if (strCommand.empty()) return;
     #ifndef WIN32
         int nErr = ::system(strCommand.c_str());
     #else
    -    int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(strCommand).c_str());
    +    // Hack to make the child process stay alive and hold inherited handles
    +    std::string sleepyCommand = strCommand + " & ping -n 10 127.0.0.1 > nul";
    +    int nErr = ::_wsystem(std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t>().from_bytes(sleepyCommand).c_str());
     #endif
         if (nErr) {
             LogWarning("runCommand error: system(%s) returned %d", strCommand, nErr);
         }
     }
     #endif
    

    This will deterministically fail with PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\bitcoin-core-with-ci\\bitcoin-core-with-ci\\test_runner_β‚Ώ_πŸƒ_20260326_213742\\feature_startupnotify_89\\node0\\regtest\\debug.log'

    See https://github.com/maflcko/bitcoin-core-with-ci/actions/runs/23616233568/job/68793850155#step:10:22215:

    367/459 - feature_startupnotify.py failed, Duration: 1 s
    
    stdout:
    2026-03-26T21:54:34.633925Z TestFramework (INFO): PRNG seed is: 8212506448362893318
    
    2026-03-26T21:54:34.685565Z TestFramework (INFO): Initializing test directory D:\a\bitcoin-core-with-ci\bitcoin-core-with-ci\test_runner_β‚Ώ_πŸƒ_20260326_213742\feature_startupnotify_89
    
    2026-03-26T21:54:34.963680Z TestFramework (INFO): Test -startupnotify command is run when node starts
    
    2026-03-26T21:54:35.371216Z TestFramework (INFO): Test -startupnotify is executed once
    
    2026-03-26T21:54:35.371510Z TestFramework (INFO): Test node is fully started
    
    2026-03-26T21:54:35.422787Z TestFramework (INFO): Stopping nodes
    
    2026-03-26T21:54:35.525183Z TestFramework (INFO): Cleaning up D:\a\bitcoin-core-with-ci\bitcoin-core-with-ci\test_runner_β‚Ώ_πŸƒ_20260326_213742\feature_startupnotify_89 on exit
    
    2026-03-26T21:54:35.526764Z TestFramework (INFO): Tests successful
    
    
    
    stderr:
    Traceback (most recent call last):
    
      File "D:\a\bitcoin-core-with-ci\bitcoin-core-with-ci/test/functional/feature_startupnotify.py", line 42, in <module>
    
        StartupNotifyTest(__file__).main()
    
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
    
      File "D:\a\bitcoin-core-with-ci\bitcoin-core-with-ci\test\functional\test_framework\test_framework.py", line 156, in main
    
        exit_code = self.shutdown()
    
      File "D:\a\bitcoin-core-with-ci\bitcoin-core-with-ci\test\functional\test_framework\test_framework.py", line 334, in shutdown
    
        shutil.rmtree(self.options.tmpdir)
    
        ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    
      File "C:\hostedtoolcache\windows\Python\3.14.3\x64\Lib\shutil.py", line 852, in rmtree
    
        _rmtree_impl(path, dir_fd, onexc)
    
        ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    
      File "C:\hostedtoolcache\windows\Python\3.14.3\x64\Lib\shutil.py", line 705, in _rmtree_unsafe
    
        onexc(os.unlink, fullname, err)
    
        ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
    
      File "C:\hostedtoolcache\windows\Python\3.14.3\x64\Lib\shutil.py", line 701, in _rmtree_unsafe
    
        os.unlink(fullname)
    
        ~~~~~~~~~^^^^^^^^^^
    
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\bitcoin-core-with-ci\\bitcoin-core-with-ci\\test_runner_β‚Ώ_πŸƒ_20260326_213742\\feature_startupnotify_89\\node0\\regtest\\debug.log'
    
    
    
  25. hebasto commented at 9:44 AM on April 8, 2026: member

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me