Intermittent win64 CI failure in feature_index_prune.py #25031

issue jonatack openend this issue on April 29, 2022
  1. jonatack commented at 3:30 pm on April 29, 2022: member

    https://cirrus-ci.com/task/6532449864777728

    Win64 native [msvc] task

    0 test  2022-04-29T14:22:15.843000Z TestFramework (WARNING): Not cleaning up dir C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner__🏃_20220429_140215\feature_index_prune_242 
    1
    2 test  2022-04-29T14:22:15.843000Z TestFramework (ERROR): Test failed. Test logging available at C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner__🏃_20220429_140215\feature_index_prune_242/test_framework.log 
    3
    4 test  2022-04-29T14:22:15.843000Z TestFramework (ERROR): 
    5 node0 stderr Error: basic block filter index best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again) 
    6 node1 stderr Error: coinstatsindex best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again) 
    7 node2 stderr Error: basic block filter index best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again) 
    
  2. jonatack added the label Bug on Apr 29, 2022
  3. jonatack commented at 3:55 pm on April 29, 2022: member
    This test passes locally for me in 11-12 minutes (debian testing), albeit only with timeouts disabled test/functional/feature_index_prune.py --timeout-factor=0
  4. david-bakin commented at 3:49 am on May 10, 2022: contributor

    I got the same at https://cirrus-ci.com/task/6464103580434432?logs=functional_tests#L3097:

     0test  2022-05-10T03:08:31.456000Z TestFramework (ERROR): Assertion failed 
     1                                Traceback (most recent call last):
     2                                    File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 133, in main
     3                                    self.run_test()
     4                                    File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\feature_index_prune.py", line 141, in run_test
     5                                    self.sync_blocks(timeout=300)
     6                                    File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 680, in sync_blocks
     7                                    assert (all([len(x.getpeerinfo()) for x in rpc_connections]))
     8                                AssertionError
     9test  2022-05-10T03:08:31.456000Z TestFramework (DEBUG): Closing down network thread 
    10...
    11<lots of lines of other tests elided>
    12...
    13test  2022-05-10T03:08:35.705000Z TestFramework (WARNING): Not cleaning up dir C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner__🏃_20220510_024705\feature_index_prune_242 
    14test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): Test failed. Test logging available at C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner__🏃_20220510_024705\feature_index_prune_242/test_framework.log 
    15test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): 
    16test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): Hint: Call C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\combine_logs.py 'C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner_₿_🏃_20220510_024705\feature_index_prune_242' to consolidate all logs 
    17test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): 
    18test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log. 
    19test  2022-05-10T03:08:35.705000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues 
    20test  2022-05-10T03:08:35.705000Z TestFramework (ERROR):             
    

    (I’m not sure how to do the upload of the combined log myself, but the link above should take you to the job.)

  5. MarcoFalke added the label Tests on May 12, 2022
  6. ryanofsky commented at 6:36 pm on May 12, 2022: member

    I started debugging the same error in #24230 log https://cirrus-ci.com/task/4528289543880704 wrongly thinking it was caused by the PR (which is related to indexing) after it happened twice in a row.

    But it seems like there is just a race condition in the test. The test starts 4 nodes, then shut down the first 3, and restarts those three with a -connect option to connect them to 4th node. Then it immediately calls sync_blocks which fails because the getpeerinfo for one of the nodes is empty.

    Maybe a fix is to add:

    0self.wait_until(lambda: all(len(n.getpeerinfo()) for n in self.nodes))
    

    before the sync_blocks call, but I’m not sure.

  7. ryanofsky commented at 6:37 pm on May 12, 2022: member

    Note the stack trace:

     0Traceback (most recent call last):
     1
     2  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 133, in main
     3
     4    self.run_test()
     5
     6  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\feature_index_prune.py", line 141, in run_test
     7
     8    self.sync_blocks(timeout=300)
     9
    10  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 680, in sync_blocks
    11
    12    assert (all([len(x.getpeerinfo()) for x in rpc_connections]))
    13
    14AssertionError
    
  8. ryanofsky commented at 6:41 pm on May 12, 2022: member
    @fjahr could weigh in perhaps. The sync_blocks call was added in de08932efa953e9a237cbf879460488ad8947411 from #21726 which was merged about 2 weeks ago
  9. fjahr commented at 11:05 pm on May 12, 2022: member

    Thanks for the ping! I was unable to reproduce the issue locally but I agree with the analysis by @ryanofsky that there is a race going on between the establishment of the connection and that sanity check line in sync_blocks that fails.

    I think the one-liner suggested would work I think but I think I would prefer to use the addconnection RPC which should have the same effect but the code is easier to read and debug. Suggested in #25123.

  10. MarcoFalke commented at 9:10 am on May 13, 2022: member

    Can reproduce on master with a diff.

     0diff --git a/src/net.cpp b/src/net.cpp
     1index 1c775f9a3f..e2ddb7fe3b 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -1913,6 +1913,7 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
     5     // Connect to specific addresses
     6     if (!connect.empty())
     7     {
     8+        UninterruptibleSleep(1s);
     9         for (int64_t nLoop = 0;; nLoop++)
    10         {
    11             ProcessAddrFetch();
    
  11. MarcoFalke referenced this in commit b74a6dde8c on May 15, 2022
  12. MarcoFalke closed this on May 15, 2022

  13. sidhujag referenced this in commit d670f9782d on May 28, 2022
  14. DrahtBot locked this on May 15, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-05 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me