test: Shut down framework cleanly on RPC connection failure #30660

pull hodlinator wants to merge 7 commits into bitcoin:master from hodlinator:2024-08/test_node_rpc_connection_failure changing 5 files +135 −30
  1. hodlinator commented at 11:53 am on August 15, 2024: contributor

    Without this PR, BitcoinTestFramework.start_nodes(), upon failing to establish RPC connections and calling stop_nodes(), would end up attempting to call the stop()-RPC, raising additional exceptions which made the shutdown less clean.

    Since we now avoiding the additional exceptions which in some cases provided useful clues, we instead capture information about the original exceptions being triggered and ignored within TestNode.wait_for_rpc_connection(). This means we now have the more informative: AssertionError: [node 0] Unable to connect to bitcoind after 60s (ignored errors: {'missing_credentials': 1, 'OSError.ECONNREFUSED': 239}, latest error: ConnectionRefusedError(111, 'Connection refused'))

    As long as the increased complexity and maintenance burden is low, exposing previously silenced exceptions in combination with having a cleaner shutdown clears the muddy waters for devs trying to figure out root causes.

    Related: #30390

  2. DrahtBot commented at 11:53 am on August 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30660.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK tdb3, m3dwards

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Tests on Aug 15, 2024
  4. hodlinator commented at 11:54 am on August 15, 2024: contributor

    Encountered a case of #30390 here: https://github.com/bitcoin/bitcoin/actions/runs/10380115902/job/28739417666?pr=30377

    Output with only the test (no fix commit):

     0$ test/functional/feature_framework_rpc_failure.py
     12024-08-15T07:30:20.427000Z TestFramework (INFO): PRNG seed is: 635212259632049987
     22024-08-15T07:30:20.427000Z TestFramework (INFO): Initializing test directory /run/user/1001/bitcoin_func_test_ujsvbdnm
     32024-08-15T07:30:20.428000Z TestFramework (INFO): Forcing us to timeout instantly to simulate an unresponsive bitcoind
     42024-08-15T07:30:20.429000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 329, in wait_for_rpc_connection
     9    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    10  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 192, in _raise_assertion_error
    11    raise AssertionError(self._node_msg(msg))
    12AssertionError: [node 0] Unable to connect to bitcoind after 0s
    13
    14During handling of the above exception, another exception occurred:
    15
    16Traceback (most recent call last):
    17  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 21, in setup_network
    18    BitcoinTestFramework.setup_network(self)
    19  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    20    self.setup_nodes()
    21  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    22    self.start_nodes()
    23  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    24    self.stop_nodes()
    25  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    26    node.stop_node(wait=wait, wait_until_stopped=False)
    27  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 390, in stop_node
    28    self.stop(wait=wait)
    29    ^^^^^^^^^
    30  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 209, in __getattr__
    31    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    32           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    33AssertionError: [node 0] Error: no RPC connection
    34
    35During handling of the above exception, another exception occurred:
    36
    37Traceback (most recent call last):
    38  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    39    self.setup()
    40  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    41    self.setup_network()
    42  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 23, in setup_network
    43    assert_equal(str(e), "[node 0] Unable to connect to bitcoind after 0s")
    44  File "/home/hodlinator/bitcoin/test/functional/test_framework/util.py", line 75, in assert_equal
    45    raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    46AssertionError: not([node 0] Error: no RPC connection == [node 0] Unable to connect to bitcoind after 0s)
    472024-08-15T07:30:20.480000Z TestFramework (INFO): Stopping nodes
    48Traceback (most recent call last):
    49  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 32, in <module>
    50    FeatureFrameworkRPCFailure(__file__).main()
    51  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    52    exit_code = self.shutdown()
    53                ^^^^^^^^^^^^^^^
    54  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    55    self.stop_nodes()
    56  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    57    node.stop_node(wait=wait, wait_until_stopped=False)
    58  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 390, in stop_node
    59    self.stop(wait=wait)
    60    ^^^^^^^^^
    61  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 209, in __getattr__
    62    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    63           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    64AssertionError: [node 0] Error: no RPC connection
    65[node 0] Cleaning up leftover process
    

    With fix:

    0$ test/functional/feature_framework_rpc_failure.py
    12024-08-15T07:30:05.092000Z TestFramework (INFO): PRNG seed is: 1672695534155872365
    22024-08-15T07:30:05.092000Z TestFramework (INFO): Initializing test directory /run/user/1001/bitcoin_func_test_cow4vrf7
    32024-08-15T07:30:05.093000Z TestFramework (INFO): Forcing us to timeout instantly to simulate an unresponsive bitcoind
    42024-08-15T07:30:05.094000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 326500.
    52024-08-15T07:30:05.144000Z TestFramework (INFO): Caught AssertionError with expected message
    62024-08-15T07:30:05.194000Z TestFramework (INFO): Stopping nodes
    72024-08-15T07:30:05.194000Z TestFramework (INFO): Cleaning up /run/user/1001/bitcoin_func_test_cow4vrf7 on exit
    82024-08-15T07:30:05.194000Z TestFramework (INFO): Tests successful
    
  5. maflcko commented at 12:32 pm on August 15, 2024: member
    Can you explain this a bit more? I think optimizing the test framework to cleanly shut down on a failure is a bit premature, and may not be needed. However, if a failing test may leave behind a dangling bitcoind process, this seems like something to fix, because it will otherwise lead to test warnings and failures down the road.
  6. hodlinator force-pushed on Aug 18, 2024
  7. hodlinator commented at 8:23 pm on August 18, 2024: contributor

    Can you explain this a bit more? I think optimizing the test framework to cleanly shut down on a failure is a bit premature, and may not be needed.

    As long as the additional complexity and maintenance burden is low, I think it is worth having a cleaner shutdown to avoid muddying the waters for people trying to figure out root causes.

    However, if a failing test may leave behind a dangling bitcoind process, this seems like something to fix, because it will otherwise lead to test warnings and failures down the road.

    It does clean up dangling bitcoind processes without the fix, as can be seen in the log:

    0[node 0] Cleaning up leftover process
    

    The latest push (bcc87514e780cd35d1f71d5c4c4069a8fac95324 0ea00cddc3d4a3f92c6bfae13a8b844e97ebc8c3) now explains in the log message that it is terminating the process to avoid wait_until_stopped timing out (which raises an additional exception).

    + Broke out a commit adding the self.expected_ret_code functionality, so that the fix commit in the middle is clearer. + Attempted to make the comment at the top of the test file more explanatory. + Switched from process.kill() to the somewhat less aggressive process.terminate() (although on second thoughts, if the RPC interface is not responding in a real-world situation outside of the added test, maybe .kill() is closer to what we want).

  8. hodlinator force-pushed on Aug 18, 2024
  9. hodlinator commented at 8:27 pm on August 26, 2024: contributor

    Last week I was hit by this again. Probably line ending differences on Windows makes any additional Python exception reporting extra annoying to read due to every second line being blank (separate issue, but slightly improved by this PR producing less output).

    Git-sleuthing to help motivate this PR

    Was this ever working?

    Unfortunately for my case, it seems the answer is no: 7897338918dac072e788b8ab2919d4559f311bef (Aug, 2017) broke out test_node.py from test_framework.py added the self.rpc_connected flag. Would have been nice if it had protected use of the stop() RPC using this new flag (like in this PR) but unfortunately it did not.

    Fixes in the same area

    fafb33cdefd2d8ce065263978075d26a1672b630 (May, 2016) MarcoFalke wrapped stop() attempts in try/except so at least other nodes would have a chance of shutting down properly. (6 days later in fab5233fe6c99d446a2cebc48cf479c3f026c2cc a timeout for waiting on the process to shutdown was added too, probably so the tests wouldn’t hang waiting for a non-crashed bitcoind that we failed to send stop() to).

    #14670 (Dec, 2018) is tangential and adds test/functional/feature_shutdown.py in 28479f926f21f2a91bec5a06671c60e5b0c55532, to ensure that HTTP responses for stop() make it all the way back to the client if I’m understanding it correctly. It fixes #11777 (Nov, 2017) which is arguably about shutdowns not being clean enough (sometimes showing additional errors depending on timing). The current PR is certainly more nit-level, but still continuing in the same general direction.

  10. hodlinator commented at 10:07 pm on August 26, 2024: contributor

    Another argument for this PR is that even the current title of #30390 is “ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it”. This in itself when looking at logs like https://github.com/bitcoin/bitcoin/actions/runs/9677128783/job/26698120754#step:27:225 is actually just a knock-on exception happening after the RPC connection attempt times out and stop()-RPC is called in vain:

     0313/313 - wallet_listsinceblock.py --legacy-wallet failed, Duration: 2415 s
     1stdout:
     22024-06-26T09:53:17.196000Z TestFramework (INFO): PRNG seed is: 3062882430636553844
     32024-06-26T09:53:17.227000Z TestFramework (INFO): Initializing test directory D:\a\_temp\test_runner__🏃_20240626_093314\wallet_listsinceblock_88
     42024-06-26T10:33:29.783000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 314, in wait_for_rpc_connection
     9    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    10  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 188, in _raise_assertion_error
    11    raise AssertionError(self._node_msg(msg))
    12AssertionError: [node 3] Unable to connect to bitcoind after 2400s
    13
    14During handling of the above exception, another exception occurred:
    15
    16Traceback (most recent call last):
    17  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 131, in main
    18    self.setup()
    19  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 302, in setup
    20    self.setup_network()
    21  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 396, in setup_network
    22    self.setup_nodes()
    23  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 418, in setup_nodes
    24    self.start_nodes()
    25  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 572, in start_nodes
    26    self.stop_nodes()
    27  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 587, in stop_nodes
    28    node.stop_node(wait=wait, wait_until_stopped=False)
    29  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 375, in stop_node
    30    self.stop(wait=wait)
    31    ^^^^^^^^^
    32  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 205, in __getattr__
    33    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    34           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    35AssertionError: [node 3] Error: no RPC connection
    362024-06-26T10:33:29.842000Z TestFramework (INFO): Stopping nodes
    37[node 3] Cleaning up leftover process
    38[node 2] Cleaning up leftover process
    39[node 1] Cleaning up leftover process
    40[node 0] Cleaning up leftover process
    41
    42stderr:
    43Traceback (most recent call last):
    44  File "D:\a\bitcoin\bitcoin/test/functional/wallet_listsinceblock.py", line 508, in <module>
    45    ListSinceBlockTest().main()
    46  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 155, in main
    47    exit_code = self.shutdown()
    48                ^^^^^^^^^^^^^^^
    49  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 318, in shutdown
    50    self.stop_nodes()
    51  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 587, in stop_nodes
    52    node.stop_node(wait=wait, wait_until_stopped=False)
    53  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 375, in stop_node
    54    self.stop(wait=wait)
    55  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\coverage.py", line 50, in __call__
    56    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    57                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    58  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\authproxy.py", line 127, in __call__
    59    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    60                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    61  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\authproxy.py", line 105, in _request
    62    self.__conn.request(method, path, postdata, headers)
    63  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1336, in request
    64    self._send_request(method, url, body, headers, encode_chunked)
    65  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1382, in _send_request
    66    self.endheaders(body, encode_chunked=encode_chunked)
    67  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1331, in endheaders
    68    self._send_output(message_body, encode_chunked=encode_chunked)
    69  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1091, in _send_output
    70    self.send(msg)
    71  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1035, in send
    72    self.connect()
    73  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1001, in connect
    74    self.sock = self._create_connection(
    75                ^^^^^^^^^^^^^^^^^^^^^^^^
    76  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\socket.py", line 853, in create_connection
    77    raise exceptions[0]
    78  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\socket.py", line 838, in create_connection
    79    sock.connect(sa)
    80ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
    

    A better title might have been “Unable to connect to bitcoind after 2400s errors on Windows”, which this PR would have made it more likely to be in the first place (had it predated the issue).

  11. hodlinator force-pushed on Aug 27, 2024
  12. hodlinator commented at 9:54 pm on August 27, 2024: contributor

    After more investigation into #30390, my impression is that since TestNode.wait_for_rpc_connection() ignores a bunch of exceptions, the knock-on exceptions when calling stop() and failing prior to this PR actually may sometimes provide a clue for why the RPC connection wasn’t established in the first place.

    That realization made me add 2 more commits here, which provide more information on the original exceptions being triggered and ignored inside wait_for_rpc_connection. Only fair since we are avoiding the triggering of knock-on exceptions later.

    The timeout exception’s message thereby now includes counters of encountered (and ignored) exceptions + latest exception type & message: Unable to connect to bitcoind after 60s (ignored errors: {'missing_credentials': 5}, latest error: ValueError('No RPC credentials'))

  13. DrahtBot added the label CI failed on Aug 27, 2024
  14. DrahtBot removed the label CI failed on Sep 29, 2024
  15. DrahtBot added the label CI failed on Oct 13, 2024
  16. achow101 requested review from josibake on Oct 15, 2024
  17. achow101 requested review from willcl-ark on Oct 15, 2024
  18. achow101 commented at 3:30 pm on October 15, 2024: member
    cc @tdb3
  19. DrahtBot removed the label CI failed on Oct 18, 2024
  20. hodlinator force-pushed on Oct 21, 2024
  21. hodlinator commented at 10:14 pm on October 21, 2024: contributor
    (Rebased onto latest master).
  22. in test/functional/test_framework/test_node.py:324 in 5c95843281 outdated
    320+                    ignored_errors["JSONRPCException -28"] += 1
    321                 # -342 Service unavailable, RPC server started but is shutting down due to error
    322-                if e.error['code'] != -28 and e.error['code'] != -342:
    323+                # - Ignore the exception to properly raise the FailedToStartError.
    324+                elif e.error['code'] == -342:
    325+                    ignored_errors["JSONRPCException -342"] += 1
    


    tdb3 commented at 6:19 pm on October 31, 2024:

    non-blocking nit: might be a bit more Pythonic to use Counter.update() rather than += 1. For example:

    0ignored_errors.update([f"JSONRPCException {e.error['code']}"])
    

    hodlinator commented at 8:23 pm on November 2, 2024:

    Tried using update(), but it resulted in things like:

    0AssertionError: [node 0] Unable to connect to bitcoind after 10s (ignored errors: {'m': 40, 'i': 120, 's': 120, 'n': 80, 'g': 40, '_': 40, 'c': 40, 'r': 40, 'e': 80, 'd': 40, 't': 40, 'a': 40, 'l': 40}, latest error: ValueError('No RPC credentials'))
    

    i.e. it was iterating over the string and incrementing an entry for each unique character, which seems to match the docs even though not super clear.~


    Edit: @tdb3 pointed out out of band than I’d probably missed out the []-brackets around the strings to make update() iterate through whole strings instead of chars.

    0ignored_errors.update([f"JSONRPCException {e.error['code']}"])
    

    feels slightly more contrived and verbose to me than

    0ignored_errors[f"JSONRPCException {e.error['code']}"] += 1
    

    but neither is too elegant.

    Might switch away from Counter if something more appropriate is found.

  23. in test/functional/test_framework/test_node.py:336 in 5c95843281 outdated
    335                 elif e.errno == errno.ETIMEDOUT:
    336-                    pass # Treat identical to ECONNRESET
    337+                    ignored_errors["OSError.ETIMEDOUT"] += 1 # Treat identical to ECONNRESET
    338                 elif e.errno == errno.ECONNREFUSED:
    339-                    pass  # Port not yet open?
    340+                    ignored_errors["OSError.ECONNREFUSED"] += 1 # Port not yet open?
    


    tdb3 commented at 6:27 pm on October 31, 2024:

    non-blocking nit: same with these. e.g.

    0ignored_errors.update([errno.errorcode[e.errno]])
    

    hodlinator commented at 8:21 pm on November 2, 2024:
    (Same issue with update(), see #30660 (review)).
  24. in test/functional/feature_framework_rpc_failure.py:35 in 9173b05600 outdated
    30+            BitcoinTestFramework.setup_network(self)
    31+        except AssertionError as e:
    32+            assert_equal(str(e)[:47], "[node 0] Unable to connect to bitcoind after 0s")
    33+            self.log.info(f"{test_name}: Caught AssertionError with expected RPC connection failure message")
    34+        else:
    35+            raise AssertionError("Didn't raise expected error")
    


    tdb3 commented at 6:47 pm on October 31, 2024:
    Maybe we can use util.assert_raises() or util.assert_raises_message() here to keep things more concise?

    hodlinator commented at 8:22 pm on November 2, 2024:
    Had to add a commit (2a0c0e410e9f32dbf7af7229457889b09553c438) that refurbishes assert_raises_message(), but test is much cleaner now!
  25. tdb3 approved
  26. tdb3 commented at 6:49 pm on October 31, 2024: contributor

    Code review ACK 9173b056006fd3940675108aa3fe0d0d855af768

    This seems quite useful when troubleshooting errors. Left a few nits, but nothing I feel very strongly about.

  27. hodlinator force-pushed on Nov 2, 2024
  28. hodlinator commented at 8:26 pm on November 2, 2024: contributor

    Thanks for the review @tdb3!

    Implemented feedback from that review, made the code a bit more Pythonic and improved comments slightly.

  29. in test/functional/test_framework/test_node.py:341 in b6b30c2d9c outdated
    337                     raise
    338+                ignored_errors["missing_credentials"] += 1
    339+                latest_error = repr(e)
    340             time.sleep(1.0 / poll_per_s)
    341-        self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    342+        self._raise_assertion_error("Unable to connect to bitcoind after {}s (ignored errors: {}, latest error: {})".format(self.rpc_timeout, str(dict(ignored_errors)), latest_error))
    


    tdb3 commented at 10:50 pm on November 2, 2024:
    non-blocking pico-nit: If this file gets touched again, could use f string instead of format. https://github.com/bitcoin/bitcoin/blob/master/test/functional/README.md#style-guidelines

    hodlinator commented at 11:11 pm on November 2, 2024:
    Good point! Resolved in latest push.
  30. tdb3 approved
  31. tdb3 commented at 10:59 pm on November 2, 2024: contributor

    Code Review re ACK 1148c4a53295e7635a813aadfde53ba5d973395c

    Nice work cleaning up the existing code around the core changes. Increases readability.

  32. hodlinator force-pushed on Nov 2, 2024
  33. tdb3 approved
  34. tdb3 commented at 11:15 pm on November 2, 2024: contributor
    re ACK 042cee7073b338d79b5176d4157cf08bbd079b9f
  35. in test/functional/feature_framework_rpc_failure.py:26 in 042cee7073 outdated
    21+    def set_test_params(self):
    22+        self.num_nodes = 1
    23+
    24+    def setup_network(self):
    25+        test_name = Path(__file__).name
    26+        self.log.info(f"{test_name}: Setting RPC timeout to 0 to simulate an unresponsive bitcoind")
    


    maflcko commented at 10:15 am on November 4, 2024:

    Any reason to log the file name here? The containing folder of the log should already have the name of the test/file. If not, the user should know which test they called to know it.

    If not, and this was helpful, it would be better to log it for all tests, instead of just for this one test specifically.


    hodlinator commented at 10:44 am on November 8, 2024:
    Wanted to make clear it wasn’t something internal to BitcoinTestFramework/TestNode deciding to set the RPC timeout. Replaced that with instead appending “, expect one warning” to the log message. (Edit: now pushed).
  36. in test/functional/test_framework/test_node.py:315 in 5ca92b17c5 outdated
    310@@ -311,23 +311,21 @@ def wait_for_rpc_connection(self, *, wait_for_import=True):
    311                 self.rpc_connected = True
    312                 self.url = self.rpc.rpc_url
    313                 return
    314-            except JSONRPCException as e:  # Initialization phase
    315+            except JSONRPCException as e:
    316+                # Avoid raising these, to possibly raise the FailedToStartError on next iteration.
    


    maflcko commented at 11:24 am on November 4, 2024:

    nit in 5ca92b17c5d2d5efa7b2246da1905bb8fd186230: It is obvious, but could mention the success case as well?

    Retry after these, until a permanent success or failure state (such as FailedToStartError, or timeout) is reached.?


    hodlinator commented at 10:47 am on November 8, 2024:
    Good point, changed to Suppress these in favor of a later outcome (success, FailedToStartError, or timeout).
  37. in test/functional/test_framework/test_node.py:317 in 5ca92b17c5 outdated
    310@@ -311,23 +311,21 @@ def wait_for_rpc_connection(self, *, wait_for_import=True):
    311                 self.rpc_connected = True
    312                 self.url = self.rpc.rpc_url
    313                 return
    314-            except JSONRPCException as e:  # Initialization phase
    315+            except JSONRPCException as e:
    316+                # Avoid raising these, to possibly raise the FailedToStartError on next iteration.
    317                 # -28 RPC in warmup
    318                 # -342 Service unavailable, RPC server started but is shutting down due to error
    


    maflcko commented at 11:36 am on November 4, 2024:
    I don’t think this is true, it is just one example. I think 342 can be raised by authproxy.py for various reasons.

    hodlinator commented at 10:48 am on November 8, 2024:
    Made comment less definitive.
  38. in test/functional/test_framework/test_node.py:331 in 5ca92b17c5 outdated
    330-                    pass  # Port not yet open?
    331-                else:
    332+                # ECONNRESET   - This might happen when the RPC server is in warmup, but shut down before the call to
    333+                #                getblockcount succeeds. Try again to properly raise the FailedToStartError
    334+                # ETIMEDOUT    - Treat identical to ECONNRESET
    335+                # ECONNREFUSED - Port not yet open?
    


    maflcko commented at 11:42 am on November 4, 2024:
    nit: Could just mention that the three are treated equal to “-342 Service unavailable”, which is already explained above?

    hodlinator commented at 10:48 am on November 8, 2024:
    Added: Suppress similarly to the above JSONRPCException errors:

    maflcko commented at 3:06 pm on November 8, 2024:

    I still don’t know if the current docs are correct and useful.

    Why would ETIMEDOUT only happen on warmup? Why is it even relevant to mention anything here?

    I think it would be clearer to just mention that the three are treated equal to “-342 Service unavailable”, which is already explained above?


    maflcko commented at 11:27 am on November 18, 2024:

    re #30660 (review)

    Are you still working on this?


    hodlinator commented at 10:19 pm on November 18, 2024:
    I am still working on this PR. Not thrilled about removing the 3 comment strings predating this PR.

    hodlinator commented at 10:35 am on November 26, 2024:
    I am satisfied with the PR in it’s latest revision, but if you don’t consider this a nit, but rather a blocker, let me know.
  39. in test/functional/test_framework/test_node.py:328 in 5ca92b17c5 outdated
    335+                # ECONNREFUSED - Port not yet open?
    336+                if e.errno not in [errno.ECONNRESET, errno.ETIMEDOUT, errno.ECONNREFUSED]:
    337                     raise  # unknown OS error
    338-            except ValueError as e:  # cookie file not found and no rpcuser or rpcpassword; bitcoind is still starting
    339+            except ValueError as e:
    340+                # If cookie file is missing and no rpcuser or rpcpassword; bitcoind is still starting, otherwise raise
    


    maflcko commented at 11:47 am on November 4, 2024:

    nit: I don’t think “bitcoind is still starting” is true either. It is just one example. Another example would be that the cookie file is (accidentally) disabled in the config.

    What about just `Retry if cookie file is not created and no rpcuser or rpcpassword."?


    hodlinator commented at 10:49 am on November 8, 2024:
    Agree, made comment less definitive, but still think it’s good to include the common cause: Suppress if cookie file is missing and no rpcuser or rpcpassword; bitcoind may be starting
  40. maflcko commented at 6:18 pm on November 6, 2024: member
    left some nits on the first commit. Will take a look at the others later.
  41. test refactor: wait_for_rpc_connection - Treat OSErrors the same
    ConnectionResetError is an OSError as well (ECONNRESET), no reason to have a separate except-block for it.
    
    Also improves comments for other exceptions and make condition above more Pythonic.
    5321eb089c
  42. hodlinator force-pushed on Nov 8, 2024
  43. hodlinator commented at 10:56 am on November 8, 2024: contributor
    Thanks for the reviews! Was trying to keep comments largely intact, but agree with @maflcko that they were somewhat incorrect, so altered them. Also renamed ignored_errors (introduced by this PR) to suppressed_errors since it felt more fitting and also matches the updated comments.
  44. DrahtBot added the label CI failed on Nov 8, 2024
  45. DrahtBot commented at 11:32 am on November 8, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32707688603

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  46. hodlinator force-pushed on Nov 8, 2024
  47. DrahtBot removed the label CI failed on Nov 8, 2024
  48. tdb3 approved
  49. tdb3 commented at 5:32 pm on November 9, 2024: contributor
    Code Review re ACK 68b0530d5fe867dbee5e551ecf6d491fecbc77f1
  50. m3dwards commented at 10:40 am on November 18, 2024: contributor

    Concept ACK

    I like the motivation to cleanup the logs and remove the cascade of exceptions.

    To test, I modified the included test to see what errors a user would have seen without the assertion:

     0--- a/test/functional/feature_framework_rpc_failure.py
     1+++ b/test/functional/feature_framework_rpc_failure.py
     2@@ -23,7 +23,8 @@ class FeatureFrameworkRPCFailure(BitcoinTestFramework):
     3         self.log.info(f"Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.")
     4         self.rpc_timeout = 0
     5
     6-        assert_raises_message(AssertionError, "[node 0] Unable to connect to bitcoind after 0s", BitcoinTestFramework.setup_network, self)
     7+        BitcoinTestFramework.setup_network(self)
     8
     9     def run_test(self):
    10         pass
    
     02024-11-18T09:59:33.936000Z TestFramework (INFO): PRNG seed is: 6704525597374825472
     12024-11-18T09:59:33.937000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_kccj2gaf
     22024-11-18T09:59:33.939000Z TestFramework (INFO): Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.
     32024-11-18T09:59:33.944000Z TestFramework (ERROR): Assertion failed
     4Traceback (most recent call last):
     5  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     6    node.wait_for_rpc_connection()
     7  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 335, in wait_for_rpc_connection
     8    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
     9  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 198, in _raise_assertion_error
    10    raise AssertionError(self._node_msg(msg))
    11AssertionError: [node 0] Unable to connect to bitcoind after 0s
    12
    13During handling of the above exception, another exception occurred:
    14
    15Traceback (most recent call last):
    16  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 131, in main
    17    self.setup()
    18  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 302, in setup
    19    self.setup_network()
    20  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 24, in setup_network
    21    BitcoinTestFramework.setup_network(self)
    22  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 396, in setup_network
    23    self.setup_nodes()
    24  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    25    self.start_nodes()
    26  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    27    self.stop_nodes()
    28  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    29    node.stop_node(wait=wait, wait_until_stopped=False)
    30  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 396, in stop_node
    31    self.stop(wait=wait)
    32    ^^^^^^^^^
    33  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 215, in __getattr__
    34    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    35           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    36AssertionError: [node 0] Error: no RPC connection
    372024-11-18T09:59:33.997000Z TestFramework (INFO): Stopping nodes
    38Traceback (most recent call last):
    39  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 32, in <module>
    40    FeatureFrameworkRPCFailure(__file__).main()
    41  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 155, in main
    42    exit_code = self.shutdown()
    43                ^^^^^^^^^^^^^^^
    44  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 318, in shutdown
    45    self.stop_nodes()
    46  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    47    node.stop_node(wait=wait, wait_until_stopped=False)
    48  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 396, in stop_node
    49    self.stop(wait=wait)
    50    ^^^^^^^^^
    51  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 215, in __getattr__
    52    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    53           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    54AssertionError: [node 0] Error: no RPC connection
    55[node 0] Cleaning up leftover process
    
     02024-11-18T10:09:52.135000Z TestFramework (INFO): PRNG seed is: 7321683449385937851
     12024-11-18T10:09:52.135000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a
     22024-11-18T10:09:52.137000Z TestFramework (INFO): Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.
     32024-11-18T10:09:52.141000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 47850 so that wait_until_stopped will not time out.
     42024-11-18T10:09:52.194000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
     7    self.setup()
     8  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
     9    self.setup_network()
    10  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 26, in setup_network
    11    BitcoinTestFramework.setup_network(self)
    12  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    13    self.setup_nodes()
    14  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    15    self.start_nodes()
    16  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
    17    node.wait_for_rpc_connection()
    18  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 344, in wait_for_rpc_connection
    19    self._raise_assertion_error(f"Unable to connect to bitcoind after {self.rpc_timeout}s (ignored errors: {str(dict(suppressed_errors))}, latest error: {latest_error})")
    20  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 199, in _raise_assertion_error
    21    raise AssertionError(self._node_msg(msg))
    22AssertionError: [node 0] Unable to connect to bitcoind after 0s (ignored errors: {}, latest error: )
    232024-11-18T10:09:52.250000Z TestFramework (INFO): Stopping nodes
    242024-11-18T10:09:52.250000Z TestFramework (WARNING): Not cleaning up dir /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a
    252024-11-18T10:09:52.250000Z TestFramework (ERROR): Test failed. Test logging available at /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a/test_framework.log
    262024-11-18T10:09:52.250000Z TestFramework (ERROR):
    272024-11-18T10:09:52.251000Z TestFramework (ERROR): Hint: Call /Users/max/source/bitcoin/test/functional/combine_logs.py '/var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a' to consolidate all logs
    282024-11-18T10:09:52.251000Z TestFramework (ERROR):
    292024-11-18T10:09:52.251000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    302024-11-18T10:09:52.251000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    312024-11-18T10:09:52.251000Z TestFramework (ERROR):
    

    I prefer the logs with this change!

    Next I modified bitcoind so that it would fail to start and ran a random functional test (p2p_blockfilters.py)

     02024-11-18T09:39:40.861000Z TestFramework (INFO): PRNG seed is: 8730495587434508112
     12024-11-18T09:39:40.862000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_0gw_zhhl
     22024-11-18T09:39:41.372000Z TestFramework (ERROR): Assertion failed
     3Traceback (most recent call last):
     4  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     5    node.wait_for_rpc_connection()
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 276, in wait_for_rpc_connection
     7    raise FailedToStartError(self._node_msg(
     8test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization.
     9
    10During handling of the above exception, another exception occurred:
    11
    12Traceback (most recent call last):
    13  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    14    self.setup()
    15  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    16    self.setup_network()
    17  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    18    self.setup_nodes()
    19  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    20    self.start_nodes()
    21  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    22    self.stop_nodes()
    23  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    24    node.stop_node(wait=wait, wait_until_stopped=False)
    25  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 396, in stop_node
    26    self.stop(wait=wait)
    27    ^^^^^^^^^
    28  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 215, in __getattr__
    29    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    30           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    31AssertionError: [node 0] Error: no RPC connection
    322024-11-18T09:39:41.431000Z TestFramework (INFO): Stopping nodes
    33Traceback (most recent call last):
    34  File "/Users/max/source/bitcoin/build/test/functional/p2p_blockfilters.py", line 285, in <module>
    35    CompactFiltersTest(__file__).main()
    36  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    37    exit_code = self.shutdown()
    38                ^^^^^^^^^^^^^^^
    39  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    40    self.stop_nodes()
    41  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    42    node.stop_node(wait=wait, wait_until_stopped=False)
    43  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 396, in stop_node
    44    self.stop(wait=wait)
    45    ^^^^^^^^^
    46  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 215, in __getattr__
    47    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    48           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    49AssertionError: [node 0] Error: no RPC connection
    50[node 1] Cleaning up leftover process
    51[node 0] Cleaning up leftover process
    
     02024-11-15T13:07:28.308000Z TestFramework (INFO): PRNG seed is: 2784953090753807617
     12024-11-15T13:07:28.308000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_udgky3fr
     22024-11-15T13:07:28.821000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26867 so that wait_until_stopped will not time out.
     32024-11-15T13:07:28.821000Z TestFramework.node1 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26868 so that wait_until_stopped will not time out.
     42024-11-15T13:07:28.821000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 279, in wait_for_rpc_connection
     9    raise FailedToStartError(self._node_msg(
    10test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization.
    11
    12During handling of the above exception, another exception occurred:
    13
    14Traceback (most recent call last):
    15  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    16    self.setup()
    17  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    18    self.setup_network()
    19  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    20    self.setup_nodes()
    21  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    22    self.start_nodes()
    23  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    24    self.stop_nodes()
    25  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 591, in stop_nodes
    26    node.wait_until_stopped()
    27  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in wait_until_stopped
    28    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    29  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 860, in wait_until
    30    return wait_until_helper_internal(test_function, timeout=timeout, timeout_factor=self.timeout_factor)
    31           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    32  File "/Users/max/source/bitcoin/test/functional/test_framework/util.py", line 291, in wait_until_helper_internal
    33    if predicate():
    34       ^^^^^^^^^^^
    35  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in <lambda>
    36    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    37                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    38  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 439, in is_node_stopped
    39    assert return_code == expected_ret_code, self._node_msg(
    40           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    41AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping
    422024-11-15T13:07:28.880000Z TestFramework (INFO): Stopping nodes
    432024-11-15T13:07:28.881000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26867 so that wait_until_stopped will not time out.
    442024-11-15T13:07:28.881000Z TestFramework.node1 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26868 so that wait_until_stopped will not time out.
    45Traceback (most recent call last):
    46  File "/Users/max/source/bitcoin/build/test/functional/p2p_blockfilters.py", line 285, in <module>
    47    CompactFiltersTest(__file__).main()
    48  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    49    exit_code = self.shutdown()
    50                ^^^^^^^^^^^^^^^
    51  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    52    self.stop_nodes()
    53  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 591, in stop_nodes
    54    node.wait_until_stopped()
    55  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in wait_until_stopped
    56    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    57  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 860, in wait_until
    58    return wait_until_helper_internal(test_function, timeout=timeout, timeout_factor=self.timeout_factor)
    59           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    60  File "/Users/max/source/bitcoin/test/functional/test_framework/util.py", line 291, in wait_until_helper_internal
    61    if predicate():
    62       ^^^^^^^^^^^
    63  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in <lambda>
    64    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    65                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    66  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 439, in is_node_stopped
    67    assert return_code == expected_ret_code, self._node_msg(
    68           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    69AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping
    70[node 1] Cleaning up leftover process 26868
    71[node 0] Cleaning up leftover process 26867
    

    I’m not sure if it’s the intention of this PR to improve things in that scenario but the log doesn’t seem much better and if anything it was slightly easier to understand before.

    Before this change the logs showed AssertionError: [node 0] Error: no RPC connection and after this change that is now AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping. I feel like no RPC connection is the better error here?

  51. hodlinator force-pushed on Nov 18, 2024
  52. hodlinator commented at 11:13 pm on November 18, 2024: contributor

    Thanks @m3dwards for having a look!

    I had not tested the case of bitcoind exiting early together with my suggested change here (focused more on stalling as in one of the issues inspiring it).

    Made some changes to this part of TestNode.stop_node() to handle an already dead process:

    https://github.com/bitcoin/bitcoin/blob/ba28593147ccc880c3a4d40db0d4ef57f4766254/test/functional/test_framework/test_node.py#L412-L420

    With those changes, the output for such a case is now:

     0 build/test/functional/p2p_blockfilters.py 
     12024-11-18T23:01:17.204000Z TestFramework (INFO): PRNG seed is: 5667811122664278008
     22024-11-18T23:01:17.205000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_mzr3jg96
     32024-11-18T23:01:17.458000Z TestFramework.node0 (WARNING): Process 175180 already died with return code 1.
     42024-11-18T23:01:17.458000Z TestFramework.node1 (WARNING): Process 175181 already died with return code 1.
     52024-11-18T23:01:17.459000Z TestFramework (ERROR): Unexpected exception caught during testing
     6Traceback (most recent call last):
     7  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
     8    self.setup()
     9  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    10    self.setup_network()
    11  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    12    self.setup_nodes()
    13  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    14    self.start_nodes()
    15  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
    16    node.wait_for_rpc_connection()
    17  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 279, in wait_for_rpc_connection
    18    raise FailedToStartError(self._node_msg(
    19test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization. 
    202024-11-18T23:01:17.513000Z TestFramework (INFO): Stopping nodes
    212024-11-18T23:01:17.513000Z TestFramework (WARNING): Not cleaning up dir /tmp/bitcoin_func_test_mzr3jg96
    222024-11-18T23:01:17.513000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/bitcoin_func_test_mzr3jg96/test_framework.log
    232024-11-18T23:01:17.513000Z TestFramework (ERROR): 
    242024-11-18T23:01:17.513000Z TestFramework (ERROR): Hint: Call /home/hodlinator/bitcoin/test/functional/combine_logs.py '/tmp/bitcoin_func_test_mzr3jg96' to consolidate all logs
    252024-11-18T23:01:17.513000Z TestFramework (ERROR): 
    262024-11-18T23:01:17.514000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    272024-11-18T23:01:17.514000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    282024-11-18T23:01:17.514000Z TestFramework (ERROR):
    
  53. m3dwards commented at 1:13 pm on November 21, 2024: contributor

    Made some changes to this part of TestNode.stop_node() to handle an already dead process

    Looks cleaner.

    ACK ba28593147ccc880c3a4d40db0d4ef57f4766254

  54. DrahtBot requested review from tdb3 on Nov 21, 2024
  55. in test/functional/test_framework/test_node.py:8 in ba28593147 outdated
    4@@ -5,6 +5,7 @@
    5 """Class for bitcoind node under test"""
    6 
    7 import contextlib
    8+from collections import Counter
    


    hodlinator commented at 8:31 pm on December 3, 2024:
    (Could possibly use collections.defaultdict(int) instead of Counter iff I really need to retouch).
  56. DrahtBot added the label CI failed on Dec 11, 2024
  57. DrahtBot commented at 1:30 am on December 11, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33168290506

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  58. test: Include ignored errors in RPC connection timeout
    When an RPC connection attempt with bitcoind times out, include which ignored errors occurred in the exception message.
    
    May provide clues of what has gone wrong.
    dbb6bc4c29
  59. test: Allow changing expected internal return code
    To be used in next commit.
    
    Also adds process IDs in relevant log messages.
    db0e1580a5
  60. test: Kill process if stop-RPC is not available
    `BitcoinTestFramework.start_nodes()`, upon failing to establish RPC connections, calls `stop_nodes()` which would raise an exception immediately through attempting to use the non-existent RPC interface, preventing a clean shutdown.
    
    We now avoid calling the RPC interface and instead kill the bitcoind process, so that subsequent wait_until_stopped() calls will succeed instantly instead of waiting for the full timeout only to raise another exception.
    
    Since explicit killing means the bitcoind process will not exit with return code 0, we adjust the expected value.
    
    Avoid beating the dead horse if the process has already exited by the time stop_node() is called.
    4ab7879ec6
  61. test: assert_raises_message - don't require JSONRPCException type layout
    Not all exceptions have an .error dict containing a 'message' key. (A leftover from before the separate assert_raises_rpc_error was introduced).
    
    The function is not actually called as of this commit, but will start getting used again in the next commit.
    4954847d33
  62. test: Add feature_framework_rpc_failure.py
    Verifies correct behavior when failing to connect to bitcoind RPC interface during setup. Parent commit fixes issues where a raised exception raised additional knock-on exceptions.
    f67d896357
  63. hodlinator force-pushed on Dec 11, 2024
  64. hodlinator commented at 8:29 pm on December 11, 2024: contributor
    • Fixed ruff linter error reported by CI due to #31417 being merged.
    • Added feature_framework_stop_node.py-test to verify clean shutdown of TestNode when the bitcoind-process already exited/crashed, issue found earlier by m3dwards. (Also added comment justifying setting of self.expected_ret_code in test_node.py).
    • Switched from Counter to simpler collections.defaultdict(int)-type.
    • Updated copyright headers to include “-present”.
  65. test: Add feature_framework_stop_node.py 608b9ff097
  66. hodlinator force-pushed on Dec 11, 2024
  67. DrahtBot removed the label CI failed on Dec 11, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me