qa: Shut down framework cleanly on RPC connection failure #30660

pull hodlinator wants to merge 8 commits into bitcoin:master from hodlinator:2024-08/test_node_rpc_connection_failure changing 8 files +211 −41
  1. hodlinator commented at 11:53 am on August 15, 2024: contributor
    • BitcoinTestFramework.start_nodes(), upon failing to establish RPC connections and calling stop_nodes(), no longer attempts to call the stop()-RPC, which would raise additional exceptions, making the shutdown less clean.

    • Since we now avoid aforementioned additional exceptions which in some cases provided useful clues, we make up for that by capturing information about the original exceptions being triggered and ignored within TestNode.wait_for_rpc_connection(). This means we now have the more informative ignored errors-section: AssertionError: [node 0] Unable to connect to bitcoind after 60s (ignored errors: {'missing_credentials': 1, 'OSError.ECONNREFUSED': 239}, latest error: ConnectionRefusedError(111, 'Connection refused'))

    As long as the increased complexity and maintenance burden is low, exposing previously silenced exceptions in combination with having a cleaner shutdown clears the muddy waters for devs trying to figure out root causes.

    Related: #30390

  2. DrahtBot commented at 11:53 am on August 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30660.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK tdb3, m3dwards, ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31620 (test: Remove –noshutdown flag, Tidy startup failures by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Aug 15, 2024
  4. hodlinator commented at 11:54 am on August 15, 2024: contributor

    Encountered a case of #30390 here: https://github.com/bitcoin/bitcoin/actions/runs/10380115902/job/28739417666?pr=30377

    Output with only the test (no fix commit):

     0$ test/functional/feature_framework_rpc_failure.py
     12024-08-15T07:30:20.427000Z TestFramework (INFO): PRNG seed is: 635212259632049987
     22024-08-15T07:30:20.427000Z TestFramework (INFO): Initializing test directory /run/user/1001/bitcoin_func_test_ujsvbdnm
     32024-08-15T07:30:20.428000Z TestFramework (INFO): Forcing us to timeout instantly to simulate an unresponsive bitcoind
     42024-08-15T07:30:20.429000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 329, in wait_for_rpc_connection
     9    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    10  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 192, in _raise_assertion_error
    11    raise AssertionError(self._node_msg(msg))
    12AssertionError: [node 0] Unable to connect to bitcoind after 0s
    13
    14During handling of the above exception, another exception occurred:
    15
    16Traceback (most recent call last):
    17  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 21, in setup_network
    18    BitcoinTestFramework.setup_network(self)
    19  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    20    self.setup_nodes()
    21  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    22    self.start_nodes()
    23  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    24    self.stop_nodes()
    25  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    26    node.stop_node(wait=wait, wait_until_stopped=False)
    27  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 390, in stop_node
    28    self.stop(wait=wait)
    29    ^^^^^^^^^
    30  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 209, in __getattr__
    31    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    32           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    33AssertionError: [node 0] Error: no RPC connection
    34
    35During handling of the above exception, another exception occurred:
    36
    37Traceback (most recent call last):
    38  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    39    self.setup()
    40  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    41    self.setup_network()
    42  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 23, in setup_network
    43    assert_equal(str(e), "[node 0] Unable to connect to bitcoind after 0s")
    44  File "/home/hodlinator/bitcoin/test/functional/test_framework/util.py", line 75, in assert_equal
    45    raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    46AssertionError: not([node 0] Error: no RPC connection == [node 0] Unable to connect to bitcoind after 0s)
    472024-08-15T07:30:20.480000Z TestFramework (INFO): Stopping nodes
    48Traceback (most recent call last):
    49  File "/home/hodlinator/bitcoin/test/functional/feature_framework_rpc_failure.py", line 32, in <module>
    50    FeatureFrameworkRPCFailure(__file__).main()
    51  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    52    exit_code = self.shutdown()
    53                ^^^^^^^^^^^^^^^
    54  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    55    self.stop_nodes()
    56  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    57    node.stop_node(wait=wait, wait_until_stopped=False)
    58  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 390, in stop_node
    59    self.stop(wait=wait)
    60    ^^^^^^^^^
    61  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 209, in __getattr__
    62    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    63           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    64AssertionError: [node 0] Error: no RPC connection
    65[node 0] Cleaning up leftover process
    

    With fix:

    0$ test/functional/feature_framework_rpc_failure.py
    12024-08-15T07:30:05.092000Z TestFramework (INFO): PRNG seed is: 1672695534155872365
    22024-08-15T07:30:05.092000Z TestFramework (INFO): Initializing test directory /run/user/1001/bitcoin_func_test_cow4vrf7
    32024-08-15T07:30:05.093000Z TestFramework (INFO): Forcing us to timeout instantly to simulate an unresponsive bitcoind
    42024-08-15T07:30:05.094000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 326500.
    52024-08-15T07:30:05.144000Z TestFramework (INFO): Caught AssertionError with expected message
    62024-08-15T07:30:05.194000Z TestFramework (INFO): Stopping nodes
    72024-08-15T07:30:05.194000Z TestFramework (INFO): Cleaning up /run/user/1001/bitcoin_func_test_cow4vrf7 on exit
    82024-08-15T07:30:05.194000Z TestFramework (INFO): Tests successful
    
  5. maflcko commented at 12:32 pm on August 15, 2024: member
    Can you explain this a bit more? I think optimizing the test framework to cleanly shut down on a failure is a bit premature, and may not be needed. However, if a failing test may leave behind a dangling bitcoind process, this seems like something to fix, because it will otherwise lead to test warnings and failures down the road.
  6. hodlinator force-pushed on Aug 18, 2024
  7. hodlinator commented at 8:23 pm on August 18, 2024: contributor

    Can you explain this a bit more? I think optimizing the test framework to cleanly shut down on a failure is a bit premature, and may not be needed.

    As long as the additional complexity and maintenance burden is low, I think it is worth having a cleaner shutdown to avoid muddying the waters for people trying to figure out root causes.

    However, if a failing test may leave behind a dangling bitcoind process, this seems like something to fix, because it will otherwise lead to test warnings and failures down the road.

    It does clean up dangling bitcoind processes without the fix, as can be seen in the log:

    0[node 0] Cleaning up leftover process
    

    The latest push (bcc87514e780cd35d1f71d5c4c4069a8fac95324 0ea00cddc3d4a3f92c6bfae13a8b844e97ebc8c3) now explains in the log message that it is terminating the process to avoid wait_until_stopped timing out (which raises an additional exception).

    + Broke out a commit adding the self.expected_ret_code functionality, so that the fix commit in the middle is clearer. + Attempted to make the comment at the top of the test file more explanatory. + Switched from process.kill() to the somewhat less aggressive process.terminate() (although on second thoughts, if the RPC interface is not responding in a real-world situation outside of the added test, maybe .kill() is closer to what we want).

  8. hodlinator force-pushed on Aug 18, 2024
  9. hodlinator commented at 8:27 pm on August 26, 2024: contributor

    Last week I was hit by this again. Probably line ending differences on Windows makes any additional Python exception reporting extra annoying to read due to every second line being blank (separate issue, but slightly improved by this PR producing less output).

    Git-sleuthing to help motivate this PR

    Was this ever working?

    Unfortunately for my case, it seems the answer is no: 7897338918dac072e788b8ab2919d4559f311bef (Aug, 2017) broke out test_node.py from test_framework.py added the self.rpc_connected flag. Would have been nice if it had protected use of the stop() RPC using this new flag (like in this PR) but unfortunately it did not.

    Fixes in the same area

    fafb33cdefd2d8ce065263978075d26a1672b630 (May, 2016) MarcoFalke wrapped stop() attempts in try/except so at least other nodes would have a chance of shutting down properly. (6 days later in fab5233fe6c99d446a2cebc48cf479c3f026c2cc a timeout for waiting on the process to shutdown was added too, probably so the tests wouldn’t hang waiting for a non-crashed bitcoind that we failed to send stop() to).

    #14670 (Dec, 2018) is tangential and adds test/functional/feature_shutdown.py in 28479f926f21f2a91bec5a06671c60e5b0c55532, to ensure that HTTP responses for stop() make it all the way back to the client if I’m understanding it correctly. It fixes #11777 (Nov, 2017) which is arguably about shutdowns not being clean enough (sometimes showing additional errors depending on timing). The current PR is certainly more nit-level, but still continuing in the same general direction.

  10. hodlinator commented at 10:07 pm on August 26, 2024: contributor

    Another argument for this PR is that even the current title of #30390 is “ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it”. This in itself when looking at logs like https://github.com/bitcoin/bitcoin/actions/runs/9677128783/job/26698120754#step:27:225 is actually just a knock-on exception happening after the RPC connection attempt times out and stop()-RPC is called in vain:

     0313/313 - wallet_listsinceblock.py --legacy-wallet failed, Duration: 2415 s
     1stdout:
     22024-06-26T09:53:17.196000Z TestFramework (INFO): PRNG seed is: 3062882430636553844
     32024-06-26T09:53:17.227000Z TestFramework (INFO): Initializing test directory D:\a\_temp\test_runner__🏃_20240626_093314\wallet_listsinceblock_88
     42024-06-26T10:33:29.783000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 314, in wait_for_rpc_connection
     9    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    10  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 188, in _raise_assertion_error
    11    raise AssertionError(self._node_msg(msg))
    12AssertionError: [node 3] Unable to connect to bitcoind after 2400s
    13
    14During handling of the above exception, another exception occurred:
    15
    16Traceback (most recent call last):
    17  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 131, in main
    18    self.setup()
    19  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 302, in setup
    20    self.setup_network()
    21  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 396, in setup_network
    22    self.setup_nodes()
    23  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 418, in setup_nodes
    24    self.start_nodes()
    25  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 572, in start_nodes
    26    self.stop_nodes()
    27  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 587, in stop_nodes
    28    node.stop_node(wait=wait, wait_until_stopped=False)
    29  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 375, in stop_node
    30    self.stop(wait=wait)
    31    ^^^^^^^^^
    32  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 205, in __getattr__
    33    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    34           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    35AssertionError: [node 3] Error: no RPC connection
    362024-06-26T10:33:29.842000Z TestFramework (INFO): Stopping nodes
    37[node 3] Cleaning up leftover process
    38[node 2] Cleaning up leftover process
    39[node 1] Cleaning up leftover process
    40[node 0] Cleaning up leftover process
    41
    42stderr:
    43Traceback (most recent call last):
    44  File "D:\a\bitcoin\bitcoin/test/functional/wallet_listsinceblock.py", line 508, in <module>
    45    ListSinceBlockTest().main()
    46  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 155, in main
    47    exit_code = self.shutdown()
    48                ^^^^^^^^^^^^^^^
    49  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 318, in shutdown
    50    self.stop_nodes()
    51  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_framework.py", line 587, in stop_nodes
    52    node.stop_node(wait=wait, wait_until_stopped=False)
    53  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\test_node.py", line 375, in stop_node
    54    self.stop(wait=wait)
    55  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\coverage.py", line 50, in __call__
    56    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    57                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    58  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\authproxy.py", line 127, in __call__
    59    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    60                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    61  File "D:\a\bitcoin\bitcoin\test\functional\test_framework\authproxy.py", line 105, in _request
    62    self.__conn.request(method, path, postdata, headers)
    63  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1336, in request
    64    self._send_request(method, url, body, headers, encode_chunked)
    65  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1382, in _send_request
    66    self.endheaders(body, encode_chunked=encode_chunked)
    67  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1331, in endheaders
    68    self._send_output(message_body, encode_chunked=encode_chunked)
    69  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1091, in _send_output
    70    self.send(msg)
    71  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1035, in send
    72    self.connect()
    73  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\http\client.py", line 1001, in connect
    74    self.sock = self._create_connection(
    75                ^^^^^^^^^^^^^^^^^^^^^^^^
    76  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\socket.py", line 853, in create_connection
    77    raise exceptions[0]
    78  File "C:\hostedtoolcache\windows\Python\3.12.4\x64\Lib\socket.py", line 838, in create_connection
    79    sock.connect(sa)
    80ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
    

    A better title might have been “Unable to connect to bitcoind after 2400s errors on Windows”, which this PR would have made it more likely to be in the first place (had it predated the issue).

  11. hodlinator force-pushed on Aug 27, 2024
  12. hodlinator commented at 9:54 pm on August 27, 2024: contributor

    After more investigation into #30390, my impression is that since TestNode.wait_for_rpc_connection() ignores a bunch of exceptions, the knock-on exceptions when calling stop() and failing prior to this PR actually may sometimes provide a clue for why the RPC connection wasn’t established in the first place.

    That realization made me add 2 more commits here, which provide more information on the original exceptions being triggered and ignored inside wait_for_rpc_connection. Only fair since we are avoiding the triggering of knock-on exceptions later.

    The timeout exception’s message thereby now includes counters of encountered (and ignored) exceptions + latest exception type & message: Unable to connect to bitcoind after 60s (ignored errors: {'missing_credentials': 5}, latest error: ValueError('No RPC credentials'))

  13. DrahtBot added the label CI failed on Aug 27, 2024
  14. DrahtBot removed the label CI failed on Sep 29, 2024
  15. DrahtBot added the label CI failed on Oct 13, 2024
  16. achow101 requested review from josibake on Oct 15, 2024
  17. achow101 requested review from willcl-ark on Oct 15, 2024
  18. achow101 commented at 3:30 pm on October 15, 2024: member
    cc @tdb3
  19. DrahtBot removed the label CI failed on Oct 18, 2024
  20. hodlinator force-pushed on Oct 21, 2024
  21. hodlinator commented at 10:14 pm on October 21, 2024: contributor
    (Rebased onto latest master).
  22. in test/functional/test_framework/test_node.py:324 in 5c95843281 outdated
    320+                    ignored_errors["JSONRPCException -28"] += 1
    321                 # -342 Service unavailable, RPC server started but is shutting down due to error
    322-                if e.error['code'] != -28 and e.error['code'] != -342:
    323+                # - Ignore the exception to properly raise the FailedToStartError.
    324+                elif e.error['code'] == -342:
    325+                    ignored_errors["JSONRPCException -342"] += 1
    


    tdb3 commented at 6:19 pm on October 31, 2024:

    non-blocking nit: might be a bit more Pythonic to use Counter.update() rather than += 1. For example:

    0ignored_errors.update([f"JSONRPCException {e.error['code']}"])
    

    hodlinator commented at 8:23 pm on November 2, 2024:

    Tried using update(), but it resulted in things like:

    0AssertionError: [node 0] Unable to connect to bitcoind after 10s (ignored errors: {'m': 40, 'i': 120, 's': 120, 'n': 80, 'g': 40, '_': 40, 'c': 40, 'r': 40, 'e': 80, 'd': 40, 't': 40, 'a': 40, 'l': 40}, latest error: ValueError('No RPC credentials'))
    

    i.e. it was iterating over the string and incrementing an entry for each unique character, which seems to match the docs even though not super clear.~


    Edit: @tdb3 pointed out out of band than I’d probably missed out the []-brackets around the strings to make update() iterate through whole strings instead of chars.

    0ignored_errors.update([f"JSONRPCException {e.error['code']}"])
    

    feels slightly more contrived and verbose to me than

    0ignored_errors[f"JSONRPCException {e.error['code']}"] += 1
    

    but neither is too elegant.

    Might switch away from Counter if something more appropriate is found.

  23. in test/functional/test_framework/test_node.py:336 in 5c95843281 outdated
    335                 elif e.errno == errno.ETIMEDOUT:
    336-                    pass # Treat identical to ECONNRESET
    337+                    ignored_errors["OSError.ETIMEDOUT"] += 1 # Treat identical to ECONNRESET
    338                 elif e.errno == errno.ECONNREFUSED:
    339-                    pass  # Port not yet open?
    340+                    ignored_errors["OSError.ECONNREFUSED"] += 1 # Port not yet open?
    


    tdb3 commented at 6:27 pm on October 31, 2024:

    non-blocking nit: same with these. e.g.

    0ignored_errors.update([errno.errorcode[e.errno]])
    

    hodlinator commented at 8:21 pm on November 2, 2024:
    (Same issue with update(), see #30660 (review)).
  24. in test/functional/feature_framework_rpc_failure.py:35 in 9173b05600 outdated
    30+            BitcoinTestFramework.setup_network(self)
    31+        except AssertionError as e:
    32+            assert_equal(str(e)[:47], "[node 0] Unable to connect to bitcoind after 0s")
    33+            self.log.info(f"{test_name}: Caught AssertionError with expected RPC connection failure message")
    34+        else:
    35+            raise AssertionError("Didn't raise expected error")
    


    tdb3 commented at 6:47 pm on October 31, 2024:
    Maybe we can use util.assert_raises() or util.assert_raises_message() here to keep things more concise?

    hodlinator commented at 8:22 pm on November 2, 2024:
    Had to add a commit (2a0c0e410e9f32dbf7af7229457889b09553c438) that refurbishes assert_raises_message(), but test is much cleaner now!
  25. tdb3 approved
  26. tdb3 commented at 6:49 pm on October 31, 2024: contributor

    Code review ACK 9173b056006fd3940675108aa3fe0d0d855af768

    This seems quite useful when troubleshooting errors. Left a few nits, but nothing I feel very strongly about.

  27. hodlinator force-pushed on Nov 2, 2024
  28. hodlinator commented at 8:26 pm on November 2, 2024: contributor

    Thanks for the review @tdb3!

    Implemented feedback from that review, made the code a bit more Pythonic and improved comments slightly.

  29. in test/functional/test_framework/test_node.py:341 in b6b30c2d9c outdated
    337                     raise
    338+                ignored_errors["missing_credentials"] += 1
    339+                latest_error = repr(e)
    340             time.sleep(1.0 / poll_per_s)
    341-        self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
    342+        self._raise_assertion_error("Unable to connect to bitcoind after {}s (ignored errors: {}, latest error: {})".format(self.rpc_timeout, str(dict(ignored_errors)), latest_error))
    


    tdb3 commented at 10:50 pm on November 2, 2024:
    non-blocking pico-nit: If this file gets touched again, could use f string instead of format. https://github.com/bitcoin/bitcoin/blob/master/test/functional/README.md#style-guidelines

    hodlinator commented at 11:11 pm on November 2, 2024:
    Good point! Resolved in latest push.
  30. tdb3 approved
  31. tdb3 commented at 10:59 pm on November 2, 2024: contributor

    Code Review re ACK 1148c4a53295e7635a813aadfde53ba5d973395c

    Nice work cleaning up the existing code around the core changes. Increases readability.

  32. hodlinator force-pushed on Nov 2, 2024
  33. tdb3 approved
  34. tdb3 commented at 11:15 pm on November 2, 2024: contributor
    re ACK 042cee7073b338d79b5176d4157cf08bbd079b9f
  35. in test/functional/feature_framework_rpc_failure.py:26 in 042cee7073 outdated
    21+    def set_test_params(self):
    22+        self.num_nodes = 1
    23+
    24+    def setup_network(self):
    25+        test_name = Path(__file__).name
    26+        self.log.info(f"{test_name}: Setting RPC timeout to 0 to simulate an unresponsive bitcoind")
    


    maflcko commented at 10:15 am on November 4, 2024:

    Any reason to log the file name here? The containing folder of the log should already have the name of the test/file. If not, the user should know which test they called to know it.

    If not, and this was helpful, it would be better to log it for all tests, instead of just for this one test specifically.


    hodlinator commented at 10:44 am on November 8, 2024:
    Wanted to make clear it wasn’t something internal to BitcoinTestFramework/TestNode deciding to set the RPC timeout. Replaced that with instead appending “, expect one warning” to the log message. (Edit: now pushed).
  36. in test/functional/test_framework/test_node.py:315 in 5ca92b17c5 outdated
    310@@ -311,23 +311,21 @@ def wait_for_rpc_connection(self, *, wait_for_import=True):
    311                 self.rpc_connected = True
    312                 self.url = self.rpc.rpc_url
    313                 return
    314-            except JSONRPCException as e:  # Initialization phase
    315+            except JSONRPCException as e:
    316+                # Avoid raising these, to possibly raise the FailedToStartError on next iteration.
    


    maflcko commented at 11:24 am on November 4, 2024:

    nit in 5ca92b17c5d2d5efa7b2246da1905bb8fd186230: It is obvious, but could mention the success case as well?

    Retry after these, until a permanent success or failure state (such as FailedToStartError, or timeout) is reached.?


    hodlinator commented at 10:47 am on November 8, 2024:
    Good point, changed to Suppress these in favor of a later outcome (success, FailedToStartError, or timeout).
  37. in test/functional/test_framework/test_node.py:317 in 5ca92b17c5 outdated
    310@@ -311,23 +311,21 @@ def wait_for_rpc_connection(self, *, wait_for_import=True):
    311                 self.rpc_connected = True
    312                 self.url = self.rpc.rpc_url
    313                 return
    314-            except JSONRPCException as e:  # Initialization phase
    315+            except JSONRPCException as e:
    316+                # Avoid raising these, to possibly raise the FailedToStartError on next iteration.
    317                 # -28 RPC in warmup
    318                 # -342 Service unavailable, RPC server started but is shutting down due to error
    


    maflcko commented at 11:36 am on November 4, 2024:
    I don’t think this is true, it is just one example. I think 342 can be raised by authproxy.py for various reasons.

    hodlinator commented at 10:48 am on November 8, 2024:
    Made comment less definitive.
  38. in test/functional/test_framework/test_node.py:331 in 5ca92b17c5 outdated
    330-                    pass  # Port not yet open?
    331-                else:
    332+                # ECONNRESET   - This might happen when the RPC server is in warmup, but shut down before the call to
    333+                #                getblockcount succeeds. Try again to properly raise the FailedToStartError
    334+                # ETIMEDOUT    - Treat identical to ECONNRESET
    335+                # ECONNREFUSED - Port not yet open?
    


    maflcko commented at 11:42 am on November 4, 2024:
    nit: Could just mention that the three are treated equal to “-342 Service unavailable”, which is already explained above?

    hodlinator commented at 10:48 am on November 8, 2024:
    Added: Suppress similarly to the above JSONRPCException errors:

    maflcko commented at 3:06 pm on November 8, 2024:

    I still don’t know if the current docs are correct and useful.

    Why would ETIMEDOUT only happen on warmup? Why is it even relevant to mention anything here?

    I think it would be clearer to just mention that the three are treated equal to “-342 Service unavailable”, which is already explained above?


    maflcko commented at 11:27 am on November 18, 2024:

    re #30660 (review)

    Are you still working on this?


    hodlinator commented at 10:19 pm on November 18, 2024:
    I am still working on this PR. Not thrilled about removing the 3 comment strings predating this PR.

    hodlinator commented at 10:35 am on November 26, 2024:
    I am satisfied with the PR in it’s latest revision, but if you don’t consider this a nit, but rather a blocker, let me know.

    ryanofsky commented at 7:57 pm on January 6, 2025:

    In commit “test refactor: wait_for_rpc_connection - Treat OSErrors the same” (5321eb089cd1b39160038050a6ef6f27fd6f4ef9)

    I think these documentation changes look ok, but looking at old and new versions side by side I mostly found the previous comments easier to follow because they were more direct and interspersed with relevant code.

    E.g. the previous “# Initialization phase” comment next to the exception being caught told me that the exception was expected during initialization. The current comment “Suppress these in favor of a later outcome” tells me the code does not care about this exception as much as other exceptions, but not why.

    Am fine with both versions though.


    hodlinator commented at 4:43 pm on January 9, 2025:

    I mostly found the previous comments easier to follow because they were more direct and interspersed with relevant code.

    Hopefully this is from the latest push can be considered an improvement:

    0if e.errno not in [ errno.ECONNRESET,     # This might happen when the RPC server is in warmup,
    1                                          # but shut down before the call to getblockcount succeeds.
    2                    errno.ETIMEDOUT,      # Treat identical to ECONNRESET
    3                    errno.ECONNREFUSED ]: # Port not yet open?
    
  39. in test/functional/test_framework/test_node.py:328 in 5ca92b17c5 outdated
    335+                # ECONNREFUSED - Port not yet open?
    336+                if e.errno not in [errno.ECONNRESET, errno.ETIMEDOUT, errno.ECONNREFUSED]:
    337                     raise  # unknown OS error
    338-            except ValueError as e:  # cookie file not found and no rpcuser or rpcpassword; bitcoind is still starting
    339+            except ValueError as e:
    340+                # If cookie file is missing and no rpcuser or rpcpassword; bitcoind is still starting, otherwise raise
    


    maflcko commented at 11:47 am on November 4, 2024:

    nit: I don’t think “bitcoind is still starting” is true either. It is just one example. Another example would be that the cookie file is (accidentally) disabled in the config.

    What about just `Retry if cookie file is not created and no rpcuser or rpcpassword."?


    hodlinator commented at 10:49 am on November 8, 2024:
    Agree, made comment less definitive, but still think it’s good to include the common cause: Suppress if cookie file is missing and no rpcuser or rpcpassword; bitcoind may be starting
  40. maflcko commented at 6:18 pm on November 6, 2024: member
    left some nits on the first commit. Will take a look at the others later.
  41. hodlinator force-pushed on Nov 8, 2024
  42. hodlinator commented at 10:56 am on November 8, 2024: contributor
    Thanks for the reviews! Was trying to keep comments largely intact, but agree with @maflcko that they were somewhat incorrect, so altered them. Also renamed ignored_errors (introduced by this PR) to suppressed_errors since it felt more fitting and also matches the updated comments.
  43. DrahtBot added the label CI failed on Nov 8, 2024
  44. DrahtBot commented at 11:32 am on November 8, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32707688603

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  45. hodlinator force-pushed on Nov 8, 2024
  46. DrahtBot removed the label CI failed on Nov 8, 2024
  47. tdb3 approved
  48. tdb3 commented at 5:32 pm on November 9, 2024: contributor
    Code Review re ACK 68b0530d5fe867dbee5e551ecf6d491fecbc77f1
  49. m3dwards commented at 10:40 am on November 18, 2024: contributor

    Concept ACK

    I like the motivation to cleanup the logs and remove the cascade of exceptions.

    To test, I modified the included test to see what errors a user would have seen without the assertion:

     0--- a/test/functional/feature_framework_rpc_failure.py
     1+++ b/test/functional/feature_framework_rpc_failure.py
     2@@ -23,7 +23,8 @@ class FeatureFrameworkRPCFailure(BitcoinTestFramework):
     3         self.log.info(f"Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.")
     4         self.rpc_timeout = 0
     5
     6-        assert_raises_message(AssertionError, "[node 0] Unable to connect to bitcoind after 0s", BitcoinTestFramework.setup_network, self)
     7+        BitcoinTestFramework.setup_network(self)
     8
     9     def run_test(self):
    10         pass
    
     02024-11-18T09:59:33.936000Z TestFramework (INFO): PRNG seed is: 6704525597374825472
     12024-11-18T09:59:33.937000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_kccj2gaf
     22024-11-18T09:59:33.939000Z TestFramework (INFO): Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.
     32024-11-18T09:59:33.944000Z TestFramework (ERROR): Assertion failed
     4Traceback (most recent call last):
     5  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     6    node.wait_for_rpc_connection()
     7  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 335, in wait_for_rpc_connection
     8    self._raise_assertion_error("Unable to connect to bitcoind after {}s".format(self.rpc_timeout))
     9  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 198, in _raise_assertion_error
    10    raise AssertionError(self._node_msg(msg))
    11AssertionError: [node 0] Unable to connect to bitcoind after 0s
    12
    13During handling of the above exception, another exception occurred:
    14
    15Traceback (most recent call last):
    16  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 131, in main
    17    self.setup()
    18  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 302, in setup
    19    self.setup_network()
    20  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 24, in setup_network
    21    BitcoinTestFramework.setup_network(self)
    22  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 396, in setup_network
    23    self.setup_nodes()
    24  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    25    self.start_nodes()
    26  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    27    self.stop_nodes()
    28  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    29    node.stop_node(wait=wait, wait_until_stopped=False)
    30  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 396, in stop_node
    31    self.stop(wait=wait)
    32    ^^^^^^^^^
    33  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 215, in __getattr__
    34    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    35           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    36AssertionError: [node 0] Error: no RPC connection
    372024-11-18T09:59:33.997000Z TestFramework (INFO): Stopping nodes
    38Traceback (most recent call last):
    39  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 32, in <module>
    40    FeatureFrameworkRPCFailure(__file__).main()
    41  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 155, in main
    42    exit_code = self.shutdown()
    43                ^^^^^^^^^^^^^^^
    44  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 318, in shutdown
    45    self.stop_nodes()
    46  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    47    node.stop_node(wait=wait, wait_until_stopped=False)
    48  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 396, in stop_node
    49    self.stop(wait=wait)
    50    ^^^^^^^^^
    51  File "/Users/max/source/bitcoin/build/test/functional/test_framework/test_node.py", line 215, in __getattr__
    52    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    53           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    54AssertionError: [node 0] Error: no RPC connection
    55[node 0] Cleaning up leftover process
    
     02024-11-18T10:09:52.135000Z TestFramework (INFO): PRNG seed is: 7321683449385937851
     12024-11-18T10:09:52.135000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a
     22024-11-18T10:09:52.137000Z TestFramework (INFO): Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.
     32024-11-18T10:09:52.141000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 47850 so that wait_until_stopped will not time out.
     42024-11-18T10:09:52.194000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
     7    self.setup()
     8  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
     9    self.setup_network()
    10  File "/Users/max/source/bitcoin/./build/test/functional/feature_framework_rpc_failure.py", line 26, in setup_network
    11    BitcoinTestFramework.setup_network(self)
    12  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    13    self.setup_nodes()
    14  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    15    self.start_nodes()
    16  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
    17    node.wait_for_rpc_connection()
    18  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 344, in wait_for_rpc_connection
    19    self._raise_assertion_error(f"Unable to connect to bitcoind after {self.rpc_timeout}s (ignored errors: {str(dict(suppressed_errors))}, latest error: {latest_error})")
    20  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 199, in _raise_assertion_error
    21    raise AssertionError(self._node_msg(msg))
    22AssertionError: [node 0] Unable to connect to bitcoind after 0s (ignored errors: {}, latest error: )
    232024-11-18T10:09:52.250000Z TestFramework (INFO): Stopping nodes
    242024-11-18T10:09:52.250000Z TestFramework (WARNING): Not cleaning up dir /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a
    252024-11-18T10:09:52.250000Z TestFramework (ERROR): Test failed. Test logging available at /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a/test_framework.log
    262024-11-18T10:09:52.250000Z TestFramework (ERROR):
    272024-11-18T10:09:52.251000Z TestFramework (ERROR): Hint: Call /Users/max/source/bitcoin/test/functional/combine_logs.py '/var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_izqaxw8a' to consolidate all logs
    282024-11-18T10:09:52.251000Z TestFramework (ERROR):
    292024-11-18T10:09:52.251000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    302024-11-18T10:09:52.251000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    312024-11-18T10:09:52.251000Z TestFramework (ERROR):
    

    I prefer the logs with this change!

    Next I modified bitcoind so that it would fail to start and ran a random functional test (p2p_blockfilters.py)

     02024-11-18T09:39:40.861000Z TestFramework (INFO): PRNG seed is: 8730495587434508112
     12024-11-18T09:39:40.862000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_0gw_zhhl
     22024-11-18T09:39:41.372000Z TestFramework (ERROR): Assertion failed
     3Traceback (most recent call last):
     4  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     5    node.wait_for_rpc_connection()
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 276, in wait_for_rpc_connection
     7    raise FailedToStartError(self._node_msg(
     8test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization.
     9
    10During handling of the above exception, another exception occurred:
    11
    12Traceback (most recent call last):
    13  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    14    self.setup()
    15  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    16    self.setup_network()
    17  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    18    self.setup_nodes()
    19  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    20    self.start_nodes()
    21  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    22    self.stop_nodes()
    23  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    24    node.stop_node(wait=wait, wait_until_stopped=False)
    25  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 396, in stop_node
    26    self.stop(wait=wait)
    27    ^^^^^^^^^
    28  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 215, in __getattr__
    29    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    30           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    31AssertionError: [node 0] Error: no RPC connection
    322024-11-18T09:39:41.431000Z TestFramework (INFO): Stopping nodes
    33Traceback (most recent call last):
    34  File "/Users/max/source/bitcoin/build/test/functional/p2p_blockfilters.py", line 285, in <module>
    35    CompactFiltersTest(__file__).main()
    36  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    37    exit_code = self.shutdown()
    38                ^^^^^^^^^^^^^^^
    39  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    40    self.stop_nodes()
    41  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 587, in stop_nodes
    42    node.stop_node(wait=wait, wait_until_stopped=False)
    43  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 396, in stop_node
    44    self.stop(wait=wait)
    45    ^^^^^^^^^
    46  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 215, in __getattr__
    47    assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection")
    48           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    49AssertionError: [node 0] Error: no RPC connection
    50[node 1] Cleaning up leftover process
    51[node 0] Cleaning up leftover process
    
     02024-11-15T13:07:28.308000Z TestFramework (INFO): PRNG seed is: 2784953090753807617
     12024-11-15T13:07:28.308000Z TestFramework (INFO): Initializing test directory /var/folders/q9/65c9dvyj3lbdsv5zrhp_8qk80000gn/T/bitcoin_func_test_udgky3fr
     22024-11-15T13:07:28.821000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26867 so that wait_until_stopped will not time out.
     32024-11-15T13:07:28.821000Z TestFramework.node1 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26868 so that wait_until_stopped will not time out.
     42024-11-15T13:07:28.821000Z TestFramework (ERROR): Assertion failed
     5Traceback (most recent call last):
     6  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
     7    node.wait_for_rpc_connection()
     8  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 279, in wait_for_rpc_connection
     9    raise FailedToStartError(self._node_msg(
    10test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization.
    11
    12During handling of the above exception, another exception occurred:
    13
    14Traceback (most recent call last):
    15  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
    16    self.setup()
    17  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    18    self.setup_network()
    19  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    20    self.setup_nodes()
    21  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    22    self.start_nodes()
    23  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 572, in start_nodes
    24    self.stop_nodes()
    25  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 591, in stop_nodes
    26    node.wait_until_stopped()
    27  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in wait_until_stopped
    28    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    29  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 860, in wait_until
    30    return wait_until_helper_internal(test_function, timeout=timeout, timeout_factor=self.timeout_factor)
    31           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    32  File "/Users/max/source/bitcoin/test/functional/test_framework/util.py", line 291, in wait_until_helper_internal
    33    if predicate():
    34       ^^^^^^^^^^^
    35  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in <lambda>
    36    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    37                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    38  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 439, in is_node_stopped
    39    assert return_code == expected_ret_code, self._node_msg(
    40           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    41AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping
    422024-11-15T13:07:28.880000Z TestFramework (INFO): Stopping nodes
    432024-11-15T13:07:28.881000Z TestFramework.node0 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26867 so that wait_until_stopped will not time out.
    442024-11-15T13:07:28.881000Z TestFramework.node1 (WARNING): Cannot call stop-RPC as we are not connected. Killing process 26868 so that wait_until_stopped will not time out.
    45Traceback (most recent call last):
    46  File "/Users/max/source/bitcoin/build/test/functional/p2p_blockfilters.py", line 285, in <module>
    47    CompactFiltersTest(__file__).main()
    48  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 155, in main
    49    exit_code = self.shutdown()
    50                ^^^^^^^^^^^^^^^
    51  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 318, in shutdown
    52    self.stop_nodes()
    53  File "/Users/max/source/bitcoin/test/functional/test_framework/test_framework.py", line 591, in stop_nodes
    54    node.wait_until_stopped()
    55  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in wait_until_stopped
    56    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    57  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 860, in wait_until
    58    return wait_until_helper_internal(test_function, timeout=timeout, timeout_factor=self.timeout_factor)
    59           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    60  File "/Users/max/source/bitcoin/test/functional/test_framework/util.py", line 291, in wait_until_helper_internal
    61    if predicate():
    62       ^^^^^^^^^^^
    63  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 462, in <lambda>
    64    self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    65                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    66  File "/Users/max/source/bitcoin/test/functional/test_framework/test_node.py", line 439, in is_node_stopped
    67    assert return_code == expected_ret_code, self._node_msg(
    68           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    69AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping
    70[node 1] Cleaning up leftover process 26868
    71[node 0] Cleaning up leftover process 26867
    

    I’m not sure if it’s the intention of this PR to improve things in that scenario but the log doesn’t seem much better and if anything it was slightly easier to understand before.

    Before this change the logs showed AssertionError: [node 0] Error: no RPC connection and after this change that is now AssertionError: [node 0] Node returned unexpected exit code (1) vs (-9) when stopping. I feel like no RPC connection is the better error here?

  50. hodlinator force-pushed on Nov 18, 2024
  51. hodlinator commented at 11:13 pm on November 18, 2024: contributor

    Thanks @m3dwards for having a look!

    I had not tested the case of bitcoind exiting early together with my suggested change here (focused more on stalling as in one of the issues inspiring it).

    Made some changes to this part of TestNode.stop_node() to handle an already dead process:

    https://github.com/bitcoin/bitcoin/blob/ba28593147ccc880c3a4d40db0d4ef57f4766254/test/functional/test_framework/test_node.py#L412-L420

    With those changes, the output for such a case is now:

     0 build/test/functional/p2p_blockfilters.py 
     12024-11-18T23:01:17.204000Z TestFramework (INFO): PRNG seed is: 5667811122664278008
     22024-11-18T23:01:17.205000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_mzr3jg96
     32024-11-18T23:01:17.458000Z TestFramework.node0 (WARNING): Process 175180 already died with return code 1.
     42024-11-18T23:01:17.458000Z TestFramework.node1 (WARNING): Process 175181 already died with return code 1.
     52024-11-18T23:01:17.459000Z TestFramework (ERROR): Unexpected exception caught during testing
     6Traceback (most recent call last):
     7  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 131, in main
     8    self.setup()
     9  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 302, in setup
    10    self.setup_network()
    11  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 396, in setup_network
    12    self.setup_nodes()
    13  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 418, in setup_nodes
    14    self.start_nodes()
    15  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_framework.py", line 569, in start_nodes
    16    node.wait_for_rpc_connection()
    17  File "/home/hodlinator/bitcoin/test/functional/test_framework/test_node.py", line 279, in wait_for_rpc_connection
    18    raise FailedToStartError(self._node_msg(
    19test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization. 
    202024-11-18T23:01:17.513000Z TestFramework (INFO): Stopping nodes
    212024-11-18T23:01:17.513000Z TestFramework (WARNING): Not cleaning up dir /tmp/bitcoin_func_test_mzr3jg96
    222024-11-18T23:01:17.513000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/bitcoin_func_test_mzr3jg96/test_framework.log
    232024-11-18T23:01:17.513000Z TestFramework (ERROR): 
    242024-11-18T23:01:17.513000Z TestFramework (ERROR): Hint: Call /home/hodlinator/bitcoin/test/functional/combine_logs.py '/tmp/bitcoin_func_test_mzr3jg96' to consolidate all logs
    252024-11-18T23:01:17.513000Z TestFramework (ERROR): 
    262024-11-18T23:01:17.514000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    272024-11-18T23:01:17.514000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    282024-11-18T23:01:17.514000Z TestFramework (ERROR):
    
  52. m3dwards commented at 1:13 pm on November 21, 2024: contributor

    Made some changes to this part of TestNode.stop_node() to handle an already dead process

    Looks cleaner.

    ACK ba28593147ccc880c3a4d40db0d4ef57f4766254

  53. DrahtBot requested review from tdb3 on Nov 21, 2024
  54. in test/functional/test_framework/test_node.py:8 in ba28593147 outdated
    4@@ -5,6 +5,7 @@
    5 """Class for bitcoind node under test"""
    6 
    7 import contextlib
    8+from collections import Counter
    


    hodlinator commented at 8:31 pm on December 3, 2024:
    (Could possibly use collections.defaultdict(int) instead of Counter iff I really need to retouch).
  55. DrahtBot added the label CI failed on Dec 11, 2024
  56. DrahtBot commented at 1:30 am on December 11, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33168290506

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  57. hodlinator force-pushed on Dec 11, 2024
  58. hodlinator commented at 8:29 pm on December 11, 2024: contributor
    • Fixed ruff linter error reported by CI due to #31417 being merged.
    • Added feature_framework_stop_node.py-test to verify clean shutdown of TestNode when the bitcoind-process already exited/crashed, issue found earlier by m3dwards. (Also added comment justifying setting of self.expected_ret_code in test_node.py).
    • Switched from Counter to simpler collections.defaultdict(int)-type.
    • Updated copyright headers to include “-present”.
  59. hodlinator force-pushed on Dec 11, 2024
  60. DrahtBot removed the label CI failed on Dec 11, 2024
  61. in test/functional/test_framework/test_node.py:329 in 5321eb089c outdated
    337+                # ECONNREFUSED - Port not yet open?
    338+                if e.errno not in [errno.ECONNRESET, errno.ETIMEDOUT, errno.ECONNREFUSED]:
    339                     raise  # unknown OS error
    340-            except ValueError as e:  # cookie file not found and no rpcuser or rpcpassword; bitcoind is still starting
    341+            except ValueError as e:
    342+                # Suppress if cookie file is missing and no rpcuser or rpcpassword; bitcoind may be starting
    


    ryanofsky commented at 7:59 pm on January 6, 2025:

    In commit “test refactor: wait_for_rpc_connection - Treat OSErrors the same” (5321eb089cd1b39160038050a6ef6f27fd6f4ef9)

    Found both this comment “cookie file is missing” and previous comment “cookie file not found” confusing because it is not clear why the cookie file would not be present. IMO saying “node has not generated the cookie file yet” would be clearer.


    hodlinator commented at 4:41 pm on January 9, 2025:
    New version: # Suppress if cookie file isn't generated yet and no rpcuser or rpcpassword; bitcoind may be starting.
  62. in test/functional/test_framework/test_node.py:453 in db0e1580a5 outdated
    447@@ -447,7 +448,9 @@ def is_node_stopped(self, *, expected_stderr="", expected_ret_code=0):
    448 
    449     def wait_until_stopped(self, *, timeout=BITCOIND_PROC_WAIT_TIMEOUT, expect_error=False, **kwargs):
    450         if "expected_ret_code" not in kwargs:
    451-            kwargs["expected_ret_code"] = 1 if expect_error else 0  # Whether node shutdown return EXIT_FAILURE or EXIT_SUCCESS
    452+            # Whether node shutdown is expected to return EXIT_FAILURE (1) or
    453+            # EXIT_SUCCESS (self.expected_ret_code).
    454+            kwargs["expected_ret_code"] = 1 if expect_error else self.expected_ret_code
    


    ryanofsky commented at 8:30 pm on January 6, 2025:

    In commit “test: Allow changing expected internal return code” (db0e1580a5212a162908ac4f23e935e1f76e06dc)

    Now that this logic is getting more complicated with self.expected_ret_code I think the expect_error argument should be dropped so the code can be more understandable. It looks like there is only a single expect_error usage in feature_abortnode.py and it can be easily replaced by using expected_ret_code. Would suggest:

     0--- a/test/functional/feature_abortnode.py
     1+++ b/test/functional/feature_abortnode.py
     2@@ -36,7 +36,7 @@ class AbortNodeTest(BitcoinTestFramework):
     3 
     4             # Check that node0 aborted
     5             self.log.info("Waiting for crash")
     6-            self.nodes[0].wait_until_stopped(timeout=5, expect_error=True, expected_stderr="Error: A fatal internal error occurred, see debug.log for details: Failed to disconnect block.")
     7+            self.nodes[0].wait_until_stopped(timeout=5, expected_ret_code=1, expected_stderr="Error: A fatal internal error occurred, see debug.log for details: Failed to disconnect block.")
     8         self.log.info("Node crashed - now verifying restart fails")
     9         self.nodes[0].assert_start_raises_init_error()
    10 
    11--- a/test/functional/test_framework/test_node.py
    12+++ b/test/functional/test_framework/test_node.py
    13@@ -460,11 +460,11 @@ class TestNode():
    14         self.log.debug("Node stopped")
    15         return True
    16 
    17-    def wait_until_stopped(self, *, timeout=BITCOIND_PROC_WAIT_TIMEOUT, expect_error=False, **kwargs):
    18+    def wait_until_stopped(self, *, timeout=BITCOIND_PROC_WAIT_TIMEOUT, **kwargs):
    19         if "expected_ret_code" not in kwargs:
    20             # Whether node shutdown is expected to return EXIT_FAILURE (1) or
    21             # EXIT_SUCCESS (self.expected_ret_code).
    22-            kwargs["expected_ret_code"] = 1 if expect_error else self.expected_ret_code
    23+            kwargs["expected_ret_code"] = self.expected_ret_code
    24         self.wait_until(lambda: self.is_node_stopped(**kwargs), timeout=timeout)
    25 
    26     def replace_in_config(self, replacements):
    

    hodlinator commented at 4:45 pm on January 9, 2025:
    Taken, although I moved some of the handling of expected_ret_code directly into is_node_stopped().
  63. in test/functional/test_framework/test_node.py:431 in 4ab7879ec6 outdated
    421             else:
    422-                self.stop()
    423-        except http.client.CannotSendRequest:
    424-            self.log.exception("Unable to stop node.")
    425+                self.log.warning(f"Process {self.process.pid} already died with return code {return_code}.")
    426+                # If we no longer have a running process once stop_node() is called, assume we've already triggered an
    


    ryanofsky commented at 8:47 pm on January 6, 2025:

    In commit “test: Kill process if stop-RPC is not available” (4ab7879ec6b71804cc3a51bcf58fa6318fa103b0)

    This change doesn’t seem safe. The stop_node method can be called by arbitrary test code at arbitrary times, including when self.rpc_connected is False, and I don’t think it should be suppressing errors unless the caller has explicitly requested that errors should not be passed on.

    Would suggest narrowing the error suppression behavior, so errors and bugs will be less likely to go unnoticed because of this change:

     0--- a/test/functional/feature_framework_stop_node.py
     1+++ b/test/functional/feature_framework_stop_node.py
     2@@ -42,7 +42,7 @@ class FeatureFrameworkStopNode(BitcoinTestFramework):
     3         assert self.nodes[0].running, \
     4             "The process should still be flagged as running before calling stop_node()"
     5         self.log.debug("Explicitly stopping the node to verify it completes cleanly during the test")
     6-        self.nodes[0].stop_node(expected_stderr=invalid_arg_error)
     7+        self.nodes[0].stop_node(expected_stderr=invalid_arg_error, force=True)
     8 
     9 if __name__ == '__main__':
    10     FeatureFrameworkStopNode(__file__).main()
    11--- a/test/functional/test_framework/test_framework.py
    12+++ b/test/functional/test_framework/test_framework.py
    13@@ -315,7 +315,7 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass):
    14         if not self.options.noshutdown:
    15             self.log.info("Stopping nodes")
    16             if self.nodes:
    17-                self.stop_nodes()
    18+                self.stop_nodes(force=self.success != TestStatus.PASSED)
    19         else:
    20             for node in self.nodes:
    21                 node.cleanup_on_exit = False
    22@@ -569,7 +569,7 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass):
    23                 node.wait_for_rpc_connection()
    24         except Exception:
    25             # If one node failed to start, stop the others
    26-            self.stop_nodes()
    27+            self.stop_nodes(force=True)
    28             raise
    29 
    30         if self.options.coveragedir is not None:
    31@@ -580,11 +580,11 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass):
    32         """Stop a bitcoind test node"""
    33         self.nodes[i].stop_node(expected_stderr, wait=wait)
    34 
    35-    def stop_nodes(self, wait=0):
    36+    def stop_nodes(self, wait=0, force=False):
    37         """Stop multiple bitcoind test nodes"""
    38         for node in self.nodes:
    39             # Issue RPC to stop nodes
    40-            node.stop_node(wait=wait, wait_until_stopped=False)
    41+            node.stop_node(wait=wait, wait_until_stopped=False, force=force)
    42 
    43         for node in self.nodes:
    44             # Wait for nodes to stop
    45--- a/test/functional/test_framework/test_node.py
    46+++ b/test/functional/test_framework/test_node.py
    47@@ -393,12 +393,30 @@ class TestNode():
    48     def version_is_at_least(self, ver):
    49         return self.version is None or self.version >= ver
    50 
    51-    def stop_node(self, expected_stderr='', *, wait=0, wait_until_stopped=True):
    52-        """Stop the node."""
    53+    def stop_node(self, expected_stderr='', *, wait=0, wait_until_stopped=True, force=False):
    54+        """Stop the node.
    55+
    56+        Args:
    57+            force (bool): Stop the node forcefully if necessary, and change
    58+                errors that would otherwise be exceptions into logged warnings.
    59+
    60+        Notes:
    61+            - The force argument should only be set to true when a test has
    62+            already failed, and the caller is already throwing an exception of
    63+            its own, and wants its own exception to take precedence over errors
    64+            from this function. Since the error that happened was presumably
    65+            unknown and unexpected, the node may have already crashed or may
    66+            need to be shut down forcefully.
    67+        """
    68         if not self.running:
    69             return
    70         self.log.debug("Stopping node")
    71-        if self.rpc_connected:
    72+
    73+        # If self.running is true but self.rpc_connected is not true, it means
    74+        # that the process has been started, but wait_for_rpc_connection() was
    75+        # not able to make a successful connection. In this case, if force is
    76+        # true, avoid trying to make another RPC call and just stop the process.
    77+        if not force or self.rpc_connected:
    78             try:
    79                 # Do not use wait argument when testing older nodes, e.g. in wallet_backwards_compatibility.py
    80                 if self.version_is_at_least(180000):
    

    Note: The diff above is still calling the stop RPC instead of killing the node when self.rpc_connected is true. But it might actually be more desirable to change if not force or self.rpc_connected: there to if not force: and prefer to kill node processes instead of stopping them after test failures. Would not suggest doing this here, since it would change other behavior beyond the scope of the PR, but I could imagine cases where changing this might reduce noise and help debugging errors in CI.


    hodlinator commented at 10:31 am on January 9, 2025:

    It seems wrong to me to attempt using the RPC connection in stop_node() when we known that self.rpc_connected == False. New version has:

    0if self.rpc_connected:
    1    # Call stop-RPC...
    2elif force:
    3    # Kill etc...
    4else:
    5    raise RuntimeError(f"Cannot call stop-RPC as we don't have an RPC connection to process {self.process.pid}.")
    

    Edit: (Ended up calling it avoid_exceptions which I think is slightly better than force).

  64. in test/functional/test_framework/test_node.py:428 in 4ab7879ec6 outdated
    415+            return_code = self.process.poll()
    416+            if return_code is None:
    417+                self.log.warning("Cannot call stop-RPC as we are not connected. "
    418+                    f"Killing process {self.process.pid} so that wait_until_stopped will not time out.")
    419+                self.expected_ret_code = 1 if platform.system() == "Windows" else -9
    420+                self.process.kill()
    


    ryanofsky commented at 4:05 pm on January 7, 2025:

    In commit “test: Kill process if stop-RPC is not available” (4ab7879ec6b71804cc3a51bcf58fa6318fa103b0)

    I think this is probably the simplest way to write this code and we should stick with it for now, but just want to point out there is a minor race condition here because the process could end between the time that poll and kill are called and then the kill call could throw an exception.

    Another separate idea is that it might be safer to write self.expected_ret_code = self.process.wait() instead of hardcoding 1/-9 return codes. Again though would be inclined to stick with the working code unless there is a more of a reason to change.


    hodlinator commented at 9:52 am on January 9, 2025:

    Thanks for pointing out the race condition! It is not so likely if we get here because of RPC connection timeout, but still possible.

    I tried adding a second .kill() after the first one and it does not raise an exception, and no exception is documented.

  65. in test/functional/feature_framework_stop_node.py:45 in 608b9ff097 outdated
    40+            self=self.nodes[0]
    41+        )
    42+        assert self.nodes[0].running, \
    43+            "The process should still be flagged as running before calling stop_node()"
    44+        self.log.debug("Explicitly stopping the node to verify it completes cleanly during the test")
    45+        self.nodes[0].stop_node(expected_stderr=invalid_arg_error)
    


    ryanofsky commented at 5:02 pm on January 7, 2025:

    In commit “test: Add feature_framework_stop_node.py” (608b9ff097214bff3527cc240b7be313c62131d5)

    One problem with this test as currently written is that it prints a real warning to the output (like TestFramework.node0 (WARNING): Process 579831 already died with return code 1). This is potentially confusing because this warning is expected and should not be a cause for concern, but you can’t know that from looking at test output.

    I can think of a few possible solutions:

    • Redirecting log output during the stop_node call and checking for the expected warning with an assertion here.
    • Letting the stop_node function throw exceptions instead of converting errors into warnings, then catching and handling the exceptions in the places (just 2 places currently) where they are expected.
    • Not changing anything but just add a warning before the warning instructing reader to ignore the next warning.

    hodlinator commented at 4:36 pm on January 9, 2025:
    Went for option 3, making clear in the log message just above that a WARNING is expected.
  66. in test/functional/feature_framework_rpc_failure.py:26 in f67d896357 outdated
    21+
    22+    def setup_network(self):
    23+        self.log.info("Setting RPC timeout to 0 to simulate an unresponsive bitcoind, expect one warning.")
    24+        self.rpc_timeout = 0
    25+
    26+        assert_raises_message(AssertionError, "[node 0] Unable to connect to bitcoind after 0s", BitcoinTestFramework.setup_network, self)
    


    ryanofsky commented at 5:20 pm on January 7, 2025:

    In commit “test: Add feature_framework_rpc_failure.py” (f67d8963575feecdd4914c1a56bab4e0bebc000d)

    Note: there is no test coverage for the “ignored errors” and “latest error” parts of the message. I think it would be good to add some coverage for this, even if it just checking for 0 and None, so there is a at least a hint that more coverage could be added here, especially since it’s possible to imagine this functionality becoming broken in the future and more coverage helping prevent this.


    hodlinator commented at 4:37 pm on January 9, 2025:
    Added test/functional/feature_framework_rpc_failure_details.py.
  67. in test/functional/test_framework/test_node.py:160 in db0e1580a5 outdated
    156@@ -157,6 +157,7 @@ def __init__(self, i, datadir_path, *, chain, rpchost, timewait, timeout_factor,
    157         self.rpc = None
    158         self.url = None
    159         self.log = logging.getLogger('TestFramework.node%d' % i)
    160+        self.expected_ret_code = 0 # EXIT_SUCCESS
    


    ryanofsky commented at 6:13 pm on January 7, 2025:

    In commit “test: Allow changing expected internal return code” (db0e1580a5212a162908ac4f23e935e1f76e06dc)

    Note: I like the changes in this commit, and think the new TestNode.expected_ret_code member could be a useful way to signal expected statuses when shutdown starts. But just want to point out that I don’t think adding a new expected_ret_code member is strictly necessary for this PR to work. The only place it is actually used later in the PR is in the stop_node function when it is killing a failed process, and it seems to me that stop_node could just as easily set self.running = False instead and call process.wait() itself to reap the pid without blocking.


    hodlinator commented at 4:39 pm on January 9, 2025:
    Interesting, I do agree there is a spaghetti/complexity factor to expected_ret_code, so might revisit this.
  68. ryanofsky commented at 6:54 pm on January 7, 2025: contributor

    Code review 608b9ff097214bff3527cc240b7be313c62131d5

    Nice changes in this PR. I’ve been confused by the error output from wait_for_rpc_connection() before, so think this will be helpful.

    One thing that wasn’t obvious to me from the PR description is that this improving error output in two separate ways:

    • One change here is adding error output to wait_for_rpc_connection. Previously errors that happened during polling before the final “Unable to connect to bitcoind” error were silently ignored, so less information would be visible about what was causing the problem. Now counts of the different errors and the most recent message are shown.

    • Another change here is suppressing error output after the wait_for_rpc_connection error, specifically suppressing exceptions from in stop_node when trying to call stop method fails.

    I didn’t ACK the PR because I think the changes to the stop_node method are overbroad, and I suggested a way to limit them. Other than that I think this all looks very good.

  69. qa refactor: wait_for_rpc_connection - Treat OSErrors the same
    ConnectionResetError is an OSError as well (ECONNRESET), no reason to have a separate except-block for it.
    
    Also improves comments for other exceptions and make condition above more Pythonic.
    865571849a
  70. qa: Include ignored errors in RPC connection timeout
    When an RPC connection attempt with bitcoind times out, include which ignored errors occurred in the exception message.
    
    May provide clues of what has gone wrong.
    6347c222f7
  71. qa: Allow changing expected internal return code
    To be used in next commit.
    
    Also adds process IDs in relevant log messages.
    9de837e615
  72. qa: Kill process if stop-RPC is not available
    `BitcoinTestFramework.start_nodes()`, upon failing to establish RPC connections, calls `stop_nodes()` which would raise an exception immediately through attempting to use the non-existent RPC interface, preventing a clean shutdown.
    
    We now avoid calling the RPC interface and instead kill the bitcoind process, so that subsequent wait_until_stopped() calls will succeed instantly instead of waiting for the full timeout only to raise another exception.
    
    Since explicit killing means the bitcoind process will not exit with return code 0, we adjust the expected value.
    
    Avoid beating the dead horse if the process has already exited by the time stop_node() is called.
    4538700cec
  73. qa: assert_raises_message - don't require JSONRPCException type layout
    Not all exceptions have an .error dict containing a 'message' key. (A leftover from before the separate assert_raises_rpc_error was introduced).
    
    The function is not actually called as of this commit, but will start getting used again in the next commit.
    14e124be79
  74. qa: Add feature_framework_rpc_failure.py
    Verifies correct behavior when failing to connect to bitcoind RPC interface during setup. Parent commit "qa: Kill process if stop-RPC is not available" fixes issues where a raised exception raised additional knock-on exceptions.
    23bc6bf11f
  75. hodlinator renamed this:
    test: Shut down framework cleanly on RPC connection failure
    qa: Shut down framework cleanly on RPC connection failure
    on Jan 9, 2025
  76. hodlinator force-pushed on Jan 9, 2025
  77. hodlinator commented at 4:55 pm on January 9, 2025: contributor

    Updated based off ryanofsky’s review.

    Hopefully latest PR summary is slightly clearer.

    Replaced “test” -> “qa” since PR is about functional tests and not unit tests, should conform better with CONTRIBUTING.md.

  78. hodlinator force-pushed on Jan 9, 2025
  79. DrahtBot added the label CI failed on Jan 9, 2025
  80. DrahtBot commented at 6:21 pm on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35384673501

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  81. qa: Add feature_framework_rpc_failure_details.py ee4c9d7eba
  82. qa: Add feature_framework_stop_node.py 8148e4da2c
  83. hodlinator force-pushed on Jan 9, 2025
  84. DrahtBot removed the label CI failed on Jan 9, 2025
  85. in test/functional/test_framework/test_framework.py:572 in 8148e4da2c
    568@@ -569,7 +569,7 @@ def start_nodes(self, extra_args=None, *args, **kwargs):
    569                 node.wait_for_rpc_connection()
    570         except Exception:
    571             # If one node failed to start, stop the others
    572-            self.stop_nodes()
    573+            self.stop_nodes(avoid_exceptions=True)
    


    maflcko commented at 9:10 am on January 10, 2025:

    coming from the conflicting pull, I don’t think this call should exist in the first place for two reasons:

    • It doesn’t work in most cases, because a failure to start any node likely means the RPC isn’t up for the other nodes either, so the call will just fail in most cases. Effectively just using the kill fallback.
    • It doesn’t make sense to special case startup. Nodes can at any time fail to start (not only in start_nodes), or crash, and adding this exception handling to all places isn’t useful nor trivially possible.

    So I removed it in #31620.

    (Just leaving a comment here to document the merge conflict between the two changes. Should be trivial to solve either way, so nothing to be done here, I think.)

  86. hodlinator commented at 12:53 pm on January 17, 2025: contributor

    Excuse me from being brief earlier.

    Taking the time do document changes in latest push - git range-diff master 608b9ff 8148e4d

    • Inlined comments about different errno-codes instead of duplicating the identifiers in the comment.
    • Made missing cookie file comment slightly more informative.
    • Removed expect_error in favor of expected_ret_code after ryanofsky’s suggestion (grepping for uses of the former). This helps decrease the amount of added complexity to test_node.py.
    • Add avoid_exceptions parameter to stop_node() to make the behavior change more surgical, thanks to ryanofsky’s suggestion. We now only kill the node under more narrow conditions.
    • Clarified in log output of tests when a WARNING is expected (ryanofsky suggestion).
    • ee4c9d7eba6ba6fb6dcb7abd539c23f5b62d5991 - Added feature_framework_rpc_failure_details.py to verify that suppressed exception information is included in RPC connection timeout message (ryanofsky suggestion).
    • Switched feature_framework_stop_node.py from using TestNode.start() + TestNode.wait_for_rpc_connection() to BitcoinTestFramework.start_node() which combines the two.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-23 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me