[qa] Handle potential cookie race when starting node #12902

pull sdaftuar wants to merge 1 commits into bitcoin:master from sdaftuar:2018-04-improve-dbcrash-restarts changing 2 files +11 −0
  1. sdaftuar commented at 1:49 pm on April 6, 2018: member

    When a node is restarted during a test after an unclean shutdown (such as with -dbcrashratio), it’s possible an old cookie file was left behind. This can cause a race condition when restarting the node, where the test framework might try to connect using credentials from the old cookie file, just as the node will generate new credentials and overwrite the old file.

    Delete any such cookie file if present prior to startup.

  2. fanquake added the label Tests on Apr 6, 2018
  3. sdaftuar commented at 1:54 pm on April 6, 2018: member

    I triggered this issue while running the tests on the 0.16 branch:

     0feature_dbcrash.py failed, Duration: 1185 s
     1
     2stdout:
     32018-04-05 17:05:30.803000 TestFramework (INFO): Initializing test directory /tmp/bitcoin_test_runner_20180405_130431/feature_dbcrash_170
     42018-04-05 17:06:08.451000 TestFramework (INFO): Prepped 5001 utxo entries
     52018-04-05 17:06:59.908000 TestFramework (INFO): Iteration 0, generating 2500 transactions [0, 0, 0]
     62018-04-05 17:07:17.195000 TestFramework (INFO): Iteration 1, generating 2500 transactions [0, 0, 0]
     72018-04-05 17:07:33.347000 TestFramework (INFO): Iteration 2, generating 2500 transactions [0, 0, 0]
     82018-04-05 17:07:49.595000 TestFramework (INFO): Iteration 3, generating 2500 transactions [0, 0, 0]
     92018-04-05 17:08:09.162000 TestFramework (INFO): Iteration 4, generating 2500 transactions [0, 0, 0]
    102018-04-05 17:08:26.442000 TestFramework (INFO): Iteration 5, generating 2500 transactions [0, 0, 0]
    112018-04-05 17:08:47.728000 TestFramework (INFO): Iteration 6, generating 2500 transactions [0, 0, 0]
    122018-04-05 17:09:13.273000 TestFramework (INFO): Iteration 7, generating 2500 transactions [1, 0, 0]
    132018-04-05 17:09:34.107000 TestFramework (INFO): Iteration 8, generating 2500 transactions [2, 0, 0]
    142018-04-05 17:10:23.227000 TestFramework (INFO): Iteration 9, generating 2500 transactions [3, 1, 0]
    152018-04-05 17:10:48.601000 TestFramework (INFO): Iteration 10, generating 2500 transactions [3, 1, 0]
    162018-04-05 17:11:14.825000 TestFramework (INFO): Iteration 11, generating 2500 transactions [4, 1, 0]
    172018-04-05 17:11:41.301000 TestFramework (INFO): Iteration 12, generating 2500 transactions [4, 2, 0]
    182018-04-05 17:12:04.667000 TestFramework (INFO): Iteration 13, generating 2500 transactions [5, 2, 0]
    192018-04-05 17:12:29.121000 TestFramework (INFO): Iteration 14, generating 2500 transactions [5, 2, 0]
    202018-04-05 17:12:56.188000 TestFramework (INFO): Iteration 15, generating 2500 transactions [5, 3, 0]
    212018-04-05 17:13:24.200000 TestFramework (INFO): Iteration 16, generating 2500 transactions [6, 3, 0]
    222018-04-05 17:14:22.805000 TestFramework (INFO): Iteration 17, generating 2500 transactions [7, 4, 0]
    232018-04-05 17:15:35.174000 TestFramework (INFO): Iteration 18, generating 2500 transactions [8, 4, 1]
    242018-04-05 17:16:01.286000 TestFramework (INFO): Iteration 19, generating 2500 transactions [8, 4, 1]
    252018-04-05 17:16:30.793000 TestFramework (INFO): Iteration 20, generating 2500 transactions [8, 4, 1]
    262018-04-05 17:16:56.134000 TestFramework (INFO): Iteration 21, generating 2500 transactions [8, 4, 1]
    272018-04-05 17:17:24.077000 TestFramework (INFO): Iteration 22, generating 2500 transactions [8, 4, 1]
    282018-04-05 17:18:05.554000 TestFramework (INFO): Iteration 23, generating 2500 transactions [9, 5, 2]
    292018-04-05 17:18:39.609000 TestFramework (INFO): Iteration 24, generating 2500 transactions [10, 5, 2]
    302018-04-05 17:19:25.984000 TestFramework (INFO): Iteration 25, generating 2500 transactions [10, 5, 2]
    312018-04-05 17:20:07.936000 TestFramework (INFO): Iteration 26, generating 2500 transactions [11, 5, 2]
    322018-04-05 17:20:47.133000 TestFramework (INFO): Iteration 27, generating 2500 transactions [11, 5, 2]
    332018-04-05 17:21:31.789000 TestFramework (INFO): Iteration 28, generating 2500 transactions [11, 5, 3]
    342018-04-05 17:22:11.479000 TestFramework (INFO): Iteration 29, generating 2500 transactions [11, 5, 3]
    352018-04-05 17:22:55.978000 TestFramework (INFO): Iteration 30, generating 2500 transactions [12, 5, 3]
    362018-04-05 17:23:42.142000 TestFramework (INFO): Iteration 31, generating 2500 transactions [12, 5, 3]
    372018-04-05 17:25:15.294000 TestFramework (ERROR): Unexpected exception caught during testing
    38Traceback (most recent call last):
    39  File "/home/prod/jenkins/bitcoin/test/functional/feature_dbcrash.py", line 81, in restart_node
    40    self.start_node(node_index)
    41  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/test_framework.py", line 229, in start_node
    42    node.wait_for_rpc_connection()
    43  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/test_node.py", line 104, in wait_for_rpc_connection
    44    self.rpc.getblockcount()
    45  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/coverage.py", line 47, in __call__
    46    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    47  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/authproxy.py", line 136, in __call__
    48    response = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    49  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/authproxy.py", line 107, in _request
    50    return self._get_response()
    51  File "/home/prod/jenkins/bitcoin/test/functional/test_framework/authproxy.py", line 168, in _get_response
    52    'code': -342, 'message': 'non-JSON HTTP response with \'%i %s\' from server' % (http_response.status, http_response.reason)})
    53test_framework.authproxy.JSONRPCException: non-JSON HTTP response with '401 Unauthorized' from server (-342)
    

    But this is also straightforward to reproduce on master; I was able to do so by adding a small sleep in wait_for_rpc_connection in test_node.py:

     0diff --git a/test/functional/test_framework/test_node.py b/test/functional/test_framework/test_node.py
     1index 291ac3ee46..44627cba7c 100755
     2--- a/test/functional/test_framework/test_node.py
     3+++ b/test/functional/test_framework/test_node.py
     4@@ -111,6 +111,7 @@ class TestNode():
     5                 raise FailedToStartError('bitcoind exited with status {} during initialization'.format(self.process.returncode))
     6             try:
     7                 self.rpc = get_rpc_proxy(rpc_url(self.datadir, self.index, self.rpchost), self.index, timeout=self.rpc_timeout, coveragedir=self.coverage_dir)
     8+                time.sleep(0.5)
     9                 self.rpc.getblockcount()
    10                 # If the call to getblockcount() succeeds then the RPC connection is up
    11                 self.rpc_connected = True
    
  4. MarcoFalke commented at 2:48 pm on April 6, 2018: member

    Considering that we usually clean up the datadir when the test was successful, wouldn’t it make more sense to delete the cookie when it was left behind due to a crash? That would also limit the changes to feature_dbcrash, the only place where the fix would be needed, it seems… Just a thought, no strong opinion.

    Concept ACK

  5. sdaftuar commented at 2:58 pm on April 6, 2018: member

    @MarcoFalke I did consider that as well, also I considered changing the way we use the test framework so that if a cookie file is already present we instruct bitcoind to use it… None of these options seems all that great to me as this is pretty messy, but I figure that changing wait_for_rpc_connection is more robust (ie if other tests end up getting written that need something like this), versus special-casing cookie file deletion in a specific test (as that knowledge would likely be forgotten when a new test that needs it gets written).

    Edit: I should add – normally the cookie file is deleted by bitcoind when it shuts down cleanly, not the test framework (I think).

  6. in test/functional/test_framework/test_node.py:137 in 02b3c4af8a outdated
    133+                    # The code comes from authproxy.py, passing through the http
    134+                    # error. 'Unauthorized' can happen due to a race condition with
    135+                    # using an old cookie file which is overwritten when the
    136+                    # node starts up, so treat this the same as "No RPC
    137+                    # credentials" and do nothing.
    138+                    pass
    


    jnewbery commented at 8:13 pm on April 6, 2018:
    Perhaps log, since this is an unexpected condition?
  7. jnewbery commented at 8:13 pm on April 6, 2018: member
    Seems reasonable. If there really is an authorization failure, then wait_for_rpc_connection() will eventually timeout anyway.
  8. laanwj commented at 2:01 pm on April 9, 2018: member

    Considering that we usually clean up the datadir when the test was successful, wouldn’t it make more sense to delete the cookie when it was left behind due to a crash?

    That was my initial response to this as well. If the test framework makes sure that no cookie file is left behind, the race will not trigger in the first place, and we don’t have to special-case it.

    Edit: I should add – normally the cookie file is deleted by bitcoind when it shuts down cleanly, not the test framework (I think).

    Correct. This is done here: https://github.com/bitcoin/bitcoin/blob/master/src/rpc/server.cpp#L323

  9. [qa] Delete cookie file before starting node
    If a cookie file exists in a datadir prior to node startup, it must have
    been leftover from a prior unclean shutdown. As bitcoind will overwrite
    it anyway, delete it before starting up to prevent the test framework
    from inadvertently trying to connect using stale credentials.
    75d0e4c544
  10. sdaftuar force-pushed on Apr 9, 2018
  11. sdaftuar commented at 6:58 pm on April 9, 2018: member

    @laanwj @MarcoFalke I reworked this PR to delete cookie files, if present, as suggested. I avoided putting this logic directly in feature_dbcrash.py, since the authentication logic lives in the test framework (and the test is unaware of cookie files or RPC authentication in the first place), but let me know if you think I should move it.

    Original version of this PR is here for comparison: https://github.com/sdaftuar/bitcoin/commit/02b3c4af8a1e75a918363ee64c05a8966ad3dec2

    Also updated OP to account for new behavior.

  12. MarcoFalke commented at 7:21 pm on April 9, 2018: member
    utACK 75d0e4c5444249f0cf81ed0494e2c2717170b695
  13. jnewbery commented at 7:33 pm on April 9, 2018: member

    Tested ACK 75d0e4c5444249f0cf81ed0494e2c2717170b695.

    I’d be tempted to put the new functionality directly into TestNode.start(), since that’s the only place it gets called, but this is also fine.

  14. sdaftuar commented at 9:08 pm on April 9, 2018: member

    I’d be tempted to put the new functionality directly into TestNode.start(), since that’s the only place it gets called, but this is also fine.

    No strong opinion, but it looked like util.py is the only place in the test framework that currently knows where cookie files live exactly, so I figured might as well continue to restrict that knowledge to one area of the codebase.

  15. MarcoFalke merged this on Apr 9, 2018
  16. MarcoFalke closed this on Apr 9, 2018

  17. MarcoFalke referenced this in commit 727175a08d on Apr 9, 2018
  18. laanwj commented at 8:17 am on April 10, 2018: member
    post-utACK 75d0e4c5444249f0cf81ed0494e2c2717170b695
  19. fanquake referenced this in commit 75f593c4dd on Apr 18, 2018
  20. MarcoFalke referenced this in commit 69a6490302 on Apr 20, 2018
  21. MarcoFalke referenced this in commit 7460945e0b on Apr 20, 2018
  22. HashUnlimited referenced this in commit 4c5cc4c0de on May 13, 2018
  23. ccebrecos referenced this in commit a99c382df2 on Sep 14, 2018
  24. PastaPastaPasta referenced this in commit 3b1df7a914 on Mar 14, 2020
  25. PastaPastaPasta referenced this in commit 90b0ef74bb on Mar 19, 2020
  26. PastaPastaPasta referenced this in commit e356c7b307 on Mar 21, 2020
  27. PastaPastaPasta referenced this in commit 9b1ebdb161 on Mar 24, 2020
  28. ckti referenced this in commit c3c56f7672 on Mar 28, 2021
  29. gades referenced this in commit 7f8a5181bf on Jun 30, 2021
  30. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me