test: fix intermittent wallet_encryption failures on win64 task #25021

pull jonatack wants to merge 3 commits into bitcoin:master from jonatack:fix-intermittent-wallet_encryption-test changing 3 files +41 −38
  1. jonatack commented at 10:32 pm on April 28, 2022: contributor
  2. jonatack force-pushed on Apr 28, 2022
  3. jonatack force-pushed on Apr 28, 2022
  4. jonatack force-pushed on Apr 28, 2022
  5. jonatack force-pushed on Apr 28, 2022
  6. DrahtBot added the label Tests on Apr 28, 2022
  7. mzumsande commented at 0:11 am on April 29, 2022: contributor
    Strange that this happens twice in a day suddenly, without any recent changes that look related. The linked CI runs fail at different subtests, one at (“Check a timeout less than the limit”) and one at (“Check a timeout greater than the limit”) - so a fix should probably also affect both.
  8. MarcoFalke commented at 5:38 am on April 29, 2022: member
    I can’t see how this would happen unless the python clock is ahead of the system clock??
  9. jonatack force-pushed on Apr 29, 2022
  10. jonatack commented at 12:27 pm on April 29, 2022: contributor

    Yeah, it’s weird. Win64-only so far. Let’s see if there are more failures.

    Thanks @mzumsande, re-pushed with a smaller additional margin for both tests.

  11. jonatack force-pushed on Apr 29, 2022
  12. jonatack commented at 4:35 pm on April 29, 2022: contributor

    Another one: https://cirrus-ci.com/task/4631236957175808

    0  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\wallet_encryption.py", line 93, in run_test
    1    assert_greater_than(expected_time_with_buffer, actual_time)
    2  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\util.py", line 56, in assert_greater_than
    3    raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    4
    5AssertionError: 1751237054.993892 <= 1751237055
    
  13. MarcoFalke added the label Windows on Apr 29, 2022
  14. jonatack renamed this:
    test: fix intermittent wallet_encryption functional test
    test: fix intermittent wallet_encryption failures on win64 task
    on Apr 29, 2022
  15. bitcoin deleted a comment on Apr 29, 2022
  16. bitcoin deleted a comment on Apr 29, 2022
  17. jonatack commented at 2:15 pm on May 3, 2022: contributor
  18. vasild commented at 4:27 am on May 6, 2022: contributor

    I think a possible explanation would be if the GetTime() in the C++ code on Windows ends up rounding up to a whole second. So, for example:

    1. expected time = int(10.5) + MAX = 10 + MAX
    2. 0.1 second passes
    3. walletpassphrase RPC is executed, asking it to keep the phrase for MAX seconds, it calculates now=ceil(10.6) + MAX = 11 + MAX // ceil() wtf!?
    4. 0.1 second passes
    5. expected with buffer = 10.7 + MAX // not truncated to whole seconds

    So, we end up with actual time (11 + MAX) < expected with buffer (10.7 + MAX).

    If this guess is correct, then the following should fix it:

    0before = math.floor(time.time()) + MAX_VALUE
    1self.nodes[0].walletpassphrase(passphrase2, MAX_VALUE + 1000) 
    2after = math.ceil(time.time()) + MAX_VALUE 
    3actual = self.nodes[0].getwalletinfo()['unlocked_until'] 
    4assert_greater_than_or_equal(actual, before)
    5assert_greater_than_or_equal(after, actual)
    
  19. jonatack force-pushed on May 10, 2022
  20. jonatack commented at 11:53 am on May 10, 2022: contributor
    @vasild done (thanks for looking into this) and added a commit for test logging (and named args and an assertion).
  21. in test/functional/wallet_encryption.py:76 in 2178ad2e79 outdated
    88-        self.nodes[0].walletpassphrase(passphrase2, MAX_VALUE + 1000)
    89-        expected_time_with_buffer = time.time() + MAX_VALUE
    90-        actual_time = self.nodes[0].getwalletinfo()['unlocked_until']
    91-        assert_greater_than_or_equal(actual_time, expected_time)
    92-        assert_greater_than(expected_time_with_buffer, actual_time)
    93+        timeout_limit = 100000000  # maximum time to keep the decryption key, in seconds (~3 years)
    


    vasild commented at 6:21 am on May 11, 2022:
    nit: maybe mention that this is the same as MAX_SLEEP_TIME in src/wallet/rpc/encrypt.cpp.

    jonatack commented at 3:15 pm on May 12, 2022:
    Done – seems best to avoid specifying the file to reduce future maintainance but added the constant name and documentation where it is declared; good idea.
  22. vasild approved
  23. vasild commented at 6:24 am on May 11, 2022: contributor
    ACK 76f8dc9bd1b4efd543dc54c17a5f8e9d512580e1
  24. jonatack force-pushed on May 12, 2022
  25. MarcoFalke commented at 3:25 pm on May 12, 2022: member
    Is this still an issue? I haven’t seen it recently.
  26. jonatack commented at 3:30 pm on May 12, 2022: contributor
    Haven’t seen it in a week. Could drop the fix for now and leave the improvements.
  27. ryanofsky commented at 9:34 pm on May 12, 2022: contributor

    Just saw this error: https://cirrus-ci.com/task/4948737314258944?logs=functional_tests#L25 in #24230

    Log:

     0 node0 2022-05-12T20:03:02.859261Z [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\rpc\request.cpp:179] [parse] ThreadRPCServer method=walletpassphrase user=__cookie__ 
     1 node0 2022-05-12T20:03:03.012258Z [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\rpc\server.cpp:530] [RPCRunLater] queue run of timer lockwallet(default_wallet) in 99999400 seconds (using HTTP) 
     2 test  2022-05-12T20:03:03.014000Z TestFramework (ERROR): Assertion failed 
     3                                   Traceback (most recent call last):
     4                                     File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 133, in main
     5                                       self.run_test()
     6                                     File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\wallet_encryption.py", line 85, in run_test
     7                                       assert_greater_than(expected_time_with_buffer, actual_time)
     8                                     File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\util.py", line 56, in assert_greater_than
     9                                       raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    10                                   AssertionError: 1752385182.999136 <= 1752385183
    
  28. MarcoFalke commented at 5:33 am on May 13, 2022: member
    Ok, then I think we should only maim the test on windows, not on all platforms.
  29. vasild approved
  30. vasild commented at 9:11 am on May 13, 2022: contributor

    ACK 1a82609ce078e2d3e5e242d85b275b950fd6fb8f

    Why “maim”? The test functionality if fully preserved, no?

    cc @achow101, author of these tests

  31. MarcoFalke commented at 9:37 am on May 13, 2022: member

    Why “maim”? The test functionality if fully preserved, no?

    Sure, “maim” is a bit too strong. It is making the test lax by up to another second, while no one understands why.

  32. MarcoFalke commented at 1:03 pm on May 13, 2022: member

    Would be nice if someone could run this on msvc:

     0#include <iostream>
     1#include <chrono>
     2#include <string>
     3 
     4template <typename TimePoint>
     5std::string to_string(const TimePoint& time_point)
     6{
     7    return std::to_string(time_point.time_since_epoch().count());
     8}
     9 
    10int main()
    11{
    12    using namespace std::literals::chrono_literals;
    13    using Sec = std::chrono::seconds;
    14 
    15    std::cout << "Time point\t" "Cast\t" "Floor\t" "Round\t" "Ceil\n";
    16    std::cout << "(ms)\t\t"     "(s)\t"  "(s)\t"   "(s)\t"   "(s)\n";
    17    for (const auto value_ms: {5432ms, 5678ms}) {
    18        std::chrono::time_point<std::chrono::system_clock, std::chrono::milliseconds>
    19            time_point_ms(value_ms);
    20 
    21        std::cout
    22            << to_string(time_point_ms) << "\t\t"
    23            << to_string(std::chrono::time_point_cast<Sec>(time_point_ms)) << "\t"
    24            << to_string(std::chrono::floor          <Sec>(time_point_ms)) << "\t"
    25            << to_string(std::chrono::round          <Sec>(time_point_ms)) << "\t"
    26            << to_string(std::chrono::ceil           <Sec>(time_point_ms)) << "\n";
    27    }
    28}
    

    From https://en.cppreference.com/w/cpp/chrono/time_point/ceil. Should print:

    0Time point	Cast	Floor	Round	Ceil
    1(ms)		(s)	(s)	(s)	(s)
    25432		5	5	5	6
    35678		5	5	6	6
    
  33. vasild commented at 1:26 pm on May 13, 2022: contributor

    It is making the test lax by up to another second

    That is true, but the point of the test is to ensure that the time limit is honored - 1. cannot be negative, 2. is capped at some limit and 3. if within the limits, then is accepted. The original test in 134cdc7cee3da7c554e40ad947a9cdcbb3069f13 contained 5 seconds buffer, so the point of the test is not to check that the value is accepted up to the second, exactly.

  34. in test/functional/wallet_encryption.py:86 in 1a82609ce0 outdated
    111+        timeout = timeout_limit - 600
    112+        before = floor(time.time()) + timeout
    113+        self.nodes[0].walletpassphrase(passphrase2, timeout)
    114+        after = ceil(time.time()) + timeout
    115+        actual = self.nodes[0].getwalletinfo()['unlocked_until']
    116+        assert before <= actual <= after
    


    vasild commented at 1:28 pm on May 13, 2022:

    Would this print the values if the assertion fails?

     0>>> a=1
     1>>> b=2
     2>>> c=3
     3>>> assert a <= b <= c
     4>>> 
     5>>> c=0
     6>>> assert a <= b <= c
     7Traceback (most recent call last):
     8  File "<stdin>", line 1, in <module>
     9AssertionError
    10>>> 
    

    If not, then maybe revert back to using assert_greater_than() because seeing the values in CI logs is so precious in debugging this.


    jonatack commented at 3:05 pm on August 10, 2022:
    Good point. Updated the assertion error to print i.e. AssertionError: walletpassphrase "unlocked_until" value of 1760141911 not in expected range of 1760141909..1760141910
  35. MarcoFalke commented at 1:43 pm on May 13, 2022: member

    so the point of the test is not to check that the value is accepted up to the second, exactly.

    Right. I don’t care about this test. nRelockTime is only there for aesthetic reasons, it can’t be relied upon anyway, since it doesn’t take into account how long it takes to register the event in libevent. (If that takes 5 seconds, the value is off by 5 seconds)

    I mostly worry the clock on windows doesn’t work as expected and it will lead to more severe issues down the line (extremely unlikely, but worst case a consensus split)

  36. vasild commented at 1:54 pm on May 13, 2022: contributor

    Ok, in case it helps, here is another test prog I played with on my computer (not windows):

    0int main(int, char**)
    1{
    2    const auto now = std::chrono::system_clock::now().time_since_epoch();
    3    std::cout << std::chrono::duration_cast<std::chrono::seconds>(now).count() << std::endl
    4              << std::chrono::duration_cast<std::chrono::milliseconds>(now).count() << std::endl;
    5    return 0;
    6}
    

    It prints, for example:

    01652449734
    11652449734758
    

    which confirms my implementation is rounding down, as in the spec:

    casting to an integer duration is subject to truncation as with any static_cast to an integer type

    PS: even if the problem is elsewhere I think the changes in this PR make the test more robust and readable.

  37. MarcoFalke commented at 1:59 pm on May 13, 2022: member
    cc @sipsorcery Could you please try #25021 (comment) on msvc?
  38. sipsorcery commented at 1:37 pm on May 15, 2022: member

    Visual Studio 2022 v17.1.6 9 (C++ compiler version 19.31.31107 for x86)

    0Time point      Cast    Floor   Round   Ceil                                                                          
    1(ms)            (s)     (s)     (s)     (s)                                                                             
    25432            5       5       5       6                                                                             
    35678            5       5       6       6
    
  39. in test/functional/wallet_encryption.py:84 in 1a82609ce0 outdated
    109+
    110+        self.log.info("Test walletpassphrase with a timeout less than the limit")
    111+        timeout = timeout_limit - 600
    112+        before = floor(time.time()) + timeout
    113+        self.nodes[0].walletpassphrase(passphrase2, timeout)
    114+        after = ceil(time.time()) + timeout
    


    MarcoFalke commented at 5:12 pm on May 15, 2022:

    This doesn’t seem right. If the Bitcoin Core clock was doing ceil, we’d see a failure rate of about 100%. round seems unlikely as well, as the failure rate would be about 50%.

    Statistically, the error on windows should be 10-100ms, which can’t be explained by cast/round/ceil.


    vasild commented at 1:53 pm on May 19, 2022:
    Well, this just ignores sub-second fractions and accounts for the time passed between the calls, keeping the original intention of the test (I hope!). Do you think it could still fail on Windows?
  40. MarcoFalke commented at 9:17 am on May 20, 2022: member
    An alternative to this would be to use mocktime to set the exact seconds and then check the exact result. (Obviously mocktime couldn’t shorten or extend the actual locking duration, but the test doesn’t check that anyway).
  41. MarcoFalke commented at 9:24 am on May 20, 2022: member
    (I still don’t know what the correct fix is, just handing out thoughts fyi)
  42. jonatack commented at 3:41 pm on May 20, 2022: contributor

    Another one https://cirrus-ci.com/task/5010386603737088

     0176/244 - wallet_encryption.py --legacy-wallet failed, Duration: 6 s
     1stdout:
     22022-05-20T15:08:20.442000Z TestFramework (INFO): Initializing test directory C:\Users\ContainerAdministrator\AppData\Local\Temp\test_runner_₿_🏃_20220520_141819\wallet_encryption_66
     32022-05-20T15:08:25.754000Z TestFramework (INFO): Check a timeout less than the limit
     42022-05-20T15:08:25.879000Z TestFramework (INFO): Check a timeout greater than the limit
     52022-05-20T15:08:26.006000Z TestFramework (ERROR): Assertion failed
     6Traceback (most recent call last):
     7  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\test_framework.py", line 133, in main
     8    self.run_test()
     9  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\wallet_encryption.py", line 93, in run_test
    10    assert_greater_than(expected_time_with_buffer, actual_time)
    11  File "C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\test\functional\test_framework\util.py", line 56, in assert_greater_than
    12    raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    13AssertionError: 1753059305.991002 <= 1753059306
    142022-05-20T15:08:26.069000Z TestFramework (INFO): Stopping nodes
    
  43. jonatack commented at 12:15 pm on May 25, 2022: contributor
  44. jonatack commented at 6:30 pm on May 27, 2022: contributor
  45. vasild commented at 1:03 pm on July 13, 2022: contributor
    I think this is an improvement over the current master, even not considering the test failures.
  46. MarcoFalke commented at 11:31 am on July 19, 2022: member
    Is this still happening after the windows upgrade?
  47. MarcoFalke commented at 1:40 pm on July 25, 2022: member
    I haven’t seen it yet, so it indeed seems like a bug in Windows 2019
  48. jonatack commented at 8:11 am on July 27, 2022: contributor
  49. vasild commented at 4:24 am on July 29, 2022: contributor
  50. jonatack commented at 2:53 pm on July 30, 2022: contributor
    Seems this won’t be merged.
  51. jonatack closed this on Jul 30, 2022

  52. MarcoFalke commented at 3:05 pm on July 30, 2022: member

    The reason I haven’t merged this yet is that it has only one ACK, but outstanding review comments such as #25021 (review) and #25021 (review)

    Also, I am not sure if a test fix should modify the cpp source code. No objection to those changes, but maybe try to submit them separate from a test fix?

  53. vasild commented at 12:32 pm on August 10, 2022: contributor

    The reason for closing - “this won’t be merged” seems unjustified to me. If Marco did not merge it in the past, that does not mean he will not merge it in the future. Or that another maintainer will not merge it in the future.

    Indeed it has just one ACK. @achow101, maybe you can review this, since you are the original author of the tests.

    I think this PR should be reopened, IMO it is an improvement even if it does not fix the mysterious test failures.

  54. MarcoFalke commented at 12:35 pm on August 10, 2022: member
    For reference, I created #25750 two weeks ago
  55. jonatack commented at 2:23 pm on August 10, 2022: contributor

    I think this PR should be reopened, IMO it is an improvement even if it does not fix the mysterious test failures.

    Ok, re-opening.

    Note also that many test changes (and regular codebase ones, #25656 would be an example merged today) are merged with one ACK, so this doesn’t seem to be a hard rule.

  56. jonatack reopened this on Aug 10, 2022

  57. jonatack commented at 2:23 pm on August 10, 2022: contributor

    For reference, I created #25750 two weeks ago

    I’m happy to update this PR with suggestions for improvements or a different approach.

  58. Work around intermittent Win failure in walletpassphrase tests
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    baa739e845
  59. Document and tidy up RPC walletpassphrase MAX_SLEEP_TIME/nSleepTime
    note: using std::min here shouldn't add to the compilation time of
    src/wallet/rpc/encrypt.cpp, as src/wallet/wallet.h that it
    depends on already includes the algorithm library.
    
    Also added this file to the clang-tidy include-what-you-use CI script
    and updated the headers.
    8ad6ad8303
  60. Add logging and an assertion to wallet_encryption tests and use named args 189f57bd03
  61. jonatack force-pushed on Aug 10, 2022
  62. jonatack commented at 3:08 pm on August 10, 2022: contributor
    Updated to improve assertion error, changed the first commit message from “fix” to “work around” and rebased to current master to enable clang-tidy iwyu checks on the second commit (will re-push after the clang-tidy CI run). Happy to split out the some of changes to a separate pull or to update the approach if reviewers prefer.
  63. fanquake commented at 3:20 pm on August 10, 2022: member

    https://github.com/bitcoin/bitcoin/pull/25021/files#r873198183

    Has this been is addressed? If yes, can you mark it as resolved, if not, can you comment?

    changed the first commit message from “fix” to “work around”

    If this is now a work around, rather than a fix, I’d rather use the work-around that is contained to the platform that is broken, i.e #25750, rather than refactoring code that works for all other platforms, and essentially hiding the bug on Windows. At least until we properly understand the issue; see debugging in #25482.

    The PR title and description need updating in any case.

  64. MarcoFalke commented at 5:00 pm on August 10, 2022: member
    Can you explain why the workaround here is better than the one in #25750?
  65. jonatack commented at 5:04 pm on August 10, 2022: contributor

    Can you explain why the workaround here is better than the one in #25750?

    I’m happy to update this PR with suggestions for improvements or a different approach.

    ATM am looking into the discussion in #25482.

  66. DrahtBot commented at 1:40 pm on August 11, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25878 (tests: Use mocktime for wallet encryption timeout by achow101)
    • #25750 (test: Temporarily work around Windows bug in walletpassphrase by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  67. achow101 commented at 6:21 pm on August 15, 2022: member
    It seems like instead of messing around with buffers and relaxing ranges, this could just be resolved through the use of mock time so that bitcoind is set to a fixed time.
  68. jonatack commented at 6:25 pm on August 15, 2022: contributor
    I’ll give that a shot 👍
  69. MarcoFalke commented at 5:23 pm on August 18, 2022: member

    For reference, I suggested mocktime in May: #25021 (comment)

    This should be fine to do here, as the time returned has no guarantee to be related to the real relock time, which happens completely independent in the libevent scheduler thread.

  70. jonatack closed this on Aug 20, 2022

  71. bitcoin locked this on Aug 20, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me