Test Nodes will sometimes not sync their blocks on native Windows #6554

issue ptschip openend this issue on August 13, 2015
  1. ptschip commented at 2:26 pm on August 13, 2015: contributor

    While getting the python tests to run on windows I came across a problem of the test nodes sometimes not syncing. I believe that’s why the Travis builds were failing when I submitted my pull request for enabling windows python tests.

    Here are the details as far as I can find. I have run these tests on Windows 7, both 32, 64 bit, with the current v11.0 and also the master branch all with the same results.

    1. When I run the wallet.py test, the nodes will not sync I would say roughly 30% of the time.
    2. They always fail at the same place, just after “Have node1 generate 100 blocks (so node0 can recover the fee)”.
    3. At this point one and sometimes two of the nodes will not sync their blocks
    4. I can’t do a network trace because wireshark will not trace on the loopback adapter on windows so all I have is the following from the log file shown below.

    On the node that doesn’t sync the only difference I can find in the logs is that just prior to the test failing there is always the same messasge in the logs that is not present in the logs of the other nodes. The message begins with “Reducing block download timeout”

    2015-08-13 14:14:24 Reducing block download timeout for peer=1 block=6ebed8c2c5be6942028da47f1ddbb99405e856ad7c12b0207a8b4647962255fa, orig=1439480964263430 new=1439476464388230 2015-08-13 14:14:24 Requesting block 5917b56ef19db11cc6ef2a6f44f026eab87f47bffb521d9402e119cfa289310d (172) peer=1 2015-08-13 14:14:24 sending: getdata (37 bytes) peer=1 2015-08-13 14:14:24 received: block (181 bytes) peer=1 2015-08-13 14:14:24 received block 6ebed8c2c5be6942028da47f1ddbb99405e856ad7c12b0207a8b4647962255fa peer=1 2015-08-13 14:14:24 Reducing block download timeout for peer=1 block=7c94201f537240a21e1b187c7f5ca96bb9a6d8f302de3961ddbaecf22dc3da19, orig=1439480964310230 new=1439476464403830 2015-08-13 14:14:24 Requesting block 76ec7cee8ad99c8bdae67e2c651c6bb05fd5ec68787e45c21964bfb77c7bde8e (173) peer=1 2015-08-13 14:14:24 sending: getdata (37 bytes) peer=1 2015-08-13 14:14:24 received: block (181 bytes) peer=1 2015-08-13 14:14:24 received block 7c94201f537240a21e1b187c7f5ca96bb9a6d8f302de3961ddbaecf22dc3da19 peer=1 2015-08-13 14:14:24 Reducing block download timeout for peer=1 block=2d85186ef03a7bfdabca3363e085a373b36bf0fc6b3e9e1da252fce7004aad6b, orig=1439480964310230 new=1439476464435031

  2. gavinandresen commented at 3:15 pm on August 13, 2015: contributor
    Run with –tracerpc to see why the sync is failing– probably one of the nodes has the wrong block count…
  3. ptschip commented at 4:10 pm on August 13, 2015: contributor

    i had already added some code to the python scripts to output the blockcounts on the nodes and indeed verified that one or two nodes sometimes has the wrong blockcount and won’t sync with the others.

    i’ll try running with tracderpc also to see if anything else pops up.

    On 13/08/2015 8:16 AM, Gavin Andresen wrote:

    Run with –tracerpc to see why the sync is failing– probably one of the nodes has the wrong block count…

    — Reply to this email directly or view it on GitHub #6554 (comment).

  4. ptschip commented at 5:51 pm on August 13, 2015: contributor

    I didn’t see anything extra with tracerpc

    however, i additionaly verifed that the problem is not within the python script by using the bitcoin-cli from the command line while the test was running to verify the blockcounts manually. They are correct, the block counts are at times not syncing.

    within the wallet.py script after the section

    Have node0 mine a block, thus it will collect its own fee.

    the block counts looks as follows:

    0node0, node1, node2
    1[103, 103, 103]
    

    then after mining the next 100 blocks the nodes begin to sync but sometimes one or two of them don’t and they almost always get hung up at 119 blocks…so the end result is:

    [203, 203, 119]

    On 13/08/2015 8:16 AM, Gavin Andresen wrote:

    Run with –tracerpc to see why the sync is failing– probably one of the nodes has the wrong block count…

    — Reply to this email directly or view it on GitHub #6554 (comment).

  5. ptschip commented at 6:01 pm on August 13, 2015: contributor

    Here’s something that looks interesting. When i do a “getinfo” on the node that apparently is not syncing, the number of blocks received under “errors” shows received blocks as 203 which is correct however the number of blocks only shows as 119. The other two nodes in the the test show the correct 203 blocks for both cases.

    $ bitcoin-cli -rpcuser=rt -rpcpassword=rt -rpcport=12421 getinfo { “version”: 119900, “protocolversion”: 70002, “walletversion”: 60000, “balance”: 21.00000000, “blocks”: 119, “timeoffset”: 0, “connections”: 4, “proxy”: “”, “difficulty”: 4.656542373906925e-010, “testnet”: false, “keypoololdest”: 1439487755, “keypoolsize”: 2, “paytxfee”: 0.00000000, “relayfee”: 0.00001000, “errors”: “WARNING: abnormally high number of blocks generated, 203 blocks rec eived in the last 4 hours (24 expected)” }

    On 13/08/2015 8:16 AM, Gavin Andresen wrote:

    Run with –tracerpc to see why the sync is failing– probably one of the nodes has the wrong block count…

    — Reply to this email directly or view it on GitHub #6554 (comment).

  6. ptschip commented at 6:56 pm on August 13, 2015: contributor

    Looks like the node does eventually sync, i was only letting my tests run about 10 mins before killing them but it seems that somewhere between 14 and 20 minutes the node seems to do what it’s supposed to do and sync. But why it’s stopping and then suddenly starting up again is a mystery since there is nothing in the logs other than received headers showing up.

    So to sum it up:

    1. Some nodes seem to hang while running the wallet.py test
    2. The hung node shows the incorrect block count but no information in the error/warning message in “getinfo”
    3. After 10 mins: the hung node shows the incorrect block count however the error/warning message in “getinfo” shows that all blocks were received
    4. The hung node starts processing again after 14 minutes or in some cases almost 20 mins

    The following is the part of the log where the hung node starts to process again…nothing really interesting there.

    2015-08-13 18:17:57 ThreadRPCServer method=getblockcount 2015-08-13 18:17:58 ThreadRPCServer method=getblockcount 2015-08-13 18:17:59 ThreadRPCServer method=getblockcount 2015-08-13 18:18:00 ThreadRPCServer method=getblockcount 2015-08-13 18:18:01 ThreadRPCServer method=getblockcount 2015-08-13 18:18:02 ThreadRPCServer method=getblockcount 2015-08-13 18:18:03 ThreadRPCServer method=getblockcount 2015-08-13 18:18:04 ThreadRPCServer method=getblockcount 2015-08-13 18:18:05 ThreadRPCServer method=getblockcount 2015-08-13 18:18:06 ThreadRPCServer method=getblockcount 2015-08-13 18:18:07 ThreadRPCServer method=getblockcount 2015-08-13 18:18:08 ThreadRPCServer method=getblockcount 2015-08-13 18:18:09 ThreadRPCServer method=getblockcount 2015-08-13 18:18:10 ThreadRPCServer method=getblockcount 2015-08-13 18:18:11 ThreadRPCServer method=getblockcount 2015-08-13 18:18:12 ThreadRPCServer method=getblockcount 2015-08-13 18:18:13 ThreadRPCServer method=getblockcount 2015-08-13 18:18:14 ThreadRPCServer method=getblockcount 2015-08-13 18:18:15 ThreadRPCServer method=getblockcount 2015-08-13 18:18:17 ThreadRPCServer method=getblockcount 2015-08-13 18:18:18 ThreadRPCServer method=getblockcount 2015-08-13 18:18:19 ThreadRPCServer method=getblockcount 2015-08-13 18:18:20 ThreadRPCServer method=getblockcount 2015-08-13 18:18:20 received: headers (1378 bytes) peer=2 2015-08-13 18:18:20 received: headers (1459 bytes) peer=2 2015-08-13 18:18:20 received: headers (1540 bytes) peer=2 2015-08-13 18:18:20 received: headers (1621 bytes) peer=2

  7. ptschip commented at 10:46 pm on August 13, 2015: contributor

    Lastly there is a workaround, at least as far as the python tests are concerned

    It appears that if one creates just 50 blocks or less at a time and then sync’s the nodes then the scripts can move forward without problems.
    But create 80 or 100 blocks and the nodes can then get hung for a while, up to 20 mins before syncing. (this hanging of the node is happening when running wallet.py and also walletbackup.py )

  8. gavinandresen commented at 10:55 pm on August 13, 2015: contributor

    Very odd that syncing long chains is misbehaving on Windows – that’s definitely a bug.

    Unfortunately, I don’t think we have any developers with a very deep knowledge of the syncing process AND a Windows development environment in which they might be able to debug the problem.

  9. sipa commented at 10:58 pm on August 13, 2015: member
    Such a long delay sounds like the block download tineout triggering, causing it to be requestes from another peer? Just a guess.
  10. jonasschnelli commented at 6:45 am on August 14, 2015: contributor
    If wallet.py and walletbackup.py are breaking from time to time it could be because both tests do a shutdown and restart of (at least one) node (https://github.com/bitcoin/bitcoin/blob/master/qa/rpc-tests/wallet.py#L182). I once also had issues with the wait_bitcoinds() command.
  11. ptschip commented at 7:06 am on August 14, 2015: contributor

    the problem happens early on and before any shuydown/restart.

    On 13/08/2015 11:45 PM, Jonas Schnelli wrote:

    If wallet.py and walletbackup.py are breaking from time to time it could be because both tests do a shutdown and restart of (at least one) node (https://github.com/bitcoin/bitcoin/blob/master/qa/rpc-tests/wallet.py#L182). I once also had issues with the wait_bitcoinds() command.

    — Reply to this email directly or view it on GitHub #6554 (comment).

  12. ptschip referenced this in commit a93578b6f9 on Aug 15, 2015
  13. jonasschnelli commented at 3:31 pm on August 17, 2015: contributor
    @ptschip: did you already try to see if some informations are present in the nodes debug.log(s)?
  14. ptschip commented at 4:02 pm on August 17, 2015: contributor

    yes, if you read through the bug report you’ll see all that…On 17/08/2015 8:31 AM, Jonas Schnelli wrote:

    @ptschip https://github.com/ptschip: did you already try to see if some informations are present in the nodes debug.log(s)?

    — Reply to this email directly or view it on GitHub #6554 (comment).

  15. ptschip referenced this in commit 0a94be1c8d on Aug 18, 2015
  16. laanwj added the label Tests on Sep 4, 2015
  17. MarcoFalke commented at 11:07 pm on September 13, 2015: member
    @ptschip Is the issue fixed in current master?
  18. ptschip commented at 1:50 am on September 14, 2015: contributor

    After comiling with libevent 2.0.22, i ran the wallet.py test 10 times and nodes failed to sync on five of the tests, always at the same place as before: where we mine 100 blocks and then try to sync.

    Unfortunately, Libevent most definitely does not fix this issue on Native Windows

    On 13/09/2015 4:07 PM, MarcoFalke wrote:

    @ptschip https://github.com/ptschip Is the issue fixed in current master?

    — Reply to this email directly or view it on GitHub #6554 (comment).

  19. laanwj commented at 11:14 am on April 28, 2016: member
    There are plans for using it, but libevent has nothing to do with the P2P protocol implementation at the moment, so it makes sense that that didn’t change anything.
  20. MarcoFalke added the label Windows on May 15, 2016
  21. Sjors commented at 6:52 pm on March 16, 2018: member
    Is this still an issue?
  22. jnewbery commented at 6:59 pm on March 28, 2018: member
    Functional tests and bitcoind code has changed completely in the last 2.5 years, so none of the history in this issue is likely to be useful. I suggest we close this (and open a new issue if functional tests fail on windows). @MarcoFalke - thoughts?
  23. MarcoFalke commented at 7:06 pm on March 28, 2018: member
    The underlying issue might still be there. But it would be indeed nice to have recent stack traces and steps to reproduce on current master
  24. MarcoFalke commented at 7:06 pm on March 28, 2018: member
  25. MarcoFalke closed this on Apr 11, 2018

  26. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-08 22:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me