Disk IO > 90% when running 0.13.1 on Ubuntu #9051

issue vogelito opened this issue on October 31, 2016
  1. vogelito commented at 11:57 PM on October 31, 2016: none

    This issue tracker is only for technical issues related to bitcoin-core.

    General bitcoin questions and/or support requests are best directed to the Bitcoin StackExchange.

    Describe the issue

    I upgraded to 0.13.1 on Oct 29 @ 03:10:23 UTC. This machine runs a dedicated bitcoin node and nothing else, except a small process that queries the node for certain transaction data. Ever since upgrading, Disk IO is through the roof. Prior to the upgrade, this had never been a problem. We were running 0.13.0 prior to upgrade.

    Can you reliably reproduce the issue?

    If so, please list the steps to reproduce below:

    1. Start bitcoind
    2. Let it run
    3. Disk I/O goes through the roof

    Expected behaviour

    Disk IO shouldn't be consistently so high

    Actual behaviour

    Disk IO is through the roof

    Screenshots.

    These are all from the monitoring software where the machine runs. Using iotop (below) we found the culprit is bitcoind. <img width="776" alt="screen shot 2016-10-31 at 5 46 37 pm" src="https://cloud.githubusercontent.com/assets/1325863/19875452/71f0a4ac-9f92-11e6-883d-70621a1f0a14.png"> <img width="771" alt="screen shot 2016-10-31 at 5 46 49 pm" src="https://cloud.githubusercontent.com/assets/1325863/19875453/71f98c7a-9f92-11e6-8ca4-a01b5f953a52.png">

    What version of bitcoin-core are you using?

    List the version number/commit ID, and if it is an official binary, self compiled or a distribution package such as PPA. Bitcoin Core 0.13.1, official binary downloaded from bitcoin.org.

    Machine specs:

    • OS: Linux version 3.13.0-87-generic (buildd@lgw01-25) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016
    • CPU: 4x 2.3 GHz Intel Xeon E5-2686 v4
    • RAM: 16GB
    • Disk size: 252G
    • Disk Type (HDD/SSD): SSD

    Any extra information that might be useful in the debugging process.

    This is normally the contents of a debug.log or config.log file. Raw text or a link to a pastebin type site are preferred.

    From iotop:

    Total DISK READ :     894.08 K/s | Total DISK WRITE :       0.00 B/s
    Actual DISK READ:     894.08 K/s | Actual DISK WRITE:       3.96 K/s
      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                      
    14915 be/4 root      894.08 K/s    0.00 B/s  0.00 % 98.49 % bitcoind -datadir=/root/.bitcoin/ [bitcoin-msghand]
        1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    
  2. unsystemizer commented at 9:05 AM on November 1, 2016: contributor

    2 MB/s and 25 IOPS can hardly be described as "through the roof". For an SSD that workload is trivial, and "I/O utilization" is certainly nowhere close to 100% of what the drive can do. It may be close to 100% of what's going on on the system, but since it's a dedicated full node, that's to be expected.
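As a back-of-the-envelope check of the figures above (a sketch; the SSD capability numbers are illustrative assumptions, not measurements from this machine):

```python
# Rough sanity check: how much of a commodity SATA SSD's capability do
# ~2 MB/s of reads and ~25 IOPS (the figures cited above) actually use?
reported_read_mb_s = 2.0     # MB/s, from the monitoring graphs
reported_iops = 25           # IOPS, from the monitoring graphs
ssd_seq_read_mb_s = 500.0    # typical SATA SSD sequential read (assumption)
ssd_random_iops = 10_000     # conservative SATA SSD random-read IOPS (assumption)

print(f"throughput: {reported_read_mb_s / ssd_seq_read_mb_s:.1%} of capacity")  # 0.4%
print(f"IOPS: {reported_iops / ssd_random_iops:.2%} of capacity")               # 0.25%
```

Either way the workload is well under one percent of what a typical SSD can sustain, which supports the comment's point.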

  3. rebroad commented at 1:05 PM on November 1, 2016: contributor

    Perhaps it's worth including in the documentation what typical usage for bitcoind looks like. @vogelito what did you use to graph the I/O usage etc? I can use the same tool to compare with my node.

  4. unsystemizer commented at 4:23 PM on November 1, 2016: contributor

    1 MB/s (as seen in the iotop output above) is not unusual. Documentation for users who want to reduce network usage exists: https://github.com/bitcoin/bitcoin/blob/0.13/doc/reduce-traffic.md

  5. vogelito commented at 1:50 AM on November 2, 2016: none

    @rebroad I'm using New Relic.

    I downgraded to 0.13.0 and the problem is gone: <img width="686" alt="screen shot 2016-11-01 at 7 44 52 pm" src="https://cloud.githubusercontent.com/assets/1325863/19913879/fa7557b2-a06b-11e6-9203-ab6b0998a717.png">

    Basically I upgraded to 0.13.1 on Oct 29 @ 03:10:23 UTC and you see the spikes. I then downgraded to 0.13.0 on Nov 1 @ 00:05:11 UTC

    Let me know if I can provide any additional info. @unsystemizer sorry for the hyperbole, I had just never seen disk usage that high.

  6. rebroad commented at 3:34 AM on November 2, 2016: contributor

    @vogelito remember, correlation does not imply causation. Nevertheless, I will look into this :)

  7. vogelito commented at 3:36 AM on November 2, 2016: none

    Sounds good @rebroad - let me know if I can help in any way.

  8. sipa commented at 3:39 AM on November 2, 2016: member

    I assume this is just due to the preferential peering. 0.13.1 is witness-enabled, and if your node is reachable, other 0.13.1 nodes will preferentially connect to it to synchronize from. This effect will likely diminish once more 0.13.1 nodes appear.

  9. unsystemizer commented at 3:39 AM on November 2, 2016: contributor

    @vogelito like I said, you may have had a couple of peers who were downloading blocks at the time; now they're done, or you got different peers. It's not due to suddenly added disk, but simply a network workload (hence bitcoin-msghand) that pulls data from the disk. You can search around to see what network traffic others get with a full node, for any version. They get that much or more, which is why the docs about "saving" bandwidth were added.

  10. rebroad commented at 6:09 AM on November 2, 2016: contributor

    @sipa why is this preferential peering already happening? SegWit hasn't activated yet - I'd have thought this didn't need to happen until activation.

  11. vogelito commented at 1:49 AM on November 3, 2016: none

    Since downgrading to 0.13.0 the disk IO is back to normal.

    I just upgraded again to 0.13.1 (Nov 3 01:46:13 UTC 2016).

    Will report with findings.

  12. TheBlueMatt commented at 2:06 AM on November 3, 2016: member

    We've had several reports that appear to indicate there is an abnormal number of users installing new full nodes using 0.13.1, which, due to preferential peering, causes more load on other 0.13.1 nodes than on older nodes (even my public node had very significant bandwidth usage for this reason). This appears to be other nodes downloading the chain from you, which selected your node due to preferential peering (which exists to ensure the network of SegWit nodes is strongly connected long before segwit activates, so that we can see any issues long before they happen).
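The preferential-peering behaviour described above can be sketched roughly as follows (a simplified illustration, not Bitcoin Core's actual peer-selection code; the dict shape is a hypothetical stand-in for address-manager entries):

```python
# Preferential peering, roughly: when a witness-capable node picks
# outbound peers, addresses advertising NODE_WITNESS are tried first,
# so witness nodes form a strongly connected subgraph before activation.
NODE_WITNESS = 1 << 3  # service-flag bit defined by BIP 144

def order_candidates(addrs):
    """Try witness-capable addresses before the rest.

    `addrs` is assumed to be a list of dicts with a 'services' bitfield.
    Python's sort is stable, so relative order within each group is kept.
    """
    return sorted(addrs, key=lambda a: not (a["services"] & NODE_WITNESS))

peers = [{"host": "a", "services": 0},
         {"host": "b", "services": NODE_WITNESS},
         {"host": "c", "services": 0}]
print([p["host"] for p in order_candidates(peers)])  # → ['b', 'a', 'c']
```

With few 0.13.1 nodes on the network, each one ends up near the front of many other nodes' candidate lists, which matches the extra load described above.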

  13. rebroad commented at 8:43 AM on November 3, 2016: contributor

    I have also noticed that the latency on my node has "gone through the roof" since switching to 0.13.1; I am considering going back to 0.13.0 for this reason alone.

    I will raise an issue for the root cause of this: the preferential peering, which surely should not occur until SegWit reaches the 95% consensus threshold.

  14. rebroad commented at 10:30 AM on November 3, 2016: contributor

    I raised #9072 to address the root cause issue, but as is often the case, it got closed before any real evaluation of its merits could occur... For something supposedly decentralized it is a shame that effective censorship continues to happen to this extent :(

  15. MarcoFalke commented at 11:13 AM on November 3, 2016: member

    @rebroad We don't close issues when they are related to Bitcoin Core and it is clear the author put time and thought into them and reported them with good intentions. However, when the title suggests that the only intent is to troll, the only sensible choice is to close them. (E.g. "Has SegWit activated?", which is, by the way, known as one way to ruin Open Source projects. See http://opensoul.org/99ways/#23, "Ask lazy questions.") You can find further examples of how to ruin Open Source projects in those slides.

  16. sipa commented at 3:39 PM on November 3, 2016: member

    @rebroad You've been told over and over again to take these questions to stackexchange instead of creating issues.

  17. rebroad commented at 3:15 AM on November 4, 2016: contributor

    @sipa just because an issue includes a question does not mean it is not an issue. @MarcoFalke I admit the title was badly chosen, but the title being phrased as a question doesn't make it a non-issue.

  18. rebroad referenced this in commit f6e0bcd2e3 on Nov 4, 2016
  19. rebroad referenced this in commit f8b3ff38fb on Nov 4, 2016
  20. rebroad referenced this in commit 3f8e6e7be7 on Nov 4, 2016
  21. rebroad referenced this in commit 61786e6951 on Nov 5, 2016
  22. rebroad referenced this in commit 2cdeb5500e on Nov 5, 2016
  23. vogelito commented at 7:09 PM on November 5, 2016: none

    Here's a quick update. You can see increased IO activity as soon as I upgraded to 0.13.1, but it took roughly 48 hours for the activity to become significant (just like last time).

    I will leave this running on 0.13.1 for a few days to see if IO comes down as more nodes come up, but I see that 20%+ of nodes are already running 0.13.1 and this is still an issue.

    <img width="771" alt="screen shot 2016-11-05 at 1 06 11 pm" src="https://cloud.githubusercontent.com/assets/1325863/20032867/2269ada0-a359-11e6-8d21-aa8d99cc16d4.png">

  24. rebroad referenced this in commit ed9fa74cff on Nov 6, 2016
  25. rebroad referenced this in commit e78d28738c on Nov 6, 2016
  26. rebroad commented at 3:57 AM on November 6, 2016: contributor

    @vogelito yes, this is because it takes a while for each node's word-of-mouth address database to learn that you are advertising NODE_WITNESS (due to #8937). Also, partly due to #8949, 0.13.1 nodes almost exclusively connect to other 0.13.1 nodes, which means 0.13.1 nodes now carry more load than before (previously the load was shared among all nodes). #9082 should reduce the impact of this significantly, but it has not been merged yet (please feel free to ACK it).

    In the meantime, if you would like your node to receive less attention while still being ready for SegWit when it activates, I believe https://github.com/rebroad/bitcoin/tree/SilentWitness might help: it effectively keeps quiet about its witness capability right up until SegWit activates, and then happily declares itself witness-capable. SilentWitness is simply a kludge to avoid the impact of #8949 and not a proper solution; I think #9082 is the correct way to fix it.
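As described, the SilentWitness workaround amounts to masking one service bit until activation; a minimal sketch (my paraphrase, not the branch's actual code):

```python
# Hide the NODE_WITNESS service bit until SegWit activates, then
# advertise it normally. NODE_NETWORK (1 << 0) stands in for the rest
# of a full node's advertised services.
NODE_NETWORK = 1 << 0  # full-node service bit
NODE_WITNESS = 1 << 3  # BIP 144 service-flag bit

def advertised_services(local_services, segwit_active):
    if segwit_active:
        return local_services
    return local_services & ~NODE_WITNESS

# Before activation the node looks like a plain full node...
assert advertised_services(NODE_WITNESS | NODE_NETWORK, False) == NODE_NETWORK
# ...after activation it declares witness capability.
assert advertised_services(NODE_WITNESS | NODE_NETWORK, True) == NODE_WITNESS | NODE_NETWORK
```

Because peers preferring witness nodes key off this bit, hiding it sidesteps the extra load at the cost of not participating in the pre-activation witness subgraph.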

  27. rebroad referenced this in commit e749d9f05f on Nov 6, 2016
  28. rebroad referenced this in commit 050f44836e on Nov 6, 2016
  29. rebroad referenced this in commit 226c2e348d on Nov 6, 2016
  30. rebroad referenced this in commit 0a59b5e99c on Nov 6, 2016
  31. rebroad referenced this in commit ede8b337ec on Nov 6, 2016
  32. fanquake added the label Resource usage on Nov 6, 2016
  33. rebroad referenced this in commit 8373c186e6 on Nov 6, 2016
  34. rebroad referenced this in commit f0295722ef on Nov 7, 2016
  35. rebroad referenced this in commit 3965b12528 on Nov 8, 2016
  36. rebroad referenced this in commit e9d817aa39 on Nov 9, 2016
  37. rebroad referenced this in commit c99fc633bc on Nov 11, 2016
  38. laanwj commented at 12:57 PM on November 16, 2016: member

    @vogelito The currently available way to restrict resource usage from serving blocks is to set a maximum upload target: -maxuploadtarget=<n>. This makes your node stop serving historical blocks once it has served that amount of data in a 24-hour period. Bandwidth throttling would be better for keeping sustained usage down, but I don't think we can get there in time for 0.14.
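For illustration, the 24-hour upload-target behaviour described above can be sketched like this (a simplification, not Bitcoin Core's implementation; n is interpreted as MiB per 24 hours, and the value 5000 below is just an example, not a recommendation):

```python
# Sketch of -maxuploadtarget: once the bytes served in the current
# 24-hour window reach the target, stop serving historical blocks to
# peers. A target of 0 means no limit.
class UploadTarget:
    def __init__(self, max_mib_per_day):
        self.limit = max_mib_per_day * 1024 * 1024  # target, in bytes
        self.served = 0  # bytes served in the current 24h window

    def record(self, nbytes):
        self.served += nbytes

    def target_reached(self):
        return self.limit > 0 and self.served >= self.limit

t = UploadTarget(max_mib_per_day=5000)  # hypothetical example value
t.record(4 * 1024**3)                   # served 4 GiB so far
print(t.target_reached())               # → False (under the ~4.9 GiB target)
```

In practice the right n depends on how much of your bandwidth budget you want block serving to consume.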

    For something supposedly decentralized it is a shame that effective censorship continues to happen to this extent :(

    I know it's extremely hip to be complaining about censorship these days but you have no clue what you're talking about: "Bitcoin", as in the global network and consensus system, is a decentralized system. This project is NOT a decentralized system, but consists of the team maintaining it and regular contributors. We're free to organize and moderate the comments and issues here in any way we deem necessary. This is the only way to run a well-organized project. The issue tracker can't be like a public bathroom wall. So keep it on-topic here or I will start deleting messages.

  39. vogelito commented at 2:55 PM on November 16, 2016: none

    @laanwj thanks. I will give that a try. Is there a suggested value for n?

    We've continued to run 0.13.1 and this is what we see:

    • a sustained higher IO rate for disk reads
    • sudden spikes for IO utilization (mostly reads) lasting roughly 24 hours
    • sustained IOPS

    Attaching updated graphs for your reference.

    <img width="765" alt="screen shot 2016-11-16 at 8 48 18 am" src="https://cloud.githubusercontent.com/assets/1325863/20351807/3b102bfc-abda-11e6-8abe-c41999bf898d.png">

  40. rebroad referenced this in commit cf668ad3c4 on Nov 24, 2016
  41. rebroad referenced this in commit 727c08c4b9 on Nov 24, 2016
  42. rebroad referenced this in commit a6e7623d23 on Nov 28, 2016
  43. rebroad referenced this in commit ee239ee211 on Nov 29, 2016
  44. rebroad referenced this in commit af7216837d on Nov 30, 2016
  45. rebroad referenced this in commit 40b56f6025 on Dec 1, 2016
  46. rebroad referenced this in commit dec8a9964e on Dec 1, 2016
  47. rebroad referenced this in commit fcef2382c5 on Dec 3, 2016
  48. rebroad referenced this in commit d94d0e3abf on Dec 5, 2016
  49. rebroad referenced this in commit 94d43df94a on Dec 6, 2016
  50. rebroad referenced this in commit 8bd41d082d on Dec 8, 2016
  51. rebroad referenced this in commit e883d79624 on Dec 9, 2016
  52. rebroad referenced this in commit ae4613f192 on Dec 11, 2016
  53. rebroad referenced this in commit cb4c25a229 on Dec 11, 2016
  54. rebroad referenced this in commit cc91bbf07d on Dec 11, 2016
  55. rebroad referenced this in commit 21d0334917 on Dec 12, 2016
  56. rebroad referenced this in commit 8e42b91862 on Dec 12, 2016
  57. rebroad referenced this in commit 881c56c3e6 on Dec 13, 2016
  58. rebroad referenced this in commit 02c0c029f4 on Dec 14, 2016
  59. vogelito commented at 9:03 AM on December 14, 2016: none

    I can confirm that resource usage has gone down significantly in the last three weeks. If there's anything you'd like me to report, please let me know.

  60. rebroad referenced this in commit a22563ab66 on Dec 16, 2016
  61. rebroad referenced this in commit cabd3069f8 on Dec 18, 2016
  62. rebroad referenced this in commit 48254a6cf4 on Dec 20, 2016
  63. rebroad referenced this in commit b68a06b9e4 on Dec 23, 2016
  64. rebroad referenced this in commit 699017fdbe on Dec 23, 2016
  65. rebroad referenced this in commit 95f05146bb on Dec 23, 2016
  66. rebroad referenced this in commit f9df47ae7c on Dec 24, 2016
  67. rebroad referenced this in commit 7971e17797 on Dec 26, 2016
  68. rebroad referenced this in commit 19b2a3b5c4 on Dec 26, 2016
  69. rebroad referenced this in commit 308703442b on Dec 26, 2016
  70. rebroad referenced this in commit 4ec20271ab on Dec 26, 2016
  71. rebroad referenced this in commit 78cccec572 on Dec 27, 2016
  72. rebroad referenced this in commit e438189435 on Dec 27, 2016
  73. laanwj commented at 8:22 AM on March 25, 2017: member

    As this concerns Linux, #9245, which reduces I/O priority, may be useful if the high I/O rate causes other processes to be slow.

  74. Leviathn commented at 1:12 PM on September 5, 2017: none

    Given that the situation resolved itself, this appears closeable, @vogelito.

  75. vogelito commented at 1:15 PM on September 5, 2017: none

    Sounds good. Closing.

  76. vogelito closed this on Sep 5, 2017

  77. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 09:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me