Drop IO priority to idle while reading blocks for peer requests and startup verification #9245

pull luke-jr wants to merge 3 commits into bitcoin:master from luke-jr:ionice changing 9 files +185 −15
  1. luke-jr commented at 8:29 am on November 30, 2016: member
  2. in src/util.h: in b9303aa36a outdated
    130@@ -131,6 +131,49 @@ void OpenDebugLog();
    131 void ShrinkDebugFile();
    132 void runCommand(const std::string& strCommand);
    133 
    134+#ifdef HAVE_IOPRIO_SYSCALL
    


    laanwj commented at 8:39 am on November 30, 2016:
    Please move this to a new include file util_ioprio.h.
  3. laanwj commented at 8:41 am on November 30, 2016: member

    Concept ACK. Though not very happy to introduce platform-specific voodoo - we only just got rid of thread priority manipulation. But it may be worth the hassle, I don’t know.

    Can we quantify whether this works or not somehow?

  4. laanwj added the label Block storage on Nov 30, 2016
  5. gmaxwell commented at 8:50 am on November 30, 2016: contributor

    This will also delay other processing, in particular block relay– at least until the handling is made more concurrent– no? Not a reason to not do it, but maybe a reason to not do it by default for everyone.

    I second the need to quantify this– I could imagine it making for a big usability improvement. … or not mattering at all. If the former, I want it… if the latter…

  6. luke-jr commented at 9:32 am on November 30, 2016: member

    Whenever I restart my node lately, I find myself eventually manually ioniceing the entire process as it slows down other things monitoring it in iotop. I can’t be sure it’s sending out old blocks, but I can’t imagine what else it’d be spending so much time reading… :/

    Added Mac and Windows support for completeness.

  7. luke-jr force-pushed on Nov 30, 2016
  8. in src/utilioprio.h: in 6430b92320 outdated
    49+#endif
    50+
    51+#ifdef WIN32
    52+bool ioprio_set_file_idle(FILE *);
    53+#else
    54+#define ioprio_set_file_idle(f)  (false)
    


    ryanofsky commented at 3:03 pm on November 30, 2016:

    Maybe change this to ((void)false) to prevent a compiler warning:

    0main.cpp:1673:12: warning: statement has no effect [-Wunused-value]
    1     ioprio_set_file_idle(filein.Get());
    
  9. in src/utilioprio.h: in 6430b92320 outdated
    40+    }
    41+};
    42+#define IOPRIO_IDLER(actually_idle)  ioprio_idler ioprio_idler_(actually_idle)
    43+
    44+#else
    45+#define ioprio_get() (-1)
    


    ryanofsky commented at 3:06 pm on November 30, 2016:
    It doesn’t seem like it should be necessary to declare these if the ioprio_idler class isn’t around to call them.

    luke-jr commented at 9:30 pm on November 30, 2016:
    We can simplify some of the other stuff (move the logic into the class itself) if low-level access is undesired, but for now it’s too early to know if these won’t be needed IMO.
  10. in src/utilioprio.cpp: in 6430b92320 outdated
    0@@ -0,0 +1,65 @@
    1+// Copyright (c) 2016 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifdef HAVE_IOPRIO_SYSCALL
    


    ryanofsky commented at 3:29 pm on November 30, 2016:
    Needs #include "config/bitcoin-config.h" to prevent link errors.
  11. in src/utilioprio.h: in 6430b92320 outdated
    0@@ -0,0 +1,57 @@
    1+// Copyright (c) 2016 Satoshi Nakamoto
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifndef BITCOIN_UTIL_IOPRIO_H
    


    ryanofsky commented at 3:30 pm on November 30, 2016:
    Should add #include "config/bitcoin-config.h"
  12. ryanofsky approved
  13. ryanofsky commented at 3:33 pm on November 30, 2016: contributor
    ACK 6430b9232048666996d5d4c3ed154907e3daff67 (after adding missing #includes)
  14. fanquake commented at 0:18 am on December 1, 2016: member

    Travis failure:

    0'../../src/'`utilioprio.cpp
    1In file included from ../../src/utilioprio.cpp:9:0:
    2../../src/utilioprio.h: In destructor ‘ioprio_idler::~ioprio_idler()’:
    3../../src/utilioprio.h:42:51: error: ‘LogPrintf’ was not declared in this scope
    4             LogPrintf("failed to restore ioprio\n");
    
  15. luke-jr commented at 8:51 am on December 1, 2016: member
    Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I’m going to leave it for a separate PR…
  16. luke-jr force-pushed on Dec 1, 2016
  17. luke-jr force-pushed on Dec 2, 2016
  18. rebroad commented at 4:35 am on December 19, 2016: contributor
    I like this (concept ACK) although I wonder what the impact is on the p2p network as a whole if everyone ran this.
  19. luke-jr force-pushed on Feb 4, 2017
  20. martinschwarz commented at 7:57 pm on March 11, 2017: none

    Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer.

    There are win32 and win64 builds. Can’t this just be enabled on the win64 build only?

  21. laanwj commented at 1:55 pm on March 13, 2017: member

    Looks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I’m going to leave it for a separate PR…

    Isn’t Vista the version after Windows XP? As we dropped support for Windows XP in 0.13, it seems that requiring Vista for 0.15 is fine.

    There are win32 and win64 builds. Can’t this just be enabled on the win64 build only?

    Could be done, but it’d be confusing to couple those. The low-end systems running 32-bit versions would probably need this more.

  22. luke-jr force-pushed on Aug 21, 2017
  23. luke-jr commented at 5:06 am on August 21, 2017: member
    Rebased…
  24. TheBlueMatt commented at 3:31 pm on August 21, 2017: contributor
    Hmm, I dont think this is really the best idea as long as our message processing is still single-threaded. Really we need to refactor stuff so that block reading is async and the network processing can continue for other peers while we’re serving blocks for peers in IBD, otherwise we may block receiving a new block longer than required.
  25. luke-jr commented at 4:04 pm on August 21, 2017: member
    That’s somewhat independent from this issue. If users need to shut off their node to use their computer, the delay for processing a new block will be even longer.
  26. luke-jr force-pushed on Aug 21, 2017
  27. ryanofsky commented at 5:47 pm on October 12, 2017: contributor
    @TheBlueMatt @luke-jr, maybe a compromise would be to make this behavior configurable, and perhaps to default to dropping priority if user is running bitcoin-qt on a desktop.
  28. TheBlueMatt commented at 9:54 pm on November 10, 2017: contributor
    Another approach which might be simpler would be to have the validation.h-exposed versions of ReadBlockFromDisk drop io priority so that net_processing will use low priority when answering remote-node queries but connecting blocks will not. With 0.15 I/O when doing initial sync is somewhat better, so this may also be less of an issue now unless the user is running with -peerbloomfilters.
  29. luke-jr commented at 6:28 am on November 11, 2017: member
    @TheBlueMatt That’s exactly what this already does… priority is only dropped when serving peers, not when connecting blocks.
  30. TheBlueMatt commented at 10:57 pm on November 11, 2017: contributor

    @luke-jr I was referring to the possibility of not exposing a priority flag in validation.h’s API - that seems a bit overkill IMO, as evidenced by the fact that there are now two ReadBlockFromDisk calls in net_processing which dont get the low-priority flag :p. Though that would also result in RPC ReadBlockFromDisk calls getting de-prioritized.

    More importantly, I’m curious how much we need this anymore - it seems most of the complaints about I/O usage were primarily due to 0.13.1 preferential peering…On systems where your I/O is severely limited, I both don’t know how much this will help (in my experience Linux’ ionice is mostly worthless when it comes to desktop latency) and don’t know if its not better to direct people towards maxuploadtarget or peerbloomfilters so as to avoid simply slowing down your peers because your I/O is too slow.

  31. luke-jr commented at 11:23 pm on November 11, 2017: member
    Before writing this, I generally ionice’d the entire bitcoind process to maintain system usability.
  32. TheBlueMatt commented at 8:59 pm on November 16, 2017: contributor
    Concept ACK. You need to mark the other ReadBlockFromDisks in net_processing low-priority as well.
  33. sipa commented at 5:11 pm on March 6, 2018: member
    Concept ACK, but needs rebase.
  34. luke-jr force-pushed on Mar 6, 2018
  35. luke-jr renamed this:
    Drop IO priority to idle while reading blocks for getblock requests
    Drop IO priority to idle while reading blocks for peer requests and startup verification
    on Mar 6, 2018
  36. luke-jr commented at 7:22 pm on March 6, 2018: member
    Rebased and added the additional deprioritisations requested by @TheBlueMatt
  37. laanwj commented at 10:45 pm on March 6, 2018: member
    utACK 91ccbbb87
  38. eklitzke commented at 9:57 pm on March 20, 2018: contributor

    I considered making this change in #12618 and there’s a comment about it there. This is much more dangerous than a CPU scheduler change (like the one I just linked for two reasons).

    On Linux the I/O scheduling stuff is very primitive, and IOPRIO_CLASS_IDLE is quite a strong policy. From the man page:

    0       IOPRIO_CLASS_IDLE (3)
    1              This is the idle scheduling class.  Processes running at this
    2              level get I/O time only when no one else needs the disk.  The
    3              idle class has no class data.  Attention is required when
    4              assigning this priority class to a process, since it may
    5              become starved if higher priority processes are constantly
    6              accessing the disk.
    

    This means that at the idle I/O processing level you can get starved forever if anything else at all is using the disk. This is quite different from something like SCHED_BATCH, which just deprioritizes you a little bit. The CPU scheduler actually has a SCHED_IDLE that works like IOPRIO_CLASS_IDLE, but it’s dangerous for the same reason that IOPRIO_CLASS_IDLE is dangerous, so I didn’t use it.

    The other thing that I’m fairly certain of (but could be wrong about) is that I believe the kernel doesn’t really take into account multi-queue devices when considering idleness of a block device. There’s a lot of discussion about this online if you look at people talking about the %util field in iostat output, e.g. here. I believe for this reason IOPRIO_CLASS_IDLE could starve you out from accessing a disk when it actually does have idle capacity remaining.

    I think we should test this change more (or get a better understanding of the Linux I/O scheduler before proceeding with this change). The crude and easy-to-get wrong policy is part of the reason that I think glibc doesn’t expose this system call in the first place. Not sure that I can think of an actual attack off-hand, but this is a DOS vector if you create N connections to a host, ask for blocks from all N connections, and then have some other mechanism to cause them to use up their disk I/O.

  39. eklitzke changes_requested
  40. eklitzke commented at 10:06 pm on March 20, 2018: contributor
    Concept ACK is this is enabled via a flag, but this seems dangerous to enable by default. See the other comment I just left.
  41. maflcko added the label Needs rebase on Jun 6, 2018
  42. laanwj commented at 4:59 pm on September 10, 2018: member
    This has been open since 2016 and it’s still uncertain whether it’s a good idea to merge. Closing,f or now.
  43. laanwj closed this on Sep 10, 2018

  44. luke-jr commented at 9:18 pm on September 10, 2018: member
    There are plenty of ACKs here, plenty of testing, and only speculation on why there could (in very improbable circumstances) be a problem.
  45. laanwj reopened this on Sep 11, 2018

  46. laanwj commented at 5:32 pm on September 11, 2018: member
    reopened on request
  47. DrahtBot commented at 7:23 pm on December 12, 2018: contributor
  48. DrahtBot added the label Up for grabs on Dec 12, 2018
  49. DrahtBot closed this on Dec 12, 2018

  50. fanquake reopened this on Oct 14, 2019

  51. luke-jr force-pushed on Oct 14, 2019
  52. luke-jr force-pushed on Oct 14, 2019
  53. DrahtBot removed the label Needs rebase on Oct 14, 2019
  54. luke-jr commented at 11:34 pm on October 14, 2019: member
    Rebased
  55. maflcko removed the label Up for grabs on Oct 16, 2019
  56. ryanofsky approved
  57. ryanofsky commented at 3:58 pm on October 24, 2019: contributor

    Conditional concept and code review ACK e1276957ed2c42cfe028a0b2a725767d10c1f5ca if Evan’s comments #9245 (comment) are addressed. Only change since last review is dropping windows support.

    Easiest way to address the concerns, I think would be to make this configurable, and maybe only enable it by default for the gui client (assuming UI responsiveness is what motivated this), or only enabled by default on specific platforms where it’s been benchmarked.

    The change should probably also have release notes.

  58. DrahtBot commented at 1:01 pm on November 20, 2019: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK rebroad, TheBlueMatt, sipa, eklitzke
    Stale ACK laanwj, ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26326 (net: don’t lock cs_main while reading blocks in net processing by andrewtoth)
    • #26316 (use shared mutex to guard against block files being removed before read by andrewtoth)
    • #26288 (Enable -Wstring-concatenation and -Wstring-conversion on clang builds by Empact)
    • #25830 (refactor: Replace m_params with chainman.GetParams() by aureleoules)
    • #25574 (validation: Improve error handling when VerifyDB fails due to insufficient dbcache by mzumsande)
    • #25152 (refactor: Split util/system into exception, shell, and fs-specific files by Empact)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  59. vr646935 approved
  60. DrahtBot added the label Needs rebase on Apr 20, 2020
  61. rebroad referenced this in commit 8d43b4ed5f on Apr 23, 2020
  62. rebroad commented at 8:47 am on April 23, 2020: contributor

    This will also delay other processing, in particular block relay– at least until the handling is made more concurrent– no? Not a reason to not do it, but maybe a reason to not do it by default for everyone.

    I’d have thought it’s possible to make the io lowered only for historical block serving and not for the most recent blocks.

  63. rebroad commented at 2:12 am on April 24, 2020: contributor

    I’ve been testing this under WSL and Windows 10 but I can’t discern any difference to the build without these changes. According to Process Explorer, all the threads are have I/O Priority of normal. image

    Is there any particular way that this ought to be tested?

  64. luke-jr commented at 3:09 am on April 24, 2020: member
    On Linux, which is the only OS this PR supports…
  65. rebroad referenced this in commit f0a14ee0d4 on Apr 25, 2020
  66. rebroad referenced this in commit b7b223cf30 on Apr 26, 2020
  67. rebroad referenced this in commit 9798bb904a on Apr 27, 2020
  68. rebroad referenced this in commit 39490eb808 on Apr 29, 2020
  69. luke-jr force-pushed on Aug 20, 2020
  70. luke-jr commented at 2:04 pm on August 20, 2020: member
    Rebased again
  71. DrahtBot removed the label Needs rebase on Aug 20, 2020
  72. ryanofsky commented at 10:58 pm on August 24, 2020: contributor

    re: #9245#pullrequestreview-306688566

    Conditional concept and code review ACK e127695 if Evan’s comments #9245 (comment) are addressed. Only change since last review is dropping windows support.

    Easiest way to address the concerns, I think would be to make this configurable, and maybe only enable it by default for the gui client (assuming UI responsiveness is what motivated this), or only enabled by default on specific platforms where it’s been benchmarked.

    The change should probably also have release notes.

    Same conditional ACK for 42321f15160a8de085a06fe13a841080c9eff104 which just rebases e1276957ed2c42cfe028a0b2a725767d10c1f5ca. Hasn’t been followup to the comments since then.

  73. luke-jr commented at 0:59 am on August 25, 2020: member
    The issue is impacting performance of the rest of the system, whether GUI or daemon is irrelevant.
  74. fanquake referenced this in commit 8e0f341779 on Aug 25, 2020
  75. sidhujag referenced this in commit 62b3455e58 on Aug 25, 2020
  76. DrahtBot added the label Needs rebase on Sep 7, 2020
  77. luke-jr commented at 6:02 pm on September 11, 2020: member
    @ryanofsky IMO it doesn’t make sense to continue giving real-world users a bad experience, because of some theoretical issue nobody has ever observed.
  78. luke-jr force-pushed on Sep 11, 2020
  79. DrahtBot removed the label Needs rebase on Sep 11, 2020
  80. DrahtBot added the label Needs rebase on Oct 19, 2020
  81. rebroad referenced this in commit 8637d69b88 on Feb 15, 2021
  82. rebroad referenced this in commit 94b84f8e0c on Feb 15, 2021
  83. rebroad referenced this in commit db7268aa15 on Feb 18, 2021
  84. rebroad referenced this in commit b0c2698587 on Feb 19, 2021
  85. rebroad referenced this in commit 76e0717c2d on Feb 19, 2021
  86. rebroad referenced this in commit 6c8aaf8726 on Feb 19, 2021
  87. rebroad commented at 8:51 pm on March 19, 2021: contributor
    I still think #9599 made sense (to do IBD in a separate thread, with low I/O priority) so that a node can still usefully serve other nodes during IBD (which it seems to fail at without some additional concurrency).
  88. rebroad referenced this in commit a33b33cb68 on Mar 28, 2021
  89. luke-jr force-pushed on Sep 12, 2021
  90. luke-jr force-pushed on Sep 12, 2021
  91. DrahtBot removed the label Needs rebase on Sep 12, 2021
  92. DrahtBot added the label Needs rebase on Sep 16, 2021
  93. kristapsk commented at 1:33 pm on July 1, 2022: contributor
    Needs rebase.
  94. davidgumberg commented at 8:05 pm on September 6, 2022: contributor

    @luke-jr

    I have rebased this here.

    Feel free to take this.

  95. glozow commented at 10:01 am on September 26, 2022: member
    Closing as this has needed rebase for more than 1 year. Feel free to reopen if you get a chance to work on this again in the future, thanks!
  96. glozow closed this on Sep 26, 2022

  97. glozow reopened this on Sep 26, 2022

  98. luke-jr force-pushed on Sep 26, 2022
  99. Drop IO priority to idle while reading blocks for peer requests and startup verification 91e407b07b
  100. util/ioprio: Add Mac support using iopolicy functions 7e442a38be
  101. LoadExternalBlockFile: Set low I/O priority b4647b2381
  102. luke-jr force-pushed on Sep 26, 2022
  103. luke-jr commented at 1:36 pm on September 26, 2022: member
    Rebased
  104. DrahtBot removed the label Needs rebase on Sep 26, 2022
  105. murchandamus commented at 5:50 pm on October 12, 2022: contributor
    Could you please add a description to the first comment?
  106. maflcko commented at 7:51 am on October 19, 2022: member

    From CI:

    0./.libs/libbitcoinkernel.so: undefined reference to `ioprio_set_idle()'
    
  107. DrahtBot commented at 9:33 am on October 19, 2022: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  108. DrahtBot added the label Needs rebase on Oct 19, 2022
  109. fanquake commented at 10:40 am on December 6, 2022: member

    Moving this to draft for now. Since re-opening (after being closed due to needing rebase for > a year), the request for a PR description hasn’t been fullfilled, it’s been failing to compile in the kernel CI, failing the tidy job, and has now been needing rebase again for nearly 2 months.

    Feel free to undraft when all 4 issues have been addressed.

  110. fanquake marked this as a draft on Dec 6, 2022
  111. DrahtBot commented at 1:38 am on March 6, 2023: contributor

    There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  112. fanquake commented at 2:58 pm on June 2, 2023: member
    Closing for now. Can be re-opened in future if/when above issues are going to be accounted for.
  113. fanquake closed this on Jun 2, 2023

  114. knst referenced this in commit 0b75e98727 on Jan 5, 2024
  115. knst referenced this in commit 82dc3f7705 on Jan 7, 2024
  116. knst referenced this in commit 2f045103db on Jan 10, 2024
  117. knst referenced this in commit eb8827f739 on Jan 10, 2024
  118. knst referenced this in commit 7ff385ea41 on Jan 10, 2024
  119. PastaPastaPasta referenced this in commit feb636352a on Jan 11, 2024
  120. bitcoin locked this on Jun 1, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me