Drop IO priority to idle while reading blocks for peer requests and startup verification #9245
pull luke-jr wants to merge 3 commits into bitcoin:master from luke-jr:ionice changing 9 files +185 −15-
luke-jr commented at 8:29 am on November 30, 2016: member
-
in src/util.h: in b9303aa36a outdated
130@@ -131,6 +131,49 @@ void OpenDebugLog(); 131 void ShrinkDebugFile(); 132 void runCommand(const std::string& strCommand); 133 134+#ifdef HAVE_IOPRIO_SYSCALL
laanwj commented at 8:39 am on November 30, 2016:Please move this to a new include fileutil_ioprio.h
.laanwj commented at 8:41 am on November 30, 2016: memberConcept ACK. Though not very happy to introduce platform-specific voodoo - we only just got rid of thread priority manipulation. But it may be worth the hassle, I don’t know.
Can we quantify whether this works or not somehow?
laanwj added the label Block storage on Nov 30, 2016gmaxwell commented at 8:50 am on November 30, 2016: contributorThis will also delay other processing, in particular block relay– at least until the handling is made more concurrent– no? Not a reason to not do it, but maybe a reason to not do it by default for everyone.
I second the need to quantify this– I could imagine it making for a big usability improvement. … or not mattering at all. If the former, I want it… if the latter…
luke-jr commented at 9:32 am on November 30, 2016: memberWhenever I restart my node lately, I find myself eventually manually
ionice
ing the entire process as it slows down other things monitoring it iniotop
. I can’t be sure it’s sending out old blocks, but I can’t imagine what else it’d be spending so much time reading… :/Added Mac and Windows support for completeness.
luke-jr force-pushed on Nov 30, 2016in src/utilioprio.h: in 6430b92320 outdated
49+#endif 50+ 51+#ifdef WIN32 52+bool ioprio_set_file_idle(FILE *); 53+#else 54+#define ioprio_set_file_idle(f) (false)
ryanofsky commented at 3:03 pm on November 30, 2016:Maybe change this to
((void)false)
to prevent a compiler warning:0main.cpp:1673:12: warning: statement has no effect [-Wunused-value] 1 ioprio_set_file_idle(filein.Get());
in src/utilioprio.h: in 6430b92320 outdated
40+ } 41+}; 42+#define IOPRIO_IDLER(actually_idle) ioprio_idler ioprio_idler_(actually_idle) 43+ 44+#else 45+#define ioprio_get() (-1)
ryanofsky commented at 3:06 pm on November 30, 2016:It doesn’t seem like it should be necessary to declare these if theioprio_idler
class isn’t around to call them.
luke-jr commented at 9:30 pm on November 30, 2016:We can simplify some of the other stuff (move the logic into the class itself) if low-level access is undesired, but for now it’s too early to know if these won’t be needed IMO.in src/utilioprio.cpp: in 6430b92320 outdated
0@@ -0,0 +1,65 @@ 1+// Copyright (c) 2016 The Bitcoin Core developers 2+// Distributed under the MIT software license, see the accompanying 3+// file COPYING or http://www.opensource.org/licenses/mit-license.php. 4+ 5+#ifdef HAVE_IOPRIO_SYSCALL
ryanofsky commented at 3:29 pm on November 30, 2016:Needs#include "config/bitcoin-config.h"
to prevent link errors.in src/utilioprio.h: in 6430b92320 outdated
0@@ -0,0 +1,57 @@ 1+// Copyright (c) 2016 Satoshi Nakamoto 2+// Distributed under the MIT software license, see the accompanying 3+// file COPYING or http://www.opensource.org/licenses/mit-license.php. 4+ 5+#ifndef BITCOIN_UTIL_IOPRIO_H
ryanofsky commented at 3:30 pm on November 30, 2016:Should add#include "config/bitcoin-config.h"
ryanofsky approvedryanofsky commented at 3:33 pm on November 30, 2016: contributorACK 6430b9232048666996d5d4c3ed154907e3daff67 (after adding missing #includes)fanquake commented at 0:18 am on December 1, 2016: memberTravis failure:
0'../../src/'`utilioprio.cpp 1In file included from ../../src/utilioprio.cpp:9:0: 2../../src/utilioprio.h: In destructor ‘ioprio_idler::~ioprio_idler()’: 3../../src/utilioprio.h:42:51: error: ‘LogPrintf’ was not declared in this scope 4 LogPrintf("failed to restore ioprio\n");
luke-jr commented at 8:51 am on December 1, 2016: memberLooks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I’m going to leave it for a separate PR…luke-jr force-pushed on Dec 1, 2016luke-jr force-pushed on Dec 2, 2016rebroad commented at 4:35 am on December 19, 2016: contributorI like this (concept ACK) although I wonder what the impact is on the p2p network as a whole if everyone ran this.luke-jr force-pushed on Feb 4, 2017martinschwarz commented at 7:57 pm on March 11, 2017: noneLooks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer.
There are win32 and win64 builds. Can’t this just be enabled on the win64 build only?
laanwj commented at 1:55 pm on March 13, 2017: memberLooks like to make the Windows part work, we need to bump _WIN32_WINNT to 0x0600 which means it will only run on Vista or newer. AFAIK this is okay(?), but I’m going to leave it for a separate PR…
Isn’t Vista the version after Windows XP? As we dropped support for Windows XP in 0.13, it seems that requiring Vista for 0.15 is fine.
There are win32 and win64 builds. Can’t this just be enabled on the win64 build only?
Could be done, but it’d be confusing to couple those. The low-end systems running 32-bit versions would probably need this more.
luke-jr force-pushed on Aug 21, 2017luke-jr commented at 5:06 am on August 21, 2017: memberRebased…TheBlueMatt commented at 3:31 pm on August 21, 2017: contributorHmm, I dont think this is really the best idea as long as our message processing is still single-threaded. Really we need to refactor stuff so that block reading is async and the network processing can continue for other peers while we’re serving blocks for peers in IBD, otherwise we may block receiving a new block longer than required.luke-jr commented at 4:04 pm on August 21, 2017: memberThat’s somewhat independent from this issue. If users need to shut off their node to use their computer, the delay for processing a new block will be even longer.luke-jr force-pushed on Aug 21, 2017ryanofsky commented at 5:47 pm on October 12, 2017: contributor@TheBlueMatt @luke-jr, maybe a compromise would be to make this behavior configurable, and perhaps to default to dropping priority if user is running bitcoin-qt on a desktop.TheBlueMatt commented at 9:54 pm on November 10, 2017: contributorAnother approach which might be simpler would be to have the validation.h-exposed versions of ReadBlockFromDisk drop io priority so that net_processing will use low priority when answering remote-node queries but connecting blocks will not. With 0.15 I/O when doing initial sync is somewhat better, so this may also be less of an issue now unless the user is running with -peerbloomfilters.luke-jr commented at 6:28 am on November 11, 2017: member@TheBlueMatt That’s exactly what this already does… priority is only dropped when serving peers, not when connecting blocks.TheBlueMatt commented at 10:57 pm on November 11, 2017: contributor@luke-jr I was referring to the possibility of not exposing a priority flag in validation.h’s API - that seems a bit overkill IMO, as evidenced by the fact that there are now two ReadBlockFromDisk calls in net_processing which dont get the low-priority flag :p. Though that would also result in RPC ReadBlockFromDisk calls getting de-prioritized.
More importantly, I’m curious how much we need this anymore - it seems most of the complaints about I/O usage were primarily due to 0.13.1 preferential peering…On systems where your I/O is severely limited, I both don’t know how much this will help (in my experience Linux’ ionice is mostly worthless when it comes to desktop latency) and don’t know if its not better to direct people towards maxuploadtarget or peerbloomfilters so as to avoid simply slowing down your peers because your I/O is too slow.
luke-jr commented at 11:23 pm on November 11, 2017: memberBefore writing this, I generally ionice’d the entire bitcoind process to maintain system usability.TheBlueMatt commented at 8:59 pm on November 16, 2017: contributorConcept ACK. You need to mark the other ReadBlockFromDisks in net_processing low-priority as well.sipa commented at 5:11 pm on March 6, 2018: memberConcept ACK, but needs rebase.luke-jr force-pushed on Mar 6, 2018luke-jr renamed this:
Drop IO priority to idle while reading blocks for getblock requests
Drop IO priority to idle while reading blocks for peer requests and startup verification
on Mar 6, 2018luke-jr commented at 7:22 pm on March 6, 2018: memberRebased and added the additional deprioritisations requested by @TheBlueMattlaanwj commented at 10:45 pm on March 6, 2018: memberutACK 91ccbbb87eklitzke commented at 9:57 pm on March 20, 2018: contributorI considered making this change in #12618 and there’s a comment about it there. This is much more dangerous than a CPU scheduler change (like the one I just linked for two reasons).
On Linux the I/O scheduling stuff is very primitive, and
IOPRIO_CLASS_IDLE
is quite a strong policy. From the man page:0 IOPRIO_CLASS_IDLE (3) 1 This is the idle scheduling class. Processes running at this 2 level get I/O time only when no one else needs the disk. The 3 idle class has no class data. Attention is required when 4 assigning this priority class to a process, since it may 5 become starved if higher priority processes are constantly 6 accessing the disk.
This means that at the idle I/O processing level you can get starved forever if anything else at all is using the disk. This is quite different from something like
SCHED_BATCH
, which just deprioritizes you a little bit. The CPU scheduler actually has aSCHED_IDLE
that works likeIOPRIO_CLASS_IDLE
, but it’s dangerous for the same reason thatIOPRIO_CLASS_IDLE
is dangerous, so I didn’t use it.The other thing that I’m fairly certain of (but could be wrong about) is that I believe the kernel doesn’t really take into account multi-queue devices when considering idleness of a block device. There’s a lot of discussion about this online if you look at people talking about the %util field in iostat output, e.g. here. I believe for this reason
IOPRIO_CLASS_IDLE
could starve you out from accessing a disk when it actually does have idle capacity remaining.I think we should test this change more (or get a better understanding of the Linux I/O scheduler before proceeding with this change). The crude and easy-to-get wrong policy is part of the reason that I think glibc doesn’t expose this system call in the first place. Not sure that I can think of an actual attack off-hand, but this is a DOS vector if you create N connections to a host, ask for blocks from all N connections, and then have some other mechanism to cause them to use up their disk I/O.
eklitzke changes_requestedeklitzke commented at 10:06 pm on March 20, 2018: contributorConcept ACK is this is enabled via a flag, but this seems dangerous to enable by default. See the other comment I just left.maflcko added the label Needs rebase on Jun 6, 2018laanwj commented at 4:59 pm on September 10, 2018: memberThis has been open since 2016 and it’s still uncertain whether it’s a good idea to merge. Closing,f or now.laanwj closed this on Sep 10, 2018
luke-jr commented at 9:18 pm on September 10, 2018: memberThere are plenty of ACKs here, plenty of testing, and only speculation on why there could (in very improbable circumstances) be a problem.laanwj reopened this on Sep 11, 2018
laanwj commented at 5:32 pm on September 11, 2018: memberreopened on requestDrahtBot commented at 7:23 pm on December 12, 2018: contributorDrahtBot added the label Up for grabs on Dec 12, 2018DrahtBot closed this on Dec 12, 2018
fanquake reopened this on Oct 14, 2019
luke-jr force-pushed on Oct 14, 2019luke-jr force-pushed on Oct 14, 2019DrahtBot removed the label Needs rebase on Oct 14, 2019luke-jr commented at 11:34 pm on October 14, 2019: memberRebasedmaflcko removed the label Up for grabs on Oct 16, 2019ryanofsky approvedryanofsky commented at 3:58 pm on October 24, 2019: contributorConditional concept and code review ACK e1276957ed2c42cfe028a0b2a725767d10c1f5ca if Evan’s comments #9245 (comment) are addressed. Only change since last review is dropping windows support.
Easiest way to address the concerns, I think would be to make this configurable, and maybe only enable it by default for the gui client (assuming UI responsiveness is what motivated this), or only enabled by default on specific platforms where it’s been benchmarked.
The change should probably also have release notes.
DrahtBot commented at 1:01 pm on November 20, 2019: contributorThe following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #26326 (net: don’t lock cs_main while reading blocks in net processing by andrewtoth)
- #26316 (use shared mutex to guard against block files being removed before read by andrewtoth)
- #26288 (Enable -Wstring-concatenation and -Wstring-conversion on clang builds by Empact)
- #25830 (refactor: Replace m_params with chainman.GetParams() by aureleoules)
- #25574 (validation: Improve error handling when VerifyDB fails due to insufficient dbcache by mzumsande)
- #25152 (refactor: Split util/system into exception, shell, and fs-specific files by Empact)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
vr646935 approvedDrahtBot added the label Needs rebase on Apr 20, 2020rebroad referenced this in commit 8d43b4ed5f on Apr 23, 2020rebroad commented at 8:47 am on April 23, 2020: contributorThis will also delay other processing, in particular block relay– at least until the handling is made more concurrent– no? Not a reason to not do it, but maybe a reason to not do it by default for everyone.
I’d have thought it’s possible to make the io lowered only for historical block serving and not for the most recent blocks.
rebroad commented at 2:12 am on April 24, 2020: contributorI’ve been testing this under WSL and Windows 10 but I can’t discern any difference to the build without these changes. According to Process Explorer, all the threads are have I/O Priority of normal.
Is there any particular way that this ought to be tested?
luke-jr commented at 3:09 am on April 24, 2020: memberOn Linux, which is the only OS this PR supports…rebroad referenced this in commit f0a14ee0d4 on Apr 25, 2020rebroad referenced this in commit b7b223cf30 on Apr 26, 2020rebroad referenced this in commit 9798bb904a on Apr 27, 2020rebroad referenced this in commit 39490eb808 on Apr 29, 2020luke-jr force-pushed on Aug 20, 2020luke-jr commented at 2:04 pm on August 20, 2020: memberRebased againDrahtBot removed the label Needs rebase on Aug 20, 2020ryanofsky commented at 10:58 pm on August 24, 2020: contributorre: #9245#pullrequestreview-306688566
Conditional concept and code review ACK e127695 if Evan’s comments #9245 (comment) are addressed. Only change since last review is dropping windows support.
Easiest way to address the concerns, I think would be to make this configurable, and maybe only enable it by default for the gui client (assuming UI responsiveness is what motivated this), or only enabled by default on specific platforms where it’s been benchmarked.
The change should probably also have release notes.
Same conditional ACK for 42321f15160a8de085a06fe13a841080c9eff104 which just rebases e1276957ed2c42cfe028a0b2a725767d10c1f5ca. Hasn’t been followup to the comments since then.
luke-jr commented at 0:59 am on August 25, 2020: memberThe issue is impacting performance of the rest of the system, whether GUI or daemon is irrelevant.fanquake referenced this in commit 8e0f341779 on Aug 25, 2020sidhujag referenced this in commit 62b3455e58 on Aug 25, 2020DrahtBot added the label Needs rebase on Sep 7, 2020luke-jr commented at 6:02 pm on September 11, 2020: member@ryanofsky IMO it doesn’t make sense to continue giving real-world users a bad experience, because of some theoretical issue nobody has ever observed.luke-jr force-pushed on Sep 11, 2020DrahtBot removed the label Needs rebase on Sep 11, 2020DrahtBot added the label Needs rebase on Oct 19, 2020rebroad referenced this in commit 8637d69b88 on Feb 15, 2021rebroad referenced this in commit 94b84f8e0c on Feb 15, 2021rebroad referenced this in commit db7268aa15 on Feb 18, 2021rebroad referenced this in commit b0c2698587 on Feb 19, 2021rebroad referenced this in commit 76e0717c2d on Feb 19, 2021rebroad referenced this in commit 6c8aaf8726 on Feb 19, 2021rebroad referenced this in commit a33b33cb68 on Mar 28, 2021luke-jr force-pushed on Sep 12, 2021luke-jr force-pushed on Sep 12, 2021DrahtBot removed the label Needs rebase on Sep 12, 2021DrahtBot added the label Needs rebase on Sep 16, 2021kristapsk commented at 1:33 pm on July 1, 2022: contributorNeeds rebase.davidgumberg commented at 8:05 pm on September 6, 2022: contributorglozow commented at 10:01 am on September 26, 2022: memberClosing as this has needed rebase for more than 1 year. Feel free to reopen if you get a chance to work on this again in the future, thanks!glozow closed this on Sep 26, 2022
glozow reopened this on Sep 26, 2022
luke-jr force-pushed on Sep 26, 2022Drop IO priority to idle while reading blocks for peer requests and startup verification 91e407b07butil/ioprio: Add Mac support using iopolicy functions 7e442a38beLoadExternalBlockFile: Set low I/O priority b4647b2381luke-jr force-pushed on Sep 26, 2022luke-jr commented at 1:36 pm on September 26, 2022: memberRebasedDrahtBot removed the label Needs rebase on Sep 26, 2022murchandamus commented at 5:50 pm on October 12, 2022: contributorCould you please add a description to the first comment?maflcko commented at 7:51 am on October 19, 2022: memberFrom CI:
0./.libs/libbitcoinkernel.so: undefined reference to `ioprio_set_idle()'
DrahtBot commented at 9:33 am on October 19, 2022: contributor🐙 This pull request conflicts with the target branch and needs rebase.
Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.
DrahtBot added the label Needs rebase on Oct 19, 2022fanquake commented at 10:40 am on December 6, 2022: memberMoving this to draft for now. Since re-opening (after being closed due to needing rebase for > a year), the request for a PR description hasn’t been fullfilled, it’s been failing to compile in the kernel CI, failing the tidy job, and has now been needing rebase again for nearly 2 months.
Feel free to undraft when all 4 issues have been addressed.
fanquake marked this as a draft on Dec 6, 2022DrahtBot commented at 1:38 am on March 6, 2023: contributorThere hasn’t been much activity lately and the patch still needs rebase. What is the status here?
- Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
- Is it no longer relevant? ➡️ Please close.
- Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
fanquake commented at 2:58 pm on June 2, 2023: memberClosing for now. Can be re-opened in future if/when above issues are going to be accounted for.fanquake closed this on Jun 2, 2023
knst referenced this in commit 0b75e98727 on Jan 5, 2024knst referenced this in commit 82dc3f7705 on Jan 7, 2024knst referenced this in commit 2f045103db on Jan 10, 2024knst referenced this in commit eb8827f739 on Jan 10, 2024knst referenced this in commit 7ff385ea41 on Jan 10, 2024PastaPastaPasta referenced this in commit feb636352a on Jan 11, 2024bitcoin locked this on Jun 1, 2024
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me