bug-fix: delay flushing undo files until after they are finalized #17892

pull kallewoof wants to merge 1 commits into bitcoin:master from kallewoof:200108-reflush-undo changing 1 files +7 −1
  1. kallewoof commented at 6:06 pm on January 7, 2020: member

    The blk files have blocks written to them as they come in. Each blk file and rev file reference the same blocks, but the order in which block data is written into the rev (undo) files is in order of activation, i.e. in block height order.

    Thus, when the code decides to stop writing into blk000X.dat, it will flush this and rev000X.dat, and then move to X+1. The problem is that since blocks are activated later than they are stored in the block file, we will still be writing to rev X for awhile after this.

    This code fixes this by simply finalizing the previous undo file each time we finish up a block file. This pretty much guarantees that we will never pre-allocate space and never flush for an undo file (which is happening right now).

    Unfortunately for anyone with affected machines, there will be wasted drive space for each rev file if they did IBD before this fix. Probably affects all architectures.

    Edit: note that this PR initially added a new critical section for undo pos and some other stuff, but I scratched that for this simpler solution, hence practicalswift’s comment below.

    Fixes #17890.

  2. DrahtBot commented at 6:33 pm on January 7, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #17994 (validation: flush undo files after last block write by kallewoof)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. fanquake added the label Validation on Jan 7, 2020
  4. practicalswift commented at 9:29 pm on January 7, 2020: contributor
    Thread-safety annotations (GUARDED_BY etc.) seems to be missing? :)
  5. kallewoof force-pushed on Jan 8, 2020
  6. kallewoof commented at 3:41 am on January 8, 2020: member
    @practicalswift Thanks for the nudge. My apologies, though… I ended up scratching this solution for a much simpler one (see updated code).
  7. kallewoof renamed this:
    bug-fix: re-flush undo files after they have been re-opened
    bug-fix: delay flushing undo files until after they are finalized
    on Jan 8, 2020
  8. kallewoof commented at 3:54 am on January 8, 2020: member

    Some stats: after syncing my mac up to ~ block 186k, the blocks dir looks like this:

     0$ du -ch *.dat
     1128M	blk00000.dat
     2128M	blk00001.dat
     3128M	blk00002.dat
     4128M	blk00003.dat
     5128M	blk00004.dat
     6128M	blk00005.dat
     7128M	blk00006.dat
     8128M	blk00007.dat
     9128M	blk00008.dat
    10128M	blk00009.dat
    11128M	blk00010.dat
    12128M	blk00011.dat
    13128M	blk00012.dat
    14128M	blk00013.dat
    15144M	blk00014.dat
    16 19M	rev00000.dat
    17 16M	rev00001.dat
    18 16M	rev00002.dat
    19 16M	rev00003.dat
    20 16M	rev00004.dat
    21 17M	rev00005.dat
    22 17M	rev00006.dat
    23 17M	rev00007.dat
    24 18M	rev00008.dat
    25 17M	rev00009.dat
    26 17M	rev00010.dat
    27 17M	rev00011.dat
    28 18M	rev00012.dat
    29171M	rev00013.dat
    30 21M	rev00014.dat
    312.3G	total
    

    The fact the blk for 14 and rev files for 13-14 are bloated is due to a separate bug (see #17887).

    Note that rev 13 is still huge despite us moving on to rev 14. When blk 14 is finalized, it will truncate rev 13 down to its right size. It may be worthwhile to actually finalize current and previous twice, to avoid this, but since we are more or less guaranteed to again pre-allocate into X-1, this seems unnecessary, and #17887 will make this have significantly less impact.

  9. maaku commented at 8:14 am on January 8, 2020: contributor

    I can confirm the bug. I’ve run into it multiple times. Kalle is convinced that we are at fault, although I’m not so sure, and I think this might even be an Apple bug. It only shows up on APFS, and the implementation of rsync which ships with macOS also suffers from the same issue.

    But in any case, if we can fix it on our end we should, and this should be pretty high priority. It can end up wasting hundreds of gigabytes of space during an IBD, in some cases exhausting the file on 512GB drives in a way that is very non-intuitive to figure out and fix as a user.

  10. fanquake added the label Needs backport (0.19) on Jan 8, 2020
  11. fanquake requested review from sipa on Jan 12, 2020
  12. fanquake added this to the "Bugfixes" column in a project

  13. eriknylund commented at 8:15 pm on January 20, 2020: contributor

    ACK 0508072e1a45c339a181d24c2edfc88e3cd5201f built locally. All tests pass. I see the blow-up happening and the files being truncated as the following blk file is opened and the previous is finalized.

    Here rev00004.dat is still not truncated since we are on blk00005.dat

     0128M	blk00000.dat
     1128M	blk00001.dat
     2128M	blk00002.dat
     3128M	blk00003.dat
     4128M	blk00004.dat
     5576M	blk00005.dat
     6 19M	rev00000.dat
     7 16M	rev00001.dat
     8 16M	rev00002.dat
     9 16M	rev00003.dat
    10153M	rev00004.dat
    11136M	rev00005.dat
    

    In the next step blk00005.dat has been finalized and rev00004.dat is truncated.

     0128M	blk00000.dat
     1128M	blk00001.dat
     2128M	blk00002.dat
     3128M	blk00003.dat
     4128M	blk00004.dat
     5128M	blk00005.dat
     6 16M	blk00006.dat
     7 19M	rev00000.dat
     8 16M	rev00001.dat
     9 16M	rev00002.dat
    10 16M	rev00003.dat
    11 16M	rev00004.dat
    12153M	rev00005.dat
    131.0M	rev00006.dat
    
  14. in src/validation.cpp:1742 in 0508072e1a outdated
    1734@@ -1735,8 +1735,14 @@ void static FlushBlockFile(bool fFinalize = false)
    1735 {
    1736     LOCK(cs_LastBlockFile);
    1737 
    1738+    // we want to finalize flush the previous undo file, as it will have been written to
    1739+    // for awhile even after we've moved onto the next blk file; if we don't do this, we
    1740+    // end up with a re-opened but unflushed rev file, resulting in pre-allocated but not
    1741+    // returned disk space allocation
    1742+    int last_undo_file = fFinalize && nLastBlockFile > 0 ? nLastBlockFile - 1 : nLastBlockFile;
    


    eriknylund commented at 8:20 pm on January 20, 2020:
    0    int previous_undo_file = fFinalize && nLastBlockFile > 0 ? nLastBlockFile - 1 : nLastBlockFile;
    

    Given the comment on line 1738, is previous more appropriate than last?


    kallewoof commented at 0:23 am on January 21, 2020:
    Makes sense.
  15. eriknylund changes_requested
  16. kallewoof force-pushed on Jan 21, 2020
  17. kallewoof force-pushed on Jan 21, 2020
  18. fjahr commented at 9:44 pm on January 21, 2020: member
    Similar to here #17887 (comment) I am not able to reproduce this issue on current master (no matter if --enable-debug or not). Happy to take hints and retest as I am curious why my results are different from everyone else.
  19. kallewoof commented at 1:16 am on January 22, 2020: member
    @fjahr #17887 (comment) seems to indicate you were finally able to reproduce the issue. Or is that comment older?
  20. kallewoof force-pushed on Jan 22, 2020
  21. fjahr commented at 3:03 pm on January 22, 2020: member

    @fjahr #17887 (comment) seems to indicate you were finally able to reproduce the issue. Or is that comment older?

    Yes, sorry, this is not worded clearly: I could reproduce #17887 with rev files up 64M large if compiled without --enable-debug but I could not reproduce it with --enable-debug. This one I can not reproduce no matter if I set debugging or not.

  22. kallewoof commented at 3:33 pm on January 22, 2020: member

    @fjahr You can not detect the following pattern anywhere in your log file even after syncing for awhile?

    0(date) Leaving block file X: CBlockFileInfo(...)
    1(date) Pre-allocating up to position 0x1000000 in blk(X+1).dat
    2(date) Pre-allocating up to position 0x100000 in rev(X+1).dat
    3[...]
    4(date) Pre-allocating up to position (...) in rev(X).dat
    
  23. fjahr commented at 3:50 pm on January 22, 2020: member
    @kallewoof I do see it, but not regularly. From my last sync up to 190k it seems to appear 3 times: For X=7, X=10 and X=14. I was trying hard to catch a larger than normal file size with du -ch *.dat, so I missed the right block/rev file transition?
  24. kallewoof commented at 4:06 pm on January 22, 2020: member

    There’s some amount of randomness involved, as the order in which blocks are received and the order in which they are connected is different.

    With #17887 merged, it will be very hard to physically spot these now, but it should occasionally show larger-than-necessary rev files (in the range of 1 b up to 1 MB).

  25. bug-fix: delayed flush of undo files
    When creating blk and rev files, these are pre-allocated for performance reasons, and then truncated down to their final size after completion.
    This works for blk files which are sequentially created, but since blocks are stored out of order, and since the undo data for blocks are stored sequentially (upon activation of the block), the rev file will be written to for awhile even after the blk file has been finalized. This change switches the flushing part to flush the *previous* undo file for the finalize (truncate) case, which addresses the problem, as we never write to the rev file 2 steps back.
    b49b34139a
  26. kallewoof force-pushed on Jan 22, 2020
  27. Sjors commented at 4:56 pm on January 22, 2020: member
    Given that #17887 is merged, is it still worth the risk of back-porting? (can always do that later, after 0.20 is released)
  28. in src/validation.cpp:1742 in b49b34139a
    1734@@ -1735,8 +1735,14 @@ void static FlushBlockFile(bool fFinalize = false)
    1735 {
    1736     LOCK(cs_LastBlockFile);
    1737 
    1738+    // we want to finalize flush the previous undo file, as it will have been written to
    1739+    // for awhile even after we've moved onto the next blk file; if we don't do this, we
    1740+    // end up with a re-opened but unflushed rev file, resulting in pre-allocated but not
    1741+    // returned disk space allocation
    1742+    int previous_undo_file = fFinalize && nLastBlockFile > 0 ? nLastBlockFile - 1 : nLastBlockFile;
    


    Sjors commented at 4:59 pm on January 22, 2020:
    nit: add brackets to clarify (correct) precedence: (nLastBlockFile > 0)

    kallewoof commented at 1:27 am on January 23, 2020:
    I honestly don’t think that’s necessary, but I’m open to change my opinion.
  29. kallewoof commented at 1:07 am on January 23, 2020: member

    @Sjors

    Given that #17887 is merged, is it still worth the risk of back-porting? (can always do that later, after 0.20 is released)

    There are nearly two thousand rev files right now. Someone doing an IBD sync using 0.19 today would end up wasting up to 2 GB of space due to this bug. I do believe we want to plug that hole, even though it has become much smaller of an issue as you mention.

  30. maaku commented at 2:08 am on January 23, 2020: contributor
    Last time I did an IBD (a few months ago) it was >100GB wasted. That’s actually how I discovered this problem initially, it was enough wasted space to impact me (full disk).
  31. kallewoof commented at 3:06 am on January 23, 2020: member

    @maaku

    Last time I did an IBD (a few months ago) it was >100GB wasted. That’s actually how I discovered this problem initially, it was enough wasted space to impact me (full disk).

    Yeah, that was partially fixed in #17887, so you shouldn’t see quite that drastic of an effect anymore, but still..

  32. maaku commented at 3:20 am on January 23, 2020: contributor
    And this discussion is about backporting those changes, no?
  33. kallewoof commented at 3:34 am on January 23, 2020: member

    No, #17887 is already in the backported list and is not under discussion. @Sjors is wondering whether this patch, which delays finalize-flushing the rev files until we’re (pretty) sure they are no longer appended to, should be backported or not.

    Without this patch, we will experience 0-1 MB of wasted space for each rev file.

    Without #17887 we would experience what you saw.

  34. fjahr commented at 11:00 am on January 23, 2020: member

    I have now done another run with this fix and could not detect the pattern described in #17892 (comment) in my debug.log

    ACK b49b34139ac3c187eaac21b25f88cd493ce668b2

  35. MarcoFalke commented at 1:08 pm on January 23, 2020: member

    Without this patch, we will experience 0-1 MB of wasted space for each rev file.

    It seems like this does not warrant a backport to 0.19.1

  36. kallewoof commented at 1:26 pm on January 23, 2020: member

    @MarcoFalke

    Depends if you think “a few GBs wasted” is negligible or not. (This is for non-pruned nodes only, so a few GBs in a 250 GB datadir.)

  37. MarcoFalke commented at 2:15 pm on January 23, 2020: member
    Yes, based on your comments this should be less than 1% off (not noticeable for a user)
  38. anton48 commented at 2:21 pm on January 23, 2020: none

    @MarcoFalke

    Depends if you think “a few GBs wasted” is negligible or not. (This is for non-pruned nodes only, so a few GBs in a 250 GB datadir.)

    I did a quick test with downloading from scratch up to year 2014 (approx. 270k blocks):

    master (278292 blocks, 1931064 bytes, 10 files with size 19M)

     02020-01-23T11:01:43Z Leaving block file 104: CBlockFileInfo(blocks=871, size=134096120, heights=276943...278292, time=2013-12-25...2014-01-02)
     1
     2Antons-Mac-mini:HDD2 a48$ ls -la Bitcoin.nopatch/blocks/rev00* | wc -l
     3     105
     4
     5Antons-Mac-mini:HDD2 a48$ du -ck Bitcoin.nopatch/blocks/rev00* | grep total
     61931064	total
     7
     8Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.nopatch/blocks/rev00* | grep 19M
     9 19M	Bitcoin.nopatch/blocks/rev00000.dat
    10 19M	Bitcoin.nopatch/blocks/rev00013.dat
    11 19M	Bitcoin.nopatch/blocks/rev00073.dat
    12 19M	Bitcoin.nopatch/blocks/rev00075.dat
    13 19M	Bitcoin.nopatch/blocks/rev00081.dat
    14 19M	Bitcoin.nopatch/blocks/rev00083.dat
    15 19M	Bitcoin.nopatch/blocks/rev00086.dat
    16 19M	Bitcoin.nopatch/blocks/rev00087.dat
    17 19M	Bitcoin.nopatch/blocks/rev00094.dat
    18 19M	Bitcoin.nopatch/blocks/rev00101.dat
    19Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.nopatch/blocks/rev00* | grep 19M | wc -l
    20      10
    21Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.nopatch/blocks/rev00* | grep 18M | wc -l
    22      80
    

    this PR (278293 blocks, 1865004 bytes, no files with size 19M, all of them are 17M or 18M)

     02020-01-23T12:48:45Z Leaving block file 104: CBlockFileInfo(blocks=865, size=134045380, heights=277411...278293, time=2013-12-28...2014-01-02)
     1
     2Antons-Mac-mini:HDD2 a48$ ls -la Bitcoin.patch/blocks/rev00* | wc -l
     3     105
     4
     5Antons-Mac-mini:HDD2 a48$ du -ck Bitcoin.patch/blocks/rev00* | grep total
     61865004	total
     7
     8Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.patch/blocks/rev00* | grep 19M
     9Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.patch/blocks/rev00* | grep 19M | wc -l
    10       0
    11Antons-Mac-mini:HDD2 a48$ du -ch Bitcoin.patch/blocks/rev00* | grep 18M | wc -l
    12      38
    

    so, it looks for me that there could be more than 1G of overhead on a fullnode with all blocks (will test it, but it takes time).

  39. kallewoof commented at 2:25 pm on January 23, 2020: member

    @anton48 Thanks for the initiative. I’ll do the same thing on my side so we can get some clarity on the matter. @MarcoFalke I agree with you, but with the caveat that it’s temporary. It’s ludicrous to waste several GB of space on people’s machines for no good reason. My proposal is that we (1) make a PR that does a one time sweep (esp important for people with lots of disk space who are stuck with pre-#17887 datadirs, if such people exist at all) + implements this code*, and to potentially backport those two.

    (* or possibly more sophisticated; ideal implementation is to, for each blk file, track which block is the highest height, and once that has been connected, to finalize the corresponding rev file)

  40. Sjors commented at 6:13 pm on January 23, 2020: member

    I agree with you, but with the caveat that it’s temporary. It’s ludicrous to waste several GB of space on people’s machines for no good reason.

    The waste would be temporary; until the user installs 0.20 or 0.19.2. Worst case scenario is block data corruption. Because I’m personally not familiar with this area of the code, I prefer to let it stew on master for a while before shipping. But if others feel very comfortable that this code is correct then I’m fine with backporting.

  41. eriknylund commented at 6:20 pm on January 23, 2020: contributor

    @anton48 Thanks for the initiative. I’ll do the same thing on my side so we can get some clarity on the matter.

    @MarcoFalke I agree with you, but with the caveat that it’s temporary. It’s ludicrous to waste several GB of space on people’s machines for no good reason. My proposal is that we (1) make a PR that does a one time sweep (esp important for people with lots of disk space who are stuck with pre-#17887 datadirs, if such people exist at all) + implements this code*, and to potentially backport those two.

    (* or possibly more sophisticated; ideal implementation is to, for each blk file, track which block is the highest height, and once that has been connected, to finalize the corresponding rev file) @kallewoof Given it has not been getting attention since the release of APFS in September, 2017 it seems not that many installations exist. Is it worth adding this extra complexity for little or no users? Do we know how many are potentially running a version that has the bug? The workarounds are not optimal either of course, but at least there is one ~are two relatively easy ones~:

    1. Do the IBD from scratch using the latest version ~2. Run a truncate bash script (https://github.com/bitcoin/bitcoin/issues/17827#issuecomment-570125891)~
  42. eriknylund commented at 6:25 pm on January 23, 2020: contributor
    @kallewoof @anton48 I can also repeat the full IBD test running Bitcoin-Qt on the Mac Mini like I did for #17887, it will take about 3 days to complete.
  43. kallewoof commented at 1:36 am on January 24, 2020: member
    @eriknylund Unfortunately that bash script only addresses #17887, not #17892. @Sjors It’s not technically temporary until someone writes code that does a one-time sweep of people’s rev files and backport this & that, but since we only see ~1 rev file a day, people’s drives shouldn’t suffer more than a few hundred MBs before a patch is in place, hopefully.
  44. kallewoof commented at 5:51 am on January 24, 2020: member
    I am working on an improved version of this, based on #17892 (comment)
  45. eriknylund commented at 6:47 am on January 24, 2020: contributor

    @eriknylund Unfortunately that bash script only addresses #17887, not #17892.

    @Sjors It’s not technically temporary until someone writes code that does a one-time sweep of people’s rev files and backport this & that, but since we only see ~1 rev file a day, people’s drives shouldn’t suffer more than a few hundred MBs before a patch is in place, hopefully. @kallewoof Ah you’re absolutely correct. Thanks for pointing that out.

  46. anton48 commented at 8:40 am on January 24, 2020: none

    @eriknylund Unfortunately that bash script only addresses #17887, not #17892. @Sjors It’s not technically temporary until someone writes code that does a one-time sweep of people’s rev files and backport this & that, but since we only see ~1 rev file a day, people’s drives shouldn’t suffer more than a few hundred MBs before a patch is in place, hopefully.

    @kallewoof Ah you’re absolutely correct. Thanks for pointing that out.

    however, in my case the script made some truncations at yesterday test directory (which is on AFPS volume):

     0Antons-Mac-mini:blocks a48$ /Users/a48/slim.sh
     1./rev00000.dat 19M  ->  19M
     2[..]
     3./rev00013.dat 19M  ->  18M
     4[..]
     5./rev00047.dat 18M  ->  17M
     6[..]
     7./rev00053.dat 18M  ->  17M
     8[..]
     9./rev00063.dat 18M  ->  17M
    10./rev00064.dat 18M  ->  17M
    11[..]
    12./rev00070.dat 18M  ->  17M
    13[..]
    14./rev00073.dat 19M  ->  18M
    15[..]
    16./rev00075.dat 19M  ->  18M
    17[..]
    18./rev00083.dat 19M  ->  18M
    19[..]
    20./rev00086.dat 19M  ->  18M
    21./rev00087.dat 19M  ->  18M
    22[..]
    23./rev00101.dat 19M  ->  18M
    24[..]
    25./rev00104.dat 18M  ->  18M
    
  47. kallewoof commented at 11:41 am on January 26, 2020: member
    I did some number crunching and concluded that a backport is probably not necessary. I am also closing this PR in favor of #17994. Will re-open if people object/prefer this one.
  48. kallewoof closed this on Jan 26, 2020

  49. fanquake removed the label Needs backport (0.19) on Jan 26, 2020
  50. kallewoof deleted the branch on Jan 26, 2020
  51. laanwj removed this from the "Bugfixes" column in a project

  52. laanwj referenced this in commit 011fe009f9 on Jun 4, 2020
  53. sidhujag referenced this in commit 3f5d6102a6 on Jun 4, 2020
  54. PastaPastaPasta referenced this in commit d4a9c7d6d4 on Sep 17, 2021
  55. PastaPastaPasta referenced this in commit 480fc90ef7 on Sep 19, 2021
  56. thelazier referenced this in commit f2f86ccf33 on Sep 25, 2021
  57. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me