validation: flush undo files after last block write #17994

pull kallewoof wants to merge 1 commits into bitcoin:master from kallewoof:200124-rev-files changing 1 files +27 −9
  1. kallewoof commented at 6:04 am on January 24, 2020: member

    Fixes #17890. Replaces #17892.

    Data files ({blk|rev}<number>.dat) pre-allocate space as they are written, and then trims down to the final size once they move on to the next sequence (“finalized flush”). The code currently assumes (incorrectly) that blk and rev files finish at the same time, but because blk files are written as blocks come in, and rev files are written in block height order, rev files end up being written to for awhile after moving on to the next block file, resulting in pre-allocation and waste of up to 1 MB of space per rev file.

    The exact point at which rev file writing finishes is the highest height block found inside the corresponding block file, which is already available in the CBlockFileInfo vector. This PR moves finalized flushing of undo files to to directly after the undo data for the previous block file has been written.

    There is a branch with annotation that demonstrates how this is handling flushing here: https://github.com/kallewoof/bitcoin/tree/200124-rev-files-annotated

  2. fanquake added the label Validation on Jan 24, 2020
  3. in src/validation.cpp:1766 in 48081229c1 outdated
    1760@@ -1757,6 +1761,12 @@ static bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationSt
    1761             return error("ConnectBlock(): FindUndoPos failed");
    1762         if (!UndoWriteToDisk(blockundo, _pos, pindex->pprev->GetBlockHash(), chainparams.MessageStart()))
    1763             return AbortNode(state, "Failed to write undo data");
    1764+        // rev files are written in block height order, whereas blk files are written as blocks come in (often out of order)
    1765+        // we want to flush the rev (undo) file once we've written the last block, which is indicated by the last height
    


    Sjors commented at 9:05 am on January 24, 2020:
    The last height, or the maximum height?

  4. Sjors commented at 9:06 am on January 24, 2020: member
    Concept ACK. This seems more elegant than #17892, even though it involves more code.
  5. DrahtBot commented at 10:03 am on January 24, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  6. kallewoof commented at 11:38 am on January 26, 2020: member

    Synced two datadirs (master and this branch) to block 484516, which resulted in 995 rev files, master using 38277312 bytes, and this branch 35573680 bytes. The average wasted space per rev file comes out at 2717 bytes (~3kb), which is quite a bit lower than my assumptions (if it was completely random, it should be ~500kb per rev file).

    Copying blocks/rev* into a separate dir reveals a few kb lower for this branch – 35570968 – caused by the latest rev file having some pre-allocation. The same for the master datadir results in 36547408 bytes.

    Not sure why it’s not higher, but still. I don’t think a backport is even necessary, but fixing this bug and adding a one time sweep/finalize of all rev files is something that we should do.

  7. laanwj added the label Bug on Jan 29, 2020
  8. laanwj added this to the "Bugfixes" column in a project

  9. fjahr commented at 4:54 pm on February 17, 2020: member

    utACK 48081229c10969ac5993c826c8bd7687cd0503e5

    Agree, looks cleaner than #17892!

  10. in src/validation.cpp:1750 in 48081229c1 outdated
    1753-    if (!status) {
    1754+    if (!BlockFileSeq().Flush(block_pos_old, fFinalize)) {
    1755         AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error.");
    1756     }
    1757+    // the undo file is actually finalized at a later stage, but we do want to flush it along with the block file
    1758+    if (!fFinalize) FlushUndoFile(nLastBlockFile);
    


    vasild commented at 8:19 pm on March 11, 2020:
    We want to flush the rev file along with the block file, but we are not going to do that if fFinalize is true. What about calling FlushUndoFile(nLastBlockFile) unconditionally here?

    kallewoof commented at 6:04 am on March 12, 2020:
    There’s basically no reason to do a non-final flush of the undo file, except on shutdown (which happens now). This is the same for the block file (since the case you are arguing is for a finalized flush).

    vasild commented at 8:01 am on March 12, 2020:
    Ok, then this turns into a nit, feel free to ignore: maybe update the comment to be in line with the code: “we do want to flush it along with the block file in non-finalizing flushes”.

    kallewoof commented at 2:21 pm on March 19, 2020:
    I didn’t see this comment until now. I’ll update the code in the next push.
  11. in src/validation.cpp:1771 in 48081229c1 outdated
    1760@@ -1757,6 +1761,12 @@ static bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationSt
    1761             return error("ConnectBlock(): FindUndoPos failed");
    1762         if (!UndoWriteToDisk(blockundo, _pos, pindex->pprev->GetBlockHash(), chainparams.MessageStart()))
    1763             return AbortNode(state, "Failed to write undo data");
    1764+        // rev files are written in block height order, whereas blk files are written as blocks come in (often out of order)
    1765+        // we want to flush the rev (undo) file once we've written the last block, which is indicated by the last height
    1766+        // in the block file info as below
    1767+        if (_pos.nFile < nLastBlockFile && static_cast<uint32_t>(pindex->nHeight) == vinfoBlockFile[_pos.nFile].nHeightLast) {
    1768+            FlushUndoFile(_pos.nFile, true);
    


    vasild commented at 8:29 pm on March 11, 2020:
    Maybe I am missing something, but it looks to me that if blocks are being downloaded in-order, then the _pos.nFile < nLastBlockFile condition will never be true because those will always be equal?

    kallewoof commented at 6:04 am on March 12, 2020:
    I am pretty sure I tested this but I can’t convince myself that you’re wrong by looking at the code so I will test this again.

    vasild commented at 8:07 am on March 12, 2020:
    The way I see it is that _pos.nFile < nLastBlockFile && can (should) be dropped and the second condition suffices - if we are writing the last (highest) block in the rev file, then also finalize it because we are not going to touch it again.

    kallewoof commented at 5:23 am on March 14, 2020:
    Sorry for tardiness, work kept me busy, but I managed to verify that this is not working as it should. I will fix soon.

    vasild commented at 9:11 am on March 14, 2020:

    No worries. I started at this problem long enough and finally it cracked.

    Indeed _pos.nFile < nLastBlockFile is never true if blocks come in-height-order and this needs to be adjusted. However, just removing that condition, as I suggest above, is not ok because then the second condition will be true for every in-height-order write and this will cause excessive flushing and finalizing. See the proposed patch for a fully working solution (I hope).


    kallewoof commented at 10:09 am on March 19, 2020:
    I’ve got something but testing takes time (I don’t wanna have to wait days each time so I’m juggling 300 GB full node dirs around).
  12. vasild commented at 9:07 am on March 14, 2020: member

    I reproduced the bug in a controlled, deterministic environment with a unit test + tweaks. The current patch in this PR (4808122) indeed fixes it, but it introduces another problem - it would wrongly skip finalizing of the undo files in case the blocks are downloaded in-height-order.

    There are two cases when we should finalize an undo file (IFF):

    • we have just moved to a subsequent one (like with block files) - “normal” in-height-order scenario or
    • we are coming back to a lower undo file from a higher undo file to complete an out-of-order block’s undo and this undo slot is the last/final in the lower undo file.

    The following patch (on top of master @ 9cc7eba1b) fixes it:

      0--- a/src/validation.cpp
      1+++ b/src/validation.cpp
      2@@ -131,13 +131,39 @@ CTxMemPool mempool(&feeEstimator);
      3 // Internal stuff
      4 namespace {
      5     CBlockIndex* pindexBestInvalid = nullptr;
      6 
      7     RecursiveMutex cs_LastBlockFile;
      8     std::vector<CBlockFileInfo> vinfoBlockFile;
      9+
     10+    /** Last blk*.dat file that has been written to.
     11+     * Used to detect when we move from blkN to blkN+1 so we can trigger a
     12+     * finalization of blkN.
     13+     * Blocks are appended to blk files in the order they are received from
     14+     * the network (which may not be height order). When a blk file is about
     15+     * to become larger than MAX_BLOCKFILE_SIZE then we move to the next blk
     16+     * file and finalize the one we just left.
     17+     */
     18     int nLastBlockFile = 0;
     19+
     20+    /** Highest rev*.dat file that has been written to.
     21+     * Used to detect the last write to revN so it can be finalized.
     22+     * Undo files don't have a max size limit. For every block that resides in
     23+     * blkN, its corresponding undo resides in revN.
     24+     * However, the undo for a given block is only appended to the rev file
     25+     * after all blocks at a lower height have been stored together with their
     26+     * undos. It follows that block and undo data may be in different order
     27+     * in the corresponding blk and rev files, for example:
     28+     * blkN: block5, block7, block6
     29+     * revN: block5, block6, block7
     30+     * if block7 was received before block6.
     31+     * Also we may write to revN after writing to revM, for N < M which never
     32+     * happens with blk files.
     33+     */
     34+    int nHighestUndoFile = 0;
     35+
     36     /** Global flag to indicate we should check to see if there are
     37      *  block/undo files that should be deleted.  Set on startup
     38      *  or if we allocate more file space when we're in prune mode
     39      */
     40     bool fCheckForPruning = false;
     41 
     42@@ -1733,23 +1759,25 @@ DisconnectResult CChainState::DisconnectBlock(const CBlock& block, const CBlockI
     43     // move best block pointer to prevout block
     44     view.SetBestBlock(pindex->pprev->GetBlockHash());
     45 
     46     return fClean ? DISCONNECT_OK : DISCONNECT_UNCLEAN;
     47 }
     48 
     49-void static FlushBlockFile(bool fFinalize = false)
     50+static void FlushUndoFile(int undo_file, bool finalize = false)
     51 {
     52-    LOCK(cs_LastBlockFile);
     53-
     54-    FlatFilePos block_pos_old(nLastBlockFile, vinfoBlockFile[nLastBlockFile].nSize);
     55-    FlatFilePos undo_pos_old(nLastBlockFile, vinfoBlockFile[nLastBlockFile].nUndoSize);
     56+    FlatFilePos pos(undo_file, vinfoBlockFile[undo_file].nUndoSize);
     57+    if (!UndoFileSeq().Flush(pos, finalize)) {
     58+        AbortNode("Flushing undo file to disk failed. This is likely the result of an I/O error.");
     59+    }
     60+}
     61 
     62-    bool status = true;
     63-    status &= BlockFileSeq().Flush(block_pos_old, fFinalize);
     64-    status &= UndoFileSeq().Flush(undo_pos_old, fFinalize);
     65-    if (!status) {
     66+static void FlushBlockFile(bool fFinalize = false)
     67+{
     68+    LOCK(cs_LastBlockFile);
     69+    FlatFilePos pos(nLastBlockFile, vinfoBlockFile[nLastBlockFile].nSize);
     70+    if (!BlockFileSeq().Flush(pos, fFinalize)) {
     71         AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error.");
     72     }
     73 }
     74 
     75 static bool FindUndoPos(BlockValidationState &state, int nFile, FlatFilePos &pos, unsigned int nAddSize);
     76 
     77@@ -1760,12 +1788,37 @@ static bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationSt
     78         FlatFilePos _pos;
     79         if (!FindUndoPos(state, pindex->nFile, _pos, ::GetSerializeSize(blockundo, CLIENT_VERSION) + 40))
     80             return error("ConnectBlock(): FindUndoPos failed");
     81         if (!UndoWriteToDisk(blockundo, _pos, pindex->pprev->GetBlockHash(), chainparams.MessageStart()))
     82             return AbortNode(state, "Failed to write undo data");
     83 
     84+        const unsigned thisHeight = static_cast<unsigned>(pindex->nHeight);
     85+
     86+        // Flush and finalize the preceding undo file in the case when blocks
     87+        // come in height order and we have moved to a subsequent undo file.
     88+        const bool movedToSubsequent = _pos.nFile > nHighestUndoFile;
     89+        const bool adjacentHeights =
     90+            thisHeight == vinfoBlockFile[nHighestUndoFile].nHeightLast + 1;
     91+
     92+        if (movedToSubsequent && adjacentHeights) {
     93+            FlushUndoFile(nHighestUndoFile, true);
     94+        }
     95+
     96+        // Flush and finalize the old undo file if we went back to write
     97+        // to it and we wrote its last entry.
     98+        const bool wroteToOldFile = _pos.nFile < nHighestUndoFile;
     99+        const bool isLastEntry = thisHeight == vinfoBlockFile[_pos.nFile].nHeightLast;
    100+
    101+        if (wroteToOldFile && isLastEntry) {
    102+            FlushUndoFile(_pos.nFile, true);
    103+        }
    104+
    105+        if (movedToSubsequent) {
    106+            nHighestUndoFile = _pos.nFile;
    107+        }
    108+
    109         // update nUndoPos in block index
    110         pindex->nUndoPos = _pos.nPos;
    111         pindex->nStatus |= BLOCK_HAVE_UNDO;
    112         setDirtyBlockIndex.insert(pindex);
    113     }
    114 
    115@@ -2291,12 +2344,13 @@ bool CChainState::FlushStateToDisk(
    116             }
    117             {
    118                 LOG_TIME_MILLIS("write block and undo data to disk", BCLog::BENCH);
    119 
    120                 // First make sure all block and undo data is flushed to disk.
    121                 FlushBlockFile();
    122+                FlushUndoFile(nHighestUndoFile);
    123             }
    124 
    125             // Then update all block file information (which may refer to block and undo files).
    126             {
    127                 LOG_TIME_MILLIS("write block index to disk", BCLog::BENCH);
    128 
    

    Feel free to pick it in its entirety, just pieces of it, just the idea or propose another approach. I will test it.

    Annotated debug output with the bug present and with the bug fixed (by the above patch).

    Run the unit test like ./src/test/test_bitcoin --run_test="*/download_blocks_out_of_order" --log_level=all -x 0 > /tmp/log Filter its output: grep -w BBB /tmp/log |sed -E -e 's/^2020[^ ]+ //' -e 's/BBB //'

    I hope this helps!

  13. kallewoof commented at 2:01 pm on March 19, 2020: member

    @vasild Tremendous work. Thank you very much. I will look closer at your patch, and also evaluate my solution over the next few days to see if it does what I want. (And compare it to yours.)

    My solution was to patch this in two places; one for the case “rev file lags behind”, and one for “rev file is up to date”. This was very little code (note that the last commit is a NOMERGE to demonstrate the fix).

    I see this, for a node that was a week or so behind, if I grep -v ‘UpdateTip’:

     02020-03-19T12:34:31Z Pre-allocating up to position 0x7000000 in blk02004.dat
     12020-03-19T12:34:32Z Pre-allocating up to position 0x8000000 in blk02004.dat
     22020-03-19T12:34:33Z iterating nFile:
     3    last height = 622161;
     4    chain tip height = 621899
     5    -> DO NOT FLUSH UNDO FILE
     62020-03-19T12:34:33Z Leaving block file 2004: CBlockFileInfo(blocks=112, size=133860058, heights=621899...622161, time=2020-03-16...2020-03-19)
     72020-03-19T12:34:34Z unflushed undo: [2003 2004]
     82020-03-19T12:34:34Z Pre-allocating up to position 0x1000000 in blk02005.dat
     9...
    102020-03-19T12:34:38Z Pre-allocating up to position 0x5000000 in blk02005.dat
    112020-03-19T12:34:43Z Pre-allocating up to position 0x100000 in rev02005.dat
    12...
    132020-03-19T12:44:01Z Pre-allocating up to position 0x400000 in rev02005.dat
    142020-03-19T12:44:02Z Pre-allocating up to position 0x600000 in rev02003.dat
    15...
    162020-03-19T12:44:05Z Pre-allocating up to position 0xb00000 in rev02003.dat
    172020-03-19T12:44:06Z Pre-allocating up to position 0x500000 in rev02005.dat
    182020-03-19T12:44:06Z Pre-allocating up to position 0x200000 in rev02004.dat
    192020-03-19T12:44:06Z Pre-allocating up to position 0xc00000 in rev02003.dat
    202020-03-19T12:44:07Z Pre-allocating up to position 0xd00000 in rev02003.dat
    212020-03-19T12:44:08Z Pre-allocating up to position 0xe00000 in rev02003.dat
    222020-03-19T12:44:09Z Pre-allocating up to position 0x300000 in rev02004.dat
    232020-03-19T12:44:09Z Pre-allocating up to position 0x600000 in rev02005.dat
    242020-03-19T12:44:10Z Pre-allocating up to position 0xf00000 in rev02003.dat
    252020-03-19T12:44:11Z Pre-allocating up to position 0x400000 in rev02004.dat
    262020-03-19T12:44:12Z Pre-allocating up to position 0x500000 in rev02004.dat
    272020-03-19T12:44:12Z Pre-allocating up to position 0x1000000 in rev02003.dat
    282020-03-19T12:44:13Z Pre-allocating up to position 0x1100000 in rev02003.dat
    292020-03-19T12:44:14Z *** FLUSH UNDO FILE 2003 ***
    302020-03-19T12:44:14Z finalize undo 2003
    312020-03-19T12:44:14Z unflushed undo: [2004 2005]
    322020-03-19T12:44:14Z Pre-allocating up to position 0x600000 in rev02004.dat
    33...
    342020-03-19T12:44:16Z Pre-allocating up to position 0x900000 in rev02004.dat
    352020-03-19T12:44:16Z Leaving InitialBlockDownload (latching to false)
    362020-03-19T12:44:18Z Pre-allocating up to position 0x700000 in rev02005.dat
    372020-03-19T12:44:19Z Pre-allocating up to position 0xa00000 in rev02004.dat
    38...
    392020-03-19T12:44:22Z Pre-allocating up to position 0xe00000 in rev02004.dat
    402020-03-19T12:44:22Z Pre-allocating up to position 0x800000 in rev02005.dat
    412020-03-19T12:44:23Z Pre-allocating up to position 0x900000 in rev02005.dat
    422020-03-19T12:44:23Z Pre-allocating up to position 0xf00000 in rev02004.dat
    432020-03-19T12:44:24Z Pre-allocating up to position 0x1000000 in rev02004.dat
    442020-03-19T12:44:25Z Pre-allocating up to position 0xa00000 in rev02005.dat
    452020-03-19T12:44:26Z Pre-allocating up to position 0x1100000 in rev02004.dat
    462020-03-19T12:44:26Z Pre-allocating up to position 0xb00000 in rev02005.dat
    472020-03-19T12:44:27Z *** FLUSH UNDO FILE 2004 ***
    482020-03-19T12:44:27Z finalize undo 2004
    492020-03-19T12:44:27Z unflushed undo: [2005]
    502020-03-19T12:44:27Z Pre-allocating up to position 0xc00000 in rev02005.dat
    51...
    522020-03-19T13:09:43Z Pre-allocating up to position 0x1100000 in rev02005.dat
    532020-03-19T13:32:41Z unflushed undo: [2005]
    542020-03-19T13:53:07Z iterating nFile:
    55    last height = 622198;
    56    chain tip height = 622198
    57    -> FLUSH UNDO FILE
    582020-03-19T13:53:07Z finalize undo 2005
    592020-03-19T13:53:07Z unflushed undo: []
    602020-03-19T13:53:07Z Leaving block file 2005: CBlockFileInfo(blocks=104, size=133278069, heights=621900...622198, time=2020-03-16...2020-03-19)
    612020-03-19T13:53:07Z unflushed undo: []
    622020-03-19T13:53:07Z Pre-allocating up to position 0x1000000 in blk02006.dat
    632020-03-19T13:53:07Z Pre-allocating up to position 0x100000 in rev02006.dat
    642020-03-19T14:32:42Z unflushed undo: [2006]
    652020-03-19T15:35:22Z Pre-allocating up to position 0x200000 in rev02006.dat
    66..
    672020-03-19T16:35:07Z Pre-allocating up to position 0x2000000 in blk02006.dat
    68..
    692020-03-20T12:12:27Z Pre-allocating up to position 0x8000000 in blk02006.dat
    702020-03-20T13:32:23Z Pre-allocating up to position 0x1100000 in rev02006.dat
    712020-03-20T15:01:34Z iterating nFile:
    72    last height = 622304;
    73    chain tip height = 622304
    74    -> FLUSH UNDO FILE
    752020-03-20T15:01:34Z finalize undo 2006
    762020-03-20T15:01:34Z unflushed undo: []
    772020-03-20T15:01:34Z Leaving block file 2006: CBlockFileInfo(blocks=106, size=133794291, heights=622199...622304, time=2020-03-19...2020-03-20)
    782020-03-20T15:01:34Z unflushed undo: []
    792020-03-20T15:01:34Z Pre-allocating up to position 0x1000000 in blk02007.dat
    802020-03-20T15:01:34Z Pre-allocating up to position 0x100000 in rev02007.dat
    81..
    822020-03-21T12:09:59Z Pre-allocating up to position 0x8000000 in blk02007.dat
    832020-03-21T13:13:00Z Pre-allocating up to position 0x1100000 in rev02007.dat
    842020-03-21T14:46:12Z iterating nFile:
    85    last height = 622417;
    86    chain tip height = 622417
    87    -> FLUSH UNDO FILE
    882020-03-21T14:46:12Z finalize undo 2007
    892020-03-21T14:46:12Z unflushed undo: []
    902020-03-21T14:46:12Z Leaving block file 2007: CBlockFileInfo(blocks=113, size=133506562, heights=622305...622417, time=2020-03-20...2020-03-21)
    912020-03-21T14:46:12Z unflushed undo: []
    922020-03-21T14:46:12Z Pre-allocating up to position 0x1000000 in blk02008.dat
    932020-03-21T14:46:12Z Pre-allocating up to position 0x100000 in rev02008.dat
    942020-03-21T15:32:45Z unflushed undo: [2008]
    95..
    

    (Edit: stripped out entries pre-allocating up to the next position in the same file as the previous and next line.)

    Last week worth (trimmed down to bare essentials, i.e. no duplicate pre-allocs for the same file & file #, etc):

     02020-03-23T13:48:11Z Bitcoin Core version v0.19.99.0-84ed7228b-dirty (release build)
     12020-03-23T13:50:37Z Pre-allocating up to position 0xc00000 in rev02009.dat
     22020-03-23T14:48:19Z unflushed undo: [2009]
     32020-03-23T15:14:29Z Pre-allocating up to position 0x7000000 in blk02009.dat
     42020-03-23T21:18:48Z iterating nFile: last height = 622686; chain tip height = 622686 -> FLUSH UNDO FILE
     52020-03-23T21:18:48Z Leaving block file 2009: CBlockFileInfo(blocks=123, size=133738361, heights=622564...622686, time=2020-03-22...2020-03-23)
     62020-03-23T21:18:48Z finalize undo 2009
     72020-03-23T21:18:48Z unflushed undo: []
     82020-03-23T21:18:48Z Pre-allocating up to position 0x1000000 in blk02010.dat
     92020-03-23T21:18:49Z Pre-allocating up to position 0x100000 in rev02010.dat
    102020-03-23T21:48:26Z unflushed undo: [2010]
    112020-03-24T17:37:20Z iterating nFile: last height = 622800; chain tip height = 622800 -> FLUSH UNDO FILE
    122020-03-24T17:37:20Z Leaving block file 2010: CBlockFileInfo(blocks=114, size=133105740, heights=622687...622800, time=2020-03-23...2020-03-24)
    132020-03-24T17:37:20Z finalize undo 2010
    142020-03-24T17:37:20Z unflushed undo: []
    152020-03-24T17:37:20Z Pre-allocating up to position 0x1000000 in blk02011.dat
    162020-03-24T17:37:20Z Pre-allocating up to position 0x100000 in rev02011.dat
    172020-03-24T18:48:22Z unflushed undo: [2011]
    182020-03-25T22:07:57Z iterating nFile: last height = 622918; chain tip height = 622918 -> FLUSH UNDO FILE
    192020-03-25T22:07:57Z Leaving block file 2011: CBlockFileInfo(blocks=118, size=133871217, heights=622801...622918, time=2020-03-24...2020-03-25)
    202020-03-25T22:07:57Z finalize undo 2011
    212020-03-25T22:07:57Z unflushed undo: []
    222020-03-25T22:07:57Z Pre-allocating up to position 0x1000000 in blk02012.dat
    232020-03-25T22:07:57Z Pre-allocating up to position 0x100000 in rev02012.dat
    242020-03-25T22:48:29Z unflushed undo: [2012]
    252020-03-26T16:59:04Z iterating nFile: last height = 623044; chain tip height = 623044 -> FLUSH UNDO FILE
    262020-03-26T16:59:04Z Leaving block file 2012: CBlockFileInfo(blocks=126, size=133001579, heights=622919...623044, time=2020-03-25...2020-03-26)
    272020-03-26T16:59:04Z finalize undo 2012
    282020-03-26T16:59:05Z unflushed undo: []
    292020-03-26T16:59:05Z Pre-allocating up to position 0x1000000 in blk02013.dat
    302020-03-26T16:59:05Z Pre-allocating up to position 0x100000 in rev02013.dat
    312020-03-26T17:48:23Z unflushed undo: [2013]
    322020-03-27T08:53:09Z iterating nFile: last height = 623152; chain tip height = 623152 -> FLUSH UNDO FILE
    332020-03-27T08:53:09Z Leaving block file 2013: CBlockFileInfo(blocks=108, size=134054702, heights=623045...623152, time=2020-03-26...2020-03-27)
    342020-03-27T08:53:09Z finalize undo 2013
    352020-03-27T08:53:09Z unflushed undo: []
    362020-03-27T08:53:09Z Pre-allocating up to position 0x1000000 in blk02014.dat
    372020-03-27T08:53:09Z Pre-allocating up to position 0x100000 in rev02014.dat
    382020-03-27T09:48:41Z unflushed undo: [2014]
    392020-03-28T08:17:02Z iterating nFile: last height = 623299; chain tip height = 623299 -> FLUSH UNDO FILE
    402020-03-28T08:17:02Z Leaving block file 2014: CBlockFileInfo(blocks=147, size=133428978, heights=623153...623299, time=2020-03-27...2020-03-28)
    412020-03-28T08:17:02Z finalize undo 2014
    422020-03-28T08:17:02Z unflushed undo: []
    432020-03-28T08:17:02Z Pre-allocating up to position 0x1000000 in blk02015.dat
    442020-03-28T08:17:02Z Pre-allocating up to position 0x100000 in rev02015.dat
    452020-03-28T08:48:34Z unflushed undo: [2015]
    462020-03-29T10:30:39Z iterating nFile: last height = 623444; chain tip height = 623444 -> FLUSH UNDO FILE
    472020-03-29T10:30:39Z Leaving block file 2015: CBlockFileInfo(blocks=145, size=133249498, heights=623300...623444, time=2020-03-28...2020-03-29)
    482020-03-29T10:30:39Z finalize undo 2015
    492020-03-29T10:30:39Z unflushed undo: []
    502020-03-29T10:30:39Z Pre-allocating up to position 0x1000000 in blk02016.dat
    512020-03-29T10:30:39Z Pre-allocating up to position 0x100000 in rev02016.dat
    522020-03-29T10:48:46Z unflushed undo: [2016]
    
  14. in src/validation.cpp:3241 in c4400e1b8a outdated
    3259+            LogPrintf("iterating nFile: last height = %d; chain tip height = %d -> %s\n",
    3260+                vinfoBlockFile[nFile].nHeightLast,
    3261+                ChainActive().Tip()->nHeight,
    3262+                vinfoBlockFile[nFile].nHeightLast == ChainActive().Tip()->nHeight ? "FLUSH UNDO FILE" : "DO NOT FLUSH UNDO FILE");
    3263+            if (vinfoBlockFile[nFile].nHeightLast == ChainActive().Tip()->nHeight) {
    3264+                FlushUndoFile(nFile, true);
    


    vasild commented at 8:29 pm on March 19, 2020:

    This will flush the undo file before the corresponding block file which is flushed a few lines below. Also visible in the log you pasted where finalize undo 2005 appears before Leaving block file 2005.

    I am not sure but I think we want to always flush the block file first and the undo file afterwards.


    kallewoof commented at 5:40 am on March 20, 2020:

    I think the way it works now is fine; the “unflushed undo files” output is consistently showing the current opened one only, so it indicates the same.

    Let me go through the steps:

    • Sync state: HEIGHT = 100, BLK = 100, UNDO = 100, block file 0
    • Block 101 comes in (we assume it does not fit into the current block file, so we will move to block file 1)
    • First we want to store it to disk:

    https://github.com/bitcoin/bitcoin/blob/c4400e1b8a62269d04b57a46d68941fdbaae4303/src/validation.cpp#L3731-L3736

    • We go find a position and end up in the next block file (1), at which point we flush the undo file (0):

    https://github.com/bitcoin/bitcoin/blob/c4400e1b8a62269d04b57a46d68941fdbaae4303/src/validation.cpp#L3254-L3264

    • Sync state: HEIGHT = 100, BLK = 101, UNDO = 100, block file 1
    • We now want to accept block 101, and at this point we write the undo data (for block 101)

    https://github.com/bitcoin/bitcoin/blob/c4400e1b8a62269d04b57a46d68941fdbaae4303/src/validation.cpp#L1773-L1787

    • The condition does not trigger here, because we are synced up (as you pointed out), but we end up writing into undo file 1, and undo file 0 was finalized as it should.
    • Sync state: HEIGHT = 101, BLK = 101, UNDO = 101, block file 1

    Let me know if I got something wrong in the above (or if there are any other holes in the patch). I will update the log above once it moves to block file 2007.


    vasild commented at 1:28 pm on March 20, 2020:

    Yes, the above is correct (or at least coincides with my understanding for this elusive code!) - I think the code will do what you describe and that it is the correct behavior wrt finalizing the undo.

    However, one piece is missing from the above - when do we flush block file 0?

    We go find a position and end up in the next block file (1), at which point we flush the undo file (0)

    … and after this we flush the blk file 0.

    Just to be explicit, something like this may be warranted, delay the flush of the undo until after we have flushed the block file:

     0+    int thisUndoNeedsFlushingAndFinalizing = -1;
     1+
     2     if (!fKnown) {
     3         while (vinfoBlockFile[nFile].nSize + nAddSize >= MAX_BLOCKFILE_SIZE) {
     4             // when the undo file is keeping up with the block file, we flush it explicitly, here
     5             // when it is lagging behind (more blocks arrive than are being connected), we let the
     6             // undo block write case handle it
     7             LogPrintf("iterating nFile: last height = %d; chain tip height = %d -> %s\n",
     8                 vinfoBlockFile[nFile].nHeightLast,
     9                 ChainActive().Tip()->nHeight,
    10                 vinfoBlockFile[nFile].nHeightLast == ChainActive().Tip()->nHeight ? "FLUSH UNDO FILE" : "DO NOT FLUSH UNDO FILE");
    11             if (vinfoBlockFile[nFile].nHeightLast == ChainActive().Tip()->nHeight) {
    12-                FlushUndoFile(nFile, true);
    13+                thisUndoNeedsFlushingAndFinalizing = nFile;
    14             }
    15             nFile++;
    16             if (vinfoBlockFile.size() <= nFile) {
    17                 vinfoBlockFile.resize(nFile + 1);
    18             }
    19         }
    20         pos.nFile = nFile;
    21         pos.nPos = vinfoBlockFile[nFile].nSize;
    22     }
    23 
    24     if ((int)nFile != nLastBlockFile) {
    25         if (!fKnown) {
    26             LogPrintf("Leaving block file %i: %s\n", nLastBlockFile, vinfoBlockFile[nLastBlockFile].ToString());
    27         }
    28+        // Order is important: first FlushBlockFile(), then FlushUndoFile().
    29+        // FlushBlockFile() will flush internally nLastBlockFile and at this
    30+        // point I believe thisUndoNeedsFlushingAndFinalizing == nLastBlockFile
    31+        // so we will flush corresponding blk and undo files.
    32         FlushBlockFile(!fKnown);
    33+        if (thisUndoNeedsFlushingAndFinalizing != -1) {
    34+            FlushUndoFile(thisUndoNeedsFlushingAndFinalizing, true);
    35+        }
    36         nLastBlockFile = nFile;
    37     }
    

    kallewoof commented at 4:17 am on March 21, 2020:

    Oh wait, I see what you mean – you’re concerned about the non-finalized undo flush happening after the finalized undo flush (called from within the finalized block flush), right?

    I don’t think this it’s a problem to flush twice. The only difference between a finalized=true flush and a finalized=false flush is L87 in

    https://github.com/bitcoin/bitcoin/blob/a3abe25f720104e4abf0d3f01f0b847929167b8b/src/flatfile.cpp#L81-L98

    which means the non-finalized flush is a NOP, I think.


    vasild commented at 1:32 pm on March 23, 2020:

    No, sorry, I mentioned my concern (order of flushing of blk and undo files), but did not mention why is that a concern. Previously the code always flushed blkN before flushing revN (finalizing is not relevant for this):

    https://github.com/bitcoin/bitcoin/blob/97b0687501cee77a9170f9e288755a5d268e9bd4/src/validation.cpp#L1747-L1748

    In case of a crash (or power failure) between the two, we would end up with a block data on disk without a corresponding undo data. I presume this is ok in the current code.

    Now, with the change in this PR we would flush revN first and blkN afterwards. This could potentially leave us with an undo data for some block(s) without the corresponding block data. There is at least this code which would get upset by that:

    https://github.com/bitcoin/bitcoin/blob/97b0687501cee77a9170f9e288755a5d268e9bd4/src/validation.cpp#L4810

    Now, I am not 100% sure if that is a real concern or not, but is relatively easy to tweak in the current patch so that it keeps flushing first blk and then undo (as before the patch).

  15. in src/validation.cpp:3238 in c4400e1b8a outdated
    3252@@ -3222,6 +3253,16 @@ static bool FindBlockPos(FlatFilePos &pos, unsigned int nAddSize, unsigned int n
    3253 
    3254     if (!fKnown) {
    3255         while (vinfoBlockFile[nFile].nSize + nAddSize >= MAX_BLOCKFILE_SIZE) {
    


    vasild commented at 8:34 pm on March 19, 2020:

    I find this while extremely confusing - it looks to me that it will never iterate more than once. We need a block size of more than 128MB (blk file size limit) to get this execute 2 or more times, so that one block spans multiple blk files (!?).

    Maybe out of the scope of this PR, but I think we should s/while/if/. Is my understanding correct? If not then we will end up finalizing an empty undo file with this patch on the second and subsequent iterations.


    kallewoof commented at 5:47 am on March 20, 2020:
    I dug into this. Apparently, the system is supposed to support multiple files per block. This was a change made 8 years ago in 5382bcf8cd23c36a435c29080770a79b5e28af42 by @sipa. It doesn’t look like it is actually working, though, but I could be wrong.

    sipa commented at 5:56 am on March 20, 2020:
    @kallewoof No, that commit adds support for multiple blocks per file, which is definitely used. There has never been support for multiple files per block.

    kallewoof commented at 6:20 am on March 20, 2020:
    Oh, yeah you’re right. But this while loop makes no sense to me still.

    sipa commented at 6:36 am on March 20, 2020:
    I think it’s just iterating to find the first block file with space for the new block.

    kallewoof commented at 6:53 am on March 20, 2020:

    Okay, so assuming we one day decided to stop writing block files sequentially, we may have a situation where we try to find block pos for nFile = 2007 where the last block file in existence is actually nFile = 2010 or something >2008, in which case this while loop becomes necessary.

    I don’t really see that ever happening though, and it feels like someone mixed up “block file writing order” (sequentially; we never write into older block files or into the middle of a file in between other stuff) with “connect order” (non-sequentially, as blocks are not stored in height order).


    vasild commented at 1:57 pm on March 20, 2020:

    Ok, for the purposes of this PR we just need to agree that this while loop will never iterate more than once (if it is possible that it iterates 2 or more times then I am wrong and this PR will have to be adjusted).

    Observation: before the loop condition is evaluated for the first time nFile equals nLastBlockFile. I.e. we start the search at the end.

    So two things can happen:

    1. We find space in the last file - the loop condition evaluates to false and the loop iterates 0 times
    2. There is no space in the last file, the loop condition evaluates to true and the loop body “creates” a new empty block file (not yet on disk, but this is not relevant). The new empty block file has nSize == 0 and so the next loop condition will for sure evaluate to false because nSize + addedBlockSize cannot be >= than 128 MiB.

    I think it’s just iterating to find the first block file with space for the new block.

    Yes, and given that it always starts the search at the last file it will never iterate more than once, as per above.


    sipa commented at 6:46 pm on March 20, 2020:
    I think in the original version of this code the loop was there so that after a restart a good file number would be found (even if it always reset to 0). I don’t know if that’s still needed.
  16. kallewoof force-pushed on Mar 20, 2020
  17. kallewoof force-pushed on Mar 22, 2020
  18. kallewoof force-pushed on Mar 23, 2020
  19. kallewoof force-pushed on Mar 23, 2020
  20. vasild commented at 7:43 pm on March 23, 2020: member

    ACK ed34e00da. Code review and manual testing.

    I tested this by applying the patch folded below which introduces a unit test and lowers the max size of blk files so that they can accommodate just 4 test blocks. It also adds some printouts which I manually analyzed after running the unit test as described at the bottom of #17994 (comment). I think this unit test does not qualify for an automated unit test.

    Maybe squash all 4 commits in this PR to one.

      0--- i/src/flatfile.cpp
      1+++ w/src/flatfile.cpp
      2@@ -55,21 +55,23 @@ FILE* FlatFileSeq::Open(const FlatFilePos& pos, bool read_only)
      3 size_t FlatFileSeq::Allocate(const FlatFilePos& pos, size_t add_size, bool& out_of_space)
      4 {
      5     out_of_space = false;
      6 
      7     unsigned int n_old_chunks = (pos.nPos + m_chunk_size - 1) / m_chunk_size;
      8     unsigned int n_new_chunks = (pos.nPos + add_size + m_chunk_size - 1) / m_chunk_size;
      9+    LogPrintf("BBB FlatFileSeq::Allocate(): %s%05u@[%u..%zu]\n",
     10+            m_prefix, pos.nFile, pos.nPos, pos.nPos + add_size - 1);
     11     if (n_new_chunks > n_old_chunks) {
     12         size_t old_size = pos.nPos;
     13         size_t new_size = n_new_chunks * m_chunk_size;
     14         size_t inc_size = new_size - old_size;
     15 
     16         if (CheckDiskSpace(m_dir, inc_size)) {
     17             FILE *file = Open(pos);
     18             if (file) {
     19-                LogPrintf("Pre-allocating up to position 0x%x in %s%05u.dat\n", new_size, m_prefix, pos.nFile);
     20+                LogPrintf("BBB FlatFileSeq::Allocate(): Pre-allocating up to position %u in %s%05u\n", new_size, m_prefix, pos.nFile);
     21                 AllocateFileRange(file, pos.nPos, inc_size);
     22                 fclose(file);
     23                 return inc_size;
     24             }
     25         } else {
     26             out_of_space = true;
     27diff --git i/src/test/validation_block_tests.cpp w/src/test/validation_block_tests.cpp
     28index dae389a16..4f3dcab30 100644
     29--- i/src/test/validation_block_tests.cpp
     30+++ w/src/test/validation_block_tests.cpp
     31@@ -211,12 +211,66 @@ BOOST_AUTO_TEST_CASE(processnewblock_signals_ordering)
     32     UnregisterValidationInterface(&sub);
     33 
     34     LOCK(cs_main);
     35     BOOST_CHECK_EQUAL(sub.m_expected_tip, ::ChainActive().Tip()->GetBlockHash());
     36 }
     37 
     38+BOOST_AUTO_TEST_CASE(download_blocks_out_of_order)
     39+{
     40+    const int numberOfExtraBlocks = 10;
     41+    std::vector<std::shared_ptr<const CBlock>> extraBlocks;
     42+
     43+    BuildChain(Params().GenesisBlock().GetHash(), numberOfExtraBlocks, 0, 0, numberOfExtraBlocks, extraBlocks);
     44+
     45+    assert(extraBlocks.size() == numberOfExtraBlocks);
     46+
     47+    bool ignored;
     48+    BlockValidationState state;
     49+    std::vector<CBlockHeader> headers;
     50+    std::transform(extraBlocks.begin(), extraBlocks.end(), std::back_inserter(headers), [](std::shared_ptr<const CBlock> b) { return b->GetBlockHeader(); });
     51+
     52+    // Process all the headers so we understand the toplogy of the chain
     53+    BOOST_CHECK(ProcessNewBlockHeaders(headers, state, Params()));
     54+
     55+    // Connect the genesis block and drain any outstanding events
     56+    BOOST_CHECK(ProcessNewBlock(Params(), std::make_shared<CBlock>(Params().GenesisBlock()), true, &ignored));
     57+    SyncWithValidationInterfaceQueue();
     58+
     59+    // subscribe to events (this subscriber will validate event ordering)
     60+    const CBlockIndex* initial_tip = nullptr;
     61+    {
     62+        LOCK(cs_main);
     63+        initial_tip = ::ChainActive().Tip();
     64+    }
     65+    TestSubscriber sub(initial_tip->GetBlockHash());
     66+    RegisterValidationInterface(&sub);
     67+
     68+    // Just 4 blocks fit in one blk file
     69+
     70+    //std::vector<size_t> order{0, 1, 2, 3,  4, 5, 6, 7,  8, 9, 10};
     71+    std::vector<size_t> order{0, 1, 2, 3,  4, 5, 6, 8,  7, 9, 10};
     72+    //std::vector<size_t> order{0, 1, 2, 3,  4, 5, 6, 7,  9, 8, 10};
     73+    //std::vector<size_t> order{0, 1, 2, 3,  4, 7, 8, 9,  10, 5, 6};
     74+    //std::vector<size_t> order{0, 2, 3, 4,  5, 6, 7, 8,  9, 1, 10};
     75+
     76+    for (size_t i = 1; i <= numberOfExtraBlocks; ++i) {
     77+        // Block at height H is in extraBlocks[H - 1].
     78+        bool ok = ProcessNewBlock(Params(), extraBlocks[order[i] - 1], true, &ignored);
     79+        assert(ok);
     80+    }
     81+
     82+    while (GetMainSignals().CallbacksPending() > 0) {
     83+        MilliSleep(100);
     84+    }
     85+
     86+    UnregisterValidationInterface(&sub);
     87+
     88+    LOCK(cs_main);
     89+    BOOST_CHECK_EQUAL(sub.m_expected_tip, ::ChainActive().Tip()->GetBlockHash());
     90+}
     91+
     92 /**
     93  * Test that mempool updates happen atomically with reorgs.
     94  *
     95  * This prevents RPC clients, among others, from retrieving immediately-out-of-date mempool data
     96  * during large reorgs.
     97  *
     98diff --git i/src/validation.cpp w/src/validation.cpp
     99index faf92b3b0..0ff0dbcd9 100644
    100--- i/src/validation.cpp
    101+++ w/src/validation.cpp
    102@@ -1731,40 +1731,44 @@ DisconnectResult CChainState::DisconnectBlock(const CBlock& block, const CBlockI
    103     return fClean ? DISCONNECT_OK : DISCONNECT_UNCLEAN;
    104 }
    105 
    106 static void FlushUndoFile(int block_file, bool finalize = false)
    107 {
    108     FlatFilePos undo_pos_old(block_file, vinfoBlockFile[block_file].nUndoSize);
    109+    LogPrintf("BBB FlushUndoFile(): rev%05u pos=%u, finalize=%d\n", undo_pos_old.nFile, undo_pos_old.nPos, finalize);
    110     if (!UndoFileSeq().Flush(undo_pos_old, finalize)) {
    111         AbortNode("Flushing undo file to disk failed. This is likely the result of an I/O error.");
    112     }
    113 }
    114 
    115 static void FlushBlockFile(bool fFinalize = false, bool finalize_undo = false)
    116 {
    117     LOCK(cs_LastBlockFile);
    118     FlatFilePos block_pos_old(nLastBlockFile, vinfoBlockFile[nLastBlockFile].nSize);
    119+    LogPrintf("BBB FlushBlockFile(): blk%05u pos=%u, finalize=%d\n", block_pos_old.nFile, block_pos_old.nPos, fFinalize);
    120     if (!BlockFileSeq().Flush(block_pos_old, fFinalize)) {
    121         AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error.");
    122     }
    123     // we do not always flush the undo file, as the chain tip may be lagging behind the incoming blocks,
    124     // e.g. during IBD or a sync after a node going offline
    125     if (!fFinalize || finalize_undo) FlushUndoFile(nLastBlockFile, finalize_undo);
    126 }
    127 
    128-static bool FindUndoPos(BlockValidationState &state, int nFile, FlatFilePos &pos, unsigned int nAddSize);
    129+static bool FindUndoPos(BlockValidationState &state, int nFile, FlatFilePos &pos, unsigned int nAddSize, int nHeight);
    130 
    131 static bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex* pindex, const CChainParams& chainparams)
    132 {
    133     // Write undo information to disk
    134     if (pindex->GetUndoPos().IsNull()) {
    135         FlatFilePos _pos;
    136-        if (!FindUndoPos(state, pindex->nFile, _pos, ::GetSerializeSize(blockundo, CLIENT_VERSION) + 40))
    137+        if (!FindUndoPos(state, pindex->nFile, _pos, ::GetSerializeSize(blockundo, CLIENT_VERSION) + 40, pindex->nHeight))
    138             return error("ConnectBlock(): FindUndoPos failed");
    139         if (!UndoWriteToDisk(blockundo, _pos, pindex->pprev->GetBlockHash(), chainparams.MessageStart()))
    140             return AbortNode(state, "Failed to write undo data");
    141+        LogPrintf("BBB WriteUndoDataForBlock(): _pos.nFile=%d, nLastBlockFile=%d, pindex->height=%d, last height in file=%d\n",
    142+            _pos.nFile, nLastBlockFile, pindex->nHeight, vinfoBlockFile[_pos.nFile].nHeightLast);
    143         // rev files are written in block height order, whereas blk files are written as blocks come in (often out of order)
    144         // we want to flush the rev (undo) file once we've written the last block, which is indicated by the last height
    145         // in the block file info as below; note that this does not catch the case where the undo writes are keeping up
    146         // with the block writes (usually when a synced up node is getting newly mined blocks) -- this case is caught in
    147         // the FindBlockPos function
    148         if (_pos.nFile < nLastBlockFile && static_cast<uint32_t>(pindex->nHeight) == vinfoBlockFile[_pos.nFile].nHeightLast) {
    149@@ -3246,12 +3250,15 @@ static bool FindBlockPos(FlatFilePos &pos, unsigned int nAddSize, unsigned int n
    150             }
    151         }
    152         pos.nFile = nFile;
    153         pos.nPos = vinfoBlockFile[nFile].nSize;
    154     }
    155 
    156+    LogPrintf("BBB FindBlockPos(): height=%u, belongs to blk%05u[%u..%u]\n",
    157+        nHeight, pos.nFile, pos.nPos, pos.nPos + nAddSize - 1);
    158+
    159     if ((int)nFile != nLastBlockFile) {
    160         if (!fKnown) {
    161             LogPrintf("Leaving block file %i: %s\n", nLastBlockFile, vinfoBlockFile[nLastBlockFile].ToString());
    162         }
    163         FlushBlockFile(!fKnown, finalize_undo);
    164         nLastBlockFile = nFile;
    165@@ -3275,22 +3282,25 @@ static bool FindBlockPos(FlatFilePos &pos, unsigned int nAddSize, unsigned int n
    166     }
    167 
    168     setDirtyFileInfo.insert(nFile);
    169     return true;
    170 }
    171 
    172-static bool FindUndoPos(BlockValidationState &state, int nFile, FlatFilePos &pos, unsigned int nAddSize)
    173+static bool FindUndoPos(BlockValidationState &state, int nFile, FlatFilePos &pos, unsigned int nAddSize, int nHeight)
    174 {
    175     pos.nFile = nFile;
    176 
    177     LOCK(cs_LastBlockFile);
    178 
    179     pos.nPos = vinfoBlockFile[nFile].nUndoSize;
    180     vinfoBlockFile[nFile].nUndoSize += nAddSize;
    181     setDirtyFileInfo.insert(nFile);
    182 
    183+    LogPrintf("BBB FindUndoPos(): height=%d, belongs to: rev%05u[%u..%u]\n",
    184+        nHeight, nFile, pos.nPos, pos.nPos + nAddSize - 1);
    185+
    186     bool out_of_space;
    187     size_t bytes_allocated = UndoFileSeq().Allocate(pos, nAddSize, out_of_space);
    188     if (out_of_space) {
    189         return AbortNode(state, "Disk space is too low!", _("Error: Disk space is too low!").translated, CClientUIInterface::MSG_NOPREFIX);
    190     }
    191     if (bytes_allocated != 0 && fPruneMode) {
    192@@ -3703,12 +3713,13 @@ bool ProcessNewBlockHeaders(const std::vector<CBlockHeader>& headers, BlockValid
    193     }
    194     return true;
    195 }
    196 
    197 /** Store block on disk. If dbp is non-nullptr, the file is known to already reside on disk */
    198 static FlatFilePos SaveBlockToDisk(const CBlock& block, int nHeight, const CChainParams& chainparams, const FlatFilePos* dbp) {
    199+    LogPrintf("BBB SaveBlockToDisk(): height=%d\n", nHeight);
    200     unsigned int nBlockSize = ::GetSerializeSize(block, CLIENT_VERSION);
    201     FlatFilePos blockPos;
    202     if (dbp != nullptr)
    203         blockPos = *dbp;
    204     if (!FindBlockPos(blockPos, nBlockSize+8, nHeight, block.GetBlockTime(), dbp != nullptr)) {
    205         error("%s: FindBlockPos failed", __func__);
    206diff --git i/src/validation.h w/src/validation.h
    207index a5335edc4..e45d29ef5 100644
    208--- i/src/validation.h
    209+++ w/src/validation.h
    210@@ -67,17 +67,25 @@ static const unsigned int DEFAULT_DESCENDANT_SIZE_LIMIT = 101;
    211 static const unsigned int EXTRA_DESCENDANT_TX_SIZE_LIMIT = 10000;
    212 /** Default for -mempoolexpiry, expiration time for mempool transactions in hours */
    213 static const unsigned int DEFAULT_MEMPOOL_EXPIRY = 336;
    214 /** Maximum kilobytes for transactions to store for processing during reorg */
    215 static const unsigned int MAX_DISCONNECTED_TX_POOL_SIZE = 20000;
    216 /** The maximum size of a blk?????.dat file (since 0.8) */
    217-static const unsigned int MAX_BLOCKFILE_SIZE = 0x8000000; // 128 MiB
    218+// Size chosen so that just 4 blocks from the download_blocks_out_of_order unit
    219+// tests fit in one blk file.
    220+static const unsigned int MAX_BLOCKFILE_SIZE = 1200;
    221 /** The pre-allocation chunk size for blk?????.dat files (since 0.8) */
    222-static const unsigned int BLOCKFILE_CHUNK_SIZE = 0x1000000; // 16 MiB
    223+static const unsigned int BLOCKFILE_CHUNK_SIZE = 600;
    224 /** The pre-allocation chunk size for rev?????.dat files (since 0.8) */
    225-static const unsigned int UNDOFILE_CHUNK_SIZE = 0x100000; // 1 MiB
    226+// Size chosen so that one chunk can accommodate 3 undos from the
    227+// download_blocks_out_of_order unit test. Given that we will have max
    228+// 4 undos per one undo file it means that when writing the first undo
    229+// we allocate space for 3 undos, when writing the 4th undo we allocate
    230+// space for 3 more (6 in total). During finalizing we should shrink
    231+// the file size from 6 to 4 undos.
    232+static const unsigned int UNDOFILE_CHUNK_SIZE = 130;
    233 
    234 /** Maximum number of dedicated script-checking threads allowed */
    235 static const int MAX_SCRIPTCHECK_THREADS = 15;
    236 /** -par default (number of script-checking threads, 0 = auto) */
    237 static const int DEFAULT_SCRIPTCHECK_THREADS = 0;
    238 /** Number of blocks that can be requested at any given time from a single peer. */
    
  21. kallewoof force-pushed on Mar 24, 2020
  22. kallewoof commented at 2:56 am on March 24, 2020: member

    @vasild Thanks a lot for testing and for helping get this into shape!

    I’ve squashed the commits into one, and also updated the commit description. I am going to post updated results from the node running this in a bit.

  23. validation: delay flushing undo files in syncing node case
    Data files are pre-allocated, and upon flush/finalization, they are trimmed down to their resulting size.
    Block (blk) files are written to disk as blocks come in, which is often out of order, whereas undo (rev) files are written sequentially, as each block is added to the top of the chain.
    When a block file hits the size limit, the system flushes and trims the file down to its final size, and moves on to the next block file.
    
    Case 1: blocks are added to the chain as they come in (synced up node case) -- in this case, we will flush and finalize the undo file together with the block file.
    
    Case 2: blocks are added to the chain after they have been downloaded (syncing node case) -- in this case, we postpone finalizing the undo file until we know the undo data for the last block in the file has been written to disk.
    ac94141af0
  24. kallewoof force-pushed on Mar 24, 2020
  25. vasild commented at 8:11 am on March 24, 2020: member
    ACK ac94141af (no changes in the code since ed34e00da).
  26. kallewoof commented at 8:24 am on March 30, 2020: member
    Updated #17994 (comment) with more logs. They’re not very exciting, but they indicate things are working as they should.
  27. fjahr commented at 6:56 pm on April 20, 2020: member

    Code review re-ACK ac94141af0c16161afa68de1c3720f254ae4e12c

    Only changes since my last review is the introduction of the finalize_undo parameter and another comments change.

  28. in src/validation.cpp:1751 in ac94141af0
    1754+    if (!BlockFileSeq().Flush(block_pos_old, fFinalize)) {
    1755         AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error.");
    1756     }
    1757+    // we do not always flush the undo file, as the chain tip may be lagging behind the incoming blocks,
    1758+    // e.g. during IBD or a sync after a node going offline
    1759+    if (!fFinalize || finalize_undo) FlushUndoFile(nLastBlockFile, finalize_undo);
    


    jonatack commented at 4:43 pm on April 22, 2020:
    finalize_undo is clear but it might be good to comment on the fFinalize meaning and logic
  29. in src/validation.cpp:3241 in ac94141af0
    3232@@ -3220,8 +3233,13 @@ static bool FindBlockPos(FlatFilePos &pos, unsigned int nAddSize, unsigned int n
    3233         vinfoBlockFile.resize(nFile + 1);
    3234     }
    3235 
    3236+    bool finalize_undo = false;
    3237     if (!fKnown) {
    3238         while (vinfoBlockFile[nFile].nSize + nAddSize >= MAX_BLOCKFILE_SIZE) {
    3239+            // when the undo file is keeping up with the block file, we want to flush it explicitly
    3240+            // when it is lagging behind (more blocks arrive than are being connected), we let the
    3241+            // undo block write case handle it
    


    jonatack commented at 4:47 pm on April 22, 2020:
    It did take a couple of readings to parse this paragraph as two sentences rather than one. I think that adding the missing punctuation would be more clear.
  30. jonatack commented at 4:58 pm on April 22, 2020: member

    Code review ACK ac94141af0c16

    Also built, ran tests and bitcoind as a minimal sanity check.

    Thanks for fixing this and adding documentation. I think additional/better documentation would be helpful, including but not limited to the small suggestions below and the while mentioned above. Happy to re-ACK for that.

    Kudos @vasild for the unit test. The code wasn’t designed for the components to be easily tested separately, but behavior specs / regression tests, if feasible with some refactoring, would be continuously valuable.

    Don’t hesitate to ping me for review if you open a follow-up PR to improve the documentation or tests.

    EDIT: a further follow-up idea by @sipa from today’s Review Club session: benchmark how much the pre-allocation actually helps nowadays (to maybe remove it entirely).

  31. kallewoof commented at 9:07 am on April 28, 2020: member
    Will address @jonatack points if I need to rebase or modify in other ways.
  32. whitslack commented at 2:37 pm on May 20, 2020: contributor

    benchmark how much the pre-allocation actually helps nowadays (to maybe remove it entirely).

    You could be letting the operating system handle your speculative preallocation. On Linux you can even set the allocation extent size manually if you really want to maintain your 16MB-at-a-time behavior. See the XFS_IOC_FSSETXATTR ioctl and the fsx_extsize structure member. (Note, this API was introduced by XFS but is now implemented by several popular Linux file systems.)

  33. laanwj commented at 2:32 pm on June 4, 2020: member

    See the XFS_IOC_FSSETXATTR ioctl and the fsx_extsize structure member. (Note, this API was introduced by XFS but is now implemented by several popular Linux file systems.)

    TIL! Looks like the generic name is FS_IOC_FSSETXATTR, searching for that gives hits in btrfs, ext4, f2fs (besides xfs itself ofc.).

    Will leave this to a follow-up PR though.

  34. laanwj merged this on Jun 4, 2020
  35. laanwj closed this on Jun 4, 2020

  36. laanwj removed this from the "Bugfixes" column in a project

  37. sidhujag referenced this in commit 3f5d6102a6 on Jun 4, 2020
  38. Fabcien referenced this in commit 5ab1c1ba38 on Mar 1, 2021
  39. PastaPastaPasta referenced this in commit d4a9c7d6d4 on Sep 17, 2021
  40. PastaPastaPasta referenced this in commit 480fc90ef7 on Sep 19, 2021
  41. thelazier referenced this in commit f2f86ccf33 on Sep 25, 2021
  42. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me