Improve runtime performance of --reindex

LarryRuane commented at 0:17 am on September 29, 2019: contributor

Background

During the first part of reindexing, LoadExternalBlockFile() sequentially reads raw blocks from the blocks/blk00nnn.dat files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can’t be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can’t be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, LoadExternalBlockFile() reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.

This PR

During the initial read, it’s rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to LoadExternalBlockFile() initially deserializes only a block’s header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.

Summary: The performance gain is the result of deserializing each block only once, except its header which is deserialized twice, but the header is only 80 bytes.

fanquake added the label Resource usage on Sep 29, 2019

fanquake added the label Validation on Sep 29, 2019

LarryRuane renamed this:
~~improve runtime performance of --reindex~~
[WIP] improve runtime performance of --reindex
on Sep 29, 2019

LarryRuane commented at 1:32 am on September 29, 2019: contributor

Note: this comment is outdated; the scope of this PR has been reduced.

Running bitcoind --reindex causes the client to reread the .bitcoin/blocks/blknnnnn.dat files and re-derive all the index and chainstate information. It’s similar to IBD but the client uses these local files rather than download blocks from the network. As LoadExternalBlockFile() reads each block from disk, it checks to see if its parent has already been seen (and processed). If so, then this block can be immediately processed (by giving it to AcceptBlock()). If this block’s parent hasn’t yet been seen, it cannot be processed until after its parent is seen. Its prev-hash (parent’s hash) is put into a map of “pending” blocks (called mapBlocksUnknownParent). Later, when its parent is encountered and processed, this map is consulted to see it has a child that’s already been seen that can now be processed. If so, LoadExternalBlockFile() now has to go back and re-read the child block from disk (ReadBlockFromDisk()), deserialize it again, and then process (pass it to AcceptBlock()).

Performance isn’t very good because about 90% of blocks are read before their parents are read. (I think this is a result of the headers-first IBD performance enhancement of a few years ago.) This can be seen by running bitcoind --reindex --debug=reindex, causing many messages like:

02019-09-29T00:29:00Z LoadExternalBlockFile: Out of order block 00000000a2268fb2d4ddc0408fae961a96668db58de415d3c9e0a7694eeb1657, parent 000000005fb53da5c3cda78aef28cbc44fde3e435f1dc2ac04df53387049171f not known

then, a short time later:

02019-09-29T00:29:00Z LoadExternalBlockFile: Processing out of order child 00000000a2268fb2d4ddc0408fae961a96668db58de415d3c9e0a7694eeb1657 of 000000005fb53da5c3cda78aef28cbc44fde3e435f1dc2ac04df53387049171f

So 90% of blocks end up being read from disk twice (actually, not twice, since the operating system’s buffer cache likely still has the data, but there is still a memory-to-memory copy), and deserialized twice.

This PR mitigates this problem by taking advantage of the fact that recent blocks are still in the buffer (blkdat) that is used to read from the blk files. Since rewinding to an earlier offset in the buffer has been fixed by the recently merged #16577, we can see if the child block is still in memory (the probability is increased by making the buffer somewhat larger), and if so, avoid having to read the child block data from disk. It turns out that the child block is still in the buffer 60% of the time. This depends on some randomness that’s introduced during IBD, so it may be different for you. (My IBD was done using all the default settings, nothing special.)

The second part of this change initially deserializes only the block header, rather than the entire block. since about 90% of the time we’re going to have to process this block later; only the hashPrevBlock field of the header is needed to determine if we can process this block immediately (if this block’s parent has been seen). This does mean that 10% of the time, we deserialize the header, and then immediately also the full block (where it would have been more efficient to simply deserialize the full block), but overall this is a big performance win.

On my system, this PR reduces the time to --reindex on mainnet by 48 minutes. I encourage reviewers to attempt to replicate my findings. Reindexing has two parts, reading the blocks/blknnnnn.dat files (which this PR improves), and then connecting the blocks to the blockchain (the UpdateTip messages), which this PR doesn’t affect. For me, this PR reduces the first part from 123 to 75 minutes. After you start seeing UpdateTip messages, you can stop the client, and you will have the time measurement you need from debug.log.

in src/validation.cpp:4730 in 48c0cc7245 outdated

4656+                            blkdat >> nSize;
4657+                            CBlock& blockrecursive = *pblockrecursive;
4658+                            blkdat.SetLimit(pos.nPos + nSize);
4659+                            blkdat >> blockrecursive;
4660+                            LogPrint(BCLog::REINDEX, "%s: Processing in-memory out of order child %s of %s\n", __func__, pblockrecursive->GetHash().ToString(),
4661+                                    head.ToString());

LarryRuane commented at 1:35 am on September 29, 2019:

This LogPrint probably should be combined with the (existing) one a few lines below before merging, but I wanted reviewers to be aware of when the new code is being run.

LarryRuane commented at 1:40 am on September 29, 2019:

I don’t know how to write a unit test for this PR (there are no existing unit tests for any of this area of the code), but test/functional/wallet_basic.py does run through the new code – you can add an assert(false) here and that test fails, so that’s at least something.

LarryRuane renamed this:
~~[WIP] improve runtime performance of --reindex~~
Improve runtime performance of --reindex
on Sep 29, 2019

LarryRuane marked this as ready for review on Sep 29, 2019

promag commented at 9:00 pm on September 29, 2019: member

Concept ACK.

Do you think it’s possible to split in 2 commits

first commit only refactors CBufferedFile - no behavior change
second improves LoadExternalBlockFile.

LarryRuane force-pushed on Sep 30, 2019

LarryRuane commented at 4:42 am on September 30, 2019: contributor

Thanks for the suggestion, @promag, done (force-pushed).

LarryRuane force-pushed on Sep 30, 2019

LarryRuane commented at 5:40 am on September 30, 2019: contributor

I forgot to update the CBufferedFile unit tests, just did that and another force-push.

LarryRuane renamed this:
~~Improve runtime performance of --reindex~~
[WIP] Improve runtime performance of --reindex
on Oct 1, 2019

LarryRuane commented at 2:40 pm on October 1, 2019: contributor

I changed this PR to WIP because I thought of a further improvement, which I’ll try to push by the end of the week. It will build on the existing two commits, so reviewing those wouldn’t be a waste of time. Thanks in advance for any reviews and suggestions.

DrahtBot commented at 8:46 pm on October 3, 2019: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

fanquake added the label Waiting for author on Oct 4, 2019

LarryRuane force-pushed on Oct 9, 2019

in src/init.cpp:671 in 7af058771e outdated

667@@ -668,6 +668,7 @@ static void ThreadImport(std::vector<fs::path> vImportFiles)
668 
669     // -reindex
670     if (fReindex) {
671+        CBufferedFile blkdat(160*MAX_BLOCK_SERIALIZED_SIZE, 140*MAX_BLOCK_SERIALIZED_SIZE, SER_DISK, CLIENT_VERSION);

LarryRuane commented at 3:29 pm on October 9, 2019:

This is a much larger buffer (640 mb), but it only exists during the first phase of reindexing (when you see Reindexing block... in the debug log); after that, the bitcoind memory usage increases for other reasons anyway. During the first phase, memory (VSZ in the ps output) is about 2.4 gb on my system; afterwards (when it’s logging UpdateTip: new best...), it’s about 3.5 gb. The benefit of a large buffer is that a larger fraction of the out-of-order blocks (that have to be processed later, when their parents are seen) can be found in memory, rather than having to read them again from disk. The “hit ratio” with this buffer size is just under 99%. The difference between 160 (but overall buffer size in max-blocks)) and 140 (the rewind limit) determines the size of disk reads (the larger, the better, more efficient).

in src/streams.h:708 in 7af058771e outdated

847-            return false;
848-        nLongPos = ftell(src);
849-        nSrcPos = nLongPos;
850-        nReadPos = nLongPos;
851-        return true;
852-    }

LarryRuane commented at 3:30 pm on October 9, 2019:

This method is unused, and really doesn’t make sense. The buffer contents should match file (or, now, files) content, and this messes that up. If it’s needed in the future, it can be re-thought.

in src/validation.cpp:4762 in 7af058771e outdated

4692@@ -4659,7 +4693,8 @@ bool LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
4693         AbortNode(std::string("System error: ") + e.what());
4694     }
4695     if (nLoaded > 0)
4696-        LogPrintf("Loaded %i blocks from external file in %dms\n", nLoaded, GetTimeMillis() - nStart);
4697+        LogPrintf("Loaded %i blocks from external file in %dms pim %i pom %i\n",
4698+            nLoaded, GetTimeMillis() - nStart, nProcessedInMem, nProcessedOutMem);

LarryRuane commented at 3:41 pm on October 9, 2019:

These pim (out-of-order block found in-memory) and pom (out-of-order block not in memory) let reviewers see the effectiveness of the read caching during testing; they can be removed before merging. Here’s a typical instance:

02019-10-09T15:28:17Z Loaded 839 blocks from external file in 11729ms pim 726 pom 95

This means 821 (726+95) blocks were processed out-of-order, so 18 (839-821) blocks were processed in-order (their parents had already been seen). This is typical, and shows that almost all blocks are processed out-of-order. With this PR, 726 of these out-of-order blocks were recent enough that they were still in the buffer, and 95 had already been pushed out and had to be read again from disk. Overall, 99% of the out-of-order blocks are still in the buffer.

mzumsande commented at 10:49 pm on October 17, 2019:

I think the pim/pom statistics is useful, but maybe too special for the main log. You could use a separate BCLog::REINDEX message for that information.

LarryRuane commented at 3:48 pm on October 9, 2019: contributor

The main change is that the CBufferedFile memory buffer object can now span multiple files. It’s very common that a block is read from disk before its parent, and that parent is found in a later blocks/blk00nnn.dat file. In that case, it’s usually possible to find the earlier (child) block in the memory buffer, obviating the need to read it again from disk.

LarryRuane renamed this:
~~[WIP] Improve runtime performance of --reindex~~
Improve runtime performance of --reindex
on Oct 9, 2019

fanquake removed the label Waiting for author on Oct 9, 2019

LarryRuane force-pushed on Oct 9, 2019

LarryRuane commented at 4:32 pm on October 11, 2019: contributor

Testnet performance isn’t important, but as a sanity check, I verified that the new code runs faster there too on my laptop. The time between Reindexing block file blk00001.dat... and Reindexing finished reduced from 10m21s to 7m13s.

mzumsande commented at 9:41 pm on October 12, 2019: contributor

This looks very interesting, will review the code more closely soon. I am concept ACK for the first iteration, but would like to understand better if the additional benefits of the second iteration (multiple-files) outweigh the costs (memory):

Do you have data from mainnet for the multiple-file buffer implementation? I’m interested if there are diminishing returns or another huge speedup compared to the first revision of this PR.
While other objects that use a lot of memory can be reduced (see reduce-memory.md) in order to run a full node in low-memory environments, the large size of CBufferedFile is hardcoded, which does not seem ideal. The minimum requirements mentioned in (https://bitcoin.org/en/bitcoin-core/features/requirements) are also considerably lower than the 2.4 gb that are used.

LarryRuane commented at 6:32 am on October 13, 2019: contributor

@mzumsande, those are very good thoughts, thank you. I will study this further (and invite others to also). One nice thing about this design is that any amount of memory will work, in that it will be an improvement – it is one of those classic space-time tradeoffs. The idea of making the buffer not have a hard-coded size is intriguing and promising. I think we would not go back to the pre-second iteration (multiple-files), because even reducing the buffer memory to the original amount would be improved by the multi-file version. (That is to say, there is no downside to the latest version here.) That’s because all the multi-file commit means is that we do not discard cached data when switching files.

One thing I hope I made clear enough in an earlier comment is that I tried to keep the memory usage during reindexing be strictly less than what is needed during normal operation, so that there would be no overall increase in memory demand. We certainly would not want this design to create a spike (high water mark) of memory usage. But I was very likely mistaken, since I was not aware that memory could be “dialed down” for small environments.

LarryRuane force-pushed on Oct 15, 2019

LarryRuane commented at 3:29 am on October 15, 2019: contributor

I force-pushed a reorganization of the commits to make reviewing easier. I split the largest commit, speed up reindex by reading out-of-order blocks from memory, into two separate commits:

Commit message: initially deserialize only CBlockHeader, not CBlock. Deserialize only a block header (only 32 bytes) initially, instead of a full block. The block header is all that’s needed to tell us if this block’s parent has been seen yet; if not, the block will have to be deserialized later, after its parent is found. Nearly all blocks (89% on my system) are seen before their parents, so this is a big improvement.
Commit message: speed up reindex by reading out-of-order blocks from memory. If we’ve just processed a block that’s the parent of an earlier-seen block, we must now process that earlier block. Instead of always reading that block from disk, see if it happens to still be in the circular disk read memory buffer (‘CBufferedFile blkdat`). This saves a disk read.

We could merge the first commit without the second. The first commit is pretty simple; the second commit is a little more involved.

These two commits have a buffer size unchanged from master, 8mb (2*MAX_BLOCK_SERIALIZED_SIZE), so there is no change in memory usage. But with this buffer size, only 29% of the out-of-order blocks are found in the buffer. If we increase the buffer by not too much (as I did in an earlier commit that’s now been erased by force-push, 640mb), to 32mb, the “hit rate” increases to 50%. This is probably a good space-time balance.

Here are the block-load times on my system (with all default settings), which is just the first part of reindexing (each row builds on the previous row):

version	real time (minutes)
master	130
deserialize only header	90
look for blocks in buffer, 8mb	84
look for blocks in buffer, 32mb	78
look for blocks in buffer, 640mb	72

(Take with a grain of salt, because the times varied even on identical runs, and a lot probably depends on the specifics of my system.) Keep in mind that this buffer exists only during the block-load part of reindex.

in src/streams.h:766 in b2919cad00 outdated

762@@ -762,12 +763,12 @@ class CBufferedFile
763     }
764 
765 public:
766-    CBufferedFile(FILE *fileIn, uint64_t nBufSize, uint64_t nRewindIn, int nTypeIn, int nVersionIn) :
767-        nType(nTypeIn), nVersion(nVersionIn), nSrcPos(0), nReadPos(0), nReadLimit(std::numeric_limits<uint64_t>::max()), nRewind(nRewindIn), vchBuf(nBufSize, 0)
768+    CBufferedFile(uint64_t nBufSize, uint64_t nRewindIn, int nTypeIn, int nVersionIn) :

mzumsande commented at 10:30 pm on October 17, 2019:

Is it necessary to remove the FILE from the constructor? Couldn’t you just initialize it with nullptr in the multi-file case and call SetFile() later, so that there need to be no syntax changes for the single-file case? It is also easy to create a CBufferedFile object and forget about SetFile(), leading to a memory access violation.

LarryRuane commented at 8:00 pm on October 20, 2019:

@mzumsande, good idea, I restored that argument to the constructor.

in src/init.cpp:694 in c91b547afb outdated

690@@ -689,11 +691,13 @@ static void ThreadImport(std::vector<fs::path> vImportFiles)
691     // hardcoded $DATADIR/bootstrap.dat
692     fs::path pathBootstrap = GetDataDir() / "bootstrap.dat";
693     if (fs::exists(pathBootstrap)) {
694+        CBufferedFile blkdat(2*MAX_BLOCK_SERIALIZED_SIZE, MAX_BLOCK_SERIALIZED_SIZE+8, SER_DISK, CLIENT_VERSION);

mzumsande commented at 10:39 pm on October 17, 2019:

nit: This could be moved into the if block (same for -loadblock below)

mzumsande commented at 11:22 pm on October 17, 2019: contributor

Concept ACK. Thanks for the statistics, I agree that 32MB is a reasonable size.

maflcko commented at 12:58 pm on October 18, 2019: member

Under what circumstances is a -reindex needed these days? Only when disabling -prune? Though, then you’ll download from the network anyway.

leishman commented at 1:11 am on October 21, 2019: contributor

@MarcoFalke Building new indices is required if upgrading from pre-0.17 to post-0.17 (https://bitcoin.org/en/release/v0.17.0) with a txindex.

If your node has a txindex, the txindex db will be migrated the first time you run 0.17.0 or newer, which may take up to a few hours. Your node will not be functional until this migration completes.

I haven’t looked into whether this PR speeds up that process, but it is something to consider.

in src/validation.cpp:4615 in 869aed06b9 outdated

4599-    static std::multimap<uint256, FlatFilePos> mapBlocksUnknownParent;
4600+    struct unknownParentPos {
4601+        FlatFilePos filePos;
4602+        uint64_t pos;
4603+    };
4604+    static std::multimap<uint256, unknownParentPos> mapBlocksUnknownParent;

LarryRuane commented at 3:43 pm on October 22, 2019:

Does it bother anyone else that this is static? This is global state, which, in general, equals evil. It must be static, given that it’s declared here, since its state must persist across calls to this function (very often, a block and its parent are in different blk files). We could eliminate this global state by allocating this map in the caller of this function, src/init.cpp: ThreadImport(), and passing it by reference to LoadExternalBlockFile().

Another thing that seems missing from the current design is, what happens if there are leftover entries in this map after reindexing has completed? That means we’ve read blocks from disk but never found their parent blocks. In my testing, this map is empty at the end of reindexing, as expected. But what if it wasn’t? That would mean we’re missing one or more blocks/blk00nnn.dat files, or some blocks from those files. Isn’t that worth at least a LogPrintf()? Making this change would also require moving the map out to the caller, because this function doesn’t know if it will be called again.

LarryRuane commented at 7:05 pm on October 20, 2022:

UPDATE: this was fixed in #25571

laanwj commented at 11:40 am on October 30, 2019: member

@MarcoFalke Building new indices is required if upgrading from pre-0.17 to post-0.17 (https://bitcoin.org/en/release/v0.17.0) with a txindex.

Isn’t -reindex-chainstate enough in that case? I don’t think that requres a rescan of the block files.

DrahtBot added the label Needs rebase on Oct 30, 2019

LarryRuane force-pushed on Oct 30, 2019

LarryRuane commented at 9:34 pm on October 30, 2019: contributor

Rebased. I think running --reindex is also good when for whatever reason you don’t trust your index files; for example, think files may be missing or may have been corrupted. It’s true that in that case you could just re-IBD, but this is faster and less disruptive to peers.

I wasn’t sure how much variation there might be in the runtimes I posted here earlier, so to get a few more data points, I added assert(0) immediately after printing Reindexing finished, and ran bitcoind --reindex several times in a shell loop, timing each run (using the standard time bash builtin). Here are the real times without this PR (which I reported as 130 minutes in the earlier comment):

0real	131m52.120s
1real	130m55.904s
2real	133m20.911s
3real	134m37.052s
4real	146m17.564s
5real	135m17.301s

and here are the times for runs with this PR (I ran this a few more times), this is what I reported as 84 minutes above:

0real	79m55.050s
1real	92m30.341s
2real	82m44.119s
3real	80m50.067s
4real	89m31.793s
5real	87m36.802s
6real	97m42.247s
7real	86m31.999s
8real	87m13.304s
9real	82m21.497s

DrahtBot removed the label Needs rebase on Oct 30, 2019

DrahtBot added the label Needs rebase on Nov 5, 2019

LarryRuane force-pushed on Nov 7, 2019

LarryRuane commented at 7:34 pm on November 7, 2019: contributor

Rebased to fix merge conflicts, force-pushed.

DrahtBot removed the label Needs rebase on Nov 7, 2019

LarryRuane force-pushed on Nov 21, 2019

LarryRuane commented at 5:41 pm on November 21, 2019: contributor

Rebased, force-pushed

LarryRuane force-pushed on Feb 21, 2020

LarryRuane commented at 7:07 pm on February 21, 2020: contributor

Reviewers, please take a fresh look at the latest force-push; this PR is much better than before. I’ll explain from scratch, so it’s not necessary to read any of the previous comments in this PR.

The previous version of this PR included two separate changes (both affecting -reindex):

initially deserializing only the header instead of the entire block
when we find a block whose child has already been read, attempt to “rewind” in the stream memory buffer to access and process the child (instead of always reading it again from disk)

These should always have been separate PRs. This PR now contains only the first change, which is the more effective and localized. I’ll make a separate PR for the second change if it turns out to be a significant improvement. This PR no longer has any effect on memory usage, which reviewers had concerns about.

To go over some basics again, for background: During -reindex, blocks are read from disk (the blocks/blk00???.dat files), deserialized, and processed. But a block can’t be processed if its parent hasn’t yet been seen (we know this only after deserializing the block). In that case, its disk location is saved in a map and we go on to the next block.

Whenever we do successfully process a block (we’ve got its parent), we check to see if its child has already been seen (is present in the map, looked up by block hash), and, if so, we read the child block from disk (again), deserialize it, and process it (and remove it from the map).

It turns out that, due to the “headers first” initial block download (IBD) algorithm, over 92% of blocks (on my system) are out of order, in the sense that 92% of the time when we first read a block, we haven’t yet seen its parent. You can see these by specifying the -debug=reindex option (see the second comment in this PR).

What this PR does – and now all it does – is to initially deserialize only the header (80 bytes), not the entire block. From the header, we can tell if this block’s parent has already been seen (by its prev-hash). If not (92% of the time), we skip over the rest of the block. So the savings are to not deserialize the entire block in the most common case where that’s unnecessary.

It’s true that if all blocks were in order, this would be slightly slower because we’d deserialize the header, discover that the block’s parent is already known, then deserialize the full block. (There’s no easy way to deserialize the rest of the block.)

To test this change,you may want to apply this patch:

0--- a/src/init.cpp
1+++ b/src/init.cpp
2@@ -692,6 +692,7 @@ static void ThreadImport(std::vector<fs::path> vImportFiles)
3         pblocktree->WriteReindexing(false);
4         fReindex = false;
5         LogPrintf("Reindexing finished\n");
6+        StartShutdown();
7         // To avoid ending up in a situation without genesis block, re-try initializing (no-op if reindexing worked):
8         LoadGenesisBlock(chainparams);
9     }

Then you can time the reindexing in isolation (without having to look at timestamps in the log file):

0$ time src/bitcoind --reindex --maxconnections=0

Setting maxconnections to zero makes the timing less variable, since you can get very different peers each time.

When I run this, I consistently get 83 minutes with this PR’s branch, and 125 minutes without, an improvement of 42 minutes.

LarryRuane commented at 7:27 pm on March 3, 2020: contributor

Just to clarify, the last bullet point in the review club writeup:

Before this PR, we would always read entire blocks from disk into our buffer. If the predecessor of a block was not available, the blocks would have to be read again at a later point. This PR changes behavior such that initially only the 80 byte block header is read into the buffer, and if we can’t accept the block yet, the rest of the block is skipped for now

is slightly incorrect. Even with this PR, we always read entire blocks from disk (most of them twice); the disk read behavior before and after this PR is exactly the same. The difference is that with this PR, only the header is (initially) deserialized, rather than the entire block, which saves CPU.

The CBufferedFile object does somewhat large reads, around 4 MB (MAX_BLOCK_SERIALIZED_SIZE), each of which often includes multiple entire blocks and a partial block (because we don’t know where the boundaries are; we don’t know how much to read to get exactly a block or blocks). Recent blocks are close to full, so each disk read probably reads in only one or two blocks (plus a partial block).

But now I’m wondering if this PR should work as described in the review club comment. We could read only the 80-byte header from disk, and then, if we’ve seen its parent, read the rest of the block. Two disk reads are less efficient than one large read (a disk read has a fixed overhead and a length-proportional overhead), but that happens only about 8% of the time. If we haven’t seen this block’s parent (probability 92%), we could seek ahead in the file, instead of reading it into memory. But as explained earlier, there could be more than one block per large read, so we may end up exchanging one large read for 2 or 3 80-byte reads – hard to say which is better. I’d recommend leaving this PR as is, because it’s an (almost) unambiguous improvement on the status quo, and changing it to do 80-byte reads and seeks is more complex than this PR is currently. (The reason I say almost is because, if the blocks are already sorted, this ends up deserializing the header one extra time.)

I’m just now thinking, we could make reindex (part 1) screaming fast by storing the headers in separate “parallel” files in the blocks/ directory: blk01234.dat would be the same as now, but there would be a hdr01234.dat (similar to the way there’s a rev001234.dat today) that would be that file’s blocks’ headers in the same order. Actually, that file would be a sequence of (header, file-offset, size) tuples. The file-offset is the file offset of that block’s block in the same-named blk file. That’s only 100 MB of additional disk space (about 100 bytes times about 1M blocks, this is far less than 1% of the size of the blocks/ directory).

Then -reindex could do 4 MB reads, same as today except reading the hdr001234.dat files, bringing in 50k headers in a single read – super-efficient. Then loop through each of those, and if the block can be accepted now (its parent has been seen, 8% case), then seek and read from the blk01234.dat file (we would know the exact seek offset and the size); that would be slower than today. But the 92% case would be much faster because we wouldn’t be reading entire blocks into memory unnecessarily. This would be a significant reduction in the number of disk reads. (Definitely this would be a separate PR.) Opinions?

in src/test/streams_tests.cpp:350 in a50f956e30 outdated

338@@ -335,6 +339,57 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file)
339     fs::remove("streams_test_tmp");
340 }
341 
342+BOOST_AUTO_TEST_CASE(streams_buffered_file_skip)
343+{
344+    FILE* file = fsbridge::fopen("streams_test_tmp", "w+b");
345+    // The value at each offset is the offset.

kallewoof commented at 7:36 am on March 4, 2020:

I only understood this comment after I got to L#359.

0    // The value at each offset is the byte offset (e.g. byte 1 in the file has the value 0x01, etc).

in src/test/streams_tests.cpp:372 in a50f956e30 outdated

380+        bf.Skip(2);
381+        BOOST_CHECK(false);
382+    } catch (const std::exception& e) {
383+        BOOST_CHECK(strstr(e.what(),
384+                        "Read attempted past buffer limit") != nullptr);
385+    }

kallewoof commented at 7:42 am on March 4, 2020:

You could do

0BOOST_CHECK_THROW(bf.Skip(2), std::exception);

LarryRuane commented at 4:16 pm on March 5, 2020:

This kind of check occurs four places in this file (and was being done before this PR, for example line 256). I like the conciseness of your suggestion, but it loses the check of the exception string. (The test wouldn’t detect if a completely different exception occurred.) Another advantage of checking the exception string is that it documents why the exception is thrown (could be in a comment, but code is better).

It would be nice if there was a variant of BOOST_CHECK_THROW that took a string argument. What do you think about implementing this in this file, or in a more general place:

0#define BOOST_CHECK_THROW_WHAT(statement, exception, str) \
1    do { \
2        try { \
3            statement; \
4            BOOST_CHECK(false); \
5        } catch (const exception& e) { \
6            BOOST_CHECK(strstr(e.what(), str) != nullptr); \
7        } \
8    } while (false)

and then replacing the try/catch (here and in 3 other places) with:

0    BOOST_CHECK_THROW_WHAT(bf.Skip(2),
1        std::exception, "Read attempted past buffer limit");

(I tested this.) It’s weird to name this macro BOOST_* because it’s not part of the Boost library, but what it does is so close to BOOST_CHECK_THROW() that it may make sense. If the casual reader thinks this is part of Boost, is that so bad?

maflcko commented at 3:28 pm on November 16, 2022:

There’d be BOOST_CHECK_EXCEPTION?

kallewoof commented at 7:54 am on March 4, 2020: member

Concept and code review ACK a50f956e304adc7c75428ec391f1a76a99c1ddde

kallewoof commented at 9:26 am on March 4, 2020: member

Btw https://bitcoincore.reviews/16981.html notes that this fixes a bug as well. I would really like if this bug-fix was its own separate commit (followed by the commit doing the actual runtime performance improvement).

mzumsande commented at 10:16 am on March 4, 2020: contributor

This seems to be a misunderstanding. I don’t think that this PR fixes a bug, the sentence in question referred to the older PR #16577 which is already merged.

kallewoof commented at 10:26 am on March 4, 2020: member

OH. Yeah, that’s what it says. I wasn’t reading carefully enough.

emilengler commented at 5:40 pm on March 4, 2020: contributor

Concept ACK Thanks! I like this more low-level oriented way. I will test this now a bit after my chain fully synched. The code looks good to me so far but I will wait until todays PR review club meeting before I give more feedback on this yet.

in src/validation.cpp:4681 in a50f956e30 outdated

4685-                CBlock& block = *pblock;
4686-                blkdat >> block;
4687-                nRewind = blkdat.GetPos();
4688-
4689-                uint256 hash = block.GetHash();
4690+                blkdat.SetLimit(nBlockPos + 80);

emilengler commented at 5:44 pm on March 4, 2020:

nit: Could this 80 become a const? The block header might get bigger or smaller in future and this 80 could get missed.

sipa commented at 5:52 pm on March 4, 2020:

I think this call is unnecessary. It’s there in the old code to make sure there is consistency between the block length in the file, and the actually serialized data. Headers are always just 80 bytes, though, so this check isn’t necessary here.

LarryRuane commented at 4:44 pm on March 5, 2020:

I think this call is unnecessary.

Thanks, I’ll remove it. (@emilengler’s comment is no longer relevant.) It’s similar to the way we don’t call SetLimit() before deserializing into the 4-byte nSize just above.

in src/validation.cpp:4199 in a50f956e30 outdated

4703                         if (dbp)
4704-                            mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
4705+                            mapBlocksUnknownParent.insert(std::make_pair(header.hashPrevBlock, *dbp));
4706+
4707+                        // Position to the start of the next block.
4708+                        nRewind = nBlockPos + nSize;

sipa commented at 5:55 pm on March 4, 2020:

I think this is incorrect. If the size field of the block was corrupted this would cause us to potentially jump into nowhere. Do things still work (and still give a performance benefit…) if you change this to nBlockPos + 1?

ajtowns commented at 5:05 am on March 5, 2020:

I think if nSize as read from disk is wrong, then

(a) if it’s smaller than the actual size of the block, blkdat >> block will currently throw which will spit out a Deserialize or I/O error and start trying again at the end of the nSize bytes – so this behaviour should be near enough to the same, and this is fine

and (b) if it’s larger than the actual size of the block, the nSize will be effectively ignored, and we’ll pick up at a different, possibly incorrect point, and try to find the next chainparams.MessageStart() marker, so we could miss up to, I guess, 8MB worth of block data (4MB due to a bad nSize which is just skipped over, then just under 4MB if the next block was as large as possible and we just missed its start which is read byte-by-byte) due to 4 bytes of corrupted size data, rather than just coping with it. This is worse, but doesn’t seem terrible – if the four bytes that were corrupted were the last bytes of one block and the first bytes of another you’d have lost up to 8MB of data that way too, without any more precise targeting.

LarryRuane commented at 8:22 pm on March 5, 2020:

These comments create a good opportunity for me to ask some questions about the existing design while doing this PR, hope that’s okay. Please don’t feel an obligation to answer these.

What’s the purpose of the MessageStart() marker? It seems to allow the skipping over some arbitrary junk that follows a block before the start of the next [marker-nSize-block] sequence of bytes. What could cause that? In other projects, I’ve seen these markers used when a file gets overwritten (such as a journal file that implements a circular log), because after restart, the previous write offset is unknown. Are these blk files overwritten?
Are we trying to be robust against arbitrary file-overwrite corruption? If the file contains 100 blocks, and data corruption overwrote part of the second block, then this design (searching for the marker) will find the start of the third block and pick up from there and correctly parse the remaining 97 blocks. This sounds good, but the second block’s child (or children) will never be “accepted” (AcceptBlock() called on it), recursively (and not just for this file but for all remaining files), so we’ll end up with a very truncated blockchain. What happens then? I think we may download blocks from peers (IBD) starting from where the corruption occurred. That great if true (why don’t I actually try this?), although a minor problem is that the many entries in mapBlocksUnknownParent are never released since that map is static – which is kind of weird. But since these (following) blocks are never accepted, the rest of this part 1 of reindex is wasted effort. It may be better to just abort the rest of this part 1 of reindexing if we encounter any file corruption.

If the size field of the block was corrupted

It’s true that this PR increases dependence on this field, but If we’re assuming the size can be corrupted, I’m not sure what benefit it can be providing, either with or without this PR. Yes, setting nRewind to nBlockPos+1 would work (I haven’t tried it but I could), but it would be significantly slower because we’d have to scan for the next marker (a megabyte or 2 away) one byte at a time.

It seems to me that if the size field is corrupted, all bets are off, the train’s going off the tracks in a hurry.

ajtowns commented at 3:05 am on March 6, 2020:

The main point of -reindex now is recovery after on-disk corruption, so I think it makes sense to be as robust to that as we can reasonably be?

As well as just random disk corruption, I think you can get problems if (a) you’re downloading blocks and your power goes out before the data gets synced; I’d expect you might end up with something like “…[marker][nSize=1MB][250kB of block data][64kB of zeroes]” in that case; at which point when you restart you might start writing the next block straight after the zeroes. When running reindex currently, I think your parsing would fail during the 64kB of zeroes, and you’d then scan to the next marker and continue. If so, I guess that means that with this patch, if you ran -reindex you’d not see that new block and have to redownload it. That’s all speculation though, so don’t take my word for it.

Hmm, as an alternative approach, what if you just still read/parse the entire block off the disk, but if it’s not one you can immediately accept, you store it into a map for later acceptance? Limit the map to 32MB worth of data, or maybe somehow tie it in to the mempool limit? Keeps the same logic, but reduces the amount of double-reading and double-parsing needed.

LarryRuane commented at 0:24 am on March 7, 2020:

I did an experiment yesterday in which I simulated corruption in one of the blk??.dat files, and as I suspected, all the blocks beyond the coorrupted block were never accepted (because their parents were missing). But the node then seemed to get all the missed blocks from peers, and at the end all seemed good.

Anthony, your suggestion is similar to what this PR originally did before I reduced its scope. You can see the discussion above, but to summarize here, the idea was to increase the size of the circular memory buffer (CBufferedFile object) so it could act as a cache, so that we could sometimes access blocks read earlier without having to re-read from disk; the concern was about increasing memory requirements. I think something along the lines I proposed or what you’re proposing is likely to help further (especially tying it to mempool limit; that’s a good idea), but I suggest keeping this PR limited (no change in memory, a definite improvement) if you don’t mind.

LarryRuane commented at 0:27 am on March 7, 2020:

On the topic of the size being corrupted, if the size field is set to a uniformly distributed random value (or zero, probably the most likely), that is caught (see the check nSize < 80 || nSize > MAX_BLOCK_SERIALIZED_SIZE above); the probability that the corrupted value will be in that range is less than 0.1%. So most data corruption there will be detected.

in src/validation.cpp:4200 in a50f956e30 outdated

4705+                            mapBlocksUnknownParent.insert(std::make_pair(header.hashPrevBlock, *dbp));
4706+
4707+                        // Position to the start of the next block.
4708+                        nRewind = nBlockPos + nSize;
4709+                        blkdat.SetLimit(nRewind);
4710+                        blkdat.Skip(nRewind - blkdat.GetPos());

narula commented at 6:30 pm on March 4, 2020:

Skip() calls read(), which means (IIUC) that this change makes it so we are now doing a disk read while holding cs_main. Given this is reindexing this might not matter but worth noting. Is this a problem?

fjahr commented at 8:57 pm on March 4, 2020:

Also, I think this part could be more concise. I mean adding position + size just do subtract position below, seems a bit redundant.

LarryRuane commented at 9:23 pm on March 5, 2020:

Skip() calls read()

Good catch, and it’s actually even worse, because if we process the block immediately (don’t take the Skip() path), then we deserialize the entire block (blkdat >> block, just below), and that calls read() which can read the disk. But that’s the less common path (only about 8%).

As someone said at review club, the node isn’t fully up and running during reindexing, so there probably isn’t much contention for this lock. But I think we can avoid this problem; I’ll push a fix if it doesn’t complicate the code too much.

this part could be more concise. I mean adding position + size just do subtract position below, seems a bit redundant.

I don’t think it’s redundant; note that nBlockPos is the start of the current block (whose header we just read), while blkdat.GetPos() returns the current position, which is after the header. We don’t want to skip nSize bytes (which is the length of the block), we want to skip the remainder of the current block, so the distance to move forward is the start of the next block minus the current position. This could have been written blkdat.Skip(nSize - 80) but I think that’s not as clear.

I just noticed that the SetLimit() here can be removed; it’s not needed when doing a Skip() (it’s only needed when deserializing something of variable length, such as a block). I’ll remove it.

fjahr commented at 9:02 pm on March 4, 2020: contributor

Concept ACK, also reviewed the code and mostly satisfied. I think there is good potential for performance improvement here. I will wait with testing until the current comments are addressed.

LarryRuane commented at 9:29 pm on March 5, 2020: contributor

Thanks so much to all the reviewers – these are great comments and suggestions! I’ll force-push fixes for most of the suggestions (where it’s clear what to do) probably tomorrow.

Another thing I said during review club that I’d look into is seeing if we can test the changes to the reindex code in validation.cpp in a functional test, so I’ll do that too.

LarryRuane force-pushed on Mar 6, 2020

in src/test/streams_tests.cpp:197 in f714ad1fe7 outdated

209+            BOOST_CHECK(false); \
210+        } catch (const exception& e) { \
211+            BOOST_CHECK(strstr(e.what(), str) != nullptr); \
212+        } \
213+    } while (false)
214+

LarryRuane commented at 0:01 am on March 7, 2020:

No functional change here, just refactoring a check that’s repeated 4 times. (It’s a macro rather than a function so that the line numbers report correctly.) This macro is misnamed (since it’s not part of Boost), but it’s so close to BOOST_CHECK_THROW() that this is probably the least confusing name. (But I can change it.)

LarryRuane commented at 1:05 pm on March 15, 2021:

Update: I removed this new test macro, BOOST_CHECK_THROW_STR(), because it’s out of scope for this PR.

LarryRuane force-pushed on Mar 7, 2020

in src/validation.cpp:4191 in 0e27461482 outdated

4691+                CBlockHeader header;
4692+                blkdat >> header;
4693+
4694+                // Read the rest of the block (from the end of its header to the end
4695+                // of the block) to ensure it's in memory while holding cs_main (below).
4696+                blkdat.Skip(nBlockPos + nSize - blkdat.GetPos());

LarryRuane commented at 0:30 am on March 7, 2020:

@narula, this addresses your concern about holding cs_main while reading disk.

vasild commented at 2:36 pm on April 5, 2020:

This Skip() can be skipped. What the PR does now is:

Read from disk and deserialize the block header.
Read from disk the remaining of the block with this Skip() call (just buffer it into blkdat).
If it is out-of-order block, continue with the next block. Notice that even if the data we read from disk in 2. remains buffered in blkdat, we will never make use of it because we read out-of-order blocks using ReadBlockFromDisk() which will not use our blkdat cache and will read the block from disk again (2x read).
If it is in-order block, deserialize the cached data from 2.

We can do better (find a patch at the bottom of this comment):

Read from disk and deserialize the block header.
If it is out-of-order block, continue with the next block.
If it is in-order block, read from disk and deserialize it.

It follows that we don’t need the newly introduced method Skip() since we only call it here.

 0diff --git i/src/validation.cpp w/src/validation.cpp
 1index 34a6e028b..705595625 100644
 2--- i/src/validation.cpp
 3+++ w/src/validation.cpp
 4@@ -4657,39 +4657,42 @@ bool LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
 5                 uint64_t nBlockPos = blkdat.GetPos();
 6                 if (dbp)
 7                     dbp->nPos = nBlockPos;
 8                 CBlockHeader header;
 9                 blkdat >> header;
10 
11-                // Read the rest of the block (from the end of its header to the end
12-                // of the block) to ensure it's in memory while holding cs_main (below).
13-                blkdat.Skip(nBlockPos + nSize - blkdat.GetPos());
14-
15                 const uint256 hash = header.GetHash();
16+                bool outOfOrder;
17+
18                 {
19                     LOCK(cs_main);
20-                    // detect out of order blocks, and store them for later
21-                    if (hash != chainparams.GetConsensus().hashGenesisBlock && !LookupBlockIndex(header.hashPrevBlock)) {
22-                        LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
23-                                header.hashPrevBlock.ToString());
24-                        if (dbp)
25-                            mapBlocksUnknownParent.insert(std::make_pair(header.hashPrevBlock, *dbp));
26-
27-                        // Skip the rest of this block; position to the marker before the next block.
28-                        nRewind = nBlockPos + nSize;
29-                        continue;
30-                    }
31+                    outOfOrder =
32+                        hash != chainparams.GetConsensus().hashGenesisBlock
33+                        && !LookupBlockIndex(header.hashPrevBlock);
34+                }
35 
36-                    // This block can be processed immediately; rewind to its start then read it.
37-                    blkdat.SetPos(nBlockPos);
38-                    blkdat.SetLimit(nBlockPos + nSize);
39-                    std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();
40-                    CBlock& block = *pblock;
41-                    blkdat >> block;
42-                    nRewind = blkdat.GetPos();
43+                // Store positions of out of order blocks for later.
44+                if (outOfOrder) {
45+                    LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
46+                            header.hashPrevBlock.ToString());
47+                    if (dbp)
48+                        mapBlocksUnknownParent.insert(std::make_pair(header.hashPrevBlock, *dbp));
49+                    // Skip the rest of this block; position to the marker before the next block.
50+                    nRewind = nBlockPos + nSize;
51+                    continue;
52+                }
53+
54+                // This block can be processed immediately; rewind to its start then read it.
55+                blkdat.SetPos(nBlockPos);
56+                std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();
57+                CBlock& block = *pblock;
58+                blkdat >> block;
59+                nRewind = blkdat.GetPos();
60 
61+                {
62+                    LOCK(cs_main);
63                     // process in case the block isn't known yet
64                     CBlockIndex* pindex = LookupBlockIndex(hash);
65                     if (!pindex || (pindex->nStatus & BLOCK_HAVE_DATA) == 0) {
66                       BlockValidationState state;
67                       if (::ChainstateActive().AcceptBlock(pblock, state, chainparams, nullptr, true, dbp, nullptr)) {
68                           nLoaded++;

LarryRuane commented at 1:13 am on April 7, 2020:

Thanks, Vasil. Steps 1 and 2 (in your first list) are not separate disk reads; each disk read generally includes multiple blocks. The CBufferedFile object doesn’t know (and has no way to know) where the block boundaries are, so it does as large a read as the buffer allows, which is about 4 MB (MAX_BLOCK_SERIALIZED_SIZE = 4000000). The buffer boundaries and block boundaries aren’t synchronized in any way, so each read usually includes a partial block at the end. (Blocks never span blk00nnn.dat files.)

I did consider changing this function to do exactly what you suggest – read only the block header of each block, and read the rest of the block only if we can accept it immediately. We’d read fewer bytes from disk – but the tradeoff is that we’d do many more read operations. It’s unclear which is better. I suspect it would depend on the underlying kind of disk. I have an SSD, so small reads at different file offsets would probably be efficient, but if it’s a spinning disk, doing fewer large reads is probably more efficient since there’s less rotational delay.

Also, making that change would be more complex.

Did you time your patch? When your patch sets nRewind to the beginning of the next block and takes the continue path, that rewind value is used as the argument to blkdat.SetPos() at line 4656 in this PR. I added some debug prints, and as I suspected, that SetPos() call isn’t always doing what we expect. The caller doesn’t check its return value (there’s much to be desired about this code); if it can’t move ahead to the requested position, it moves ahead as far as it can and returns false.

When running your patch, SetPos() usually can move to the requested position, but about 30 times per file, it can’t because it reaches the end of the buffer (SetPos isn’t designed to read from disk). We end up usually positioned to the middle of a block, and then FindByte() searches for (and finds) the start of the next block. It works, but obviously it’s not very efficient since each byte is inspected individually.

I think the best way to avoid reading most blocks from disk twice is an idea I described in the last two paragraphs of this comment. But that’s a much larger change and I don’t know if it’s worth it. But even if we implement that idea later, I still think this PR, as it is, has a high enough performance improvement to complexity ratio to be merged.

vasild commented at 12:41 pm on April 7, 2020:

Uh, oh. You are right that SetPos() must be within what has already been read into the buffer (CBufferedFile::vchBuf). I tried timing the suggested patch and it was either the same or ~1 min slower than the PR. This is within noise, but maybe is because of what you describe above (FindByte() checking every byte).

CBufferedFile is indeed not designed to skip through data (without reading it from disk) and changing it to be able to do so would require some substantial changes and increase in its complexity.

Scrap this suggestion.

in test/functional/feature_reindex.py:75 in 0e27461482 outdated

62+        extra_args = [["-reindex", "-debug=reindex"]]
63+        self.start_nodes(extra_args)
64+        wait_until(lambda: self.nodes[0].getblockcount() > 0)
65+
66+        # All blocks should be accepted and processed.
67+        assert_equal(self.nodes[0].getblockcount(), 12)

LarryRuane commented at 0:37 am on March 7, 2020:

I did verify that if I break this PR’s code, this test doesn’t pass. I know hardcoded constants are generally bad, but I think they’re okay here, because the size of an empty regtest block is very unlikely to change, and if it does, this test will fail, and it should be pretty easy for someone to figure out why and fix the test. (That is, the test will not falsely succeed.)

LarryRuane commented at 0:44 am on March 7, 2020: contributor

I think I’ve addressed all the review comments, thanks again!

kallewoof commented at 2:13 am on March 7, 2020: member

Edit: I forgot to mention; this node has txindex enabled.

Benchmarks running on a50f956e304adc7c75428ec391f1a76a99c1ddde:

025.11s user 163.37s system 7% cpu 39:53.51 total
125.59s user 162.06s system 7% cpu 39:43.73 total
225.35s user 161.31s system 7% cpu 39:16.82 total
324.98s user 162.59s system 7% cpu 39:06.45 total
425.36s user 162.80s system 8% cpu 39:01.81 total

With SCHED_BATCH off:

024.64s user 155.84s system 8% cpu 35:31.20 total
125.22s user 156.17s system 8% cpu 36:08.64 total
225.31s user 156.53s system 8% cpu 36:44.35 total
325.08s user 156.75s system 8% cpu 37:30.11 total
424.97s user 156.26s system 8% cpu 37:15.24 total

Master (too slow; only ran one time, on 3d28c886f077ce22fb7755fe9ec1f4e08d3d4a62):

04261.85s user 151.57s system 66% cpu 1:49:57.76 total

in src/validation.cpp:4426 in 0e27461482 outdated

4678-                // read block
4679+                // read block header
4680                 uint64_t nBlockPos = blkdat.GetPos();
4681                 if (dbp)
4682                     dbp->nPos = nBlockPos;
4683-                blkdat.SetLimit(nBlockPos + nSize);

vasild commented at 12:27 pm on April 5, 2020:

This call to SetLimit() has been postponed down to line 4706 where we read the block if it is in-order. There is no need to postpone it - reading the header should also not attempt a read beyond nBlockPos + nSize which is expected to be the end of the block.

So, I would suggest restore it here and remove it from below (line 4706).

in src/validation.cpp:4439 in 0e27461482 outdated

4696+                blkdat.Skip(nBlockPos + nSize - blkdat.GetPos());
4697+
4698+                const uint256 hash = header.GetHash();
4699                 {
4700                     LOCK(cs_main);
4701                     // detect out of order blocks, and store them for later

vasild commented at 12:32 pm on April 5, 2020:

We don’t store the blocks for later. Suggestion:

0                    // Store positions of out of order blocks for later.

vasild commented at 2:43 pm on April 5, 2020: contributor

Approach ACK

I tested this, it is CPU-bound on my laptop. Time to complete the reindex (on mainnet database, up to height 624363):

code tested	run1	run2
master (f0d6487e2)	63m26s	63m41s
above + cherry-pick this PR (0e2746148)	44m08s	43m02s

A few suggestions below.

LarryRuane force-pushed on Apr 9, 2020

LarryRuane commented at 8:54 pm on April 9, 2020: contributor

@vasild, thanks for the timings, I like both of your suggestions (here and here) so I just force-pushed both of them. I also fixed the reindex functional test, which I hadn’t noticed has been broken for a month.

Oops, I accidentally force-pushed a rebase onto latest master (because I was trying out your suggestion, which didn’t merge cleanly without rebasing), sorry about that, reviewers.

vasild commented at 6:08 am on April 10, 2020: contributor

ACK fbf708b8c

Compiled and tested locally.

I viewed the changes introduced in the latest force-push (0e27461 -> fbf708b8c) using:

0diff -u <(git diff 0e27461~..0e27461) <(git diff fbf708b8c~..fbf708b8c)

Do you think it would be possible to demonstrate that it produces identical result as master? That is - reindex with master, dump the leveldb database from blocks/index in some textual form, reindex with this PR, dump again and compare the two dumps?

LarryRuane commented at 7:55 pm on April 12, 2020: contributor

@vasild

Do you think it would be possible to demonstrate that it produces identical result as master? That is - reindex with master, dump the leveldb database from blocks/index in some textual form, reindex with this PR, dump again and compare the two dumps?

That’s a good idea, but I can’t find a tool to create such a dump. (Maybe that would be a nice addition to contrib/?)

I did think of a test to convince me that it’s working:

I ran bitcoind --reindex --maxconnections=0 and let it finish reindexing. Not just the first part that this PR modifies, but the entire reindex procedure, to the point it had logged an UpdateTip of a very recent block. Then, of course, it couldn’t add new blocks because it had no peers.

Then I restarted the client without any arguments, and it started syncing at that very recent block height (since now it had peers) and within about a minute was fully synced.

This convinces me because once before I’d purposely damaged the first part of reindexing by simulating a disk read error early in the process, which resulted in one block not being processed. The client could not load blocks from that point onward, because this missing block’s child couldn’t be processed (parent is required), so that child block’s child couldn’t be processed, and so on, recursively. The system very nicely recovered by going into IBD mode and getting all blocks following that missing block from peers and then running normally.

If this PR broke the reindexing process, the same would have happened (it would have started IBD-ing from when the bug introduced damage, probably very early). But it didn’t. It picked up downloading from a very recent height, and accepted these blocks ("UpdateTip: new best=..."), proving that all previous blocks were recorded correctly in the block index.

What do you think, does that seem like a good test to you?

LarryRuane commented at 8:26 pm on April 12, 2020: contributor

I had an idea for a further speedup, but it didn’t work out as well as I had hoped. It’s about 10% faster, but probably should not be merged because the added complexity isn’t worth it. The branch is here (based on this PR’s branch plus a single commit): https://github.com/LarryRuane/bitcoin/tree/reindex-speedup-thread

The idea is to move the delayed processing of child blocks (delayed because the parent wasn’t yet available) into a separate thread. This overlaps the child processing with the reading of the block files using the CBufferedFile object and deserializing the header, which is a natural separation. If the times to do those two things were equal, this would have given a 2x improvement. But some debug printing shows that processing the child blocks takes the majority of the time (the parent thread ends up waiting for the child thread near the bottom of the function). And, unfortunately, the processing of child blocks can’t be multithreaded, because each child depends on its parent. (If there was a long chain split, it may be possible to process the two chains in parallel, but those are very rare.)

fjahr commented at 11:07 pm on April 16, 2020: contributor

Code looks good to me so far but I would like it more if the refactoring was in a separate commit and the commit message could go into more detail on how the performance improvement is achieved.

But I am currently also having trouble to reproduce the performance improvement itself. I am comparing the current master (d8dfcea5d956126fe0e2b1f36db74c8ef805e7ab) with this PR and I am using testnet locally with no other special configurations. I am looking at the time span between Reindexing block file blk00001.dat and Leaving InitialBlockDownload, so this includes the entire reindex procedure.

My two runs with this PR were 1:10 h and 1:14 h while on master they were 1:05 h and 1:06 h. Any idea why that could be the case?

LarryRuane force-pushed on Apr 18, 2020

LarryRuane commented at 2:51 am on April 18, 2020: contributor

Thanks, @fjahr, good idea, I just force-pushed three commits to replace the previous single commit so the changes are separated out more clearly.

I also just updated the description, which was outdated.

I timed testnet and this PR was 4 minutes faster than latest master (with this PR based on latest master). On mainnet, it’s more like 24 minutes. I think the difference is that testnet blocks are smaller and less complex on average, so the savings of deserializing only the header instead of the entire block is less. If you still have your debug.log files, how much time was there between reindexing the first block file and Reindexing finished? That’s the only phase of reindexing that this PR changes. I suspect there is enough randomness in the entire reindex that the 4 minutes advantage on testnet can be lost in the noise.

One suggestion, if you could try it again, is to start bitcoind with the -maxconnections=0. I think each time you start bitcoind you can get a different set of peers, and different peers can put a different load on your node, so not having any peers reduces the variance, at least that’s been my experience.

fjahr commented at 10:13 pm on April 19, 2020: contributor

One suggestion, if you could try it again, is to start bitcoind with the -maxconnections=0. I think each time you start bitcoind you can get a different set of peers, and different peers can put a different load on your node, so not having any peers reduces the variance, at least that’s been my experience.

Sorry, I should have said that I already set -maxconnections=0, I tried to replicate your setup described earlier. It wouldn’t be the first time my laptop gets odd benchmark results (see #18352) but I don’t see a direct connection.

DrahtBot added the label Needs rebase on May 6, 2020

LarryRuane force-pushed on May 8, 2020

DrahtBot removed the label Needs rebase on May 8, 2020

LarryRuane force-pushed on Jul 13, 2020

LarryRuane commented at 5:38 am on July 13, 2020: contributor

Force-pushed and rebased to fix merge conflicts. This PR had momentum for a while but that’s faded. Reviewers may be put off by the number of comments, but the result of all the discussion is a much simpler PR. I believe all reviewers’ comments have been addressed. I did make sure that the description (first comment) is up-to-date and accurate.

I believe the tests may be the best aspect of this PR; see especially the feature_reindex.py functional test. It tests out-of-order block processing in LoadExternalBlockFile() which has never been (auto-) tested before. Even if all we merged from this PR is that test, it probably would be worth it. Thanks in advance for helping to get this PR over the hump.

vasild commented at 6:28 pm on July 13, 2020: contributor

The fixup commit fix CI failures, same changes as [#18216](/bitcoin-bitcoin/18216/) would better be squashed into whichever commit introduced the failures.

LarryRuane force-pushed on Jul 13, 2020

LarryRuane commented at 11:36 pm on July 13, 2020: contributor

@vasild, good suggestion, done (squashed and force-pushed).

I re-ran the timing experiments again since a lot has changed. I started the client with maxconnections=0 (so it wouldn’t be bothered by peers). I ran the experiments multiple times in a controlled environment, so the variation was very small. These are the times of the first phase of -reindex (reading the blk?????.dat files) on mainnet:

	Real time (minutes, seconds)	User CPU time	System CPU time
master	133m35s	116m11s	4m28s
master + PR	87m3s	70m36s	3m35s

The real-time with the PR (probably what people care about most) is 65% of the real-time without this PR, or 46m32s faster.

DrahtBot added the label Needs rebase on Jul 21, 2020

LarryRuane force-pushed on Jul 21, 2020

LarryRuane commented at 5:00 pm on July 21, 2020: contributor

Force-pushed rebase to eliminate merge conflict.

DrahtBot removed the label Needs rebase on Jul 21, 2020

maflcko referenced this in commit 6ee36a263c on Jul 23, 2020

in src/streams.h:742 in da17e6efee outdated

796         }
797     }
798 
799+    //! move ahead in the stream
800+    void Skip(size_t nSize) {
801+        read(nullptr, nSize);

pinheadmz commented at 9:23 pm on July 23, 2020:

How come you still use read() for this? You made the memcpy optional but if you are fast-forwarding past the buffer size, couldn’t you still end up calling Fill() several times? Why not just add nSize to nSrcPos and Seek()?

Never mind, I see below you can keep the block in memory this way in case it is ready for processing. And the buffer size is big enough so my worry is moot.

in src/test/streams_tests.cpp:458 in da17e6efee outdated

454+                if (currentPos + nSize > fileSize)
455+                    continue;
456+                bf.SetLimit(currentPos + nSize);
457+                bf.Skip(nSize);
458+                currentPos += nSize;
459+                break;

pinheadmz commented at 9:58 pm on July 23, 2020:

no assert / BOOST_CHECK for this test?

LarryRuane commented at 3:23 am on July 24, 2020:

I agree that looks strange, but this case of the enclosing switch statement isn’t an isolated test, we’re within a 100-iteration loop. As the test continues to loop, any Skip() bug would cause the BOOST_CHECK_EQUAL at line 407 to fail (and also the other cases to fail since currentPos keeps track of where we should be in the stream).

in test/functional/feature_reindex.py:45 in da17e6efee outdated

40+        # test out-of-order handling, swap blocks 1 and 2 on disk.
41+        blk0 = os.path.join(self.nodes[0].datadir, self.nodes[0].chain, 'blocks', 'blk00000.dat')
42+        with open(blk0, 'r+b') as bf:
43+            b = bf.read(814)
44+
45+            # This is really just checking the test; these are is the first

pinheadmz commented at 10:10 pm on July 23, 2020:

nit: “these are the first…”

pinheadmz commented at 10:25 pm on July 23, 2020: member

concept ACK, code LGTM just little questions and nits. Passed all unit tests and functional tests locally. Will try benchmarking later with a fully synced node.

LarryRuane force-pushed on Jul 24, 2020

LarryRuane commented at 3:24 pm on July 24, 2020: contributor

Force-push to fix comment (review suggestion), also took the opportunity to improve the test by checking for expected text in debug.log.

jamesob commented at 3:47 pm on July 24, 2020: member

Running a -reindex benchmark on this; will report back with results.

pinheadmz commented at 2:30 am on July 25, 2020: member

Ran a pair of benchmark tests:

With PR: 1 hour, 8 minutes 5b1e92f539c24b6518f577b92874ccac0bbf617f

02020-07-24T22:41:21Z Reindexing block file blk00000.dat...
1...
22020-07-24T23:49:24Z Reindexing block file blk02165.dat...
32020-07-24T23:49:27Z Loaded 126 blocks from external file in 2570ms
42020-07-24T23:49:27Z Reindexing finished

Without PR: 1 hour, 40 minutes HEAD~3 2c0c3f8e8ca6c64234b1861583f55da2bb385446

02020-07-25T00:29:27Z Reindexing block file blk00000.dat...
1...
22020-07-25T02:10:48Z Reindexing block file blk02165.dat...
32020-07-25T02:10:52Z Loaded 126 blocks from external file in 3120ms
42020-07-25T02:10:52Z Reindexing finished

Nice job @LarryRuane !

pinheadmz approved

pinheadmz commented at 2:31 am on July 25, 2020: member

ACK 5b1e92f539c24b6518f577b92874ccac0bbf617f

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA256
 2
 3ACK 5b1e92f539c24b6518f577b92874ccac0bbf617f
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAl8bmWwACgkQ5+KYS2KJ
 7yTqdvg//Sf2sqHYQgRt3S9VLbgYAPDtG2gyDAm+rMLVqSa4Kf8e7om7TD7Wd6R+G
 8u1SQu/z6tvudHIeQbiv5HMiHM88vwoVu+4QxJ3w8o5/vjSfTKUg2eqeMoQcXclkd
 9fgu1HT8DDfqBRWWmnkpTAZEMr5XTNU3en2+gALzkGHKuFcvuNeLYfSVehzOVg079
10wOwAhUpFpArRvKomQuhA7ohYdIjjU1lm1rRwARXpX481zMsRo+8xPlI4zrKKJ7yV
115LVzS6pH9EbPPurXKCOqw7HGm0r4Tjyc8iGcEt3gSDKOzx6MRNVxABc9k47v/039
120TWyIcNeAVLr6bdoG6gZrfVAyJSm4lViNy33uvIJcdgfeDRSqxl/KXpz6Sa6Ljz4
13XURdxFKULhV4vm/1K5Qp/Wc6gaTqVjgpr2JfWo5Z6p9VhaSTukVdPz9QS4XFIMSf
14dxfKhkFlakG7hJtqkIaute2rRrXy0z99VduHHo5IwVuFFssFlPjXZNyZKITZwWnQ
15tfzvMbgylhNRUjEqJbNptVq0/OryD3GhM9yesyry0A18m7TpSYYT5rMP8Q+1aPLm
16y2MUvbV7LC64Gq1SpwNshmAZcTwDSBZJjrRbmdPmLwlN2QgQHBREJ9m6eUZxUjFY
17Noo9T+4/4kThC7ruj2twFz4qqt75rwTJ2FT9dhEqE/pk00nEwR0=
18=vTu8
19-----END PGP SIGNATURE-----

pinheadmz’s public key is on keybase

hebasto commented at 2:42 pm on July 25, 2020: member

Concept ACK.

fjahr commented at 8:29 pm on July 25, 2020: contributor

ACK 5b1e92f539c24b6518f577b92874ccac0bbf617f

I did a fresh re-review of the code. I currently don’t have access to the machine where I saw no speed-up before but I ran another test on a different machine and that confirms the benchmarks that others have seen here: 2:05h on current master vs. 1:22h with this PR.

hebasto commented at 7:32 am on July 26, 2020: member

Benchmarked and tested 5b1e92f539c24b6518f577b92874ccac0bbf617f on Linux Mint 20 (x86_64).

Setup:

block height=640740
recent block file blk02168.dat
commad-line arguments: -nonetworkactive -reindex

master (40a04814d130dfc9131af3f568eb44533e2bcbfc)

 02020-07-25T14:56:14Z [main] Bitcoin Core version v0.20.99.0-40a04814d (release build)
 1...
 22020-07-25T14:56:14Z [loadblk] loadblk thread start
 32020-07-25T14:56:14Z [loadblk] Reindexing block file blk00000.dat...
 4...
 52020-07-25T16:29:02Z [loadblk] Reindexing block file blk02167.dat...
 62020-07-25T16:29:03Z [loadblk] Loaded 2 blocks from external file in 1515ms
 72020-07-25T16:29:03Z [loadblk] Reindexing block file blk02168.dat...
 82020-07-25T16:29:14Z [loadblk] Loaded 837 blocks from external file in 10413ms
 92020-07-25T16:29:14Z [loadblk] Reindexing finished
102020-07-25T16:29:14Z [loadblk] Pre-allocating up to position 0x100000 in rev00000.dat
112020-07-25T16:29:14Z [loadblk] UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09T02:54:25Z' progress=0.000000 cache=0.0MiB(1txo)
12...
132020-07-25T21:00:01Z [loadblk] UpdateTip: new best=0000000000000000000f9ddd409d655a2e420c442a877813a7697289a84d4525 height=640740 version=0x37ffe000 log2_work=92.145733 tx=551982677 date='2020-07-25T14:51:51Z' progress=0.999868 cache=586.0MiB(4419378txo)
142020-07-25T21:00:01Z [loadblk] Leaving InitialBlockDownload (latching to false)
152020-07-25T21:00:01Z [loadblk] Imported mempool transactions from disk: 137 succeeded, 0 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
162020-07-25T21:00:01Z [loadblk] loadblk thread exit
17...

master + this PR

 02020-07-25T21:14:58Z [main] Bitcoin Core version v0.20.99.0-b0a2130fb (release build)
 1...
 22020-07-25T21:14:58Z [loadblk] loadblk thread start
 32020-07-25T21:14:58Z [loadblk] Reindexing block file blk00000.dat...
 4...
 52020-07-25T22:13:00Z [loadblk] Reindexing block file blk02167.dat...
 62020-07-25T22:13:01Z [loadblk] Loaded 0 blocks from external file in 1233ms
 72020-07-25T22:13:01Z [loadblk] Reindexing block file blk02168.dat...
 82020-07-25T22:13:02Z [loadblk] Loaded 0 blocks from external file in 1210ms
 92020-07-25T22:13:02Z [loadblk] Reindexing finished
102020-07-25T22:13:02Z [loadblk] Pre-allocating up to position 0x100000 in rev00000.dat
112020-07-25T22:13:02Z [loadblk] UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09T02:54:25Z' progress=0.000000 cache=0.0MiB(1txo)
12...
132020-07-26T01:09:08Z [loadblk] UpdateTip: new best=000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 height=467466 version=0x20000002 log2_work=86.465712 tx=224507800 date='2017-05-21T16:50:34Z' progress=0.408111 cache=256.7MiB(1729948txo)
142020-07-26T01:09:09Z [loadblk] Imported mempool transactions from disk: 1 succeeded, 136 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
152020-07-26T01:09:09Z [loadblk] loadblk thread exit
16...

Please note some differences:

log messages about the blk02168.dat reindexing: 837 vs 0 blocks
block height at which the loadblk thread exits: 640740 vs 467466
this PR log lacks Leaving InitialBlockDownload (latching to false) log message
Imported mempool transactions from disk transaction statistics

hebasto commented at 1:06 pm on July 26, 2020: member

Also the resulted index sizes differ substantially:

master (40a04814d130dfc9131af3f568eb44533e2bcbfc):

0$ ls -l
1total 168532
2-rw------- 1 hebasto hebasto 83614454 Jul 26 15:59 000003.log
3-rw------- 1 hebasto hebasto       27 Jul 26 15:59 000004.log
4-rw------- 1 hebasto hebasto 88941405 Jul 26 15:59 000005.ldb
5-rw------- 1 hebasto hebasto       16 Jul 26 14:15 CURRENT
6-rw------- 1 hebasto hebasto        0 Jul 26 14:15 LOCK
7-rw------- 1 hebasto hebasto       50 Jul 26 14:15 MANIFEST-000002

master + this PR:

0$ ls -l
1total 122436
2-rw------- 1 hebasto hebasto 60722874 Jul 26 14:07 000003.log
3-rw------- 1 hebasto hebasto       27 Jul 26 14:07 000004.log
4-rw------- 1 hebasto hebasto 64632267 Jul 26 14:07 000005.ldb
5-rw------- 1 hebasto hebasto       16 Jul 26 13:09 CURRENT
6-rw------- 1 hebasto hebasto        0 Jul 26 13:09 LOCK
7-rw------- 1 hebasto hebasto       50 Jul 26 13:09 MANIFEST-000002

hebasto commented at 1:53 pm on July 26, 2020: member

I found the root of all differences in logs:

master (40a04814d130dfc9131af3f568eb44533e2bcbfc):

0[loadblk] Reindexing block file blk00877.dat...
1...
2[loadblk] LoadExternalBlockFile: Processing out of order child 000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 of 000000000000000001d22f4d6a05420ba43ee2eb457efea06be8280f8a5f14d2
3[loadblk] LoadExternalBlockFile: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error
4...
5[loadblk] Loaded 98 blocks from external file
6[loadblk] Reindexing block file blk00878.dat...
7...

master + this PR

0[loadblk] Reindexing block file blk00877.dat...
1...
2[loadblk] LoadExternalBlockFile: Processing out of order child 000000000000000001ee925f0e1570c6b61990ac9777fd4a73983b623494e894 of 000000000000000001d22f4d6a05420ba43ee2eb457efea06be8280f8a5f14d2
3...
4[loadblk] Loaded 68 blocks from external file
5[loadblk] Reindexing block file blk00878.dat...
6...

Is this change in error handling behavior expected?

jamesob commented at 4:23 pm on July 26, 2020: member

I haven’t looked closely at the changes here or the debug.log from these runs, but reindexing to 450,000 isn’t showing much difference on my end between the HEAD of this branch and the mergebase.

hebasto commented at 5:25 pm on July 26, 2020: member

@jamesob Will try to describe a problem better. The blocks/blk00877.dat file is somehow broken on my disk.

The master branch:

encounter an error in blkdat >> block
reports about an error with the “Deserialize or I/O error…”
tries to read from the next position

This PR:

no error during blkdat >> header
a broken block (or just random bytes) just treated as a block with unknown parent

An example with this PR:

0$ ./src/bitcoind -loadblock=/home/hebasto/blk00877.dat.error
1...
22020-07-26T17:19:28Z [loadblk] loadblk thread start
32020-07-26T17:19:28Z [loadblk] Importing blocks file /home/hebasto/blk00877.dat.error...
4...
52020-07-26T17:19:28Z [loadblk] LoadExternalBlockFile: Out of order block eda32a9dea60fff9b0c4df31a425cd6d0e19b74f4c88d8f08a5b8f9e04265872, parent 01ee925f0e1570c6b61990ac9777fd4a73983b623494e89420000000000f3aee not known
6...

Please note that blocks

eda32a9dea60fff9b0c4df31a425cd6d0e19b74f4c88d8f08a5b8f9e04265872, and
01ee925f0e1570c6b61990ac9777fd4a73983b623494e89420000000000f3aee just do not exist in the Universe yet :)

hebasto commented at 3:43 pm on July 27, 2020: member

May I suggest a patch:

0--- a/src/validation.cpp
1+++ b/src/validation.cpp
2@@ -4719,6 +4719,7 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
3                 blkdat.Skip(nBlockPos + nSize - blkdat.GetPos());
4 
5                 const uint256 hash = header.GetHash();
6+                if (!CheckProofOfWork(hash, header.nBits, chainparams.GetConsensus())) continue;
7                 {
8                     LOCK(cs_main);
9                     // Store positions of out of order blocks for later.

that makes loading blocks from files bullet-proof?

The excerpt from the log (master + this PR + patch):

0...
12020-07-27T14:10:13Z [loadblk] loadblk thread start
22020-07-27T14:10:13Z [loadblk] Reindexing block file blk00000.dat...
3...
42020-07-27T15:33:46Z [loadblk] Reindexing block file blk02167.dat...
52020-07-27T15:33:47Z [loadblk] Loaded 2 blocks from external file in 1282ms
62020-07-27T15:33:47Z [loadblk] Reindexing block file blk02168.dat...
72020-07-27T15:33:58Z [loadblk] Loaded 837 blocks from external file in 10813ms
82020-07-27T15:33:58Z [loadblk] Reindexing finished
9...

It takes 84 min to build a new block index vs 93 min on master (see #16981 (comment)), that is ~10% improvement.

LarryRuane commented at 8:06 pm on July 28, 2020: contributor

@jamesob:

… reindexing to 450,000 isn’t showing much difference on my end between the HEAD of this branch and the mergebase

That’s strange. What does reindexing to 450,000 mean? I didn’t know you can do that. Is it reindexing up to that height, or only beyond that height?

Are the debug.log files still available for those runs? I’d be curious to know when the message Reindexing finished appears relative to the start, because this PR should only affect that part of a -reindex run.

hebasto commented at 7:54 am on July 29, 2020: member

I think the benchmarks for this PR should additionally provide “Out of order block…” log message count. The lower this count is the lower -reindex speedup should be expected.

LarryRuane marked this as a draft on Jul 29, 2020

LarryRuane commented at 3:58 am on July 31, 2020: contributor

@hebasto, thank you for doing all those tests. I did more testing (that’s why I changed this PR to a Draft), and I’m now convinced there’s no functionally important difference between master and this PR when there’s data corruption in the blk.dat files. There can be a behavior and timing difference, but I don’t think it matters. The behavior difference (as you have explained) is that if there’s corruption in the middle of a block’s serialization on disk, it’s most likely not in the header, and because this PR only initially deserializes the header, the corruption isn’t detected immediately. master deserializes the entire block, so it detects it immediately.

Assume there’s corruption in the middle of a block (not in the header). This happens within LoadExternalBlockFile():

With master, the blkdat >> block throws an exception, this is caught and the catch handler prints “Deserialize or I/O error”, and we move on to the next block. So it was as if that block didn’t exist in the blk.dat files.

With this PR, the same thing happens if the block can be processed immediately (that is, if we’ve already seen its parent). Otherwise, an entry gets added to mapBlocksUnknownParent and we move on to the next block, and the corruption is undetected for now. A short time later, we reach this corrupt block’s parent and then call ReadBlockFromDisk() on the corrupt block, which detects the corruption, logs the error, and returns false. LoadExternalBlockFile() just continues, so, again, it’s as if this block didn’t exist in the blk.dat files.

In both cases (with and without this PR), subsequent blocks no longer have AcceptBlock() called on them, because the chain is broken (the corrupt block is a missing link in the blockchain unless the block happens to not be part of the best chain, which is unlikely). All these blocks just get added to mapBlocksUnknownParent (and never removed and processed).

 0--- a/src/validation.cpp
 1+++ b/src/validation.cpp
 2@@ -4720,8 +4720,20 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
 3                     if (hash != chainparams.GetConsensus().hashGenesisBlock && !LookupBlockIndex(block.hashPrevBlock)) {
 4                         LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
 5                                 block.hashPrevBlock.ToString());
 6-                        if (dbp)
 7-                            mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
 8+                        static bool docorrupt = true;
 9+                        // simulate corruption in the first block that doesn't have a parent yet
10+                        if (dbp) {
11+                            if (docorrupt) {
12+                                // Simulate corruption in the first block that can't be processed
13+                                // immediately (that's when there's a difference between master
14+                                // and the PR) -- don't add an entry to mapBlocksUnknownParent.
15+                                docorrupt = false;
16+                                LogPrintf("%s: simulate corruption block %s, parent %s\n", __func__, hash.ToString(),
17+                                        block.hashPrevBlock.ToString());
18+                            } else {
19+                                mapBlocksUnknownParent.insert(std::make_pair(block.hashPrevBlock, *dbp));
20+                            }
21+                        }
22                         continue;
23                     }
24 
25@@ -4778,13 +4790,13 @@ void LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, FlatFi
26                     }
27                 }
28             } catch (const std::exception& e) {
29-                LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what());
30+                LogPrintf("%s: Deserialize or I/O error - %s pos %s offset %d\n", __func__, e.what(), dbp->ToString(), blkdat.GetPos());
31             }
32         }
33     } catch (const std::runtime_error& e) {
34         AbortNode(std::string("System error: ") + e.what());
35     }
36-    LogPrintf("Loaded %i blocks from external file in %dms\n", nLoaded, GetTimeMillis() - nStart);
37+    LogPrintf("Loaded %i blocks from external file in %dms mapBlocksUnknownParent size %i\n", nLoaded, GetTimeMillis() - nStart, mapBlocksUnknownParent.size());
38 }
39 
40 void CChainState::CheckBlockIndex(const Consensus::Params& consensusParams)

(The final LogPrintf in the function also prints the number of entries in mapBlocksUnknownParent.)

Here’s the output when I run this on master on testnet:

02020-07-30T19:55:07Z Reindexing block file blk00000.dat...
12020-07-30T19:55:11Z Loaded 753 blocks from external file in 3837ms mapBlocksUnknownParent size 85998
22020-07-30T19:55:11Z Reindexing block file blk00001.dat...
32020-07-30T19:55:14Z Loaded 0 blocks from external file in 2985ms mapBlocksUnknownParent size 139893
42020-07-30T19:55:14Z Reindexing block file blk00002.dat...
52020-07-30T19:55:17Z Loaded 0 blocks from external file in 3162ms mapBlocksUnknownParent size 202285
6(...)
72020-07-30T20:05:43Z Reindexing block file blk00185.dat...
82020-07-30T20:05:47Z Loaded 0 blocks from external file in 3997ms mapBlocksUnknownParent size 1781217

It’s always zero blocks loaded from then on. Then (either with this PR or master), it enters the ActivateBestChain phase, and it gets blocks from peers. So it automatically goes into IBD mode. When it gets near the tip. It finishes syncing, it logs InitialBlockDownload (latching to false), and then runs normally. I verified that if I restart the node, it’s still synced, and runs normally. This is both with and without the PR.

LarryRuane marked this as ready for review on Jul 31, 2020

hebasto commented at 9:45 am on July 31, 2020: member

@LarryRuane

You described the scenario when a block is corrupted within a block file. In my situation (don’t know how it was happened though) a block file has random/broken bytes but all block are correct within it.

Therefore, with the suggested patch --reindex completes without getting blocks from peers.

LarryRuane commented at 3:35 pm on August 1, 2020: contributor

a block file has random/broken bytes but all block are correct within it.

Can you explain more? Where exactly is the corruption?

hebasto commented at 3:46 pm on August 1, 2020: member

a block file has random/broken bytes but all block are correct within it.

Can you explain more? Where exactly is the corruption?

blk00877.dat.error

LarryRuane commented at 3:06 pm on August 4, 2020: contributor

TL;DR

I think the PR is good as it is can and should be merged. @hebasto, I understand exactly what’s going on, and having that corrupted file really helped, thanks. Before explaining what I found, can you try a different fix, this will confirm my understanding. At this point in LoadExternalBlockFile():

0// Skip the rest of this block; position to the marker before the next block.
1nRewind = nBlockPos + nSize;

Just comment out that setting of nRewind. I expect that all the blocks will load (you won’t see Loaded 0 blocks... messages).

Explanation

There’s a bad block whose marker bytes, 0xf9, 0xbe, 0xb4, 0xd9, start at offset 0x3147657. The size field that follows is 0xf3afd = 998141, which is reasonable. This is the number of bytes of the serialized block. If you move ahead that many bytes in the stream, you should be at the marker bytes of the next block (or be at the end of the file). That should be 0x3147657 + 4 + 4 + 998141 = 0x323b15c. Looking at that part of the hex dump of the file, we see:

00323b150  ac ad 21 07 00 f9 be b4  d9 03 3b 0f 00 02 00 00  |..!.......;.....|

That’s 16 bytes, and offset 0xc should be the next marker bytes, but instead we see the marker bytes starting at offset 0x5. These are the marker bytes for a legitimate block, and if we don’t load it, all subsequent blocks will have no parent.

So, with the PR, what happens is, we deserialize just the header, and it’s probably impossible for that to fail because it’s just 80 bytes and everything is fixed-sized. Its parent isn’t found (of course, since its prevhash is garbage, doesn’t even start with zeros), so we add an entry to mapBlocksUnknownParent[], and then skip ahead to where we think the next block’s marker bytes start. But that’s just beyond where the marker bytes are, so miss that entire block. We get resynchronized (FindByte()) with the following block.

In contrast, with master, the attempt to deserialize the entire block fails, so we resynchronize starting at the byte after that bad block’s first marker byte, and thus we do find and load this block.

The additional call to CheckProofOfWork() that @hebasto suggested acts here as a sanity-check on the block header, which this nonsense header fails, so then again we attempt to resynchronize starting at the byte after that bad block’s first marker byte, and we find the next block.

So, a way to summarize the problem is that this PR makes the block size (that follows the marker bytes, nSize) more important than it is with master. On master, when we fail to deserialize the block, we go back to resynchronizing with the stream (looking for the marker bytes) and the block size isn’t used.

One thing we can do to fix this in this PR is that, after deserializing just the header, if its parent hasn’t been seen, then after adding it to mapBlocksUnknownParent, we could resynchronize with the stream, instead of just skipping ahead by the size of the block. We can do that by removing this line. But that would be much slower.

 0--- a/src/streams.h
 1+++ b/src/streams.h
 2@@ -851,12 +851,21 @@ public:
 3 
 4     //! search for a given byte in the stream, and remain positioned on it
 5     void FindByte(char ch) {
 6+        // start is the index into vchBuf[] at which to start searching
 7+        size_t start = nReadPos % vchBuf.size();
 8         while (true) {
 9             if (nReadPos == nSrcPos)
10                 Fill();
11-            if (vchBuf[nReadPos % vchBuf.size()] == ch)
12-                break;
13-            nReadPos++;
14+            size_t n = vchBuf.size() - start;
15+            if (n > nSrcPos - nReadPos)
16+                n = nSrcPos - nReadPos;
17+            auto it_start = vchBuf.begin()+start;
18+            auto it_find = std::find(it_start, it_start+n, ch);
19+            size_t inc = it_find - it_start;
20+            nReadPos += inc;
21+            if (inc < n) break;
22+            start += inc;
23+            if (start >= vchBuf.size()) start = 0;
24         }

And this works, and is much faster (std::find() seems to be very efficient in this case).

But on further reflection, I don’t think we should do this, because this fixes only this specific kind of file corruption – this problem can happen anyway. There can be all kinds of corruption that makes a block unserializable, and once that happens, all subsequent blocks cannot be loaded. But I have seen that when this occurs, the node is able to download blocks from peers and synchronize with the network anyway. It’s true that it doesn’t seem to correct damaged blk.dat files, but if that’s worth doing, it should be for a separate PR.

But I’ll do some benchmarking to see if this resynchronizing (with the faster FindByte()) is close to the PR as it is now, and if it’s only a few percent slower, maybe it’s worth doing. I’ll give an update later.

LarryRuane commented at 4:20 pm on August 5, 2020: contributor

I benchmarked this latest version, and it’s only 3% slower than the PR version. This latest version has the faster FindByte() and also resynchronizes with the byte stream in the (very common) case where, after deserializing only the header, the parent hasn’t yet been seen. This is instead of depending on the nSize value (moving ahead by this much). This makes reindexing more tolerant of various kinds of file corruption (as good as master). So I’ll push that as a separate commit; I think the added robustness is worth it (and this is still much faster than master).

hebasto commented at 3:17 am on August 6, 2020: member

@LarryRuane

FindByte() speed up is great, but combining two different speed optimizing solutions in a single PR is not a good idea. I’d suggest to separate the FindByte() patch into a dedicated PR that will allow to measure each optimization approach individually.
Resynchronizing on each out-of-order block (commenting out nRewind = nBlockPos + nSize;) indeed improves robustness. Let’s compare two solutions:

resynchronizing on each header that fails CheckProofOfWork() (https://github.com/bitcoin/bitcoin/pull/16981#issuecomment-664474119) has a small constant time (optimized hashing) overhead per each block
resynchronizing on each out-of-order block has a much greater time overhead (reading operations) per each out-of-order block, and the number of such blocks can vary from user to user. Moreover, this approach inserts into mapBlocksUnknownParent a piece of junk that will cause additional time overhead while searching through this map

Therefore, I’d strictly prefer the former approach.

While touching log message you could combine ideas from #19600.

LarryRuane force-pushed on Aug 7, 2020

LarryRuane commented at 3:12 pm on August 7, 2020: contributor

@hebasto, that makes sense, I took your suggestion. Force-pushed (diff). The only slight change I made from your suggestion is this:

0if (!CheckProofOfWork(hash, header.nBits, chainparams.GetConsensus())) continue;
1{
2    LOCK(cs_main);
3    (...)

looks somewhat like this large block of code is within the if statement (if you don’t notice the continue). So I wrote the continue on a separate line.

hebasto commented at 3:38 pm on August 7, 2020: member

@LarryRuane

Thanks!

Two last commits with the same messages are confusing. Did you mean that they should be a single commit (that seems reasonable)?

LarryRuane force-pushed on Aug 7, 2020

LarryRuane commented at 5:38 pm on August 7, 2020: contributor

Two last commits with the same messages are confusing

Yes, that was a mistake, thanks for catching that. Force-pushed (diff).

LarryRuane force-pushed on Aug 7, 2020

hebasto commented at 9:05 pm on August 7, 2020: member

Was writing a review on e5724b5fde855b67ff500e21f11274f12b5c893a and noticed a fresh push. What is its goal?

LarryRuane commented at 9:11 pm on August 7, 2020: contributor

@hebasto, you’re too quick! Sorry, I confused myself with my lack of git skills… I just make a force-push to do a small cleanup (diff) and ended up doing some extra commits by mistake. I had noticed that the "-debug=reindex" in the python test was unneeded, because the test nodes are started with "-debug" (which turns on all debug logging).

hebasto commented at 9:32 pm on August 7, 2020: member

ACK e5724b5fde855b67ff500e21f11274f12b5c893a, tested on Linux Mint 20 (x86_64).

Testing and benchmarking setup:

block height: 641143
block files: blk00000.dat..blk02172.dat
617526 blocks are out of order (data extracted from debug.log)
blk00877.dat is buggy (see #16981 (comment) and #16981 (comment))
to exit early the following patch was used:

 0--- a/src/init.cpp
 1+++ b/src/init.cpp
 2@@ -715,6 +715,8 @@ static void ThreadImport(ChainstateManager& chainman, std::vector<fs::path> vImp
 3         // To avoid ending up in a situation without genesis block, re-try initializing (no-op if reindexing worked):
 4         LoadGenesisBlock(chainparams);
 5     }
 6+    StartShutdown();
 7+    return;
 8 
 9     // -loadblock=
10     for (const fs::path& path : vImportFiles) {

master (4b705b1c98f60ab736df98d62a8d4988f61678d5)

0$ time src/bitcoind -nonetworkactive -nodnsseed -debug=reindex -reindex
1...
2real	103m19,177s
3user	75m51,900s
4sys	2m40,693s

master (4b705b1c98f60ab736df98d62a8d4988f61678d5) + PR (e5724b5fde855b67ff500e21f11274f12b5c893a)

0$ time src/bitcoind -nonetworkactive -nodnsseed -debug=reindex -reindex
1...
2real	88m30,689s
3user	43m33,576s
4sys	4m24,282s

hebasto approved

hebasto commented at 9:39 pm on August 7, 2020: member

re-ACK 84b995a9ef468e0badf7bc0378851ea8c698dfc7, the only change since my previous review:

 0$ git diff e5724b5fde855b67ff500e21f11274f12b5c893a 84b995a9ef468e0badf7bc0378851ea8c698dfc7
 1diff --git a/test/functional/feature_reindex.py b/test/functional/feature_reindex.py
 2index d1bdbd486..1800e382f 100755
 3--- a/test/functional/feature_reindex.py
 4+++ b/test/functional/feature_reindex.py
 5@@ -60,7 +60,7 @@ class ReindexTest(BitcoinTestFramework):
 6             'LoadExternalBlockFile: Out of order block',
 7             'LoadExternalBlockFile: Processing out of order child',
 8         ]):
 9-            extra_args = [["-reindex", "-debug=reindex"]]
10+            extra_args = [["-reindex"]]
11             self.start_nodes(extra_args)
12 
13         # All blocks should be accepted and processed.

adamjonas commented at 0:31 am on August 16, 2020: member

I ran another round of benchmarks of a reindex up to block 600_000 on 84b995a9ef468e0badf7bc0378851ea8c698dfc7 ~~(it says ibd on the graph, but it’s a reindex).~~

~~There is a definite speedup but with a much wider range of peak memory usage:~~ [edit: I got this wrong. Will repost with the proper results.]

LarryRuane commented at 5:03 am on August 24, 2020: contributor

@pinheadmz, @hebasto, I just added another commit, please review. I discovered an additional minor performance improvement in the case where the block is already present in the block index (the message Block Import: already had block ... appears). This happens when you start with -reindex but the client stops before completing, and you start again without -reindex. I didn’t attempt to measure the performance difference. I recommend ignoring whitespace because I took the chance to fix some broken indentation since I’m changing some of those exact lines.

pinheadmz commented at 5:13 pm on August 25, 2020: member

@LarryRuane top commit makes sense to me, no point in reading the block data if we already have it. I’m running another reindex now to test it out, and will also try to get an “already had…” message. Also digesting the suggestion by @hebasto to skip corrupted block headers. Am able to modify the test by adding “corrupt” bytes and make sure the process just skips the block instead of completely borking.

pinheadmz approved

pinheadmz commented at 3:25 pm on August 26, 2020: member

ACK 969366d07c8cfdf8ea56cdb2ae11f5c648152345

Built on OSX 10.14. Unit tests OK, Functional tests OK.

Modified the feature_reindex to corrupt the blk file and check that reindexing stopped. I wanted to try and catch the already had block... case but that’s proving hard to automate.

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA256
 2
 3ACK 969366d07c8cfdf8ea56cdb2ae11f5c648152345
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAl9GfcsACgkQ5+KYS2KJ
 7yToSWg/+Oup611s3ituQsn0AhwPfgupkrJNNXVLyBFJeejQcHDUwfyyzHT7IaxQk
 8TkEUb3j6VUS2yOmuhH6TX2gPQn2t8QXE6iRguDnIlr9yf661pE0qGmxRLEnPnWFh
 9UF4Mv7z6TYssqMN1a1Eu6Tute/RggoNKuZOimFXmY5Zr/AlBv6jkR/CKvld9i47p
10fQN3sI2nDKZz5y5blbt9JIOvSOTGsKbaLZOzUAXtuv34C8aLxywNLGsHdDt2qVuJ
11Z0uc/o/gQkdje/AIr9cAjbsI2Tw+G3P0ziVBwP5Iae5cOaakjw+DLrK6Vo9f/tOi
12rVOLJwpVU+MnYAfXke2jNauQzJRunya3LbvxVEHdepwrOd6uKk8LzXnQdj41dBup
13klPPp3bnHKlm0PYPw/qKKJpjKbKH/nw4r30fJ7kDbDtzg7OP2nlyLTNQ8CTDahCr
14WXP1Bp+Qbr1z1oi1j4EvB6SuH0WvS01sani4IxIJv5sLSGlVhQTH6gQqe0irlpT3
15WiivqOkdUyaG9joF2tCf/eX0MAireZ8+f2Rbm7+HQlHZpV/u0P9anfJZhoeJgpwV
16vdgWQIKhwBqsuvD2CzCMoHWVWFVdMbVT+vOW4URZIfWrDrEoDxBjNQTJEfDEnCF+
172EHZQaZ8D1fBMr2e/0cULjpTKKnBPEQpDHXbcHDJf6VETiG33ZI=
18=DBGs
19-----END PGP SIGNATURE-----

pinheadmz’s public key is on keybase

maflcko referenced this in commit 15886b08aa on Aug 27, 2020

sidhujag referenced this in commit 9c0a0b14dd on Aug 28, 2020

hebasto commented at 5:23 pm on September 12, 2020: member

The latest commit 969366d07c8cfdf8ea56cdb2ae11f5c648152345 looks good, but it adds another behavior change into this PR. I think it deserves its own PR to get the measurable effects from each improvement.

LarryRuane commented at 4:59 pm on September 13, 2020: contributor

The latest commit 969366d looks good, but it adds another behavior change into this PR. I think it deserves its own PR to get the measurable effects from each improvement.

I see it differently. In my view, there’s only one behavior change, which is “don’t deserialize the full block unless necessary.” If initially, I had made a single commit that’s the same as what are now two commits, no one would have been said this is two behavior changes. What happened instead is that I simply missed one of the code paths in which it’s not necessary to deserialize the entire block (only the header is needed). I made a second, separate commit to fix that only because some reviewers may have already seen the first commit. These two commits should probably be squashed. I see no reason to measure the performance of the two commits separately. It’s clear that if one of these commits is worth merging, the other one is too.

decryp2kanon commented at 6:34 am on December 10, 2020: contributor

Concept ACK

DrahtBot added the label Needs rebase on Feb 1, 2021

LarryRuane force-pushed on Feb 8, 2021

LarryRuane commented at 11:05 pm on February 8, 2021: contributor

Rebased, no changes except to improve a few of the commit messages.

DrahtBot removed the label Needs rebase on Feb 9, 2021

in src/test/streams_tests.cpp:14 in 632499c630 outdated

 8@@ -9,6 +9,18 @@
 9 
10 BOOST_FIXTURE_TEST_SUITE(streams_tests, BasicTestingSetup)
11 
12+// This is a macro rather than a function so that failure
13+// line numbers are reported correctly.
14+#define BOOST_CHECK_THROW_STR(statement, exception, str)   \

luke-jr commented at 6:19 pm on February 28, 2021:

Rather not call our own macros BOOST_*…

LarryRuane commented at 12:59 pm on March 15, 2021:

Rather not call our own macros BOOST_* …

Good point; I just removed this (new) macro because it’s out of the scope for this PR, force-pushed (but didn’t rebase).

LarryRuane force-pushed on Mar 15, 2021

DrahtBot added the label Needs rebase on Jun 12, 2021

LarryRuane force-pushed on Jun 14, 2021

LarryRuane commented at 3:58 am on June 14, 2021: contributor

Rebased to resolve merge conflicts.

DrahtBot removed the label Needs rebase on Jun 14, 2021

PastaPastaPasta referenced this in commit 843c166c62 on Jun 27, 2021

PastaPastaPasta referenced this in commit 53dd955007 on Jun 28, 2021

PastaPastaPasta referenced this in commit 81b6e725af on Jun 29, 2021

DrahtBot added the label Needs rebase on Jun 29, 2021

PastaPastaPasta referenced this in commit 8d04b42f5f on Jul 1, 2021

PastaPastaPasta referenced this in commit 901b101fb4 on Jul 1, 2021

PastaPastaPasta referenced this in commit 3122a6b5a6 on Jul 15, 2021

PastaPastaPasta referenced this in commit cd741d0d4d on Jul 16, 2021

LarryRuane force-pushed on Jul 28, 2021

LarryRuane commented at 0:50 am on July 28, 2021: contributor

Force-pushed rebase for merge conflicts, and also, @hebasto, I had added a call to CheckProofOfWork() as you suggested in this comment above, but I looked more carefully and found that this function is already called as part of importing each block (reindexing), so I removed that call as part of this rebase. It would help performance only if many blocks failed this check, but very few, if any, do.

DrahtBot removed the label Needs rebase on Jul 28, 2021

LarryRuane force-pushed on Jul 28, 2021

LarryRuane force-pushed on Nov 20, 2021

LarryRuane commented at 5:28 pm on November 20, 2021: contributor

Rebased (no changes to this PR’s code), there were no conflicts but just because some people would like to review, nice to have a fresh base.

in src/validation.cpp:4212 in 0908d01954 outdated

4217 
4218+                    // This block can be processed immediately; rewind to its start then read it.
4219+                    blkdat.SetPos(nBlockPos);
4220+                    std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();
4221+                    CBlock& block = *pblock;
4222+                    blkdat >> block;

stickies-v commented at 9:48 pm on November 22, 2021:

It looks like we’re deserializing the header twice now. CBlock already has a constructor that takes a CBlockHeader, so potentially we could use that for another minor performance improvement? However, I think that would prevent us from using the overloaded » operator - but I’m not familiar with this at all. So, probably you’ll want to keep it like this, but I’m just flagging in case anyone sees any opportunities here?

stickies-v commented at 10:08 pm on November 22, 2021:

Also, I think technically you could release/reacquire the lock before/after the block serialization, but I’m not sure if that would be a net positive improvement? I can imagine there will be a performance penalty for doing that.

LarryRuane commented at 6:22 am on December 8, 2021:

Thanks for the excellent and helpful review!

It looks like we’re deserializing the header twice now

Yes, but since the header is only 80 bytes (less than 0.01% of a 1 MB block), this likely isn’t much of a problem. Besides, I don’t think it’s possible to deserialize a header, and then deserialize only the rest of the block, and end up with a CBlock.

the amount of code executed under cs_main lock increases, since it is now used during the entire block deserialization … you could release/reacquire the lock before/after the block serialization

Yes, good point, I implemented a patch to not hold cs_main while deserializing the block (then it’s acquired and released 3 times per block read from disk, rather than once). The result was that the reindex time was slower by 15 minutes – only about 1% but still measurable (although I don’t know how much of that difference is due to random variation). I think not much else is going on during reindex; there are other threads in the system but they’re just sleeping, so I don’t think it’s okay to leave it as it is.

stickies-v commented at 11:04 pm on November 22, 2021: contributor

Reviewed until f8f9ba31f

Summary of changes to verify my understanding is correct:

behaviour changes are limited to CChainState::LoadExternalBlockFile, which is only called from AppInitMain if global fReindex is true or if blocks are provided through -loadblock. All other paths are unaffected.
Instead of deserializing the entire block, only the header is deserialized. After checking that the parent block is known and that the block is not already in the index, the block is deserialized entirely and processed. This does not introduce an extra disk read operation, since the entire block is buffered already. This is seems like a strict performance improvement.
as discussed by sipa and ajtowns, this increases dependency on nSize not being corrupted in the blk<>.dat files. I will follow but not participate in the discussion as I lack relevant experience and don’t have a strong view, but at first sight I think this is fine? If the blk<>.dat files are suspected to be corrupt, they should probably be discarded and redownloaded anyway?
the amount of code executed under cs_main lock increases, since it is now used during the entire block deserialization too (note: now this deserialization does not require any disk reads, as mentioned earlier).

Some general nits:

nBlockPos + nSize is now being used 3 times. For readability, would it be worth introducing a new variable, e.g. auto BlockEndPos = nBlockPos + nSize?
nRewind is a variable that predates this PR, so feel free to ignore if this is not the place to fix it. I think the name is unnecessarily confusing. It indicates being a relative offset versus the current position and to have a backwards direction, but neither is true. I think e.g. nextPos would be more descriptive? Would it make sense to sneak that in with this PR?

Thank you Larry for the very clear PR, and for helping me debug and fix some local issues I had. I think this PR is a very nice improvement. I will do some more performance testing before tACK but since you’ve already done that, I don’t think I’ll have much to add later on.

LarryRuane commented at 6:43 am on December 8, 2021: contributor

I tested the performance of this PR on a Raspberry Pi 4 (RPi4, a myNode), and this PR reduces the reindex real-time by 7 hours (from 28h15m down to 21h15m for the first phase of reindex (reading the blocks from disk into the block index). This is, admittedly, a best-case comparison, because the RPi has a slow CPU, so not deserializing most blocks twice is a significant win.

LarryRuane commented at 7:19 pm on December 10, 2021: contributor

One further update on performance measurement: most of the -reindex time is spent in the second phase during which the node is building the best chain (the UpdateTip: new best= log messages), and this PR doesn’t change this part of the reindex process. This myNode required 4 days and 15 hours to complete the entire reindex (both phases). The 7 hours improvement is, of course, a much smaller fraction of this time, about 6%. That’s not as dramatic but still significant.

The reindex time was probably affected by the following configuration settings (which are the default for myNode, but I don’t know what all these do):

0txindex=1
1dbcache=1000
2maxorphantx=10
3maxmempool=250
4maxconnections=40
5maxuploadtarget=1000

Enabling txindex definitely slows down reindex, so without that option, the percentage improvement from this PR would increase slightly (but I don’t know by how much). Tor was also enabled, but I don’t think that affects reindex time.

I recall that IBD on this myNode took 2 or 3 weeks (I wish I had measured it, I can do that if anyone’s interested), so, at least on this hardware, doing a reindex is a much better option than a full IBD, to minimize both elapsed time and network usage.

stickies-v commented at 5:28 pm on December 17, 2021: contributor

tACK f8f9ba31f

I finished my performance review, and the results seem to align with Larry’s findings:

The test setup is as follows: on each machine, I ran the first part of the reindex multiple times consecutively to also measure the stability of the timings. This set of reindexes was ran both on bitcoin:master branch as well as on LarryRuane:reindex-speedup f8f9ba31f.

On my laptop, reindexing took 1h43 mins on average (n=3) on master branch. On f8f9ba31f, the average (n=3) took only 1h13m, a decrease of 29%. On my Raspberry Pi 4, reindexing took 18h46 mins on average (n=2) on master branch. On f8f9ba31f, the average (n=2) took only 13h27m, a decrease of 28%.

Across all these runs, the maximum runtime difference on each setup was < 4%, so even though sample size is pretty small, the results look quite reliable.

LarryRuane force-pushed on Dec 27, 2021

LarryRuane commented at 7:18 am on December 27, 2021: contributor

Force-pushed to implement review suggestion to not hold cs_main while deserializing the block, also clean up the commits, ready for review.

LarryRuane force-pushed on Dec 27, 2021

LarryRuane commented at 8:19 pm on December 27, 2021: contributor

Rebased to latest master to fix CI failure.

LarryRuane commented at 7:03 pm on December 28, 2021: contributor

Checking the reindex performance of the latest commit 7da6497bd7f0eaf6d20a1d0f83a2c849f3ba344f, on my Dell XPS-15 laptop, mainnet, default configuration except txindex is enabled. I also specified -networkactive=0 to eliminate any possible measurement noise caused by background networking activity (reindexing does not require any networking).

The result is that the reduction in reindex time with this PR is 82 minutes (from 362 minutes on master, down to 280 minutes with the PR for the first part of reindexing, adding all blocks to the index).

I also measured CPU time, and it’s 81 minutes lower with this PR, which makes sense since this PR is a CPU-time improvement. @hebasto, @fjahr, @vasild, @kallewoof, @pinheadmz, @stickies-v , would you mind taking another look and re-ACK if appropriate? Thanks.

DrahtBot added the label Needs rebase on Jan 27, 2022

LarryRuane force-pushed on Feb 5, 2022

DrahtBot removed the label Needs rebase on Feb 6, 2022

LarryRuane force-pushed on Feb 7, 2022

LarryRuane commented at 0:20 am on February 8, 2022: contributor

Force-pushed dd83d9e to fix CI failure with 5a9387d, which had a hidden merge conflict with #22932. This actually reduced the size and complexity of this patch.

theStack commented at 4:58 pm on April 17, 2022: contributor

Concept ACK

gades referenced this in commit efffb173db on May 9, 2022

LarryRuane force-pushed on Jun 8, 2022

LarryRuane force-pushed on Jun 9, 2022

LarryRuane commented at 5:20 am on June 9, 2022: contributor

Force-pushed to resolve hidden conflict, and make the functional test less brittle (not rely on hardcoded constants).

DrahtBot added the label Needs rebase on Jul 29, 2022

LarryRuane force-pushed on Aug 1, 2022

LarryRuane commented at 6:08 pm on August 1, 2022: contributor

Force-pushed rebase due to conflict with #25571

DrahtBot removed the label Needs rebase on Aug 1, 2022

in src/streams.h:690 in fa2dd4e9d5 outdated

683@@ -684,6 +684,26 @@ class CBufferedFile
684         }
685     }
686 
687+    //! move the read position ahead in the stream to the given position
688+    void SkipTo(const uint64_t file_pos) {
689+        assert(file_pos >= m_read_pos);
690+        // Use SetPos() to back up in the stream, not SkipTo().

stickies-v commented at 3:43 pm on August 2, 2022:

nit: I think we usually put comments before the relevant section?

0        // Use SetPos() to back up in the stream, not SkipTo().
1        assert(file_pos >= m_read_pos);

in src/streams.h:696 in fa2dd4e9d5 outdated

691+        if (file_pos > nReadLimit) {
692+            throw std::ios_base::failure("Attempt to position past buffer limit");
693+        }
694+        while (m_read_pos < file_pos) {
695+            if (m_read_pos == nSrcPos)
696+                Fill();

stickies-v commented at 4:06 pm on August 2, 2022:

I don’t think it’s worth the optimization so just thinking out loud, but for large skips (many times the buffer size) it would seem Fill() can be optimized to temporarily ignore nRewind so larger chunks (of size vchBuf.size() instead of vchBuf.size() - nRewind) are read at once? I don’t think we need to guarantee a rewind during the SkipTo(), until we’ve reached the destination position?

LarryRuane commented at 6:33 pm on August 4, 2022:

Good observation, and actually the same applies to the read() method. I looked into making this optimization, but I think it would make this PR too complex; maybe it can be done in a follow-up, but I don’t think it’s important in practice.

in src/streams.h:698 in fa2dd4e9d5 outdated

693+        }
694+        while (m_read_pos < file_pos) {
695+            if (m_read_pos == nSrcPos)
696+                Fill();
697+            unsigned int pos = m_read_pos % vchBuf.size();
698+            size_t nNow = file_pos - m_read_pos;

stickies-v commented at 4:15 pm on August 2, 2022:

nit: could use list initialization?

0            unsigned int pos{m_read_pos % vchBuf.size()};
1            size_t nNow{file_pos - m_read_pos};

in test/functional/feature_reindex.py:51 in fa2dd4e9d5 outdated

46+            # Find the offsets of blocks 2, 3, and 4 (the first 3 blocks beyond genesis)
47+            # by searching for the regtest marker bytes (see pchMessageStart).
48+            def find_block(b, start):
49+                while [b[start+i] for i in range(4)] != [0xfa, 0xbf, 0xb5, 0xda]:
50+                    start += 1
51+                return start+4

stickies-v commented at 8:34 pm on August 2, 2022:

You could also use the built-in find() function I think:

0                return b.find(b'\xfa\xbf\xb5\xda', start)+4

in src/test/streams_tests.cpp:453 in fa2dd4e9d5 outdated

447@@ -399,6 +448,17 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file_rand)
448                 break;
449             }
450             case 3: {
451+                // SkipTo is similar to the "read" cases above, except
452+                // we don't receive the data.
453+                size_t nSize = InsecureRandRange(5);

stickies-v commented at 8:38 pm on August 2, 2022:

nit: could use list initialization?

0                size_t nSize{InsecureRandRange(5)};

in src/validation.cpp:4306 in fa2dd4e9d5 outdated

4303             }
4304             try {
4305-                // read block
4306-                uint64_t nBlockPos = blkdat.GetPos();
4307+                // read block header
4308+                const uint64_t nBlockPos = blkdat.GetPos();

stickies-v commented at 8:38 pm on August 2, 2022:

nit: could use list initialization?

0                const uint64_t nBlockPos{blkdat.GetPos()};

in src/validation.cpp:4312 in fa2dd4e9d5 outdated

4315-                nRewind = blkdat.GetPos();
4316-
4317-                uint256 hash = block.GetHash();
4318+                CBlockHeader header;
4319+                blkdat >> header;
4320+                const uint256 hash = header.GetHash();

stickies-v commented at 8:39 pm on August 2, 2022:

nit: could use list initialization?

0                const uint256 hash{header.GetHash()};

in src/validation.cpp:4333 in fa2dd4e9d5 outdated

4347-                      if (state.IsError()) {
4348-                          break;
4349-                      }
4350+                        // This block can be processed immediately; rewind to its start, read and deserialize it.
4351+                        blkdat.SetPos(nBlockPos);
4352+                        std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();

stickies-v commented at 8:39 pm on August 2, 2022:

nit: could use list initialization?

0                        std::shared_ptr<CBlock> pblock{std::make_shared<CBlock>()};

stickies-v approved

stickies-v commented at 8:41 pm on August 2, 2022: contributor

re-ACK fa2dd4e9d57f3f0240273b97d885079f6a57ebc8

Left a few more nits/suggestions, but please disregard if they’d delay merging this PR, nothing crucial.

LarryRuane force-pushed on Aug 4, 2022

LarryRuane commented at 6:28 pm on August 4, 2022: contributor

@stickies-v, thanks for the review, I decided to force-push all your suggestions. This force-push also includes a small refactor (adds the FillBufferUpToLength() to eliminate some duplicated code). Just to make sure the performance improvement still exists, I reindexed testnet3 (it’s much faster than mainnet), and with this PR in its current form, the time to read the block files (set up the block index) is 8 minutes, and without this PR, the time is 13 minutes.

LarryRuane force-pushed on Aug 4, 2022

in src/streams.h:652 in 50e922f9ec outdated

639+        if (length > vchBuf.size() - buffer_offset)
640+            length = vchBuf.size() - buffer_offset;
641+        if (length > nSrcPos - m_read_pos)
642+            length = nSrcPos - m_read_pos;
643+        return length;
644+    }

stickies-v commented at 7:15 pm on August 4, 2022:

Ahh, the code duplication was bothering me a bit. Nice cleanup! I think, while we’re on it, this would make it even better:

0    size_t FillBufferUpToLength(const size_t buffer_offset, const size_t length)
1    {
2        assert(m_read_pos <= nSrcPos);
3        if (m_read_pos == nSrcPos) Fill();
4        return std::min({length, vchBuf.size() - buffer_offset, static_cast<size_t>(nSrcPos - m_read_pos)});
5    }

stickies-v commented at 7:34 pm on August 4, 2022:

Or for improved readability (not sure if necessary, and variable names just a suggestion):

0    size_t FillBufferUpToLength(const size_t buffer_offset, const size_t length)
1    {
2        assert(m_read_pos <= nSrcPos);
3        if (m_read_pos == nSrcPos) Fill();
4        size_t buffer_available{vchBuf.size() - buffer_offset};
5        size_t bytes_until_source{nSrcPos - m_read_pos};
6        return std::min({length, buffer_available, bytes_until_source});
7    }

stickies-v commented at 12:09 pm on August 5, 2022:

What do you think about this diff (based on d60fa16)? I didn’t like the fact that FillBufferUpToLength() would call Fill() but would not progress m_read_pos. With this change, (what’s now called) ReadBufferOnceUpToLength() both progress m_read_pos and calls Fill() if necessary, without doing any looping.

Besides feeling more intuitive (to me), it also allows for some more code cleanup. My main concern is that in read() we now progress m_read_pos before copying it to dst, so if something fails there we’d need to rewind the buffer. I don’t actually see that be a problem, but wanted to mention it in case I’m missing something?

Oh and I reverted everything back to uint64_t, since those are the types used for CBufferedFile’s relevant members? Hopefully that’ll fix the CI issues, but I’m a bit out of my depth here.

 0
 1diff --git a/src/streams.h b/src/streams.h
 2index 23e617e56..dbc6e4c03 100644
 3--- a/src/streams.h
 4+++ b/src/streams.h
 5@@ -629,15 +629,18 @@ private:
 6         return true;
 7     }
 8 
 9-    //! Make up to the requested 'length' (but at least 1) bytes available
10-    //! in the buffer, starting at stream offset 'm_read_pos', filling the
11-    //! buffer if no bytes are available.
12-    //! Return the (possibly lower) number of bytes actually available.
13-    size_t FillBufferUpToLength(size_t buffer_offset, size_t length)
14+    //! Read up to the requested 'length' bytes in the buffer, starting at stream offset
15+    //! 'm_read_pos', filling the buffer if no bytes are available.
16+    //! Return how many bytes were read in this iteration, which may be less than 'length'.
17+    uint64_t ReadBufferOnceUpToLength(const uint64_t length)
18     {
19         assert(m_read_pos <= nSrcPos);
20-        if (m_read_pos == nSrcPos) Fill();
21-        return std::min({length, vchBuf.size() - buffer_offset, static_cast<size_t>(nSrcPos - m_read_pos)});
22+        if (m_read_pos == nSrcPos && length > 0) Fill();
23+        uint64_t advance{std::min({length,                                       // bytes requested
24+                                   vchBuf.size() - (m_read_pos % vchBuf.size()), // bytes until buffer end
25+                                   nSrcPos - m_read_pos})};                      // bytes not yet read
26+        m_read_pos += advance;
27+        return advance;
28     }
29 
30 public:
31@@ -680,12 +683,10 @@ public:
32         if (dst.size() + m_read_pos > nReadLimit) {
33             throw std::ios_base::failure("Read attempted past buffer limit");
34         }
35-        while (dst.size() > 0) {
36-            uint64_t pos{m_read_pos % vchBuf.size()};
37-            size_t length{FillBufferUpToLength(pos, dst.size())};
38-            memcpy(dst.data(), &vchBuf[pos], length);
39-            dst = dst.subspan(length);
40-            m_read_pos += length;
41+        while (uint64_t bytes_read{ReadBufferOnceUpToLength(dst.size())}) {
42+            uint64_t pos{(m_read_pos - bytes_read) % vchBuf.size()};
43+            memcpy(dst.data(), &vchBuf[pos], bytes_read);
44+            dst = dst.subspan(bytes_read);
45         }
46     }
47 
48@@ -697,11 +698,7 @@ public:
49         if (file_pos > nReadLimit) {
50             throw std::ios_base::failure("Attempt to position past buffer limit");
51         }
52-        while (m_read_pos < file_pos) {
53-            uint64_t pos{m_read_pos % vchBuf.size()};
54-            size_t length{FillBufferUpToLength(pos, file_pos - m_read_pos)};
55-            m_read_pos += length;
56-        }
57+        while (ReadBufferOnceUpToLength(file_pos - m_read_pos));
58     }
59 
60     //! return the current reading position

LarryRuane commented at 7:17 pm on August 4, 2022: contributor

Force-pushed again to fix (I hope) CI failure (ARM (32-bit), where size_t is 32 bits instead of 64).

in src/streams.h:634 in 50e922f9ec outdated

628@@ -629,6 +629,20 @@ class CBufferedFile
629         return true;
630     }
631 
632+    //! Try to get up to the requested length bytes available in the buffer,
633+    //! starting at stream offset m_read_pos, filling the buffer if necessary.
634+    //! Return the (possibly lower) number of bytes actually available,

stickies-v commented at 7:22 pm on August 4, 2022:

0    //! Return the (possibly lower) number of bytes actually available.

in src/streams.h:633 in 50e922f9ec outdated

628@@ -629,6 +629,20 @@ class CBufferedFile
629         return true;
630     }
631 
632+    //! Try to get up to the requested length bytes available in the buffer,
633+    //! starting at stream offset m_read_pos, filling the buffer if necessary.

stickies-v commented at 7:24 pm on August 4, 2022:

I think “filling the buffer if necessary.” is confusing - we only fill the buffer if m_read_pos == nSrcPos, whereas I’d interpret “if necessary” as more broad than that.

stickies-v commented at 7:27 pm on August 4, 2022: contributor

I like this force push! Left a few more comments (on the change) that I think would be good to implement before re-ACK.

LarryRuane force-pushed on Aug 4, 2022

LarryRuane commented at 9:33 pm on August 4, 2022: contributor

Force-pushed for review suggestions, thanks!

LarryRuane force-pushed on Aug 6, 2022

LarryRuane commented at 2:18 am on August 6, 2022: contributor

Force-pushed to fix CI failures

LarryRuane force-pushed on Aug 9, 2022

LarryRuane commented at 10:45 pm on August 9, 2022: contributor

Force-pushed to address review comments (thank you, @stickies-v), ready for review.

stickies-v approved

stickies-v commented at 11:56 am on August 10, 2022: contributor

re-ACK 451b1f3

I think this latest refactoring strikes a nice balance between optimization, maintenance and readability. Thank you for incorporating my comments, and my apologies to previous reviewers for making you drift away a bit further from the initial proposal. I think this is great! Hopefully we can get some more (re) ACKs on this soon.

DrahtBot added the label Needs rebase on Aug 30, 2022

achow101 added the label Needs Benchmark on Oct 12, 2022

LarryRuane commented at 7:01 pm on October 20, 2022: contributor

Rebased, I’ll look into adding a benchmark test.

LarryRuane force-pushed on Oct 20, 2022

DrahtBot removed the label Needs rebase on Oct 20, 2022

in src/streams.h:615 in 6c8bc9a19c outdated

611@@ -612,7 +612,7 @@ class CBufferedFile
612     uint64_t nRewind;     //!< how many bytes we guarantee to rewind
613     std::vector<std::byte> vchBuf; //!< the buffer
614 
615-protected:
616+private:

aureleoules commented at 12:59 pm on October 21, 2022:

nit e1df0eb806d919cecceebcb2c72c3f54b1a529f5

there is already a private identifier above

aureleoules approved

aureleoules commented at 3:37 pm on October 21, 2022: member

Code Review ACK 6c8bc9a19cc9f35a93d264b69f63d4c93e2c7da2 LGTM and saw a 25%-30% reduction in the first part of reindexing on testnet.

in test/functional/feature_reindex.py:49 in 6c8bc9a19c outdated

44+            b = bf.read(2000)
45+
46+            # Find the offsets of blocks 2, 3, and 4 (the first 3 blocks beyond genesis)
47+            # by searching for the regtest marker bytes (see pchMessageStart).
48+            def find_block(b, start):
49+                return b.find(b'\xfa\xbf\xb5\xda', start)+4

theStack commented at 2:09 pm on October 24, 2022:

nit: could replace these magic bytes

0                return b.find(MAGIC_BYTES["regtest"], start)+4

(the MAGIC_BYTES dictionary is defined in test_framework.p2p)

theStack approved

theStack commented at 2:50 pm on October 24, 2022: contributor

Code-review ACK 6c8bc9a19cc9f35a93d264b69f63d4c93e2c7da2

In addition to the performance improvement, this PR also increases the readibility of the CBufferedFile::read logic by using more expressive names (special kudos for using std::min for determining the advance value, nice idea!).

Did three -reindex runs for both the master and PR branch (with a patch to shut down after emitting the “Reindexing finished” debug message) each on testnet at block height 2377503 with quite significant speed-up results:

master:

0real    4m58.987s, user    4m0.337s, sys     0m38.207s
1real    4m54.370s, user    4m2.202s, sys     0m36.742s
2real    4m54.738s, user    4m2.151s, sys     0m37.362s

PR:

0real    3m33.332s, user    2m42.151s, sys     0m36.165s
1real    3m52.371s, user    2m42.028s, sys     0m37.558s
2real    3m36.388s, user    2m41.354s, sys     0m36.710s

LarryRuane force-pushed on Oct 24, 2022

LarryRuane commented at 6:48 pm on October 24, 2022: contributor

Force-pushed to add benchmark (@achow101 please remove Needs Benchmark label), the PR shows about 16 to 1 improvement on my system.

Also addressed review comments (thanks). @theStack, @aureleoules, @stickies-v, @vasild, @hebasto, please re-review, thank you!

LarryRuane force-pushed on Oct 24, 2022

Add LoadExternalBlockFile() benchmark 48a68908ba

util: add CBufferedFile::SkipTo() to move ahead in the stream

SkipTo() reads data from the file into the CBufferedFile object
(memory), but, unlike this object's read() method, SkipTo() doesn't
transfer data into a caller's memory buffer. This is useful because
after skipping forward in the stream in this way, the user can, if
needed, rewind the stream (SetPos()) and access the object's memory
buffer including ranges that were skipped over (without needing to
read from the disk file).

c72de9990a

Faster -reindex by initially deserializing only headers

When a block is initially read from a blk*.dat file during reindexing,
it can be added to the block index only if all of its ancestor blocks
have been added, which is rare. If the block's ancestors have not been
added, the block must be re-read from disk later when it can be added.

This commit: During the initial block read, deserialize only its header,
rather than the entire block, since this is sufficient to determine
if its parent (and thus all its ancestors) has been added. This is a
performance improvement.

db929893ef

LarryRuane force-pushed on Oct 24, 2022

LarryRuane commented at 7:03 pm on October 24, 2022: contributor

Another force-push (actually 2 of them) to fix linter errors.

aureleoules approved

aureleoules commented at 12:36 pm on October 25, 2022: member

ACK db929893ef0bc86ea2708cdbcf41152240cd7c73 - minor changes and new benchmark since last review

Force-pushed to add benchmark, the PR shows about 16 to 1 improvement on my system.

The results of this benchmark are even more impressive on my machine (SSD):

Master

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
1,088,323,673.00	0.92	0.9%	7,649,033,881.00	2,438,838,336.00	3.136	412,805,077.00	0.7%	12.18	`LoadExternalBlockFile`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
1,078,400,062.00	0.93	0.2%	7,649,033,832.00	2,417,239,776.00	3.164	412,805,028.00	0.6%	11.88	`LoadExternalBlockFile`

PR (~37x faster)

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
29,175,159.00	34.28	0.6%	1,369,544.00	1,728,864.00	0.792	88,238.00	1.2%	0.34	`LoadExternalBlockFile`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
27,193,734.00	36.77	1.2%	1,369,543.00	1,647,264.00	0.831	88,237.00	1.5%	0.32	`LoadExternalBlockFile`

theStack approved

theStack commented at 3:48 pm on October 25, 2022: contributor

re-ACK db929893ef0bc86ea2708cdbcf41152240cd7c73

The benchmark yields a ~13.9x speed-up on my machine.

master branch (commit 48a68908ba3d5e077cda7bd1e908b923fbead824):

ns/op	op/s	err%	total	benchmark
309,424,533.00	3.23	0.6%	3.41	`LoadExternalBlockFile`

PR branch (commit db929893ef0bc86ea2708cdbcf41152240cd7c73):

ns/op	op/s	err%	total	benchmark
22,226,060.00	44.99	3.6%	0.26	`LoadExternalBlockFile`

in src/bench/load_external.cpp:23 in db929893ef

18+ * from disk later when it encounters its parent.)
19+ *
20+ * This benchmark measures the performance of deserializing the block (or just
21+ * its header, beginning with PR 16981).
22+ */
23+static void LoadExternalBlockFile(benchmark::Bench& bench)

stickies-v commented at 8:00 pm on October 25, 2022:

nit: for grepping purposes, perhaps useful to name this fn slightly different?

0static void BenchLoadExternalBlockFile(benchmark::Bench& bench)

in src/bench/load_external.cpp:43 in db929893ef

38+    // Create the test file.
39+    {
40+        // "wb+" is "binary, O_RDWR | O_CREAT | O_TRUNC".
41+        FILE* file{fsbridge::fopen(blkfile, "wb+")};
42+        // Make the test block file about 128 MB in length.
43+        for (size_t i = 0; i < node::MAX_BLOCKFILE_SIZE / ss.size(); ++i) {

stickies-v commented at 8:03 pm on October 25, 2022:

nit: a bit more clear without i imo.

0        for (size_t written = 0; written < node::MAX_BLOCKFILE_SIZE; written += ss.size()) {

stickies-v approved

stickies-v commented at 8:12 pm on October 25, 2022: contributor

re-ACK db929893e

Did not go super deep on the benchmark test code but looks alright. Other than that, no meaningful changes since my previous ACK.

I saw a bench performance increase of ~24x, very impressive.

2 nits, but would prefer to try and get merged as is - only for when you really need to retouch.

WIthout improvement (48a68908ba3d5e077cda7bd1e908b923fbead824):

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|      246,969,041.00 |                4.05 |    2.1% |      2.71 | `LoadExternalBlockFile`

With improvement (db929893ef0bc86ea2708cdbcf41152240cd7c73):

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|       10,413,666.00 |               96.03 |    0.6% |      0.12 | `LoadExternalBlockFile`

andrewtoth commented at 7:29 pm on October 28, 2022: contributor

ACK db929893ef0bc86ea2708cdbcf41152240cd7c73

Benchmark runs roughly 42x speedup over master. Very nice!

~~Would it make sense to have a fuzz target for CBufferedFile?~~ I see that exists already.

This Branch (db929893ef0bc86ea2708cdbcf41152240cd7c73)

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
15,150,607.00	66.00	2.0%	1,366,806.00	948,612.00	1.441	87,657.00	0.6%	0.20	`LoadExternalBlockFile`

Master (1bad29fe02096b269858b192a4a91c785a44e505) with the benchmark commit (48a68908ba3d5e077cda7bd1e908b923fbead824) cherry-picked

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
629,905,824.00	1.59	1.8%	7,542,006,593.00	1,353,516,328.00	5.572	392,803,992.00	0.4%	7.07	`LoadExternalBlockFile`

achow101 commented at 0:07 am on November 16, 2022: member

ACK db929893ef0bc86ea2708cdbcf41152240cd7c73

achow101 merged this on Nov 16, 2022

achow101 closed this on Nov 16, 2022

LarryRuane deleted the branch on Nov 16, 2022

sidhujag referenced this in commit 111faa6b0b on Nov 16, 2022

bitcoin locked this on Nov 16, 2023

Improve runtime performance of –reindex #16981

Background

This PR

Conflicts

TL;DR

Explanation

Master

PR (~37x faster)