I haven’t checked the linked PR in detail (I need some stopping condition, and I think this PR should explain the important context itself), but until now I didn’t get the impression that we weren’t doing any compaction here before. I thought you meant that “it’s struggling to keep up”.
Have a few questions here:
- could we rather add this to the other background compaction thread? Do we understand why there’s no automatic compaction?
- does it make sense to do the compaction so often? There’s a reason that’s usually not done for every change: garbage collection is usually a rare event. Wouldn’t it suffice to do it only once, at the very end? Maybe every 10k blocks isn’t that often; I’ll see if there’s a way to measure it.
- We’re doing a compaction and logging for genesis (height == 0) as well, right? Is that deliberate?
- We seem to be writing each entry separately to LevelDB in a single-value batch and fsync-ing immediately. Could we use some kind of batching like we do with the UTXO set? https://github.com/bitcoin/bitcoin/blob/e419b0e17f8acfe577c35c62a8a71a19aad249f3/src/txdb.cpp#L94, TxIndex::CustomAppend and BaseIndex::Commit already write in batches. Wouldn’t that fix the coinstatsindex small-files problem better than doing compactions manually? It should also be a lot faster.
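
As a rough way to put a number on "how often", here is a minimal sketch that counts how many compactions an interval implies. It assumes the trigger is `height % interval == 0` (which would also explain the compaction at genesis); the function name `compaction_count` and the tip height of 900000 are hypothetical and not taken from the PR.

```shell
# compaction_count TIP_HEIGHT INTERVAL
# Counts the heights in [0, TIP_HEIGHT] for which height % INTERVAL == 0.
# Assumption: this is the trigger condition; the PR may use a different one.
compaction_count() {
    echo $(( $1 / $2 + 1 ))   # +1 because height 0 (genesis) also matches
}

TIP_HEIGHT=900000   # hypothetical chain tip, not a measured value
compaction_count "$TIP_HEIGHT" 10000    # prints 91
compaction_count "$TIP_HEIGHT" 100000   # prints 10
```

Under these assumptions, a 10k interval means on the order of a hundred compactions over a full index build, compared with a single one if it were only done at the end.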
I’d also like to see a global macro benchmark, since the continuous compaction most likely slows indexing down, but I haven’t worked with this area of the code yet, so it may take me a while to set it up.
Here’s what I did to benchmark the performance of only the indexing - created 3 commits:
For simplicity the application terminates after the indexes are successfully created and we’re measuring the time it took to iterate and index them.
I’ve started the application with -connect=0 -coinstatsindex=1 to show the progress without any additional activity (for stability), and measured the output folder’s size and number of files.
COMMITS="62d3ad137f70803b210623c51ea3b8a20996b39b 7779473a6e2f3b9fa5b8b97800e29d6fb3299aad abab616a2ba4701816c582dab24593f3f8200b0a 7d3fa1c8e99828751187b13fb1e2de11052c6215";\
DATA_DIR="/mnt/my_storage/BitcoinData";\
for COMMIT in $COMMITS; do\
  killall bitcoind 2>&1; rm -rf "$DATA_DIR/indexes" "$DATA_DIR/debug.log" &&\
  git fetch -q origin "$COMMIT" >/dev/null 2>&1 || true && git checkout -q "$COMMIT" && git log -1 --pretty='%h %s' "$COMMIT" &&\
  cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C build >/dev/null 2>&1 &&\
  ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0 2>&1;\
  grep 'Indexing finished migrating' "$DATA_DIR/debug.log";\
  du -h "$DATA_DIR/indexes/coinstatsindex/db";\
  find "$DATA_DIR/indexes/coinstatsindex/db" -type f -name '*.ldb' | wc -l;\
done
It seems the indexing takes several hours; I’ll post the results here after it finishes.
Edit:
- no compaction: 33869s, 211MB, 112 files
- compact at every 10k blocks: 33998s, 211MB, 16 files
- compact at every 100k blocks: 34055s, 211MB, 36 files
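
To put the timings in relative terms, here is a small sketch that computes the slowdown of each compaction variant against the no-compaction run. The helper name `overhead_pct` is made up for illustration; the inputs are the measured seconds listed above.

```shell
# overhead_pct BASELINE_SECONDS RUN_SECONDS
# Prints how much slower RUN is than BASELINE, in percent (two decimals).
overhead_pct() {
    awk -v base="$1" -v run="$2" 'BEGIN { printf "%.2f\n", (run - base) / base * 100 }'
}

overhead_pct 33869 33998   # compact at every 10k blocks  -> 0.38
overhead_pct 33869 34055   # compact at every 100k blocks -> 0.55
```

Both variants are well under 1% slower than the baseline in this single run, which may be within run-to-run noise, so the timing difference on its own probably shouldn't be over-interpreted.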