I haven’t checked the linked PR in detail (I have to have some stopping condition, and I think the PR should explain the important context), but until now I didn’t get the impression that we weren’t doing any compaction here before. I just thought you meant that “it’s struggling to keep up”.
Have a few questions here:
- Could we instead add this to the other background compaction thread? Do we understand why there’s no automatic compaction?
- Does it make sense to do the compaction so often? There’s a reason that’s usually not done for every change: garbage collection is usually a rare event. Wouldn’t it suffice to do it once at the very end only? Maybe 10k isn’t that often; I’ll see if there’s a way to measure it.
- We’re doing a compaction and logging for genesis (height == 0) as well, right? Is that deliberate?
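For a rough sense of how often that would be, here is a minimal sketch. It assumes the trigger is a plain `height % interval == 0` check, which is only my reading of the comments above and is not verified against the PR. The function name and the tip height of 900000 are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: how many heights in [0, tip] satisfy height % interval == 0.
# Assumes a simple modulo trigger; the real condition in the PR may differ.
count_compactions() {
    # $1 = interval (e.g. 10000), $2 = tip height (illustrative value)
    # Heights 0, interval, 2*interval, ... up to tip all match,
    # so genesis (height 0) is always counted.
    echo $(( $2 / $1 + 1 ))
}

count_compactions 10000 900000
```

Under that assumption the count stays under a hundred for the whole run, and the `+ 1` is exactly the genesis case asked about in the last bullet.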
I want to see a global, macro benchmark as well, since the continuous compaction most likely slows it down, but I haven’t worked with this area of the code yet, so it may take a while for me to set it up.
Here’s what I did to benchmark the performance of only the indexing - created 3 commits:
For simplicity the application terminates after the indexes are successfully created and we’re measuring the time it took to iterate and index them.
I’ve started the application with -connect=0 -coinstatsindex=1 to show the progress without any additional activity (for stability), and I measure the output folder’s size and number of files.
COMMITS="d1e2c95c4d5db1fcfa200276e9053f432a92ff29 6028fdf13a5059c9f44e9a8c4e5a04bc4346749b 461b92236f91b6a1090222185ffa20e481ea02de"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; \
for COMMIT in $COMMITS; do \
  git fetch -q origin "$COMMIT" >/dev/null 2>&1 || true && \
  git checkout -q "$COMMIT" && \
  git log -1 --pretty='%h %s' "$COMMIT" && \
  rm -rf "$DATA_DIR/indexes/coinstats" "$DATA_DIR/debug.log" && \
  cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && \
  ninja -C build >/dev/null 2>&1 && \
  ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0; \
  grep -m1 'Indexing finished migrating' "$DATA_DIR/debug.log" || true; \
  du -h "$DATA_DIR/indexes/coinstats/db"; \
  find "$DATA_DIR/indexes/coinstats/db" -type f -name '*.ldb' | wc -l; \
done
It seems the indexing takes several hours. I’ll post the result here after it finishes.