bitcoind shouldn’t fail to progress with synchronization: endless [leveldb] Generated table … logs #31882

issue GregTonoski opened this issue on February 16, 2025
  1. GregTonoski commented at 3:00 pm on February 16, 2025: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    bitcoind doesn’t progress with synchronization despite an approx. 4-month lag, doesn’t consume much CPU or RAM, uses disk IO, and produces a large amount of log output such as [leveldb] Generated table #3982690@4: 31588 keys, 2162389 bytes for hours (one test ran for more than 8 hours).

    Reproduced 3 times out of 3 tests (one with a pre-built version 27).

    Expected behaviour

    bitcoind finishes synchronization in an hour or so.

    Steps to reproduce

    Preconditions:

    1. Bitcoin 28.1
    2. Pruned node (550MB)

    Steps:

    1. Launch bitcoind: bitcoind -datadir=/mnt/sd/Bitcoin_data -daemon=0 -debug=all -blocksonly=1 -debuglogfile=/tmp/debug.log -prune=550 -dbcache=2048

    Relevant log output

    2025-02-16T13:04:33Z [leveldb] Generated table #3987352@3: 31812 keys, 2161649 bytes

    Also, after the command bitcoin-cli stop:

    error code: -28
    error message: Starting network threads…

    logs.tar.gz

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    28.1

    Operating system and version

    Arch Linux

    Machine specifications

    Intel x64, 4GB RAM, 32GB SD (exfat), 1Gbit Ethernet, 100/10 Mbit/s FTTH.

  2. GregTonoski renamed this:
    bitcoind shouldn't fail to progress with synchronization: endless [leveldb] Generated table... logs
    bitcoind shouldn't fail to progress with synchronization: endless [leveldb] Generated table ... logs
    on Feb 16, 2025
  3. l0rinc commented at 4:04 pm on February 16, 2025: contributor

    bitcoind finishes synchronization in an hour or so.

    That alone would require a 1.5 Gbps download speed - and even then, we’re not there yet (especially with such a resource-constrained setup). But you could give AssumeUTXO a try to see if it helps - this will jump the process ahead to 95% complete and validate the blocks afterward.
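
    For instance, loading a snapshot could look like this (a sketch: the file path is illustrative, and the snapshot must match a height hardcoded in the release you run):

    # load a UTXO set snapshot via the loadtxoutset RPC (available in recent releases)
    bitcoin-cli loadtxoutset /mnt/sd/utxo-snapshot.dat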

    bitcoind doesn’t progress with synchronization

    The attached iostats and debug.log indicate that the bottleneck is the extremely slow write speed of the SD card (mmcblk0 is often below 1 MB/s), which makes LevelDB the main limiting factor and dramatically slows down synchronization. The logs show frequent LevelDB compaction activity in the chainstate database, meaning that during block verification bitcoind is repeatedly compacting the database, severely hindering progress. Additionally, exFAT is quite problematic; moving the -datadir to an SSD should drastically improve performance.

    Produces a large amount of logs

    The -debug=all flag generates an excessive amount of logs and isn’t typically needed for normal operation.
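
    For example, logging could be restricted to the categories relevant here (a sketch using standard -debug category names):

    # log only leveldb and prune activity instead of everything
    bitcoind -datadir=/mnt/sd/Bitcoin_data -blocksonly=1 -prune=550 -debug=leveldb -debug=prune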

  4. GregTonoski commented at 4:41 pm on February 16, 2025: none

    That alone would require a 1.5 Gbps download speed - and even then, we’re not there yet (especially with such a resource-constrained setup).

    Disagree with the assumption of a 1.5 Gbps download speed requirement. This is not a complete initial block download from 2009 to 2025; there are only about 4 months of block data that remain to be downloaded.

    I would guess there is a livelock or deadlock or similar issue.

  5. maflcko added the label Resource usage on Feb 17, 2025
  6. maflcko commented at 7:11 am on February 17, 2025: member
    Does it happen with a smaller (default) dbcache? Does it happen on a different filesystem than exfat? Does it happen on a different filesystem hardware, maybe something that is faster? (Can you benchmark the hardware?)
  7. maflcko added the label Questions and Help on Feb 17, 2025
  8. GregTonoski commented at 8:10 am on February 17, 2025: none

    Does it happen with a smaller (default) dbcache?

    Yes, it does happen with default settings, including the default dbcache value of 450.

  9. sipa commented at 2:54 pm on February 18, 2025: member

    would guess there is a livelock or deadlock or similar issue.

    Is there progress at all? As in, is the number of validated blocks not going up at all? That would be a bug. EDIT: just saw you included logs. It’s just very slow, so clearly no deadlock or livelock.

    If it’s just (very) slow, that may be due to low I/O speed of your disk + pruning + low dbcache. #28280 may speed things up somewhat (which will be in 29.0, not in any release yet).

  10. GregTonoski commented at 6:53 pm on February 18, 2025: none

    just saw you included logs. It’s just very slow, so clearly no deadlock or livelock

    I would suggest investigating the number of “Generated table …” logs. There shouldn’t be so many of them, especially in comparison to a correct execution.

    There are:

    • 17176 logs in total, out of which:
      — 15117 logs (or 88%) are of the type [leveldb], out of which:
        — 6470 logs are of the type [leveldb] Generated table ....
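
    For reference, counts like these can be reproduced by grepping the debug log (a sketch; the path matches the -debuglogfile used above):

    # total [leveldb] lines, then the "Generated table" subset
    grep -c '\[leveldb\]' /tmp/debug.log
    grep -c 'Generated table' /tmp/debug.log
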
  11. maflcko commented at 7:14 pm on February 18, 2025: member
    How fast is the hardware? What is the speed for sequential reads/writes that do not go through the filesystem cache? What is the speed for random reads/writes that also do not go through the filesystem cache?
  12. GregTonoski commented at 7:35 pm on February 18, 2025: none

    How fast is the hardware? What is the speed for sequential reads/writes that do not go through the filesystem cache? What is the speed for random reads/writes that also do not go through the filesystem cache?

    Please tell me what values are expected normally and I will measure the actual ones.
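
    One way to measure both, bypassing the filesystem cache, is fio with direct I/O (a sketch; fio is a separate benchmarking tool, and the target directory and sizes are illustrative):

    # sequential throughput with O_DIRECT, so the page cache is bypassed
    fio --name=seqwrite --directory=/mnt/sd --rw=write --bs=1M --size=256M --direct=1
    fio --name=seqread --directory=/mnt/sd --rw=read --bs=1M --size=256M --direct=1
    # 4k random reads/writes, the access pattern UTXO lookups stress
    fio --name=randrw --directory=/mnt/sd --rw=randrw --bs=4k --size=256M --direct=1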

  13. GregTonoski commented at 9:00 pm on February 18, 2025: none

    If it’s just (very) slow, that may be due to low I/O speed of your disk

    Disagree. The alleged root cause is not supported by the data:

    %iowait = 30.51%
    kB_read = 21 624 950
    kB_wrtn = 10 959 841

    Only 31 blocks (~120 MB) were downloaded during that time (approx. 90 minutes), e.g.:

    2025-02-16T11:51:13Z UpdateTip: new best=000000000000000000020d77c16bf3836faf1ef3af28f5f31fbdd8955557d6f0 height=869931 version=0x3fffe000 log2_work=95.263443 tx=1114900731 date='2024-11-12T03:25:56Z' progress=0.952213 cache=2.0MiB(14666txo)

  14. GregTonoski commented at 9:05 pm on February 18, 2025: none

    20250218-debug.log

    Does it happen on a different filesystem than exfat? Does it happen on a different filesystem hardware, maybe something that is faster? (Can you benchmark the hardware?)

    Yes, the same occurs in the same environment except with NTFS instead of exFAT and an HDD instead of an SD card.

  15. GregTonoski commented at 3:07 pm on March 20, 2025: none

    so clearly no deadlock or livelock.

    If it is neither a deadlock nor a livelock, then the algorithm may be slow and suboptimal (a bottleneck). There are up to 4 MB of data that need to be downloaded and written to disk per block. On top of that, a fraction of the data may need to be read and written as the “chainstate” index (the UTXO set). Apparently, the current algorithm results in orders-of-magnitude higher disk usage. I suppose the LSM-tree may drive the excessive disk usage: entry duplication, repeated reshuffling of data across the disk during merge/compaction operations, index rebuilding, and so on.

    There are records of 7 values stored in the “chainstate” index: 32-byte transaction hash, output index length, output index, true/false coinbase, block height, scriptPubKey, amount (ref.). It should be possible to fit them into a simpler and more efficient data structure. Why not use a file system directly instead of LevelDB (or any additional overhead)? What would be the downside?
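
    For illustration, the fields stored per unspent output can be inspected through the gettxout RPC (the txid here is a placeholder):

    # show the stored data for one unspent output: value, scriptPubKey, coinbase flag
    bitcoin-cli gettxout "<txid>" 0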

    #28280 may speed things up somewhat (which will be in 29.0, not in any release yet).

    A 10-20% improvement is below my expectations. I’m looking for 1000%.

  16. tnndbtc commented at 0:04 am on June 4, 2025: none
    @GregTonoski Just curious, how did you get the pruned node of 550MB? I’d like to explore this issue a little more, but I searched the internet and wasn’t able to find an easy way to get a pruned node to start with.
  17. GregTonoski commented at 9:29 am on June 4, 2025: none

    @GregTonoski Just curious, how did you get the pruned node of 550MB? I’d like to explore this issue a little more, but I searched the internet and wasn’t able to find an easy way to get a pruned node to start with.

    A pruned node can be set up by copying data (a.k.a. a snapshot of the UTXO set/chainstate and blocks) into the data directories, see e.g. https://github.com/Blockchains-Download/Bitcoin/releases or https://bitcoin-snapshots.jaonoctus.dev. For more information visit bitcointalk.org.

  18. tnndbtc commented at 9:45 pm on June 4, 2025: none
    @GregTonoski For your particular test, you used a snapshot for Mainnet (not assumeutxo@840,000), I assume?
  19. tnndbtc commented at 4:00 am on June 15, 2025: none

    @GregTonoski For the past several days, I have run the test based on the input provided, and it typically finishes the initial block download within 3 hours. I tested multiple times but couldn’t reproduce the issue on my MacBook Pro (8 cores, 16GB memory, SSD hard drive, ~50 Mbps internet speed, upload and download).

    Snapshot downloaded: https://bitcoin-snapshots.jaonoctus.dev/
    Block height: 891,749 on Mainnet

    For one particular test result:
    blocks downloaded before InitialBlockDownload turned to false: 9563
    total time: 162 mins (2h 42m)

    Build configuration (Debug build): cmake -B build -DENABLE_WALLET=ON -DCMAKE_CXX_COMPILER=/opt/homebrew/opt/llvm@16/bin/clang++ -DCMAKE_C_COMPILER=/opt/homebrew/opt/llvm@16/bin/clang -DCMAKE_BUILD_TYPE=Debug

    Run as daemon, because I need to create a pid file to rotate the debug log due to the small disk. bitcoin.conf used:

    server=1              # Enable RPC server
    daemon=1              # Run as daemon (Linux/Unix)
    pid=/Users/user/tmp/bitcoin_data/bitcoind.pid
    datadir=/Users/user/tmp/bitcoin_data
    rpcbind=127.0.0.1     # Only listen for RPC on localhost
    rpcallowip=127.0.0.1  # Only allow RPC from localhost
    rpcport=8332          # Default RPC port
    prune=550             # Keep ~550 MB of the most recent blockchain data
    dbcache=2048          # Memory cache size in MB (default is 300)
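
    As an aside, a sketch of the rotation this enables (bitcoind reopens its debug log on receiving SIGHUP; paths match the config above):

    # rotate the log without stopping the node
    mv /Users/user/tmp/bitcoin_data/debug.log /Users/user/tmp/bitcoin_data/debug.log.1
    kill -HUP "$(cat /Users/user/tmp/bitcoin_data/bitcoind.pid)"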

    commit hash: f94167512d on master branch

    CPU (bitcoind): 200% - 400% across 8 cores, so an average 25% - 50% CPU utilization for the bitcoind process

    macOS: Sequoia 15.1.1
    Physical memory: 16GB
    CPUs: 8, Apple M1 chipset
    Hard drive: SSD

    Internet speed (measured in a spot with slower internet access): download 54 Mbps (megabits, not bytes), upload 53 Mbps

    Commands used:

    # to start bitcoind as daemon
    % build/bin/bitcoind -datadir=/Users/user/tmp/bitcoin_data -debug=all -blocksonly=1 -debuglogfile=/Users/user/tmp/bitcoin_data/debug.log -conf=/Users/user/tmp/bitcoin_data/bitcoin.conf

    # to check downloaded blocks
    % ./build/bin/bitcoin-cli -rpcport=8332 -rpcuser=_cookie_ -rpcpassword=my_cookie getblockchaininfo

    Final output when the IBD is done:

    {
      "chain": "main",
      "blocks": 901312,
      "headers": 901312,
      "bestblockhash": "00000000000000000000e3eb2c88359adaf3ed509d488577583a37ed4371eaba",
      "bits": "17023a04",
      "target": "000000000000000000023a040000000000000000000000000000000000000000",
      "difficulty": 126411437451912.2,
      "time": 1749957611,
      "mediantime": 1749953913,
      "verificationprogress": 1,
      "initialblockdownload": false,
      "chainwork": "0000000000000000000000000000000000000000cb0b7c1ebd4e3b33e700bb6c",
      "size_on_disk": 618018566,
      "pruned": true,
      "pruneheight": 900926,
      "automatic_pruning": true,
      "prune_target_size": 576716800,
      "warnings": [
        "This is a pre-release test build - use at your own risk - do not use for mining or merchant applications"
      ]
    }

  20. GregTonoski commented at 9:00 am on June 15, 2025: none

    (…) For the past several days, I have run the test based on the input provided (…)

    How many disk operations, in terms of MB and total request count, have you observed? Also, how many log rows like [leveldb] Generated table #3987352@3: 31812 keys, 2161649 bytes were produced by Bitcoin Core?

  21. tnndbtc commented at 4:58 pm on June 15, 2025: none

    @GregTonoski Sure, I re-ran the test and used the following command to collect disk I/O, then used a spreadsheet to calculate the average, standard deviation, max and min of the data collected over 15 minutes (not the whole 3 hours) of the test:

    % iostat -d -w 1

                KB/t        tps     MB/s
    Average:    21.71       4974    101.98
    STDDEV:     5.21        1336    25.88
    Max:        48.37       8851    194.67
    Min:        12.8        1197    37.35

    KB/t: KB per transfer
    tps: transfers per second
    MB/s: MB per second

    As for the number of “Generated table” entries: since I only have a small amount of disk space and I am rotating the debug.log file, I sampled the number of entries during the test. For a typical 90-second range, I see about 100 such entries, so for the entire test of 160 minutes I expect to see over 11K such entries.

    Hope this helps.

  22. GregTonoski commented at 8:09 pm on June 15, 2025: none

    As for the number of “Generated table” entries: since I only have a small amount of disk space and I am rotating the debug.log file, I sampled the number of entries during the test. For a typical 90-second range, I see about 100 such entries, so for the entire test of 160 minutes I expect to see over 11K such entries.

    Hope this helps.

    Yes, this helps. Thanks. I can see the same problem of too many disk operations in your case: 101.98 MB/s × 60 sec × 60 min × 3 hours ≈ 1.1 TB, while only about 20 GB of blocks were downloaded. The excess must result from LevelDB (compaction and others), as is evident from the high count of “Generated table…” logs in debug.log.

  23. tnndbtc commented at 6:35 pm on June 18, 2025: none

    @GregTonoski I did more tests and found that reducing [leveldb] Generated table #3728@0: 64330 keys, 8671437 bytes entries wouldn’t affect the overall download time. However, setting debug=0 definitely reduces the time. Test 1: run bitcoind with debug=0, which reduces the size of debug.log; the number of Generated table entries is not reduced:

            KB/t            tps             MB/s
    AVG:    44.41095541     1294.624204     37.95407643
    MAX:    97.28           5227            167.52
    MIN:    10.53           45              2.03
    STDEV:  21.0085229      1396.674945     38.32545358
    

    Total time: 131 mins (reduced from 162 mins). No “Generated table” entries counted, due to debug=0.

    Test 2: change to debug=1. To avoid the massive log trace of [lock] Enter: lock contention m_mutex, checkqueue.h:82, I started by hacking src/sync.h and commenting out the log trace LOG_TIME_MICROS_WITH_CATEGORY(strprintf("lock contention %s, %s:%d", pszName, pszFile, nLine), BCLog::LOCK);. This significantly reduces the output in debug.log so that my small disk can hold all entries of Generated table.

    Total time: 129 minutes. Total Generated table entries: 11623. This is the baseline for file compaction comparison.

    Test 3: To reduce the total number of Generated table entries, I studied the code in src/dbwrapper.cpp, function leveldb::Options GetOptions(size_t nCacheSize).

    This function takes dbcache=2048 and converts it to KB, i.e. 2097152. However, the original code sets options.block_cache, options.write_buffer_size, and options.max_file_size in bytes, which is too small compared to the default values set in src/leveldb/include/leveldb/options.h. For example, write_buffer_size defaults to 4MB (4 * 1024 * 1024).

    Quote from the original source code in src/dbwrapper.cpp:

    options.block_cache = leveldb::NewLRUCache(nCacheSize / 2);
    options.write_buffer_size = nCacheSize / 4;
    options.max_file_size = std::max(options.max_file_size, DBWRAPPER_MAX_FILE_SIZE);
    

    For this test, I hacked the values to be:

    options.block_cache = leveldb::NewLRUCache(nCacheSize / 2 * 1024); // 2048 / 2 * 1024 = 1GB
    options.write_buffer_size = nCacheSize / 4 * 1024; // 2048 / 4 * 1024 = 512MB
    options.max_file_size = std::max(options.max_file_size, 2 * DBWRAPPER_MAX_FILE_SIZE); // 64 MiB
    

    Re-ran the test with src/sync.h hacked to reduce the size of debug.log, and debug=1. Total time to finish the initial download: 130 mins. Total Generated table entries: 5850.

    Conclusion: with Generated table entries reduced to 50.33% of the baseline, the total time is almost identical to the run with twice as many entries. The high disk operation rate is likely due to debug=1.

    I am not sure if there are other clues as to how this initial download time could be further reduced. I’d appreciate any further comments/clues.

  24. GregTonoski commented at 8:12 pm on June 18, 2025: none
    Before jumping to any conclusions, how do you explain the difference between the average of 37.95407643 MB/s and the 101.98 MB/s in the previous test three days ago?
  25. tnndbtc commented at 9:10 pm on June 18, 2025: none

    @GregTonoski The key difference behind the drop from 101.98 MB/s to 37.95 MB/s is debug=1 vs. debug=0.

    I manually monitored the output of iostat -d -w 1 for about 10-15 minutes during the test. With debug=0, the MB/s dropped dramatically and the size of debug.log was dramatically reduced, so I didn’t need to rotate debug.log; its final size was only 35.5 MB for the entire download time.

    I am not able to get the actual file size of debug.log when debug=1, as I have less than 8GB of free disk space and the test would ultimately stop (bitcoind shuts down) with a “no disk space left” error in debug.log if I didn’t rotate it.

    Plus, I am using a DEBUG build for the test so I can attach to the process and debug. If using a RELEASE build, I assume the test would run faster.

  26. tnndbtc commented at 0:08 am on June 19, 2025: none

    @GregTonoski I ran the test with a release build and debug=0, and the following is the test data:

    Test 1: Release build with original dbwrapper.cpp
    Took 68 mins to finish
    iostat shows mostly between 180-220 MB/s when bitcoind is active (30% - 50% of CPU)

    Test 2: Release build with modified dbwrapper.cpp, so Generated table entries are cut by about 50%
    Took 78 mins to finish
    iostat shows mostly between 160-200 MB/s when bitcoind is active (30% - 50% of CPU)

    options.block_cache = leveldb::NewLRUCache(nCacheSize / 2 * 1024); // 2048 / 2 * 1024 = 1GB
    options.write_buffer_size = nCacheSize / 4 * 1024; // 2048 / 4 * 1024 = 512MB
    options.max_file_size = std::max(options.max_file_size, 2 * DBWRAPPER_MAX_FILE_SIZE); // 64 MiB
    

    Having a high MB/s is a good sign when bitcoind is actively downloading blocks. However, collecting good MB/s test data is not easy: I have to run top and read MB/s when bitcoind starts to take most of the CPU (30% - 50%). In my previous post about MB/s, I didn’t check CPU, so there are many polluted data points where disk activity is low because bitcoind is not active.

    Given that the tests were run a few hours apart (I only have one computer, so I can’t run the tests in parallel), the difference between 68 mins (many Generated table entries) and 78 mins (fewer Generated table entries) may not be significant.

