Is there an existing issue for this?
- I have searched the existing issues
Current behaviour
When running in prune=550 mode, I consistently get the following error about once every 10 days per machine:
LevelDB read failure: Corruption: block checksum mismatch
There is no recovery from this error (reindex doesn’t work in prune mode), so the only solution is to nuke the datadir and do a full resync or restore the datadir from a backup.
Searching the web, the conventional wisdom is that this is caused by a hardware/disk problem. That is definitely not the case here, as I’ll explain. I suspect a bug in the code is causing some thread to write to an incorrect memory location, possibly a use-after-free, reallocation, or reorganization bug.
Details:
I set up bitcoind in prune=550 mode to run on ten Amazon EC2 t4g.nano instances that each have 512 MB RAM and a 4 GB gp3 (SSD) swap drive. The only job the machines have is to track the blockchain and provide RPC information on recent blocks and confirmed transactions. There is a lot of memory pressure and paging when a block comes in, but since speed is not an issue, that shouldn’t be a problem.
About once per day, bitcoind on one of the machines reports LevelDB read failure: Corruption: block checksum mismatch and stops working. The only fix is to delete and restore the bitcoin data directory, and then restart bitcoind to sync and catch back up.
I am virtually certain this is a software bug in bitcoind because (a) Amazon EC2 has some of the most tested and reliable hardware in the world; (b) this problem has shown up on all ten Amazon EC2 instances, which run in different EC2 availability zones; (c) Bitcoin Cash Node is also installed on all ten machines, and it has had zero problems. Bitcoin Cash Node forked from Bitcoin Core a while back and also uses LevelDB, among other commonalities.
I tried this with the bitcoin binary distributions bitcoin-25.2-aarch64-linux-gnu.tar.gz and bitcoin-26.1-aarch64-linux-gnu.tar.gz, as well as bitcoin-25.2 compiled from source, and it made no difference.
I collected some of the files that have the checksum mismatch and can provide them if someone wants to look for clues on which software component corrupted the data.
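For anyone inspecting those files, here is a minimal sketch of the check that produces "block checksum mismatch". In LevelDB's table format, each block is followed by a 1-byte compression type and a 4-byte little-endian masked CRC-32C computed over the block contents plus the type byte. The function names below are hypothetical, and the block `offset` and `size` are assumed to come from the table's index block, which this sketch does not parse.

```python
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
    _TABLE.append(c)

# Constant LevelDB adds when masking stored checksums.
MASK_DELTA = 0xA282EAD8


def crc32c(data: bytes) -> int:
    """Plain table-driven CRC-32C of data."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def mask(crc: int) -> int:
    """Rotate right by 15 bits and add a constant, all modulo 2**32."""
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + MASK_DELTA) & 0xFFFFFFFF


def block_checksum_ok(raw: bytes, offset: int, size: int) -> bool:
    """Check one block of an .ldb file held in memory.

    raw    -- full file contents
    offset -- start of the block (assumed known from the index block)
    size   -- block length, excluding the 5-byte trailer
    """
    payload = raw[offset:offset + size + 1]            # contents + type byte
    trailer = raw[offset + size + 1:offset + size + 5]  # stored masked CRC
    if len(payload) != size + 1 or len(trailer) != 4:
        return False  # block handle points past the end of the file
    (stored,) = struct.unpack("<I", trailer)
    return stored == mask(crc32c(payload))
```

Running this over every block of a collected file would show which blocks are damaged and where, which may help tell a single flipped bit from an overwritten page.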
Expected behaviour
I expect to never see a LevelDB data corruption failure.
Steps to reproduce
bitcoind config file:
datadir=/bdata/bitcoin-data
discover=0
listen=1
maxconnections=24
par=1
blocksonly=1
dbcache=200
maxsigcachesize=4
prune=550
maxmempool=5
blockreconstructionextratxn=1
maxorphantx=1
mempoolexpiry=1
persistmempool=0
disablewallet=1
server=1
rpcallowip=127.0.0.1
rpcuser=btc
rpcpassword=btc
rpcworkqueue=40
rpcthreads=1
printtoconsole=1
nodebuglogfile=1
[main]
rpcport=8332
rpcbind=127.0.0.1:8332
bind=[::]:9333
bind=127.0.0.1:8334=onion
Relevant log output
This is one example. I have more, and they all look the same:
Started bitcoind.service.
2024-05-22T17:53:08Z Bitcoin Core version v25.2.0 (release build)
…
2024-05-22T19:05:51Z UpdateTip: new best=00000000000000000002c656268be2b9e044b5963af0507e16414552aa526d57 height=844603 version=0x237c6000 log2_work=94.939158 tx=1009037420 date='2024-05-22T14:46:04Z' progress=0.999943 cache=72.4MiB(515946txo)
2024-05-22T19:05:57Z UpdateTip: new best=000000000000000000028401d5cd96ea647cc9adae836735615d7dbf64feed6f height=844604 version=0x2f50c000 log2_work=94.939172 tx=1009041112 date='2024-05-22T15:00:11Z' progress=0.999946 cache=73.9MiB(526621txo)
2024-05-22T19:06:00Z Socks5() connect to 78.44.10.186:8333 failed: connection refused
2024-05-22T19:06:03Z UpdateTip: new best=000000000000000000018332b3b2594e340a0dfd150cbc2a852930c0cddaa91b height=844605 version=0x20000000 log2_work=94.939185 tx=1009044861 date='2024-05-22T15:14:46Z' progress=0.999949 cache=75.4MiB(537704txo)
2024-05-22T19:06:05Z Socks5() connect to 2601:283:5080:8540::55d6:8333 failed: general failure
2024-05-22T19:06:12Z UpdateTip: new best=0000000000000000000163e90fef2b79654d0235d68a603f46ac5e41ce62d827 height=844606 version=0x2e000000 log2_work=94.939199 tx=1009048116 date='2024-05-22T15:31:30Z' progress=0.999953 cache=76.8MiB(548883txo)
2024-05-22T19:06:20Z UpdateTip: new best=00000000000000000001a4ce0b96e5a761337a84974d27a694c7b8d2c74b8cf0 height=844607 version=0x27a94000 log2_work=94.939212 tx=1009050900 date='2024-05-22T15:43:50Z' progress=0.999956 cache=78.1MiB(558437txo)
2024-05-22T19:06:26Z UpdateTip: new best=00000000000000000001d82049db35f2dfabccfba593ee3a433f0500c2734f4e height=844608 version=0x274a6000 log2_work=94.939226 tx=1009054026 date='2024-05-22T16:09:29Z' progress=0.999961 cache=79.5MiB(569127txo)
2024-05-22T19:06:49Z Socks5() connect to 212.102.36.243:8333 failed: general failure
2024-05-22T19:07:09Z UpdateTip: new best=00000000000000000003571b667acb77721004099827b38802f77500cd370d8b height=844609 version=0x20000000 log2_work=94.939240 tx=1009057717 date='2024-05-22T16:21:29Z' progress=0.999964 cache=5.4MiB(0txo)
2024-05-22T19:07:17Z LevelDB read failure: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
2024-05-22T19:07:17Z Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
2024-05-22T19:07:17Z You can use -debug=leveldb to get more complete diagnostic messages
2024-05-22T19:07:17Z Error: Error reading from database, shutting down.
Error: Error reading from database, shutting down.
2024-05-22T19:07:17Z Error reading from database: Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
bitcoind.service: Main process exited, code=dumped, status=6/ABRT
bitcoind.service: Failed with result 'core-dump'.
bitcoind.service: Consumed 16min 43.740s CPU time.
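Since the failure is spread across ten machines, it may help to pull the affected file out of each captured log automatically. The following is a rough sketch, not part of bitcoind: the function name is hypothetical, and the pattern is based only on the "LevelDB read failure" line shown in the log above.

```python
import re

# Matches the first failure line bitcoind prints, capturing the timestamp
# and the path of the table file whose block failed its checksum.
FAILURE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"LevelDB read failure: Corruption: block checksum mismatch: (\S+)"
)


def find_corrupt_files(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, path) for every read failure found in log_text."""
    return [(m.group(1), m.group(2)) for m in FAILURE_RE.finditer(log_text)]
```

Collecting the (timestamp, path) pairs from all machines would show whether failures cluster on recently written files or on older ones.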
How did you obtain Bitcoin Core
Compiled from source
What version of Bitcoin Core are you using?
bitcoin-25.2-aarch64-linux-gnu.tar.gz and bitcoin-26.1-aarch64-linux-gnu.tar.gz
Operating system and version
Linux 6.1.87-99.174.amzn2023.aarch64 #1 SMP
Machine specifications
Amazon EC2 t4g.nano instance (512 MB RAM) with unlimited CPU
zram driver disabled (sudo yum remove zram-generator, reboot)
14 GB gp3 root drive
4 GB gp3 swap drive
40 GB gp3 data drive for bitcoin blockchain
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme1n1       259:0    0  40G  0 disk /bdata
nvme2n1       259:1    0   4G  0 disk [SWAP]
nvme0n1       259:2    0  14G  0 disk
├─nvme0n1p1   259:3    0  14G  0 part /
└─nvme0n1p128 259:4    0  10M  0 part /boot/efi
IPv4 enabled but not assigned an IP address
IPv6 enabled, assigned an IP address, and routed to internet
Some machines have Tor installed and enabled for testing (including the one for the log file attached above), but this made no difference in results.