LevelDB read failure: Corruption: block checksum mismatch #30159

apulsifer opened this issue on May 23, 2024
  1. apulsifer commented at 3:22 pm on May 23, 2024: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    When running in prune=550 mode, I consistently get the following error about once every 10 days per machine:

    LevelDB read failure: Corruption: block checksum mismatch

    There is no recovery from this error (reindex doesn’t work in prune mode), so the only solution is to nuke the datadir and do a full resync or restore the datadir from a backup.

    Searching the webs, the conventional wisdom is that this is caused by a hardware/disk problem. That is definitely not the case here, as I’ll explain. I suspect a bug in the code is causing some thread to write to an incorrect memory location, possibly a memory use-after-free/reallocation/reorganization bug.

    Details:

    I set up bitcoind in prune=550 mode to run on ten Amazon EC2 t4g.nano instances that each have 512 MB of RAM and a 4 GB gp3 (SSD) swap drive. The only job the machines have is to track the blockchain and provide RPC information on recent blocks and confirmed transactions. There is a lot of memory pressure and paging when a block comes in, but since speed is not an issue, that shouldn’t be a problem.

    About once per day, bitcoind on one of the machines reports LevelDB read failure: Corruption: block checksum mismatch and stops working. The only fix is to delete and restore the bitcoin data directory, and then restart bitcoind to sync and catch back up.

    I am virtually certain this is a software bug in bitcoind because (a) Amazon EC2 has some of the most tested and reliable hardware in the world; (b) this problem has shown up on all ten Amazon EC2 instances, all running in different EC2 availability zones; (c) Bitcoin Cash Node is also installed on all ten machines, and it has had zero problems. Bitcoin Cash Node forked from Bitcoin Core a while back and also uses LevelDB, among other commonalities.

    I tried this with the bitcoin binary distributions bitcoin-25.2-aarch64-linux-gnu.tar.gz and bitcoin-26.1-aarch64-linux-gnu.tar.gz, as well as bitcoin-25.2 compiled from source, and it made no difference.

    I collected some of the files that have the checksum mismatch and can provide them if someone wants to look for clues on which software component corrupted the data.
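For anyone inspecting the saved .ldb files, the check behind this error can be reproduced offline. The following is a minimal sketch, assuming the standard LevelDB table layout in which each block is followed by a 5-byte trailer: one compression-type byte, then a masked little-endian CRC-32C computed over the block contents plus that type byte. The function names are hypothetical, and the sketch does not parse the table footer or index, so the caller has to supply the bytes of one block together with its trailer.

```python
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_CRC32C_TABLE = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC32C_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Plain (unmasked) CRC-32C of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def leveldb_mask(crc: int) -> int:
    """LevelDB stores a rotated-and-offset CRC rather than the raw value."""
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def block_checksum_ok(block_with_trailer: bytes) -> bool:
    """Check one table block: contents + 1 type byte + 4-byte masked CRC."""
    if len(block_with_trailer) < 5:
        return False
    payload = block_with_trailer[:-4]  # block contents plus the type byte
    (stored,) = struct.unpack("<I", block_with_trailer[-4:])
    return leveldb_mask(crc32c(payload)) == stored


def make_block(contents: bytes, block_type: int = 0) -> bytes:
    """Hypothetical helper: build a block with a valid trailer for testing."""
    payload = contents + bytes([block_type])
    return payload + struct.pack("<I", leveldb_mask(crc32c(payload)))
```

Flipping a single bit anywhere in a block built by make_block makes block_checksum_ok return False, which is the condition LevelDB reports as "block checksum mismatch". The check says nothing about which component changed the bytes.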

    Expected behaviour

    I expect to never see a LevelDB data corruption failure.

    Steps to reproduce

    bitcoind config file:

    datadir=/bdata/bitcoin-data

    discover=0
    listen=1
    maxconnections=24

    par=1

    blocksonly=1
    dbcache=200
    maxsigcachesize=4
    prune=550

    maxmempool=5
    blockreconstructionextratxn=1
    maxorphantx=1
    mempoolexpiry=1
    persistmempool=0

    disablewallet=1

    server=1
    rpcallowip=127.0.0.1
    rpcuser=btc
    rpcpassword=btc
    rpcworkqueue=40
    rpcthreads=1

    printtoconsole=1
    nodebuglogfile=1

    [main]
    rpcport=8332
    rpcbind=127.0.0.1:8332
    bind=[::]:9333
    bind=127.0.0.1:8334=onion
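To put the memory-related settings in this config next to the 512 MB of RAM on each instance, here is a rough sketch that sums the three explicit cache limits. It assumes all three values can be treated as roughly megabyte-sized budgets, and it ignores everything else the process allocates (networking buffers, block index, LevelDB internals), so it is a lower bound rather than a measurement. The function name and the choice of keys are illustrative.

```python
CONF_TEXT = """
datadir=/bdata/bitcoin-data
discover=0 listen=1 maxconnections=24
par=1
blocksonly=1 dbcache=200 maxsigcachesize=4 prune=550
maxmempool=5 blockreconstructionextratxn=1 maxorphantx=1 mempoolexpiry=1 persistmempool=0
[main] rpcport=8332 rpcbind=127.0.0.1:8332 bind=[::]:9333 bind=127.0.0.1:8334=onion
"""

# Settings that cap an in-memory cache (selection is an assumption of this sketch).
MEMORY_KEYS = ("dbcache", "maxmempool", "maxsigcachesize")


def parse_conf(text: str) -> dict:
    """Parse whitespace-separated key=value tokens; later values win."""
    settings = {}
    for token in text.split():
        if token.startswith("[") or "=" not in token:
            continue  # skip section headers such as [main]
        key, value = token.split("=", 1)
        settings[key] = value
    return settings


settings = parse_conf(CONF_TEXT)
budget_mb = sum(int(settings[key]) for key in MEMORY_KEYS if key in settings)
share_of_ram = budget_mb / 512  # 512 MB instance, per the issue text

print(f"configured caches: {budget_mb} MB ({share_of_ram:.0%} of RAM)")
```

With the values above the configured caches alone come to 209 MB, which is consistent with the report that the machines page heavily whenever a block arrives.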

    Relevant log output

    This is one example, I have more, they all look the same:

    Started bitcoind.service.
    2024-05-22T17:53:08Z Bitcoin Core version v25.2.0 (release build)
    …
    2024-05-22T19:05:51Z UpdateTip: new best=00000000000000000002c656268be2b9e044b5963af0507e16414552aa526d57 height=844603 version=0x237c6000 log2_work=94.939158 tx=1009037420 date='2024-05-22T14:46:04Z' progress=0.999943 cache=72.4MiB(515946txo)
    2024-05-22T19:05:57Z UpdateTip: new best=000000000000000000028401d5cd96ea647cc9adae836735615d7dbf64feed6f height=844604 version=0x2f50c000 log2_work=94.939172 tx=1009041112 date='2024-05-22T15:00:11Z' progress=0.999946 cache=73.9MiB(526621txo)
    2024-05-22T19:06:00Z Socks5() connect to 78.44.10.186:8333 failed: connection refused
    2024-05-22T19:06:03Z UpdateTip: new best=000000000000000000018332b3b2594e340a0dfd150cbc2a852930c0cddaa91b height=844605 version=0x20000000 log2_work=94.939185 tx=1009044861 date='2024-05-22T15:14:46Z' progress=0.999949 cache=75.4MiB(537704txo)
    2024-05-22T19:06:05Z Socks5() connect to 2601:283:5080:8540::55d6:8333 failed: general failure
    2024-05-22T19:06:12Z UpdateTip: new best=0000000000000000000163e90fef2b79654d0235d68a603f46ac5e41ce62d827 height=844606 version=0x2e000000 log2_work=94.939199 tx=1009048116 date='2024-05-22T15:31:30Z' progress=0.999953 cache=76.8MiB(548883txo)
    2024-05-22T19:06:20Z UpdateTip: new best=00000000000000000001a4ce0b96e5a761337a84974d27a694c7b8d2c74b8cf0 height=844607 version=0x27a94000 log2_work=94.939212 tx=1009050900 date='2024-05-22T15:43:50Z' progress=0.999956 cache=78.1MiB(558437txo)
    2024-05-22T19:06:26Z UpdateTip: new best=00000000000000000001d82049db35f2dfabccfba593ee3a433f0500c2734f4e height=844608 version=0x274a6000 log2_work=94.939226 tx=1009054026 date='2024-05-22T16:09:29Z' progress=0.999961 cache=79.5MiB(569127txo)
    2024-05-22T19:06:49Z Socks5() connect to 212.102.36.243:8333 failed: general failure
    2024-05-22T19:07:09Z UpdateTip: new best=00000000000000000003571b667acb77721004099827b38802f77500cd370d8b height=844609 version=0x20000000 log2_work=94.939240 tx=1009057717 date='2024-05-22T16:21:29Z' progress=0.999964 cache=5.4MiB(0txo)
    2024-05-22T19:07:17Z LevelDB read failure: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
    2024-05-22T19:07:17Z Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
    2024-05-22T19:07:17Z You can use -debug=leveldb to get more complete diagnostic messages
    2024-05-22T19:07:17Z Error: Error reading from database, shutting down.
    Error: Error reading from database, shutting down.
    2024-05-22T19:07:17Z Error reading from database: Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb
    bitcoind.service: Main process exited, code=dumped, status=6/ABRT
    bitcoind.service: Failed with result 'core-dump'.
    bitcoind.service: Consumed 16min 43.740s CPU time.
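In both log excerpts in this thread, the last UpdateTip line before the failure reports a cache with 0txo, after the cache had been growing steadily. A small sketch for pulling that pattern out of a journal automatically is shown below. The regular expressions are written against the log lines quoted here and are an assumption about the format, not a stable interface; reading the 0txo value as "the coins cache was just written out to disk" is likewise an interpretation, not something the log states.

```python
import re

UPDATE_TIP = re.compile(r"height=(\d+) .*?cache=([\d.]+)MiB\((\d+)txo\)")
CORRUPTION = re.compile(r"LevelDB read failure: Corruption: .*?: (\S+\.ldb)")

# Three lines taken from the log above.
SAMPLE = [
    "2024-05-22T19:06:26Z UpdateTip: new best=00000000000000000001d82049db35f2dfabccfba593ee3a433f0500c2734f4e height=844608 version=0x274a6000 log2_work=94.939226 tx=1009054026 date='2024-05-22T16:09:29Z' progress=0.999961 cache=79.5MiB(569127txo)",
    "2024-05-22T19:07:09Z UpdateTip: new best=00000000000000000003571b667acb77721004099827b38802f77500cd370d8b height=844609 version=0x20000000 log2_work=94.939240 tx=1009057717 date='2024-05-22T16:21:29Z' progress=0.999964 cache=5.4MiB(0txo)",
    "2024-05-22T19:07:17Z LevelDB read failure: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5847811.ldb",
]


def summarize(log_lines):
    """Return (last UpdateTip seen before the failure, corrupted file path)."""
    last_tip = None
    for line in log_lines:
        if "UpdateTip:" in line:
            match = UPDATE_TIP.search(line)
            if match:
                last_tip = {
                    "height": int(match.group(1)),
                    "cache_mib": float(match.group(2)),
                    "txo": int(match.group(3)),
                }
        else:
            match = CORRUPTION.search(line)
            if match:
                return last_tip, match.group(1)
    return last_tip, None


tip, ldb_path = summarize(SAMPLE)
print(tip, ldb_path)
```

Running this over the journals of all ten machines would show whether every failure follows a 0txo UpdateTip or whether the two excerpts here are a coincidence.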

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    bitcoin-25.2-aarch64-linux-gnu.tar.gz and bitcoin-26.1-aarch64-linux-gnu.tar.gz

    Operating system and version

    Linux 6.1.87-99.174.amzn2023.aarch64 #1 SMP

    Machine specifications

    Amazon EC2 t4g.nano instance (512 MB RAM) with unlimited CPU
    zram driver disabled (sudo yum remove zram-generator, reboot)
    14 GB gp3 root drive
    4 GB gp3 swap drive
    40 GB gp3 data drive for bitcoin blockchain

    NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
    nvme1n1       259:0    0  40G  0 disk /bdata
    nvme2n1       259:1    0   4G  0 disk [SWAP]
    nvme0n1       259:2    0  14G  0 disk
    ├─nvme0n1p1   259:3    0  14G  0 part /
    └─nvme0n1p128 259:4    0  10M  0 part /boot/efi

    IPv4 enabled but not assigned an IP address
    IPv6 enabled, assigned an IP address, and routed to internet

    Some machines have Tor installed and enabled for testing (including the one for the log file attached above), but this made no difference in results.

  2. maflcko added the label Data corruption on May 23, 2024
  3. maflcko added the label Bug on May 23, 2024
  4. maflcko commented at 3:35 pm on May 23, 2024: member

    I suspect a bug in the code is causing some thread to write to an incorrect memory location, possibly a memory use-after-free/reallocation/reorganization bug.

    Would it be possible for you to compile and run with asan, or a similar sanitizer?

    Also, what filesystem are you using on the drives? Something like df --print-type --human-readable /bdata should print it.

  5. apulsifer commented at 3:40 pm on May 23, 2024: none

    xfs on the root and data drive

    sudo mkswap /dev/nvme[4 GB disk]
    sudo swapon /dev/nvme[4 GB disk]

    sudo mkdir /bdata
    sudo mkfs -t xfs /dev/nvme[40 GB disk]
    lsblk -o name,size,type,uuid
    sudo nano /etc/fstab

    add to fstab:
    UUID=[4 GB disk uuid] swap swap defaults 0 0
    UUID=[40 GB disk uuid] /bdata xfs defaults,nofail 0 2

    sudo mount -a

    Filesystem       Type      Size  Used Avail Use% Mounted on
    devtmpfs         devtmpfs  4.0M     0  4.0M   0% /dev
    tmpfs            tmpfs     210M     0  210M   0% /dev/shm
    tmpfs            tmpfs      84M  552K   84M   1% /run
    /dev/nvme0n1p1   xfs        14G  3.4G   11G  25% /
    tmpfs            tmpfs     210M     0  210M   0% /tmp
    /dev/nvme1n1     xfs        40G   19G   22G  46% /bdata
    /dev/nvme0n1p128 vfat       10M  1.4M  8.7M  14% /boot/efi
    tmpfs            tmpfs      42M     0   42M   0% /run/user/1000

  6. apulsifer commented at 3:45 pm on May 23, 2024: none

    Would it be possible for you to compile and run with asan, or a similar sanitizer?

    I don’t see where I would get a chunk of time to do that right now…. But as I mentioned, I copied the corrupted files, and that might give some clues to someone familiar with their format (especially if the corruption is ascii in the middle of binary, or vice versa)
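The "ASCII in the middle of binary" idea can be checked with a few lines of code before anyone digs into the file format. This is a minimal sketch, assuming the corrupted file fits in memory; the function name and the default run length are arbitrary choices, and ordinary binary data will occasionally contain short printable runs by chance, so a hit is a lead, not proof.

```python
import re


def printable_runs(data: bytes, min_len: int = 16):
    """Yield (offset, text) for runs of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Synthetic example: 30 non-printable bytes, an ASCII fragment, more binary.
example = bytes([0, 1, 2]) * 10 + b"GET /index.html HTTP/1.1" + bytes([255]) * 5
runs = list(printable_runs(example))
print(runs)
```

For a real file the call would look like `printable_runs(open(path, "rb").read())`, where `path` points at one of the saved .ldb files. Raising `min_len` reduces chance matches.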

  7. maflcko commented at 4:22 pm on May 23, 2024: member

    I don’t see where I would get a chunk of time to do that right now….

    Sure, no rush. It’ll probably take some time to pin this down. (I don’t have an AWS account, so I can’t test it, but maybe someone else has one.)

    Some other ideas to test in the meantime:

    • Try another filesystem instead of xfs
    • Try the master branch (not for production, just for testing whether the issue still happens there)
  8. maflcko added the label UTXO Db and Indexes on May 23, 2024
  9. apulsifer commented at 5:51 pm on May 23, 2024: none
    xfs is used on the root drive of every Amazon EC2 instance running Amazon Linux. If xfs on Amazon EC2 were the problem, a lot of critical infrastructure would be failing right now. And as I mentioned, these machines are also running bitcoin cash in an almost identical configuration (data file path and ports changed) and it has had zero problems. Since the data corruption only happens about once per week when running on mainnet, I think figuring this problem out will probably take a customized and instrumented version of bitcoind being fed blocks at high speed with random jitter and waits and synthetic memory pressure. This could probably be done on a virtual machine anywhere, like Xen, KVM, VirtualBox, etc.
  10. maflcko commented at 6:27 pm on May 23, 2024: member

    Since the data corruption only happens about once per week

    Once per week is a lot, and if this were a broader problem, I’d assume that more people would be complaining. Given that you can consistently reproduce it on different machines, it seems there is a real bug somewhere. However, Bitcoin Core is running fine in a lot of other places, so there has to be some hardware or configuration setting (or combination thereof) that triggers this bug on your side. It would be good to know which one it is.

  11. mzumsande commented at 6:32 pm on May 23, 2024: contributor

    (edited first question out, I misunderstood)

    From your log it appears that this happened while the node was catching up with the tip (almost but not completely synced yet), receiving blocks quickly. Is that typical, or does it usually happen when the node is synced and receives blocks as they are mined?

  12. apulsifer commented at 7:10 pm on May 23, 2024: none

    Once per week is a lot and if this was a broader problem, I’d assume that more people were complaining.

    It could just be that the memory pressure is uncovering the problem. Of course, any machine can experience memory pressure at times, but one thing that’s unique is that these machines start paging hard to SSD for about 20 seconds after each new block arrives.

  13. apulsifer commented at 7:58 pm on May 23, 2024: none

    Could you explain the paging in a bit more detail?

    The machines page pretty hard to SSD for about 20 seconds after each new block arrives. That info comes from “sar -d 10”, which I logged for a while when I was setting up the first machine.
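As a complement to `sar -d 10`, which reports per-device I/O, swap traffic can be measured directly from the kernel's `pswpin` and `pswpout` counters in /proc/vmstat. The sketch below assumes a Linux host; the function names are hypothetical and the sample counter values are synthetic, chosen only to make the arithmetic easy to follow.

```python
def parse_vmstat(text: str) -> dict:
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before: dict, after: dict, seconds: float) -> dict:
    """Pages swapped in and out per second between two samples."""
    return {
        "pages_in_per_s": (after["pswpin"] - before["pswpin"]) / seconds,
        "pages_out_per_s": (after["pswpout"] - before["pswpout"]) / seconds,
    }


# Synthetic samples taken 10 seconds apart (values are made up).
first = parse_vmstat("pswpin 100\npswpout 200\n")
second = parse_vmstat("pswpin 600\npswpout 1200\n")
rate = swap_rate(first, second, seconds=10)
print(rate)
```

On a live machine the two samples would come from reading /proc/vmstat twice with a sleep in between; logging the result next to the UpdateTip timestamps would show how long the paging burst after each block really lasts.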

    Is the node constantly being bombarded with lots of simultaneous RPC calls (which ones?) or is some other interface used?

    None of these machines has at this point serviced a single RPC call (I’m still trying to get things set up). No other interface is being used, just the bitcoind peer network, half of the machines with direct IPv6, half via torproxy.

    Also, from your log it appears that this happened while the node was catching up with the tip (almost but not completely synced yet), receiving blocks quickly. Is that typical, or does it usually happen when the node is synced and receives blocks as they are mined?

    Sync’ing has been pretty typical at the moment, since I’m still setting things up and bitcoind has been started and stopped from time-to-time to try out different settings.

    The initial sync from the genesis block was done on a machine with 16 GB RAM. Then on May 7, bitcoind on that machine was stopped and the /bdata directory was copied to these 10 machines with only 512 MB RAM that are only expected to keep up with new blocks as they arrive. Since that time, two of the ten machines have had no data corruption. The other eight machines have had data corruption one or more times.

    The problem doesn’t just show up during block sync however. I do know that on Monday at end of day, all the machines were running and fully synced, and by Tuesday morning, two machines had data corruption. I was busy with other things and left all the machines alone, and by the time I went to fix them Wednesday afternoon, four machines had data corruption.

    I can’t definitively rule out that starting and stopping bitcoind, or rebooting the machines, has contributed to this problem. The files could sit in the bdata directory for a while, and it’s possible one is getting corrupted when bitcoind shuts down or the machine reboots, but bitcoind doesn’t notice it until sometime later when it reads the file. Note that bitcoind is being started and stopped by a systemd service file (attached below), which has a 10 minute timeout. So systemd will politely ask bitcoind to stop, and if it hasn’t exited after 10 minutes, systemd will kill it. Another thing worth noting from the service file is that I set up bitcoind to run at Nice=16, which might contribute to triggering the problem.

    As of late last night, all machines are running and fully synced again. By next week, I’m going to start leaving them alone to run autonomously (with the exception of rebuilding a datadir if needed), so that will be a much better test of what happens when the machines are fully synced and bitcoind is running continuously.

    [Service]
    WorkingDirectory=/home/ec2-user
    ExecStart=/home/ec2-user/bitcoin/bin/bitcoind -conf=/home/ec2-user/bitcoin.conf
    Restart=always
    RestartSec=60
    TimeoutStopSec=600
    Nice=16
    User=ec2-user
    Group=ec2-user
    StandardOutput=journal
    StandardError=journal

    [Unit]
    After=network-online.target

    [Install]
    WantedBy=multi-user.target

  14. maflcko commented at 8:20 pm on May 23, 2024: member

    I think if you call the RPC gettxoutsetinfo muhash on all nodes (when they are synced to the same block) and the result matches for all of them, then the chainstate leveldb at that point in time is probably fine. I presume all failures happened in the /bdata/bitcoin-data/chainstate/ leveldb?

    Edit: Calling that RPC will take a long time on your machines, I suspect.

  15. apulsifer commented at 11:51 pm on May 23, 2024: none

    Yes, all the checksum errors are in numerically-named NNNNNN.ldb files in /bdata/bitcoin-data/chainstate/

    I had no luck with gettxoutsetinfo muhash:

    bitcoin/bin/bitcoin-cli -rpcwaittimeout=0 -conf=/home/ec2-user/bitcoin.conf getblockcount
    844826

    bitcoin/bin/bitcoin-cli -rpcwaittimeout=0 -conf=/home/ec2-user/bitcoin.conf gettxoutsetinfo muhash
    error: timeout on transient error: Could not connect to the server 127.0.0.1:8332 (error code 0 - "timeout reached")

  16. maflcko commented at 6:27 am on May 24, 2024: member

    The RPC will take a long time (probably hours), so you’ll have to disable the client timeout -rpcclienttimeout=0.

    bitcoin/bin/bitcoin-cli -rpcclienttimeout=0 -rpcwaittimeout=0 -conf=/home/ec2-user/bitcoin.conf gettxoutsetinfo muhash
    
  17. apulsifer commented at 12:43 pm on May 24, 2024: none

    Took about 20 minutes. They all match.

    Fri May 24 12:12:33 UTC 2024
    {
      "height": 844917,
      "bestblock": "000000000000000000017dd5f59b73629f6f88797e90017a9df39c5e435296bf",
      "txouts": 181984254,
      "bogosize": 13994337434,
      "muhash": "b97a3fb13a61e8c889668d064f7ae5408a78ee7e4c6a4fdebedb30ecb2f23378",
      "total_amount": 19702649.24256659,
      "transactions": 125720653,
      "disk_size": 12220356091
    }
    Fri May 24 12:39:58 UTC 2024
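Comparing ten JSON outputs by eye is error-prone, so the "they all match" check can be scripted. The sketch below assumes each node's gettxoutsetinfo output has been saved as text and that every node was at the same block when the RPC ran; the node names are hypothetical, and the two sample outputs are trimmed to the fields being compared, using the values reported above.

```python
import json


def chainstates_agree(outputs: dict) -> bool:
    """outputs maps a node name to the JSON text printed by gettxoutsetinfo.

    Returns True only if every node reports the same height, best block,
    and muhash.
    """
    seen = set()
    for text in outputs.values():
        info = json.loads(text)
        seen.add((info["height"], info["bestblock"], info["muhash"]))
    return len(seen) == 1


REPORTED = json.dumps({
    "height": 844917,
    "bestblock": "000000000000000000017dd5f59b73629f6f88797e90017a9df39c5e435296bf",
    "muhash": "b97a3fb13a61e8c889668d064f7ae5408a78ee7e4c6a4fdebedb30ecb2f23378",
})

# Hypothetical node names; both hold the values reported in this thread.
outputs = {"node-a": REPORTED, "node-b": REPORTED}
print(chainstates_agree(outputs))
```

A mismatch in height or bestblock only means the nodes were not at the same tip when queried; a mismatch in muhash at the same bestblock would point at a diverged UTXO set.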

  18. apulsifer commented at 4:55 pm on May 31, 2024: none
    Update: Since leaving these servers alone for a week and not rebooting them or restarting bitcoind, they have stayed perfectly in sync without issues. So it looks like the problem is triggered by starting and stopping bitcoind. I can live with that: if I do have to restart a server and I get data corruption, I’ll image the data from another server.
  19. apulsifer commented at 12:53 pm on June 4, 2024: none

    Update: After running continuously since 05-23 (no reboots or restarting bitcoind), one of the servers failed this morning. So it seems the data corruption bug occurs even when bitcoind is running continuously, although at a much lower rate.

    Started bitcoind.service.
    2024-05-23T11:23:40Z Bitcoin Core version v25.2.0 (release build)
    2024-05-23T11:23:40Z InitParameterInteraction: parameter interaction: -blocksonly=1 -> setting -whitelistrelay=0
    2024-05-23T11:23:40Z Using the 'arm_shani(1way,2way)' SHA256 implementation
    2024-05-23T11:23:40Z Default data directory /home/ec2-user/.bitcoin
    2024-05-23T11:23:40Z Using data directory /bdata/bitcoin-data
    2024-05-23T11:23:40Z Config file: /home/ec2-user/bitcoin.conf
    2024-05-23T11:23:40Z Config file arg: blockreconstructionextratxn="1"
    2024-05-23T11:23:40Z Config file arg: blocksonly="1"
    2024-05-23T11:23:40Z Config file arg: datadir="/bdata/bitcoin-data"
    2024-05-23T11:23:40Z Config file arg: dbcache="200"
    2024-05-23T11:23:40Z Config file arg: debuglogfile=false
    2024-05-23T11:23:40Z Config file arg: disablewallet="1"
    2024-05-23T11:23:40Z Config file arg: discover="0"
    2024-05-23T11:23:40Z Config file arg: dns="0"
    2024-05-23T11:23:40Z Config file arg: dnsseed="0"
    2024-05-23T11:23:40Z Config file arg: listen="1"
    2024-05-23T11:23:40Z Config file arg: maxconnections="24"
    2024-05-23T11:23:40Z Config file arg: maxmempool="5"
    2024-05-23T11:23:40Z Config file arg: maxorphantx="1"
    2024-05-23T11:23:40Z Config file arg: maxsigcachesize="4"
    2024-05-23T11:23:40Z Config file arg: mempoolexpiry="1"
    2024-05-23T11:23:40Z Config file arg: par="1"
    2024-05-23T11:23:40Z Config file arg: persistmempool="0"
    2024-05-23T11:23:40Z Config file arg: printtoconsole="1"
    2024-05-23T11:23:40Z Config file arg: prune="550"
    2024-05-23T11:23:40Z Config file arg: rest="1"
    2024-05-23T11:23:40Z Config file arg: rpcallowip="127.0.0.1"
    2024-05-23T11:23:40Z Config file arg: rpcthreads="1"
    2024-05-23T11:23:40Z Config file arg: rpcworkqueue="40"
    2024-05-23T11:23:40Z Config file arg: server="1"
    2024-05-23T11:23:40Z Config file arg: [main] bind="127.0.0.1:8334=onion"
    2024-05-23T11:23:40Z Config file arg: [main] rpcbind="127.0.0.1:8332"
    2024-05-23T11:23:40Z Command-line arg: conf="/home/ec2-user/bitcoin.conf"
    2024-05-23T11:23:40Z Using at most 24 automatic connections (65535 file descriptors available)
    2024-05-23T11:23:40Z Using 2 MiB out of 2 MiB requested for signature cache, able to store 65536 elements
    2024-05-23T11:23:40Z Using 2 MiB out of 2 MiB requested for script execution cache, able to store 65536 elements
    2024-05-23T11:23:40Z Script verification uses 0 additional threads
    2024-05-23T11:23:40Z Wallet disabled!
    2024-05-23T11:23:40Z scheduler thread start
    2024-05-23T11:23:40Z Binding RPC on address 127.0.0.1 port 8332
    2024-05-23T11:23:40Z [http] creating work queue of depth 40
    2024-05-23T11:23:40Z [http] starting 1 worker threads
    2024-05-23T11:23:40Z Using /16 prefix for IP bucketing
    2024-05-23T11:23:40Z init message: Loading P2P addresses…
    2024-05-23T11:23:41Z Loaded 67288 addresses from peers.dat 1001ms
    2024-05-23T11:23:41Z init message: Loading banlist…
    2024-05-23T11:23:41Z SetNetworkActive: true
    2024-05-23T11:23:41Z Cache configuration:
    2024-05-23T11:23:41Z * Using 2.0 MiB for block index database
    2024-05-23T11:23:41Z * Using 8.0 MiB for chain state database
    2024-05-23T11:23:41Z * Using 190.0 MiB for in-memory UTXO set (plus up to 4.8 MiB of unused mempool space)
    2024-05-23T11:23:41Z init message: Loading block index…
    2024-05-23T11:23:41Z Assuming ancestors of block 000000000000000000035c3f0d31e71a5ee24c5aaf3354689f65bd7b07dee632 have valid signatures.
    2024-05-23T11:23:41Z Setting nMinimumChainWork=000000000000000000000000000000000000000044a50fe819c39ad624021859
    2024-05-23T11:23:41Z Prune configured to target 550 MiB on disk for block and undo files.
    2024-05-23T11:23:41Z Opening LevelDB in /bdata/bitcoin-data/blocks/index
    2024-05-23T11:23:41Z Opened LevelDB successfully
    2024-05-23T11:23:41Z Using obfuscation key for /bdata/bitcoin-data/blocks/index: 0000000000000000
    2024-05-23T11:23:50Z LoadBlockIndexDB: last block file = 4298
    2024-05-23T11:23:50Z LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=12, size=18765314, heights=844726…844737, time=2024-05-23…2024-05-23)
    2024-05-23T11:23:50Z Checking all blk files are present…
    2024-05-23T11:23:51Z LoadBlockIndexDB(): Block files have previously been pruned
    2024-05-23T11:23:53Z Initializing chainstate Chainstate [ibd] @ height -1 (null)
    2024-05-23T11:23:53Z Opening LevelDB in /bdata/bitcoin-data/chainstate
    2024-05-23T11:23:53Z Opened LevelDB successfully
    2024-05-23T11:23:53Z Using obfuscation key for /bdata/bitcoin-data/chainstate: 27687fc922c5e117
    2024-05-23T11:23:59Z Loaded best chain: hashBestChain=000000000000000000027d7ef87e117148fb2f0fd86daa593be6a9ab60d90b55 height=844737 date=2024-05-23T11:14:21Z progress=0.999998
    2024-05-23T11:23:59Z [snapshot] allocating all cache to the IBD chainstate
    2024-05-23T11:23:59Z Opening LevelDB in /bdata/bitcoin-data/chainstate
    2024-05-23T11:23:59Z Opened LevelDB successfully
    2024-05-23T11:23:59Z Using obfuscation key for /bdata/bitcoin-data/chainstate: 27687fc922c5e117
    2024-05-23T11:23:59Z [Chainstate [ibd] @ height 844737 (000000000000000000027d7ef87e117148fb2f0fd86daa593be6a9ab60d90b55)] resized coinsdb cache to 8.0 MiB
    2024-05-23T11:23:59Z [Chainstate [ibd] @ height 844737 (000000000000000000027d7ef87e117148fb2f0fd86daa593be6a9ab60d90b55)] resized coinstip cache to 190.0 MiB
    2024-05-23T11:23:59Z init message: Verifying blocks…
    2024-05-23T11:23:59Z Verifying last 6 blocks at level 3
    2024-05-23T11:23:59Z Verification progress: 0%
    2024-05-23T11:24:08Z Verification progress: 16%
    2024-05-23T11:24:13Z Verification progress: 33%
    2024-05-23T11:24:16Z Verification progress: 50%
    2024-05-23T11:24:21Z Verification progress: 66%
    2024-05-23T11:24:26Z Verification progress: 83%
    2024-05-23T11:24:30Z Verification progress: 99%
    2024-05-23T11:24:30Z Verification: No coin database inconsistencies in last 6 blocks (17307 transactions)
    2024-05-23T11:24:30Z block index 49358ms
    2024-05-23T11:24:30Z init message: Pruning blockstore…
    2024-05-23T11:24:30Z Leaving InitialBlockDownload (latching to false)
    2024-05-23T11:24:30Z block tree size = 844738
    2024-05-23T11:24:30Z nBestHeight = 844737
    2024-05-23T11:24:30Z loadblk thread start
    2024-05-23T11:24:30Z loadblk thread exit
    2024-05-23T11:24:30Z torcontrol thread start
    2024-05-23T11:24:30Z Bound to 127.0.0.1:8334
    2024-05-23T11:24:30Z init message: Starting network threads…
    2024-05-23T11:24:30Z DNS seeding disabled
    2024-05-23T11:24:30Z init message: Done loading
    2024-05-23T11:24:30Z opencon thread start
    2024-05-23T11:24:30Z net thread start
    2024-05-23T11:24:30Z addcon thread start
    2024-05-23T11:24:30Z msghand thread start
    2024-05-23T11:24:30Z New outbound peer connected: version: 70016, blocks=844737, peer=1 (manual)
    2024-05-23T11:24:31Z New outbound peer connected: version: 70016, blocks=844737, peer=4 (manual)
    2024-05-23T11:24:31Z New outbound peer connected: version: 70016, blocks=844737, peer=5 (manual)
    2024-05-23T11:24:31Z New outbound peer connected: version: 70016, blocks=844737, peer=6 (manual)
    2024-05-23T11:25:09Z New outbound peer connected: version: 70016, blocks=844737, peer=10 (manual)
    2024-05-23T11:26:10Z New outbound peer connected: version: 70016, blocks=844737, peer=13 (manual)
    2024-05-23T11:26:13Z Saw new header hash=00000000000000000003508531e1ec11798f1972e307235a54ef91bf945e246c height=844738
    2024-05-23T11:26:58Z UpdateTip: new best=00000000000000000003508531e1ec11798f1972e307235a54ef91bf945e246c height=844738 version=0x322d6000 log2_work=94.940996 tx=1009663503 date='2024-05-23T11:22:20Z' progress=0.999999 cache=1.8MiB(11945txo)
    2024-05-23T11:26:58Z Saw new header hash=00000000000000000000c18685513156cfd695edd2378ca2ba819d785866a571 height=844739
    2024-05-23T11:27:08Z UpdateTip: new best=00000000000000000000c18685513156cfd695edd2378ca2ba819d785866a571 height=844739 version=0x29a3e000 log2_work=94.941009 tx=1009668305 date='2024-05-23T11:25:43Z' progress=1.000000 cache=2.6MiB(18123txo)
    2024-05-23T11:27:39Z New outbound peer connected: version: 70016, blocks=844737, peer=15 (manual)
    2024-05-23T11:27:55Z New outbound peer connected: version: 70016, blocks=844737, peer=14 (manual)
    2024-05-23T11:36:44Z Saw new header hash=00000000000000000000e9be8cfceef5f4d12313ab2657d7fcf4e617dc9bb839 height=844740
    2024-05-23T11:36:51Z UpdateTip: new best=00000000000000000000e9be8cfceef5f4d12313ab2657d7fcf4e617dc9bb839 height=844740 version=0x224c8000 log2_work=94.941023 tx=1009673632 date='2024-05-23T11:36:11Z' progress=1.000000 cache=3.9MiB(26582txo)
    2024-05-23T11:47:09Z Saw new header hash=0000000000000000000152fee6b2cb2779c2fe0ce34aaad57f9034c1613463a0 height=844741
    2024-05-23T11:47:15Z UpdateTip: new best=0000000000000000000152fee6b2cb2779c2fe0ce34aaad57f9034c1613463a0 height=844741 version=0x24000000 log2_work=94.941037 tx=1009679243 date='2024-05-23T11:46:34Z' progress=1.000000 cache=4.9MiB(34377txo)
    2024-05-23T11:49:55Z Saw new header hash=000000000000000000025c416dc7962405d500d87238bf392c95aa9610c3a71e height=844742
    2024-05-23T11:49:58Z UpdateTip: new best=000000000000000000025c416dc7962405d500d87238bf392c95aa9610c3a71e height=844742 version=0x2e000000 log2_work=94.941051 tx=1009686227 date='2024-05-23T11:49:28Z' progress=1.000000 cache=5.3MiB(37584txo)
    2024-05-23T11:59:40Z Saw new header hash=000000000000000000014072f1d5d67100bf6c097e971cd2af2d579b32a30f93 height=844743
    2024-05-23T11:59:46Z UpdateTip: new best=000000000000000000014072f1d5d67100bf6c097e971cd2af2d579b32a30f93 height=844743 version=0x2652e000 log2_work=94.941064 tx=1009691358 date='2024-05-23T11:59:05Z' progress=1.000000 cache=6.8MiB(45760txo)
    2024-05-23T12:04:32Z Saw new header hash=00000000000000000002b692a4141102da57b78d667d8a3f9a461fe85106a4c5 height=844744

    2024-06-04T06:23:12Z Saw new header hash=00000000000000000001de5e312d55f873e73d14f3cd8a8ee656a392dbc28236 height=846459
    2024-06-04T06:23:52Z UpdateTip: new best=00000000000000000001de5e312d55f873e73d14f3cd8a8ee656a392dbc28236 height=846459 version=0x2403a000 log2_work=94.964467 tx=1017950010 date='2024-06-04T06:22:45Z' progress=1.000000 cache=87.4MiB(578849txo)
    2024-06-04T06:24:43Z Saw new header hash=00000000000000000000225789427db9f0e8f310d8bc0a205f884a8ce68a2aaf height=846460
    2024-06-04T06:25:09Z UpdateTip: new best=00000000000000000000225789427db9f0e8f310d8bc0a205f884a8ce68a2aaf height=846460 version=0x25ed2000 log2_work=94.964481 tx=1017955017 date='2024-06-04T06:24:38Z' progress=1.000000 cache=88.0MiB(583022txo)
    2024-06-04T06:37:08Z Saw new header hash=0000000000000000000012f2d726f8a033a2bfb5eada30cd92e15e6e1d196ce7 height=846461
    2024-06-04T06:37:47Z UpdateTip: new best=0000000000000000000012f2d726f8a033a2bfb5eada30cd92e15e6e1d196ce7 height=846461 version=0x23c16000 log2_work=94.964494 tx=1017959189 date='2024-06-04T06:36:37Z' progress=1.000000 cache=89.1MiB(590143txo)
    2024-06-04T07:36:10Z Saw new header hash=000000000000000000031f97130e48c0a7797547416d16ccd3d7dd8a6cc6d0b0 height=846462
    2024-06-04T07:36:52Z UpdateTip: new best=000000000000000000031f97130e48c0a7797547416d16ccd3d7dd8a6cc6d0b0 height=846462 version=0x2001e000 log2_work=94.964508 tx=1017962502 date='2024-06-04T07:35:52Z' progress=1.000000 cache=90.6MiB(602543txo)
    2024-06-04T07:42:53Z Saw new header hash=000000000000000000002bde133693a19d84616a4cf1db767f8864b5288cce6b height=846463
    2024-06-04T07:44:01Z UpdateTip: new best=000000000000000000002bde133693a19d84616a4cf1db767f8864b5288cce6b height=846463 version=0x21aea000 log2_work=94.964521 tx=1017965939 date='2024-06-04T07:42:39Z' progress=1.000000 cache=11.0MiB(0txo)
    2024-06-04T08:01:04Z Saw new header hash=00000000000000000001d5e0369520ead2dc646b7b592b8bafff8dc02e368600 height=846464
    2024-06-04T08:01:40Z LevelDB read failure: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5978736.ldb
    2024-06-04T08:01:40Z Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5978736.ldb
    2024-06-04T08:01:40Z You can use -debug=leveldb to get more complete diagnostic messages
    2024-06-04T08:01:40Z Error: Error reading from database, shutting down.
    Error: Error reading from database, shutting down.
    2024-06-04T08:01:40Z Error reading from database: Fatal LevelDB error: Corruption: block checksum mismatch: /bdata/bitcoin-data/chainstate/5978736.ldb
    bitcoind.service: Main process exited, code=dumped, status=6/ABRT
    bitcoind.service: Failed with result 'core-dump'.
    bitcoind.service: Consumed 3h 46min 58.907s CPU time.
    bitcoind.service: Scheduled restart job, restart counter is at 1.
    Stopped bitcoind.service.
    bitcoind.service: Consumed 3h 46min 58.907s CPU time.

  20. maflcko commented at 5:17 pm on June 4, 2024: member

    Another thing you could try to debug this further is to put a swapfile, and the datadir on the same AWS gp3 SSD filesystem.

    I am happy to create an AWS account to test this, but it would be good if there was a single (bash) script, which can be deployed to AWS, so that it is easy for anyone to reproduce your exact setup.

  21. apulsifer commented at 6:05 pm on June 4, 2024: none

    IMO, the first thing to do would be for someone who’s familiar with the format of these block files to look at the corrupted files and see if they can figure out what code may have stomped on the blocks (it might be obvious, like a fragment of p2p networking data in the middle of a block – you never know until you look).

    The most likely scenario is that this is a latent software bug that will show up on any machine if it’s under memory pressure and heavy paging. In my experience, finding low-incidence, seemingly random problems like this requires instrumenting the code (or using automated tools) with frequent memory buffer guard checks, injected faults such as networking jitter, stalls, disconnects, and invalid data, and random waits before and after memory is allocated, freed, and used (including networking and I/O buffers) and locks are acquired and released. I myself am more familiar with troubleshooting these problems under Windoze than Linux tho.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 10:13 UTC
