coinstatsindex way more fragile than any other data w.r.t. power loss #24076

issue akovalenko openend this issue on January 15, 2022
  1. akovalenko commented at 12:06 pm on January 15, 2022: none

    Two times since a fresh 22.0 installation I’ve had a power loss event (artifically produced to test system behavior), and each time I’ve started getting incorrect muhash values for block heights since approximately that moment

    “Approximately”, as in I’ve not taken care to record the exact time of events because I’ve expecting nothing of interest from bitcoind in particular; “incorrect”, as in compared to my another long-running nodes running with coinstatsindex without power outages, and to its own muhash for the current tip calculated without using the index, gettxoutsetinfo muhash null false.

    Expected behavior

    As usual when testing power outage behavior, I’ve expected pretty much anything except silently returning incorrect data. Most desirable would be checking several recent block muhashes on restart and auto-correcting the problem.

    To reproduce

    Unsure if it can be reproduced reliably, but 2/2 times are rather good indicator that it may not be uncommon

    System information

    Ubuntu Linux 20.04, ~/.bitcoin on /dev/md raid1 over two HDDs, ext4 fs with default mount options

    Running bitcoind 22.0 prebuilt binaries from bitcoincore (amd64).

    Here’s debug.log from a typical restart (I see nothing interesting there, just in case if some configured or autodetected parameters are relevant):

     0 2022-01-14T17:10:34Z Bitcoin Core version v22.0.0 (release build)
     12022-01-14T17:10:34Z Assuming ancestors of block 00000000000000000008a89e854d57e5667df88f1cdef6fde2fbca1de5b639ad have valid signatures.
     22022-01-14T17:10:34Z Setting nMinimumChainWork=00000000000000000000000000000000000000001fa4663bbbe19f82de910280
     32022-01-14T17:10:34Z Using the 'shani(1way,2way)' SHA256 implementation
     42022-01-14T17:10:34Z Using RdSeed as additional entropy source
     52022-01-14T17:10:34Z Using RdRand as an additional entropy source
     62022-01-14T17:10:34Z Default data directory /home/bitnode/.bitcoin
     72022-01-14T17:10:34Z Using data directory /srv/slow/dot-bitcoin
     82022-01-14T17:10:34Z Config file: /etc/bitcoin/bitcoin.conf
     92022-01-14T17:10:34Z Config file arg: blockfilterindex="1"
    102022-01-14T17:10:34Z Config file arg: coinstatsindex="1"
    112022-01-14T17:10:34Z Config file arg: dbcache="2000"
    122022-01-14T17:10:34Z Config file arg: listen="1"
    132022-01-14T17:10:34Z Config file arg: maxuploadtarget="450"
    142022-01-14T17:10:34Z Config file arg: peerblockfilters="1"
    152022-01-14T17:10:34Z Config file arg: server="1"
    162022-01-14T17:10:34Z Config file arg: txindex="1"
    172022-01-14T17:10:34Z Config file arg: upnp="0"
    182022-01-14T17:10:34Z Config file arg: whitelist="download@127.0.0.1"
    192022-01-14T17:10:34Z Config file arg: whitelist="download@192.168.1.0/24"
    202022-01-14T17:10:34Z Config file arg: zmqpubrawblock="tcp://127.0.0.1:28332"
    212022-01-14T17:10:34Z Config file arg: zmqpubrawtx="tcp://127.0.0.1:28333"
    222022-01-14T17:10:34Z Command-line arg: conf="/etc/bitcoin/bitcoin.conf"
    232022-01-14T17:10:34Z Command-line arg: daemonwait=""
    242022-01-14T17:10:34Z Command-line arg: datadir="/srv/slow/dot-bitcoin/"
    252022-01-14T17:10:34Z Command-line arg: pid="/run/bitcoind/bitcoind.pid"
    262022-01-14T17:10:34Z Using at most 125 automatic connections (1024 file descriptors available)
    272022-01-14T17:10:34Z Using 16 MiB out of 32/2 requested for signature cache, able to store 524288 elements
    282022-01-14T17:10:34Z Using 16 MiB out of 32/2 requested for script execution cache, able to store 524288 elements
    292022-01-14T17:10:34Z Script verification uses 15 additional threads
    302022-01-14T17:10:34Z scheduler thread start
    312022-01-14T17:10:34Z HTTP: creating work queue of depth 16
    322022-01-14T17:10:34Z Using random cookie authentication.
    332022-01-14T17:10:34Z Generated RPC authentication cookie /srv/slow/dot-bitcoin/.cookie
    342022-01-14T17:10:34Z HTTP: starting 4 worker threads
    352022-01-14T17:10:34Z Using wallet directory /srv/slow/dot-bitcoin/wallets
    362022-01-14T17:10:34Z init message: Verifying wallet(s)
    372022-01-14T17:10:34Z init message: Loading banlist
    382022-01-14T17:10:34Z SetNetworkActive: true
    392022-01-14T17:10:34Z Using /16 prefix for IP bucketing
    402022-01-14T17:10:34Z Cache configuration:
    412022-01-14T17:10:34Z * Using 2.0 MiB for block index database
    422022-01-14T17:10:34Z * Using 249.8 MiB for transaction index database
    432022-01-14T17:10:34Z * Using 218.5 MiB for basic block filter index database
    442022-01-14T17:10:34Z * Using 8.0 MiB for chain state database
    452022-01-14T17:10:34Z * Using 1521.7 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)
    462022-01-14T17:10:34Z init message: Loading block index
    472022-01-14T17:10:34Z Switching active chainstate to Chainstate [ibd] @ height -1 (null)
    482022-01-14T17:10:34Z Opening LevelDB in /srv/slow/dot-bitcoin/blocks/index
    492022-01-14T17:10:34Z Opened LevelDB successfully
    502022-01-14T17:10:34Z Using obfuscation key for /srv/slow/dot-bitcoin/blocks/index: 0000000000000000
    512022-01-14T17:10:38Z LoadBlockIndexDB: last block file = 2883
    522022-01-14T17:10:38Z LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=17, size=20113255, heights=718645...718661, time=2022-01-14...2022-01-14)
    532022-01-14T17:10:38Z Checking all blk files are present...
    542022-01-14T17:10:38Z Opening LevelDB in /srv/slow/dot-bitcoin/chainstate
    552022-01-14T17:10:39Z Opened LevelDB successfully
    562022-01-14T17:10:39Z Using obfuscation key for /srv/slow/dot-bitcoin/chainstate: d57f45de91bd35fa
    572022-01-14T17:10:39Z Loaded best chain: hashBestChain=000000000000000000001cab1821f8b5ce71faccfde4d9ae11ab444c0dff979f height=718643 date=2022-01-14T14:19:04Z progress=0.999964
    582022-01-14T17:10:39Z init message: Verifying blocks
    592022-01-14T17:10:39Z Verifying last 6 blocks at level 3
    602022-01-14T17:10:39Z [0%]...[16%]...[33%]...[50%]...[66%]...[83%]...[99%]...[DONE].
    612022-01-14T17:12:34Z No coin database inconsistencies in last 6 blocks (9259 transactions)
    622022-01-14T17:12:34Z  block index          120309ms
    632022-01-14T17:12:34Z Opening LevelDB in /srv/slow/dot-bitcoin/indexes/txindex
    642022-01-14T17:12:35Z Opened LevelDB successfully
    652022-01-14T17:12:35Z Using obfuscation key for /srv/slow/dot-bitcoin/indexes/txindex: 0000000000000000
    662022-01-14T17:12:35Z txindex thread start
    672022-01-14T17:12:35Z txindex is enabled at height 718643
    682022-01-14T17:12:35Z txindex thread exit
    692022-01-14T17:12:35Z Opening LevelDB in /srv/slow/dot-bitcoin/indexes/blockfilter/basic/db
    702022-01-14T17:12:35Z Opened LevelDB successfully
    712022-01-14T17:12:35Z Using obfuscation key for /srv/slow/dot-bitcoin/indexes/blockfilter/basic/db: 0000000000000000
    722022-01-14T17:12:35Z basic block filter index thread start
    732022-01-14T17:12:35Z basic block filter index is enabled at height 718643
    742022-01-14T17:12:35Z basic block filter index thread exit
    752022-01-14T17:12:35Z Opening LevelDB in /srv/slow/dot-bitcoin/indexes/coinstats/db
    762022-01-14T17:12:36Z Opened LevelDB successfully
    772022-01-14T17:12:36Z Using obfuscation key for /srv/slow/dot-bitcoin/indexes/coinstats/db: 0000000000000000
    782022-01-14T17:12:36Z coinstatsindex thread start
    792022-01-14T17:12:36Z coinstatsindex is enabled at height 718643
    802022-01-14T17:12:36Z coinstatsindex thread exit
    812022-01-14T17:12:36Z block tree size = 718667
    822022-01-14T17:12:36Z nBestHeight = 718643
    832022-01-14T17:12:36Z loadblk thread start
    84....
    
  2. akovalenko added the label Bug on Jan 15, 2022
  3. Sjors commented at 9:41 am on January 19, 2022: member
    cc @fjahr: can we add a test for an unclean shutdown to the indexer?
  4. mzumsande commented at 11:51 am on January 19, 2022: member
    Not sure if that is the same cause because the unclean shutdown is initiated already during init there, but if I use -coinstatsindex instead of -txindex in feature_init.py, that test fails for me.
  5. fjahr commented at 8:06 pm on January 19, 2022: member
    Oh, interesting, I am investigating, thanks for tagging me @Sjors
  6. mzumsande commented at 0:17 am on January 20, 2022: member
    just fyi, the feature_init fail I mentioned is unrelated to this and not so serious. (The problem there is that BaseIndex::CommitInternal doesn’t check if m_best_block_index == nullptr, which can happen if init is interrupted very early before indexing anything. In that case, DB_BEST_BLOCK is written with a block locator corresponding to the tip, and on the next start we think the index is fully synced when we actually aren’t synced at all. Might still open a small fix for this in the next days.)
  7. fjahr commented at 4:49 pm on January 23, 2022: member

    I have tried to reproduce this by kill -9ing the process but was not successful so far. I think something like this is also the best we could do in an automated test. However I opened #24133 that should address the issue by failing loudly if the restart is indeed the source of the issue. @akovalenko Thanks a lot for reporting, you have already given a lot of information but since I couldn’t reproduce the issue yet it would be great if you could give me more details. Any of the following would be very helpful:

    • Was only the muhash value out of sync or also the other outputs of the index that are also returned with the non index gettxoutsetinfo muhash null false?
    • If you have any real outputs of the gettxoutsetinfo calls after it got out of sync that would be great!
    • If you could dissect or observe the next time if the index got indeed out of sync at the restart or at another point that would be even better. If you still have the corrupted index you could use #19792 to dump the whole index and upload the result, then I could go search for the offending height as well.
    • If you do these planned failures regularly you could also maybe run #24133 with one of them to see if that catches the issue.
  8. mzumsande commented at 2:13 pm on January 24, 2022: member
    I could reproduce this manually on regtest with the standard 200 chain used in the functional tests - adding a sleep to ThreadSync() after block 100 so I could conveniently kill -9 the process. See #24138 for an explanation and a fix.
  9. fanquake referenced this in commit 5f44c5c428 on Feb 20, 2022
  10. sidhujag referenced this in commit d746aa5b1c on Feb 20, 2022
  11. MarcoFalke referenced this in commit 7003b6ab24 on Mar 9, 2022
  12. MarcoFalke closed this on Mar 9, 2022

  13. DrahtBot locked this on Mar 9, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-08 22:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me