IBD stalls permanently with Bitcoin core v23 after “ignoring getheaders from peer=0 because node is in initial block download” #25800

issue verdy-p openend this issue on August 7, 2022
  1. verdy-p commented at 1:41 pm on August 7, 2022: none

    With Bitcoin core v23 (Qt versin on Windows, or with the daemon from commandline), the IBD process can stall indefinitely after receiving a large block:

     02022-08-07T13:24:50Z [net.cpp:3043] [CNode] Added connection to 104.248.143.83:8333 peer=0
     12022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending version (102 bytes) peer=0
     22022-08-07T13:24:50Z [net_processing.cpp:1146] [PushNodeVersion] send version message: version 70016, blocks=239242, them=104.248.143.83:8333, txrelay=1, peer=0
     32022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: version (102 bytes) peer=0
     42022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending verack (0 bytes) peer=0
     52022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getaddr (0 bytes) peer=0
     62022-08-07T13:24:50Z [net_processing.cpp:2754] [ProcessMessage] receive version message: /Satoshi:0.20.1/: version 70015, blocks=748403, us=104.248.143.83:8333, txrelay=1, peer=0, peeraddr=104.248.143.83:8333
     72022-08-07T13:24:50Z [timedata.cpp:57] [AddTimeData] added time data, samples 2, offset -1 (+0 minutes)
     82022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: verack (0 bytes) peer=0
     92022-08-07T13:24:50Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=748403, peer=0, peeraddr=104.248.143.83:8333 (manual)
    102022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendheaders (0 bytes) peer=0
    112022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
    122022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
    132022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending ping (8 bytes) peer=0
    142022-08-07T13:24:50Z [net_processing.cpp:4650] [SendMessages] initial getheaders (319216) to peer=0 (startheight:748403)
    152022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
    162022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending feefilter (8 bytes) peer=0
    172022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendheaders (0 bytes) peer=0
    182022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
    192022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
    202022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: ping (8 bytes) peer=0
    212022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending pong (8 bytes) peer=0
    222022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: getheaders (1029 bytes) peer=0
    232022-08-07T13:24:50Z [net_processing.cpp:3223] [ProcessMessage] Ignoring getheaders from peer=0 because node is in initial block download
    242022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: feefilter (8 bytes) peer=0
    252022-08-07T13:24:50Z [net_processing.cpp:4051] [ProcessMessage] received: feefilter of 0.00001000 BTC/kvB from peer=0
    262022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: pong (8 bytes) peer=0
    272022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: headers (162003 bytes) peer=0
    

    Then any other attemps to connect to other peers (or accept incoming connections) stalls permanently (we can waitfior hours: no network activity, no disk activity, but some thread uses 100% of a CPU core, probably in a tight loop generated by a deadlock and infinite attemps to recover from it).

    If we try to shutdown, al most all databases thread are closed, the RPC services are closed, But the conenction manager remains locked for about ~15 minutes, not doing anything.

    In fact the received headers are not even starting to be parse until AFTER these ~15 minutes. At which time we see that in logs: (note: I’ve enabled logtimestamps=1, logsourcelocations=1 and logthreadnames=1 in bitcoin.conf to have more traceable traces; and run it with debug=all).

    It seems that a lock is kept somewhere in “net_processing.cpp” after the ProcessMessage() exists with “Ignoring getheaders from peer=0 because node is in initial block download”

    Only when we shut down the node we see instantly:

     02022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: requestShutdown : Requesting shutdown
     12022-08-07T13:26:17Z [rpc/server.cpp:302] [operator()] Interrupting RPC
     22022-08-07T13:26:17Z [rpc/server.cpp:314] [operator()] Stopping RPC
     32022-08-07T13:26:17Z [init.cpp:371] [OnRPCStopped] RPC stopped.
     42022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Running Shutdown in thread
     52022-08-07T13:26:17Z [httpserver.cpp:436] [InterruptHTTPServer] Interrupting HTTP server
     62022-08-07T13:26:17Z [httprpc.cpp:312] [InterruptHTTPRPC] Interrupting HTTP RPC server
     72022-08-07T13:26:17Z [init.cpp:205] [Shutdown] Shutdown: In progress...
     82022-08-07T13:26:17Z [httprpc.cpp:317] [StopHTTPRPC] Stopping HTTP RPC server
     92022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for / (exactmatch 1)
    102022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for /wallet/ (exactmatch 0)
    112022-08-07T13:26:17Z [httpserver.cpp:448] [StopHTTPServer] Stopping HTTP server
    122022-08-07T13:26:17Z [httpserver.cpp:450] [StopHTTPServer] Waiting for HTTP worker threads to exit
    132022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] opencon thread exit
    142022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] dnsseed thread exit
    152022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] addcon thread exit
    162022-08-07T13:26:17Z [httpserver.cpp:291] [ThreadHTTP] Exited http event loop
    172022-08-07T13:26:17Z [httpserver.cpp:463] [StopHTTPServer] Waiting for HTTP event thread to exit
    182022-08-07T13:26:17Z [httpserver.cpp:475] [StopHTTPServer] Stopped HTTP server
    192022-08-07T13:26:17Z [mapport.cpp:210] [ProcessUpnp] UPNP_DeletePortMapping() returned: 0
    202022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] mapport thread exit
    

    And then we wait for about 15-20 minutes to finally see this: the received block headers are processed, an attempt for sending another requests for headers is emitted, but all peers are finally disconnected (because we were shutting down), and all remaining data in moemory is flushed to disk and the node finally exits correctly.

     02022-08-07T13:43:47Z [validation.cpp:3606] [ProcessNewBlockHeaders] Synchronizing blockheaders, height: 321216 (~43.64%)
     12022-08-07T13:43:47Z [net_processing.cpp:2198] [ProcessHeadersMessage] more getheaders (321216) to end to peer=0 (startheight:748403)
     22022-08-07T13:43:47Z [net.cpp:1262] [CreateNodeFromAcceptedSocket] connection from [2604:d500:4:1::2]:63949 accepted
     32022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] net thread exit
     42022-08-07T13:43:47Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
     52022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
     62022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] msghand thread exit
     72022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
     82022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] txindex thread exit
     92022-08-07T13:43:47Z [random.cpp:520] [SeedPeriodic] Feeding 1009016 bytes of dynamic environment data into RNG
    102022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat  701ms
    112022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=0
    122022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=0
    132022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=1
    142022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=1
    152022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat  15ms
    162022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] scheduler thread exit
    172022-08-07T13:43:47Z [validation.cpp:4644] [DumpMempool] Writing 0 unbroadcast transactions to disk.
    182022-08-07T13:43:47Z [validation.cpp:4654] [DumpMempool] Dumped mempool: 0s to copy, 0.036892s to dump
    192022-08-07T13:43:47Z [policy/fees.cpp:998] [FlushUnconfirmed] Recorded 0 unconfirmed txs from mempool in 0s
    202022-08-07T13:43:47Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
    212022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (38.02ms)
    222022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
    232022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.0MiB, after=0.3MiB
    242022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (36.98ms)
    252022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
    262022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
    272022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
    282022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
    292022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
    302022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
    312022-08-07T13:43:48Z [validationinterface.cpp:243] [operator()] ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
    322022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
    332022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
    342022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
    352022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (31.27ms)
    362022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
    372022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.3MiB, after=0.3MiB
    382022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (15.63ms)
    392022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
    402022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
    412022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
    422022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
    432022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
    442022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
    452022-08-07T13:43:48Z [init.cpp:322] [Shutdown] Shutdown: done
    462022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Shutdown finished
    472022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopping thread
    482022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopped thread
    

    If we relaunch the node, it loads all data successfully, connects to a new peer=0 and requests another set of headers, then stalls again exactly as above, and we have to shutdown and wait 15-20 minutes exactly the same way as above. A single connection is made instantly to download a single nex set of headers, and then stalls forever. Other connections are added to the list (the UPnP discovery occurs, other peers are also discovered), but will never start. These extra connections are locked again until….

    We shutdown the UI and then wait for 15-20 minutes when the RPC services are closed. These pending connections are then open and closed immediately and the UI closes.

    Result: we have been able to download a single set of headers at each run (less than 1 minute to start and then seeing the lock after the 1st download of headers, just about 180KB, then no network or disk activity at all dor an infinite time, then shutdown and wait 15-20 minutes).

    Expected behavior

    IBD should not stall. It should also connect to other peers, and not just to the initial one (the QT interface dispays constantly: “connecting to peers…”. System information

    Bitcoin v23 for Windows (from https://bitcoincore.org/en/download/, extracted from the ZIP file, because the current EXE installer does not run on a Windows system enforcing ASLT)

    Windows 11 x64 (Intel Core i7), 32 GB RAM

  2. verdy-p added the label Bug on Aug 7, 2022
  3. verdy-p commented at 1:56 pm on August 7, 2022: none

    Note: I tried to locate the message “ignoring getheaders from peer=0 because node is in initial block download” emitted in [net_processing.cpp:3223] [ProcessMessage], but this is not what we have in the sources here. I see this message only in very old versions of Bitcoin sources.

    So Bitcoin v23 was compiled from some sources that were out of sync (probably on another test branch, not meant to be part of the release).

    It seems also that the test for seeing if some lock must be released is inverted (so this can explain the 100% CPU use seen on one CPU core by a thread runing the tight loop almost forever, of waiting for some exception or event which will come only when we shut down)

  4. verdy-p commented at 2:13 pm on August 7, 2022: none
    It seems that the way that LOCK(cs_main) are called is faulty, this is a real mess in a giant function containing many successive but exclusive if() that should be a switch to smaller functions. How are locks released when you return early without processing a message?
  5. verdy-p commented at 3:40 pm on August 7, 2022: none

    Anyway I do not understand why a node A in IBD (having for example here 300K blocks synchronized) that is connecting to a full node B (which is fully in sync with 760K blocks for now) and requests it a set of headers, gets these headers, but then node B needs to request headers to node A: how can a full node in sync request any headers (in the 760K blocks) to a node in IBD which does not have them, when node B knows that node A was just syncing (in the 300K blocks, several years behind)?

    Note that my node and the remote node were both using the v23 version of Bitcoin core… v23 seems to work OK only for a node that is already in sync (or not too far behind, so that a single headers request at most would allow them to know they are in sync). I’ve also seen in Bitcoin.io that v23 nodes seem now to be frequently stalling near the best tip. They apparently need to be regularly shut down and restarted to resync them.

    In other words the logic for sync’ing nodes (even those that are not in IBD) is broken now in v23.

  6. verdy-p commented at 5:12 pm on August 7, 2022: none

    Note also that even if downloading headers stalls (only advanced by about 1000 blocks at each run), the number of fully downloaded blocks NEVER progresses; all the available ones are already indexed, have their “rev*dat” file ready, the transaction indexi and coinbase indexes seem to progress (but only according to logs: in fact, some of them are merged to regenerate a new top index while the previous top index is deleted; there remains only indexes for the very old blocks); but according to statistics we are froen always to the same full block.

    I’ve enabled all documented “check*” options and validations, no probelm or corruption is ever detected. Have you tried to resync any FRESH full node recently (with no other data)? Is it likelty that some parts of full data for some blocks are ONLY available via Tor and not from any other node in the P2P network (via IPv4, IPVv6, or I2P), and not available even in so called “full nodes”? Or that I’m stuck in a dead branch of the network each time I try (note that my node is reachable via both IPv4 and IPv6, it uses UPnP for NAT over IPv4, but IPv6 is direct), and that my IPs are not reachable from some parts of the Internet, so that I need to use proxies?

  7. verdy-p commented at 8:08 pm on August 7, 2022: none

    After (too many) tests, I just figured out what to do: STOPPING using the QT UI for the IBD.

    You may use te QT Interface just to start and try the configuration, but for the IBD it is unreliable.

    This bug seems to be caused by Qt’s own internal threads for its UI, adding it own unsynchronized locks or managing other condurrent threads used by Bitcoin incorrectly, or it could be a bug in the Qt integration inside the BitCoin Core UI.

    Also I note that bitcointd is much faster than bitcoin-qt to progress in the IBD phase.

    Once the IBD phase is complete. you can interrupt bitcoind (CTRL+C is safe, and it terminates much faster), if you want to continue with the QT UI (which will not consume a lot of CPU/disk/net resources). However I’ve noted frequently that the QT version can corrupt some indexes. The Qt integrationis not completely thread safe and has race isues.

    IMHO, the QT version should run the daemon only in a separate process (which could be registered in Windows as a standard “service”, without any attached console), and control it with its RPC/IPC interface. Stopping the QT interface should not stop the daemon, unless we click a button to stop the daemon by its RPC/IPC interface (or via Windows services API).

    As well there should be an option to send logs not in a flat file, but to Windows Events, where they could be compressed and managed automatically with a circular buffer (internally Windows Events will be packed in *.evtx or *.etl files and have a type handler managing common strings and formaters): this also saves a lot of I/O; Windows provides tools to dump these events into a flat file if needed, and to also filter them by sources, event type IDs, or sort them in different order. This would be mostly useful for log events generated by the daemon (debug logs for the QT interface can remain in flat files, or could be sent to a different source category in Windows Events).


    Note also that I use a dedicated storage partition only for the /Bitcoin data files and NOT the main system partition.

    I have also formatted it not with the default cluster sizes (4KB) but increased it to clusters of 256KB (because most files are very large): this significantly improves the I/O, reduces the filesystem overhead in terms of I/O and memory for the filesystem structures (allocation bitmaps are smaller and modified much less often, because database files, undo files, index files, redo-logs, and debug log files are all allocated incrementally and in parallel), it improves the filesystem cache, reduces the fragmentation a lot (when the store is on harddisks, even on a fast array), so we also get reduced latency.

    This also greatly facilitates maintenance.

  8. ghost commented at 9:09 pm on August 7, 2022: none
    Permanent solution is to downgrade to 22.0
  9. verdy-p commented at 11:58 pm on August 7, 2022: none

    No, version 22.0 has the same issue (though it does not have the bug present in the EXE installer for Windows caused by ASLR, due to a bug in MAKENSIS which creates an invalid EXE by when it appends the “pcode” compiled from the nsis.ini script, and then appends the compressed files, because it does not correctly manages the relations that are placed after them, but forgets to update the PE EXE header to the new placement of the relocation section in the EXE file)

    v22 tools has the same issue with the IBD which stalls as well if you use the QT interface (in addition, the bitcoin-qt.exe also corrupts the indexes; Qt is unsafe as it silently changes some global settings in the C runtime environment, and the LevelDB is also affected; it seems that Qt temporarily changes the current working directory, and files accessed by LeevlDB do not seem to specify the absolute path, so there are race conditions where some Qt threads may affect concurrent LevelDB threads.

    IF you use only the CLI daemon (v22 or v23), there’s no such conflict and race conditions; it is faster in both cases than their QT version; but the v23 includes many fixes (multiple ones being critical) still not present in the v22 binaries (and not backported in the v22 sources).

    I just hope that a new v23 will be recreated soon (and there are other bugs pending to be solved, already present in the sources, but not in the precompiled binaries).

    In my opinon the QT version of the daemon has not been seriously tested. And trying to link the daemon with the QT interface is an error; the QT interface should spawn or control the regular CLI daemon

  10. ghost commented at 6:15 pm on August 8, 2022: none
    Learning from this, just use the CLI version of bitcoin core by running bitcoind.exe on cmd and have another cmd session to interface with the wallet using another executable (I think it is bitcoin-cli.exe)
  11. bitcoin deleted a comment on Aug 8, 2022
  12. verdy-p commented at 7:38 am on August 11, 2022: none

    Also I’ve discovered that in some cases, during the IBD phase, if we have too many open connections with other peers, we exhaust the maximum number of allowed open files in the whole process. As a result we get a status saying “Error: Error reading from database, shutting down.”

    But what is worse is that during that shutdown, Bitcoind or Bitcoin-qt is unable to save its current indexing: it should first close all peer connections, make sure that they don’t have threads open in existing LevelDB files or on block files, BEFORE finally saving the indexing state. This occurs much more often with the QT version probably beaue Qt itself uses many additional file handles.

    So Bitcoin core is not properly handling cases where it cannot open files. If this occurs, an exception handler should first seek to terminate connections to peers with lower priority (notably incoming peers before outgoing peers).

    As well there are visibly leaks of open file handles if we leave Bitcoin Core open for a long time, especially during the IBD phase. The work around for that is to run the IBD phase for a limited time (e.g. stop after half an hour). Bitcoin core should in fact make such cleanup, possibly closing all connections to peers regularly, flushing its indexing state.

    As well it should NOT cleanup all files in the chainstate immediately, but should preserve at least the previous valid manifest and its attached lbd files. If ever there’s been a corruption because it could not save the last chainstate, at least it has a way to restart from the previous manifest, and may be me more conservative by limiting the number of peers with which it interacts (for now there’s no limit at all, and during the IBD, remote peers can request any data from anywhere, in any order, and this locks too many files, more that what is allowed in the configuration.

    By default it accepts up to 125 peers, and about 2000 open files, but during the IBD and if you have multiple indexes in construction, that limit is easily exhausted (notably when you run it over a Gigabit Internet connection): you cannot safely run the IBD phase and at the same time allow the construction of several additional indexes in parallel (e.g. with blockfilterindex=1, coinstatsindex=1, peerblockfilters=1, peerbloomfilters=1: you can only enable them one by one, after you’ev succesfully finished the IBD and each of the following phases for other indexes).

    Such forced shutdown (which incorrectly states that the “blocks are corrupted”, when in fact it could not even open them because of internal resource limits) leaving the chainstate after the shutdown in an inconsistant state (I’ve checked the blocks they were perfectly correct, it’s just that Bitcoin Core could not open them!) and it is very harmful (because we’ve lost the chainstate which becomes unrecoverable and has to be reindexed once again from the begining) and as opposed to what is said, this has nothing to do with a hardware problem on the sorage or with the memory: it is architectural in the app, which does not use safe management of its own resources usage, even if the host can perfectly support more! The so called “corruption” is generated by Bitcoin Core itself by not properly tracking its own internal errors for its own resource usages, whose given diagnostics are wrong about their effective cause).

    Also I wonder why, with the option “reindex-chainstate”, Bitcoin core will not just run without accepting ANY incoming connections; it may just open 1 outgoing connection to a full peer, and it should be sufficient for the time needed to recover the chainstate to the last block.

  13. verdy-p commented at 12:19 pm on August 11, 2022: none

    another proof that the logic for IBD is wrong and stalls after one of many retries without any success:

     02022-08-11T12:10:31Z init message: Loading block index…
     12022-08-11T12:10:31Z Switching active chainstate to Chainstate [ibd] @ height -1 (null)
     22022-08-11T12:10:31Z Opening LevelDB in G:\Bitcoin\blocks\index
     32022-08-11T12:10:31Z Opened LevelDB successfully
     42022-08-11T12:10:31Z Using obfuscation key for G:\Bitcoin\blocks\index: 0000000000000000
     52022-08-11T12:10:35Z LoadBlockIndexDB: last block file = 2611
     62022-08-11T12:10:35Z LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=12, size=15108850, heights=686224...686609, time=2021-06-04...2021-06-07)
     72022-08-11T12:10:35Z Checking all blk files are present...
     82022-08-11T12:10:36Z Opening LevelDB in G:\Bitcoin\chainstate
     92022-08-11T12:10:36Z Opened LevelDB successfully
    102022-08-11T12:10:36Z Using obfuscation key for G:\Bitcoin\chainstate: ea53eab331512c62
    112022-08-11T12:10:36Z Loaded best chain: hashBestChain=00000000000000003a8d7be745e0515c78b51014311f20235327855bdd1e87e4 height=277116 date=2013-12-26T21:18:12Z progress=0.039674
    122022-08-11T12:10:36Z init message: Verifying blocks…
    132022-08-11T12:10:36Z Verifying last 6 blocks at level 3
    142022-08-11T12:10:36Z [0%]...[16%]...[33%]...[50%]...[66%]...[83%]...[99%]...[DONE].
    152022-08-11T12:10:36Z No coin database inconsistencies in last 6 blocks (2745 transactions)
    162022-08-11T12:10:36Z  block index            5193ms
    172022-08-11T12:10:36Z block tree size = 748958
    182022-08-11T12:10:36Z nBestHeight = 277116
    192022-08-11T12:10:36Z loadblk thread start
    202022-08-11T12:10:36Z Imported mempool transactions from disk: 0 succeeded, 0 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
    212022-08-11T12:10:36Z loadblk thread exit
    222022-08-11T12:10:36Z torcontrol thread start
    232022-08-11T12:10:36Z AddLocal([2a01:e0a:30:2b70::4925:5939]:8333,1)
    242022-08-11T12:10:36Z Discover: JUPITER - 2a01:e0a:30:2b70::4925:5939
    252022-08-11T12:10:36Z Bound to 127.0.0.1:8334
    262022-08-11T12:10:36Z Bound to [::]:8333
    272022-08-11T12:10:36Z Bound to 0.0.0.0:8333
    282022-08-11T12:10:36Z Loaded 2 addresses from "anchors.dat"
    292022-08-11T12:10:36Z 2 block-relay-only anchors will be tried for connections.
    302022-08-11T12:10:36Z init message: Starting network threads…
    312022-08-11T12:10:36Z net thread start
    322022-08-11T12:10:36Z dnsseed thread start
    332022-08-11T12:10:36Z Waiting 300 seconds before querying DNS seeds.
    342022-08-11T12:10:36Z addcon thread start
    352022-08-11T12:10:36Z opencon thread start
    362022-08-11T12:10:36Z init message: Done loading
    372022-08-11T12:10:36Z msghand thread start
    382022-08-11T12:10:37Z New outbound peer connected: version: 70016, blocks=748973, peer=0 (block-relay-only)
    392022-08-11T12:10:37Z New outbound peer connected: version: 70016, blocks=748973, peer=1 (block-relay-only)
    402022-08-11T12:10:49Z New outbound peer connected: version: 70016, blocks=748973, peer=2 (outbound-full-relay)
    412022-08-11T12:10:55Z New outbound peer connected: version: 70016, blocks=748973, peer=3 (outbound-full-relay)
    422022-08-11T12:10:56Z New outbound peer connected: version: 70016, blocks=748973, peer=4 (outbound-full-relay)
    432022-08-11T12:10:56Z New outbound peer connected: version: 70016, blocks=748973, peer=5 (outbound-full-relay)
    442022-08-11T12:11:04Z New outbound peer connected: version: 70016, blocks=748973, peer=6 (outbound-full-relay)
    452022-08-11T12:11:10Z New outbound peer connected: version: 70016, blocks=748973, peer=7 (outbound-full-relay)
    462022-08-11T12:11:11Z New outbound peer connected: version: 70015, blocks=748973, peer=8 (outbound-full-relay)
    472022-08-11T12:11:11Z New outbound peer connected: version: 70015, blocks=748973, peer=9 (outbound-full-relay)
    482022-08-11T12:11:17Z New outbound peer connected: version: 70016, blocks=748973, peer=10 (outbound-full-relay)
    492022-08-11T12:11:29Z New outbound peer connected: version: 70015, blocks=748972, peer=13 (outbound-full-relay)
    

    And here it stalls at nBestHeight = 277116 (still in Chainstate [ibd] @ height -1 (null)), even though it knows at startup that block tree size = 748958, and then makes outgoing connection to peers all having blocks=748973, without asking anything else to them (those peers disconnect, and the local node just continues to make outbound connections). The local node already has all blocks up to height=748958, but stops going further and noes not index them. What is it waiting for?

    How many times I’ve tried to relaunch the daemon with -reindex-chainstate? The chainstate management is definitely broken with bad assumptions, also causing frequent crashes when it does not just stall!

  14. sipa commented at 12:22 pm on August 11, 2022: member
    What does the getchaintips RPC output?
  15. verdy-p commented at 12:23 pm on August 11, 2022: none

    Bitcoin> bitcoin-cli.exe -getinfo Chain: main Blocks: 277116 Headers: 277116 Verification progress: â–’â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘â–‘ 3.9674% Difficulty: 1180923195.258026

    Network: in 0, out 0, total 0 Version: 230000 Time offset (s): 0 Proxies: n/a Min tx relay fee rate (BTC/kvB): 0.00001000

    (this is when it stalls, only the “Network: in 0, out 0, total 0” line changes with the number of peers connected and disconnecting continuously)

    I can press CTRL+C, the daamon will exit normally without error, and I can relauch it, nothing changes. the chaintips never changes, and it is never updated even when I activate the option “-debug=all” to get more logs.

    I can use v22 or v23 (only from the published releases, I did not compile them) I get the same.

    More details with “debug=all”:

     02022-08-11T12:35:30Z received: getheaders (1029 bytes) peer=45
     12022-08-11T12:35:30Z Ignoring getheaders from peer=45 because node is in initial block download
     22022-08-11T12:35:31Z received: pong (8 bytes) peer=45
     32022-08-11T12:35:32Z socket closed for peer=31
     42022-08-11T12:35:32Z disconnecting peer=31
     52022-08-11T12:35:32Z Cleared nodestate for peer=31
     62022-08-11T12:35:32Z initial getheaders (277115) to peer=41 (startheight:748977)
     72022-08-11T12:35:32Z sending getheaders (997 bytes) peer=41
     82022-08-11T12:35:32Z Selected 2.61.65.59:8333 from new
     92022-08-11T12:35:32Z trying connection 2.61.65.59:8333 lastseen=49.3hrs
    102022-08-11T12:35:32Z sending addr (31 bytes) peer=38
    112022-08-11T12:35:32Z received: headers (162003 bytes) peer=41
    122022-08-11T12:35:32Z AcceptBlockHeader: block 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b is marked invalid
    132022-08-11T12:35:32Z Misbehaving: peer=41 (0 -> 100) DISCOURAGE THRESHOLD EXCEEDED: invalid header received
    142022-08-11T12:35:32Z Disconnecting and discouraging peer 41!
    152022-08-11T12:35:32Z disconnect by subnet matched peer=41; disconnecting
    162022-08-11T12:35:32Z disconnecting peer=41
    172022-08-11T12:35:32Z Cleared nodestate for peer=41
    182022-08-11T12:35:32Z initial getheaders (277115) to peer=36 (startheight:748977)
    192022-08-11T12:35:32Z sending getheaders (997 bytes) peer=36
    202022-08-11T12:35:33Z received: headers (162003 bytes) peer=36
    212022-08-11T12:35:33Z AcceptBlockHeader: block 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b is marked invalid
    222022-08-11T12:35:33Z Misbehaving: peer=36 (0 -> 100) DISCOURAGE THRESHOLD EXCEEDED: invalid header received
    232022-08-11T12:35:33Z Disconnecting and discouraging peer 36!
    242022-08-11T12:35:33Z disconnect by subnet matched peer=36; disconnecting
    252022-08-11T12:35:33Z disconnecting peer=36
    262022-08-11T12:35:33Z Cleared nodestate for peer=36
    272022-08-11T12:35:33Z initial getheaders (277115) to peer=34 (startheight:748977)
    282022-08-11T12:35:33Z sending getheaders (997 bytes) peer=34
    292022-08-11T12:35:33Z received: headers (162003 bytes) peer=34
    302022-08-11T12:35:33Z AcceptBlockHeader: block 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b is marked invalid
    312022-08-11T12:35:33Z Misbehaving: peer=34 (0 -> 100) DISCOURAGE THRESHOLD EXCEEDED: invalid header received
    322022-08-11T12:35:33Z Disconnecting and discouraging peer 34!
    332022-08-11T12:35:33Z disconnect by subnet matched peer=34; disconnecting
    342022-08-11T12:35:33Z disconnecting peer=34
    352022-08-11T12:35:33Z Cleared nodestate for peer=34
    362022-08-11T12:35:33Z initial getheaders (277115) to peer=33 (startheight:748977)
    372022-08-11T12:35:33Z sending getheaders (997 bytes) peer=33
    382022-08-11T12:35:33Z received: headers (162003 bytes) peer=33
    392022-08-11T12:35:33Z AcceptBlockHeader: block 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b is marked invalid
    402022-08-11T12:35:33Z Misbehaving: peer=33 (0 -> 100) DISCOURAGE THRESHOLD EXCEEDED: invalid header received
    412022-08-11T12:35:33Z Disconnecting and discouraging peer 33!
    422022-08-11T12:35:33Z disconnect by subnet matched peer=33; disconnecting
    432022-08-11T12:35:33Z disconnecting peer=33
    442022-08-11T12:35:33Z Cleared nodestate for peer=33
    452022-08-11T12:35:33Z initial getheaders (277115) to peer=38 (startheight:748971)
    462022-08-11T12:35:33Z sending getheaders (997 bytes) peer=38
    472022-08-11T12:35:34Z received: inv (397 bytes) peer=35
    482022-08-11T12:35:34Z got inv: wtx 44fd2ca0a4e5554690cd35871cf9da6a122c247b070504525351582fa1c7de37  new peer=35
    492022-08-11T12:35:34Z got inv: wtx 29d1039191288ff05646be90af2236f7e3161365dc5857980f3a8189c9c5754a  new peer=35
    502022-08-11T12:35:34Z got inv: wtx 254600576b8891f6389bae1151ef3b72ee8c8d1a2f633165a069705e60ccc694  new peer=35
    512022-08-11T12:35:34Z got inv: wtx 19c968d92c93c5a9f6c25220762f9d41c4b6dec83ab70b724668c3a6536c0773  new peer=35
    522022-08-11T12:35:34Z got inv: wtx 3f9fe1d48bac7d20fc33e688f18667473b959739716c436b0f5479ee7bf7a346  new peer=35
    532022-08-11T12:35:34Z got inv: wtx 288ff0886891bae9fa1471db7c7a909f3b3ec015e3ca0f407b32889eb44ea259  new peer=35
    542022-08-11T12:35:34Z got inv: wtx 3aba4502674f4c46de3b2916647793f081aebc31b134d700d4ab8d798d2b1b6d  new peer=35
    552022-08-11T12:35:34Z got inv: wtx 1239b7f6ff54ab12c7b1d5730d7a2ea4a827e559d3e8461ccee06ab58ac7e470  new peer=35
    562022-08-11T12:35:34Z got inv: wtx 2b13fe8c8e41401a5c17b6d241cde283cee18b974ed7379dfa3b46f1e435979b  new peer=35
    572022-08-11T12:35:34Z got inv: wtx 16bfe62a3e9d8fd2c8f8e93086e73218a4de9a38eb77c4893e66b9ef5ea4c106  new peer=35
    582022-08-11T12:35:34Z got inv: wtx abc6b8fd20904a859715114cda786af8ce6f58ab1286f9be14eb457ba38b8558  new peer=35
    592022-08-11T12:35:37Z connection attempt to 2.61.65.59:8333 timed out
    

    Do I always find only “misbehaving” nodes? Or isn’t Bitcoin Core v22+ itself misbehaving?

    Note that I have stopped since long trying to use the QT version (which bugs too much often). I just use the command line daemon (bitcoind) and the command line CLI query client (bitcoin-cli). In all cases these stalls or crashes are inside the management of the chainstate (there’s no corruption at all in the downloaded blocks) occuring at random position (with the same valid blocks downloaded I can restart “reindex-chainstate”, it always fails.

    If I restart with “reindex” it forces downloading again the whole blocks, but when performing binary compares with a backup of blocks from previous runs, they are strictly identical (except “block00000.dat” which is special and contains the current root of the tree, updated at each run).

    I thought my PC was too fast, and my Gigabit figer internet too fast. I tried running the daemon in a Linux VM, by restricting its amount of memory and network bandwidth allocated in the hypervisor. Still I get the same results: It’s never possible to terminate the IBD, I always get crashes by corruptions in the chainstate (some *.ldb files were deleted too early after a merge, even though they were still needed for other upper *.ldb files files; or I got the chainstate stalled as well in the IBD, and no corruption detected by the daemon).

    I don’t know what kind of machine or specific configuration can really make Bitcoin Core working and succeed in their IBD phase.

    Or the Bitcoin chain already and really contains invalid blocks (that were rejected much longer after they were first accepted), that can’t be eliminated and no progress is possible (in that case it is the whole shared Bitcoin chain which is damaged since long, or Bitcoin Core has forgotten to implement a rule to perform a safe cleanup of rejected blocks).

    Are there reliable Bitnode softwares other than Bitnode Core v22+ that can be used today to run a full node with the current state of the chain?

  16. verdy-p commented at 1:12 pm on August 11, 2022: none

    And if you still think that my machine has a hardware problem, I’ve also tried on different PCs, and with a Linux VM hosted in Azure and Amazon. They are also failing to sync the IBD with the same symptoms in the “chainstate” (corruption of the chain with index *.lbd files deleted too soon, and missing, or stalling at unpredictable height).

    I think that those that have Bitcoin Core working on their machine had to compile it themselves. But they DO NOT use your signed binary releases (which aer most prbably not compiled correctly in a safe environment; I know you use a new build chain in “GUIX”, but it is then not working at all: do you really test the generated binaries, which seem to be broken locally in GUIX by some unknown and unsafe/untested local customizations and patches, or running on a broken/damaged Linux installation?).

    My opinion is that it is not my machines that are borken, but your “GUIX” environment used to produce the signed binaries. And unfortunately you’ve not opensourced that environment; may be you should retry by using an hypervisor like Proxmox and building a “template” for a safe Linux image, a safe installtion of GCC and related tools; that tempalte would then be easy to replicate when you want to make all your tests before actual releases. The sources of Bitcoin Core alone is definitely not sufficient without a safe build chain we can replicate easily. Apparently you are only testing Bitcoin core with binaries you compile on your local machines without using “GUIX”, you never use or test the released signed binaries published on your official website!

  17. sipa commented at 1:13 pm on August 11, 2022: member
    Can you report the output of getchaintips?
  18. verdy-p commented at 1:56 pm on August 11, 2022: none
     0bitcoin> bitcoin-cli getchaintips
     1[
     2  {
     3    "height": 748957,
     4    "hash": "00000000000000000002997b7cf13fed750e2183d7693a24785c9f594389ee88",
     5    "branchlen": 471841,
     6    "status": "invalid"
     7  },
     8  {
     9    "height": 277116,
    10    "hash": "00000000000000003a8d7be745e0515c78b51014311f20235327855bdd1e87e4",
    11    "branchlen": 0,
    12    "status": "active"
    13  }
    14]
    

    Do I need to manually delete some recent blk0*.dat and rev0*.dat and restart indexing again and again? Are they caused by misbehaving nodes (or malwares) in the P2P network distributing wrong blocks and causing normal nodes in IBD to fail or stall? Is it the signal of an DoS attack again the Bitcoin chain in order to take control of it by “ejecting” regular nodes with such bad blocks? In that case, they are succesful in their attacks since many months! I’ve never been able to compleet the IBD from any machine when using Bitcoin Core. May be this is the sign that Bitcoin Core does not fully apply the “consensus” rules for validating new blocks, and that it accepts them too early, and is unable to rollback safely when they are detected as “disapproved” too much longer later. And above some limits (a big number of bad block in the chain to revert to replace one that came much earlier in the chain), it seems to crash and corrupt the “chainstate” index files itself.

    But may be there are other competing softwares that could work such as https://www.bitcoinabc.org/ (if they’ve detected such attack and are able to isolate and fix it)?

  19. sipa commented at 2:12 pm on August 11, 2022: member

    So, for whatever reason, your node has marked block 277117 (0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b) incorrectly as invalid. It’s not synchronizing further, because it is treating every peer that hands it block 277177 as malicious. We’ll need to figure out why that is happening. Your comment above about running out of file descriptors seems like a useful lead.

    First, we need to know if it is the block on disk itself that is corrupted, or the validation database. It’s likely the latter, but to find out, run reconsiderblock 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b, and report what you see in debug.log at that time.

    Do I need to manually delete some recent blk0*.dat and rev0*.dat and restart indexing again and again?

    If the blocks themselves are corrupted, you’ll need to use -reindex. If it’s just the database that is corrupted, -reindex-chainstate should suffice. Given that it sounds like you’ve tried the latter several times and that doesn’t help, it’s either a problem with the blocks themselves, or somehow your Bitcoin Core instance is corrupting itself time after time. Looking at the reconsiderblock output will help figure this out.

    Are they caused by misbehaving nodes (or malwares) in the P2P network distributing wrong blocks and causing normal nodes in IBD to fail or stall?

    Misbehaving nodes should never be able to cause Bitcoin Core to incorrectly mark a block as invalid, so no.

    May be this is the sign that Bitcoin Core does not fully apply the “consensus” rules for validating new blocks, and that it accepts them too early

    No.

    But may be there are other competing softwares that could work such as https://www.bitcoinabc.org/ (if they’ve detected such attack and are able to isolate and fix it)?

    That software is for a different currency, with different consensus rules. It will not synchronize with the Bitcoin network/chain/currency.

  20. verdy-p commented at 2:15 pm on August 11, 2022: none

    I jsut ran “bitcoin-cli reconsiderblock 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b”.

    And instantly the daemon restarts the IBD synchronization.

    I see many “UpdateTip: new best=*” logged.

    I did not have to use “-reindex” or “-reindex-chainstate”.

  21. maflcko commented at 2:18 pm on August 11, 2022: member

    I see many “UpdateTip: new best=*” logged.

    You’ll need to provide the exact debug log at the exact location when the RPC reconsiderblock was called

  22. sipa commented at 2:19 pm on August 11, 2022: member
    @verdy-p Do the heights keep going up, past 277117? If so, it sounds like there may not actually have been any corruption at all, but just a one-off validation issue.
  23. verdy-p commented at 2:19 pm on August 11, 2022: none

    Too late… that log has been “autocleared” with the huge amiunt of “UpdateTip: new best=*” that instantly started (and I cannot recover it as well fro mthe console which has flushed it as well from its memory.

    Yes “bitcoin-cli getinfo” shows that it is progressing again, as well as “bitcoin-cli getchaintips”.

    Note that in the separate console where I ran “bitcoin-cli reconsiderblock 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b”, the command never exited; I had to press CTRL+C to terminate the RPC session. But this does not affect the daemon which continues at fast speed.

    For example I get now:

     0bitcoin> bitcoin-cli getchaintips
     1[
     2  {
     3    "height": 748989,
     4    "hash": "000000000000000000081ba73135ef39612f940bc77e76240e52ef83b5f0b1e9",
     5    "branchlen": 443985,
     6    "status": "headers-only"
     7  },
     8  {
     9    "height": 305004,
    10    "hash": "0000000000000000162865e084d637f7e7fbd3580a2dae91abe6d47fa4ef8abe",
    11    "branchlen": 0,
    12    "status": "active"
    13  }
    14]
    15```
    
  24. sipa commented at 2:24 pm on August 11, 2022: member
    @verdy-p Yeah, it’s expected that it doesn’t exit; it’ll exit whenever it is done reconsidering that block, and all other blocks whose validation is triggered by it. If all goes well, that won’t be until it reaches the tip of the downloaded blocks you had before (height 748957) - or until somehow another incorrect validation error occurs.
  25. verdy-p commented at 2:31 pm on August 11, 2022: none

    Could it be caused by some OS limitation (temporarily disabling write accesses for some critical operations, and sending a “busy” status error that Bitcoin Core does not corerctly interpret, and cannot recover just by a retry after some reasonnable delay; e.g. it could be generated by security tools, instructing a running app to “pause” temporarily and not allowing for example to create or delete new files in the directory containing files of the chainstate, which changes really too fast with thousands files added/deleted each minute)? It may be caused by the fact that even if we close a file handle, the OS may still maintain it internally for its own operations, and cannot allocate new file handles for some time, until the OS performs some background cleanup in the background with the filesystem.

    So the issue would be that Bitcoin core fails while trying to open a new file: the OS does not accept immediately to give it a new file handle, and the error returned may indicate that the application needs to wait a bit before retrying.

  26. sipa commented at 2:33 pm on August 11, 2022: member

    @verdy-p That sounds like a possibility.

    If so, I would guess that whatever is going on is likely happening when Bitcoin Core is flushing the chainstate database to disk. You can trigger such a flush manually by calling gettxoutsetinfo, if you’d like.

  27. verdy-p commented at 2:41 pm on August 11, 2022: none

    Thanks for the tip about “bitcoin-cli gettxoutsetinfo”. It takes some minutes to complete until it returns:

    0{
    1  "height": 347551,
    2  "bestblock": "000000000000000014f5db2548b27a532ba2d81e22ada3800e787383496da281",
    3  "txouts": 17634623,
    4  "bogosize": 1340919614,
    5  "hash_serialized_2": "7a9209a30d15eade5e74248095962d82d67c1b060e91f9eb7127eac805aafbb0",
    6  "total_amount": 13938639.77152479,
    7  "transactions": 4995854,
    8  "disk_size": 950709316
    9}
    

    But this does not seem to stop the daemon which continues adding new tips at increadible rate (I just see instantly the truncation of some “level-0” table for less than one second, then it continues indexing at fast rate. I get the reply from “gettxoutsetinfo” only a few minutes later (probably the RPC service thread has low priority to reply, most of the CPU threads are for the indexing).

  28. verdy-p commented at 2:46 pm on August 11, 2022: none
    So for some reasons if the OS returns some error when opening a file, the log should detail that error (with the exact error number: it is very likely non-fatal, we should just retry after waiting a couple of seconds: there does not seem to exist any exception handler to handle this case), instead of just saying that it can’t open a file and give a false guess about a corruption and exiting abruptly (and at that time, trying to flush the state of the chain before exising will also faile for the same reason, as it will need to open many files for that: at that time this crashes, and we are unable to recover with missing files).
  29. sipa commented at 2:47 pm on August 11, 2022: member
    @verdy-p If that’s the issue, it’s a bug inside LevelDB, which we’ll be unlikely to fix on our own. At this point I would avoid jumping to any conclusions and just try to reproduce the problem so we can try things.
  30. verdy-p commented at 2:53 pm on August 11, 2022: none

    It’s true that I’ve never seen any LevelDB application creating and deleting so many indexes at such incredible rate. Generally these indexes are quite stable, they just grow slowly in size. May be the maximum index sizes for each table is too small (2MB) for a fast machine, and we easily reach the limitation of the OS for the filesystem background operations that are likely starting as soon as we close the file (and LevelDB doesnot seem to “sync” the filesystem when it closes a file, it allwos the OS to perform background cleanup, but the file handle remains allocated for the process as long as the OS has not terminated those operations.

    Does LevelDB uses some optimized operations that allow file close to return instantly without waiting the effective completion? such optimization does not seem to be very “safe”. The “close()” operation (or delete/move/rename operation) should be blocking.

    But visibly LevelDB seems to use worker threads to complete its file I/O like ni this project (using the Go language)

    https://github.com/syndtr/goleveldb/blob/master/leveldb/storage/file_storage.go

    LevelDB creates an increadible amount of threads if I monitor Bitcoin Core and just look at them, and the total number of file handles that are open (they are frequently “spiking” very high, notably when performing “gettxoutsetinfo”)

    Also I don’t understand why Bitcoind core is flushing EVERYTHING including its whole cache of block data (about 760GB) according to logs to return to just a single block (about 45MB). shouldn’t we split the cache in two parts, so that we flush it just in half, and then maintain two metafiles (one for the older view of the chainstate, another for the most recent, making sure that the older view will remain intact and will not be deleted as long as the new state is not finished: if the new state fails, at least we can recover from the previous state. old indexes don’t need to be deleted immediately if they can be still referenced from the old state: this allow safe recovery in all cases (even if the daemon crashes for any reason and is unable to flush its state completely).

  31. sipa commented at 3:19 pm on August 11, 2022: member

    Let’s focus on reproducing. I am not convinced LevelDB (or even anything related to the chainstate at all) is at fault here, because the fact that reconsiderblock caused things to get unstruck implies there wasn’t any actual on-disk corruption.

    Also I don’t understand why Bitcoind core is flushing EVERYTHING including its whole cache of block data (about 760GB)

    Only the chainstate database is cached by Bitcoin Core, which is a few GB typically (and only up to the dbcache setting of that is kept in memory).

    according to logs to return to just a single block (about 45MB).

    Blocks are never more than 4 MB, and aren’t stored in LevelDB (or any kind of database). They’re stored in raw blk*.dat files.

  32. maflcko removed the label Bug on Aug 11, 2022
  33. maflcko added the label Questions and Help on Aug 11, 2022
  34. maflcko added the label Data corruption on Aug 11, 2022
  35. maflcko commented at 3:31 pm on August 11, 2022: member
    I’d guess the most likely cause would be a cosmic ray or old/overheated CPU/RAM, as the issue wasn’t persistent.
  36. verdy-p commented at 4:29 pm on August 11, 2022: none
    No Marco, cosmic rays are very unlikely, and overheating is definitely not the case. The bug is persistent even if it occurs “unpredictably” (for now because we don’t know the exact cause) and in lot of different hardware and OS configurations, and even when running on very reliable systems (like Azure and Amazon VMs). And I have not noted any hardware issue reported by the OS or any problem in the filesystem which is always “clean”, and based on very reliable RAID stores with multiple layers of hardware and software checks. It looks very much like a race issue in Bitcoin Core implementation itself (or its dependancies like LevelDB), which is more likely to occur on very fast machines and with very fast Internet connections. All I did to limit the rate of these errors was to voluntarily slowdown Bitcoin Core. And I’ve never detected any block corruption when I compare the regular snapshots of data I make with backups or across the different hosts that I’ve tried. They just fail at different time, but they all fail sometime.
  37. maflcko added the label Windows on Aug 11, 2022
  38. maflcko commented at 4:42 pm on August 11, 2022: member
    Do you have exact steps to reproduce on Azure or Amazon? Please make sure to include exact and complete steps, including all settings that are changed on the OS layer or elsewhere.
  39. verdy-p commented at 5:06 pm on August 11, 2022: none
    The steps are just configure a standard VM with default settings. I chose the most common OSes with little configuration. I chose the standard binary releases. jkuste like anyone would do at home on their PC. I tried many different configs (since many months), none have successed. We see failures happening after about 2 hours, or sometimes just after less than 10 minutes. I can erase all the daa and restart, this just fails at different moments, but it never completes. In fact I have no idea about why it works for many other people. But they don’t tell us why. Most probably many other have tried and already abandoned rapidly since long.
  40. maflcko commented at 5:24 pm on August 11, 2022: member
    Are you absolutely sure you are using “default” settings? You said the same thing in the last issue, and I wasn’t able to reproduce with default settings: #25726 (comment)
  41. verdy-p commented at 8:19 pm on August 11, 2022: none

    I just run “bitcoind” with no options its logs in debug.log and on the console, unless I give the option to not no log anything on th console or in “debug.log” (this makes it run faster), or I use

    $ bitcoin-cli debug ‘[]’ ‘[“all”]’

    to disable new log outputs (it just logs the startup or shutdown and " UpdateTip: new best=*" messages, still at a very high rate.)

  42. verdy-p commented at 5:13 am on August 12, 2022: none

    But may be there are other competing softwares that could work such as https://www.bitcoinabc.org/ (if they’ve detected such attack and are able to isolate and fix it)?

    That software is for a different currency, with different consensus rules. It will not synchronize with the Bitcoin network/chain/currency.

    Why do I see them connected to the Bitcoin chain (according to their user agent string)? What are they doing there? They are just “listening” and not emitting any transaction? But frequently blocks are downloaded from them.

  43. verdy-p commented at 11:35 am on August 12, 2022: none

    I’ve made other tests by reducing a lot the dbcache size to just 4MB (and the mempool size to its minimum allowed of 5MB, even if it’s empty during the IBD phase, because that unused mempool is used to augment the dbcache size.

    What I observe as that when the dbcache is full, and compacts the index files, it may happen that it DELETE an index file (in the chainstate) too early (this index was generated in an eraly pass, and survived several passes where the dbcache was full. But at one point, it remains jsut as one of the “oldest” one and it is unconditionally deleted even if it is referenced in newer indexes that were added in later passes.

    So this is not a data corruption, but a logical error when deleting indexes.

    To make this clear, I purged the logs from “New tips” and here is when a specific index file #9575 for the “chainstate” was created, then survived serval “passes” of compression of the “cache”, and finally deleted just before it was accessed again (and that access failed, it was deleted by error).

      02022-08-12T08:39:06Z UpdateTip: new best=00000000000000656391cde7c780eb05cfff19176bae8cd7810a9db66d6df2c1 height=215930 version=0x00000001 log2_work=69.270036 tx=10937927 date='2013-01-10T03:00:11Z' progress=0.014484 cache=5.6MiB(40342txo)
      1
      2
      3
      42022-08-12T08:39:06Z leveldb: Generated table [#9775](/bitcoin-bitcoin/9775/)@2: 40143 keys, 2173277 bytes
      52022-08-12T08:39:07Z Cache size (6923712) exceeds total space (6835008)
      62022-08-12T08:39:07Z leveldb: compacted to: files[ 1 6 58 45 0 0 0 ]
      72022-08-12T08:39:07Z leveldb: Delete type=2 [#8916](/bitcoin-bitcoin/8916/)
      82022-08-12T08:39:07Z leveldb: Delete type=2 [#9485](/bitcoin-bitcoin/9485/)
      92022-08-12T08:39:07Z leveldb: Delete type=2 [#9486](/bitcoin-bitcoin/9486/)
     102022-08-12T08:39:07Z leveldb: Delete type=2 [#9487](/bitcoin-bitcoin/9487/)
     112022-08-12T08:39:07Z leveldb: Delete type=2 [#9488](/bitcoin-bitcoin/9488/)
     122022-08-12T08:39:07Z leveldb: Delete type=2 [#9489](/bitcoin-bitcoin/9489/)
     132022-08-12T08:39:07Z leveldb: Delete type=2 [#9490](/bitcoin-bitcoin/9490/)
     142022-08-12T08:39:07Z leveldb: Delete type=2 [#9493](/bitcoin-bitcoin/9493/)
     152022-08-12T08:39:07Z leveldb: Delete type=2 [#9656](/bitcoin-bitcoin/9656/)
     162022-08-12T08:39:07Z leveldb: Delete type=2 [#9728](/bitcoin-bitcoin/9728/)
     172022-08-12T08:39:07Z leveldb: Delete type=2 [#9729](/bitcoin-bitcoin/9729/)
     182022-08-12T08:39:07Z leveldb: Delete type=2 [#9730](/bitcoin-bitcoin/9730/)
     192022-08-12T08:39:07Z leveldb: Delete type=2 [#9731](/bitcoin-bitcoin/9731/)
     202022-08-12T08:39:07Z leveldb: Delete type=2 [#9733](/bitcoin-bitcoin/9733/)
     212022-08-12T08:39:07Z leveldb: Delete type=2 [#9734](/bitcoin-bitcoin/9734/)
     222022-08-12T08:39:07Z leveldb: Delete type=2 [#9735](/bitcoin-bitcoin/9735/)
     232022-08-12T08:39:07Z leveldb: Delete type=2 [#9736](/bitcoin-bitcoin/9736/)
     242022-08-12T08:39:07Z leveldb: Delete type=2 [#9737](/bitcoin-bitcoin/9737/)
     252022-08-12T08:39:07Z leveldb: Delete type=2 [#9738](/bitcoin-bitcoin/9738/)
     262022-08-12T08:39:07Z leveldb: Delete type=2 [#9739](/bitcoin-bitcoin/9739/)
     272022-08-12T08:39:07Z leveldb: Delete type=2 [#9749](/bitcoin-bitcoin/9749/)
     282022-08-12T08:39:07Z leveldb: Delete type=2 [#9750](/bitcoin-bitcoin/9750/)
     292022-08-12T08:39:07Z leveldb: Delete type=2 [#9764](/bitcoin-bitcoin/9764/)
     302022-08-12T08:39:07Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
     312022-08-12T08:39:07Z leveldb: Level-0 table [#9783](/bitcoin-bitcoin/9783/): started
     322022-08-12T08:39:07Z WriteBatch memory usage: db=chainstate, before=3.8MiB, after=7.6MiB
     332022-08-12T08:39:07Z leveldb: Level-0 table [#9783](/bitcoin-bitcoin/9783/): 2532965 bytes OK
     342022-08-12T08:39:07Z leveldb: Delete type=0 [#9759](/bitcoin-bitcoin/9759/)
     352022-08-12T08:39:07Z leveldb: Compacting 2@0 + 6@1 files
     362022-08-12T08:39:07Z leveldb: Generated table [#9784](/bitcoin-bitcoin/9784/)@0: 43672 keys, 2180595 bytes
     372022-08-12T08:39:07Z leveldb: Generated table [#9785](/bitcoin-bitcoin/9785/)@0: 8527 keys, 410180 bytes
     382022-08-12T08:39:07Z leveldb: Generated table [#9786](/bitcoin-bitcoin/9786/)@0: 47576 keys, 2185414 bytes
     392022-08-12T08:39:07Z leveldb: Generated table [#9787](/bitcoin-bitcoin/9787/)@0: 4499 keys, 204380 bytes
     402022-08-12T08:39:07Z leveldb: Generated table [#9788](/bitcoin-bitcoin/9788/)@0: 47828 keys, 2185234 bytes
     412022-08-12T08:39:07Z leveldb: Generated table [#9789](/bitcoin-bitcoin/9789/)@0: 10969 keys, 496878 bytes
     422022-08-12T08:39:08Z leveldb: Generated table [#9790](/bitcoin-bitcoin/9790/)@0: 47342 keys, 2184113 bytes
     432022-08-12T08:39:08Z leveldb: Generated table [#9791](/bitcoin-bitcoin/9791/)@0: 4074 keys, 182887 bytes
     442022-08-12T08:39:08Z leveldb: Generated table [#9792](/bitcoin-bitcoin/9792/)@0: 48512 keys, 2184988 bytes
     452022-08-12T08:39:08Z leveldb: Generated table [#9793](/bitcoin-bitcoin/9793/)@0: 4761 keys, 216771 bytes
     462022-08-12T08:39:08Z leveldb: Compacted 2@0 + 6@1 files => 12431440 bytes
     472022-08-12T08:39:08Z leveldb: compacted to: files[ 0 10 58 45 0 0 0 ]
     482022-08-12T08:39:08Z leveldb: Compacting 1@1 + 6@2 files
     492022-08-12T08:39:08Z leveldb: Generated table [#9794](/bitcoin-bitcoin/9794/)@1: 45350 keys, 2181751 bytes
     502022-08-12T08:39:08Z leveldb: Generated table [#9795](/bitcoin-bitcoin/9795/)@1: 38913 keys, 2171326 bytes
     512022-08-12T08:39:08Z leveldb: Generated table [#9796](/bitcoin-bitcoin/9796/)@1: 35041 keys, 1913554 bytes
     522022-08-12T08:39:08Z leveldb: Compacted 1@1 + 6@2 files => 6266631 bytes
     532022-08-12T08:39:08Z leveldb: compacted to: files[ 0 9 55 45 0 0 0 ]
     542022-08-12T08:39:08Z leveldb: Delete type=2 [#9624](/bitcoin-bitcoin/9624/)
     552022-08-12T08:39:08Z leveldb: Delete type=2 [#9625](/bitcoin-bitcoin/9625/)
     562022-08-12T08:39:08Z leveldb: Delete type=2 [#9657](/bitcoin-bitcoin/9657/)
     572022-08-12T08:39:08Z leveldb: Delete type=2 [#9751](/bitcoin-bitcoin/9751/)
     582022-08-12T08:39:08Z leveldb: Delete type=2 [#9752](/bitcoin-bitcoin/9752/)
     592022-08-12T08:39:08Z leveldb: Delete type=2 [#9753](/bitcoin-bitcoin/9753/)
     602022-08-12T08:39:08Z leveldb: Delete type=2 [#9754](/bitcoin-bitcoin/9754/)
     612022-08-12T08:39:08Z leveldb: Delete type=2 [#9755](/bitcoin-bitcoin/9755/)
     622022-08-12T08:39:08Z leveldb: Delete type=2 [#9756](/bitcoin-bitcoin/9756/)
     632022-08-12T08:39:08Z leveldb: Delete type=2 [#9760](/bitcoin-bitcoin/9760/)
     642022-08-12T08:39:08Z leveldb: Delete type=2 [#9770](/bitcoin-bitcoin/9770/)
     652022-08-12T08:39:08Z leveldb: Delete type=2 [#9771](/bitcoin-bitcoin/9771/)
     662022-08-12T08:39:08Z leveldb: Delete type=2 [#9772](/bitcoin-bitcoin/9772/)
     672022-08-12T08:39:08Z leveldb: Delete type=2 [#9783](/bitcoin-bitcoin/9783/)
     682022-08-12T08:39:08Z leveldb: Delete type=2 [#9785](/bitcoin-bitcoin/9785/)
     692022-08-12T08:39:08Z leveldb: Expanding@1 1+11 (2185414+21943875 bytes) to 2+11 (2389794+21943875 bytes)
     702022-08-12T08:39:08Z leveldb: Compacting 2@1 + 11@2 files
     712022-08-12T08:39:08Z leveldb: Generated table [#9797](/bitcoin-bitcoin/9797/)@1: 40063 keys, 2172908 bytes
     722022-08-12T08:39:08Z leveldb: Generated table [#9798](/bitcoin-bitcoin/9798/)@1: 40028 keys, 2176547 bytes
     732022-08-12T08:39:08Z leveldb: Generated table [#9799](/bitcoin-bitcoin/9799/)@1: 38815 keys, 2170398 bytes
     742022-08-12T08:39:08Z leveldb: Generated table [#9800](/bitcoin-bitcoin/9800/)@1: 39765 keys, 2172175 bytes
     752022-08-12T08:39:09Z leveldb: Generated table [#9801](/bitcoin-bitcoin/9801/)@1: 39211 keys, 2171465 bytes
     762022-08-12T08:39:09Z leveldb: Generated table [#9802](/bitcoin-bitcoin/9802/)@1: 39092 keys, 2174567 bytes
     772022-08-12T08:39:09Z leveldb: Generated table [#9803](/bitcoin-bitcoin/9803/)@1: 40303 keys, 2173785 bytes
     782022-08-12T08:39:09Z leveldb: Generated table [#9804](/bitcoin-bitcoin/9804/)@1: 39612 keys, 2172109 bytes
     792022-08-12T08:39:09Z leveldb: Generated table [#9805](/bitcoin-bitcoin/9805/)@1: 39188 keys, 2171479 bytes
     802022-08-12T08:39:09Z leveldb: Generated table [#9806](/bitcoin-bitcoin/9806/)@1: 39821 keys, 2176196 bytes
     812022-08-12T08:39:09Z leveldb: Generated table [#9807](/bitcoin-bitcoin/9807/)@1: 16248 keys, 858058 bytes
     822022-08-12T08:39:09Z leveldb: Compacted 2@1 + 11@2 files => 22589687 bytes
     832022-08-12T08:39:09Z leveldb: compacted to: files[ 0 7 55 45 0 0 0 ]
     842022-08-12T08:39:09Z leveldb: Delete type=2 [#9319](/bitcoin-bitcoin/9319/)
     852022-08-12T08:39:09Z leveldb: Delete type=2 [#9626](/bitcoin-bitcoin/9626/)
     862022-08-12T08:39:09Z leveldb: Delete type=2 [#9627](/bitcoin-bitcoin/9627/)
     872022-08-12T08:39:09Z leveldb: Delete type=2 [#9628](/bitcoin-bitcoin/9628/)
     882022-08-12T08:39:09Z leveldb: Delete type=2 [#9629](/bitcoin-bitcoin/9629/)
     892022-08-12T08:39:09Z leveldb: Delete type=2 [#9632](/bitcoin-bitcoin/9632/)
     902022-08-12T08:39:09Z leveldb: Delete type=2 [#9633](/bitcoin-bitcoin/9633/)
     912022-08-12T08:39:09Z leveldb: Delete type=2 [#9634](/bitcoin-bitcoin/9634/)
     922022-08-12T08:39:09Z leveldb: Delete type=2 [#9635](/bitcoin-bitcoin/9635/)
     932022-08-12T08:39:09Z leveldb: Delete type=2 [#9636](/bitcoin-bitcoin/9636/)
     942022-08-12T08:39:09Z leveldb: Delete type=2 [#9637](/bitcoin-bitcoin/9637/)
     952022-08-12T08:39:09Z leveldb: Delete type=2 [#9786](/bitcoin-bitcoin/9786/)
     962022-08-12T08:39:09Z leveldb: Delete type=2 [#9787](/bitcoin-bitcoin/9787/)
     972022-08-12T08:39:09Z leveldb: Expanding@1 1+10 (2184113+21728027 bytes) to 2+10 (2367000+21728027 bytes)
     982022-08-12T08:39:09Z leveldb: Compacting 2@1 + 10@2 files
     992022-08-12T08:39:09Z leveldb: Generated table [#9808](/bitcoin-bitcoin/9808/)@1: 39168 keys, 2170981 bytes
    1002022-08-12T08:39:09Z leveldb: Generated table [#9809](/bitcoin-bitcoin/9809/)@1: 39842 keys, 2172945 bytes
    101
    102
    103
    1042022-08-12T08:39:09Z Cache size (6992256) exceeds total space (6835008)
    1052022-08-12T08:39:09Z leveldb: Generated table [#9810](/bitcoin-bitcoin/9810/)@1: 39592 keys, 2172424 bytes
    1062022-08-12T08:39:10Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
    1072022-08-12T08:39:10Z leveldb: Generated table [#9811](/bitcoin-bitcoin/9811/)@1: 39167 keys, 2171711 bytes
    1082022-08-12T08:39:10Z leveldb: Level-0 table [#9813](/bitcoin-bitcoin/9813/): started
    1092022-08-12T08:39:10Z WriteBatch memory usage: db=chainstate, before=3.8MiB, after=7.6MiB
    1102022-08-12T08:39:10Z leveldb: Level-0 table [#9813](/bitcoin-bitcoin/9813/): 2537527 bytes OK
    1112022-08-12T08:39:10Z leveldb: Delete type=0 [#9782](/bitcoin-bitcoin/9782/)
    1122022-08-12T08:39:10Z leveldb: Generated table [#9814](/bitcoin-bitcoin/9814/)@1: 39556 keys, 2175754 bytes
    1132022-08-12T08:39:10Z leveldb: Generated table [#9815](/bitcoin-bitcoin/9815/)@1: 39455 keys, 2175033 bytes
    1142022-08-12T08:39:10Z leveldb: Generated table [#9816](/bitcoin-bitcoin/9816/)@1: 40154 keys, 2176692 bytes
    1152022-08-12T08:39:10Z leveldb: Generated table [#9817](/bitcoin-bitcoin/9817/)@1: 39655 keys, 2172245 bytes
    1162022-08-12T08:39:10Z leveldb: Generated table [#9818](/bitcoin-bitcoin/9818/)@1: 38929 keys, 2174589 bytes
    1172022-08-12T08:39:10Z leveldb: Generated table [#9819](/bitcoin-bitcoin/9819/)@1: 41090 keys, 2174505 bytes
    1182022-08-12T08:39:10Z leveldb: Generated table [#9820](/bitcoin-bitcoin/9820/)@1: 11097 keys, 630285 bytes
    1192022-08-12T08:39:10Z leveldb: Compacted 2@1 + 10@2 files => 22367164 bytes
    1202022-08-12T08:39:10Z leveldb: compacted to: files[ 1 5 56 45 0 0 0 ]
    1212022-08-12T08:39:10Z leveldb: Delete type=2 [#9594](/bitcoin-bitcoin/9594/)
    1222022-08-12T08:39:10Z leveldb: Delete type=2 [#9595](/bitcoin-bitcoin/9595/)
    1232022-08-12T08:39:10Z leveldb: Delete type=2 [#9596](/bitcoin-bitcoin/9596/)
    1242022-08-12T08:39:10Z leveldb: Delete type=2 [#9597](/bitcoin-bitcoin/9597/)
    1252022-08-12T08:39:10Z leveldb: Delete type=2 [#9598](/bitcoin-bitcoin/9598/)
    1262022-08-12T08:39:10Z leveldb: Delete type=2 [#9599](/bitcoin-bitcoin/9599/)
    1272022-08-12T08:39:10Z leveldb: Delete type=2 [#9602](/bitcoin-bitcoin/9602/)
    1282022-08-12T08:39:10Z leveldb: Delete type=2 [#9603](/bitcoin-bitcoin/9603/)
    1292022-08-12T08:39:10Z leveldb: Delete type=2 [#9604](/bitcoin-bitcoin/9604/)
    1302022-08-12T08:39:10Z leveldb: Delete type=2 [#9605](/bitcoin-bitcoin/9605/)
    1312022-08-12T08:39:10Z leveldb: Delete type=2 [#9790](/bitcoin-bitcoin/9790/)
    1322022-08-12T08:39:10Z leveldb: Delete type=2 [#9791](/bitcoin-bitcoin/9791/)
    1332022-08-12T08:39:10Z leveldb: Compacting 1@0 + 5@1 files
    1342022-08-12T08:39:11Z leveldb: Generated table [#9821](/bitcoin-bitcoin/9821/)@0: 44382 keys, 2180697 bytes
    1352022-08-12T08:39:11Z leveldb: Generated table [#9822](/bitcoin-bitcoin/9822/)@0: 21785 keys, 1072502 bytes
    1362022-08-12T08:39:11Z leveldb: Generated table [#9823](/bitcoin-bitcoin/9823/)@0: 5473 keys, 286217 bytes
    1372022-08-12T08:39:11Z leveldb: Generated table [#9824](/bitcoin-bitcoin/9824/)@0: 47921 keys, 2185021 bytes
    1382022-08-12T08:39:11Z leveldb: Generated table [#9825](/bitcoin-bitcoin/9825/)@0: 9491 keys, 433795 bytes
    1392022-08-12T08:39:11Z leveldb: Generated table [#9826](/bitcoin-bitcoin/9826/)@0: 11317 keys, 550703 bytes
    1402022-08-12T08:39:11Z leveldb: Generated table [#9827](/bitcoin-bitcoin/9827/)@0: 48629 keys, 2186670 bytes
    1412022-08-12T08:39:11Z leveldb: Generated table [#9828](/bitcoin-bitcoin/9828/)@0: 8308 keys, 377653 bytes
    1422022-08-12T08:39:11Z leveldb: Compacted 1@0 + 5@1 files => 9273258 bytes
    1432022-08-12T08:39:11Z leveldb: compacted to: files[ 0 8 56 45 0 0 0 ]
    1442022-08-12T08:39:11Z leveldb: Delete type=2 [#9784](/bitcoin-bitcoin/9784/)
    1452022-08-12T08:39:11Z leveldb: Delete type=2 [#9788](/bitcoin-bitcoin/9788/)
    1462022-08-12T08:39:11Z leveldb: Delete type=2 [#9789](/bitcoin-bitcoin/9789/)
    1472022-08-12T08:39:11Z leveldb: Delete type=2 [#9792](/bitcoin-bitcoin/9792/)
    1482022-08-12T08:39:11Z leveldb: Delete type=2 [#9793](/bitcoin-bitcoin/9793/)
    1492022-08-12T08:39:11Z leveldb: Delete type=2 [#9813](/bitcoin-bitcoin/9813/)
    1502022-08-12T08:39:11Z leveldb: Compacting 1@1 + 5@2 files
    1512022-08-12T08:39:11Z leveldb: Generated table [#9829](/bitcoin-bitcoin/9829/)@1: 48934 keys, 2185074 bytes
    1522022-08-12T08:39:11Z leveldb: Generated table [#9830](/bitcoin-bitcoin/9830/)@1: 38447 keys, 1718958 bytes
    1532022-08-12T08:39:11Z leveldb: Generated table [#9831](/bitcoin-bitcoin/9831/)@1: 49035 keys, 2185196 bytes
    1542022-08-12T08:39:11Z leveldb: Generated table [#9832](/bitcoin-bitcoin/9832/)@1: 41810 keys, 2174146 bytes
    1552022-08-12T08:39:11Z leveldb: Generated table [#9833](/bitcoin-bitcoin/9833/)@1: 24075 keys, 1340799 bytes
    1562022-08-12T08:39:11Z leveldb: Compacted 1@1 + 5@2 files => 9604173 bytes
    1572022-08-12T08:39:11Z leveldb: compacted to: files[ 0 7 56 45 0 0 0 ]
    1582022-08-12T08:39:11Z leveldb: Delete type=2 [#9767](/bitcoin-bitcoin/9767/)
    1592022-08-12T08:39:11Z leveldb: Delete type=2 [#9768](/bitcoin-bitcoin/9768/)
    1602022-08-12T08:39:11Z leveldb: Delete type=2 [#9769](/bitcoin-bitcoin/9769/)
    1612022-08-12T08:39:11Z leveldb: Delete type=2 [#9794](/bitcoin-bitcoin/9794/)
    1622022-08-12T08:39:11Z leveldb: Delete type=2 [#9795](/bitcoin-bitcoin/9795/)
    1632022-08-12T08:39:11Z leveldb: Delete type=2 [#9822](/bitcoin-bitcoin/9822/)
    1642022-08-12T08:39:11Z leveldb: Compacting 1@2 + 4@3 files
    1652022-08-12T08:39:12Z leveldb: Generated table [#9834](/bitcoin-bitcoin/9834/)@2: 40301 keys, 2174121 bytes
    1662022-08-12T08:39:12Z leveldb: Generated table [#9835](/bitcoin-bitcoin/9835/)@2: 39719 keys, 2172310 bytes
    1672022-08-12T08:39:12Z leveldb: Generated table [#9836](/bitcoin-bitcoin/9836/)@2: 38897 keys, 2170931 bytes
    168
    169
    170
    1712022-08-12T08:39:12Z Cache size (6878304) exceeds total space (6835008)
    1722022-08-12T08:39:12Z leveldb: Generated table [#9837](/bitcoin-bitcoin/9837/)@2: 23903 keys, 1294937 bytes
    1732022-08-12T08:39:12Z leveldb: Compacted 1@2 + 4@3 files => 7812299 bytes
    1742022-08-12T08:39:12Z leveldb: compacted to: files[ 0 7 55 45 0 0 0 ]
    1752022-08-12T08:39:12Z leveldb: Delete type=2 [#9494](/bitcoin-bitcoin/9494/)
    1762022-08-12T08:39:12Z leveldb: Delete type=2 [#9495](/bitcoin-bitcoin/9495/)
    1772022-08-12T08:39:12Z leveldb: Delete type=2 [#9765](/bitcoin-bitcoin/9765/)
    1782022-08-12T08:39:12Z leveldb: Delete type=2 [#9780](/bitcoin-bitcoin/9780/)
    1792022-08-12T08:39:12Z leveldb: Delete type=2 [#9781](/bitcoin-bitcoin/9781/)
    1802022-08-12T08:39:12Z leveldb: Level-0 table [#9839](/bitcoin-bitcoin/9839/): started
    1812022-08-12T08:39:12Z WriteBatch memory usage: db=chainstate, before=3.8MiB, after=7.6MiB
    1822022-08-12T08:39:12Z leveldb: Level-0 table [#9839](/bitcoin-bitcoin/9839/): 2498978 bytes OK
    1832022-08-12T08:39:12Z leveldb: Delete type=0 [#9812](/bitcoin-bitcoin/9812/)
    1842022-08-12T08:39:12Z leveldb: Compacting 1@0 + 7@1 files
    1852022-08-12T08:39:12Z leveldb: Generated table [#9840](/bitcoin-bitcoin/9840/)@0: 45613 keys, 2183633 bytes
    1862022-08-12T08:39:12Z leveldb: Generated table [#9841](/bitcoin-bitcoin/9841/)@0: 23249 keys, 1135569 bytes
    1872022-08-12T08:39:12Z leveldb: Generated table [#9842](/bitcoin-bitcoin/9842/)@0: 10138 keys, 503820 bytes
    1882022-08-12T08:39:12Z leveldb: Generated table [#9843](/bitcoin-bitcoin/9843/)@0: 48488 keys, 2186035 bytes
    1892022-08-12T08:39:12Z leveldb: Generated table [#9844](/bitcoin-bitcoin/9844/)@0: 18620 keys, 843849 bytes
    1902022-08-12T08:39:12Z leveldb: Generated table [#9845](/bitcoin-bitcoin/9845/)@0: 10398 keys, 520406 bytes
    1912022-08-12T08:39:12Z leveldb: Generated table [#9846](/bitcoin-bitcoin/9846/)@0: 48985 keys, 2186338 bytes
    1922022-08-12T08:39:12Z leveldb: Generated table [#9847](/bitcoin-bitcoin/9847/)@0: 10405 keys, 468151 bytes
    1932022-08-12T08:39:12Z leveldb: Compacted 1@0 + 7@1 files => 10027801 bytes
    1942022-08-12T08:39:13Z leveldb: compacted to: files[ 0 8 55 45 0 0 0 ]
    1952022-08-12T08:39:13Z leveldb: Compacting 1@1 + 7@2 files
    1962022-08-12T08:39:13Z leveldb: Generated table [#9848](/bitcoin-bitcoin/9848/)@1: 50790 keys, 2187264 bytes
    1972022-08-12T08:39:13Z leveldb: Generated table [#9849](/bitcoin-bitcoin/9849/)@1: 50090 keys, 2185479 bytes
    1982022-08-12T08:39:13Z leveldb: Generated table [#9850](/bitcoin-bitcoin/9850/)@1: 17474 keys, 790414 bytes
    1992022-08-12T08:39:13Z leveldb: Generated table [#9851](/bitcoin-bitcoin/9851/)@1: 49508 keys, 2187084 bytes
    2002022-08-12T08:39:13Z leveldb: Generated table [#9852](/bitcoin-bitcoin/9852/)@1: 46851 keys, 2076869 bytes
    2012022-08-12T08:39:13Z leveldb: Generated table [#9853](/bitcoin-bitcoin/9853/)@1: 38937 keys, 2171072 bytes
    2022022-08-12T08:39:13Z leveldb: Generated table [#9854](/bitcoin-bitcoin/9854/)@1: 39432 keys, 2172351 bytes
    2032022-08-12T08:39:13Z leveldb: Generated table [#9855](/bitcoin-bitcoin/9855/)@1: 8414 keys, 453601 bytes
    2042022-08-12T08:39:13Z leveldb: Compacted 1@1 + 7@2 files => 14224134 bytes
    2052022-08-12T08:39:13Z leveldb: compacted to: files[ 0 7 56 45 0 0 0 ]
    2062022-08-12T08:39:13Z leveldb: Delete type=2 [#9821](/bitcoin-bitcoin/9821/)
    2072022-08-12T08:39:13Z leveldb: Delete type=2 [#9823](/bitcoin-bitcoin/9823/)
    2082022-08-12T08:39:13Z leveldb: Delete type=2 [#9824](/bitcoin-bitcoin/9824/)
    2092022-08-12T08:39:13Z leveldb: Delete type=2 [#9825](/bitcoin-bitcoin/9825/)
    2102022-08-12T08:39:13Z leveldb: Delete type=2 [#9826](/bitcoin-bitcoin/9826/)
    2112022-08-12T08:39:13Z leveldb: Delete type=2 [#9827](/bitcoin-bitcoin/9827/)
    2122022-08-12T08:39:13Z leveldb: Delete type=2 [#9828](/bitcoin-bitcoin/9828/)
    2132022-08-12T08:39:13Z leveldb: Delete type=2 [#9839](/bitcoin-bitcoin/9839/)
    2142022-08-12T08:39:13Z leveldb: Compacting 1@1 + 6@2 files
    2152022-08-12T08:39:13Z leveldb: Generated table [#9856](/bitcoin-bitcoin/9856/)@1: 54034 keys, 2192179 bytes
    2162022-08-12T08:39:13Z leveldb: Generated table [#9857](/bitcoin-bitcoin/9857/)@1: 54644 keys, 2192950 bytes
    2172022-08-12T08:39:14Z leveldb: Generated table [#9858](/bitcoin-bitcoin/9858/)@1: 53467 keys, 2191225 bytes
    2182022-08-12T08:39:14Z leveldb: Generated table [#9859](/bitcoin-bitcoin/9859/)@1: 53435 keys, 2190814 bytes
    2192022-08-12T08:39:14Z leveldb: Generated table [#9860](/bitcoin-bitcoin/9860/)@1: 22479 keys, 1014350 bytes
    2202022-08-12T08:39:14Z leveldb: Generated table [#9861](/bitcoin-bitcoin/9861/)@1: 19814 keys, 949923 bytes
    221
    222
    223
    2242022-08-12T08:39:14Z Cache size (6853472) exceeds total space (6835008)
    2252022-08-12T08:39:14Z leveldb: Generated table [#9862](/bitcoin-bitcoin/9862/)@1: 50721 keys, 2188402 bytes
    2262022-08-12T08:39:14Z leveldb: Generated table [#9863](/bitcoin-bitcoin/9863/)@1: 2000 keys, 85505 bytes
    2272022-08-12T08:39:14Z leveldb: Compacted 1@1 + 6@2 files => 13005348 bytes
    2282022-08-12T08:39:14Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
    2292022-08-12T08:39:14Z leveldb: compacted to: files[ 0 6 58 45 0 0 0 ]
    2302022-08-12T08:39:14Z leveldb: Delete type=2 [#9757](/bitcoin-bitcoin/9757/)
    2312022-08-12T08:39:14Z leveldb: Delete type=2 [#9758](/bitcoin-bitcoin/9758/)
    2322022-08-12T08:39:14Z leveldb: Delete type=2 [#9761](/bitcoin-bitcoin/9761/)
    2332022-08-12T08:39:14Z leveldb: Delete type=2 [#9762](/bitcoin-bitcoin/9762/)
    2342022-08-12T08:39:14Z leveldb: Delete type=2 [#9763](/bitcoin-bitcoin/9763/)
    2352022-08-12T08:39:14Z leveldb: Delete type=2 [#9766](/bitcoin-bitcoin/9766/)
    2362022-08-12T08:39:14Z leveldb: Delete type=2 [#9796](/bitcoin-bitcoin/9796/)
    2372022-08-12T08:39:14Z leveldb: Delete type=2 [#9829](/bitcoin-bitcoin/9829/)
    2382022-08-12T08:39:14Z leveldb: Delete type=2 [#9830](/bitcoin-bitcoin/9830/)
    2392022-08-12T08:39:14Z leveldb: Delete type=2 [#9831](/bitcoin-bitcoin/9831/)
    2402022-08-12T08:39:14Z leveldb: Delete type=2 [#9832](/bitcoin-bitcoin/9832/)
    2412022-08-12T08:39:14Z leveldb: Delete type=2 [#9833](/bitcoin-bitcoin/9833/)
    2422022-08-12T08:39:14Z leveldb: Delete type=2 [#9840](/bitcoin-bitcoin/9840/)
    2432022-08-12T08:39:14Z leveldb: Delete type=2 [#9841](/bitcoin-bitcoin/9841/)
    2442022-08-12T08:39:14Z leveldb: Delete type=2 [#9848](/bitcoin-bitcoin/9848/)
    2452022-08-12T08:39:14Z leveldb: Compacting 1@2 + 7@3 files
    2462022-08-12T08:39:14Z leveldb: Level-0 table [#9866](/bitcoin-bitcoin/9866/): started
    2472022-08-12T08:39:14Z WriteBatch memory usage: db=chainstate, before=3.8MiB, after=7.6MiB
    2482022-08-12T08:39:14Z leveldb: Level-0 table [#9866](/bitcoin-bitcoin/9866/): 2590435 bytes OK
    2492022-08-12T08:39:14Z leveldb: Delete type=0 [#9838](/bitcoin-bitcoin/9838/)
    2502022-08-12T08:39:14Z leveldb: Generated table [#9864](/bitcoin-bitcoin/9864/)@2: 39569 keys, 2173066 bytes
    2512022-08-12T08:39:14Z leveldb: Generated table [#9867](/bitcoin-bitcoin/9867/)@2: 39540 keys, 2172144 bytes
    2522022-08-12T08:39:14Z leveldb: Generated table [#9868](/bitcoin-bitcoin/9868/)@2: 38826 keys, 2170693 bytes
    2532022-08-12T08:39:14Z leveldb: Generated table [#9869](/bitcoin-bitcoin/9869/)@2: 39086 keys, 2171889 bytes
    2542022-08-12T08:39:15Z leveldb: Generated table [#9870](/bitcoin-bitcoin/9870/)@2: 38165 keys, 2113933 bytes
    2552022-08-12T08:39:15Z leveldb: Compacted 1@2 + 7@3 files => 10801725 bytes
    2562022-08-12T08:39:15Z leveldb: compacted to: files[ 1 6 57 43 0 0 0 ]
    2572022-08-12T08:39:15Z leveldb: Delete type=2 [#8316](/bitcoin-bitcoin/8316/)
    2582022-08-12T08:39:15Z leveldb: Delete type=2 [#8918](/bitcoin-bitcoin/8918/)
    2592022-08-12T08:39:15Z leveldb: Delete type=2 [#8919](/bitcoin-bitcoin/8919/)
    2602022-08-12T08:39:15Z leveldb: Delete type=2 [#9340](/bitcoin-bitcoin/9340/)
    2612022-08-12T08:39:15Z leveldb: Delete type=2 [#9496](/bitcoin-bitcoin/9496/)
    2622022-08-12T08:39:15Z leveldb: Delete type=2 [#9497](/bitcoin-bitcoin/9497/)
    2632022-08-12T08:39:15Z leveldb: Delete type=2 [#9837](/bitcoin-bitcoin/9837/)
    2642022-08-12T08:39:15Z leveldb: Delete type=2 [#9862](/bitcoin-bitcoin/9862/)
    2652022-08-12T08:39:15Z leveldb: Compacting 1@0 + 6@1 files
    2662022-08-12T08:39:15Z leveldb: Generated table [#9871](/bitcoin-bitcoin/9871/)@0: 25762 keys, 1384300 bytes
    2672022-08-12T08:39:15Z leveldb: Generated table [#9872](/bitcoin-bitcoin/9872/)@0: 14959 keys, 738733 bytes
    2682022-08-12T08:39:15Z leveldb: Generated table [#9873](/bitcoin-bitcoin/9873/)@0: 48494 keys, 2185670 bytes
    2692022-08-12T08:39:15Z leveldb: Generated table [#9874](/bitcoin-bitcoin/9874/)@0: 23028 keys, 1042516 bytes
    2702022-08-12T08:39:15Z leveldb: Generated table [#9875](/bitcoin-bitcoin/9875/)@0: 15480 keys, 765505 bytes
    2712022-08-12T08:39:15Z leveldb: Generated table [#9876](/bitcoin-bitcoin/9876/)@0: 48988 keys, 2185593 bytes
    2722022-08-12T08:39:15Z leveldb: Generated table [#9877](/bitcoin-bitcoin/9877/)@0: 13248 keys, 593809 bytes
    2732022-08-12T08:39:15Z leveldb: Compacted 1@0 + 6@1 files => 8896126 bytes
    2742022-08-12T08:39:15Z leveldb: compacted to: files[ 0 7 57 43 0 0 0 ]
    2752022-08-12T08:39:15Z leveldb: Delete type=2 [#9842](/bitcoin-bitcoin/9842/)
    2762022-08-12T08:39:15Z leveldb: Delete type=2 [#9843](/bitcoin-bitcoin/9843/)
    2772022-08-12T08:39:15Z leveldb: Delete type=2 [#9844](/bitcoin-bitcoin/9844/)
    2782022-08-12T08:39:15Z leveldb: Delete type=2 [#9845](/bitcoin-bitcoin/9845/)
    2792022-08-12T08:39:15Z leveldb: Delete type=2 [#9846](/bitcoin-bitcoin/9846/)
    2802022-08-12T08:39:15Z leveldb: Delete type=2 [#9847](/bitcoin-bitcoin/9847/)
    2812022-08-12T08:39:15Z leveldb: Delete type=2 [#9866](/bitcoin-bitcoin/9866/)
    2822022-08-12T08:39:15Z leveldb: Compacting 1@1 + 13@2 files
    2832022-08-12T08:39:15Z leveldb: Generated table [#9878](/bitcoin-bitcoin/9878/)@1: 54038 keys, 2191174 bytes
    2842022-08-12T08:39:15Z leveldb: Generated table [#9879](/bitcoin-bitcoin/9879/)@1: 54588 keys, 2192716 bytes
    2852022-08-12T08:39:15Z leveldb: Generated table [#9880](/bitcoin-bitcoin/9880/)@1: 53487 keys, 2190523 bytes
    2862022-08-12T08:39:15Z leveldb: Generated table [#9881](/bitcoin-bitcoin/9881/)@1: 53431 keys, 2192014 bytes
    2872022-08-12T08:39:16Z leveldb: Generated table [#9882](/bitcoin-bitcoin/9882/)@1: 27094 keys, 1220859 bytes
    2882022-08-12T08:39:16Z leveldb: Generated table [#9883](/bitcoin-bitcoin/9883/)@1: 23871 keys, 1141856 bytes
    2892022-08-12T08:39:16Z leveldb: Generated table [#9884](/bitcoin-bitcoin/9884/)@1: 49802 keys, 2186049 bytes
    2902022-08-12T08:39:16Z leveldb: Generated table [#9885](/bitcoin-bitcoin/9885/)@1: 17422 keys, 786360 bytes
    2912022-08-12T08:39:16Z leveldb: Generated table [#9886](/bitcoin-bitcoin/9886/)@1: 49305 keys, 2186101 bytes
    2922022-08-12T08:39:16Z leveldb: Generated table [#9887](/bitcoin-bitcoin/9887/)@1: 49063 keys, 2174883 bytes
    2932022-08-12T08:39:16Z leveldb: Generated table [#9888](/bitcoin-bitcoin/9888/)@1: 41031 keys, 2173940 bytes
    294
    295
    296
    2972022-08-12T08:39:16Z Cache size (6894528) exceeds total space (6835008)
    2982022-08-12T08:39:16Z leveldb: Generated table [#9889](/bitcoin-bitcoin/9889/)@1: 38978 keys, 2171396 bytes
    2992022-08-12T08:39:16Z leveldb: Generated table [#9890](/bitcoin-bitcoin/9890/)@1: 7933 keys, 427011 bytes
    3002022-08-12T08:39:16Z leveldb: Compacted 1@1 + 13@2 files => 23234882 bytes
    3012022-08-12T08:39:16Z leveldb: compacted to: files[ 0 6 57 43 0 0 0 ]
    3022022-08-12T08:39:16Z leveldb: Delete type=2 [#9849](/bitcoin-bitcoin/9849/)
    3032022-08-12T08:39:16Z leveldb: Delete type=2 [#9850](/bitcoin-bitcoin/9850/)
    3042022-08-12T08:39:16Z leveldb: Delete type=2 [#9851](/bitcoin-bitcoin/9851/)
    3052022-08-12T08:39:16Z leveldb: Delete type=2 [#9852](/bitcoin-bitcoin/9852/)
    3062022-08-12T08:39:16Z leveldb: Delete type=2 [#9853](/bitcoin-bitcoin/9853/)
    3072022-08-12T08:39:16Z leveldb: Delete type=2 [#9854](/bitcoin-bitcoin/9854/)
    3082022-08-12T08:39:16Z leveldb: Delete type=2 [#9856](/bitcoin-bitcoin/9856/)
    3092022-08-12T08:39:16Z leveldb: Delete type=2 [#9857](/bitcoin-bitcoin/9857/)
    3102022-08-12T08:39:16Z leveldb: Delete type=2 [#9858](/bitcoin-bitcoin/9858/)
    3112022-08-12T08:39:16Z leveldb: Delete type=2 [#9859](/bitcoin-bitcoin/9859/)
    3122022-08-12T08:39:16Z leveldb: Delete type=2 [#9860](/bitcoin-bitcoin/9860/)
    3132022-08-12T08:39:16Z leveldb: Delete type=2 [#9861](/bitcoin-bitcoin/9861/)
    3142022-08-12T08:39:16Z leveldb: Delete type=2 [#9863](/bitcoin-bitcoin/9863/)
    3152022-08-12T08:39:16Z leveldb: Delete type=2 [#9871](/bitcoin-bitcoin/9871/)
    3162022-08-12T08:39:16Z leveldb: Compacting 1@2 + 8@3 files
    3172022-08-12T08:39:16Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
    3182022-08-12T08:39:17Z WriteBatch memory usage: db=chainstate, before=3.7MiB, after=7.6MiB
    3192022-08-12T08:39:17Z leveldb: Generated table [#9891](/bitcoin-bitcoin/9891/)@2: 39575 keys, 2171913 bytes
    3202022-08-12T08:39:17Z leveldb: Level-0 table [#9893](/bitcoin-bitcoin/9893/): started
    3212022-08-12T08:39:17Z leveldb: Level-0 table [#9893](/bitcoin-bitcoin/9893/): 2511900 bytes OK
    3222022-08-12T08:39:17Z leveldb: Delete type=0 [#9865](/bitcoin-bitcoin/9865/)
    3232022-08-12T08:39:17Z leveldb: Generated table [#9894](/bitcoin-bitcoin/9894/)@2: 39629 keys, 2175263 bytes
    3242022-08-12T08:39:17Z leveldb: Generated table [#9895](/bitcoin-bitcoin/9895/)@2: 38788 keys, 2171237 bytes
    3252022-08-12T08:39:17Z leveldb: Generated table [#9896](/bitcoin-bitcoin/9896/)@2: 38909 keys, 2171009 bytes
    3262022-08-12T08:39:17Z leveldb: Generated table [#9897](/bitcoin-bitcoin/9897/)@2: 39438 keys, 2172667 bytes
    3272022-08-12T08:39:17Z leveldb: Generated table [#9898](/bitcoin-bitcoin/9898/)@2: 39300 keys, 2171502 bytes
    3282022-08-12T08:39:17Z leveldb: Generated table [#9899](/bitcoin-bitcoin/9899/)@2: 38981 keys, 2174643 bytes
    3292022-08-12T08:39:17Z leveldb: Generated table [#9900](/bitcoin-bitcoin/9900/)@2: 39547 keys, 2171856 bytes
    3302022-08-12T08:39:17Z leveldb: Generated table [#9901](/bitcoin-bitcoin/9901/)@2: 10019 keys, 529942 bytes
    3312022-08-12T08:39:17Z leveldb: Compacted 1@2 + 8@3 files => 17910032 bytes
    3322022-08-12T08:39:17Z leveldb: compacted to: files[ 1 6 56 44 0 0 0 ]
    3332022-08-12T08:39:17Z leveldb: Delete type=2 [#8920](/bitcoin-bitcoin/8920/)
    3342022-08-12T08:39:17Z leveldb: Delete type=2 [#9447](/bitcoin-bitcoin/9447/)
    3352022-08-12T08:39:17Z leveldb: Delete type=2 [#9448](/bitcoin-bitcoin/9448/)
    3362022-08-12T08:39:17Z leveldb: Delete type=2 [#9864](/bitcoin-bitcoin/9864/)
    3372022-08-12T08:39:17Z leveldb: Delete type=2 [#9867](/bitcoin-bitcoin/9867/)
    3382022-08-12T08:39:17Z leveldb: Delete type=2 [#9868](/bitcoin-bitcoin/9868/)
    3392022-08-12T08:39:17Z leveldb: Delete type=2 [#9869](/bitcoin-bitcoin/9869/)
    3402022-08-12T08:39:17Z leveldb: Delete type=2 [#9870](/bitcoin-bitcoin/9870/)
    3412022-08-12T08:39:17Z leveldb: Delete type=2 [#9884](/bitcoin-bitcoin/9884/)
    3422022-08-12T08:39:17Z leveldb: Compacting 1@0 + 6@1 files
    3432022-08-12T08:39:18Z leveldb: Generated table [#9902](/bitcoin-bitcoin/9902/)@0: 26973 keys, 1405415 bytes
    3442022-08-12T08:39:18Z leveldb: Generated table [#9903](/bitcoin-bitcoin/9903/)@0: 19237 keys, 931085 bytes
    3452022-08-12T08:39:18Z leveldb: Generated table [#9904](/bitcoin-bitcoin/9904/)@0: 48680 keys, 2185147 bytes
    3462022-08-12T08:39:18Z leveldb: Generated table [#9905](/bitcoin-bitcoin/9905/)@0: 26483 keys, 1194088 bytes
    3472022-08-12T08:39:18Z leveldb: Generated table [#9906](/bitcoin-bitcoin/9906/)@0: 19704 keys, 956353 bytes
    3482022-08-12T08:39:18Z leveldb: Generated table [#9907](/bitcoin-bitcoin/9907/)@0: 49169 keys, 2186145 bytes
    3492022-08-12T08:39:18Z leveldb: Generated table [#9908](/bitcoin-bitcoin/9908/)@0: 15898 keys, 708432 bytes
    3502022-08-12T08:39:18Z leveldb: Compacted 1@0 + 6@1 files => 9566665 bytes
    3512022-08-12T08:39:18Z leveldb: compacted to: files[ 0 7 56 44 0 0 0 ]
    3522022-08-12T08:39:18Z leveldb: Delete type=2 [#9872](/bitcoin-bitcoin/9872/)
    3532022-08-12T08:39:18Z leveldb: Delete type=2 [#9873](/bitcoin-bitcoin/9873/)
    3542022-08-12T08:39:18Z leveldb: Delete type=2 [#9874](/bitcoin-bitcoin/9874/)
    3552022-08-12T08:39:18Z leveldb: Delete type=2 [#9875](/bitcoin-bitcoin/9875/)
    3562022-08-12T08:39:18Z leveldb: Delete type=2 [#9876](/bitcoin-bitcoin/9876/)
    3572022-08-12T08:39:18Z leveldb: Delete type=2 [#9877](/bitcoin-bitcoin/9877/)
    3582022-08-12T08:39:18Z leveldb: Delete type=2 [#9893](/bitcoin-bitcoin/9893/)
    3592022-08-12T08:39:18Z leveldb: Compacting 1@1 + 12@2 files
    3602022-08-12T08:39:18Z leveldb: Generated table [#9909](/bitcoin-bitcoin/9909/)@1: 54017 keys, 2191218 bytes
    3612022-08-12T08:39:18Z leveldb: Generated table [#9910](/bitcoin-bitcoin/9910/)@1: 54609 keys, 2193333 bytes
    3622022-08-12T08:39:18Z leveldb: Generated table [#9911](/bitcoin-bitcoin/9911/)@1: 53537 keys, 2190172 bytes
    3632022-08-12T08:39:18Z leveldb: Generated table [#9912](/bitcoin-bitcoin/9912/)@1: 53426 keys, 2190468 bytes
    3642022-08-12T08:39:18Z leveldb: Generated table [#9913](/bitcoin-bitcoin/9913/)@1: 31248 keys, 1399514 bytes
    3652022-08-12T08:39:19Z leveldb: Generated table [#9914](/bitcoin-bitcoin/9914/)@1: 28096 keys, 1332480 bytes
    3662022-08-12T08:39:19Z leveldb: Generated table [#9915](/bitcoin-bitcoin/9915/)@1: 23510 keys, 1091463 bytes
    3672022-08-12T08:39:19Z leveldb: Generated table [#9916](/bitcoin-bitcoin/9916/)@1: 49394 keys, 2186227 bytes
    368
    369
    370
    3712022-08-12T08:39:19Z Cache size (6921120) exceeds total space (6835008)
    3722022-08-12T08:39:19Z leveldb: Generated table [#9917](/bitcoin-bitcoin/9917/)@1: 49584 keys, 2187024 bytes
    3732022-08-12T08:39:19Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
    3742022-08-12T08:39:19Z leveldb: Generated table [#9918](/bitcoin-bitcoin/9918/)@1: 4091 keys, 182997 bytes
    3752022-08-12T08:39:19Z leveldb: Level-0 table [#9921](/bitcoin-bitcoin/9921/): started
    3762022-08-12T08:39:19Z WriteBatch memory usage: db=chainstate, before=3.9MiB, after=7.7MiB
    3772022-08-12T08:39:19Z leveldb: Level-0 table [#9921](/bitcoin-bitcoin/9921/): 2579487 bytes OK
    3782022-08-12T08:39:19Z leveldb: Delete type=0 [#9892](/bitcoin-bitcoin/9892/)
    3792022-08-12T08:39:19Z leveldb: Generated table [#9919](/bitcoin-bitcoin/9919/)@1: 41110 keys, 2173479 bytes
    3802022-08-12T08:39:19Z leveldb: Generated table [#9922](/bitcoin-bitcoin/9922/)@1: 39016 keys, 2171370 bytes
    3812022-08-12T08:39:19Z leveldb: Generated table [#9923](/bitcoin-bitcoin/9923/)@1: 8263 keys, 446542 bytes
    3822022-08-12T08:39:19Z leveldb: Compacted 1@1 + 12@2 files => 21936287 bytes
    3832022-08-12T08:39:19Z leveldb: compacted to: files[ 1 6 57 44 0 0 0 ]
    3842022-08-12T08:39:19Z leveldb: Delete type=2 [#9878](/bitcoin-bitcoin/9878/)
    3852022-08-12T08:39:19Z leveldb: Delete type=2 [#9879](/bitcoin-bitcoin/9879/)
    3862022-08-12T08:39:19Z leveldb: Delete type=2 [#9880](/bitcoin-bitcoin/9880/)
    3872022-08-12T08:39:19Z leveldb: Delete type=2 [#9881](/bitcoin-bitcoin/9881/)
    3882022-08-12T08:39:19Z leveldb: Delete type=2 [#9882](/bitcoin-bitcoin/9882/)
    3892022-08-12T08:39:19Z leveldb: Delete type=2 [#9883](/bitcoin-bitcoin/9883/)
    3902022-08-12T08:39:19Z leveldb: Delete type=2 [#9885](/bitcoin-bitcoin/9885/)
    3912022-08-12T08:39:19Z leveldb: Delete type=2 [#9886](/bitcoin-bitcoin/9886/)
    3922022-08-12T08:39:19Z leveldb: Delete type=2 [#9887](/bitcoin-bitcoin/9887/)
    3932022-08-12T08:39:19Z leveldb: Delete type=2 [#9888](/bitcoin-bitcoin/9888/)
    3942022-08-12T08:39:19Z leveldb: Delete type=2 [#9889](/bitcoin-bitcoin/9889/)
    3952022-08-12T08:39:19Z leveldb: Delete type=2 [#9890](/bitcoin-bitcoin/9890/)
    3962022-08-12T08:39:19Z leveldb: Delete type=2 [#9902](/bitcoin-bitcoin/9902/)
    3972022-08-12T08:39:19Z leveldb: Compacting 1@0 + 6@1 files
    3982022-08-12T08:39:19Z leveldb: Generated table [#9924](/bitcoin-bitcoin/9924/)@0: 26159 keys, 1392720 bytes
    3992022-08-12T08:39:19Z leveldb: Generated table [#9925](/bitcoin-bitcoin/9925/)@0: 24492 keys, 1176015 bytes
    4002022-08-12T08:39:19Z leveldb: Generated table [#9926](/bitcoin-bitcoin/9926/)@0: 48757 keys, 2186768 bytes
    4012022-08-12T08:39:20Z leveldb: Generated table [#9927](/bitcoin-bitcoin/9927/)@0: 30455 keys, 1373435 bytes
    4022022-08-12T08:39:20Z leveldb: Generated table [#9928](/bitcoin-bitcoin/9928/)@0: 24014 keys, 1158916 bytes
    4032022-08-12T08:39:20Z leveldb: Generated table [#9929](/bitcoin-bitcoin/9929/)@0: 49220 keys, 2186669 bytes
    4042022-08-12T08:39:20Z leveldb: Generated table [#9930](/bitcoin-bitcoin/9930/)@0: 18409 keys, 820476 bytes
    4052022-08-12T08:39:20Z leveldb: Compacted 1@0 + 6@1 files => 10294999 bytes
    4062022-08-12T08:39:20Z leveldb: compacted to: files[ 0 7 57 44 0 0 0 ]
    4072022-08-12T08:39:20Z leveldb: Compacting 1@1 + 12@2 files
    4082022-08-12T08:39:20Z leveldb: Generated table [#9931](/bitcoin-bitcoin/9931/)@1: 54070 keys, 2191403 bytes
    4092022-08-12T08:39:20Z leveldb: Generated table [#9932](/bitcoin-bitcoin/9932/)@1: 54644 keys, 2193010 bytes
    4102022-08-12T08:39:20Z leveldb: Generated table [#9933](/bitcoin-bitcoin/9933/)@1: 53462 keys, 2190832 bytes
    4112022-08-12T08:39:20Z leveldb: Generated table [#9934](/bitcoin-bitcoin/9934/)@1: 53423 keys, 2192131 bytes
    4122022-08-12T08:39:20Z leveldb: Generated table [#9935](/bitcoin-bitcoin/9935/)@1: 35731 keys, 1603095 bytes
    4132022-08-12T08:39:20Z leveldb: Generated table [#9936](/bitcoin-bitcoin/9936/)@1: 32496 keys, 1531874 bytes
    4142022-08-12T08:39:20Z leveldb: Generated table [#9937](/bitcoin-bitcoin/9937/)@1: 28163 keys, 1317182 bytes
    4152022-08-12T08:39:21Z leveldb: Generated table [#9938](/bitcoin-bitcoin/9938/)@1: 49394 keys, 2186144 bytes
    4162022-08-12T08:39:21Z leveldb: Generated table [#9939](/bitcoin-bitcoin/9939/)@1: 49563 keys, 2185964 bytes
    4172022-08-12T08:39:21Z leveldb: Generated table [#9940](/bitcoin-bitcoin/9940/)@1: 8317 keys, 367743 bytes
    4182022-08-12T08:39:21Z leveldb: Generated table [#9941](/bitcoin-bitcoin/9941/)@1: 41190 keys, 2173751 bytes
    4192022-08-12T08:39:21Z leveldb: Generated table [#9942](/bitcoin-bitcoin/9942/)@1: 38956 keys, 2171454 bytes
    4202022-08-12T08:39:21Z leveldb: Generated table [#9943](/bitcoin-bitcoin/9943/)@1: 542 keys, 29551 bytes
    4212022-08-12T08:39:21Z leveldb: Compacted 1@1 + 12@2 files => 22334134 bytes
    4222022-08-12T08:39:21Z leveldb: compacted to: files[ 0 6 58 44 0 0 0 ]
    4232022-08-12T08:39:21Z leveldb: Delete type=2 [#9903](/bitcoin-bitcoin/9903/)
    4242022-08-12T08:39:21Z leveldb: Delete type=2 [#9904](/bitcoin-bitcoin/9904/)
    4252022-08-12T08:39:21Z leveldb: Delete type=2 [#9905](/bitcoin-bitcoin/9905/)
    4262022-08-12T08:39:21Z leveldb: Delete type=2 [#9906](/bitcoin-bitcoin/9906/)
    4272022-08-12T08:39:21Z leveldb: Delete type=2 [#9907](/bitcoin-bitcoin/9907/)
    4282022-08-12T08:39:21Z leveldb: Delete type=2 [#9908](/bitcoin-bitcoin/9908/)
    4292022-08-12T08:39:21Z leveldb: Delete type=2 [#9909](/bitcoin-bitcoin/9909/)
    4302022-08-12T08:39:21Z leveldb: Delete type=2 [#9910](/bitcoin-bitcoin/9910/)
    4312022-08-12T08:39:21Z leveldb: Delete type=2 [#9911](/bitcoin-bitcoin/9911/)
    4322022-08-12T08:39:21Z leveldb: Delete type=2 [#9912](/bitcoin-bitcoin/9912/)
    4332022-08-12T08:39:21Z leveldb: Delete type=2 [#9913](/bitcoin-bitcoin/9913/)
    4342022-08-12T08:39:21Z leveldb: Delete type=2 [#9914](/bitcoin-bitcoin/9914/)
    4352022-08-12T08:39:21Z leveldb: Delete type=2 [#9915](/bitcoin-bitcoin/9915/)
    4362022-08-12T08:39:21Z leveldb: Delete type=2 [#9916](/bitcoin-bitcoin/9916/)
    4372022-08-12T08:39:21Z leveldb: Delete type=2 [#9917](/bitcoin-bitcoin/9917/)
    4382022-08-12T08:39:21Z leveldb: Delete type=2 [#9918](/bitcoin-bitcoin/9918/)
    4392022-08-12T08:39:21Z leveldb: Delete type=2 [#9919](/bitcoin-bitcoin/9919/)
    4402022-08-12T08:39:21Z leveldb: Delete type=2 [#9921](/bitcoin-bitcoin/9921/)
    4412022-08-12T08:39:21Z leveldb: Delete type=2 [#9922](/bitcoin-bitcoin/9922/)
    4422022-08-12T08:39:21Z leveldb: Delete type=2 [#9924](/bitcoin-bitcoin/9924/)
    4432022-08-12T08:39:21Z leveldb: Compacting 1@1 + 10@2 files
    444
    445
    446
    4472022-08-12T08:39:21Z Cache size (6838048) exceeds total space (6835008)
    4482022-08-12T08:39:21Z leveldb: Generated table [#9944](/bitcoin-bitcoin/9944/)@1: 38942 keys, 2172325 bytes
    4492022-08-12T08:39:21Z WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB
    4502022-08-12T08:39:21Z leveldb: Generated table [#9945](/bitcoin-bitcoin/9945/)@1: 38732 keys, 2170551 bytes
    4512022-08-12T08:39:21Z leveldb: Level-0 table [#9947](/bitcoin-bitcoin/9947/): started
    4522022-08-12T08:39:21Z WriteBatch memory usage: db=chainstate, before=3.8MiB, after=7.5MiB
    4532022-08-12T08:39:21Z leveldb: Level-0 table [#9947](/bitcoin-bitcoin/9947/): 2591053 bytes OK
    4542022-08-12T08:39:21Z leveldb: Delete type=0 [#9920](/bitcoin-bitcoin/9920/)
    4552022-08-12T08:39:21Z leveldb: Generated table [#9948](/bitcoin-bitcoin/9948/)@1: 39386 keys, 2174872 bytes
    4562022-08-12T08:39:21Z leveldb: Generated table [#9949](/bitcoin-bitcoin/9949/)@1: 38869 keys, 2171202 bytes
    4572022-08-12T08:39:22Z leveldb: Generated table [#9950](/bitcoin-bitcoin/9950/)@1: 39941 keys, 2173562 bytes
    4582022-08-12T08:39:22Z leveldb: Generated table [#9951](/bitcoin-bitcoin/9951/)@1: 40045 keys, 2148351 bytes
    4592022-08-12T08:39:22Z leveldb: Compacted 1@1 + 10@2 files => 13010863 bytes
    4602022-08-12T08:39:22Z leveldb: compacted to: files[ 1 5 54 44 0 0 0 ]
    4612022-08-12T08:39:22Z leveldb: Delete type=2 [#9379](/bitcoin-bitcoin/9379/)
    4622022-08-12T08:39:22Z leveldb: Delete type=2 [#9380](/bitcoin-bitcoin/9380/)
    4632022-08-12T08:39:22Z leveldb: Delete type=2 [#9381](/bitcoin-bitcoin/9381/)
    4642022-08-12T08:39:22Z leveldb: Delete type=2 [#9384](/bitcoin-bitcoin/9384/)
    4652022-08-12T08:39:22Z leveldb: Delete type=2 [#9385](/bitcoin-bitcoin/9385/)
    4662022-08-12T08:39:22Z leveldb: Delete type=2 [#9446](/bitcoin-bitcoin/9446/)
    4672022-08-12T08:39:22Z leveldb: Delete type=2 [#9484](/bitcoin-bitcoin/9484/)
    4682022-08-12T08:39:22Z leveldb: Delete type=2 [#9575](/bitcoin-bitcoin/9575/)
    

    (WRONG!!!)

    02022-08-12T08:39:22Z leveldb: Delete type=2 [#9606](/bitcoin-bitcoin/9606/)
    12022-08-12T08:39:22Z leveldb: Delete type=2 [#9820](/bitcoin-bitcoin/9820/)
    22022-08-12T08:39:22Z leveldb: Delete type=2 [#9929](/bitcoin-bitcoin/9929/)
    32022-08-12T08:39:22Z leveldb: Compacting 1@2 + 11@3 files
    42022-08-12T08:39:22Z leveldb: Generated table [#9952](/bitcoin-bitcoin/9952/)@2: 39561 keys, 2172175 bytes
    52022-08-12T08:39:22Z leveldb: Generated table [#9953](/bitcoin-bitcoin/9953/)@2: 39511 keys, 2175619 bytes
    62022-08-12T08:39:22Z leveldb: Generated table [#9954](/bitcoin-bitcoin/9954/)@2: 38783 keys, 2171071 bytes
    72022-08-12T08:39:22Z LevelDB read failure: Corruption: block checksum mismatch: G:\Bitcoin\chainstate/009775.ldb
    

    Not a “curruption”: but that file was just DELETED by error (too early)

    02022-08-12T08:39:22Z Fatal LevelDB error: Corruption: block checksum mismatch: G:\Bitcoin\chainstate/009775.ldb
    12022-08-12T08:39:22Z You can use -debug=leveldb to get more complete diagnostic messages
    22022-08-12T08:39:22Z Error: Error reading from database, shutting down.
    32022-08-12T08:39:22Z Error reading from database: Fatal LevelDB error: Corruption: block checksum mismatch: G:\Bitcoin\chainstate/009775.ldb
    42022-08-12T08:39:22Z leveldb: Generated table [#9955](/bitcoin-bitcoin/9955/)@2: 39035 keys, 2171736 bytes
    52022-08-12T08:39:22Z leveldb: Generated table [#9956](/bitcoin-bitcoin/9956/)@2: 39537 keys, 2172124 bytes
    

    So all this depends on the cache fill level and cache sizes. Note that reducing the cache size does not make the deaemon really slower, my filesystem is fast enough (and anyway the chainstate folder lives entirely on a SSD partition (also secured by RAID5) which exhibit no filesystem corruption at all, and no I/O error reported by kernel devices such as SATA adapters, bus handlers, memory managers…

    It can then happen variably depending on cache size, and one the sped at which adata blocks were downloaded (or not if the could not still be discovered with a source offering them, as advertizeed by their “headers”, and peers may choose to offer more or less block headers depednign on their capacity).

    No corruption is ever detected in data blocks (blk*.dat) or their associated reverse indexes (rev*.dat).

    I can run “-reindex-chainstate” again and again, I get this error kalways, not just the same chainstate height. All this depends on how the cache is filled in memory (or if there are some entries or not in the mempool, but with a small cache and very early blocks heights, this should never happen). There seems to exist a specific boundary above which the cache “compacter” process decides to purge some indexes that it thinks are no longer necessary, but are still needed in later index files The algotithm used seeems to use an “optimistic heuristic” that can fail. A deep check is actually not performed.

    LevelDB does not seem to be the cause of this bug, but it may be incorrectly queried so that the compacter may not find the appropriate index number for a given block height (may be some query can return several rows, and the compacter just considers the first or last row returned by the query, and forgets to process others, thinking that those queries can only return zero or one result; may be caused by a missing aggregation, or selection values not enough selective, e.g. with a “poor hash”, sometimes producing an unexpected collision in a level-0 index).

  44. verdy-p commented at 11:53 am on August 12, 2022: none

    Now it’s time to investigate the condition that generated this type of log:

    2022-08-12T08:39:22Z leveldb: Delete type=2 #9575

    LevelDB performed very well what it was asked for (during the compaction of the cache in memory), just a few milliseconds before that (now missing) index was required again when Bitcoin resumed is indexing of the chainstate for further blocks with larger heights!

    It is symptomatic that such crashes are ALWAYS occuring just after a compaction of the chainstate cache. Adn each time I check it, there’s no corruption of the data, but the file referenced in the exit logs are always for files that were just deleted within the last second (in fact with extended timestamps in logs, that deletion occured less than 10 milliseconds before, inside the cache compacting which wants to regenerate a new tree of index files while keeping only a few “old” indexes that survived multiple passes of compaction).

    Note also that to make the bug reproducible always at the same height in the chainstate, you need to use “-connect=0” (nothing will be impacted in the root “block 0”, no additional headers will be requested for getting new blocks later, something that occurs avery 10 minutes in the live chain), so you must have a safe copy of blocks (blk0*.dat and rev*.dat) generated by letting the deamon run for about 20 minutes until you interrupt it to get a clean backup of these files (it will stop with headers at unpredictable height depending on how fast you could connect to peers).

    Then flush the chainstate completely, and start the daemon once with “-connect=0 -dbcache=4 -maxmempool=5” and let it run for some minutes, you can terminate it and it will flush the chainstate to disk with some “.ldb files”. You can also checj that blocks files (blk.dat and rev*.dat) where not touched at all by comparing their content with an archive you made in a “block2” directory. Now you can also backup the “chainstate” directory to “chainstate2”.

    But each time you’ll try to the daemon once with “-connect=0 -dbcache=4 -maxmempool=5” from the existing chainstate, it will fail with the same height for the same missing index. You can restore that chainstate from “chainstate2” to “chainstate” (use robocopy no Windows or rsync to make exact mirrors) and relauch it, you’ll always get the crash after about hald an hour at most, sometimes sooner (all this depends on when you interrupted the daemon, what was the max height in the main chain of block, or what was the max height of the chainstate, and what was the maixmum size o the dbcache.

    But with dbcache=4 and maxmumpool=5, you’ll get much more frequent calls to the compacter, and you’ll finally fall in the “edge” case where it incorrectly deleted some indexes that are still needed. LEvelDB has nothing to do with that, filesystem corruption or bad hardware memory have nothing to do with that. It is entirely the internal logic of Bitcoinc Core that produces that, due to a bug in its “cache compacter”, with incorrect tests.

    And you cannot diagnose it in an easily reproducible test if you use the default configuration (the cache is much too large, the compaction occurs too unfrequently and there’s too much variation in the chain heights, because of the peers connected and the time you need to find headers and download full blocks. Some existing peers working “successfully” have probably NOT used the default configuration, but use extreme values for the dbcache, so that the “compaction” of the chainstate never occurs on them. But it’s not reasoable at all to have extreme values of dbcache, even if you run it on a machine with 128GB of RAM and very large swaps. There’s not even any need to have such extreme size for the performance of the daemon to run a full node, even with the existing rate of transactions in the mempool on the main chain.

  45. verdy-p commented at 1:37 pm on August 12, 2022: none

    Also note that you’ve tagged it for Windows, even if I have exactly the same when running in WSL, or on a native Linux installation or in a Linux VM running in a Linux hypervisor (like Proxmox).

    This is not specific to Windows. (even if I first detected it by trying the QT version on Windows, I’ve since tried to ue the CLI daemon without more success).

  46. maflcko removed the label Windows on Aug 12, 2022
  47. maflcko commented at 1:43 pm on August 12, 2022: member

    So you are claiming this is reproducible on Linux as well as Windows? Surely this can’t be with default settings, as I’ve myself synced tens of nodes on Linux with vanilla settings. And surely there are people who have done it on Windows successfully.

    It would really help us if you included exact steps to reproduce, starting from a fresh install of the operating system.

  48. sipa commented at 2:29 pm on August 12, 2022: member
    02022-08-12T08:39:22Z leveldb: Delete type=2 [#9575](/bitcoin-bitcoin/9575/)
    

    (WRONG!!!)

    02022-08-12T08:39:22Z LevelDB read failure: Corruption: block checksum mismatch: G:\Bitcoin\chainstate/009775.ldb
    

    Not a “curruption”: but that file was just DELETED by error (too early)

    Are you mistaking 9575 with 9775 in the filename? I see no record in your log of G:\Bitcoin\chainstate/009775.ldb being deleted (or touched at all since it being initially created in the beginning of your log).

    After this corruption, did you retry to get it going again with the reconsiderblock trick? The fact that that worked before, under the assumption that this is the same issue, proves there was no actual corruption on disk, so I think trying to investigate what LevelDB is doing is a red herring.

    At this point, we really need a way to reproduce the problem. The fact that you’re seeing this on cloud providers makes it indeed very unlikely to be a hardware issue, but there must be something unusual about your setup that causes you to consistently see this, while nobody else can reproduce it. That will either mean giving us steps that help reproduce it without your involvement (as @MarcoFalke has been asking), or we do this interactively with lots of back and forths to narrow it down.

    My best guess at this point is the hint about running out of file descriptors. If you want to help us figure out what is going on, would it be possible to do a clean synchronization from scratch (no bitcoin datadir at all, no chainstate, no blocks) with -debug on, and then post the full debug.log file somewhere once you hit the corruption?

  49. verdy-p commented at 7:55 am on August 13, 2022: none

    Starting from a fresh OS install is what I did already, in a vanilla installtion in a VM on Azure and AWS, with vanilla OS images. (I tried various OS distribs, Windows, and Linux) , all of them have failed to sync the IBD while indexing the chainstate. All of them have shown that some leveldb index files in the chainstate were incorrectly deleted/missing when the bug occured, each time just after a “compaction”. It seems to be a race isue with the way you determine which index file can be merged (or splitted in half); may be there’s an interaction with merge and split operations occuring together at the same time via some recursion.

    That bug occus MUCH more often and more rapidly if we set a low dbcache and minimize the mempool size.

    Could that be also caused by the memory manager (that attempts to use the “unused” mempool size as a supplementary dbcache)? Is there a way to NOT use the mempool at all as an additional reserve for the dbcache in the chainstate (could be a race issue between the two pools, with some assumptions that the mempool would remain unused, when there are some pending transactions in the blocks to be indexes that could still fill in the mempool, before being flushed, because all these transactions in the old chain benig synchronized in IBD are normally terminated since very long and shuold exit from the mempool rapidly, but in some chain bineg sync’ed they could survive longer than expected and take more space than expected, not leaving enough space for the chainstate, thus causing more files than expected to be opened, whereas some merge operation in the chainstate should first occur to reduce the usage in the main dbpool and not need to use the mempool for transactions)?

    Note also that I have NOT compiled Bitcoin core, I use the vanilla binary compiled and signed on your download site. So this is the best I can do to create a “reproducible” case. The choice of OS does not seem relevant because I get the same bugs (occuring at varying height in the chainstate durnig the IBD, and depending on the memeory settings). As well a bug in leveldb is very unlikely (even if LevelDB itsefl had similar documented issues, see its test for its bug#200, which was fixed a long time ago, but may be did not cover all edge cases)

    LevelDb is used on billions devices and applications from major software providers, including all Chromium-based browsers (Android browser, Chrome, Ms Edge), and I’ve never found it failing anywhere, including all devices on which I tried to run Bitcoin Core (but may be the copy of leveldb sources inside your repository is out of sync, I did not check if that was the case, you certainly better know that than me.

    Also note that I have used minimal options for runnign the deamon. I even allowed all logs to to displayed in the console, with maximum log settings. I’ve tried to let it run for about 20 minutes with no options, pressed control-C until I get anough blocks downloaded.

    Then I let it continue with -connect=0, to make sure that there’s no interaction with newcoming headers for the most recent fullchain tip (which occurs avery 10 minutes). So I let the IBD continue until the end of the last known chaintip and no other blocks are downloaded in the background. Normally only the chainstate indexing should occur, and the only touched files are in the chainstate (plus the special block 0 containing the state).

    I do not touch any files between runs.

    After a failure, I just try to restart with -reindex-chainstate, keeping all existing blocks. I also have several backups of these blocks (made with rsync on Linux, or robocopy with Windows), and when I check their content, they are always strictly identical between the different machines and backups. So there’s no corruption at all of block data. (If I need to perform other checks, please explain me what I can do).

    Note that I have NO Tor connection, so I use -listenonion=0. I have both IPv4 and IPv6 working. I do not use any proxy.

    Also have you got some test tool that allows inspecting the chainstate offline (when the daemon is not running), to make sure that it’s correctly synced to disk when it exits or to get more information than what the deamon currently logs (whcih is very difficult to read)

    Note that leveldb index files use their own numbering, they are hard to correlate with block heights. The daemon jsut logs only some agregated counters of index files for each level in its tree, but no indicator at all about at least the height currently indexed in each index file (I know that there can be many index files, when I look at the counters for all levels, they reach a high total count with over 800 files. I don’t know if they are all open in LevelDB, or if LevelDB maintains them open for a while in the background, but it may be the cause for not having any file descriptors available to the process, and causing the opening or creation of new files to fail. Is there a way to tune LevelDB so that it closes its used files more often (because the IBD indexing phase for the chainstate adds and removes many index files there at an incredible rate, I suspect that too many of these files are left open for a long, and even file deletions are left pending to be run in the background, and they could “stack” in a long chain of pending operations on, the file system, notably on fast machines.

    Note as well that I have tried to place the chainstate folder on another volume (by redirecting the chainstate directory with a symlink on Linux, or a junction or Windows) to minize the I/O interactions with the main blocks folder and maixmize the throughput with minimal I/O latency. I’ve also tried to reduce the CPU priority (nice in Linux) of the daemon, to give higher priority to the I/O threads in the OS. I’ve also tried to reduce the number of cores used by the deamon (using CPU affinity) so that there always remains unused cores for threads used by the OS or the filesystem (I’ve not seen in Bitcoin Core any way to tune the affinity for its many threads).

    Waht I noted also is with the QT version these bugs were ocuring much faster (because QT also needs its own threads, that’s why I’ve stopped using it and now I only try with the CLI daemon). Every attmpts I’ve made on every machines and OS (with different capabilities, CPU, memory, storage types) have constantly failed in their IBD phase.

    So I acn’t beleive that others don’t have this issue (or they don’t complain to you, they have just abandoned the idea and just think that BitCoin Core does not work at all, or it’s not for their machine). I tend to think that those for which Bitcoin core is running were just extremelt lucky, or that they do not the vanilla distribution and have compiled and fixed their bugs themselves (possibly by modifying the code and not reporting their suggested patch to you).

  50. verdy-p commented at 8:26 am on August 13, 2022: none

    Also is there a way for leveldb files to have more meaningful names that their existing default numbering (a simple “%06ld” counter before the file extension)?

    For example based on the minimum block height of blocks that are indexed in it possibly followed by a dash separator before the basic counter (using the full block hashes would be overkill)? It won’t use up more space on the filesystem, but would simplify the debugging by easier identification of “what goes where”.

    Are there ways to tune LevelDB to create regular “snapshots” with several manifests and logs for easier recovery after a crash (we could instantly restart from the previous snapshot if the last one was corrupted, of course this would mean that index files have to be mirrored to the snapshot, probably using a “writeback” mode, and no longer modified or deleted even if they are no longer in the newest working manifest).

    I also tend to think that BitCoin Core maintains the cache in memory much moe than what is needed, adn its LRU strategy is probably over-optimistic. And the way it performs its “compaction” (when the dbcache in memory is full) is way too agressive, it deletes eevrything on disk and regenerates all indexes, dropping many things with very weak checks (based only a a bacis reference counter without any basic inspection (such as using “prev” and “next” iterators on the first and last item present in the index about to be deleted to see if they are refefenced from another existing index which is kept).

    May be the compaction is unsafe because it deletes the files in the wrong order, and the iterators used are not immune to deletion of items in the middle of the iterated list (there could be similar issues if an index file is splitted, causing an insertion in the middle of the list being iterated).

    One way to test that would we to create a test where index files in the chainstate have a much smaller “max size” (which by default is about 2MiB): it would stress more often the “compacter” forcing it to cover much more “edge cases” if it was reduced to about 64KB (but also it would mean that there would be more active index files in the chainstate, so it would also stress the number of open files, to see if this is an issue in LevelDB, keeping too many open files in the process for some background management threads, and would indicate that LevelDB must be better tuned and controled about its own file system usage, or the numebr of concurrent threads it can use, or the maximum allowed length of its I/O queues (if it uses them).

    Or may be BitCoin Core forgets to call an API of Level DB to force it to sync to disk and close all handles just before Bitcoin Core attempts to use its own “compacting”. I don’t know if LevelDB supports that (but it should for many apps that want to get sure that all their daa is synced to disk before terminating safely, and possibly for creating their own safe snapshots which would be recoverable on demand).

    (Given the size of the chainstate folder, it peaks at around 2GB of storage, we could easily manage to keep about 3-10 snapshots, with one created every 5-10 minutes, it won’t take long, notably if these snashots are in “writeback” mode with new files being created and mirrored in a writable copy only if aa index that is part of the snapshot is about to be modified). this would also allow easier recovery when restarting, without having to reindex the whole chain, if this was caused by a temporary overload on the OS or on the underlying filesystem, temporarily rejecting the creation or opening of new files (including for example if there are other tools runing on the OS such as security suites, antimalwares, OS update tools taking an exclusive lock on some parts of the filesystem…)

    Also I’ve monitored everythin I could with the OS tools, in Linux or Windows (the file system I/O queues are far from being full, the physical disk I/O queues are responding nomally with very short but minor spikes, and latency remaining always below 20ms (so there’s no excessive timeout issue occuring, CPU core usage in user or kernel space: there’s a relatively high usage in the kernel space, meaning that there are lot of active system threads coing background work, posibly for optimizeing the filesystem, restructuring/compacting the allocation maps; but I see no core reaching 100%; as well I do not see alarming level of CPU temperature, no CPU throttle; disk temperatures are also stable in SMART controls, and I do not see any hardware alarms, errors, or recovery corrections applied being logged in system event logs).

    But I cannot say the same about system handles: they are constantly and slowly growing (on Linux like on Windows), especially during “compaction” inside BitCoin Core, and never come back to normal levels. This indicates that LevelDB maintain too many open files (possibly just for shared locks even if there’s no I/O and no exclusive lock). As well the total number of threads inside Bitcoind Core is slowly but constantly growing (each time after the “compaction” occurs, as we see them in the debug logs).

  51. verdy-p commented at 6:45 pm on August 13, 2022: none

    I have some good news (and bad news for you). Finally I chose to build bitcon core myself instead of using your binarise generated with guix (apparently on MacOS) for your sources from Git. First I tried to do that on Windows, with WSL (installing the Debian OS image propsoed on the Microsoft Store). I made the necessary config (editing “/etc/resolv.conf” to configure a couple of DNS servers for the Internet) and then “apt update”. I installed the necessary libraries and dependecies (also with “apt install”). I DID NOT use guix (your instructions to install Guix do not work; on Debian, all you have to do is to use “apt install guix” which jsut runs fine as is, nothing else to configure!

    I ran the 521 test of Bitcoin_test and they all worked OK. I ran the bitcoind version that I compiled myself (it took just 10 minutes, much less than described and expected). And it works perfectly.

    Then I stopped it and reconfigured it to point the datadir to the Windows volume where I had placed them (mounted in WSL as “/mnt/g/Bitcoin”, because on the Windows it is “G:\Bitcoin” (there’s nothing else on that volume, it’s a NTFS partition of 800 GB mounted on a fast and reliable RAID, much enough for now for all datablocks and additional chainstate files taking about 2GB at most, and optional indexes for the coindb, txdb, bloomfiltersdb, or zmqdb if I allow them in bitcoin. conf) Here again there’s no “corruption”: all files that were generated on windows (notably all blocks already uploaded) validate as is without any change or need to resync.

    So my opinion is that your GUIX environment running on MacOS and used to generate the binaries are the only cause of this bug that affects BOTH Windows and Linux. GUIX on MacOS is NOT reliable.

    After that I’ve tried to generate binarise by compiling them on windows with Visual Studio 2022. They work like a charm no Windows (either native, or in a VM hosted in Hyper-V or Azure, or hosted in AWS)!

    Note as well that I no longer observe any growing leakage of file handles after each “compression” stage in the chainstate. I cannot explain according to the sources why the “chainstatemanager” bugs like this in your GUIX-generated binaries.

    Conclusion: build your releases for Windows and Linux on Windows or Linux (either native or in a VM). Abandon GUIX on MacOS (except possibly only for creating the MacOS releases). I think that your GUIX environment on MacOS does not use a correct C compiler, or its C runtime libraries (for cross compiling for other systems) have bugs and are not maintained correctly, or you’ve used Clang (which has some bugs, and not a vanilla GCC), or some obscure linker that generates incorrect bindings, possibly with some limiting “shims”, but that are NOT fully compatible with Windows, and also not fully compatible with the Linux ABI when you target Linux.

    GUIX can only be the cause, or it is incorrectly installed on your MacOS machine used to generate the signed binaries released in v22 or v23 and offered for download on your official site!

    GUIX is a ery complex environment, it itsself has LOTS of depencies, if has lot of scripts, itruns multiple scripts using “Guile” and written with GNU Scheme. May be this is a bug of Guile (note that your instructions to setup Guile do not work, in fact I just ue the vanilla version of Guix provided as an installable .deb package using just the standard “apt” tool. For RedHat/Fedora/CentOS/OracleLinux/almaLinux and similar distributions may provide already their own .rpm packages ready to install. I don’t know for Gentoo/ArchLinux and similar, or other rollig distribs but I would not use them for running BitCoin Core, when it is so simple to install a basic VM with a vanilla Debian which will be well manietained and reliable.

    As well my binaries generated natively for windwos in Visual Studio 2022 run perfectly on all versions of Windows I tried (Windows 7, 8, 8.1, 10, 11, variable editions: Home, Pro, Enterprise, with or without an AD domain junction, with a local user account or with a MSA account, and running as a normal user or as an adminsitrator, without needing any UAC elevation).

    My generated binaries made on Debian also work directly (without recompilation) on other Debian-based distributions (e.g. Ubuntu, Mint, PopOS…) indepenantly of the choice of the desktop environment (or its absence), and independantly of other initialisation systems (systemd, openRC, initd…).

    Finally I used a basic tool to convert build a basic .deb package with these binaries: it also worked on all Debian-based distributions. I converted the .deb to a .rpm with a basic tool and here again they worked without modification on Fedora, CentOs and AlmaLinux.

    I’ve also tried on these Linux to change the kernel versions (on distribs where we have the choice), and this did not impact anything. Only your bvinaries compiled with GUIX on yuor MacOS do not work reliably anwywhere on Linux and Windows! So I tend to think that ALL those that run BitCoinc Core successfully have in fact NOT used your signed binaries, but compiled the tools themselves on their machine, without using GUIX, jsut using the vanilla compiler of their plaform (with up to date libraeies, but as I’ve seen, I do not detect any bug when using older Linux kernels as long as these are Linux kernels 4+)!

    It is very likely that your GUIX environment generates incorrect configurations for “./configure”, such as not linking the appropriate libraries, or using limited but unsafe emulations/workarounds in the compiled files for things that seem critical here (notably mutexes, pthreads, file locks, may be async I/O… or some parts of the C runtime libraries that you link statically to the binaries).

  52. fanquake commented at 7:03 pm on August 13, 2022: member

    GUIX on MacOS is NOT reliable.

    GUIX isn’t supported by / doesn’t run on macOS.

  53. verdy-p commented at 7:10 pm on August 13, 2022: none
    If you think that the C compiler suite or the linker is not the problem, I suspect that the generation of ./configure options are wrong with GUIX and it select the wrong compilation options. This could be inside Guile, or autoconf, or some bad version of m4, or some incorrect regexp whose semantics/behavior is undefined or has changed. But anyway GUIX is the only thing I did not use and is the cause if you’ve used it to generate precompiled Windows and Linux releases in v22 and v233.
  54. verdy-p commented at 10:28 am on August 14, 2022: none

    As well I’ve found buffer overflows (corruptng the stack or the heap, depending on implementation of the standard C runtime libraries) which may explain some bugs occuring randomly.

    This is caused by how logs are managed in the dbwrapper that you use with LevelDB. It’s badly written, it uses a completely unneeded loop just to figure out (incorrectly) what buffer size will be needed.

    I’ve just replaced this class in src/dbwrapper.cpp. The existing code was not properly testing the return value of vsnprintf(), which may be larger than the given buffer size or may be negative!!! No more buffer on the stack with a secondary buffer allocated elsewhere, and a single buffer allocated once on the heap. This also saves stack size. Allocating once a single 32KB buffer on the heap, even if it may seem overkill in size, has no importance in terms of memory, when BitCoin Core already uses tons of megabytes (possibly gigabytes) for its dbcache or mempool.

    Note that the buffer is allocated for each instance of the logger; if you have several db indexers (eg. for the coindb) running on separate threads. There’s no locking/mutex mechanism to prevent that alass to be used concurrently and have that buffer partially overriden concurrently. But even if this occurs, that implementation will remain safe, we will not overflow the buffer, but may just log some partially mixed messages. If you need other dbwrapper instances, you can perfectly allocate a new dblogger instance, without having to use costly locks, it will cost you just 32KB per dbwrapper engine!

     0class CBitcoinLevelDBLogger : public leveldb::Logger {
     1    // This code is adapted from posix_logger.h, which is why it is using vsprintf.
     2    // Please do not do this in normal code
     3private:
     4    static const int bufsize = 32768;
     5    char *buffer = new char[bufsize];
     6    ~CBitcoinLevelDBLogger() {
     7        delete[] buffer;
     8    }
     9public:
    10    void Logv(const char *format, va_list arg) override {
    11        if (!LogAcceptCategory(BCLog::LEVELDB, BCLog::Level::Debug)) {
    12            return;
    13        }
    14        // Print the message
    15        // Do not use vsnprintf elsewhere in bitcoin source code, see above.
    16        va_list ap;
    17        va_copy(ap, arg);
    18        int n = vsnprintf(buffer, bufsize - 1, format, ap);
    19        va_end(ap);
    20        if (n >= 0) { // Note that n may be larger than bufsize - 2
    21            if (n > bufsize - 2)
    22                n = bufsize - 2;
    23            // Add newline if necessary
    24            if (n == 0 || buffer[n - 1] != '\n')
    25                buffer[++n - 1] = '\n';
    26            buffer[n] = '\0';
    27            LogPrintLevel(BCLog::LEVELDB, BCLog::Level::Debug, "%s", buffer);
    28        }
    29    }
    30};
    

    This also saves some stack space (the 500 bytes allocated at each call in the current code, always allocated, even if it is not used; and removes the unncessary loop to determine if we can use it or an external dynamically allocated buffer), and can avoid performance issues if there are many concurrent threads call it in dblevel. It unstresses as well the global memory allocator.

    You’ll note also that if n < 0 above, nothing is logged at all, this may eventually happen only if some given format or its values exceed the maximum allowed here (32KB should be sufficient for every useful message emitted by LevelDB, but may be we could log some statis text saying that something to long could not be logged (and may be dump the format string after converting ‘%’ characters to some other safe character). This code will silently truncate excessively long logs emitted by LevelDB (and I don’t think this has ever happened, even in its existing debugging code; if ever some debugging code needs to log very long things, they should create a separate file, and just send a log message saying where to find it; this would be essentially for logging crash events with many more details, or backing up / dumping some parts of its database, to be used by external leveldb debugging and analysis tools).

    Note as well that the call to LogPrintLevel() reuses another sprintf() function call (just with a basic “%s” format). May be it is also unsafe, why doesn’t it allow passing just a plain text without any formatter? Don’t you think we should have a LogPrintLevelStr() function to drop the remaining “%s” formatter above?

    LogPrintLevel() is currently defined as a macro, which just calls LogPrintf() by passing three additional parameters for the current function name, its source file and source line. We would need a more basic LogPrintStr() function (without the ‘f’ suffix) and a new separate macro LogPrintLevelStr() using it.

    As well if I look in the sources, all uses of LogPrintf() are passing a string to be terminated by a static newline in the format string. We could just drop these newlines from all its input format string, and generate that extra newline internally (noly when necessary if the input format does not contain it); in that case the special management for appending the newline above would not be necessary at all.

    If I look now in src/logging.h, LogPrintf() is also a macro, calling finally the function LogInstance().LogPrintStr() after performing the formating, using tfm::format and checking its result. A more basic LogPrintStr() would not need that. We are doing unnecessary memory copies again and unnecesary checks, as we could just call LogInstance().LogPrintStr() directly. This would also given some minor boost while also eliminating possible causes of unsafe uses of string formatters.

    LogPrintf() currently is based on the TinyFormatter implementation (usnig C++ streams and not the legacy C RTL library), and for me it looks safe (except that LevelDB sends it logs by assuming they will be formatted by the C RTL library: TinyFormatter has some additional restrictions, but I’ve not seen any one where this would break LevelDB, so may be we could use TinyFormatter here instead of vnsprintf(). If ever there’s some new needed format used by LeveDB, we could easily patch TinyFormatter to accept them… except probably ‘%n’ that TinyFormat considers really unsafe; other flags like ‘L’ not supported by TinyFormatter are just silently ignored, but could be added easily: TinyFormatter is not a complex module, and it uses basic C++ iostreams, so it’s acceptable to patch it for Bitcoin Core if needed for our LevelDB wrapper).

    Now may be with this patch, your builds made in GUIX will have all the problems in this tread solved (I did not test it). I’ve just run the 521 unit tests and they all succeed when I build it outside of GUIX, and as well there’s no longer issue in Linux or Windows, just like I said above.


    edit: you can freely change the constant 32768 above. During my experiments, I just growed it a bit from the value 30000 you used when allocating it on the heap, but this has no effect; I’ve even run successfully by setting it the the value 500 that you used when allocating the buffer on the stack, or 512 if you wish. This has no importance. It may be a constructor parameter.

    But the key thing of this patch is: DON’T allocate the buffer on the stack at each log call, but allocate it on the heap once, when instantiating the logger for the dbwrapper. This is a clear sign that allocating the buffer on the stack, is causing the logger to write over some objects that are still living here.

    So this is a clear sign of “user-after-free” bug in the rest of Bitcoin Core. And I suspect that theses are “Span” objects creating inside some calls to iterate over some temporary objects allocated no the stack, but that are referenced by some other aliases and that should have been copied in their own area and no logner used when the function creating the temporary objects returned.

    Here the logger wrapper function uses minimal stack space and avoid overwriting these objects, so we no longer see the “corruption” (one way to assert it with the existing logger, would be to clear (memset) the buffer in the stack entirely at each entry to the function: I’m quite sute you’ll then hit the case where you’ve overwritten some temporary objects used after they were freed, and you’ll get a reproductible case (e.g. run Bitcoin Core with “debug=all”) where you get the “corruption” much earlier! Now you’ll be able to diagnose what is wrong on other classes and may be able to locate these other bugs (that my patch above allows to avoid creating damages).

    In summary, this is definitely not a hardware issue (and I’ve used LevelDB on other projects creating massing amounts of key/value changes with variable sizes and massive amounts of flushes/merges/splits) and never detected such corruption. The bug is definetely inside Bitcoin Core owned code.

    Note also that setting the “checklevel” to 4 instead of 3 in bitcoin.config (or in start options) always allows repairing these damages (at the price of a longer startup) when bitboin BCore (daemon, or QT version it does not matter) and continue without having to reindex the chainstate. This should be the default!

    Now it’s time to locate this damned bug in the BitCoin chain manager (and notably inspect how its “Span” are used, possibly inside some of its unsafe iterators generating lists of objects in one function, then returning, then calling some logging function, then using these objects in other functions).

  55. verdy-p commented at 12:31 pm on August 14, 2022: none

    Note also that this patch may also help solving other bugs experienced randomly (found in various web forums) for which there was no clear solution. Including various EGV or “database corruptions” (caused by some lock variables being overwritten, or some tables in the levelDB cache, or mempool, or other optional indexes for specific services?)

    I think that the existing LevelDB wrapper was just written a bit too fast and trusted too soon: the C RTL library for the legacy *printf/*scanf functions (and all their many variants) has already been the cause of many nightmares and strange bugs, and they have lot of portability problems depending on platforms, compiler types and versions, library versions, OS distributions (and their own patches to support more or less formaters) or changes in standards (or contradictions between them!), as they have lot of undefined behaviors. But I wonder why LevelDB continues to make use of, or to depends on, these legacy C RTL functions for its logging interface.

  56. verdy-p commented at 5:26 pm on August 14, 2022: none

    Note that if this has solved apparently the most frequent errors, After repeated tests, this still ccurs, but more rarely. This tends to show that there are other causes, and that they are related to other buffer overruns occuring in other objects used on the stack (with the patch above, as we no longer allocate and use any buffer on the stack, it cannot overwrite some other stacked objects which may still be used after they should have kept no reference.

    However, with the patch, when the bug occurs, and I restart the deamon, the “error” is recovered when the cache gets checked and reconnected at startup to recover the missing chaintips (this was not the case before, I was forced to restart always with -reindex-chainstate flag, so I could never terminate the IBD phase) and seems to occur only with some objects controling the dbcache or mempool, but not the content of the dbcache or mempool themselves.

    I’m searching that in the classes controling the open state of these caches. I think it’s caused by some async I/O, both on Windows (using VirtualAlloc) and on Linux (using mmap), probably becuse of some incorrectly managed locks. I’m investigating the code currently used in validation.cpp (in “loadblk” threads), as this seems to occur not just after a full compaction, but in fact after some index file at a given level gets splitted in two parts, and immediately one of them gets compacted with the next due to some recursion (and the lock already acquired by the thread for the first splitting operation does not protect from subsequent merging operations). This may be in fact an edge case inside the LevelDB library, possibly in one of its iterators (notably the backward).

  57. verdy-p commented at 5:17 pm on August 15, 2022: none

    Finally you should already know that when compiling Bitcoin Core (at least on Linux with GCC/G++), you have warnings about uninitilized variables. This may seem overkill, but I done this to initialize them to 0 (including one file in leveldb, and I added an additional assert where the size returned has a negative offset of -8 applied, just to make sure this is not the cause of undetected problems).

     0diff --git a/src/leveldb/db/memtable.cc b/src/leveldb/db/memtable.cc
     1index 00931d467..3e61f3393 100644
     2--- a/src/leveldb/db/memtable.cc
     3+++ b/src/leveldb/db/memtable.cc
     4@@ -12,7 +12,7 @@
     5 namespace leveldb {
     6
     7 static Slice GetLengthPrefixedSlice(const char* data) {
     8-  uint32_t len;
     9+  uint32_t len = 0;
    10   const char* p = data;
    11   p = GetVarint32Ptr(p, p + 5, &len);  // +5: we assume "p" is not corrupted
    12   return Slice(p, len);
    13@@ -113,10 +113,11 @@ bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
    14     // sequence number since the Seek() call above should have skipped
    15     // all entries with overly large sequence numbers.
    16     const char* entry = iter.key();
    17-    uint32_t key_length;
    18+    uint32_t key_length = 0;
    19     const char* key_ptr = GetVarint32Ptr(entry, entry + 5, &key_length);
    20+    assert(key_length >= 8);
    21     if (comparator_.comparator.user_comparator()->Compare(
    22             Slice(key_ptr, key_length - 8), key.user_key()) == 0) {
    23       // Correct user key
    24       const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
    25       switch (static_cast<ValueType>(tag & 0xff)) {
    26diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp
    27index 94399faf0..2f310776e 100644
    28--- a/src/test/fuzz/string.cpp
    29+++ b/src/test/fuzz/string.cpp
    30@@ -237,22 +237,22 @@ FUZZ_TARGET(string)
    31         (void)(bs1 + bs2);
    32     }
    33     {
    34-        int32_t i32;
    35-        int64_t i64;
    36-        uint32_t u32;
    37-        uint64_t u64;
    38-        uint8_t u8;
    39+        int32_t i32 = 0;
    40+        int64_t i64 = 0;
    41+        uint32_t u32 = 0;
    42+        uint64_t u64 = 0;
    43+        uint8_t u8 = 0;
    44         const bool ok_i32 = ParseInt32(random_string_1, &i32);
    45         const bool ok_i64 = ParseInt64(random_string_1, &i64);
    46         const bool ok_u32 = ParseUInt32(random_string_1, &u32);
    47         const bool ok_u64 = ParseUInt64(random_string_1, &u64);
    48         const bool ok_u8 = ParseUInt8(random_string_1, &u8);
    49
    50-        int32_t i32_legacy;
    51-        int64_t i64_legacy;
    52-        uint32_t u32_legacy;
    53-        uint64_t u64_legacy;
    54-        uint8_t u8_legacy;
    55+        int32_t i32_legacy = 0;
    56+        int64_t i64_legacy = 0;
    57+        uint32_t u32_legacy = 0;
    58+        uint64_t u64_legacy = 0;
    59+        uint8_t u8_legacy = 0;
    60         const bool ok_i32_legacy = LegacyParseInt32(random_string_1, &i32_legacy);
    61         const bool ok_i64_legacy = LegacyParseInt64(random_string_1, &i64_legacy);
    62         const bool ok_u32_legacy = LegacyParseUInt32(random_string_1, &u32_legacy);
    63diff --git a/src/univalue/include/univalue.h b/src/univalue/include/univalue.h
    64index 8ba6fd542..b1e659af8 100644
    65--- a/src/univalue/include/univalue.h
    66+++ b/src/univalue/include/univalue.h
    67@@ -138,7 +138,7 @@ Int UniValue::getInt() const
    68 {
    69     static_assert(std::is_integral<Int>::value);
    70     checkType(VNUM);
    71-    Int result;
    72+    Int result{};
    73     const auto [first_nonmatching, error_condition] = std::from_chars(val.data(), val.data() + val.size(), result);
    74     if (first_nonmatching != val.data() + val.size() || error_condition != std::errc{}) {
    75         throw std::runtime_error("JSON integer out of range");
    

    The second patch in fuzz tests has no impact, it is not used by the core deamon or QT. It’s just for peace of mind and costs nothing. The “LegacyParseInt?()” and “ParseInt?()” functions may not set any value to their given variable reference if they return false into “ok_*” target variables. But nothing indicates that condition so the compiler warns as it cannot check that at compile time. As this is a test case to see if “ParsInt?()” and “LegacyParsInt?()” are behaving correctly (and are effectively setting the expected value in the referenced variable when they don’t return an error flag; may be we could use some magic random values other than just 0, but different between “LegacyParse?” and “Parse?” as we will compare them, to detect such cases (which would signal a bug in their implementations, as this is exactly what these fuzz/test functions are intended to test). As well these functions are not required to set any value in their referenced output variable if they return an error.

    The last unitialized value (in src/univalue/include/univalue.h) is part of Bitcoin Core’s own RPC protocol with JSON, and I think it may be probematic, but not the cause of the dbcache corruption (which occurs even when there’s no RPC at all, and I don’t let the RPC server listening from other nodes, it is purely connected only on local 127.0.0.1 and only used by bitcoin-cli if I use it manually; but I may be wrong…).

    Basically, calling std::from_chars() and passing it a reference to an unconstructed object is undefined (not that result is not changed or set when there’s an error indicated by the error_condition returned, but that call may fail with an error if the result is not constructed; thanks the result is a basic object, whose all that matters for std::from_chars is its type, and that type is basic enough to not require additional internal fields to be se; but all examples shown in the doc for the C++ std library show that it should be constructed ilmagine that the type “Int” object type given here was overriden with extra methods, e.g. with tracing/debugging function, or used a specific placement allocator…). Note that “Int” here is a typename template parameter, so it can be arbitrary class accepting to be set by std::from_char (we could as well have used , the method just checks that is an integral type, but this could be some “BigInt” with a complex storage structure (or even a floatting point type with value checks in setters) and not just a basic integer type, so we should use its default constructor.

  58. sipa commented at 3:12 pm on August 16, 2022: member

    It’s badly written, it uses a completely unneeded loop just to figure out (incorrectly) what buffer size will be needed.

    I certainly agree the code is unnecessarily complicated, but it’s just an analogue of the logging code in LevelDB itself, but with Bitcoin Core’s logging framework instead of printing to a file. See https://github.com/google/leveldb/blob/v1.20/util/posix_logger.h#L34L92. What the code does is first try to write to a stack-allocated 500-byte buffer (correctly passing the upper bound of that buffer to vsnprintf, which won’t write outside of it, but may indeed as you say return a larger number corresponding to how much it would have written). If the returned size exceeds the buffer size, a dynamically-allocated buffer of 30000 bytes is used, and the process is retried. If the returned size still exceeds the buffer size, the result is used as is (truncated to 30000 characters).

    It’s certainly possible there is a bug in this code, and I don’t see it, and our sanitizers which are generally very good (but not perfect) at detecting buffer overflows don’t catch it, but with what I’ve seen I think it is unlikely this is the culprit. That said, pull requests to improve the code are certainly welcome.


    Back to the original issue.

    As I’ve said before, we really need a way to reproduce your problem. I’ve personally done several synchronizations from scratch, including a few just this past week, in Linux and Windows, using 23.0 and master branch code, from the network and using -reindex-chainstate and with reduced dbcache size, and never run into any form of corruption. I don’t mean to imply there is something wrong with your system(s), but there must be something different about your systems that others don’t have. I hope you believe me that no, others do not regularly see this, and no, not everyone has stopped using this software. You keep bringing up theories about what is wrong, but without the ability to reproduce and test that these issues are related, this is grasping at straws.

    Please, do at least one of these two things:

    • Provide exact instructions on how to reproduce the problem you see in Azure, or a VM, or something else reproducible. Variations on “do the default” aren’t sufficiently detailed, it appears, as people have tried this and failed to reproduce your problem.
    • Perform a synchronization from scratch, with -debug on the command-line (or debug=1 in bitcoin.conf), until your corruption occurs, and then post the resulting full debug.log file somewhere.
  59. verdy-p commented at 11:20 am on August 17, 2022: none

    I’m not contesti,g what the original dbwrapper’s logger does; but it has a visible effect as it allocates and writes its buffer on the stack. And visibly this has the effect of overwriting other objects persisting on the stack that are apparently used after free. Maybe it’s a bug of gcc or its libraries. But I did not choose any specific libaries, jsut used the current versions as shipped from the default in Debian bullseye; I did not try with any rolling release OS, so it should be fairly stable and most difficult bugs that may exist should be identified, and I shuold not be alone.

    It’s a fact that after trynig many common configurations, and on many machines, all installed with their default settings (just the necessary dependencies and regular system updates), I’ve never been able to complete the IBD on any machine (variable CPU types, memory, storages on SDD or HDD or fast RAID stores, it does not matter) with your official releases or even get a stable solution when trying to compile it from the Git sources (just a git pull from this repository, no differences, then ./configure and “make all”.

    And debug logs (even with “debug=all” for logging options) do not reveal anything to explain the cause; I’ve just seen than compiling the daemon with debug (-g) compiling option just made it more stable. And that allocating the dbwrapper’s logging buffer once outside the stack was also making it more stable. I’ve tried various configurations including reducing or growing the default dbcache size (let’s ignore the mempool, I can reduce it, it does not matter in the IBD phase where it remains empty, but may be used as a supplementary space for the dbcache).

    The number of combinations I’ve tried, even on very clean fresh installation of OSes and on fresh machines does not change the things. It has absolutely never worked reliably. So I cannot be alone (and I’m not alone visibly when I look for other users on various online forums, though some are trying to run it on very old and slow machines). In my opinion, the default logging options (even with “debug=all” in bitcoin.config are insufficient to isolate the cause, that’s why I tried to compile it myself and run it through gdb, where I also get the same issue reported). So I absolutely do not know waht to do, and like others I’m stuck there without any solution.

    If you still think my configs have something special, you’re wrong. I’ve tried many machines and very clean environments (and various OS distribs, and hardwares). IF this works for some people, may be it is because they have installed something else on their system which is not documented, but not provided by default on these systems. I even thought it could be specific to some x64 CPU models, but I’ve tried a large range of them (though I did not try with an AMD CPU, all the systems I’ve tested are Genuine Intel CPUs, various families 4 through 10, Core Duo, Core i3, Core i5, Core i7, even Core i9). And I’ve never used any acceleration (default BIOS/EFI settings). All these systems were extremely stable with various types of softwares (except with Bitcoin Core). None of them were overheating or reported any hardware issue.

  60. fanquake commented at 9:31 am on November 1, 2022: member

    If you still think my configs have something special, you’re wrong. IF this works for some people, may be it is because they have installed something else on their system which is not documented, but not provided by default on these systems.

    You haven’t provided any further useful information that we can use to try and debug your problem. Closing for now. Can be re-opened if you provide exact steps to reproduce.

  61. fanquake closed this on Nov 1, 2022

  62. verdy-p commented at 11:43 pm on December 20, 2022: none
    Well I have asked others to try using it, and in fact find absolutely no one that succeeded. The problem is not for me to reproduce the problem (it always happen for everyone), but how you exactly configured your systems to make it work (that’s not documented at all and I know absolutely nobody that uses successfully your application, except on some very specific and non-disclosed server environments). So I have used something else. I have lost too much time trying everything I could. Definitely it has NEVER been any hardware issue (it may have been an OS issue, but given the number of OSes and versions I’ve tried and others have tried too on their machines, always failing, this is very unlikely). So this can only be a software issue in your project, which has hidden/unchecked/undocumented assumptions, and/or incorrect documentation to get a working installation.
  63. bitcoin locked this on Dec 20, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me