Bitcoind repeatedly crashing at `UpdateTip` with no error message #24483

wbobeirne opened this issue on March 5, 2022
  1. wbobeirne commented at 11:49 PM on March 5, 2022: none

    Expected behavior

    Bitcoind either doesn't crash, or provides some error information in debug.log about the crash.

    Actual behavior

    About 10 seconds after starting bitcoind, it reliably crashes right after:

    2022-03-05T23:47:40Z UpdateTip: new best=0000000000000000025faae44cda1e306fdfb468c30870a3731bbdadf24c5f5f height=425259 version=0x20000000 log2_work=85.132165 tx=149305533 date='2016-08-15T01:43:46Z' progress=0.211905 cache=0.1MiB(867txo)
    2022-03-05T23:47:40Z UpdateTip: new best=00000000000000000287a137cfe12343e83667c5ab290f2c6ee75fe323a1c8ec height=425260 version=0x20000000 log2_work=85.132195 tx=149307741 date='2016-08-15T02:03:56Z' progress=0.211908 cache=1.2MiB(9061txo)
    

    I'm currently stuck on block 425258, but that is after running -reindex. Before the reindex, the crash was happening up in the high 600000s, so I'm not sure whether it has anything to do with this particular block.
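
A minimal sketch, not part of the original report, of how the height of the last connected block could be pulled out of debug.log to see where each run stopped. It assumes the `UpdateTip` line layout shown in the excerpt above, with each entry on a single line as in the real log file. The function name `last_update_tip` is hypothetical.

```python
import re
from typing import Optional

# Fields as they appear in the UpdateTip lines quoted in this issue.
UPDATE_TIP = re.compile(
    r"UpdateTip: new best=(?P<hash>[0-9a-f]+) "
    r"height=(?P<height>\d+) .*?progress=(?P<progress>[0-9.]+)"
)


def last_update_tip(log_text: str) -> Optional[dict]:
    """Return hash, height and progress of the last UpdateTip entry, or None."""
    last = None
    for match in UPDATE_TIP.finditer(log_text):
        last = {
            "hash": match.group("hash"),
            "height": int(match.group("height")),
            "progress": float(match.group("progress")),
        }
    return last
```

Calling `last_update_tip(open("debug.log").read())` after each crash would show whether the node always stops at the same height or at a different one each time. A different height on each run points away from one specific block.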

    To reproduce

    This happens consistently, every time I start bitcoind.

    System information

    bitcoind --version
    Bitcoin Core version v22.0
    

    I originally built my own binary, then tried again with the pre-built binary from bitcoincore.org to make sure I hadn't done something wrong. Same issue.

    > lsb_release -a
    
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 20.04.3 LTS
    Release:	20.04
    Codename:	focal
    
    > lscpu
    
    Architecture:                    x86_64
    CPU op-mode(s):                  32-bit, 64-bit
    CPU(s):                          4
    Vendor ID:                       GenuineIntel
    CPU family:                      6
    Model:                           158
    Model name:                      Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz
    ...
    
    > lsmem
    
    RANGE                                  SIZE  STATE REMOVABLE  BLOCK
    0x0000000000000000-0x000000008fffffff  2.3G online        no   0-17
    0x0000000100000000-0x000000046fffffff 13.8G online        no 32-141
    
    Memory block size:       128M
    Total online memory:      16G
    Total offline memory:      0B
    
    > sudo smartctl --all /dev/sda
    
    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Red
    LU WWN Device Id: 5 0014ee 26488ca44
    Firmware Version: 82.00A82
    User Capacity:    3,000,592,982,016 bytes [3.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    ATA Version is:   ACS-2 (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    ...
    

    I put my debug.log files in this gist, one with default logging and one with debug=1: https://gist.github.com/wbobeirne/c32a5ef6d6778dc1ace63e74144dc0c5

  2. wbobeirne added the label Bug on Mar 5, 2022
  3. maflcko commented at 2:12 PM on March 7, 2022: member

    Can you run in gdb or valgrind to provide a stack?

  4. wbobeirne commented at 1:46 PM on March 9, 2022: none

    I tried removing my blocks and starting fresh in case something was corrupted, but it still fails, now in a new spot. Here's a run with gdb:

    [New Thread 0x7fff467fc700 (LWP 1284461)]
    2022-03-09T13:38:59Z net thread start
    [New Thread 0x7fff45ffb700 (LWP 1284462)]
    2022-03-09T13:38:59Z dnsseed thread start
    2022-03-09T13:38:59Z Waiting 300 seconds before querying DNS seeds.
    [New Thread 0x7fff457fa700 (LWP 1284463)]
    2022-03-09T13:38:59Z addcon thread start
    [New Thread 0x7fff44ff9700 (LWP 1284464)]
    2022-03-09T13:38:59Z opencon thread start
    [New Thread 0x7fff2ffff700 (LWP 1284465)]
    2022-03-09T13:38:59Z msghand thread start
    2022-03-09T13:38:59Z init message: Done loading
    2022-03-09T13:38:59Z [default wallet] Submitting wtx c6f5fed1dc24a5551afa3b3d6805ee8f849fa27adbdfba674545d3a3e764788f to mempool for relay
    2022-03-09T13:38:59Z [default wallet] Submitting wtx 43600c201c00ba90c99bf2b1b24de0c8b893cf10a5e4c09efa1811b7edd79148 to mempool for relay
    
    Thread 13 "bitcoind" received signal SIGBUS, Bus error.
    [Switching to Thread 0x7fff4c9de700 (LWP 1284457)]
    0x0000555555bc6070 in ?? ()
    

    I've seen other threads mention that SIGBUS is usually due to a hardware issue. However, the HDD info I posted earlier shows no SMART errors, and I ran memtester a few times with no issues detected. I haven't seen any similar crashes elsewhere on this machine. Let me know if you can think of anything else I can do to debug.

  5. maflcko commented at 2:03 PM on March 9, 2022: member

    Well, I am not sure how to debug this further without a gdb/valgrind stacktrace that points to source code.

    Though, segfaults usually happen in wallet code. You can confirm or reject this by turning off the wallet for one run.

    Also, you can try running 23.0-rc1 in another run (with wallet) to see if the bug is already fixed.

  6. maflcko added the label Wallet on Mar 9, 2022
  7. wbobeirne commented at 6:23 AM on March 10, 2022: none

    Tried with disablewallet=1 in the conf, same crash. (I confirmed in the startup logs that the wallet was disabled, but I'm not sure if this is what you meant by turning the wallet off.) If you can point me toward how to produce a more useful stack trace, I'd be happy to try to get more info on this.

  8. maflcko removed the label Wallet on Mar 10, 2022
  9. maflcko commented at 6:59 AM on March 10, 2022: member

    Since gdb didn't produce anything, maybe try valgrind? If the crash normally happens after about 10 seconds, it may take a few minutes longer under valgrind:

    valgrind bitcoind
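
A minimal sketch, not from the thread, of one way to capture a backtrace from every thread automatically when the process stops on a signal such as SIGBUS. It assumes gdb is installed and that the bitcoind binary carries debug symbols; with a stripped binary the frames will still show as `?? ()`. The helper names `gdb_backtrace_command` and `run_under_gdb` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def gdb_backtrace_command(program: str, program_args: Sequence[str] = ()) -> List[str]:
    """Build a gdb command line that runs the program and, once it stops
    (for example on a fatal signal), prints a backtrace of every thread."""
    return [
        "gdb",
        "-batch",                     # exit once the -ex commands finish
        "-ex", "run",                 # start the program under gdb
        "-ex", "thread apply all bt", # dump all thread stacks after it stops
        "--args", program, *program_args,
    ]


def run_under_gdb(program: str, program_args: Sequence[str] = ()) -> str:
    """Run the program under gdb and return everything gdb printed."""
    result = subprocess.run(
        gdb_backtrace_command(program, program_args),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

For example, `print(run_under_gdb("bitcoind", ["-printtoconsole"]))` would run the node until it stops and then print the collected stacks, which can be pasted into the issue.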
    
  10. bitcoin deleted a comment on Apr 9, 2022
  11. bobjansen commented at 8:55 PM on August 14, 2022: none

    I'm running into something similar. Unfortunately, when bitcoind stopped, it took with it my Konsole including the gdb session. Luckily, I was watching the progress: the last message I saw indicated it happened when the IBD was finishing (something about latching to false). When time permits I'll try running from a local tty instead of a window manager.

  12. bobjansen commented at 5:23 PM on August 15, 2022: none

    Did that, while stopping bitcoind every few days of progress. No crash and both bitcoind and bitcoin-qt work now...

  13. pinheadmz commented at 3:39 PM on March 20, 2023: member

    @wbobeirne any updates on this issue or more clues to help debug?

  14. wbobeirne commented at 9:58 PM on March 22, 2023: none

    @pinheadmz I'm fairly certain that, despite SMART not reporting any issues, a hardware disk failure was at play here. I started to see instability elsewhere in my system. I haven't yet swapped the offending hard drive to confirm, but this can probably be closed.

  15. maflcko closed this on Mar 23, 2023

  16. bitcoin locked this on Mar 22, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-03 15:14 UTC