b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD #31561

issue pinheadmz openend this issue on December 24, 2024
  1. pinheadmz commented at 12:57 pm on December 24, 2024: member

    Running master (specifically 318b2a2f90dfd8ae20beca58e55e6d240a7f3d27 which is HEAD of #31415) on Ubuntu 24.10 on a digital ocean droplet with 8GB RAM. Crashing regularly during IBD. Logs always end like this:

    0
    12024-12-23T20:39:07Z [validation] Enqueuing BlockConnected: block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    22024-12-23T20:39:07Z [validation] Enqueuing UpdatedBlockTip: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 fork block hash=000000000000000000023c4f2b1f2457d2f46cd7b6cb27f87ab4788cd54adde1 (in IBD=true)
    32024-12-23T20:39:07Z [validation] ActiveTipChange: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    42024-12-23T20:39:07Z [bench]   - Load block from disk: 27.96ms
    52024-12-23T20:39:07Z [bench]     - Sanity checks: 3.86ms [20.30s (5.18ms/blk)]
    62024-12-23T20:39:07Z [bench]     - Fork checks: 0.13ms [1.19s (0.30ms/blk)]
    

    Configuration is just -txindex=1. Have also added -dbcache=100 and 4 GB swap which did not fix the issue.

    Logs only report Killed but there is OOM details in dmesg

    Seems to be the same issue as https://github.com/bitcoin/bitcoin/issues/30001

  2. maflcko commented at 1:27 pm on December 24, 2024: member

    Seems to be the same issue as #30001

    I don’t think they are the same. In fact, there are many differences: You are using bitcoind with -txindex and default settings, the other person was using the GUI with a large dbcache and external HDD for the blocksdir.

    It looks similar to #31041, though.

    Given that it is likely not heap (https://github.com/bitcoin/bitcoin/issues/31041#issuecomment-2514072312), it would be good to debug where the usage is coming from. Memory-mapped files seems unlikely as well, but would be good to rule out.

  3. maflcko added the label Resource usage on Dec 24, 2024
  4. maflcko added the label Linux/Unix on Dec 24, 2024
  5. bitcoin deleted a comment on Dec 25, 2024
  6. pinheadmz commented at 7:48 pm on December 26, 2024: member

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    f5855

  7. davidgumberg commented at 8:23 pm on December 27, 2024: contributor

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    [ flamegraph.svg]

    At first glance it seems unusual that so much time (~40% of samples) is spent in the call stack above CScriptCheck::operator() doing allocations.

  8. willcl-ark commented at 9:09 pm on March 20, 2025: member

    I may take a stab at trying to reproduce this to see if it’s still a problem with master today (unless you fancy doing so).

    Just a few questions first:

    Configuration is just -txindex=1

    • Their (basic) 8GB droplets appear to have 160 GiB (SSD), so I presume you also had pruning on?
    • Your logs show at least bench and validation debug levels, did you use -debug=1, or only these (or only these on a subsequent run to debug a crash)?
    • Am I likely to need to sync to ~ height=814565 to see crashes, or did they happen earlier?

    I probably won’t try this on an actual droplet, but rather using a container with swap disabled.

  9. pinheadmz commented at 0:31 am on March 21, 2025: member

    I don’t think you can prune with txindex! I added a 1 TB volume – this was the mainnet server for the BOSS2025 program, specifically the RPC scavenger hunt challenge.

    There were several crashes, I can’t remember the height they started at but it was long before 800000.

    The conf file just had txindex=1 but I must have turned on debug=1 on the command line when the crashes started

  10. willcl-ark commented at 7:09 pm on March 23, 2025: member

    OK well, I’ve been unable to reproduce using docker limited to 8g or ram with swap disabled:

    0$ docker run --memory="8g" --memory-swap="8g" --volume="/tmp/31561:/home/bitcoin/.bitcoin" -it bitcoin/bitcoin:master -txindex=1
    

    Synced just fine to block 868650 where I shut down manually. The docker image is at commit 2db00278ea57

  11. pinheadmz commented at 7:34 pm on March 23, 2025: member
    Ok I’ll try to reproduce and if I cant ill close.
  12. pinheadmz commented at 2:55 pm on March 31, 2025: member

    I was able to reproduce this issue again, it took about a week of IBD

    This is a “basic” digital ocean droplet. General-purpose dedicated CPU, 4 vCPUs

    08 GB Memory / 160 GB Disk + 1000 GB / NYC1 - Ubuntu 24.10 x64
    

    (NO swap…)

    Bitcoin built from this commit in master: c12a2528ce665192af2ebb4ab4e03e229f1b73ca

    Build options: cmake -B build -DWITH_BDB=OFF -DWITH_ZMQ=ON -DCMAKE_BUILD_TYPE=Debug

    Runtime options: bitcoind -txindex=1 -debug

    Last messages in debug log:

     0root@issue31561:~/bitcoin# tail ~/.bitcoin/debug.log 
     12025-03-30T03:04:59Z [validation] Enqueuing MempoolTransactionsRemovedForBlock: block height=843676 txs removed=0
     22025-03-30T03:04:59Z UpdateTip: new best=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa height=843676 version=0x3e000000 log2_work=94.926527 tx=1005535512 date='2024-05-16T07:53:20Z' progress=0.843985 cache=617.1MiB(4599518txo)
     32025-03-30T03:04:59Z [bench]   - Connect postprocess: 9.06ms [3116.35s (3.69ms/blk)]
     42025-03-30T03:04:59Z [bench] - Connect block: 1927.31ms [298894.62s (354.28ms/blk)]
     52025-03-30T03:04:59Z [validation] Enqueuing BlockConnected: block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
     62025-03-30T03:04:59Z [validation] Enqueuing UpdatedBlockTip: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa fork block hash=0000000000000000000324fa67a17b4ccf51111f7927e3aeacb4744ffd2ec33f (in IBD=true)
     72025-03-30T03:04:59Z [validation] ActiveTipChange: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
     82025-03-30T03:04:59Z [bench]   - Load block from disk: 135.17ms
     92025-03-30T03:04:59Z [bench]     - Sanity checks: 52.73ms [23012.63s (27.28ms/blk)]
    102025-03-30T03:04:59Z [bench]     - Fork checks: 0.22ms [676.05s (0.80ms/blk)]
    

    dmesg:

      0[479802.540867] b-msghand invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
      1[479802.540927] CPU: 2 UID: 0 PID: 31353 Comm: b-msghand Not tainted 6.11.0-9-generic [#9](/bitcoin-bitcoin/9/)-Ubuntu
      2[479802.540934] Hardware name: DigitalOcean Droplet/Droplet, BIOS 20171212 12/12/2017
      3[479802.540944] Call Trace:
      4[479802.540950]  <TASK>
      5[479802.540966]  show_stack+0x49/0x60
      6[479802.541019]  dump_stack_lvl+0x5f/0x90
      7[479802.541045]  dump_stack+0x10/0x18
      8[479802.541049]  dump_header+0x46/0x1a6
      9[479802.541066]  oom_kill_process.cold+0xb/0xaf
     10[479802.541070]  out_of_memory+0x100/0x2b0
     11[479802.541108]  __alloc_pages_may_oom+0x10b/0x1d0
     12[479802.541127]  __alloc_pages_slowpath.constprop.0+0x41c/0xa10
     13[479802.541131]  __alloc_pages_noprof+0x336/0x360
     14[479802.541134]  alloc_pages_mpol_noprof+0xd5/0x1e0
     15[479802.541148]  folio_alloc_noprof+0x65/0xf0
     16[479802.541151]  ? filemap_get_entry+0x10b/0x1d0
     17[479802.541159]  filemap_alloc_folio_noprof+0xd9/0xe0
     18[479802.541162]  __filemap_get_folio+0x195/0x2e0
     19[479802.541166]  filemap_fault+0x1bd/0xa00
     20[479802.541169]  ? blk_mq_free_request+0x7c/0xc0
     21[479802.541195]  __do_fault+0x3b/0x130
     22[479802.541205]  do_read_fault+0x11e/0x1d0
     23[479802.541209]  do_fault+0x1ae/0x300
     24[479802.541213]  handle_pte_fault+0x12a/0x1c0
     25[479802.541215]  __handle_mm_fault+0x3d5/0x7a0
     26[479802.541228]  handle_mm_fault+0xef/0x2d0
     27[479802.541230]  do_user_addr_fault+0x2ff/0x7e0
     28[479802.541256]  exc_page_fault+0x85/0x1c0
     29[479802.541264]  asm_exc_page_fault+0x27/0x30
     30[479802.541281] RIP: 0033:0x5d49162a7dd1
     31[479802.541302] Code: 0f 1e fa 55 48 89 e5 48 83 ec 20 48 89 7d e8 64 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8b 45 e8 48 89 45 f0 48 8b 45 f0 <0f> b6 00 0f b6 c0 48 8b 55 f0 48 83 c2 01 0f b6 12 0f b6 d2 c1 e2
     32[479802.541305] RSP: 002b:00007fbc0d5fa320 EFLAGS: 00010246
     33[479802.541313] RAX: 00007fb1ee473497 RBX: 00007fbba0046b60 RCX: 00007fbc0d5fa3d0
     34[479802.541316] RDX: 0000000000001028 RSI: 00007fb1ee47246f RDI: 00007fb1ee473497
     35[479802.541318] RBP: 00007fbc0d5fa340 R08: 00007fbc0d5fa3d0 R09: 00007fbb387b7ed0
     36[479802.541320] R10: 00007fbb5893e590 R11: 0000000000000001 R12: 00005d49162cc228
     37[479802.541322] R13: 000000000000102c R14: 0000000000000002 R15: 00007ffc5f292f40
     38[479802.541326]  </TASK>
     39[479802.541332] Mem-Info:
     40[479802.541338] active_anon:272270 inactive_anon:54885 isolated_anon:0
     41                 active_file:916509 inactive_file:654442 isolated_file:0
     42                 unevictable:7214 dirty:132 writeback:0
     43                 slab_reclaimable:50068 slab_unreclaimable:24165
     44                 mapped:1103163 shmem:1046 pagetables:8729
     45                 sec_pagetables:0 bounce:0
     46                 kernel_misc_reclaimable:0
     47                 free:14309 free_pcp:13194 free_cma:0
     48[479802.541348] Node 0 active_anon:1089080kB inactive_anon:219540kB active_file:3666036kB inactive_file:2617768kB unevictable:28856kB isolated(anon):0kB isolated(file):0kB mapped:4412652kB dirty:528kB writeback:0kB shmem:4184kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:3520kB pagetables:34916kB sec_pagetables:0kB all_unreclaimable? no
     49[479802.541353] Node 0 DMA free:124kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:10320kB inactive_file:3784kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:108kB local_pcp:32kB free_cma:0kB
     50[479802.541360] lowmem_reserve[]: 0 2964 7890 0 0
     51[479802.541368] Node 0 DMA32 free:21600kB boost:0kB min:25344kB low:31680kB high:38016kB reserved_highatomic:0KB active_anon:117608kB inactive_anon:77804kB active_file:1797156kB inactive_file:948536kB unevictable:0kB writepending:492kB present:3129196kB managed:3063660kB mlocked:0kB bounce:0kB free_pcp:34440kB local_pcp:13232kB free_cma:0kB
     52[479802.541375] lowmem_reserve[]: 0 0 4925 0 0
     53[479802.541381] Node 0 Normal free:35512kB boost:0kB min:42108kB low:52632kB high:63156kB reserved_highatomic:0KB active_anon:971472kB inactive_anon:141736kB active_file:1858560kB inactive_file:1665448kB unevictable:28856kB writepending:36kB present:5242880kB managed:5052240kB mlocked:25784kB bounce:0kB free_pcp:18228kB local_pcp:8680kB free_cma:0kB
     54[479802.541393] lowmem_reserve[]: 0 0 0 0 0
     55[479802.541400] Node 0 DMA: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 124kB
     56[479802.541421] Node 0 DMA32: 32*4kB (UME) 54*8kB (UME) 219*16kB (UM) 116*32kB (UME) 18*64kB (UME) 49*128kB (UME) 11*256kB (M) 3*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 21600kB
     57[479802.541439] Node 0 Normal: 0*4kB 1003*8kB (UE) 919*16kB (UME) 401*32kB (UME) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35624kB
     58[479802.541462] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
     59[479802.541464] 1573760 total pagecache pages
     60[479802.541469] 0 pages in swap cache
     61[479802.541471] Free swap  = 0kB
     62[479802.541472] Total swap = 0kB
     63[479802.541473] 2097017 pages RAM
     64[479802.541474] 0 pages HighMem/MovableOnly
     65[479802.541476] 64202 pages reserved
     66[479802.541477] 0 pages hwpoisoned
     67[479802.541478] Tasks state (memory values in pages):
     68[479802.541479] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
     69[479802.541534] [   1056]   101  1056     1910      378      256      122         0    53248        0          -900 dbus-daemon
     70[479802.541539] [   1070]     0  1070     4610      676      512      164         0    77824        0             0 systemd-logind
     71[479802.541542] [   1105]     0  1105     1796       71       32       39         0    53248        0             0 agetty
     72[479802.541545] [   1128]     0  1128    27728     2467     2336      131         0   122880        0             0 unattended-upgr
     73[479802.541548] [   1138]     0  1138     1785       85       32       53         0    49152        0             0 agetty
     74[479802.541551] [   1729]     0  1729     5099     1249      480      769         0    81920        0           100 systemd
     75[479802.541554] [   1730]     0  1730     5043      447      421       26         0    77824        0           100 (sd-pam)
     76[479802.541558] [   5805]   996  5805    22551     1029      192      837         0    81920        0             0 systemd-timesyn
     77[479802.541561] [   5869]     0  5869     6749     1362      672      690         0    77824        0         -1000 systemd-udevd
     78[479802.541565] [   6272]   991  6272     5326     1345      576      769         0    90112        0             0 systemd-resolve
     79[479802.541569] [  11384]   989 11384    76464     1087      192      895         0   114688        0             0 polkitd
     80[479802.541572] [  11397]     0 11397    97465     1336      480      856         0   139264        0             0 ModemManager
     81[479802.541575] [  11505]     0 11505   117052     1475      512      963         0   167936        0             0 udisksd
     82[479802.541578] [  12703]   102 12703    55022     1048      256      792         0    90112        0             0 rsyslogd
     83[479802.541581] [  13208]     0 13208   307805      542      288      254         0    98304        0          -900 droplet-agent
     84[479802.541584] [  21411]     0 21411     1831      668       32      636         0    57344        0             0 cron
     85[479802.541587] [  21416]     0 21416    55449     6329     4608     1721         0   106496        0         -1000 multipathd
     86[479802.541590] [  21417]     0 21417    31791     1249      224     1025         0   282624        0          -250 systemd-journal
     87[479802.541593] [  21430]   998 21430     4570     1082      288      794         0    77824        0             0 systemd-network
     88[479802.541596] [  24725]     0 24725     4119      727      192      535         0    69632        0             0 wpa_supplicant
     89[479802.541599] [  24843]   109 24843     1528      757       64      693         0    53248        0             0 avahi-daemon
     90[479802.541602] [  24844]   109 24844     1495      287       70      217         0    49152        0             0 avahi-daemon
     91[479802.541605] [  27590]     0 27590     3330     1091      288      803         0    65536        0         -1000 sshd
     92[479802.541608] [  31274]     0 31274     5267     1616      781      835         0    81920        0             0 tmux: server
     93[479802.541614] [  31275]     0 31275     2460     1138      576      562         0    61440        0             0 bash
     94[479802.541617] [  31276]     0 31276     1723      838       64      774         0    57344        0           100 dbus-daemon
     95[479802.541620] [  31287]     0 31287     2460     1160      576      584         0    49152        0             0 bash
     96[479802.541623] [  31324]     0 31324 10892833  1409263   309527  1099736         0 31866880        0             0 bitcoind
     97[479802.541626] [  31357]     0 31357     2477     1277      521      756         0    65536        0             0 htop
     98[479802.541630] [  31432]     0 31432   144557     2342     1360      982         0   475136        0             0 fwupd
     99[479802.541633] [  31457]     0 31457    20561      546       78      468         0    61440        0             0 gpg-agent
    100[479802.541635] [  31475]     0 31475     2460     1114      576      538         0    61440        0             0 bash
    101[479802.541639] [ 137873]     0 137873     1753      675      128      547         0    57344        0             0 watch
    102[479802.541642] [ 190345]     0 190345     4538     1725      977      748         0    77824        0             0 sshd
    103[479802.541645] [ 190468]     0 190468     2453     1130      576      554         0    61440        0             0 bash
    104[479802.541647] [ 190481]     0 190481     2363      931      128      803         0    61440        0             0 tmux: client
    105[479802.541652] [ 231918]     0 231918     3330     2023      256     1767         0    69632        0             0 sshd
    106[479802.541654] [ 231925]     0 231925     3330     2050      256     1794         0    65536        0             0 sshd
    107[479802.541657] [ 231926]   105 231926     3330     1422      259     1163         0    65536        0             0 sshd
    108[479802.541659] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/tmux-spawn-b59c78ce-1ac5-4001-89ba-6d103ddba70e.scope,task=bitcoind,pid=31324,uid=0
    109[479802.541774] Out of memory: Killed process 31324 (bitcoind) total-vm:43571332kB, anon-rss:1238108kB, file-rss:4398944kB, shmem-rss:0kB, UID:0 pgtables:31120kB oom_score_adj:0
    110[594899.096127] workqueue: vmstat_shepherd hogged CPU for >13333us 67 times, consider switching to WQ_UNBOUND
    
  13. maflcko commented at 7:54 am on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

  14. willcl-ark commented at 9:15 am on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

    Indeed. FWIW my first test using a container used RelWithDebInfo. I will take another go using Debug

  15. pinheadmz renamed this:
    b-msghand invoked oom-killer: Master (v28.99) crashing during IBD
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025
  16. pinheadmz renamed this:
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    on Apr 1, 2025
  17. pinheadmz renamed this:
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-02 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me