b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD #31561

issue pinheadmz openend this issue on December 24, 2024
  1. pinheadmz commented at 12:57 pm on December 24, 2024: member

    Running master (specifically 318b2a2f90dfd8ae20beca58e55e6d240a7f3d27 which is HEAD of #31415) on Ubuntu 24.10 on a digital ocean droplet with 8GB RAM. Crashing regularly during IBD. Logs always end like this:

    0
    12024-12-23T20:39:07Z [validation] Enqueuing BlockConnected: block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    22024-12-23T20:39:07Z [validation] Enqueuing UpdatedBlockTip: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 fork block hash=000000000000000000023c4f2b1f2457d2f46cd7b6cb27f87ab4788cd54adde1 (in IBD=true)
    32024-12-23T20:39:07Z [validation] ActiveTipChange: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    42024-12-23T20:39:07Z [bench]   - Load block from disk: 27.96ms
    52024-12-23T20:39:07Z [bench]     - Sanity checks: 3.86ms [20.30s (5.18ms/blk)]
    62024-12-23T20:39:07Z [bench]     - Fork checks: 0.13ms [1.19s (0.30ms/blk)]
    

    Configuration is just -txindex=1. Have also added -dbcache=100 and 4 GB swap which did not fix the issue.

    Logs only report Killed but there is OOM details in dmesg

    Seems to be the same issue as https://github.com/bitcoin/bitcoin/issues/30001

  2. maflcko commented at 1:27 pm on December 24, 2024: member

    Seems to be the same issue as #30001

    I don’t think they are the same. In fact, there are many differences: You are using bitcoind with -txindex and default settings, the other person was using the GUI with a large dbcache and external HDD for the blocksdir.

    It looks similar to #31041, though.

    Given that it is likely not heap (https://github.com/bitcoin/bitcoin/issues/31041#issuecomment-2514072312), it would be good to debug where the usage is coming from. Memory-mapped files seems unlikely as well, but would be good to rule out.

  3. maflcko added the label Resource usage on Dec 24, 2024
  4. maflcko added the label Linux/Unix on Dec 24, 2024
  5. bitcoin deleted a comment on Dec 25, 2024
  6. pinheadmz commented at 7:48 pm on December 26, 2024: member

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    f5855

  7. davidgumberg commented at 8:23 pm on December 27, 2024: contributor

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    [ flamegraph.svg]

    At first glance it seems unusual that so much time (~40% of samples) is spent in the call stack above CScriptCheck::operator() doing allocations.

  8. willcl-ark commented at 9:09 pm on March 20, 2025: member

    I may take a stab at trying to reproduce this to see if it’s still a problem with master today (unless you fancy doing so).

    Just a few questions first:

    Configuration is just -txindex=1

    • Their (basic) 8GB droplets appear to have 160 GiB (SSD), so I presume you also had pruning on?
    • Your logs show at least bench and validation debug levels, did you use -debug=1, or only these (or only these on a subsequent run to debug a crash)?
    • Am I likely to need to sync to ~ height=814565 to see crashes, or did they happen earlier?

    I probably won’t try this on an actual droplet, but rather using a container with swap disabled.

  9. pinheadmz commented at 0:31 am on March 21, 2025: member

    I don’t think you can prune with txindex! I added a 1 TB volume – this was the mainnet server for the BOSS2025 program, specifically the RPC scavenger hunt challenge.

    There were several crashes, I can’t remember the height they started at but it was long before 800000.

    The conf file just had txindex=1 but I must have turned on debug=1 on the command line when the crashes started

  10. willcl-ark commented at 7:09 pm on March 23, 2025: member

    OK well, I’ve been unable to reproduce using docker limited to 8g or ram with swap disabled:

    0$ docker run --memory="8g" --memory-swap="8g" --volume="/tmp/31561:/home/bitcoin/.bitcoin" -it bitcoin/bitcoin:master -txindex=1
    

    Synced just fine to block 868650 where I shut down manually. The docker image is at commit 2db00278ea57

  11. pinheadmz commented at 7:34 pm on March 23, 2025: member
    Ok I’ll try to reproduce and if I cant ill close.
  12. pinheadmz commented at 2:55 pm on March 31, 2025: member

    I was able to reproduce this issue again, it took about a week of IBD

    This is a “basic” digital ocean droplet. General-purpose dedicated CPU, 4 vCPUs

    08 GB Memory / 160 GB Disk + 1000 GB / NYC1 - Ubuntu 24.10 x64
    

    (NO swap…)

    Bitcoin built from this commit in master: c12a2528ce665192af2ebb4ab4e03e229f1b73ca

    Build options: cmake -B build -DWITH_BDB=OFF -DWITH_ZMQ=ON -DCMAKE_BUILD_TYPE=Debug

    Runtime options: bitcoind -txindex=1 -debug

    Last messages in debug log:

     0root@issue31561:~/bitcoin# tail ~/.bitcoin/debug.log 
     12025-03-30T03:04:59Z [validation] Enqueuing MempoolTransactionsRemovedForBlock: block height=843676 txs removed=0
     22025-03-30T03:04:59Z UpdateTip: new best=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa height=843676 version=0x3e000000 log2_work=94.926527 tx=1005535512 date='2024-05-16T07:53:20Z' progress=0.843985 cache=617.1MiB(4599518txo)
     32025-03-30T03:04:59Z [bench]   - Connect postprocess: 9.06ms [3116.35s (3.69ms/blk)]
     42025-03-30T03:04:59Z [bench] - Connect block: 1927.31ms [298894.62s (354.28ms/blk)]
     52025-03-30T03:04:59Z [validation] Enqueuing BlockConnected: block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
     62025-03-30T03:04:59Z [validation] Enqueuing UpdatedBlockTip: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa fork block hash=0000000000000000000324fa67a17b4ccf51111f7927e3aeacb4744ffd2ec33f (in IBD=true)
     72025-03-30T03:04:59Z [validation] ActiveTipChange: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
     82025-03-30T03:04:59Z [bench]   - Load block from disk: 135.17ms
     92025-03-30T03:04:59Z [bench]     - Sanity checks: 52.73ms [23012.63s (27.28ms/blk)]
    102025-03-30T03:04:59Z [bench]     - Fork checks: 0.22ms [676.05s (0.80ms/blk)]
    

    dmesg:

      0[479802.540867] b-msghand invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
      1[479802.540927] CPU: 2 UID: 0 PID: 31353 Comm: b-msghand Not tainted 6.11.0-9-generic [#9](/bitcoin-bitcoin/9/)-Ubuntu
      2[479802.540934] Hardware name: DigitalOcean Droplet/Droplet, BIOS 20171212 12/12/2017
      3[479802.540944] Call Trace:
      4[479802.540950]  <TASK>
      5[479802.540966]  show_stack+0x49/0x60
      6[479802.541019]  dump_stack_lvl+0x5f/0x90
      7[479802.541045]  dump_stack+0x10/0x18
      8[479802.541049]  dump_header+0x46/0x1a6
      9[479802.541066]  oom_kill_process.cold+0xb/0xaf
     10[479802.541070]  out_of_memory+0x100/0x2b0
     11[479802.541108]  __alloc_pages_may_oom+0x10b/0x1d0
     12[479802.541127]  __alloc_pages_slowpath.constprop.0+0x41c/0xa10
     13[479802.541131]  __alloc_pages_noprof+0x336/0x360
     14[479802.541134]  alloc_pages_mpol_noprof+0xd5/0x1e0
     15[479802.541148]  folio_alloc_noprof+0x65/0xf0
     16[479802.541151]  ? filemap_get_entry+0x10b/0x1d0
     17[479802.541159]  filemap_alloc_folio_noprof+0xd9/0xe0
     18[479802.541162]  __filemap_get_folio+0x195/0x2e0
     19[479802.541166]  filemap_fault+0x1bd/0xa00
     20[479802.541169]  ? blk_mq_free_request+0x7c/0xc0
     21[479802.541195]  __do_fault+0x3b/0x130
     22[479802.541205]  do_read_fault+0x11e/0x1d0
     23[479802.541209]  do_fault+0x1ae/0x300
     24[479802.541213]  handle_pte_fault+0x12a/0x1c0
     25[479802.541215]  __handle_mm_fault+0x3d5/0x7a0
     26[479802.541228]  handle_mm_fault+0xef/0x2d0
     27[479802.541230]  do_user_addr_fault+0x2ff/0x7e0
     28[479802.541256]  exc_page_fault+0x85/0x1c0
     29[479802.541264]  asm_exc_page_fault+0x27/0x30
     30[479802.541281] RIP: 0033:0x5d49162a7dd1
     31[479802.541302] Code: 0f 1e fa 55 48 89 e5 48 83 ec 20 48 89 7d e8 64 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8b 45 e8 48 89 45 f0 48 8b 45 f0 <0f> b6 00 0f b6 c0 48 8b 55 f0 48 83 c2 01 0f b6 12 0f b6 d2 c1 e2
     32[479802.541305] RSP: 002b:00007fbc0d5fa320 EFLAGS: 00010246
     33[479802.541313] RAX: 00007fb1ee473497 RBX: 00007fbba0046b60 RCX: 00007fbc0d5fa3d0
     34[479802.541316] RDX: 0000000000001028 RSI: 00007fb1ee47246f RDI: 00007fb1ee473497
     35[479802.541318] RBP: 00007fbc0d5fa340 R08: 00007fbc0d5fa3d0 R09: 00007fbb387b7ed0
     36[479802.541320] R10: 00007fbb5893e590 R11: 0000000000000001 R12: 00005d49162cc228
     37[479802.541322] R13: 000000000000102c R14: 0000000000000002 R15: 00007ffc5f292f40
     38[479802.541326]  </TASK>
     39[479802.541332] Mem-Info:
     40[479802.541338] active_anon:272270 inactive_anon:54885 isolated_anon:0
     41                 active_file:916509 inactive_file:654442 isolated_file:0
     42                 unevictable:7214 dirty:132 writeback:0
     43                 slab_reclaimable:50068 slab_unreclaimable:24165
     44                 mapped:1103163 shmem:1046 pagetables:8729
     45                 sec_pagetables:0 bounce:0
     46                 kernel_misc_reclaimable:0
     47                 free:14309 free_pcp:13194 free_cma:0
     48[479802.541348] Node 0 active_anon:1089080kB inactive_anon:219540kB active_file:3666036kB inactive_file:2617768kB unevictable:28856kB isolated(anon):0kB isolated(file):0kB mapped:4412652kB dirty:528kB writeback:0kB shmem:4184kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:3520kB pagetables:34916kB sec_pagetables:0kB all_unreclaimable? no
     49[479802.541353] Node 0 DMA free:124kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:10320kB inactive_file:3784kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:108kB local_pcp:32kB free_cma:0kB
     50[479802.541360] lowmem_reserve[]: 0 2964 7890 0 0
     51[479802.541368] Node 0 DMA32 free:21600kB boost:0kB min:25344kB low:31680kB high:38016kB reserved_highatomic:0KB active_anon:117608kB inactive_anon:77804kB active_file:1797156kB inactive_file:948536kB unevictable:0kB writepending:492kB present:3129196kB managed:3063660kB mlocked:0kB bounce:0kB free_pcp:34440kB local_pcp:13232kB free_cma:0kB
     52[479802.541375] lowmem_reserve[]: 0 0 4925 0 0
     53[479802.541381] Node 0 Normal free:35512kB boost:0kB min:42108kB low:52632kB high:63156kB reserved_highatomic:0KB active_anon:971472kB inactive_anon:141736kB active_file:1858560kB inactive_file:1665448kB unevictable:28856kB writepending:36kB present:5242880kB managed:5052240kB mlocked:25784kB bounce:0kB free_pcp:18228kB local_pcp:8680kB free_cma:0kB
     54[479802.541393] lowmem_reserve[]: 0 0 0 0 0
     55[479802.541400] Node 0 DMA: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 124kB
     56[479802.541421] Node 0 DMA32: 32*4kB (UME) 54*8kB (UME) 219*16kB (UM) 116*32kB (UME) 18*64kB (UME) 49*128kB (UME) 11*256kB (M) 3*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 21600kB
     57[479802.541439] Node 0 Normal: 0*4kB 1003*8kB (UE) 919*16kB (UME) 401*32kB (UME) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35624kB
     58[479802.541462] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
     59[479802.541464] 1573760 total pagecache pages
     60[479802.541469] 0 pages in swap cache
     61[479802.541471] Free swap  = 0kB
     62[479802.541472] Total swap = 0kB
     63[479802.541473] 2097017 pages RAM
     64[479802.541474] 0 pages HighMem/MovableOnly
     65[479802.541476] 64202 pages reserved
     66[479802.541477] 0 pages hwpoisoned
     67[479802.541478] Tasks state (memory values in pages):
     68[479802.541479] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
     69[479802.541534] [   1056]   101  1056     1910      378      256      122         0    53248        0          -900 dbus-daemon
     70[479802.541539] [   1070]     0  1070     4610      676      512      164         0    77824        0             0 systemd-logind
     71[479802.541542] [   1105]     0  1105     1796       71       32       39         0    53248        0             0 agetty
     72[479802.541545] [   1128]     0  1128    27728     2467     2336      131         0   122880        0             0 unattended-upgr
     73[479802.541548] [   1138]     0  1138     1785       85       32       53         0    49152        0             0 agetty
     74[479802.541551] [   1729]     0  1729     5099     1249      480      769         0    81920        0           100 systemd
     75[479802.541554] [   1730]     0  1730     5043      447      421       26         0    77824        0           100 (sd-pam)
     76[479802.541558] [   5805]   996  5805    22551     1029      192      837         0    81920        0             0 systemd-timesyn
     77[479802.541561] [   5869]     0  5869     6749     1362      672      690         0    77824        0         -1000 systemd-udevd
     78[479802.541565] [   6272]   991  6272     5326     1345      576      769         0    90112        0             0 systemd-resolve
     79[479802.541569] [  11384]   989 11384    76464     1087      192      895         0   114688        0             0 polkitd
     80[479802.541572] [  11397]     0 11397    97465     1336      480      856         0   139264        0             0 ModemManager
     81[479802.541575] [  11505]     0 11505   117052     1475      512      963         0   167936        0             0 udisksd
     82[479802.541578] [  12703]   102 12703    55022     1048      256      792         0    90112        0             0 rsyslogd
     83[479802.541581] [  13208]     0 13208   307805      542      288      254         0    98304        0          -900 droplet-agent
     84[479802.541584] [  21411]     0 21411     1831      668       32      636         0    57344        0             0 cron
     85[479802.541587] [  21416]     0 21416    55449     6329     4608     1721         0   106496        0         -1000 multipathd
     86[479802.541590] [  21417]     0 21417    31791     1249      224     1025         0   282624        0          -250 systemd-journal
     87[479802.541593] [  21430]   998 21430     4570     1082      288      794         0    77824        0             0 systemd-network
     88[479802.541596] [  24725]     0 24725     4119      727      192      535         0    69632        0             0 wpa_supplicant
     89[479802.541599] [  24843]   109 24843     1528      757       64      693         0    53248        0             0 avahi-daemon
     90[479802.541602] [  24844]   109 24844     1495      287       70      217         0    49152        0             0 avahi-daemon
     91[479802.541605] [  27590]     0 27590     3330     1091      288      803         0    65536        0         -1000 sshd
     92[479802.541608] [  31274]     0 31274     5267     1616      781      835         0    81920        0             0 tmux: server
     93[479802.541614] [  31275]     0 31275     2460     1138      576      562         0    61440        0             0 bash
     94[479802.541617] [  31276]     0 31276     1723      838       64      774         0    57344        0           100 dbus-daemon
     95[479802.541620] [  31287]     0 31287     2460     1160      576      584         0    49152        0             0 bash
     96[479802.541623] [  31324]     0 31324 10892833  1409263   309527  1099736         0 31866880        0             0 bitcoind
     97[479802.541626] [  31357]     0 31357     2477     1277      521      756         0    65536        0             0 htop
     98[479802.541630] [  31432]     0 31432   144557     2342     1360      982         0   475136        0             0 fwupd
     99[479802.541633] [  31457]     0 31457    20561      546       78      468         0    61440        0             0 gpg-agent
    100[479802.541635] [  31475]     0 31475     2460     1114      576      538         0    61440        0             0 bash
    101[479802.541639] [ 137873]     0 137873     1753      675      128      547         0    57344        0             0 watch
    102[479802.541642] [ 190345]     0 190345     4538     1725      977      748         0    77824        0             0 sshd
    103[479802.541645] [ 190468]     0 190468     2453     1130      576      554         0    61440        0             0 bash
    104[479802.541647] [ 190481]     0 190481     2363      931      128      803         0    61440        0             0 tmux: client
    105[479802.541652] [ 231918]     0 231918     3330     2023      256     1767         0    69632        0             0 sshd
    106[479802.541654] [ 231925]     0 231925     3330     2050      256     1794         0    65536        0             0 sshd
    107[479802.541657] [ 231926]   105 231926     3330     1422      259     1163         0    65536        0             0 sshd
    108[479802.541659] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/tmux-spawn-b59c78ce-1ac5-4001-89ba-6d103ddba70e.scope,task=bitcoind,pid=31324,uid=0
    109[479802.541774] Out of memory: Killed process 31324 (bitcoind) total-vm:43571332kB, anon-rss:1238108kB, file-rss:4398944kB, shmem-rss:0kB, UID:0 pgtables:31120kB oom_score_adj:0
    110[594899.096127] workqueue: vmstat_shepherd hogged CPU for >13333us 67 times, consider switching to WQ_UNBOUND
    
  13. maflcko commented at 7:54 am on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

  14. willcl-ark commented at 9:15 am on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

    Indeed. FWIW my first test using a container used RelWithDebInfo. I will take another go using Debug

  15. pinheadmz renamed this:
    b-msghand invoked oom-killer: Master (v28.99) crashing during IBD
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025
  16. pinheadmz renamed this:
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    on Apr 1, 2025
  17. pinheadmz renamed this:
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025
  18. Crypt-iQ commented at 5:38 pm on June 19, 2025: contributor

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it: [ flamegraph.svg]

    At first glance it seems unusual that so much time (~40% of samples) is spent in the call stack above CScriptCheck::operator() doing allocations.

    Yup, this seems a bit weird. Though maybe a lot of time is spent in script checks anyways?

    Because the bench logs are on for validation, we can see that control hits this log line but never reaches the post-connect log line. This is a complete guess, but the CScriptCheck sticks out and I think the OOM could be related to the Add call here as there is a CScriptCheck for each tx input. I’m not sure why the OOM would occur on different blocks, but maybe it has something to do with the worker threads’ utilization?

  19. pinheadmz commented at 4:48 pm on June 26, 2025: member
    I’m going to debug-build and restart an IBD on another 4CPU 8GB VM from recent master commit c43cc48aaa with txindex=1 and I’ll keep an eye on it – if it syncs without OOM I’ll just close this issue

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-30 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me