b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD #31561

issue pinheadmz opened this issue on December 24, 2024
  1. pinheadmz commented at 12:57 PM on December 24, 2024: member

    Running master (specifically 318b2a2f90dfd8ae20beca58e55e6d240a7f3d27 which is HEAD of #31415) on Ubuntu 24.10 on a digital ocean droplet with 8GB RAM. Crashing regularly during IBD. Logs always end like this:

    
    2024-12-23T20:39:07Z [validation] Enqueuing BlockConnected: block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    2024-12-23T20:39:07Z [validation] Enqueuing UpdatedBlockTip: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 fork block hash=000000000000000000023c4f2b1f2457d2f46cd7b6cb27f87ab4788cd54adde1 (in IBD=true)
    2024-12-23T20:39:07Z [validation] ActiveTipChange: new block hash=000000000000000000008364a7396ba08c5ab141d10136e3ad0d9fa37899d7b0 block height=814565
    2024-12-23T20:39:07Z [bench]   - Load block from disk: 27.96ms
    2024-12-23T20:39:07Z [bench]     - Sanity checks: 3.86ms [20.30s (5.18ms/blk)]
    2024-12-23T20:39:07Z [bench]     - Fork checks: 0.13ms [1.19s (0.30ms/blk)]
    
    

    Configuration is just -txindex=1. Have also added -dbcache=100 and 4 GB swap which did not fix the issue.

    Logs only report Killed but there is OOM details in dmesg

    Seems to be the same issue as https://github.com/bitcoin/bitcoin/issues/30001

  2. maflcko commented at 1:27 PM on December 24, 2024: member

    Seems to be the same issue as #30001

    I don't think they are the same. In fact, there are many differences: You are using bitcoind with -txindex and default settings, the other person was using the GUI with a large dbcache and external HDD for the blocksdir.

    It looks similar to #31041, though.

    Given that it is likely not heap (https://github.com/bitcoin/bitcoin/issues/31041#issuecomment-2514072312), it would be good to debug where the usage is coming from. Memory-mapped files seems unlikely as well, but would be good to rule out.

  3. maflcko added the label Resource usage on Dec 24, 2024
  4. maflcko added the label Linux/Unix on Dec 24, 2024
  5. bitcoin deleted a comment on Dec 25, 2024
  6. pinheadmz commented at 7:48 PM on December 26, 2024: member

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    f5855

  7. davidgumberg commented at 8:23 PM on December 27, 2024: contributor

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it:

    [ flamegraph.svg]

    At first glance it seems unusual that so much time (~40% of samples) is spent in the call stack above CScriptCheck::operator() doing allocations.

  8. willcl-ark commented at 9:09 PM on March 20, 2025: member

    I may take a stab at trying to reproduce this to see if it's still a problem with master today (unless you fancy doing so).

    Just a few questions first:

    Configuration is just -txindex=1

    • Their (basic) 8GB droplets appear to have 160 GiB (SSD), so I presume you also had pruning on?
    • Your logs show at least bench and validation debug levels, did you use -debug=1, or only these (or only these on a subsequent run to debug a crash)?
    • Am I likely to need to sync to ~ height=814565 to see crashes, or did they happen earlier?

    I probably won't try this on an actual droplet, but rather using a container with swap disabled.

  9. pinheadmz commented at 12:31 AM on March 21, 2025: member

    I don't think you can prune with txindex! I added a 1 TB volume -- this was the mainnet server for the BOSS2025 program, specifically the RPC scavenger hunt challenge.

    There were several crashes, I can't remember the height they started at but it was long before 800000.

    The conf file just had txindex=1 but I must have turned on debug=1 on the command line when the crashes started

  10. willcl-ark commented at 7:09 PM on March 23, 2025: member

    OK well, I've been unable to reproduce using docker limited to 8g or ram with swap disabled:

    $ docker run --memory="8g" --memory-swap="8g" --volume="/tmp/31561:/home/bitcoin/.bitcoin" -it bitcoin/bitcoin:master -txindex=1
    

    Synced just fine to block 868650 where I shut down manually. The docker image is at commit 2db00278ea57

  11. pinheadmz commented at 7:34 PM on March 23, 2025: member

    Ok I'll try to reproduce and if I cant ill close.

  12. pinheadmz commented at 2:55 PM on March 31, 2025: member

    I was able to reproduce this issue again, it took about a week of IBD

    This is a "basic" digital ocean droplet. General-purpose dedicated CPU, 4 vCPUs

    8 GB Memory / 160 GB Disk + 1000 GB / NYC1 - Ubuntu 24.10 x64
    

    (NO swap...)

    Bitcoin built from this commit in master: c12a2528ce665192af2ebb4ab4e03e229f1b73ca

    Build options: cmake -B build -DWITH_BDB=OFF -DWITH_ZMQ=ON -DCMAKE_BUILD_TYPE=Debug

    Runtime options: bitcoind -txindex=1 -debug

    Last messages in debug log:

    root@issue31561:~/bitcoin# tail ~/.bitcoin/debug.log 
    2025-03-30T03:04:59Z [validation] Enqueuing MempoolTransactionsRemovedForBlock: block height=843676 txs removed=0
    2025-03-30T03:04:59Z UpdateTip: new best=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa height=843676 version=0x3e000000 log2_work=94.926527 tx=1005535512 date='2024-05-16T07:53:20Z' progress=0.843985 cache=617.1MiB(4599518txo)
    2025-03-30T03:04:59Z [bench]   - Connect postprocess: 9.06ms [3116.35s (3.69ms/blk)]
    2025-03-30T03:04:59Z [bench] - Connect block: 1927.31ms [298894.62s (354.28ms/blk)]
    2025-03-30T03:04:59Z [validation] Enqueuing BlockConnected: block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
    2025-03-30T03:04:59Z [validation] Enqueuing UpdatedBlockTip: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa fork block hash=0000000000000000000324fa67a17b4ccf51111f7927e3aeacb4744ffd2ec33f (in IBD=true)
    2025-03-30T03:04:59Z [validation] ActiveTipChange: new block hash=00000000000000000002561ea5db7b3475d8973223eb72a420194c89b1192aaa block height=843676
    2025-03-30T03:04:59Z [bench]   - Load block from disk: 135.17ms
    2025-03-30T03:04:59Z [bench]     - Sanity checks: 52.73ms [23012.63s (27.28ms/blk)]
    2025-03-30T03:04:59Z [bench]     - Fork checks: 0.22ms [676.05s (0.80ms/blk)]
    

    dmesg:

    [479802.540867] b-msghand invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
    [479802.540927] CPU: 2 UID: 0 PID: 31353 Comm: b-msghand Not tainted 6.11.0-9-generic [#9](/bitcoin-bitcoin/9/)-Ubuntu
    [479802.540934] Hardware name: DigitalOcean Droplet/Droplet, BIOS 20171212 12/12/2017
    [479802.540944] Call Trace:
    [479802.540950]  <TASK>
    [479802.540966]  show_stack+0x49/0x60
    [479802.541019]  dump_stack_lvl+0x5f/0x90
    [479802.541045]  dump_stack+0x10/0x18
    [479802.541049]  dump_header+0x46/0x1a6
    [479802.541066]  oom_kill_process.cold+0xb/0xaf
    [479802.541070]  out_of_memory+0x100/0x2b0
    [479802.541108]  __alloc_pages_may_oom+0x10b/0x1d0
    [479802.541127]  __alloc_pages_slowpath.constprop.0+0x41c/0xa10
    [479802.541131]  __alloc_pages_noprof+0x336/0x360
    [479802.541134]  alloc_pages_mpol_noprof+0xd5/0x1e0
    [479802.541148]  folio_alloc_noprof+0x65/0xf0
    [479802.541151]  ? filemap_get_entry+0x10b/0x1d0
    [479802.541159]  filemap_alloc_folio_noprof+0xd9/0xe0
    [479802.541162]  __filemap_get_folio+0x195/0x2e0
    [479802.541166]  filemap_fault+0x1bd/0xa00
    [479802.541169]  ? blk_mq_free_request+0x7c/0xc0
    [479802.541195]  __do_fault+0x3b/0x130
    [479802.541205]  do_read_fault+0x11e/0x1d0
    [479802.541209]  do_fault+0x1ae/0x300
    [479802.541213]  handle_pte_fault+0x12a/0x1c0
    [479802.541215]  __handle_mm_fault+0x3d5/0x7a0
    [479802.541228]  handle_mm_fault+0xef/0x2d0
    [479802.541230]  do_user_addr_fault+0x2ff/0x7e0
    [479802.541256]  exc_page_fault+0x85/0x1c0
    [479802.541264]  asm_exc_page_fault+0x27/0x30
    [479802.541281] RIP: 0033:0x5d49162a7dd1
    [479802.541302] Code: 0f 1e fa 55 48 89 e5 48 83 ec 20 48 89 7d e8 64 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8b 45 e8 48 89 45 f0 48 8b 45 f0 <0f> b6 00 0f b6 c0 48 8b 55 f0 48 83 c2 01 0f b6 12 0f b6 d2 c1 e2
    [479802.541305] RSP: 002b:00007fbc0d5fa320 EFLAGS: 00010246
    [479802.541313] RAX: 00007fb1ee473497 RBX: 00007fbba0046b60 RCX: 00007fbc0d5fa3d0
    [479802.541316] RDX: 0000000000001028 RSI: 00007fb1ee47246f RDI: 00007fb1ee473497
    [479802.541318] RBP: 00007fbc0d5fa340 R08: 00007fbc0d5fa3d0 R09: 00007fbb387b7ed0
    [479802.541320] R10: 00007fbb5893e590 R11: 0000000000000001 R12: 00005d49162cc228
    [479802.541322] R13: 000000000000102c R14: 0000000000000002 R15: 00007ffc5f292f40
    [479802.541326]  </TASK>
    [479802.541332] Mem-Info:
    [479802.541338] active_anon:272270 inactive_anon:54885 isolated_anon:0
                     active_file:916509 inactive_file:654442 isolated_file:0
                     unevictable:7214 dirty:132 writeback:0
                     slab_reclaimable:50068 slab_unreclaimable:24165
                     mapped:1103163 shmem:1046 pagetables:8729
                     sec_pagetables:0 bounce:0
                     kernel_misc_reclaimable:0
                     free:14309 free_pcp:13194 free_cma:0
    [479802.541348] Node 0 active_anon:1089080kB inactive_anon:219540kB active_file:3666036kB inactive_file:2617768kB unevictable:28856kB isolated(anon):0kB isolated(file):0kB mapped:4412652kB dirty:528kB writeback:0kB shmem:4184kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:3520kB pagetables:34916kB sec_pagetables:0kB all_unreclaimable? no
    [479802.541353] Node 0 DMA free:124kB boost:0kB min:128kB low:160kB high:192kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:10320kB inactive_file:3784kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:108kB local_pcp:32kB free_cma:0kB
    [479802.541360] lowmem_reserve[]: 0 2964 7890 0 0
    [479802.541368] Node 0 DMA32 free:21600kB boost:0kB min:25344kB low:31680kB high:38016kB reserved_highatomic:0KB active_anon:117608kB inactive_anon:77804kB active_file:1797156kB inactive_file:948536kB unevictable:0kB writepending:492kB present:3129196kB managed:3063660kB mlocked:0kB bounce:0kB free_pcp:34440kB local_pcp:13232kB free_cma:0kB
    [479802.541375] lowmem_reserve[]: 0 0 4925 0 0
    [479802.541381] Node 0 Normal free:35512kB boost:0kB min:42108kB low:52632kB high:63156kB reserved_highatomic:0KB active_anon:971472kB inactive_anon:141736kB active_file:1858560kB inactive_file:1665448kB unevictable:28856kB writepending:36kB present:5242880kB managed:5052240kB mlocked:25784kB bounce:0kB free_pcp:18228kB local_pcp:8680kB free_cma:0kB
    [479802.541393] lowmem_reserve[]: 0 0 0 0 0
    [479802.541400] Node 0 DMA: 1*4kB (M) 1*8kB (M) 1*16kB (M) 1*32kB (M) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 124kB
    [479802.541421] Node 0 DMA32: 32*4kB (UME) 54*8kB (UME) 219*16kB (UM) 116*32kB (UME) 18*64kB (UME) 49*128kB (UME) 11*256kB (M) 3*512kB (M) 0*1024kB 1*2048kB (M) 0*4096kB = 21600kB
    [479802.541439] Node 0 Normal: 0*4kB 1003*8kB (UE) 919*16kB (UME) 401*32kB (UME) 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 35624kB
    [479802.541462] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    [479802.541464] 1573760 total pagecache pages
    [479802.541469] 0 pages in swap cache
    [479802.541471] Free swap  = 0kB
    [479802.541472] Total swap = 0kB
    [479802.541473] 2097017 pages RAM
    [479802.541474] 0 pages HighMem/MovableOnly
    [479802.541476] 64202 pages reserved
    [479802.541477] 0 pages hwpoisoned
    [479802.541478] Tasks state (memory values in pages):
    [479802.541479] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
    [479802.541534] [   1056]   101  1056     1910      378      256      122         0    53248        0          -900 dbus-daemon
    [479802.541539] [   1070]     0  1070     4610      676      512      164         0    77824        0             0 systemd-logind
    [479802.541542] [   1105]     0  1105     1796       71       32       39         0    53248        0             0 agetty
    [479802.541545] [   1128]     0  1128    27728     2467     2336      131         0   122880        0             0 unattended-upgr
    [479802.541548] [   1138]     0  1138     1785       85       32       53         0    49152        0             0 agetty
    [479802.541551] [   1729]     0  1729     5099     1249      480      769         0    81920        0           100 systemd
    [479802.541554] [   1730]     0  1730     5043      447      421       26         0    77824        0           100 (sd-pam)
    [479802.541558] [   5805]   996  5805    22551     1029      192      837         0    81920        0             0 systemd-timesyn
    [479802.541561] [   5869]     0  5869     6749     1362      672      690         0    77824        0         -1000 systemd-udevd
    [479802.541565] [   6272]   991  6272     5326     1345      576      769         0    90112        0             0 systemd-resolve
    [479802.541569] [  11384]   989 11384    76464     1087      192      895         0   114688        0             0 polkitd
    [479802.541572] [  11397]     0 11397    97465     1336      480      856         0   139264        0             0 ModemManager
    [479802.541575] [  11505]     0 11505   117052     1475      512      963         0   167936        0             0 udisksd
    [479802.541578] [  12703]   102 12703    55022     1048      256      792         0    90112        0             0 rsyslogd
    [479802.541581] [  13208]     0 13208   307805      542      288      254         0    98304        0          -900 droplet-agent
    [479802.541584] [  21411]     0 21411     1831      668       32      636         0    57344        0             0 cron
    [479802.541587] [  21416]     0 21416    55449     6329     4608     1721         0   106496        0         -1000 multipathd
    [479802.541590] [  21417]     0 21417    31791     1249      224     1025         0   282624        0          -250 systemd-journal
    [479802.541593] [  21430]   998 21430     4570     1082      288      794         0    77824        0             0 systemd-network
    [479802.541596] [  24725]     0 24725     4119      727      192      535         0    69632        0             0 wpa_supplicant
    [479802.541599] [  24843]   109 24843     1528      757       64      693         0    53248        0             0 avahi-daemon
    [479802.541602] [  24844]   109 24844     1495      287       70      217         0    49152        0             0 avahi-daemon
    [479802.541605] [  27590]     0 27590     3330     1091      288      803         0    65536        0         -1000 sshd
    [479802.541608] [  31274]     0 31274     5267     1616      781      835         0    81920        0             0 tmux: server
    [479802.541614] [  31275]     0 31275     2460     1138      576      562         0    61440        0             0 bash
    [479802.541617] [  31276]     0 31276     1723      838       64      774         0    57344        0           100 dbus-daemon
    [479802.541620] [  31287]     0 31287     2460     1160      576      584         0    49152        0             0 bash
    [479802.541623] [  31324]     0 31324 10892833  1409263   309527  1099736         0 31866880        0             0 bitcoind
    [479802.541626] [  31357]     0 31357     2477     1277      521      756         0    65536        0             0 htop
    [479802.541630] [  31432]     0 31432   144557     2342     1360      982         0   475136        0             0 fwupd
    [479802.541633] [  31457]     0 31457    20561      546       78      468         0    61440        0             0 gpg-agent
    [479802.541635] [  31475]     0 31475     2460     1114      576      538         0    61440        0             0 bash
    [479802.541639] [ 137873]     0 137873     1753      675      128      547         0    57344        0             0 watch
    [479802.541642] [ 190345]     0 190345     4538     1725      977      748         0    77824        0             0 sshd
    [479802.541645] [ 190468]     0 190468     2453     1130      576      554         0    61440        0             0 bash
    [479802.541647] [ 190481]     0 190481     2363      931      128      803         0    61440        0             0 tmux: client
    [479802.541652] [ 231918]     0 231918     3330     2023      256     1767         0    69632        0             0 sshd
    [479802.541654] [ 231925]     0 231925     3330     2050      256     1794         0    65536        0             0 sshd
    [479802.541657] [ 231926]   105 231926     3330     1422      259     1163         0    65536        0             0 sshd
    [479802.541659] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/tmux-spawn-b59c78ce-1ac5-4001-89ba-6d103ddba70e.scope,task=bitcoind,pid=31324,uid=0
    [479802.541774] Out of memory: Killed process 31324 (bitcoind) total-vm:43571332kB, anon-rss:1238108kB, file-rss:4398944kB, shmem-rss:0kB, UID:0 pgtables:31120kB oom_score_adj:0
    [594899.096127] workqueue: vmstat_shepherd hogged CPU for >13333us 67 times, consider switching to WQ_UNBOUND
    
  13. maflcko commented at 7:54 AM on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

  14. willcl-ark commented at 9:15 AM on April 1, 2025: member

    -DCMAKE_BUILD_TYPE=Debug

    Could clarify that this is not a release build type in the title?

    Indeed. FWIW my first test using a container used RelWithDebInfo. I will take another go using Debug...

  15. pinheadmz renamed this:
    b-msghand invoked oom-killer: Master (v28.99) crashing during IBD
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025
  16. pinheadmz renamed this:
    b-msghand invoked oom in-killer in Debug build: Master (v28.99) crashing during IBD
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    on Apr 1, 2025
  17. pinheadmz renamed this:
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD bc
    b-msghand invoked oom-killer in Debug build: Master (v28.99) crashing during IBD
    on Apr 1, 2025
  18. Crypt-iQ commented at 5:38 PM on June 19, 2025: contributor

    The node in question finished sync after I ran it inside heaptrack and is humming along just fine now. I printed out the flamegraph although I have trouble interpreting it: [ flamegraph.svg]

    At first glance it seems unusual that so much time (~40% of samples) is spent in the call stack above CScriptCheck::operator() doing allocations.

    Yup, this seems a bit weird. Though maybe a lot of time is spent in script checks anyways?

    Because the bench logs are on for validation, we can see that control hits this log line but never reaches the post-connect log line. This is a complete guess, but the CScriptCheck sticks out and I think the OOM could be related to the Add call here as there is a CScriptCheck for each tx input. I'm not sure why the OOM would occur on different blocks, but maybe it has something to do with the worker threads' utilization?

  19. pinheadmz commented at 4:48 PM on June 26, 2025: member

    I'm going to debug-build and restart an IBD on another 4CPU 8GB VM from recent master commit c43cc48aaa with txindex=1 and I'll keep an eye on it -- if it syncs without OOM I'll just close this issue

  20. pinheadmz commented at 2:18 PM on July 21, 2025: member

    Closing this now, Bitcoin Core daemon version v29.99.0-c43cc48aaaaa (debug build) synced succesfully on a debian VM 8GB memory and no swap

  21. pinheadmz closed this on Jul 21, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-03 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me