Crash upon RPC v1 connection in v28.0.0 #31041

issue dr-orlovsky opened this issue on October 6, 2024
  1. dr-orlovsky commented at 6:14 pm on October 6, 2024: none

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    On RPC v1 connection, after the block sync and mempool read are complete, a crash happens (see logs). I used the mempool fork of the electrs indexer, which made an RPC request.

    Original report can be found here: #31039 (comment)

    Expected behaviour

    No crash is expected

    Steps to reproduce

    Run Bitcoin Core (I used a machine with 8 cores and either 4GB or 8GB of memory)

    Relevant log output

    Oct 06 17:54:18 core bitcoind[16532]: 2024-10-06T17:54:18Z initload thread exit
    Oct 06 17:55:43 core systemd[1]: bitcoind.service: A process of this unit has been killed by the OOM killer.
    ░░ Subject: A process of bitcoind.service unit has been killed by the OOM killer.
    ░░ Defined-By: systemd
    ░░ Support: https://www.debian.org/support
    ░░
    ░░ A process of unit @UNIT has been killed by the Linux kernel out-of-memory (OOM)
    ░░ killer logic. This usually indicates that the system is low on memory and that
    ░░ memory needed to be freed. A process associated with bitcoind.service has been determined
    ░░ as the best process to terminate and has been forcibly terminated by the
    ░░ kernel.
    ░░
    ░░ Note that the memory pressure might or might not have been caused by bitcoind.service.
    Oct 06 17:55:43 core systemd[1]: bitcoind.service: Main process exited, code=killed, status=9/KILL
    ░░ Subject: Unit process exited
    ░░ Defined-By: systemd
    ░░ Support: https://www.debian.org/support
    ░░
    ░░ An ExecStart= process belonging to unit bitcoind.service has exited.
    ░░
    ░░ The process' exit code is 'killed' and its exit status is 9.
    Oct 06 17:55:43 core systemd[1]: bitcoind.service: Failed with result 'oom-kill'.
    ░░ Subject: Unit failed
    ░░ Defined-By: systemd
    ░░ Support: https://www.debian.org/support
    ░░
    ░░ The unit bitcoind.service has entered the 'failed' state with result 'oom-kill'.
    Oct 06 17:55:43 core systemd[1]: bitcoind.service: Consumed 3min 18.749s CPU time.
    ░░ Subject: Resources consumed by unit runtime
    ░░ Defined-By: systemd
    ░░ Support: https://www.debian.org/support
    ░░
    ░░ The unit bitcoind.service completed and consumed the indicated resources.
    Oct 06 17:55:43 core systemd[1]: bitcoind.service: Scheduled restart job, restart counter is at 1.
    ░░ Subject: Automatic restarting of a unit has been scheduled
    ░░ Defined-By: systemd
    ░░ Support: https://www.debian.org/support
    ░░
    ░░ Automatic restarting of the unit bitcoind.service has been scheduled, as the result for
    

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    v28.0.0

    Operating system and version

    Debian Bookworm

    Machine specifications

    8 CPUs; 4GB RAM

  2. dr-orlovsky renamed this:
    Crash due to inadequate memory allocation request upon RPC v1 connection in v28.0.0
    Crash upon RPC v1 connection in v28.0.0
    on Oct 6, 2024
  3. dr-orlovsky commented at 6:24 pm on October 6, 2024: none
    With the increase in memory to 8GB the crash has disappeared.
  4. maflcko commented at 8:13 am on October 7, 2024: member

    Steps to reproduce

    Run Bitcoin Core

    What are the exact steps to reproduce? What is the config? Which RPCs were called, in what order, and in what timing (parallel, sequential), …?

    Does it still happen after https://github.com/romanz/electrs/pull/1091 ?

  5. andrewtoth commented at 1:19 pm on October 7, 2024: contributor

    Does it still happen after https://github.com/romanz/electrs/pull/1091 ?

    That is for a different fork of electrs. For the mempool/electrs, you can see in the original logs in #31039 that it is already waiting for mempool to load before syncing.

  6. maflcko commented at 2:32 pm on October 7, 2024: member
    I see. It would be good to know what the batch request looks like. Also, it would be good to confirm that this works on 27.x and is broken on 28.x. Finally, it would be good to confirm that the system has enough memory to fit the whole batch request response (as JSON) into memory.
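
For readers unfamiliar with the term, a JSON-RPC batch is a single HTTP body containing a JSON array of request objects. The sketch below only illustrates the shape and how to measure the encoded size; it is not the request electrs actually sent, which is not shown anywhere in this thread. The helper names `build_batch` and `encoded_size`, the choice of `getblockhash`, and the batch size of 100 are all assumptions made for illustration.

```python
import json


def build_batch(method, params_list, start_id=0):
    """Build a JSON-RPC batch: a list of request objects, one per params entry."""
    return [
        {"jsonrpc": "1.0", "id": start_id + i, "method": method, "params": params}
        for i, params in enumerate(params_list)
    ]


def encoded_size(obj):
    """Size in bytes of the object once serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


# Hypothetical batch: ask for the block hash at heights 0..99.
batch = build_batch("getblockhash", [[height] for height in range(100)])
body = json.dumps(batch)

print(len(batch), "requests,", encoded_size(batch), "bytes encoded")
```

The same `encoded_size` helper could be applied to a decoded response to get a rough idea of how much memory the JSON text alone needs, bearing in mind that in-memory representations are usually larger than the serialized form.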
  7. fanquake added this to the milestone 28.1 on Oct 22, 2024
  8. willcl-ark commented at 3:07 pm on November 29, 2024: member

    I synced mempool’s fork of electrs using bitcoin core v28.0 and there were no issues with unconstrained memory. (also wow; I did not notice in the readme this requires > 1.3TB of disk space to sync)

    Memory usage as I measured peaked at about 1.9GB for bitcoind, compared with 6.1GB for electrs. This machine has 64GB RAM so total utilization was low. electrs did crash and required me to increase the max file descriptor count (with ulimit -n), which I wasn’t expecting, but after that everything worked smoothly.

    My bitcoind was already running (and synced) with default mempool size when I began the electrs sync. Perhaps this is the key difference from your methodology @dr-orlovsky ? Otherwise, I am not sure why mine behaved so differently…

    I could try with systemd’s MemoryMax=bytes or a docker container, to force-limit maximum memory of bitcoind, perhaps?

  9. maflcko commented at 3:13 pm on November 29, 2024: member

    Memory usage as I measured peaked at about 1.9GB for bitcoind, compared with 6.1GB for electrs.

    I’d say that explains the crash happening with 4 GB of RAM, but not with 8GB of RAM (albeit that seems a bit lucky).

    Not sure what to do here. I don’t think there is anything that can be done to catch an (expected) OOM?

  10. willcl-ark commented at 4:31 pm on November 29, 2024: member

    Oh yes, it could do. I had interpreted OP to mean bitcoin core had 4GB (or 8GB) of RAM to itself, but if both of those were trying to share that amount, it seems like it might not work. I note that over in #31039 it was reported that:

    The machine I use has 4GB and there is nothing other than Bitcoin Core on it.

    I got suspicious of my own earlier results as I’ve seen bitcoin core using more memory than this previously, so re-ran an electrs sync to block 600,000 only this time. bitcoind had peak memory usage of ~5GB by block 420,000, which then held steady for the remainder of the sync to 600,000.

    $ grep VmPeak /proc/(pidof bitcoind)/status
    VmPeak:  5039796 kB
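
The same check can be scripted. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the function names `parse_vm_fields` and `read_vm_fields` are hypothetical. It collects every `Vm*` field from the status file, which also makes it easy to compare `VmPeak` (peak virtual memory size) with `VmHWM` (peak resident set size), the latter being closer to what memory pressure actually reflects.

```python
import re


def parse_vm_fields(status_text):
    """Return {field_name: kilobytes} for the Vm* lines of /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid):
    """Read and parse the Vm* fields of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_fields(status_file.read())


# Sample text using the VmPeak value reported above.
sample = "Name:\tbitcoind\nVmPeak:\t 5039796 kB\n"
fields = parse_vm_fields(sample)
print("VmPeak in GiB:", fields["VmPeak"] / 1024**2)
```

On a live system, `read_vm_fields(pid)` would be called with the bitcoind process ID, for example the one printed by `pidof bitcoind`.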
    

    I agree that there’s not much we can do about catching an OOM. We also don’t have any confirmation that this same issue was not present on 27.x. I will try and test this later to see if there is any difference to block 600k again.

  11. willcl-ark commented at 8:50 am on November 30, 2024: member

    We also don’t have any confirmation that this same issue was not present on 27.x. I will try and test this later to see if there is any difference to block 600k again.

    By block 400,000 bitcoin core v27.0 peak usage hit VmPeak: 5048020 kB, which is approx the same (marginally higher than v28.0) at this stage where it too roughly levels out.

    I’m not going to run this further than the block 550k I am up to now, as these results seem so similar. My read is that:

    • Electrs takes 50% more RAM than Bitcoin Core (seen max of 8GB)
    • Bitcoin Core v27.0 and v28.0 are approximately equal in requiring ~5GB RAM to host an electrs sync (to block 600k)
    • Running these both on a system with only 4GB (maybe even 8GB too) should be expected to cause OOM killing or heavy swapping
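
As a back-of-the-envelope check of the conclusion above, the sketch below adds the two approximate peak figures from this thread and compares the sum against the RAM sizes mentioned. It assumes both peaks coincide and ignores the kernel, page cache, and swap, so it is a rough bound rather than a prediction; the `fits` helper is hypothetical.

```python
# Approximate peak figures reported in this thread, in GB.
bitcoind_peak_gb = 5
electrs_peak_gb = 8


def fits(total_ram_gb, *peaks_gb):
    """True if the summed peaks fit in the given amount of RAM."""
    return sum(peaks_gb) <= total_ram_gb


combined = bitcoind_peak_gb + electrs_peak_gb

# RAM sizes mentioned in the thread: the reporter's 4GB and 8GB, and the 64GB test machine.
for ram_gb in (4, 8, 64):
    verdict = "fits" if fits(ram_gb, bitcoind_peak_gb, electrs_peak_gb) else "does not fit"
    print(f"{ram_gb}GB RAM: combined peak of {combined}GB {verdict}")
```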
  12. maflcko removed this from the milestone 28.1 on Dec 2, 2024
  13. maflcko commented at 9:18 am on December 2, 2024: member

    Thanks for taking a detailed look. If you don’t mind, you could run it again through KDE heaptrack to see if there are any obvious low-hanging fruits for improvement of the memory usage.

    In any case, I think it is clear that this is not a regression and I’ve removed the 28.x milestone.

  14. willcl-ark commented at 9:58 am on December 3, 2024: member

    I took a look with heaptrack on a debug build of 28.0 (output file: https://tmp.256k1.dev/heaptrack.bitcoind.3546996.zst )

    image

    Whilst the number of allocations increases over time as expected:

    image

    Consumed heap memory stays pretty stable:

    image

    Peak RSS of 6.8GB (this includes heaptrack though now) seems in line with previous checks.

    I would expect this to increase a bit more as we go further through the chain to block 870k, but there do not appear to be any obvious memory leaks here.

    I may try an assumeutxo sync from block 800,000 with a no-background-sync patch just to exercise those later blocks. But can’t guarantee I’ll find the time.

  15. maflcko commented at 10:03 am on December 3, 2024: member
    Thanks! I don’t expect a memory leak either, but I’d find it interesting to see where the heap usage came from. I guess it is all Univalue?
  16. willcl-ark commented at 10:17 am on December 3, 2024: member

    Actually seems to be a mixture, and UniValue doesn’t particularly stand out to me:

    image

  17. maflcko added the label Resource usage on Dec 3, 2024
  18. maflcko added the label Linux/Unix on Dec 24, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 12:12 UTC
