Memory leak when using IPC mining interface #33940

issue ryanofsky opened this issue on November 24, 2025
  1. ryanofsky commented at 2:11 pm on November 24, 2025: contributor

    Originally posted by @plebhash in #33899

    we are getting reports of out-of-memory crashes on https://github.com/stratum-mining/sv2-apps/pull/59#issuecomment-3568252007

    initially I suspected there could be thread-related issues similar to what I reported in #33923, but it turns out the VPS was running out of its 2 GB of available RAM, which only happened after long (12h+) running sessions

    so I leveraged psrecord to observe how Bitcoin Core (a14e7b9dee9145920f93eab0254ce92942bd1e5e from the 30.x branch) was consuming RAM over time (connected to mainnet for high mempool activity, for roughly 1h to get enough chain tip updates)
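
    For illustration, a minimal sketch of this kind of sampling (an assumption: psutil is installed and stands in for roughly what psrecord does; the real tool has more options):

      # Sample the RSS of a running process (e.g. bitcoin-node) at a fixed
      # interval and print timestamped values, similar in spirit to psrecord.
      import sys
      import time

      import psutil

      def record_rss(pid: int, interval: float = 5.0) -> None:
          proc = psutil.Process(pid)
          start = time.time()
          while True:
              rss_mib = proc.memory_info().rss / (1024 * 1024)
              print(f"{time.time() - start:8.0f}s  {rss_mib:8.1f} MiB", flush=True)
              time.sleep(interval)

      if __name__ == "__main__":
          record_rss(int(sys.argv[1]))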

    this is RAM consumption of Bitcoin Core with the Rust code from https://github.com/stratum-mining/sv2-apps/pull/59 (more specifically SRI Pool at https://github.com/stratum-mining/sv2-apps/pull/59/commits/57576e083e82fb5acd5dbad75e8b4de7158a3d61) connected to it:

    there’s a clear upwards trend in RAM consumption, which made me wonder if we were doing something wrong in the Rust code

    so then I ran it alongside sv2-tp for a comparison:

    which shows a similar upwards trend that never comes back down


    I don’t know whether the root cause for this is related to template memory management, but there’s a chance it is, so I’m reporting it here rather than opening a new issue

    Originally posted by @Sjors in #33899

    I’m planning to make similar plots for #33922 (without CPU and ideally with marks to indicate where blocks were found).

    If you’re measuring the process memory instead of only the template memory (which #33922 enables), you’ll want to hold the mempool itself constant, e.g. by picking some value for -maxmempool and waiting for it to fill before starting the measurement. You also want to set -dbcache to its minimum, because that’s also accruing memory over time.
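
    A minimal sketch of the “wait for the mempool to fill” step, assuming bitcoin-cli is on PATH and pointed at the node under test (getmempoolinfo reports both usage and maxmempool in bytes):

      # Block until mempool usage is close to the -maxmempool limit, so the
      # mempool itself stays roughly constant during the measurement run.
      import json
      import subprocess
      import time

      def wait_for_full_mempool(threshold: float = 0.95, poll_secs: int = 60) -> None:
          while True:
              info = json.loads(subprocess.check_output(["bitcoin-cli", "getmempoolinfo"]))
              if info["usage"] >= threshold * info["maxmempool"]:
                  return
              time.sleep(poll_secs)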

    Originally posted by @Sjors in #33899

    @plebhash it looks like you found a real bug. Because BlockTemplate::createNewBlock doesn’t have a context param, it looks like its destroy method is not invoked until sv2-tp disconnects. So the node keeps holding on to templates even though the Template Provider already pruned them.

    I’ll open a PR to fix that.

    Originally posted by @plebhash in #33899

    that’s good to know!

    but do we have to explicitly call destroy or is it sufficient to drop the reference from memory on the client side?

    from my understanding of capnp, I believe it should be sufficient to drop it from memory, but on the other hand there must be a reason for destroy to exist?

    Originally posted by @Sjors in #33899

    @plebhash IIUC libmultiprocess does this automatically, but only sv2-tp uses that library. So it probably depends on how the rust capnp library is implemented. It might be worth testing how that library behaves out of the box, with and without the fix here. Just look for the IPC server destroy messages on the Bitcoin Core side (preferably tested against master).

  2. Sjors commented at 2:17 pm on November 24, 2025: member

    Turns out I was chasing a ghost. The reason destroy wasn’t called by sv2-tp was not because of a missing context param, but because of an unrelated regression.

    So the memory leak @plebhash is seeing is probably due to not calling destroy.

  3. ryanofsky commented at 2:18 pm on November 24, 2025: contributor

    re: #33899 (comment)

    it looks like you found a real bug. Because BlockTemplate::createNewBlock doesn’t have a context param, it looks like its destroy method is not invoked until sv2-tp disconnects.

    I can imagine there’s a bug causing the block template not to be freed until the disconnect happens, but I don’t think the createNewBlock method having a context parameter would affect this. Having a context parameter just allows the method to run on an asynchronous thread without blocking the event loop, instead of running on the event loop thread. But which thread the block was created on should not affect how it’s destroyed.

  4. ryanofsky commented at 2:24 pm on November 24, 2025: contributor

    re: #33899 (comment)

    but do we have to explicitly call destroy or is it sufficient to drop the reference from memory on the client side?

    from my understanding of capnp, I believe it should be sufficient to drop it from memory, but on the other hand there must be a reason for destroy to exist?

    Your understanding is right, I think, but there are a lot of details here and I can imagine there being some bug where just dropping the reference on the client side does not delete the server-side object right away. The reason for having an explicit destroy method is to give clients a way to destroy the server-side object and actually wait for it to be destroyed. If clients just drop their references to server-side objects, the server-side objects will still get destroyed, but it will happen asynchronously.

  5. plebhash commented at 2:57 pm on November 24, 2025: none

    I ran two sessions with #33936 at a3a6861e131120eabf6e2f7ecd15e5ea805c66b2, where the client was our Sv2 Rust code

    both of them had at least one chain tip update, which should trigger templates being flushed from memory (so I’d expect to see a sharp decline in RAM consumption)

    in the first one, we’re not calling destroy:

    in the second one, we’re calling destroy after chain tip updates:

    I guess this indicates that @Sjors’ approach in #33936 isn’t fully fixing the issue yet, which seems to be already known from this comment: #33936 (comment)

  6. ryanofsky commented at 2:58 pm on November 24, 2025: contributor

    re: #33899 (comment)

    do we have to explicitly call destroy or is it sufficient to drop the reference from memory on the client side?

    Following up on this, it’s good practice to call destroy methods on objects which have them, to guarantee the objects are destroyed right away instead of asynchronously. This is most important if objects have nontrivial destructors, or if it matters what order objects get destroyed.

    Block templates using a lot of memory in a memory-constrained environment could be another reason to call destroy explicitly. But if adding destroy calls to the client fixes a memory leak, this does sound like a bug, because memory should be freed soon after references are dropped even without explicit destroy calls.

    re: #33899 (comment)

    its destroy method is not invoked until sv2-tp disconnects

    The reason destroy wasn’t called by sv2-tp was not because of a missing context param, but because of an unrelated regression.

    I was looking into this to remind myself what the current behavior is. It actually changed fairly recently, but before the 30.0 release, in https://github.com/bitcoin-core/libmultiprocess/commit/949573da84112388f68d1893f55aae0ca7f12d0c from https://github.com/bitcoin-core/libmultiprocess/pull/160. The commit message there mentions “this change causes the ProxyServer::m_impl object to be freed shortly after the client is freed, instead of being delayed until the connection is closed.”

    Before that change, libmultiprocess would only delete server objects if the client explicitly called a destroy method, or when the connection was cleaned up. But currently it should delete them immediately even if a destroy method is never called. The previous behavior didn’t matter very much for libmultiprocess C++ clients, which call destroy() internally in their own destructors, but it could matter for Rust and Python clients, which don’t do that.
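
    As a hedged sketch, a non-C++ client could mimic those destructors by tying an explicit destroy() call to the lifetime of the client-side object; the method names and call shape below are assumptions for illustration, not the exact Bitcoin Core schema:

      # Illustrative only: `mining` is assumed to be an asyncio-style capnp
      # capability whose createNewBlock() returns a template capability that
      # exposes a destroy() method, as discussed above.
      import contextlib

      @contextlib.asynccontextmanager
      async def owned_template(mining, options):
          template = await mining.createNewBlock(options)  # assumed call shape
          try:
              yield template
          finally:
              # Rough equivalent of the C++ proxy destructor: the server-side
              # object is destroyed, and waited on, when the block exits.
              await template.destroy()

    Used as async with owned_template(mining, opts) as template: ..., this releases templates deterministically instead of relying on reference drops being processed promptly.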

  7. lucasbalieiro commented at 3:50 pm on November 25, 2025: none

    here are some insights from the experiments I’ve been running over the past few days.

    When I first reported this issue (https://github.com/stratum-mining/sv2-apps/pull/59#issuecomment-3568252007), my server was also running a few utility tools that I typically use during long test sessions.

    To eliminate the “maybe it’s something else eating the memory” hypothesis, I started a fresh session with only the following running:

    • sv2-apps
    • Bitcoin Core v30 (mainnet)
    • The essential system processes (no extra utilities, no accessories)

    Server specs (VPS):

    • 2 vCPUs

    • 2 GB RAM

    • Ubuntu 24.04 LTS

      Distributor ID: Ubuntu
      Description:    Ubuntu 24.04 LTS
      Release:        24.04
      Codename:       noble
      

    Baseline Start

    At startup, with the full SV2 stack running (two apps using IPC), the total memory usage hovered between 800 MB and 1 GB, with some expected minor spikes.

    After letting it run overnight, the total VPS RAM usage sat around 1.5 GB.

    During that same period:

    • Bitcoin Core increased by roughly 160 MB
    • But the rest of the increase could not be explained by Bitcoin Core alone
    • This made me suspect the sv2-apps were also growing over time
    • A few hours later, @plebhash confirmed with psrecord that the apps were indeed growing as well

    Next 8 Hours

    Over the next 8 hours, total RAM usage climbed from 1.5 GB → 1.6 GB.

    Since I had a bit of headroom left, I decided to stress things slightly.


    Adding More Load

    Alongside the already-running apps, I launched three more pool instances (also using IPC).

    Within ~3 hours:

    • Bitcoin Core’s RAM usage increased (expected because of more IPC traffic, I think)
    • Total RAM climbed to 1.75 GB

    I then shut down the extra pool instances.

    However:

    • Bitcoin Core did not release the memory
    • Even by the next morning, RAM usage was still between 1.69–1.75 GB
    • So it seems Bitcoin Core doesn’t reduce its memory footprint after clients disconnect

    Final Incident (Today)

    Since the VPS was close to running out of RAM again, I repeated the test: I spun up one more pool instance using IPC. RAM usage jumped to ~1.82 GB.

    I closed the pool instance and waited to see if memory would drop. It didn’t.

    A few hours later, the OS killed Bitcoin Core:

    [Tue Nov 25 14:51:01 2025] Out of memory: Killed process 2763434 (bitcoin-node) total-vm:13471820kB, anon-rss:1302224kB, file-rss:2916kB, shmem-rss:0kB, UID:0 pgtables:19496kB oom_score_adj:0
    

    So at the moment, Bitcoin Core ends up consuming about 1.3 GB of RAM and never seems to shrink back down.

    Why I’m Reporting This

    I’m sharing these observations because they might help narrow down the behavior we’re seeing, both in Bitcoin Core and the SV2 apps.

    I’m also willing to run another clean testing session. If you have recommendations for specific instrumentation or profiling tools to run, I can set them up. For this run I avoided using things like psrecord so I could rule out secondary tools affecting the memory footprint.

  8. Sjors commented at 6:17 pm on November 25, 2025: member
    @lucasbalieiro does Bitcoin Core behave better if you compile the 30.x branch from source? That’s the equivalent of how v30.1 will behave when it comes out.
  9. lucasbalieiro commented at 9:46 pm on November 25, 2025: none

    @lucasbalieiro does Bitcoin Core behave better if you compile the 30.x branch from source? That’s the equivalent of how v30.1 will behave when it comes out.

    Built Bitcoin Core at commit a14e7b9dee9145920f93eab0254ce92942bd1e5e:

    root@srv-f20833:~/Projects/bitcoin/build/bin# git log
    commit a14e7b9dee9145920f93eab0254ce92942bd1e5e (HEAD -> 30.x, origin/30.x)
    Merge: d0f6d9953a ae63cc4bf2
    Author: merge-script <fanquake@gmail.com>
    Date:   Thu 
    

    Ran the same stress test again (spawning multiple IPC-heavy apps).

    Observed differences:

    • Core feels noticeably more tolerant: baseline RAM usage is ~100 MB lower, and I could spin up more apps simultaneously without Core hitting big spikes; it looks like it’s using less memory per connected app.
    • However, after disconnecting the extra pool I used for stress testing, the RAM doesn’t drop back down to the original baseline (the level seen when only the SV2 apps + Core were running). It stays elevated, as if it’s still serving several pool apps.

    Even after disconnecting all the apps, it does not return to its ~700 MB of consumption.

  10. ryanofsky commented at 1:41 am on November 26, 2025: contributor

    Thanks for the updates! If I’m understanding correctly, memory usage increased by 160 MB overnight and another 100 MB over the next 8 hours, which would be pretty consistent with block templates not being freed. I think there are two key unknowns here:

    • We don’t know whether the client is holding onto the block template references. If it is, then the memory usage going up would be expected and this is a client bug; otherwise it is a server bug.
    • We don’t know if the server is failing to release block templates when the client disconnects. The fact that the RSS number does not decrease when the client disconnects suggests that the server is failing to release memory, but it’s not a reliable indicator, because it only shows how much memory has been allocated by the OS to the process, not how much memory is actually in use by the process. When the C library frees memory, it’s normal for it to hold onto the pages that were in use and not return them to the OS. It’s possible to see how much memory is actually in use using a tool like heaptrack.

    Probably a good next step would be to add log statements to the BlockTemplateImpl constructor and destructor, write a Python test program that creates a template, and then check whether the destructor is actually called on sudden disconnect, when calling the destroy method, or when just deleting the reference with del template; await asyncio.sleep(0). This could indicate whether there’s an obvious problem with the IPC server. It does seem like there is a real leak here, but it’s not clear whether it’s caused by a bug on the client side or the server side.
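
    A rough sketch of the observation half of such a test, assuming an asyncio capnp client (e.g. pycapnp) where a fresh template capability comes from some make_template() coroutine and the connection can be closed via close_connection(); both are placeholders, since the exact socket path and schema method names aren’t shown here:

      # Exercise the three destruction paths; with the suggested logging in the
      # BlockTemplateImpl constructor/destructor, the node log shows when, or
      # whether, the server-side object is actually destroyed in each case.
      import asyncio

      async def exercise_destruction(make_template, close_connection) -> None:
          # Case 1: explicit destroy -- the destructor should have run by the
          # time this await returns.
          template = await make_template()
          await template.destroy()

          # Case 2: drop the reference only -- the destructor should still run,
          # just asynchronously, once the capnp release message is processed.
          template = await make_template()
          del template
          await asyncio.sleep(0)

          # Case 3: sudden disconnect while still holding a template -- check
          # whether the destructor ever fires on the server side.
          template = await make_template()
          await close_connection()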

