Locked memory manager #8753

pull request: laanwj wants to merge 5 commits into bitcoin:master from laanwj:2016_09_lockedpool, changing 15 files (+966 −410)
  1. laanwj commented at 9:17 am on September 18, 2016: member

    Add a pool for locked memory chunks, replacing LockedPageManager.

    This is something I’ve been wanting to do for a long time. The current approach of preventing swapping of sensitive information by locking objects where they happen to be on the stack or heap in-place causes a lot of mlock/munlock system call churn, slowing down any handling of keys.

    Not only that, but locked memory is a limited resource and using a lot of it bogs down the system by increasing overall swappiness, so the previous approach of locking every page that may contain any key information (but also other information) is wasteful.

    Thus replace it with a consolidated pool of locked memory, so that chunks of “secure” memory can be allocated and freed without any system calls, and there is as little memory overhead as possible (for example, administrative structures are not themselves in locked memory). The pool consists of one or more arenas, each of which divides a contiguous memory range into chunks. Arenas are allocated in units of 256 KiB (configurable). If all current arenas are full, a new one is allocated. Arenas are allocated directly from the OS with the appropriate memory page allocation API. No arenas are ever freed unless the program exits.
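    As a rough illustration of the arena bookkeeping described above (a minimal sketch only; MiniArena and its members are made-up names, not the actual Arena class in this PR), the managed range is split into chunks that are handed out without further system calls, while the administrative maps live in ordinary, unlocked memory:

    ```cpp
    // Minimal sketch of the arena idea (illustrative, not the PR's Arena class):
    // a fixed, already-locked memory range is carved into chunks; the chunk
    // bookkeeping lives in ordinary (unlocked) containers, so allocating and
    // freeing "secure" chunks needs no mlock/munlock calls at all.
    #include <cstddef>
    #include <map>
    #include <stdexcept>

    class MiniArena {
    public:
        MiniArena(void* base, size_t size, size_t align)
            : base_(static_cast<char*>(base)), align_(align)
        {
            free_[base_] = size; // start with one big free chunk
        }
        void* alloc(size_t size)
        {
            size = (size + align_ - 1) & ~(align_ - 1); // round up to alignment
            if (size == 0) return nullptr;
            for (auto it = free_.begin(); it != free_.end(); ++it) {
                if (it->second >= size) {              // first fit
                    char* ptr = it->first;
                    size_t remaining = it->second - size;
                    free_.erase(it);
                    if (remaining) free_[ptr + size] = remaining;
                    used_[ptr] = size;
                    return ptr;
                }
            }
            return nullptr; // arena full; the pool would try (or create) another arena
        }
        void free(void* ptr)
        {
            auto it = used_.find(static_cast<char*>(ptr));
            if (it == used_.end()) throw std::runtime_error("pointer not allocated from this arena");
            free_[it->first] = it->second; // no coalescing of neighbours in this sketch
            used_.erase(it);
        }
    private:
        std::map<char*, size_t> free_, used_; // administrative data, not in locked memory
        char* base_;
        size_t align_;
    };
    ```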

    • I’ve kept the arena implementation itself basic, for easy review. If this ever turns out to be a bottleneck we can consider adding free lists per chunk size, per-arena locking and other typical C heap optimizations, but I don’t want to overdesign this for no good reason. Let’s agree it’s a lot better than what we have now. Unit tests have been added.
    • To keep a handle on how much locked memory we’re using I’ve added an RPC call getmemoryinfo that returns statistics from LockedPoolManager. This can also be used to check whether locking actually succeeded (to prevent sudden crashes with low ulimits it is not fatal if locking fails, but it is useful to be able to see whether all key data is still in unswappable memory).
    • This is a more portable and future-proof API. Some OSes may not be able to pin e.g. stack or heap pages in place but have an API to (de)allocate pinned or locked memory pages.

    Review notes

    • Please review the wallet commits carefully. Especially where arrays have been switched to vectors, check that no sizeof(vectortype) remains in the memcpy and memcmp usage (ick!), and that .data() or &vec[x] is used as appropriate instead of &vec, which would overwrite the vector structure (illustrated below).
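    For concreteness, a hedged illustration of that pitfall (vchKey here is a made-up example vector, not a quote from the wallet code):

    ```cpp
    // Wrong vs. right ways to memcpy into a vector that replaced a fixed array.
    #include <cstddef>
    #include <cstring>
    #include <vector>

    void CopyKeyMaterial(const unsigned char* src, size_t n)
    {
        std::vector<unsigned char> vchKey(n);

        // Wrong: sizeof(vchKey) is the size of the vector object, not of its
        // contents, and &vchKey points at the vector structure itself, which
        // a memcpy would overwrite.
        // memcpy(&vchKey, src, sizeof(vchKey));

        // Right: copy into the element storage, sized by the element count.
        memcpy(vchKey.data(), src, vchKey.size());
        // Equivalent for an offset into the vector: memcpy(&vchKey[0], src, vchKey.size());
    }
    ```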

    Measurements

    Immediately after startup, loading a fairly large wallet.

    Amount of memory locked (cat /proc/$PID/status | grep VmLck), current master:

    VmLck:      1376 kB
    

    With this patch:

    VmLck:       512 kB
    

    Syscall counts (strace -cf) over a whole run (start + immediate shutdown), current master:

    0.00    0.000328           0     10541           mlock
    0.00    0.000114           0     10541           munlock
    

    With this patch:

    0.00    0.000000           0         2           mlock
    0.00    0.000000           0         2           munlock
    
  2. laanwj added the label Wallet on Sep 18, 2016
  3. laanwj added the label Utils and libraries on Sep 18, 2016
  4. gmaxwell commented at 9:23 am on September 18, 2016: contributor
    Concept ACK! I’m glad that you’re working on this. I think it’s the right approach. The other advantage is that right now, IIRC, once the ulimit maximum of locked pages is reached, no more data will be locked… silently… and the massive locked page inflation makes it easy to hit any reasonable limit quickly.
  5. paveljanik commented at 9:50 am on September 18, 2016: contributor

    There is no

    static inline std::pair<std::string,UniValue> Pair(const char *, size_t)
    
  6. laanwj commented at 10:22 am on September 18, 2016: member

    The other advantage is that right now, IIRC, once the ulimit maximum of locked pages is reached, no more data will be locked… silently… and the massive locked page inflation makes it easy to hit any reasonable limit quickly.

    Indeed. I’ve also been thinking about heartbleed-like attacks. Currently key data is scattered all around the heap and stack; with this approach it is consolidated in a few places, separate from the normal heap where e.g. network buffers are allocated.

    It would help even more if the secret data were separated from the normal heap by a ‘moat’ of unmapped pages, so that a large over-read can’t reach it.

    I’ve done nothing special to accomplish this at the moment, though, apart from trying to use the OS page allocation directly. Which reminds me that on POSIX I should probably be using mmap, not posix_memalign, which may just grab the memory from the heap.
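    For reference, a minimal POSIX sketch of that “allocate pages directly and lock them” idea (AllocateLockedPages is a made-up name here, not the PR’s allocator API):

    ```cpp
    #include <cstddef>
    #include <sys/mman.h>

    #ifndef MAP_ANONYMOUS
    #define MAP_ANONYMOUS MAP_ANON // older BSD/macOS spelling
    #endif

    // Get fresh pages straight from the kernel (bypassing the C heap) and try to
    // pin them with mlock. Locking failure is reported rather than treated as fatal.
    void* AllocateLockedPages(size_t len, bool* locked_out)
    {
        void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) return nullptr;
        *locked_out = (mlock(addr, len) == 0);
        return addr;
    }
    ```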

    static inline std::pair<std::string,UniValue> Pair(const char *, size_t)

    Gah, that needs a silly cast to uint64_t (I guess this error comes up on 32-bit platforms?).

  7. in src/support/lockedpool.cpp: in 2d603c97df outdated
    236+
    237+/*******************************************************************************/
    238+// Implementation: LockedPool
    239+
    240+LockedPool::LockedPool(std::unique_ptr<LockedPageAllocator> allocator, LockingFailed_Callback lf_cb_in):
    241+    allocator(std::move(allocator)), lf_cb(lf_cb_in), cumulative_bytes_locked(0)
    


    paveljanik commented at 11:04 am on September 18, 2016:
    _allocator here please.

    laanwj commented at 11:08 am on September 18, 2016:
    I’m using _in as convention in this file, but yes it shouldn’t shadow here.

    paveljanik commented at 12:00 pm on September 18, 2016:

    New file, new convention? Welcome to Bitcoin Core…

    Edit: There is no need to markup irony ;-)

  8. in src/test/allocator_tests.cpp: in 2d603c97df outdated
    22+    // Fake memory base address for testing
    23+    // without actually using memory.
    24+    void *synth_base = reinterpret_cast<void*>(0x08000000);
    25+    const size_t synth_size = 1024*1024;
    26+    Arena b(synth_base, synth_size, 16);
    27+    void *x = b.alloc(1000);
    


    paveljanik commented at 11:07 am on September 18, 2016:
    Since you use x further down in the for loops, please rename this.
  9. laanwj commented at 11:10 am on September 18, 2016: member
    @paveljanik Ok, I’ve made your variable naming changes. But let’s please discuss higher-level concerns first before bombarding nits in code that may be thrown away anyway.
  10. paveljanik commented at 12:00 pm on September 18, 2016: contributor

    The higher-level concern has already been written up by @gmaxwell, no need to repeat it.

    With ulimit -l being unlimited, RPC returns:

    {
      "locked": {
        "used": 200608,
        "free": 61536,
        "total": 262144,
        "locked": 262144
      }
    }
    

    After ulimit -l 128, the result is:

    {
      "locked": {
        "used": 200608,
        "free": 61536,
        "total": 262144,
        "locked": 0
      }
    }
    

    No memory locked at all? Or when we go over the limit, you do not lock anything?

    Hmm, arena is 256k min. Will try with lower arena size.

    Changed arenasize to 128k and:

    {
      "locked": {
        "used": 200608,
        "free": 61536,
        "total": 262144,
        "locked": 131072
      }
    }
    

    Good!

  11. paveljanik commented at 12:19 pm on September 18, 2016: contributor
    The default ulimit -l values can bring a lot of fun here… OS X unlimited, SUSE Linux 64k etc.
  12. laanwj commented at 1:59 pm on September 18, 2016: member

    No memory locked at all? Or when we go over the limit, you do not lock anything?

    It allocates and locks memory per arena. If locking the first arena (of 256 KiB) fails, nothing will be locked. You could set the ARENA_SIZE to 128 kilobytes and retry. Possibly it could read the ulimit value and create the first arena of that size, if it is less than the default of 256 KiB. I don’t know how OS-specific this is, though it seems UNIX-like OSes at least have getrlimit(RLIMIT_MEMLOCK).

    OS X unlimited, SUSE Linux 64k etc.

    Yes, on Ubuntu it’s also unlimited by default. OpenBSD has 5 MiB. 64k seems utterly useless.

  13. laanwj commented at 2:40 pm on September 18, 2016: member

    Possibly it could read the ulimit value and create the first arena of that size, if it is less than the default of 256

    Done, it should always get one arena of locked memory as long as the limit is larger than 0. If not, it will act as a NonLockedPoolManager, nothing else to do.
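    Roughly, the sizing logic could look like this (a hedged sketch assuming POSIX getrlimit; FirstArenaSize is a made-up helper name, not the PR’s code):

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <sys/resource.h> // getrlimit, RLIMIT_MEMLOCK (POSIX)

    static const size_t ARENA_SIZE = 256 * 1024; // default arena size

    // Cap the first arena to the RLIMIT_MEMLOCK soft limit, so that at least one
    // arena can still be locked on systems with a small ulimit -l.
    size_t FirstArenaSize()
    {
        size_t size = ARENA_SIZE;
        struct rlimit limit;
        if (getrlimit(RLIMIT_MEMLOCK, &limit) == 0 && limit.rlim_cur != RLIM_INFINITY) {
            size = std::min(size, static_cast<size_t>(limit.rlim_cur));
        }
        return size;
    }
    ```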

  14. laanwj force-pushed on Sep 18, 2016
  15. paveljanik commented at 2:56 pm on September 18, 2016: contributor

    After ulimit -l 128:

    {
      "locked": {
        "used": 200608,
        "free": 192608,
        "total": 393216,
        "locked": 131072
      }
    }
    
  16. laanwj commented at 3:40 pm on September 18, 2016: member
    That’s exactly what should be expected. It uses all the locked memory allowed to it by the limit. Thanks for testing.
  17. gmaxwell commented at 7:45 pm on September 18, 2016: contributor
    You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration. This would both reduce over-allocation and increase the chances that we get all that ulimit would allow. Not a strong opinion, just a design tweak. Guard pages sound like a good idea. They should be at least as large as any object that exists in the system. Causing the locked page pool to be at a random address would be helpful too.
  18. laanwj commented at 7:21 am on September 19, 2016: member

    You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration.

    The practical problem here is that having tons of one-page (or two-page, for that matter) arenas reduces performance, at least with the current implementation. I don’t think allocating 256 kB (or whatever the ulimit is, if it is less) the first time this is used is such a bad compromise given Bitcoin Core’s frequent use of this memory. As said, my relatively large (not huge) wallet already requires 512 KiB (which is really ~300 KiB rounded up, but we’re not programming for the C64 here).

    Also, there would be some residual memory wasted if the pool consists of 4k/8k blocks spread around. And mind the syscall overhead when requesting memory from the OS in such small quantities.

    Note that I’m all for changing the 256 kB parameter to, say, 128 kB, if that is marginally better. As I’ve documented in the comments, it’s a compromise.

    Causing the locked page pool to be at a random address would be helpful too.

    Hm, I guess, by specifying a random argument to mmap (modulo the virtual address size, rounded to a page) and then ‘probing’ the address space for where it can be put. I think this is a great idea for later, but I’m not aiming to do it in this pull (maybe we can borrow some code from ASLR?). Also, here you don’t want the individual arenas too small or it’d spray the address space to the point of unusability, as well as burden the MMU tables.
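    If that were ever attempted, a hedged sketch (illustrative only, not part of this pull) might pass a randomized, page-aligned hint to mmap and accept whatever the kernel actually returns:

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MAP_ANONYMOUS
    #define MAP_ANONYMOUS MAP_ANON
    #endif

    // Ask mmap to place the arena near a random, page-aligned address. The hint
    // is advisory: the kernel may place the mapping elsewhere, so no MAP_FIXED.
    void* AllocAtRandomHint(size_t len)
    {
        const uintptr_t page = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
        std::random_device rd;
        uintptr_t hint = (static_cast<uintptr_t>(rd()) << 16) & ~(page - 1);
        void* addr = mmap(reinterpret_cast<void*>(hint), len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return addr == MAP_FAILED ? nullptr : addr;
    }
    ```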

  19. sipa commented at 9:45 am on September 19, 2016: member
    Review comment on the first two commits: if you’d change the variable name of the field/variable whose type changes, it’s obvious from the diff that you’ve adapted all places in the code that affect it (makes it easier to guarantee that there are no sizeofs left).
  20. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    24+#include <limits.h> // for PAGESIZE
    25+#include <unistd.h> // for sysconf
    26+#endif
    27+
    28+LockedPoolManager* LockedPoolManager::_instance = NULL;
    29+boost::once_flag LockedPoolManager::init_flag = BOOST_ONCE_INIT;
    


    sipa commented at 9:49 am on September 19, 2016:
    std::once_flag ?

    laanwj commented at 12:46 pm on September 19, 2016:
    Nice, that breaks the only dependency on boost
  21. in src/support/lockedpool.h: in 0b4126ae29 outdated
    53+    struct Chunk
    54+    {
    55+        Chunk(size_t size_in, uint32_t flags_in):
    56+            size(size_in), flags(flags_in) {}
    57+        size_t size;
    58+        uint32_t flags;
    


    sipa commented at 9:53 am on September 19, 2016:
    s/uint32_t/ChunkFlags/ ?

    laanwj commented at 12:39 pm on September 19, 2016:
    Does bitwise logic with enums work with C++11? I didn’t dare try.

    sipa commented at 2:53 pm on September 19, 2016:
    enums automatically decay to the integer type they are derived from for supported operations. Also, any reason for not just using a boolean? Do we expect more flags to be added in the future?

    laanwj commented at 5:17 pm on September 19, 2016:

    enums automatically decay to the integer type they are derived from for supported operations.

    Right - it always broke down when trying to assign the result of a bitwise operation back to the enum type. Good to hear that this is not a problem anymore.

    Do we expect more flags to be added in the future?

    No, I don’t expect more flags to be added.

    I’m thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It’d be silly to set the alignment to 1 anyway.


    sipa commented at 5:24 pm on September 19, 2016:

    You need to cast to assign back to the enum type, as integers aren’t automatically converted to enums, only the other way around.

    What I meant was that you’d just have an enum with two values, and you wouldn’t use any bitwise logic. But that would not make it much more clear than just using a boolean.

    I’m thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It’d be silly to set the alignment to 1 anyway.

    Or use the MSB and disallow locking over 2 GiB :)


    laanwj commented at 10:52 am on September 20, 2016:

    Or use the MSB and disallow locking over 2 GiB :)

    Good idea, going with that, it’s less invasive.
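    In other words (a hedged sketch of the trick being discussed, not the final Chunk layout in the PR):

    ```cpp
    #include <cstdint>

    // Store the "in use" flag in the MSB of a 32-bit size field; a single chunk
    // is then limited to just under 2 GiB, which is plenty for locked memory.
    struct Chunk {
        static const uint32_t SIZE_MSB = 0x80000000u;
        uint32_t size_and_flag;

        Chunk(uint32_t size_in, bool used_in)
            : size_and_flag((size_in & ~SIZE_MSB) | (used_in ? SIZE_MSB : 0)) {}

        uint32_t size() const { return size_and_flag & ~SIZE_MSB; }
        bool isInUse() const { return (size_and_flag & SIZE_MSB) != 0; }
        void setInUse(bool used) { size_and_flag = size() | (used ? SIZE_MSB : 0); }
    };
    ```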

  22. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    51+}
    52+
    53+void* Arena::alloc(size_t size)
    54+{
    55+    /* Round to next multiple of alignment */
    56+    size = (size + alignment - 1) & ~(alignment-1);
    


    sipa commented at 9:54 am on September 19, 2016:
    Use align_up?
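    Presumably something along these lines (a hedged sketch of such a helper; alignment is assumed to be a power of two):

    ```cpp
    #include <cstddef>

    // Round x up to the next multiple of align (align must be a power of two).
    static inline size_t align_up(size_t x, size_t align)
    {
        return (x + align - 1) & ~(align - 1);
    }
    ```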
  23. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    58+    /* Don't handle zero-sized chunks */
    59+    if (size == 0)
    60+        return nullptr;
    61+
    62+    for (auto& chunk: chunks)
    63+    {
    


    sipa commented at 9:59 am on September 19, 2016:
    Code style nit: braces on the same line, except for namespaces, classes, functions, methods.
  24. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    301+    if (!addr)
    302+        return false;
    303+    if (locked) {
    304+        cumulative_bytes_locked += size;
    305+    } else if (lf_cb) { // Call the locking-failed callback if locking failed
    306+        if (!lf_cb()) { // If the callback returns false, free the memory and fail, other consider the user warned and proceed.
    


    sipa commented at 10:20 am on September 19, 2016:
    Typo: otherwise
  25. sipa commented at 10:31 am on September 19, 2016: member
    Concept ACK
  26. laanwj commented at 12:39 pm on September 19, 2016: member

    Review comment on the first two commits: if you’d change the variable name of the field/variable whose type changes, it’s obvious from the diff that you’ve adapted all places in the code that affect it

    Good idea, I did this for the class field at least (chKey to vchKey and chIV to vchIV), makes sense as a general suggestion.

  27. laanwj force-pushed on Sep 20, 2016
  28. laanwj commented at 1:38 pm on September 20, 2016: member

    rebased:

    • Squashed the squashme commits, except the last one.
    • Fixed all of @sipa’s nits.
    • All variables and member variables that change type in the wallet commits have been renamed.
  29. jonasschnelli commented at 3:01 pm on September 20, 2016: contributor
    Concept ACK, impressive work, will try to test this soon.
  30. laanwj commented at 3:14 pm on September 29, 2016: member
    Ref: #3949
  31. laanwj force-pushed on Sep 29, 2016
  32. laanwj commented at 4:06 pm on October 18, 2016: member
    Added a benchmark to bench/
  33. wallet: Change CCrypter to use vectors with secure allocator
    Change CCrypter to use vectors with secure allocator instead of buffers
    in the object itself, which will end up on the stack. This avoids
    having to call LockedPageManager to lock stack memory pages to prevent the
    memory from being swapped to disk. This is wasteful.
    999e4c91c2
  34. wallet: Get rid of LockObject and UnlockObject calls in key.h
    Replace these with vectors allocated from the secure allocator.
    
    This avoids mlock syscall churn on stack pages, as well as makes
    it possible to get rid of these functions.
    
    Please review this commit and the previous one carefully, checking that
    no `sizeof(vectortype)` remains in the memcpys and memcmps usage
    (ick!), and `.data()` or `&vec[x]` is used as appropriate instead of
    &vec.
    f4d1fc259b
  35. laanwj force-pushed on Oct 19, 2016
  36. laanwj commented at 2:27 pm on October 19, 2016: member
    Rebased for #8788 and #8873
  37. laanwj commented at 8:56 am on October 24, 2016: member
    Pushed a commit to make LockedPool::free() more robust and remove some ugly construction.
  38. in src/support/lockedpool.cpp: in 1cb5f2dcad outdated
    132@@ -133,6 +133,12 @@ Arena::Stats Arena::stats() const
    133     return r;
    134 }
    135 
    136+bool Arena::addressInArena(void *ptr_in) const
    137+{
    138+    uintptr_t ptr = reinterpret_cast<uintptr_t>(ptr_in);
    


    sipa commented at 8:46 pm on October 24, 2016:
    Wouldn’t it be easier to store a begin and end void*, and then compare (ptr_in >= ptr_base && ptr_in < ptr_end)? That way you don’t need the reinterpret cast.

    laanwj commented at 5:26 am on October 25, 2016:
    I’m slightly allergic to pointer math (most of it is undefined for void* anyhow); that’s why I prefer to cast to uintptr_t where possible. But sure, for comparisons it can’t hurt (it’s well-defined).
  39. laanwj commented at 8:16 am on October 25, 2016: member
    Added a commit that changes pointer arithmetic to use char* instead of uintptr_t; this required surprisingly few changes.
  40. sipa commented at 6:53 pm on October 25, 2016: member
    Admittedly, you’d need to convert to either char* or uintptr_t anyway in order to compute the end pointer… feel free to ignore.
  41. laanwj commented at 6:57 pm on October 25, 2016: member

    Admittedly, you’d need to convert to either char* or uintptr_t anyway in order to compute the end pointer… feel free to ignore.

    Converting to char* and computing an end pointer is what I did in the last commit.
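    For illustration, the char*-based containment check discussed above could look roughly like this (class and member names here are made up, not the actual Arena fields):

    ```cpp
    #include <cstddef>

    class ArenaRange {
    public:
        ArenaRange(void* base_in, size_t size_in)
            : base(static_cast<char*>(base_in)), end(base + size_in) {}

        // Well-defined pointer comparisons on char*, no reinterpret_cast needed.
        bool addressInArena(void* ptr_in) const {
            char* ptr = static_cast<char*>(ptr_in);
            return ptr >= base && ptr < end;
        }

    private:
        char* base; // start of the arena's memory range
        char* end;  // one past the end of the range
    };
    ```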

  42. support: Add LockedPool
    Add a pool for locked memory chunks, replacing LockedPageManager.
    
    This is something I've been wanting to do for a long time. The current
    approach of locking objects where they happen to be on the stack or heap
    in-place causes a lot of mlock/munlock system call overhead, slowing
    down any handling of keys.
    
    Also locked memory is a limited resource on many operating systems (and
    using a lot of it bogs down the system), so the previous approach of
    locking every page that may contain any key information (but also other
    information) is wasteful.
    4536148b15
  43. rpc: Add `getmemoryinfo` call
    ```
    getmemoryinfo
    Returns an object containing information about memory usage.
    
    Result:
    {
      "locked": {               (json object) Information about locked memory manager
        "used": xxxxx,          (numeric) Number of bytes used
        "free": xxxxx,          (numeric) Number of bytes available in current arenas
        "total": xxxxxxx,       (numeric) Total number of bytes managed
        "locked": xxxxxx,       (numeric) Amount of bytes that succeeded locking. If this number is smaller than total, locking pages failed at some point and key data could be swapped to disk.
      }
    }
    
    Examples:
    > bitcoin-cli getmemoryinfo
    > curl --user myusername --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getmemoryinfo", "params": [] }' -H 'content-type: text/plain;' http://127.0.0.1:8332/
    ```
    6567999096
  44. bench: Add benchmark for lockedpool allocation/deallocation 444c673d85
  45. laanwj force-pushed on Oct 27, 2016
  46. laanwj commented at 11:21 am on October 27, 2016: member

    Squashed the following commits

    • f42c60a - squashme: uintptr_t to char* for pointer arithmethic
    • 1cb5f2d - lockedpool: Make free() more robust
  47. laanwj merged this on Nov 2, 2016
  48. laanwj closed this on Nov 2, 2016

  49. laanwj referenced this in commit f8723d2318 on Nov 2, 2016
  50. paveljanik commented at 1:01 pm on November 2, 2016: contributor

    OS X, clang (one -Wshadow warning, and one additional error):

    support/lockedpool.cpp:67:19: warning: declaration shadows a field of 'Arena' [-Wshadow]
                char* base = chunk.first;
                      ^
    ./support/lockedpool.h:117:11: note: previous declaration is here
        char* base;
              ^
    support/lockedpool.cpp:231:49: error: use of undeclared identifier 'MAP_ANONYMOUS'
        addr = mmap(nullptr, len, 0x01|0x02, 0x0002|MAP_ANONYMOUS, -1, 0);
    

    man mmap:

         MAP_ANON          Map anonymous memory not associated with any specific
                           file.  The offset argument is ignored.  Mac OS X spe-
                           cific: the file descriptor used for creating MAP_ANON
                           regions can be used to pass some Mach VM flags, and can
                           be specified as -1 if no such flags are associated with
                           the region.  Mach VM flags are defined in <mach/vm_sta-
                           tistics.h> and the ones that currently apply to mmap
                           are:
    

    So maybe

    +#ifndef MAP_ANONYMOUS
    +#define MAP_ANONYMOUS MAP_ANON
    +#endif
    

    somewhere before its usage?

  51. codablock referenced this in commit 1237647866 on Sep 19, 2017
  52. laanwj referenced this in commit 94c9015bca on Sep 22, 2017
  53. codablock referenced this in commit bc3b9294e8 on Jan 13, 2018
  54. andvgal referenced this in commit 53ede2fba3 on Jan 6, 2019
  55. PastaPastaPasta referenced this in commit bad429f601 on Dec 22, 2019
  56. PastaPastaPasta referenced this in commit d0e6a1c4e4 on Jan 2, 2020
  57. PastaPastaPasta referenced this in commit c5e1cfe005 on Jan 4, 2020
  58. PastaPastaPasta referenced this in commit d15dd02d78 on Jan 4, 2020
  59. PastaPastaPasta referenced this in commit d6df1e605a on Jan 10, 2020
  60. PastaPastaPasta referenced this in commit f6a6860cbc on Jan 10, 2020
  61. PastaPastaPasta referenced this in commit 44b47017ef on Jan 10, 2020
  62. PastaPastaPasta referenced this in commit dd8e0c0153 on Jan 12, 2020
  63. random-zebra referenced this in commit cc84157e52 on May 17, 2020
  64. zkbot referenced this in commit 5ef5d8d268 on Jul 31, 2020
  65. zkbot referenced this in commit 7d94064616 on Sep 29, 2020
  66. ckti referenced this in commit 7a256c4b08 on Mar 28, 2021
  67. furszy referenced this in commit 7793f1804d on Apr 5, 2021
  68. MarcoFalke locked this on Sep 8, 2021
