Locked memory manager #8753

pull laanwj wants to merge 5 commits into bitcoin:master from laanwj:2016_09_lockedpool changing 15 files +966 −410
  1. laanwj commented at 9:17 am on September 18, 2016: member

    Add a pool for locked memory chunks, replacing LockedPageManager.

    This is something I’ve been wanting to do for a long time. The current approach of preventing swapping of sensitive information by locking objects where they happen to be on the stack or heap in-place causes a lot of mlock/munlock system call churn, slowing down any handling of keys.

    Not only that, but locked memory is a limited resource and using a lot of it bogs down the system by increasing overall swappiness, so the previous approach of locking every page that may contain any key information (but also other information) is wasteful.

    Thus replace it with a consolidated pool of locked memory, so that chunks of “secure” memory can be allocated and freed without any system calls, and with as little memory overhead as possible (for example, administrative structures are not themselves in locked memory). The pool consists of one or more arenas, each of which divides a contiguous memory range into chunks. Arenas are allocated in units of 256 kB (configurable); if all current arenas are full, a new one is allocated. Arenas are allocated directly from the OS with the appropriate memory page allocation API. No arenas are ever freed before the program exits.

    • I’ve kept the arena implementation itself basic, for easy review. If this ever turns out to be a bottleneck we can consider adding per-chunk-size free lists, per-arena locking and other typical C heap optimizations, but I don’t want to overdesign this without good reason. Let’s agree it’s a lot better than what we have now. Unit tests have been added.
    • To keep a handle on how much locked memory we’re using I’ve added an RPC call getmemoryinfo that returns statistics from the LockedPoolManager. This can also be used to check whether locking actually succeeded (to prevent sudden crashes with low ulimits it is not fatal if locking fails, but it is useful to be able to see whether all key data is still in unswappable memory).
    • This is a more portable and future-proof API. Some OSes may not be able to pin e.g. stack or heap pages in place but have an API to (de)allocate pinned or locked memory pages.
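
    As an aside for reviewers, the first-fit chunk handling described above boils down to roughly the following sketch (illustrative names only, not the actual Arena code):

        #include <cstddef>
        #include <map>

        // Illustrative only: one arena managing a contiguous [base, base+size) range.
        class MiniArena {
        public:
            MiniArena(char* base, size_t size, size_t alignment_in) : alignment(alignment_in)
            {
                free_chunks[base] = size; // start with one big free chunk
            }
            void* alloc(size_t size)
            {
                size = (size + alignment - 1) & ~(alignment - 1); // round up to alignment
                if (size == 0) return nullptr;
                for (auto it = free_chunks.begin(); it != free_chunks.end(); ++it) {
                    if (it->second >= size) {                  // first fit
                        char* ptr = it->first;
                        if (it->second > size)                 // keep the remainder free
                            free_chunks[ptr + size] = it->second - size;
                        free_chunks.erase(it);
                        used_chunks[ptr] = size;
                        return ptr;
                    }
                }
                return nullptr; // arena full; the pool would move on to (or create) another arena
            }
            void free(void* ptr)
            {
                auto it = used_chunks.find(static_cast<char*>(ptr));
                if (it == used_chunks.end()) return; // the real code is stricter and merges neighbours
                free_chunks[it->first] = it->second;
                used_chunks.erase(it);
            }
        private:
            size_t alignment;
            std::map<char*, size_t> free_chunks; // chunk base -> chunk size
            std::map<char*, size_t> used_chunks;
        };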

    Review notes

    • Please review the wallet commits carefully, especially where arrays have been switched to vectors: check that no sizeof(vectortype) remains in the memcpy and memcmp usage (ick!), and that .data() or &vec[x] is used as appropriate instead of &vec, which would overwrite the vector structure itself (see the illustrative snippet below).
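
    As a hypothetical illustration of the pattern to look out for (vchKey here stands in for a member that changed from a fixed-size array to a vector):

        #include <cstddef>
        #include <cstring>
        #include <vector>

        void CopyKey(const unsigned char* src, size_t len)
        {
            std::vector<unsigned char> vchKey(len);
            // Wrong after the array->vector change: sizeof(vchKey) is the size of the
            // vector object and &vchKey points at (and would overwrite) the vector itself:
            //   memcpy(&vchKey, src, sizeof(vchKey));
            // Correct: copy into the element storage using .data() and the element count:
            memcpy(vchKey.data(), src, vchKey.size());
        }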

    Measurements

    Immediately after startup, loading a fairly large wallet.

    Amount of memory locked (`cat /proc/$PID/status | grep VmLck`), current master:

        VmLck:      1376 kB
    

    With this patch:

        VmLck:       512 kB
    

    Syscall count (`strace -cf`) over a whole run (start + immediate shutdown), current master:

        0.00    0.000328           0     10541           mlock
        0.00    0.000114           0     10541           munlock
    

    With this patch:

        0.00    0.000000           0         2           mlock
        0.00    0.000000           0         2           munlock
    
  2. laanwj added the label Wallet on Sep 18, 2016
  3. laanwj added the label Utils and libraries on Sep 18, 2016
  4. gmaxwell commented at 9:23 am on September 18, 2016: contributor
    Concept ACK! I’m glad that you’re working on this. I think it’s the right approach. The other advantage is that right now, IIRC, once the ulimit maximum of locked pages is reached, no more data will be locked… silent… and the massive locked page inflation makes it easy to hit any reasonable limit quickly.
  5. paveljanik commented at 9:50 am on September 18, 2016: contributor

    There is no

        static inline std::pair<std::string,UniValue> Pair(const char *, size_t)
    
  6. laanwj commented at 10:22 am on September 18, 2016: member

    The other advantage is that right now, IIRC, once the ulimit maximum of locked pages is reached, no more data will be locked… silent… and the massive locked page inflation makes it easy to hit any reasonable limit quickly.

    Indeed. I’ve also been thinking about heartbleed-like attacks. Currently key data is scattered all around the heap and stack, with this approach it is consolidated in a few places which are separate from the normal heap where e.g. network buffers are allocated.

    It would help even more if the secret data is separated with a ‘moat’ of unmapped pages from the normal heap, so that a large read can’t get there.

    I’ve done nothing special to accomplish this at the moment, though, apart from trying to use the OS page allocation directly. Which reminds me that on POSIX I should probably be using mmap rather than posix_memalign, which may just grab the memory from the heap.
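
    For illustration, a POSIX allocation path along those lines might look roughly like this (a sketch under the above assumption, not necessarily what this PR ends up doing):

        #include <cstddef>
        #include <sys/mman.h>

        // Map anonymous pages directly from the OS and try to lock them in memory.
        void* AllocateLocked(size_t len, bool* locking_success)
        {
            void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (addr == MAP_FAILED)
                return nullptr;
            // mlock can fail (e.g. with a low RLIMIT_MEMLOCK); report rather than abort.
            *locking_success = (mlock(addr, len) == 0);
            return addr;
        }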

    static inline std::pair<std::string,UniValue> Pair(const char *, size_t)

    Gah, that needs a silly cast to uint64_t (I guess this error comes up on 32-bit platforms?).

  7. in src/support/lockedpool.cpp: in 2d603c97df outdated
    236+
    237+/*******************************************************************************/
    238+// Implementation: LockedPool
    239+
    240+LockedPool::LockedPool(std::unique_ptr<LockedPageAllocator> allocator, LockingFailed_Callback lf_cb_in):
    241+    allocator(std::move(allocator)), lf_cb(lf_cb_in), cumulative_bytes_locked(0)
    


    paveljanik commented at 11:04 am on September 18, 2016:
    _allocator here please.

    laanwj commented at 11:08 am on September 18, 2016:
    I’m using _in as convention in this file, but yes it shouldn’t shadow here.

    paveljanik commented at 12:00 pm on September 18, 2016:

    New file, new convention? Welcome to Bitcoin Core…

    Edit: There is no need to markup irony ;-)

  8. in src/test/allocator_tests.cpp: in 2d603c97df outdated
    22+    // Fake memory base address for testing
    23+    // without actually using memory.
    24+    void *synth_base = reinterpret_cast<void*>(0x08000000);
    25+    const size_t synth_size = 1024*1024;
    26+    Arena b(synth_base, synth_size, 16);
    27+    void *x = b.alloc(1000);
    


    paveljanik commented at 11:07 am on September 18, 2016:
    As x is used further down in for loops, please change this.
  9. laanwj commented at 11:10 am on September 18, 2016: member
    @paveljanik Ok, I’ve made your variable naming changes. But let’s please discuss higher-level concerns first before bombarding nits in code that may be thrown away anyway.
  10. paveljanik commented at 12:00 pm on September 18, 2016: contributor

    The higher-level review was already written by @gmaxwell, no need to repeat it.

    With ulimit -l being unlimited, RPC returns:

        {
          "locked": {
            "used": 200608,
            "free": 61536,
            "total": 262144,
            "locked": 262144
          }
        }
    

    After ulimit -l 128, the result is:

        {
          "locked": {
            "used": 200608,
            "free": 61536,
            "total": 262144,
            "locked": 0
          }
        }
    

    No memory locked at all? Or when we exceed the limit, you do not lock anything?

    Hmm, arena is 256k min. Will try with lower arena size.

    Changed arenasize to 128k and:

        {
          "locked": {
            "used": 200608,
            "free": 61536,
            "total": 262144,
            "locked": 131072
          }
        }
    

    Good!

  11. paveljanik commented at 12:19 pm on September 18, 2016: contributor
    The default ulimit -l values can bring a lot of fun here… OS X unlimited, SUSE Linux 64k etc.
  12. laanwj commented at 1:59 pm on September 18, 2016: member

    No memory locked at all? Or when we exceed the limit, you do not lock anything?

    It allocates and locks memory per arena. If locking the first arena (of 256 KiB) fails, nothing will be locked. You could set ARENA_SIZE to 128 KiB and retry. Possibly it could read the ulimit value and create the first arena of that size, if it is less than the default of 256. I don’t know how OS-specific this is, though UNIX-like OSes at least seem to have getrlimit(RLIMIT_MEMLOCK).
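
    A rough sketch of reading that limit, assuming getrlimit(RLIMIT_MEMLOCK) is available:

        #include <cstddef>
        #include <sys/resource.h>

        // Cap the first arena at the memlock soft limit if that limit is lower than the default.
        size_t FirstArenaSize(size_t default_size)
        {
            struct rlimit limit;
            if (getrlimit(RLIMIT_MEMLOCK, &limit) == 0 && limit.rlim_cur != RLIM_INFINITY) {
                if (static_cast<size_t>(limit.rlim_cur) < default_size)
                    return static_cast<size_t>(limit.rlim_cur);
            }
            return default_size;
        }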

    OS X unlimited, SUSE Linux 64k etc.

    Yes, on Ubuntu it’s also unlimited by default. OpenBSD has 5 MiB. 64k seems utterly useless.

  13. laanwj commented at 2:40 pm on September 18, 2016: member

    Possibly it could read the ulimit value and create the first arena of that size, if it is less than the default of 256

    Done, it should always get one arena of locked memory as long as the limit is larger than 0. If not, it will act as a NonLockedPoolManager; nothing else to do.

  14. laanwj force-pushed on Sep 18, 2016
  15. paveljanik commented at 2:56 pm on September 18, 2016: contributor

    After ulimit -l 128:

        {
          "locked": {
            "used": 200608,
            "free": 192608,
            "total": 393216,
            "locked": 131072
          }
        }
    
  16. laanwj commented at 3:40 pm on September 18, 2016: member
    That’s exactly what should be expected. It uses all the locked memory allowed to it by the limit. Thanks for testing.
  17. gmaxwell commented at 7:45 pm on September 18, 2016: contributor
    You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration. This would both reduce over-allocation and increase the chances that we get all that ulimit would allow. Not a strong opinion, just a design tweak. Guard pages sound like a good idea. They should be at least as large as any object that exists in the system. Causing the locked page pool to be at a random address would be helpful too.
  18. laanwj commented at 7:21 am on September 19, 2016: member

    You might want to make the allocation increment just one page and on start have an initial allocation that is equal to whatever typical usage is in the current configuration.

    The practical problem here is that having tons of one-page (or two-page, for that matter) arenas reduces performance, at least with the current implementation. I don’t think allocating 256 kB (or whatever the ulimit is, if it is less) the first time this is used is such a bad compromise given Bitcoin Core’s frequent use of this memory. As said, my relatively large (not huge) wallet already requires 512 KiB (which is really ~300 KiB rounded up, but we’re not programming for the C64 here).

    There would also be some residual memory wasted if the pool consisted of 4k/8k blocks spread around. And mind the syscall overhead of requesting memory from the OS in such small quantities.

    Note that I’m all for changing the 256 kB parameter to, say, 128 kB if that is marginally better. As I’ve documented in the comments, it’s a compromise.

    Causing the locked page pool to be at a random address would be helpful too.

    Hm, I guess, by specifying a random argument to mmap (modulo the virtual address size, rounded to a page) and then ‘probing’ the address space for where it can be put. I think this is a great idea for later, but I’m not aiming to do it in this pull (maybe we can borrow some code from ASLR?). Also, here you don’t want the individual arenas too small, or it’d spray the address space to the point of unusability, as well as burden the MMU tables.

  19. sipa commented at 9:45 am on September 19, 2016: member
    Review comment on the first two commits: if you’d change the variable name of the field/variable whose type changes, it’s obvious from the diff that you’ve adapted all places in the code that affect it (makes it easier to guarantee that there are no sizeofs left).
  20. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    24+#include <limits.h> // for PAGESIZE
    25+#include <unistd.h> // for sysconf
    26+#endif
    27+
    28+LockedPoolManager* LockedPoolManager::_instance = NULL;
    29+boost::once_flag LockedPoolManager::init_flag = BOOST_ONCE_INIT;
    


    sipa commented at 9:49 am on September 19, 2016:
    std::once_flag ?

    laanwj commented at 12:46 pm on September 19, 2016:
    Nice, that breaks the only dependency on boost
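
    For reference, the std::once_flag equivalent looks roughly like this (illustrative names, not the exact patch):

        #include <mutex>

        class ManagerLike {
        public:
            static ManagerLike& Instance()
            {
                std::call_once(init_flag, ManagerLike::CreateInstance);
                return *_instance;
            }
        private:
            static void CreateInstance()
            {
                static ManagerLike instance;
                _instance = &instance;
            }
            static ManagerLike* _instance;
            static std::once_flag init_flag;
        };

        ManagerLike* ManagerLike::_instance = nullptr;
        std::once_flag ManagerLike::init_flag;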
  21. in src/support/lockedpool.h: in 0b4126ae29 outdated
    53+    struct Chunk
    54+    {
    55+        Chunk(size_t size_in, uint32_t flags_in):
    56+            size(size_in), flags(flags_in) {}
    57+        size_t size;
    58+        uint32_t flags;
    


    sipa commented at 9:53 am on September 19, 2016:
    s/uint32_t/ChunkFlags/ ?

    laanwj commented at 12:39 pm on September 19, 2016:
    Does bitwise logic with enums work with C++11? I didn’t dare try.

    sipa commented at 2:53 pm on September 19, 2016:
    enums automatically decay to the integer type they are derived from for supported operations. Also, any reason for not just using a boolean? Do we expect more flags to be added in the future?

    laanwj commented at 5:17 pm on September 19, 2016:

    enums automatically decay to the integer type they are derived from for supported operations.

    Right - it always broke down when trying to assign the result of a bitwise operation back to the enum type. Good to hear that this is not a problem anymore.

    Do we expect more flags to be added in the future?

    No, I don’t expect more flags to be added.

    I’m thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It’d be silly to set the alignment to 1 anyway.


    sipa commented at 5:24 pm on September 19, 2016:

    You need to cast to assign back to the enum type, as integers aren’t automatically converted to enums, only the other way around.

    What I meant was that you’d just have an enum with two values, and you wouldn’t use any bitwise logic. But that would not make it much more clear than just using a boolean.

    I’m thinking of using the typical C heap solution: use the LSB of size as used-flag, then set the minimum for the required alignment to 2. It’d be silly to set the alignment to 1 anyway.

    Or use the MSB and disallow locking over 2 GiB :)


    laanwj commented at 10:52 am on September 20, 2016:

    Or use the MSB and disallow locking over 2 GiB :)

    Good idea, going with that, it’s less invasive.
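
    A sketch of the MSB-as-used-flag idea being settled on here (illustrative; with a 32-bit size_t this caps chunk sizes at 2 GiB):

        #include <cstddef>

        static const size_t SIZE_MSB = size_t(1) << (sizeof(size_t) * 8 - 1);

        inline size_t mark_used(size_t size)  { return size | SIZE_MSB; }
        inline size_t mark_free(size_t size)  { return size & ~SIZE_MSB; }
        inline bool   is_used(size_t size)    { return (size & SIZE_MSB) != 0; }
        inline size_t chunk_size(size_t size) { return size & ~SIZE_MSB; }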

  22. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    51+}
    52+
    53+void* Arena::alloc(size_t size)
    54+{
    55+    /* Round to next multiple of alignment */
    56+    size = (size + alignment - 1) & ~(alignment-1);
    


    sipa commented at 9:54 am on September 19, 2016:
    Use align_up?
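
    That is, a small helper along these lines (assuming the alignment is a power of two):

        #include <cstddef>

        static inline size_t align_up(size_t x, size_t align)
        {
            return (x + align - 1) & ~(align - 1);
        }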
  23. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    58+    /* Don't handle zero-sized chunks */
    59+    if (size == 0)
    60+        return nullptr;
    61+
    62+    for (auto& chunk: chunks)
    63+    {
    


    sipa commented at 9:59 am on September 19, 2016:
    Code style nit: braces on the same line, except for namespaces, classes, functions, methods.
  24. in src/support/lockedpool.cpp: in 0b4126ae29 outdated
    301+    if (!addr)
    302+        return false;
    303+    if (locked) {
    304+        cumulative_bytes_locked += size;
    305+    } else if (lf_cb) { // Call the locking-failed callback if locking failed
    306+        if (!lf_cb()) { // If the callback returns false, free the memory and fail, other consider the user warned and proceed.
    


    sipa commented at 10:20 am on September 19, 2016:
    Typo: otherwise
  25. sipa commented at 10:31 am on September 19, 2016: member
    Concept ACK
  26. laanwj commented at 12:39 pm on September 19, 2016: member

    Review comment on the first two commits: if you’d change the variable name of the field/variable whose type changes, it’s obvious from the diff that you’ve adapted all places in the code that affect it

    Good idea, I did this for the class field at least (chKey to vchKey and chIV to vchIV), makes sense as a general suggestion.

  27. laanwj force-pushed on Sep 20, 2016
  28. laanwj commented at 1:38 pm on September 20, 2016: member

    rebased:

    • Squashed the squashme commits, except the last one.
    • Fixed all of @sipa’s nits.
    • All variables and member variables that change type in the wallet commits have been renamed.
  29. jonasschnelli commented at 3:01 pm on September 20, 2016: contributor
    Concept ACK, impressive work, will try to test this soon.
  30. laanwj commented at 3:14 pm on September 29, 2016: member
    Ref: #3949
  31. laanwj force-pushed on Sep 29, 2016
  32. laanwj commented at 4:06 pm on October 18, 2016: member
    Added a benchmark to bench/
  33. wallet: Change CCrypter to use vectors with secure allocator
    Change CCrypter to use vectors with secure allocator instead of buffers
    in the object itself, which will end up on the stack. This avoids
    having to call LockedPageManager to lock stack memory pages to prevent the
    memory from being swapped to disk. This is wasteful.
    999e4c91c2
  34. wallet: Get rid of LockObject and UnlockObject calls in key.h
    Replace these with vectors allocated from the secure allocator.
    
    This avoids mlock syscall churn on stack pages, as well as makes
    it possible to get rid of these functions.
    
    Please review this commit and the previous one carefully that
    no `sizeof(vectortype)` remains in the memcpys and memcmps usage
    (ick!), and `.data()` or `&vec[x]` is used as appropriate instead of
    &vec.
    f4d1fc259b
  35. laanwj force-pushed on Oct 19, 2016
  36. laanwj commented at 2:27 pm on October 19, 2016: member
    Rebased for #8788 and #8873
  37. laanwj commented at 8:56 am on October 24, 2016: member
    Pushed a commit to make LockedPool::free() more robust and remove some ugly construction.
  38. in src/support/lockedpool.cpp: in 1cb5f2dcad outdated
    132@@ -133,6 +133,12 @@ Arena::Stats Arena::stats() const
    133     return r;
    134 }
    135 
    136+bool Arena::addressInArena(void *ptr_in) const
    137+{
    138+    uintptr_t ptr = reinterpret_cast<uintptr_t>(ptr_in);
    


    sipa commented at 8:46 pm on October 24, 2016:
    Wouldn’t it be easier to store a begin and end void*, and then compare (ptr_in >= ptr_base && ptr_in < ptr_end)? That way you don’t need the reinterpret cast.

    laanwj commented at 5:26 am on October 25, 2016:
    I’m slightly allergic to pointer math (most of it is undefined for void* anyhow), which is why I prefer to cast to uintptr_t where possible. But sure, for comparisons it can’t hurt (it’s well-defined).
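
    In other words, with char* base and end members the check becomes a plain comparison between pointers into the same allocation (sketch):

        bool AddressInArena(void* ptr_in, char* base, char* end)
        {
            char* ptr = static_cast<char*>(ptr_in);
            return ptr >= base && ptr < end;
        }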
  39. laanwj commented at 8:16 am on October 25, 2016: member
    Added a commit that changes pointer arithmetic to use char* instead of uintptr_t; this required surprisingly few changes.
  40. sipa commented at 6:53 pm on October 25, 2016: member
    Admittedly, you’d need to convert to either char* or uintptr anyway in order to compute the end pointer… feel free to ignore.
  41. laanwj commented at 6:57 pm on October 25, 2016: member

    Admittedly, you’d need to convert to either char* or uintptr anyway in order to compute the end pointer… feel free to ignore.

    Converting to char* and computing an end pointer is what I did in the last commit.

  42. support: Add LockedPool
    Add a pool for locked memory chunks, replacing LockedPageManager.
    
    This is something I've been wanting to do for a long time. The current
    approach of locking objects where they happen to be on the stack or heap
    in-place causes a lot of mlock/munlock system call overhead, slowing
    down any handling of keys.
    
    Also locked memory is a limited resource on many operating systems (and
    using a lot of it bogs down the system), so the previous approach of
    locking every page that may contain any key information (but also other
    information) is wasteful.
    4536148b15
  43. rpc: Add `getmemoryinfo` call
    ```
    getmemoryinfo
    Returns an object containing information about memory usage.
    
    Result:
    {
      "locked": {               (json object) Information about locked memory manager
        "used": xxxxx,          (numeric) Number of bytes used
        "free": xxxxx,          (numeric) Number of bytes available in current arenas
        "total": xxxxxxx,       (numeric) Total number of bytes managed
        "locked": xxxxxx,       (numeric) Amount of bytes that succeeded locking. If this number is smaller than total, locking pages failed at some point and key data could be swapped to disk.
      }
    }
    
    Examples:
    > bitcoin-cli getmemoryinfo
    > curl --user myusername --data-binary '{"jsonrpc": "1.0", "id":"curltest", "method": "getmemoryinfo", "params": [] }' -H 'content-type: text/plain;' http://127.0.0.1:8332/
    ```
    6567999096
  44. bench: Add benchmark for lockedpool allocation/deallocation 444c673d85
  45. laanwj force-pushed on Oct 27, 2016
  46. laanwj commented at 11:21 am on October 27, 2016: member

    Squashed the following commits

    • f42c60a - squashme: uintptr_t to char* for pointer arithmethic
    • 1cb5f2d - lockedpool: Make free() more robust
  47. laanwj merged this on Nov 2, 2016
  48. laanwj closed this on Nov 2, 2016

  49. laanwj referenced this in commit f8723d2318 on Nov 2, 2016
  50. paveljanik commented at 1:01 pm on November 2, 2016: contributor

    OS X, clang (one -Wshadow warning, and one additional error):

        support/lockedpool.cpp:67:19: warning: declaration shadows a field of 'Arena' [-Wshadow]
                    char* base = chunk.first;
                          ^
        ./support/lockedpool.h:117:11: note: previous declaration is here
            char* base;
                  ^
        support/lockedpool.cpp:231:49: error: use of undeclared identifier 'MAP_ANONYMOUS'
            addr = mmap(nullptr, len, 0x01|0x02, 0x0002|MAP_ANONYMOUS, -1, 0);
    

    man mmap:

        MAP_ANON          Map anonymous memory not associated with any specific
                          file.  The offset argument is ignored.  Mac OS X spe-
                          cific: the file descriptor used for creating MAP_ANON
                          regions can be used to pass some Mach VM flags, and can
                          be specified as -1 if no such flags are associated with
                          the region.  Mach VM flags are defined in <mach/vm_sta-
                          tistics.h> and the ones that currently apply to mmap
                          are:
    

    So maybe

        +#ifndef MAP_ANONYMOUS
        +#define MAP_ANONYMOUS MAP_ANON
        +#endif
    

    somewhere before its usage?

  51. codablock referenced this in commit 1237647866 on Sep 19, 2017
  52. laanwj referenced this in commit 94c9015bca on Sep 22, 2017
  53. codablock referenced this in commit bc3b9294e8 on Jan 13, 2018
  54. andvgal referenced this in commit 53ede2fba3 on Jan 6, 2019
  55. PastaPastaPasta referenced this in commit bad429f601 on Dec 22, 2019
  56. PastaPastaPasta referenced this in commit d0e6a1c4e4 on Jan 2, 2020
  57. PastaPastaPasta referenced this in commit c5e1cfe005 on Jan 4, 2020
  58. PastaPastaPasta referenced this in commit d15dd02d78 on Jan 4, 2020
  59. PastaPastaPasta referenced this in commit d6df1e605a on Jan 10, 2020
  60. PastaPastaPasta referenced this in commit f6a6860cbc on Jan 10, 2020
  61. PastaPastaPasta referenced this in commit 44b47017ef on Jan 10, 2020
  62. PastaPastaPasta referenced this in commit dd8e0c0153 on Jan 12, 2020
  63. random-zebra referenced this in commit cc84157e52 on May 17, 2020
  64. zkbot referenced this in commit 5ef5d8d268 on Jul 31, 2020
  65. zkbot referenced this in commit 7d94064616 on Sep 29, 2020
  66. ckti referenced this in commit 7a256c4b08 on Mar 28, 2021
  67. furszy referenced this in commit 7793f1804d on Apr 5, 2021
  68. MarcoFalke locked this on Sep 8, 2021
