Fix horrific performance found by gmaxwell. #740

pull TheBlueMatt wants to merge 1 commit into bitcoin:master from TheBlueMatt:perf changing 1 file +33 −1
  1. TheBlueMatt commented at 7:58 am on January 3, 2012: member
    I hereby certify that I have grepped for every use of CDataStream and found no case where secure_allocator is necessary. Note that this fix makes blockchain download, by gmaxwell’s estimate, some “1000x faster”.
  2. luke-jr commented at 8:00 am on January 3, 2012: member
    would be much nicer if you just typedef std::vector<char> vector_type; :P
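    A minimal sketch of the shape that change takes inside CDataStream (illustrative rather than the exact diff; the typedef and member names follow the serialize.h of that era):

        class CDataStream
        {
        protected:
            // before: every serialization buffer was mlock'd and zeroed on free
            // typedef std::vector<char, secure_allocator<char> > vector_type;

            // after: plain allocator for network/disk buffers that hold no secrets
            typedef std::vector<char> vector_type;

            vector_type vch;
            // ...
        };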
  3. TheBlueMatt commented at 8:04 am on January 3, 2012: member
    I prefer it the other way, but meh, I don’t care; changed.
  4. gmaxwell commented at 8:07 am on January 3, 2012: contributor

    Well… 1000x is perhaps an exaggeration; more like 40-50x or so. A complete chain sync-up here took 29 minutes on code with all the mlock removed. On the same system, syncing to about height 37k took three and a half hours earlier today. I’m not patient enough to benchmark a complete chain sync-up without the fix, though perhaps someone else will.

    This problem was really tricky to find: oprofile cycle sampling was completely blind to the slowdown caused by mlock. I guess if I had hooked GLOBAL_TLB_FLUSHES I would have seen it, but who would guess to do that? Valgrind’s callgrind couldn’t see it either (unsurprisingly). I eventually found it with ltrace.

    In general we shouldn’t be calling mlock per-allocation for anything. Even on private key data it’s really too slow to use more than a couple of times during execution, though I guess it’s not completely horrific if just used for that. Instead we should pre-allocate an mlocked arena and allocate from that for private keys. Perhaps boost has something to make this easier?

    Independent of that, CDataStream’s wide usage and allocation behavior make me cringe. It’s creating a LOT of tiny indirect heap allocation calls all over the code. That can’t be good for performance (though obviously not on the same scale as the mlock overuse).

  5. gavinandresen commented at 3:42 pm on January 3, 2012: contributor

    Error compiling, compiler version problem?

    i686-apple-darwin10-llvm-g++-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.6)

    serialize.h:872: error: ‘CDataStream::CDataStream(const std::vector<char, std::allocator<char> >&, int, int)’ cannot be overloaded
    serialize.h:867: error: with ‘CDataStream::CDataStream(const std::vector<char, std::allocator<char> >&, int, int)’

    Also, not zeroing freed memory any more makes me nervous, because memory containing leftover garbage is a good building block for remote code exploits. Maybe a free_after_delete_allocator… (does boost already define one?)
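    A plausible reconstruction of that overload clash, assuming the convenience constructors in serialize.h looked roughly like the following: once vector_type becomes plain std::vector<char>, the two declarations reduce to the same signature and can no longer be overloaded.

        // hypothetical reconstruction of serialize.h around lines 867/872
        CDataStream(const std::vector<char>& vchIn, int nTypeIn, int nVersionIn);  // pre-existing overload
        CDataStream(const vector_type& vchIn, int nTypeIn, int nVersionIn);        // now the same type: std::vector<char>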

  6. sipa commented at 3:54 pm on January 3, 2012: member
    Same error here, on Ubuntu. I’m also in favor of keeping the zeroing functionality, even for non-mlocked allocations.
  7. gmaxwell commented at 4:01 pm on January 3, 2012: contributor

    Boost apparently does have an allocator for which you can override how it gets memory: http://www.boost.org/doc/libs/1_47_0/libs/pool/doc/interfaces/user_allocator.html If we want to be properly security-paranoid, and we’re using a specialized allocator, we could potentially add canary functionality, just like the stack protection, in addition to the zeroizing. I’d have to dump the allocator usage data to see how much overhead that might have (e.g. add a word after, and perhaps before, every allocation which is checked on free).

    I think the mlock security improvement is pretty inconsequential; the zeroizing matters more, since it makes use-after-free exploits hard. While we’re discussing this, we should note that if we use our own allocator we should add IFDEFed-out valgrind macros (see memcheck.h, IIRC) so that we don’t reduce valgrind’s sensitivity.
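    A standalone sketch of the canary idea described above (the names canary_alloc and canary_free are made up for illustration, not an existing Bitcoin or Boost API): reserve one extra word past each allocation, fill it with a known value, and verify it when the block is released; the same release hook is where zeroing and the #ifdef'd valgrind client requests would go.

        #include <cstdlib>
        #include <cstring>
        #include <stdint.h>

        static const uint64_t CANARY = 0xdeadbeefcafef00dULL;

        void* canary_alloc(size_t n)
        {
            // allocate the requested bytes plus room for a trailing canary word
            char* p = static_cast<char*>(std::malloc(n + sizeof(CANARY)));
            if (!p) return NULL;
            std::memcpy(p + n, &CANARY, sizeof(CANARY));
            return p;
        }

        void canary_free(void* ptr, size_t n)
        {
            if (!ptr) return;
            char* p = static_cast<char*>(ptr);
            uint64_t check;
            std::memcpy(&check, p + n, sizeof(check));
            if (check != CANARY)
                std::abort();              // something wrote past the allocation
            std::memset(p, 0, n);          // zeroize before handing the block back to the heap
            std::free(p);
            // a fuller version would also emit VALGRIND_MALLOCLIKE_BLOCK /
            // VALGRIND_FREELIKE_BLOCK client requests behind an #ifdef
        }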

  8. TheBlueMatt commented at 6:43 pm on January 3, 2012: member
    Fixed (I need to stop coding so late; that’s probably also why I didn’t catch this bug before I included it in my encryption pull…). My 2 cents (well, OK, more like 1): use-after-free exploits are so rare it’s almost not worth doing the zeroing. When you combine that with the fact that the item in question is a std::vector, not a class, I would argue it’s even less of an issue. If someone wants to use the stuff in the vector to call a nasty function, they would be 100x more likely to be able to while it’s still allocated and in memory, not once it has been freed/reallocated.
  9. TheBlueMatt commented at 7:07 pm on January 3, 2012: member

    Oops, more to fix (I had to board and didn’t have time to test it). I’ll fix it when I’m in the air and have wifi.

  10. TheBlueMatt commented at 7:50 am on January 4, 2012: member
    Actually fixed in that commit.
  11. laanwj commented at 10:53 am on January 4, 2012: member
    ACK, this is great :)
  12. gavinandresen commented at 3:49 pm on January 4, 2012: contributor

    I love the performance improvement, but I still don’t like the elimination of zero-after-free. Security in depth is important.

    Here’s the danger:

    Attacker finds a remotely-exploitable buffer overrun somewhere in the networking code that crashes the process. They turn the crash into a full remote exploit by sending carefully constructed packets before the crash packet, to initialize used-but-then-freed memory to a known state.

    Unlikely? Sure.

    Is it ugly to define a zero_after_free_allocator for CDataStream? Sure. (simplest implementation: copy secure_allocator, remove the mlock/munlock calls).

    But given that CDataStream is the primary interface between bitcoin and the network, I think being extra paranoid here is a very good idea.
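    A sketch of what that could look like, following the recipe above (secure_allocator’s shape minus the mlock/munlock calls); illustrative rather than the code that was eventually merged:

        #include <memory>
        #include <cstring>

        template <typename T>
        struct zero_after_free_allocator : public std::allocator<T>
        {
            typedef std::allocator<T> base;
            typedef typename base::size_type size_type;
            typedef typename base::pointer pointer;

            zero_after_free_allocator() throw() {}
            zero_after_free_allocator(const zero_after_free_allocator& a) throw() : base(a) {}
            template <typename U>
            zero_after_free_allocator(const zero_after_free_allocator<U>& a) throw() : base(a) {}
            template <typename U>
            struct rebind { typedef zero_after_free_allocator<U> other; };

            void deallocate(pointer p, size_type n)
            {
                // wipe the buffer before returning it to the heap
                // (a plain memset can in principle be optimized away;
                // production code would use a cleanse helper)
                if (p != NULL)
                    std::memset(p, 0, sizeof(T) * n);
                base::deallocate(p, n);
            }
        };

        // CDataStream could then use:
        //   typedef std::vector<char, zero_after_free_allocator<char> > vector_type;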

  13. gmaxwell commented at 4:09 pm on January 4, 2012: contributor

    The performance cost of the zeroization doesn’t appear to be huge: it adds about 30 seconds to a 30-minute full sync (compared to not zeroizing).

    Chart showing all mlocks gone vs. bluematt’s patch: http://people.xiph.org/~greg/bitcoin-sync.png The bluematt patch gets a late start due to the time spent filling the keypool, but because it’s a bit faster it eventually catches up.

    The mlocks on the keying stuff are still problematic: the keypool refill takes 17 seconds with the mlocks in, <1 second with them out. Users with encrypted wallets will often do a mass refill when they unlock. To resolve that, the frequency of mlocking should be reduced. The pool allocator I linked to above should make that easy (make secure_allocator use the pool, make the pool use malloc + mlock).

    The frequent mlock usage may also be creating security weaknesses in the form of mlock ineffectiveness. If the process leaks mlocked pages then eventually it will hit the limit (e.g. I think most linux desktops have a 1MB/process limit or something like that), and past the limit pages will no longer be mlocked.
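    A sketch of that arrangement, assuming Boost’s pool UserAllocator concept from the link above (the names mlocked_user_allocator and secure_pool are made up here): the pool grabs large chunks through a hook that mlocks each chunk once, and small private-key buffers are then carved out of those chunks instead of each triggering their own mlock.

        #include <boost/pool/pool.hpp>
        #include <cstddef>
        #include <cstdlib>
        #include <sys/mman.h>   // mlock/munlock (POSIX)

        struct mlocked_user_allocator
        {
            typedef std::size_t size_type;
            typedef std::ptrdiff_t difference_type;

            static char* malloc(const size_type bytes)
            {
                char* p = static_cast<char*>(std::malloc(bytes));
                if (p)
                    ::mlock(p, bytes);   // lock the whole chunk once
                return p;
            }
            static void free(char* const block)
            {
                // the pool does not pass the size back here, so a real version
                // would track chunk sizes in order to munlock before freeing
                std::free(block);
            }
        };

        // secure_allocator would then draw small key buffers from this pool
        // rather than calling mlock on every allocation:
        static boost::pool<mlocked_user_allocator> secure_pool(32 /* bytes per element, illustrative */);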

  14. TheBlueMatt commented at 7:04 pm on January 4, 2012: member
    I was thinking it would make more of a performance difference, but as long as it’s very minor, I just made it zero_after_free.
  15. gmaxwell commented at 7:10 pm on January 4, 2012: contributor
    BlueMatt: I don’t think you should have to provide an allocate(). Though maybe it’s fine to leave it in case we convert it into a canary allocator.
  16. sipa commented at 10:49 pm on January 4, 2012: member
    ACK
  17. Fix horrific performance found by gmaxwell. f7a9a11391
  18. TheBlueMatt commented at 0:46 am on January 5, 2012: member
    @gmaxwell heh, of course…maybe I just need to give up programming at this point…
  19. gmaxwell commented at 4:01 am on January 5, 2012: contributor
    ACK
  20. gavinandresen referenced this in commit 7486c64dd8 on Jan 5, 2012
  21. gavinandresen merged this on Jan 5, 2012
  22. gavinandresen closed this on Jan 5, 2012

  23. coblee referenced this in commit 4d52355e5d on Jul 17, 2012
  24. ptschip referenced this in commit 3199704967 on Aug 14, 2017
  25. sipa referenced this in commit ef7117193c on May 22, 2020
  26. sipa referenced this in commit 67f232b5d8 on Jun 9, 2020
  27. stackman27 referenced this in commit 78cde6f8c7 on Jun 26, 2020
  28. rajarshimaitra referenced this in commit ea333901cc on Aug 5, 2021
  29. DrahtBot locked this on Sep 8, 2021
