Replace use of mmap in leveldb for improved reliability #3340

pull pstratem wants to merge 2 commits into bitcoin:master from pstratem:leveldbnommap changing 1 files +126 −16
  1. pstratem commented at 5:29 am on December 2, 2013: contributor

    There has been speculation recently that the issue OS X users are seeing is related to a bug in leveldb related to mmap.

    This replaces all mmap use in the posix environment handler with pread and write calls.

    There is a performance impact which is significant with microbenchmarks for sequential writes. (db_bench)

    However bitcoin tends to make large batched writes instead of large numbers of smaller writes which significantly reduces the impact.

    I could not measure a meaningful difference in reindex time (using txindex=1) from 1-100k blocks total time was 27 seconds for the mmap version and 28 seconds with this patch.

  2. jgarzik commented at 5:30 am on December 2, 2013: contributor
    Drool. Very nice.
  3. pstratem commented at 6:29 am on December 2, 2013: contributor

    fdatasync is defined as fcntl(fd, F_FULLFSYNC, 0) on os x

    https://github.com/bitcoin/bitcoin/blob/master/src/leveldb/port/port_posix.h#L72

  4. laanwj commented at 6:33 am on December 2, 2013: member

    Good!

    BTW: I suppose not using mmap here also reduces virtual memory requirements for leveldb?

  5. pstratem commented at 7:00 am on December 2, 2013: contributor
    possibly, i haven’t checked.
  6. jgarzik commented at 7:01 am on December 2, 2013: contributor
    @laanwj Correct, it should. Even though leveldb’s mmap’d pages are file-backed and discardable by the OS at any time [after writeout, if dirty], they still require virtual address space to be allocated – which is in limited supply on 32-bit platforms.
  7. wtogami commented at 7:49 am on December 2, 2013: contributor
    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 The 0.8.5-OMG7-no-mmap variant build includes this patch. @toffoo and @coblee are testing it.
  8. laanwj commented at 8:50 am on December 2, 2013: member
    Can you please squash this into a single commit? That makes cherry-picking it to 0.8.6 more straightforward as the second commit is a fix for the first one not an independent change.
  9. mikehearn commented at 5:25 pm on December 2, 2013: contributor
    Do we really want to disable mmap outside of OS X? The issues if any are not in LevelDB, they are in MacOS X. If it’s a loss then I’m not sure it makes sense to hurt performance on platforms that work correctly.
  10. pstratem commented at 7:05 pm on December 2, 2013: contributor

    @mikehearn there is pretty clearly a problem with the way os x handles the interaction of mmap and normal io; however given that this is change has effectively no negative impact on performance and is considerably simpler logic than the mmap version, i think it should be used on all posix platforms.

    (indeed I’ve been advocating for replacing leveldb entirely with a BitcoinKVDB but who has time for that…)

  11. jgarzik commented at 7:36 pm on December 2, 2013: contributor
    @mikehearn Agreed that Linux probably has a sane mmap :) However, besides OSX, no-mmap should hopefully extend the lifetime of Windows/32-bit by reducing virtual address space usage.
  12. pstratem commented at 8:20 pm on December 2, 2013: contributor

    I went back and ran initial block sync to block 200k.

    mmap took 622 seconds write took 646 seconds

    that’s ~4% @jgarzik this patch only addresses mmap on posix however something very similar could be done there (i have no idea what the performance impact would be)

  13. madthanu commented at 9:47 pm on December 2, 2013: none
    @pstratem The PosixWriteableFile class should be fixed to do SyncDirIfManifest (similar to PosixMmapFile) before this goes into production …
  14. sipa commented at 10:49 pm on December 2, 2013: member
    On the issue tracker, Sanjay comments that he experimented with a no-mmap approach as well, and it seems they’re considering it.
  15. pstratem commented at 0:56 am on December 3, 2013: contributor
    call SyncDirIfManifest as @madthanu pointed out
  16. wtogami commented at 6:48 am on December 3, 2013: contributor
  17. laanwj commented at 8:11 am on December 5, 2013: member

    Are we going to do the mmap change for all platforms, or just for MacOSX? (see https://github.com/litecoin-project/bitcoinomg/commit/71eb4af for a patch that makes this OSX-only) I think there’s something to be said for both;

    Advantages to merge it for all Posix platforms:

    • Total virtual memory usage goes down, giving 32-bit architectures some breathing room
    • Consistency

    Advantages to merge it for MacOSX only:

    • Minimize risk
    • Prevent ~4% performance drop with bitcoin workloads on other Posix platforms
  18. laanwj commented at 4:01 pm on December 11, 2013: member

    This is included in a leveldb upgrade to 1.15

    • switched from mmap based writing to simpler stdio based writing. Has a minor impact (0.5 microseconds) on microbenchmarks for asynchronous writes. Synchronous writes speed up from 30ms to 10ms on linux/ext4. Should be much more reliable on diverse platforms.
  19. sipa commented at 5:27 pm on December 11, 2013: member
    Well, the implementation in leveldb 1.15 is by Sanjay (one of LevelDB’s primary authors), not the one from this PR.
  20. laanwj commented at 5:30 pm on December 11, 2013: member
    Yes, I just thought I’d mention it, it probably doesn’t make sense to merge both…
  21. rescrv commented at 10:24 pm on December 11, 2013: none
    I’d recommend using Sanjay’s implementation because it uses userspace-buffered I/O, which will give higher performance than this PR. If it matters, it’ll also minimize drift between upstream and the carried LevelDB code.
  22. pstratem commented at 10:27 pm on December 11, 2013: contributor
    I actually had a version of this that had an internal cache, the performance improvement was surprisingly small.
  23. dont use mmap in leveldb, this is a marginal performance hit
    fail on short writes
    
    Ensure new files referred to by the manifest are in the filesystem.
    b67ca63e8b
  24. add internal memory buffer for PosixWriteableFile
    miinor improvement in sequential write times
    b1a4f7e758
  25. pstratem commented at 11:11 pm on December 11, 2013: contributor

    I’ve added a 1 MB buffer (I have not extensively reviewed it for correctness!)

    I could not measure any performance difference for a single thread, which is the only test I tried (or really care about)

  26. sipa commented at 11:27 pm on December 11, 2013: member
    I don’t really care about write performance, and I certainly prefer keeping a small diff with upstream.
  27. BitcoinPullTester commented at 11:34 pm on December 11, 2013: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/b1a4f7e75864dafdf5be8eda20bbabeb5788150c for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.
  28. gavinandresen commented at 11:44 pm on December 11, 2013: contributor
    +1 for tracking upstream as closely as we can.
  29. wtogami commented at 6:13 am on December 12, 2013: contributor

    I’m pleased that we have a seemingly working implementation in 0.8.6. Please take due care in validating the long-term 0.9 solution.

    FWIW, as 1.15 is favored, Bitcoin OMG’s next build will expose that to users for early real-world testing.

  30. sipa commented at 9:25 pm on December 12, 2013: member
    See #3405 for LevelDB 1.15.
  31. jgarzik commented at 0:03 am on December 13, 2013: contributor
    Superceded by #3405
  32. jgarzik closed this on Dec 13, 2013

  33. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-08 22:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me