leveldb: Win32WritableFile without memory mapping #6917

pull laanwj wants to merge 3 commits into bitcoin:master from laanwj:2015_10_leveldb_win_nomap changing 1 files +39 −197
  1. laanwj commented at 9:43 pm on October 30, 2015: member

    Use a simple Win32WritableFile, equivalent to PosixWritableFile in posix_env.cc

    The goal is to solve leveldb corruption issues on windows - this issue has been reported many times, see #5865, #6727, #5610, …

    In my testing with this change I was unable to cause the database to be corrupted when crashing a windows laptop in various ways using notmyfault. Without the change it happened the first time I tried.

  2. leveldb: Win32WritableFile without memory mapping
    Use a simple Win32WritableFile, equivalent to PosixWritableFile in
    posix_env.cc
    256bea33ab
  3. laanwj added the label UTXO Db and Indexes on Oct 30, 2015
  4. sipa commented at 9:45 pm on October 30, 2015: member
    Please submit to https://github.com/bitcoin/leveldb first?
  5. laanwj commented at 9:51 pm on October 30, 2015: member
    Thought about that, but I think it gets more exposure and testing here. (obviously it should still go there when accepted…)
  6. sipa commented at 10:06 pm on October 30, 2015: member
    @laanwj Sure!
  7. jgarzik commented at 10:31 pm on October 30, 2015: contributor
    ut ACK
  8. dcousens commented at 0:24 am on October 31, 2015: contributor
    ut ACK
  9. sipa commented at 4:27 am on October 31, 2015: member
    We should probably benchmark a reindex with this?
  10. gmaxwell commented at 4:38 am on October 31, 2015: contributor
    @sipa Yes, and if its much slower it’s reason to figure out whats strictly required for the syncs. And if it’s not, we could call it done. Reindex time right now is ruined by signature validation, if someone benchmarks that they must be sure to bypass.
  11. laanwj commented at 7:40 am on October 31, 2015: member
    This is the same as how it works on POSIX already, directly converted to WIN32 API. By all means, benchmark, but IMO we should apply this nevertheless (if it has no new bugs). We should first get rid of stability issues and only then worry about optimization. This will likely save many people from having to do a reindex at all.
  12. MarcoFalke commented at 8:45 am on October 31, 2015: member
    Concept ACK
  13. leveldb: Don't do anything in Flush()
    Flush() is supposed to clear application-side buffers (like fflush). As
    this writable file implementation uses the OS function directly, there
    are no buffers to flush.
    2fc32341cc
  14. laanwj commented at 9:12 am on October 31, 2015: member

    Last commit is optional and potentially risky. From what I understand, leveldb’s WritableFiles are supposed to do their own caching, and flush() is supposed to flush this cache to the OS (its implementation is fflush(FILE*) in POSIX). As this implementation uses OS calls directly, there is no need for that. With that reasoning, Flush() can be empty.

    … it’s of course also possible to use libc FILE* fwrite fflush on Windows instead of using the WIN32 API directly, then use this hack to sync, this will provide local buffering, but I was afraid that extra level of buffering may introduce syncing issues (see my post below for an implementation of this)

    Then again these are all performance concerns. As said above it may be wrong to worry about that here, I’m mostly concerned with this being correct.

  15. Diapolo commented at 9:18 am on October 31, 2015: none
    Isn’t LevelDB dead at all? Their Github repo seemed to be not actively developed on the last time I checked.
  16. laanwj commented at 9:19 am on October 31, 2015: member
    @Diapolo That is a good observation. That’s why I’m doing an attempt at maintaining it. Can you help testing?
  17. Diapolo commented at 9:25 am on October 31, 2015: none
    @laanwj Yes, will first do a -reindex on testnet to be able to compare before and after… after that I’m going to kill the testnet instance and see if it’s corrupting.
  18. laanwj commented at 9:26 am on October 31, 2015: member
    Thanks!
  19. laanwj commented at 11:02 am on October 31, 2015: member
    Here is an alternative implementation of Win32WritableFile that uses application-buffered libc primitives (fopen, fwrite, …): https://github.com/laanwj/bitcoin/tree/2015_10_leveldb_win_nomap2 . Barely tested, and I’d not necessarily suggest using it instead of this, but if someone is going to benchmark it should be included (as it reduces the number of system calls in case of lots of small writes, which we don’t do, but leveldb may internally…).
  20. laanwj commented at 9:03 am on November 1, 2015: member
    @jonasschnelli it would be useful to have executables here, mind pointing your nightly builder at this?
  21. jonasschnelli commented at 2:05 pm on November 1, 2015: contributor
  22. NicolasDorier commented at 2:59 pm on November 1, 2015: contributor
    I am so glad about it you can’t imagine the time I lost because of this bug. This is the only reason I don’t have a full node on my laptop. Trying it right now.
  23. in src/leveldb/util/env_win.cc: in 2fc32341cc outdated
    428+    : filename_(fname)
    429 {
    430-    return ((x + y - 1) / y) * y;
    431+    std::wstring path;
    432+    ToWidePath(fname, path);
    433+    DWORD Flag = PathFileExistsW(path.c_str()) ? OPEN_EXISTING : CREATE_ALWAYS;
    


    sipa commented at 8:20 pm on November 1, 2015:
    Any reason not to use OPEN_ALWAYS (which should open if exists, and create if not, according to https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx)?

    sipa commented at 8:21 pm on November 1, 2015:
    Oh, this code is just moved.

    laanwj commented at 2:33 am on November 2, 2015:
    The goal is to have f=fopen(filename, "wb") semantics. It’s possible that this could be improved, but indeed I just copied the code from the previous WritableFile implementation.
  24. in src/leveldb/util/env_win.cc: in 2fc32341cc outdated
    137-    size_t _TruncateToPageBoundary(size_t s);
    138-    bool _UnmapCurrentRegion();
    139-    bool _MapNewRegion();
    140-    DISALLOW_COPY_AND_ASSIGN(Win32MapFile);
    141-    BOOL _Init(LPCWSTR Path);
    142+    std::string filename_;
    


    sipa commented at 8:53 pm on November 1, 2015:
    Is this field used at all?

    laanwj commented at 2:36 am on November 2, 2015:
    Good catch. It’s there for troubleshooting. Looking at env_posix it’s supposed to be passed as ‘context’ argument to IOErrors, but that’s not done at the moment.
  25. sipa commented at 9:04 pm on November 1, 2015: member
    @laanwj We don’t do small writes anyway, so I doubt an application-level cache would help.
  26. sipa commented at 10:07 pm on November 1, 2015: member
    Concept ACK in any case.
  27. laanwj commented at 2:29 am on November 2, 2015: member

    @laanwj We don’t do small writes anyway, so I doubt an application-level cache would help.

    We don’t, but maybe leveldb does internally? (WritableFiles are used for all of leveldb’s file creation and writing, also when building/rebuilding tables, not just to write out database batches. Not sure how careful it is to not, say, call Append multiple times to write different parts of data structures)

  28. jonasschnelli commented at 2:10 pm on November 2, 2015: contributor
    I just have started a fresh node (mainnet / this PR) on a win8.1pro with enabled mcafee antivirus (SSD / 2GB RAM / 1.4 GHZ Core i5) … will report soon.
  29. jonasschnelli commented at 9:25 am on November 3, 2015: contributor

    Have synced up to 318301,.. twice restarted bitcoin-qt (just to see if stop / start / checkblocks works). No problems with the database so far.

    My node is still syncing (very slow, serval blocks per minutes). But a strange problem appeared. Qt console window says I’m no block 318301, … looking in the debug.log i can see that im 318613… Peers windows is updating… but somehow with missing data.

    bildschirmfoto-2015-11-03-um-10 19 21

    Peers:

    Taskmanager:

  30. laanwj commented at 11:09 am on November 3, 2015: member
    @jonasschnelli that’s a known issue and unrelated to these changes, see #5664
  31. NicolasDorier commented at 6:36 am on November 4, 2015: contributor

    Pro tip for corruption problem : Attach a disk, index the blockchain data on it, unplug savagely.

    I had corruption everytimes my cat played with the ethernet cable between my computer and my network disk.

    I did not managed to reproduce the problem so far by just killing the process, I’ll try once I find a spare disk somewhere.

  32. sipa commented at 6:38 am on November 4, 2015: member
    I’m pretty surprised that having the blockchain database on a network disk works at all…
  33. NicolasDorier commented at 6:41 am on November 4, 2015: contributor
    Mapped Drive, then just pointing to SMB share. I’m not at my office, so I’ll try with external drive unplugged savagely instead of real network disk as soon as I can.
  34. sipa commented at 6:43 am on November 4, 2015: member
    Well it’s good to know that it seems to work even with a network filesystem. But network filesystems usually offer much weaker synchronization guarantees, so often are inappropriate for databases with strong consistency requirements.
  35. NicolasDorier commented at 7:15 am on November 4, 2015: contributor

    Yes I agree, this is why I switched to a normal VM later on Microsoft Azure. But since I’ve moved to VM they are sometimes rebooted automatically for maintainance purposes, and it got corrupted several time.

    The good thing about mapped disk is that thanks to weaker synchronization guarantee, it makes easier to reproduce data corruption problems ! :)

  36. laanwj commented at 10:04 am on November 4, 2015: member

    The issue that this change tries to improve is extremely easy to reproduce already: it happens nearly every time bitcoind or bitcoin-qt running on Windows crashes, or the Windows crashes, or the power is pulled. This happens for local filesystems on drives connected through SATA.

    The PR does not purport to fix issues with external harddisks, network filesystems, corrupting cables, felines, and other additional sources of complications. Being robust to those would be great, if realistically possible, but is not the immediate goal here.

  37. jonasschnelli commented at 12:11 pm on November 4, 2015: contributor

    I have powered down (unclean sudden shutdown) my Win8.1 VM and restarted Windows and Bitcoin-Qt. Bitcoin-Qt started without issues and is ~ on the same height (353900). I’ll wait now a couple of minutes and will power down (unplug) the VM host system (MacMini) and see, what happens then.

    Would it be worth to do the same procedure with the current master WITHOUT this PR to compare it?

  38. laanwj commented at 12:45 pm on November 4, 2015: member

    Would it be worth to do the same procedure with the current master WITHOUT this PR to compare it?

    Sure. If you have a VM environment where you can take a snapshot, that’s helpful, you can return to that after messing up the db.

  39. jonasschnelli commented at 12:47 pm on November 4, 2015: contributor
    I made a snapshot before the first sudden shutdown. But just powered down the host by unplugging the power cable… Best chain could be activated and the node continues syncing. Will do some more shutdowns and see if nothing happens,… then switch to current master and try again.
  40. jonasschnelli commented at 1:21 pm on November 4, 2015: contributor

    Now, after a “power off” (sudden shutdown), the db is corrupted (still using this PR).

    “Rebuild the block databases” ends with “error opening database”:

    Log:

  41. laanwj commented at 2:13 pm on November 4, 2015: member
    Re: “bad undo data”, see #6923 - it’s a potential issue, but not something that could be solved by this (as it isn’t a leveldb problem)
  42. jonasschnelli commented at 2:43 pm on November 4, 2015: contributor

    I now did restore serval times back to my snapshot and did some sudden power downs. ~50% it will end up with a corrupt database (100% with “bad undo data”).

    Will now try to run my snapshot with current master without this PR.

  43. laanwj commented at 2:54 pm on November 4, 2015: member
    That’s strange - I only managed to get the undo data error once in all my tries, whereas (without this PR) it produced consistent leveldb errors.
  44. gmaxwell commented at 3:37 pm on November 4, 2015: contributor

    Difference in VM flushing behavior, I guess — it may just be that the VM buffers and reorders writes and ignores fsync.

    Although… we appear to be missing a FileCommit in UndoWriteToDisk– I think we need to have synced the blocks and undo before calling the insert.

    It would probably be better for performance if it wrote the block then undo, then did the syncs on both however.

    Edit: oh nevermind, FlushBlockFile hits the undo file too. :-/

  45. leveldb: report filename in errors in Win32WritableFile d3ad568777
  46. jonasschnelli commented at 3:50 pm on November 4, 2015: contributor

    With the current master (without this PR), after every sudden shut down, i get “Corruptions: error in middle of record”.

    This PR looks after an improvement but getting #6923 for roughly 50% of power-off shutdowns.

  47. laanwj added the label Windows on Nov 4, 2015
  48. laanwj commented at 4:49 pm on November 4, 2015: member

    If it improves things, that is good.

    Filed this for the leveldb repository: https://github.com/bitcoin/leveldb/pull/9

  49. laanwj commented at 4:51 pm on November 4, 2015: member
    Closing the pull here, as it has been moved to the leveldb repository. It should come back to the bitcoin repository in a subtree update.
  50. laanwj closed this on Nov 4, 2015

  51. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me