wallet flushing thread could use improvement #10236

issue dooglus openend this issue on April 19, 2017
  1. dooglus commented at 5:39 pm on April 19, 2017: contributor

    In a recent blog post Wladimir writes that:

    It turns out that [tests run slowly] because the wallet code does an fsync after every operation to make sure that changes to the database are safely written to disk

    ThreadFlushWalletDB() in src/wallet/walletdb.cpp has roughly the following logic:

    every 500 milliseconds:
        if (the wallet has been modified since the last time it was flushed to disk &&
            the wallet hasn't been modified for at least 2 seconds &&
            we can get a database lock &&
            the database isn't currently in use):
        then
            flush it to disk
    

    This has a couple of problems:

    • If the wallet receives a deposit every second, the “hasn’t been modified for at least 2 seconds” condition is never met, and so the wallet is never flushed to disk. Perhaps the loop needs to have an extra check, “if the wallet wasn’t flushed in the last N1 seconds then flush it now even if it was modified in the last 2 seconds”.

    • If the wallet receives a deposit every 2.5 seconds, the “hasn’t been modified for at least 2 seconds” condition is almost always met, and so the wallet is continually flushed to disk. The wallet file grows bigger with each deposit, and the flushing causes a full rewrite of the file. This causes a quadratically increasing disk usage over time, eventually swamping the capacity of the drive. Perhaps the loop needs to have an extra check: “if the wallet was flushed in the last N2 seconds then don’t flush it now, even if it wasn’t modified in the last 2 seconds”.

    Is there some way of safely updating a BDB database without rewriting the whole file to disk for every update? This seems like a major inefficiency.

    I’ve been struggling with this issue of ever increasing hard drive use as wallet size grows for years now, but only recently discovered the root cause of the problem.

  2. jonasschnelli commented at 11:43 am on April 20, 2017: contributor

    I think the major inefficiency is in BDB itself. Not sure if we can improve this much further.

    IMO the most promising solution to this is switching the database type. @sipa’s logdb was IMO the best alternative (https://github.com/bitcoin/bitcoin/pull/5686). Though, append only maybe no longer be always efficient (since abandontransaction we do update single-property wtx state).

    Others have discussed about switching to sqlite, which IMO is reasonable for the transactions and maybe the pubkey cache.

    For HD wallets, we could probably store the 256bit seed material as Bech32 together with the used-keypath-scheme in a single file that only gets written once. Then maybe use a form of @sipa’s logdb as wtx/pubkey storage.

    The path to that would be to abstract the database interface first (in order to allow BerkleyDB and LogDb the same time [to allow smooth migrations]).

    We already made some baby-steps (#9256, #9143, #9142), next one would be #9951.

  3. sipa commented at 11:52 am on April 20, 2017: member

    For HD wallets, we could probably store the 256bit seed material as Bech32

    Bech32 is designed to protect against human transcription errors, not faulty hardware or software. If you want to protect against hardware errors, your best approach is just writing the whole thing multiple times. Even something like an RS code only corrects individual bytes/words, while disk errors are more likely to lose entire sectors.

  4. jonasschnelli commented at 11:53 am on April 20, 2017: contributor

    Bech32 is designed to protect against human transcription errors, not faulty hardware or software. If you want to protect against hardware errors, your best approach is just writing the whole thing multiple times. Even something like an RS code only corrects individual bytes/words, while disk errors are more likely to lose entire sectors.

    Fair point.

  5. laanwj commented at 2:06 pm on April 20, 2017: member

    Is there some way of safely updating a BDB database without rewriting the whole file to disk for every update? This seems like a major inefficiency.

    To be clear: despite the name, ThreadFlushWalletDB was not actually a flush thread.

    What does the actual fsyncing I’m talking about in the blog post is the destructor of CWalletDB if the constructur was called with _fFlushOnClose = true.

    ThreadFlushWalletDB doesn’t exist anymore in master. It now is scheduled by CScheduler and has the more suitable name MaybeCompactWalletDB.

    But more importantly: Compacting the database is not required to “safely update” the database. The only reason for doing it is to keep the database contained to one file most of the time. Normally a bdb database consists of a .dat file and a directory with log files that contain uncompacted updates to it. But this was deemed confusing for users, a long time ago.

    If you don’t care about that you could (AFAIK!) safely disable that periodic compaction.

    I’ve been struggling with this issue of ever increasing hard drive use as wallet size grows for years now, but only recently discovered the root cause of the problem.

    This one I don’t understand - why would compaction result in a larger file? Isn’t it just that your wallet accumulates more transactions and keys over time?

  6. dooglus commented at 7:33 pm on April 20, 2017: contributor

    @laanwj Compaction doesn’t result in a larger file. But constantly rewriting an ever growing file is problem.

    I would prefer if there was an option to have the wallet ‘forget’ about transactions which are no longer of any interest. ie. if it no longer gives me any utxos then I don’t care about it. I do care about account balances, but not individual transactions. That way the wallet.dat file wouldn’t grow without limit. In the past I have seen wallet.dat grow as large as 260,857,856 bytes. Having 260MB written to disk every 2 seconds can really kill performance.

  7. laanwj commented at 7:39 pm on April 20, 2017: member

    @dooglus As said, you could disable the rewrite. Just make sure you do a compaction before you move the wallet.dat around and/or change the berkeleydb version.

    As for deleting old transactions, that’s a completely different subject. I don’t think the disk space taken by the wallet is a big issue, compared to the block chain it’s peanuts. However the fact that the wallet stores all transactions in memory is a problem there.

  8. dooglus commented at 8:52 pm on April 20, 2017: contributor

    @laanwj The disk space taken by the wallet isn’t a problem. The problem is that the whole wallet is rewritten every time anything in it changes. We already tell people not to copy wallet.dat while the client is running, so couldn’t we simply only run the compaction on shutdown?

    Is compaction currently guaranteed to happen on shutdown, or is it possible that an incoming transaction happens in the last 2 seconds before shutdown and hence no compaction happens?

  9. jonasschnelli added the label Wallet on Apr 21, 2017
  10. laanwj commented at 9:02 am on August 24, 2017: member

    Is compaction currently guaranteed to happen on shutdown, or is it possible that an incoming transaction happens in the last 2 seconds before shutdown and hence no compaction happens?

    It should be guaranteed at a clean shutdown. The reason for doing it intermediately, too, is to make sure it’s also (usually, mostly) in compacted form after a crash.

    (because people tend to want to move wallet.dat’s around, and that’s more not less likely after a crash)

  11. AdvancedStyle commented at 1:15 am on February 16, 2018: none

    Also having issues with this; and was forced to run with -flushwallet=0

    Is there any concerns about running with flushwallet=0, other than not moving the wallet in non-flushed state? Is backupwallet still safe to use?

    Note that this flag made a huge performance improvement for me.

  12. laanwj commented at 7:33 am on February 16, 2018: member

    Is there any concerns about running with flushwallet=0, other than not moving the wallet in non-flushed state

    Nothing besides what is mentioned here. BerkeleyDB is perfectly usable without periodic compaction, if taking into account that a database is a directory, not a single file.

    Is backupwallet still safe to use?

    Yes. Backupwallet does the necessary close/checkpoint before copying, see CWalletDBWrapper::Backup.

  13. MarcoFalke commented at 12:38 pm on May 9, 2020: member

    The feature request didn’t seem to attract much attention in the past. Also, the issue seems not important enough right now to keep it sitting around idle in the list of open issues.

    Closing due to lack of interest. Pull requests with improvements are always welcome.

  14. MarcoFalke closed this on May 9, 2020

  15. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me