[Wallet] Wallet-pruning #5389

pull cozz wants to merge 1 commits into bitcoin:master from cozz:cozz10 changing 13 files +265 −13
  1. cozz commented at 2:25 pm on November 28, 2014: contributor

    Wallet pruning can be used to speed up and shrink large wallets.

    Transactions are removed only if doing so is certain not to affect your balance.

    The RPC prunewallet(n) removes transactions older than n days (n >= 1). Alternatively, adding prunewallet=n to bitcoin.conf prunes on startup. Note that the latter is required to prevent a rescan from re-adding the transactions.

    A transaction is pruned if all of the following conditions are met:

    • at least 100 confirmations
    • older than n days according to block-timestamp (n >= 1)
    • our outputs must be completely spent and also the spending transaction must have at least 100 confirmations
    • the inputs we spent are also pruned/prunable transactions, so that they don’t require us as a spent-flag anymore
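The four conditions above can be read as a single predicate. The following is a minimal sketch of that reading, not the code from this PR: `WalletTx`, its fields, and `is_prunable` are hypothetical names, and the wallet is modelled as a plain dict keyed by txid. A parent that is missing from the wallet is assumed to be already pruned (or not ours), and a spender that is missing is conservatively treated as unknown.

```python
from dataclasses import dataclass, field

MIN_CONFIRMATIONS = 100          # threshold named in the PR description
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class WalletTx:
    """Hypothetical, simplified stand-in for a wallet transaction."""
    txid: str
    confirmations: int
    block_time: int  # timestamp of the containing block, seconds since epoch
    # One entry per output of ours: txid of the wallet tx spending it, or None if unspent.
    our_output_spenders: list = field(default_factory=list)
    # txids of the transactions whose outputs this transaction spends.
    parent_txids: list = field(default_factory=list)


def is_prunable(tx, wallet, now, days):
    """Return True if `tx` satisfies all four pruning conditions."""
    if days < 1:
        raise ValueError("n must be >= 1")
    # 1. at least 100 confirmations
    if tx.confirmations < MIN_CONFIRMATIONS:
        return False
    # 2. older than n days according to the block timestamp
    if now - tx.block_time <= days * SECONDS_PER_DAY:
        return False
    # 3. every output of ours is spent by a transaction with >= 100 confirmations
    for spender_id in tx.our_output_spenders:
        spender = wallet.get(spender_id) if spender_id is not None else None
        if spender is None or spender.confirmations < MIN_CONFIRMATIONS:
            return False
    # 4. every parent still in the wallet must itself be prunable
    for parent_id in tx.parent_txids:
        parent = wallet.get(parent_id)
        if parent is not None and not is_prunable(parent, wallet, now, days):
            return False
    return True
```

The recursion in step 4 follows spends backwards only, so it terminates; a real implementation would likely memoise the result per transaction.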

    A new class STXO has been introduced, which loops over the last 100 blocks and builds a temporary recently-spent-outputs index. It’s only used on rescan, and only if wallet pruning is enabled. The index is needed to answer the question “does the spending transaction have at least 100 confirmations?”. If it does not, a rescan must always add the transaction to the wallet rather than skip it. The wallet cannot answer this itself because, during a rescan, the spending transaction is no longer in the wallet, so its confirmation count cannot be checked there. A rescan also checks our coins-utxo to see whether an output is spent, because the wallet does not know this while rescanning.
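As an illustration of the idea only (this is not the STXO class from the patch), a recently-spent-outputs index can be sketched as a set of outpoints collected from the last 100 blocks. The data layout, `build_recently_spent_index`, and `rescan_should_add` are hypothetical; the sketch covers only the spent-status check and leaves out the age and parent conditions.

```python
RECENT_BLOCKS = 100  # window size taken from the description above


def build_recently_spent_index(chain, window=RECENT_BLOCKS):
    """Collect every outpoint spent in the last `window` blocks.

    `chain` is a list of blocks, oldest first. Each block is a list of
    transactions, and each transaction is a dict whose "inputs" entry is a
    list of (txid, vout) tuples.
    """
    recent = set()
    for block in chain[-window:]:
        for tx in block:
            recent.update(tx["inputs"])
    return recent


def rescan_should_add(txid, our_vouts, utxo_set, recently_spent):
    """Decide whether a rescan must re-add a transaction to a pruned wallet."""
    for vout in our_vouts:
        outpoint = (txid, vout)
        if outpoint in utxo_set:
            return True   # still unspent, so it counts toward the balance
        if outpoint in recently_spent:
            return True   # spent, but the spender is not deeply confirmed yet
    return False          # fully spent long ago: the rescan may skip it
```

Using the UTXO set for "spent or not" and the temporary index for "spent recently or not" mirrors the two lookups the paragraph describes.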

    Note that the feature is disabled if someone uses the account-feature, because pruning as is would destroy account balances. I could add support for accounts, but I am not bothering if our plan is to remove the account-feature.

  2. [Wallet] Wallet-pruning 3b33f7e274
  3. laanwj added the label Wallet on Dec 1, 2014
  4. laanwj commented at 8:36 am on December 2, 2014: member

    Interesting concept.

    It’s a good observation that it is possible to ‘freeze’ transactions that are fully spent and no longer count toward the balance. They can be excluded from GetBalance() computations and such. You could hide them in default views.
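A rough sketch of the "freeze" idea, under simplifying assumptions: transactions are dicts listing only our own outputs, values are integer satoshis, and `partition_wallet` and `balance` are hypothetical helpers rather than Bitcoin Core functions. Fully spent transactions are moved aside, and the balance is computed over the remaining ones only.

```python
def is_fully_spent(tx):
    """True if none of our outputs in this transaction is still unspent."""
    return all(out["spent"] for out in tx["outputs"])


def partition_wallet(txs):
    """Split transactions into (active, frozen) without deleting anything."""
    active = [tx for tx in txs if not is_fully_spent(tx)]
    frozen = [tx for tx in txs if is_fully_spent(tx)]
    return active, frozen


def balance(txs):
    """Sum the values of our unspent outputs."""
    return sum(
        out["value"]
        for tx in txs
        for out in tx["outputs"]
        if not out["spent"]
    )
```

Because frozen transactions contribute no unspent outputs, `balance(active)` equals `balance(txs)`, while the frozen list stays available for history views.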

    I’m not so sure about actually deleting them though - storage is cheap so it doesn’t really make much sense to remove old transaction data for storage concerns. And for privacy, well, the transactions are still there in the block chain. You could delete the metadata, but not much more. (if no-address-reuse was rigorously enforced in Bitcoin one could also prune old keys and have some plausible deniability, I suppose… but right now that would be foot-shooting waiting to happen)

  5. cozz commented at 10:35 am on December 2, 2014: contributor
    Ok, closing.
  6. cozz closed this on Dec 2, 2014

  7. laanwj commented at 10:57 am on December 2, 2014: member
    I didn’t mean that as ‘go close this!’. I just wanted to get a discussion started. There could be other people for which this may be just what they needed.
  8. laanwj reopened this on Dec 2, 2014

  9. gmaxwell commented at 2:21 pm on December 2, 2014: contributor
    Reorganizing the data to avoid needlessly iterating over it sounds interesting. I agree that deleting is less exciting. An additional thing to contemplate is deleting only the data which is in the chain, e.g. keeping any metadata.
  10. cozz commented at 12:03 pm on December 3, 2014: contributor

    Our wallet by design loads everything in memory, including all transactions. So pruning reduces memory usage and disk space, and speeds up many wallet operations.

    Of course maybe nobody even has a wallet big enough for this to make a difference. But the feature doesn’t hurt anybody, and just in case someone complains about performance or memory issues with a large wallet, we could point him to this.

    Or if people make regular wallet online-backups, they might want to trim down the wallet.

    Besides being useful, to me at least it’s kind of a cool feature to see all transactions disappear, except those actually holding your coins.

  11. gmaxwell commented at 4:06 pm on December 3, 2014: contributor
    @cozz Absolutely. We should improve the scalability, but I think permanently deleting irrecoverable financial information is a cost too great. Instead we should be making it possible to load some of this data only on demand, or archiving it off without deleting it completely.
  12. sipa commented at 4:11 pm on December 3, 2014: member
    Ultimately, I think the correct solution is considering the UTXO set and the transaction list/ledger as separate entities, and maybe just occasionally check for consistency between them, rather than constantly computing one from the other. Doing that probably means pretty much a total rewrite.
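One way to picture this suggestion is the following sketch. It is an assumption about what such a design could look like, not an existing Bitcoin Core structure: `Wallet`, `add_transaction`, and `check_consistency` are hypothetical names. The UTXO map is updated incrementally and used for balances, while the ledger is append-only and replayed only for an occasional consistency check.

```python
class Wallet:
    """Keeps the ledger (history) and the UTXO set as separate structures."""

    def __init__(self):
        self.ledger = []   # append-only transaction history
        self.utxos = {}    # (txid, vout) -> value, maintained incrementally

    def add_transaction(self, txid, spends, creates):
        """Record a transaction.

        `spends` is a list of (txid, vout) outpoints of ours being spent;
        `creates` maps vout -> value for the outputs that belong to us.
        """
        self.ledger.append(
            {"txid": txid, "spends": list(spends), "creates": dict(creates)}
        )
        for outpoint in spends:
            self.utxos.pop(outpoint, None)
        for vout, value in creates.items():
            self.utxos[(txid, vout)] = value

    def balance(self):
        # Only touches the UTXO set, never the full history.
        return sum(self.utxos.values())

    def check_consistency(self):
        """Occasionally replay the ledger and compare with the UTXO set."""
        rebuilt = {}
        for tx in self.ledger:
            for outpoint in tx["spends"]:
                rebuilt.pop(outpoint, None)
            for vout, value in tx["creates"].items():
                rebuilt[(tx["txid"], vout)] = value
        return rebuilt == self.utxos
```

With this split, the cost of `balance()` depends on the number of unspent outputs rather than on the length of the history, which is the property the comment is after.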
  13. cozz commented at 9:22 pm on December 3, 2014: contributor

    @gmaxwell agreed about “loading on demand”, but I don’t think anybody will redesign the wallet anytime soon, so I was just trying to find a rather simple solution which fits in the current design. For archiving you can simply use backupwallet.

    About “irrecoverable financial information”, not sure what you mean. Wallet-pruning is disabled when using the account-feature, and payment-request is not even exposed to rpc and we only show the merchant in the gui in the tx-details. So there is only the actual bitcoin transaction, which I don’t consider as irrecoverable. (simple rescan will do)

    Our wallet grows forever by design, so I just thought that a simple straightforward pruning feature might be useful to someone.

  14. luke-jr commented at 12:59 pm on December 5, 2014: member
    Agree with comments that we should just hide and optimise, not delete. Especially not delete on the basis of what is to users essentially random (the UTXOs created when a userside-transaction is received don’t remain associated to that userside-transaction).
  15. mchatham commented at 5:08 pm on January 27, 2015: none

    We have been researching this possibility of deleting fully-spent transactions in an effort to improve performance and reliability of backups with very large wallets. Finding this patch was quite pleasant as it has implemented the solution exactly as we had imagined it might work.

    The main idea behind this is rather simple. It would greatly improve performance for large wallets containing many hundreds of thousands of transactions, where the majority of transactions are fully spent and have many hundreds or thousands of confirmations.

    Another reason for this patch would be to alleviate the stress of handling wallet backups. While storage is not a concern, creating backups that get quite large (and will continue to grow) is time consuming and can be rather concerning. It would seem reasonable that a task requiring more time to complete would also have more time to fail.

    We believe this feature to be exactly what we need. We are experiencing these issues now and it’s obvious that the problem will only worsen with continued use.

  16. arnuschky commented at 10:44 am on February 7, 2015: contributor

    We’re in almost the same situation as @mchatham. We’ve been researching how to develop this functionality ourselves, and are delighted to see that it has already been done. Thank you, @cozz.

    Some of bitcoind’s functionality does depend on the fact that the involved addresses are in the wallet. For high-throughput installations (e.g., merchants), this was always a problem: one had to choose between having the functionality and facing the absolute non-scalability of the wallet, or implementing the whole wallet functionality separately. This patch allows one to use bitcoind for high-throughput installations again, which is very useful.

    Has anyone used this patch in production yet? @cozz, does this play ball with #4702?

  17. cozz commented at 9:10 pm on February 7, 2015: contributor

    @arnuschky not yet, you would also need to delete from mapOrderedTxItems. But that would be easy.

    I think it would be good to have a final decision on both patches.

  18. arnuschky commented at 3:33 am on February 8, 2015: contributor
    @cozz: Ok, thanks. I am going to deploy this PR on a test instance and see how it behaves.
  19. mchatham commented at 1:55 am on February 17, 2015: none

    We agree that it’s like @arnuschky has stated, this patch will allow bitcoind to be used in high-throughput situations again. We would add that for those already using bitcoind and finding themselves needing this patch, it will be a godsend.

    In an article Mike Hearn wrote back in December 2013, he made a statement about server wallet problems, saying, “As far as I know exchanges and major payment processors have all had to implement lots of custom code to work around the lack of scalability of Bitcoin-Qt” (https://medium.com/@octskyward/merge-avoidance-7f95a386692f). In our research of the topic we’ve learned that this issue has largely gone unaddressed. While he is specifically referring to wallets with large numbers of keys, we feel the same applies to wallets with lots of transactions.

    It’s our belief this decreases stability in the Bitcoin world if every exchange or payment processor must develop their own wallet software versus simply using the default wallet implementation.

  20. gmaxwell commented at 2:25 am on February 17, 2015: contributor

    @mchatham The lobbying isn’t helpful. If you’re interested in this you could work on an alternative which does not destroy information and instead just skips loading it when a switch is set, as that would address the concern.

    I believe the project is not interested in a workaround that creates any data loss risk.

  21. arnuschky commented at 8:18 am on February 18, 2015: contributor

    @gmaxwell: I don’t really get this “loss of financial information” argument. All information is contained in the blockchain and can be recovered anytime with a rescan, or am I wrong here?

    On the other hand, I think it is very reasonable to allow the user to decide to what extent the wallet stores/replicates information on past transactions. Especially as this functionality allows users to use the current wallet implementation for addresses with large numbers of transactions.

  22. sipa commented at 10:16 am on February 18, 2015: member

    It does not contain all information. It misses timing data, comments that may have been added, data due to the usage of the payment protocol, and everything from unconfirmed transactions (which isn’t a consideration here specifically).

    Worse, it is incompatible with blockchain pruning which we’d very much like to get in the next release.

    Please, the memory usage problem is trivial to solve in a much less invasive way.

  23. cozz commented at 7:52 pm on February 18, 2015: contributor
    It seems like this is going nowhere anyway, so closing it, saving us further pointless discussion.
  24. cozz closed this on Feb 18, 2015

  25. mchatham commented at 8:34 pm on February 18, 2015: none

    We’re in agreement, the risk of data loss is apparent, but it’s always a risk. For those in our situation, the risk of data loss increases with wallet database size. Because of this, managing backups for these wallets is a genuine concern for us. We make backups every 100 addresses.

    How often would you recommend wallet backups be created? We’ve seen one recommendation recently that indicated every 90 transactions. If we followed this recommendation we’d be creating a backup of our wallet many times a day. This overall would not be an issue if it were not for the sheer size of the wallet on disk. Ours is currently at 1.2GB, and as we all know for sure, will continue to get larger.

    In the case of this patch, what’s the real concern with losing data? As far as we see, there’s nothing in this patch that would in effect destroy any value that we own. No coins could be destroyed. Not by normal operation. If the concern is with losing meta-data, be reminded that we can already provide -zapwallettxes with 2, and it will simply not restore the old meta-data after rescan.

    While an alternative to destroying data would improve performance for large wallets, it completely ignores the other issue: ALL data forever being stored in a single, very important file. @gmaxwell What else to do when a wallet file gets this large? Your recommendations for an alternative only solve one half of the current problem and this patch solves both.

    We’ve given this feature a lot of thought and we’re well aware of the risks involved, just as we’re aware of the risks involved with other features available in the client, some of which, with improper use, could actually result in losing bitcoins.

  26. gmaxwell commented at 8:40 pm on February 18, 2015: contributor
    @mchatham The correct (initial) course of action would be to fix the issues causing your wallet to become 1.2GB. What version of the software are you using that is creating wallets that large? (I ask because some issues that caused wallet bloat were previously fixed).
  27. mchatham commented at 8:54 pm on February 18, 2015: none
    @gmaxwell We have been on 0.9.4 for quite some time.
  28. arnuschky commented at 10:14 am on February 19, 2015: contributor

    @gmaxwell Same here, our wallet.dat is > 2.5 GB. Running 0.9.3 (currently evaluating 0.10.0). No, nothing is broken with that wallet, we simply have a few addresses with tens of thousands of transactions each. We might split this up on multiple nodes (one address each), but that is kind of a last-resort workaround.

    Once the wallet is this big, it becomes extremely slow - it doesn’t scale at all. AddToWallet currently takes 2-3 seconds, and a rescan is accordingly an operation that takes 1-2 weeks.

    Now, one might claim that we’re an edge case, pushing bitcoin core beyond its limits. I agree that this is the case at the moment, but over time, more and more users will reach similar wallet sizes if there’s no pruning. This PR might not be the solution and just a temporary workaround (a quite acceptable one for us, but maybe not for the general public). However, the scalability problem of the wallet needs to be addressed at some point.

    @cozz Tested your patch. prunewallet 60 goes away and does something for a long time and claims to have pruned a few thousand tx. However, there’s nothing in the debug log and wallet.dat didn’t shrink at all. Am I missing something obvious here?

  29. cozz commented at 0:19 am on February 20, 2015: contributor

    @arnuschky I fixed the biggest wallet-bottlenecks here #4805 #4712 and #4702. However the last one has recently been closed, because a guy is eventually going to redesign the wallet at some unknown point in the future, but still going to load the whole wallet in memory on startup.

    To me wallet-pruning is a feature as obvious as blockchain-pruning. Making the two compatible would not be a problem. The argument that you should shrink your 1.2GB wallet by upgrading is not an argument to me. Even if there is wallet-bloat and he can shrink his wallet to 120MB, what if bitcoin scales to 10MB blocks, or his company grows by a factor of 10, or he just keeps using his wallet for years? We must assume scale. There should be an option to be able to delete old transactions from the wallet somehow for sure. The potential performance boost is just a side-effect here, because our wallet design is bad.

    There is another patch which improves wallet performance here #5411, this has been closed, because it’s a little hacky. But with this patch and the other 2 closed ones, even the current wallet design could be used on larger scale. It’s just 3 simple patches, as long as we have the current wallet design, and they are making such a drama. The whole wallet is hacky, as most might agree, so I am just adapting to what is there, but improving performance. I hope that they are not thinking of removing the wallet completely to solve problems, as they do with the account features.

    As to why the patch didn’t work for you, I don’t know. There should be lots of EraseFromWallet in debug.log. I hope you compiled the branch as is, and did not apply the patch to some other branch like master or 0.10. Back in the day, when I tested it, I actually saw the transactions disappear in the GUI, and I also saw the wallet.dat file actually shrink from 20MB to like 100kb.

    However, I am not going to spend any more time on this patch, if the bitcoin core developers disagree with it.

  30. gmaxwell commented at 0:45 am on February 20, 2015: contributor

    “We must assume scale”

    Which is part of why addressing performance problems by irrecoverably deleting financial information is just not considered acceptable here. We’re happy to work on features that improve scale and efficiency, but not ones where the system forces (or strongly encourages) behavior which is irresponsible.

    I consider deleting transaction data to be irresponsible, and I think there is a general consensus among committers on this point. People are, of course, free to run their own wallet software if they wish to adopt this kind of behavior (see also: mtgox). Part of the value the reference wallet can provide is sticking to responsible, safe behavior even when it stands in the way of commercial expedience. Instead we should adopt designs that accomplish a goal of scaling without compromise (as Pieter suggested in his first response).

    A 2.5GB wallet is just outright broken, assuming that it’s not somehow responsible for 5% or more of all Bitcoin transactions which have ever happened, and we should figure out what’s causing that. This ecosystem has enough irresponsible behavior from people patching over issues instead of fixing things. We don’t need to add more.

  31. sipa commented at 8:12 am on February 20, 2015: member
    @arnuschky Nobody is saying that scaling problems shouldn’t be addressed. The reason the wallet is slow is not because the file is large, but because all transactions are loaded into memory, and almost all operations iterate over all of them. It’s been suggested to remove old fully-spent ones from memory at loading time instead, which is much less invasive.
  32. arnuschky commented at 12:30 pm on February 20, 2015: contributor

    Yes, sorry for taking this discussion further than needed. I keep forgetting that these are PRs for merging into the bitcoin core mainline, and thus kept arguing that this patch has merit for some individuals/specific applications.

    Regarding a more constructive direction of discussion: is someone working on a new wallet implementation? If yes, who/where? Is there a feature- or task-list? Can we contribute? I guess you are referring to the work of @jonasschnelli and #5752, #5758, #5744, #5761, #5745, no?

    @gmaxwell What’s a normal wallet size for, say, 1 million transactions? What’s “normal” unbroken use for you?

    @cozz Thank you for your work and your reply. I’ll have a look at the other PRs and debug this one. It might be hacky, but it’s the only way we can keep running in the current situation - even if this solution is only temporary.

  33. jonasschnelli commented at 4:01 am on February 21, 2015: contributor

    @arnuschky: I’m willing to spend around 20-30h a week on improving/rewriting the wallet. #5761 should be the ticket to discuss wallet related improvements. I don’t have a final wallet concept in my mind.

    Regarding this issue, I’d prefere to write scripts or preforget regtests-wallet-and-blocks-datadir with a huge number of transactions to track down the problems and find solutions to handle such sizes. IMO a 2.5GB wallet must be totally unoptimized. Please have a look at the logdb wallet storage work. There is also a rewriting function. With this new storage backend, the wallet file should not be much bigger than all wtx serialized (+count(wtx)*(keysize+8bytechecksum)). Of course you have to add the keys and metadata to the calculation.

    And handling a 1mio wtx wallet should not be the main scope. It should work but could require a large amount of free RAM and would probably slow down some wallet related functions.
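The size estimate in the comment above can be written out as a small calculation. This is only a sketch of that formula: `estimate_wallet_bytes` is a hypothetical helper, the per-record key size is left as a parameter because the comment does not give a value, and keys and metadata are excluded, as the comment itself notes.

```python
CHECKSUM_BYTES = 8  # the "8bytechecksum" term from the comment


def estimate_wallet_bytes(wtx_sizes, key_size):
    """Estimate the wallet file size from the serialized wtx sizes.

    Implements: sum(serialized wtx) + count(wtx) * (keysize + checksum).
    Keys and metadata stored in the wallet are not included.
    """
    return sum(wtx_sizes) + len(wtx_sizes) * (key_size + CHECKSUM_BYTES)
```

With made-up inputs of one million transactions of 500 bytes each and a 40-byte record key, the formula gives 548,000,000 bytes, which illustrates why a multi-gigabyte file for a comparable transaction count looks unoptimized under this model.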

  34. arnuschky commented at 9:47 am on February 22, 2015: contributor

    @jonasschnelli thanks for your answer and all your effort. We will have a look at your logdb wallet as soon as we resolve our current scalability issues. However, I do not understand what you mean by “I’d prefere to write scripts or preforget regtests-wallet-and-blocks-datadir “?

    In the long run, I think a wallet must be able to handle 1mio transactions, even though it might not be the main scope right now.

  35. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-05 01:12 UTC
