[Wallet] replace BDB with internal append only (logdb) backend #5686

pull jonasschnelli wants to merge 7 commits into bitcoin:master from jonasschnelli:2015/01/wallet/logdb changing 17 files +1376 −690
  1. jonasschnelli commented at 3:05 pm on January 20, 2015: contributor

    This pull request is in early stage but discussions could help driving this into the right direction.

    It will basically remove bdb (currently not on compile level) and make use of a internal key/value store. Logdb (the internal key/value store) is a append only store where every key/value-frame contains a sha256 checksum all available frames. Consistency is guaranteed.

    The current implementation passes all tests but has lack of recover/rewrite possibilities.

    Why this

    • IMO BerkleyDB is oversized for a “reference wallet implementation”. This append-only file-store could prevent from corruptions and relieves compile stress.
    • wallet flush threads are no longer required.
    • remove another dependency and get more control over our stack

    What’s next / ToDos

    • add compact Rewrite() function (save whole memory map atomically to a clean file): this is required after encrypting the wallet (added 2015/01/22)
    • add thread safe / concurrency tests for logdb
    • add Recover() function: scan through a corrupt wallet file and try to recover keys even when some key/value-frames and/or checksums are corrupted
    • add more tests
    • add wallet encryption test

    This is on my internal roadmap:

    • remove bdb

    • transform account to labels
    • implement bip32
    • add spv mode
  2. jonasschnelli force-pushed on Jan 20, 2015
  3. jonasschnelli commented at 3:10 pm on January 20, 2015: contributor

    Currently there is no migration tool from bdb to logdb. Because it would be good to avoid Bdb in our binaries, a python tool for wallet migration could do the job.

    IMO the version code 0.1x should make clear that such required manual migrations are still possible.

  4. jgarzik commented at 3:14 pm on January 20, 2015: contributor

    @jonasschnelli Agree that a migration tool is necessary before rolling this out.

    There are a few complications with python that would need double-checking:

    • Are python installs guaranteed capable of reading BDB 4.8-format data?
    • For people using –with-incompatible-bdb configure option – like me and some of the other devs – conversion becomes even more annoying.

    I would prefer a C++ migration tool. Eventually you could split it off into a separate project or repository if the continued BDB link-time dependency remains annoying.

  5. jonasschnelli commented at 3:26 pm on January 20, 2015: contributor

    @jgarzik a C++ tool would probably get better results. Agreed. It would just help the main bitcoin repository to not have to much legacy stuff. There is also some other things that can be removed from the code like the bdb wallet migration and features-checking (example: wallet feature for compressed key, etc.).

    Moving the c++ migration tool to separate repository makes sense. There we could also support the --with-incompatible-bdb option so users could make sure to use the same version as they used while creating/manipulating their wallets.

  6. laanwj commented at 4:04 pm on January 20, 2015: member

    Are python installs guaranteed capable of reading BDB 4.8-format data?

    Yes - Python is generally linked against the system’s BDB, which on modern systems is always >4.8. Newer BDB versions have no problem reading old databases. The problem is the other way around. After the file is “touched” it won’t be backwards compatible anymore. Making a temporary copy then reading that would be a strategy to keep the origin file as-is.

  7. laanwj commented at 4:07 pm on January 20, 2015: member

    Apart form that: nice work! I agree with your internal roadmap.

    I think it makes sense to do this work on a ’newwallet’ branch so that other people can cooperate, and we can merge it into master when the work is complete, which will likely be no sooner than 0.12.

  8. jgarzik commented at 4:13 pm on January 20, 2015: contributor

    +1 @laanwj

    Concept ACK for the wallet changes.

  9. jgarzik commented at 4:16 pm on January 20, 2015: contributor

    @jonasschnelli Yes. I go back and forth between “separate repo now” and “separate repo later”

    For the initial conversion, it makes everything easier if the migration tool is in bitcoin/bitcoin.git and is built with the current build system. That is the path of least resistance.

    Build system & install Just Works. You automatically get the migration tool, even if you didn’t know you needed it until (oops, right now!)

    I try to be extra-conservative with Real Money(tm), which is what we’re dealing with when it comes to the “legacy wallet” we’re maintaining.

  10. jonasschnelli force-pushed on Jan 20, 2015
  11. luke-jr commented at 4:40 pm on January 20, 2015: member
    It may be a good idea to have 0.11 still compatible with bdb wallets without upgrading them - note that all our wallet upgrades have been opt-in so far, and you can still use a 0.4.0 wallet with 0.10 while retaining backward compatibility.
  12. jonasschnelli force-pushed on Jan 20, 2015
  13. jonasschnelli commented at 7:29 pm on January 20, 2015: contributor

    Supporting both (bdb and logdb) wallet formats (at runtime) would require some work. I’m also aware of the missing backward compatibility. Supporting bdb wallets after a possible switch to logdb would require resources better spent at bip32, accounting overhaul, etc. IMO a clean cut would be worth it. Remove all the migration, min-version feature-compatibility legacy stuff.

    I agree with not letting users in the dark and support a migration tool (c++, included in bitcoind for 0.11 or 0.12, but slowly extract migration tool to different repository). First “logdb” release could still include bdb as dependency, used for compiling the bitcoin migration tool. If you miss the library, it will build bitcoin-core without this tool (like #5680).

    Migration of a wallet is a risky process and it’s probably better to do this outside of bitcoind as a conscious decision from the user.

  14. jonasschnelli force-pushed on Jan 20, 2015
  15. jonasschnelli force-pushed on Jan 20, 2015
  16. laanwj commented at 8:39 am on January 21, 2015: member

    On IRC I’ve explicitly argued against dual database support, and I’m going to do so here again. Having an intermediate phase where both databases are supported means legacy concerns (such as accounts) have to ooze into the new wallet interface. It also complicates testing. Are you going to verify that the new wallet will keep working with 0.3 wallets at every step of development?

    An alternative option I’ve thought of is to leave the old wallet untouched. Move wallet,walletdb,rpcwallet (and others as needed) to src/legacywallet.

    Then build this new wallet with new database, new interface in src/wallet/. As well as a migration tool.

    Using a configure option, one can then either build against the new wallet or the old wallet. This gives the ‘conservative’ choice to stick with the old wallet (for however long we’re willing to carry it) with all its quirks, but without intermediate complications for the new code, and having to carry any of that over.

    In the beginning the legacywallet will be the default and the new wallet will be experimental-only.

    This is basically the “fork the wallet and develop it separately” approach, but a bit improvised as it’s all in one repository - for now. It means we can merge this soon(TM) and experiment with it, instead of delaying that to a further future ‘when it’s ready’.

    (the only complication here I see is with regard to the UI, which has to interface with both, but a similar thing can be done there; UI that only applies to the old wallet can be moved to a legacy directory)

  17. luke-jr commented at 9:05 am on January 21, 2015: member
    Sounds reasonable, but if we can’t build with both supported, how will binaries go out? Probably not too much more effort to just keep the class names unique and switch between them at runtime startup, is it?
  18. jonasschnelli commented at 9:25 am on January 21, 2015: contributor

    I like laanwjs idea of supporting both formats with a compile option.

    Two concerns I see:

    • the new wallet needs some changes in init.cpp. I could try to make the init.cpp changes flexible in case of the backend database. Even rpcwallet.cpp could support both. But this needs work.

    The 2nd concern is more a “conceptual” one. If we support both wallet formats, why should one upgrade his wallet? Couldn’t that not lead to a need of supporting both formats? And if we once drop support of BDB, how is the migration process different now from the one in future (the user has to do it once anyway)? Maybe we could force people to upgrade and save some - already rare - development resources. I mean it’s not the network, it’s the wallet.

  19. jgarzik commented at 10:07 am on January 21, 2015: contributor
    The original idea I discussed w/ @sipa on IRC was dropping support for BDB from bitcoind entirely, plus an auto-launched migration tool. Still prefer that arrangement. From bitcoind’s standpoint, BDB support is deleted. No dual mode codebase.
  20. laanwj commented at 11:38 am on January 21, 2015: member

    @jonasschnelli

    • Yes, initialization/deinitalization should be moved to the wallet module(s) themselves and only called from init. This is long due anyhow, and mostly low-impact move-only.
    • My idea is not about supporting two formats, but supporting two different wallets. The old wallet will stay exactly as it is now (barring critical fixes). The new wallet will see new development, and come with new features (as well as get rid of old cruft), and an interface that supports e.g. multiwallet. It just means that the transition process is more smooth.

    More daring users can use the new wallet, whereas more conservative users can stay with the old wallet for say a major release or two until it is completely axed (but the new one, as well as the conversion process, much better tested). @luke-jr Building a monster executable with both wallet implementations could be possible, with e.g. an command line option to switch. Not sure if this is a good idea in practice, but I don’t want to exclude the possibility upfront. We’ll see when we get there I suppose. In the beginning this won’t be an issue as the new wallet will be experimental-only so requiring build-time invocations is a good thing.

  21. jgarzik commented at 11:48 am on January 21, 2015: contributor
    @laanwj hmmm, that is an interesting way to transition. Let’s think through the user making that old wallet / new wallet choice. That plan would indeed make for a nice, clean separation.
  22. jonasschnelli force-pushed on Jan 21, 2015
  23. jonasschnelli force-pushed on Jan 21, 2015
  24. jonasschnelli force-pushed on Jan 21, 2015
  25. jonasschnelli force-pushed on Jan 21, 2015
  26. jonasschnelli force-pushed on Jan 21, 2015
  27. jonasschnelli force-pushed on Jan 22, 2015
  28. jonasschnelli force-pushed on Jan 22, 2015
  29. jonasschnelli force-pushed on Jan 22, 2015
  30. jonasschnelli force-pushed on Jan 22, 2015
  31. jonasschnelli force-pushed on Jan 22, 2015
  32. jonasschnelli force-pushed on Jan 22, 2015
  33. jonasschnelli force-pushed on Jan 22, 2015
  34. jonasschnelli force-pushed on Jan 22, 2015
  35. jonasschnelli force-pushed on Jan 22, 2015
  36. jtimon commented at 3:39 pm on January 23, 2015: contributor

    I like the roadmap with multiwallet support and a legacy wallet / new wallet selectable. Maybe add a migrate button that only appears when the legacy wallet is selected. Maybe we want to support this wallet selection (and the migration button) as a build option disabled by default until it gets more testing as @laanwj also suggests. Then enable it by default or just remove the option and enable it always and eventually, some day, drop support for the legacy wallet entirely. The “alternate repo for the migration tool” would just be the version previous to dropping the support. It could be as short as:

    • 0.11 multiwallet with migration tool
    • 0.12 drop support for legacy wallet.
  37. jonasschnelli commented at 8:47 am on January 25, 2015: contributor

    I did some brainwork in how we could handle the wallet code. Supporting both type of wallets at runtime or as compile option would be possible but some has drawbacks:

    • if we support both wallets, things like multiwallet or bip32 needs implementation in both worlds. Even if only one wallet type can handle things like multiwallet support, the interface needs adaption for supporting this somehow.
    • also how would we handle unit-tests, rpc-tests? Run twice for both wallets? IMO this makes no sense.

    I also did a simple implementation of a possible command line arg to switch the wallet type. Abstract CWallet, implement CLogDBWallet, CBDBWallet. But it just feels after the wrong way.

    I strongly suggest supporting only one wallet type and supply a perfect working migration tool. A clean cut in implementation. IMO this sounds after good invested time/work into the right things.

    There’s no rush in merging the new wallet into the master.

  38. jtimon commented at 2:33 pm on January 25, 2015: contributor

    @jonasschnelli What about

    0void CLegacyWallet::Bip32Interface(args)
    1{
    2    throw "CWallet::Bip32Interface() not implemented by CLegacyWallet";
    3}
    

    ? For testing you can test both for the common interface and only the new one for the new features. Of course, I agree it’s all simpler if no new features are added until the legacy wallet is removed, which I think should be the next release just after the one with the migration tool.

    Another option to avoid having multi-wallet support at all would be to have a release with the new wallet and the migration tool in the form of an “import old wallet from file” option. But I’m not sure how other people will feel about that type of backards support.

    I believe separating a CWallet interface from a CLegacyWallet implementation would already be a gain and a good first step for all cases. I’m not sure if you’re arguing against that, against multi-wallet support, or only against supporting several types of wallets at the same time.

  39. jonasschnelli commented at 7:32 pm on January 25, 2015: contributor

    I thought about the implementation once more. Unless somebody else convince me for another way i think i start with this (maybe this is more or less the original idea of @laanwj):

    • Replace the wallet code inside of the #ifdef WALLET_ENABLE with flexible hooks. Goal: get rid of wallet-code inside of the bitcoind’s general code. We can keep the ifdefs or we could provide a empty “no-wallet” instance to allow the user to run without a wallet.
    • Move everything wallet related to src/wallet/ (rpc dispatch table, unit-test, etc.)
    • Create a clean, more or less abstract, wallet interface outside or src/wallet where wallets can inherit from
    • Add support for the legacy bdb wallet (update to the new interface)
    • Add support for the already more or less done logdb wallet
    • Add a feature flags to wallet-interface so bitcoin-qt can check if a wallet support a such feature like multiwallet, backup-capabilities, bip32, etc.

    This would be more effort then i originally thought, but will finally cut out the wallet-code and place it behind a clean interface.

    Any concerns when going into this direction?

  40. jgarzik commented at 8:34 pm on January 25, 2015: contributor
    It really depends on how ugly the code becomes, how large the maintenance burden becomes, to support two wallets in parallel. It sounds like there will be a lot of “creeping #ifdefs” and semi-duplicated code in an effort to keep the legacy wallet unchanging.
  41. jonasschnelli commented at 8:44 pm on January 25, 2015: contributor

    @jgarzik The legacy-wallet needs mostly move-only operation (move stuff from init.cpp, rpcmisc, etc.) to src/legacywallet/wallet.cpp (which then should provide a new class(name) CLegacyWallet [or similar]). I would see most wallet #ifdefs gone, unless people would find it better to have ifdefs then a CWallet conform “no-wallet” implementation.

    But yes. The legacy-bdb-wallet needs a lot of moving/changes to fit in a new flexible wallet implementation. The drawbacks are time and risks.

  42. laanwj added the label Wallet on Jan 26, 2015
  43. laanwj commented at 5:11 am on January 27, 2015: member

    @jonasschnelli : Sounds good to me. Some notes:

    provide a empty "no-wallet" instance to allow the user to run without a wallet..

    Don’t call it a no-wallet. As with the NotificationInterface, just give it a general name, like ModuleInterface. In this case it’s used to plug in a wallet, but the same mechanism could be used to plug in another module that provides e.g. RPC calls and hooks into the init/shutdown.

    This is not just a hack to support two wallets, but a step toward more modularity. Not sure if you ever saw #3440, but it was a plan in the same direction.

    if we support both wallets, things like multiwallet or bip32 needs implementation in both worlds. Even if only one wallet type can handle things like multiwallet support, the interface needs adaption for supporting this somehow.

    My point was to avoid a common interface by completely separating the implementations. E.g. move the old wallet and RPC code that uses it out of the way.

    If this turns out to not be possible/practical, of course you could decide to not do this.

    I’m just afraid this will mean it will take a long time until the new wallet can be merged. It means it needs to be suitable for all users of the current wallet, instead of initially just experimental users. It also means it will get a lot less testing until it is merged. This has shown to be a problem for pulls that make large changes to the wallet before (like the last round of multi-wallet changes), eventually resulting in the owner giving up and them never being merged. That’s why I proposed this, not to give you extra work.

    But yes. The legacy-bdb-wallet needs a lot of moving/changes to fit in a new flexible wallet implementation.

    OK, my idea was that this would not be needed. Every change potentially introduces bugs. If you need many changes to the legacy wallet, this defeats the purpose of having a stable, unchanged legacy wallet.

  44. jonasschnelli renamed this:
    [Wallet] replace BDB with internal append only (logdb) backend
    [Wallet] replace BDB with internal append only (logdb) backend [under heavy work | not ready]
    on Feb 3, 2015
  45. jonasschnelli force-pushed on Feb 3, 2015
  46. jonasschnelli referenced this in commit 6828277076 on Feb 4, 2015
  47. jonasschnelli commented at 1:13 pm on February 4, 2015: contributor
    Because this is getting to big, i try to form independent easy-reviewable PRs to slowly come towards the new wallet. I first like to move the wallet into a module by decoupling init/RPC/tests and remove the ifdef ENABLE_WALLET (only one ifdef ENABLE_WALLET for loading the module).
  48. jonasschnelli referenced this in commit 7bd633c39a on Feb 4, 2015
  49. jonasschnelli force-pushed on Feb 5, 2015
  50. jonasschnelli renamed this:
    [Wallet] replace BDB with internal append only (logdb) backend [under heavy work | not ready]
    [Wallet] replace BDB with internal append only (logdb) backend
    on Feb 5, 2015
  51. jonasschnelli force-pushed on Feb 5, 2015
  52. logdb: an safe append-only key-value store
    Conflicts:
    	src/makefile.unix
    b42ace679c
  53. [Wallet] integrate internal file backend "logdb"
    - basic integration (needs sanity, etc.)
    - remove openssl SHA256 depenencies and use internal SHA256 methods
    - use internal logging
    
    Conflicts:
    	src/init.cpp
    0f7be5e753
  54. [Wallet] remove hard coded wallet.dat filename 34119f6de9
  55. [Wallet] better flush/close handling of logdb
    - make sure std::map elements gets erased first
    - don't allow oversized keys/values in CLogDB::Write()
    - remove ref count
    - remove mem leak
    f041536454
  56. [Wallet] improve wallet <-> walletdb interface
    - add logdb unit test
    - remove usage of CWalletDB outside of CWallet
    - logdb: add version and file-start headerbytes
    - fix rpc test-framework after removing bdb
    - fix rpc_wallet_tests.cpp
    - correct Makefile and --disable-wallet handling
    - remove bitdb from TestSetup()
    c856ad7f47
  57. [Wallet] support for db rewriting
    - adds encryption and compact write for logdb wallets
    75851f3bdf
  58. jonasschnelli force-pushed on Feb 5, 2015
  59. jonasschnelli force-pushed on Feb 5, 2015
  60. jonasschnelli force-pushed on Feb 5, 2015
  61. jtimon commented at 7:10 pm on February 6, 2015: contributor
    Sounds like a good plan to me. Can you reference those PRs from here as you create them?
  62. jonasschnelli commented at 7:15 pm on February 6, 2015: contributor
    @jtimon currently it’s #5752, #5758, #5744 But also have a look at the “status” page: #5761.
  63. jonasschnelli force-pushed on Feb 9, 2015
  64. [Wallet] tests and cleanup
    - add wallet encryption/unlocking test
    - cleanup logdb, remove of initial header with logdb-file version number
    - remove fileBackend=false option (doesn't make sense anymore)
    - remove wallet max/min version
    bde59e1974
  65. jonasschnelli force-pushed on Feb 9, 2015
  66. pstratem commented at 6:19 am on May 19, 2015: contributor

    A suggestion for a more robust frame format.

    0uint64_t Unique Magic per log file
    1uint64_t Counter
    2uint64_t Length
    3    0xffffffffffffffff indicates length not specified
    4uint64_t Unique Magic per Record
    5uint8_t[Length] Body
    6uint64_t Unique Magic per Record
    7SHA256 hash of Counter|Length|Unique Magic per Record|Body
    
  67. jgarzik commented at 6:44 am on May 19, 2015: contributor
    • on length-not-specified, seems like the majority of cases (100%?) would use 0xffff..
    • it can sometimes be useful to pad the body to a 64-bit etc. alignment
  68. jonasschnelli commented at 6:49 am on May 19, 2015: contributor
    @pstratem but wouldn’t the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?
  69. pstratem commented at 7:00 am on May 19, 2015: contributor

    @jgarzik

    on length-not-specified, seems like the majority of cases (100%?) would use 0xffff..

    The purpose of allowing an unspecified length is extremely long records, those simply do not exist in a bitcoin wallet.

    • Keys
    • Transactions

    it can sometimes be useful to pad the body to a 64-bit etc. alignment

    eh… yeah I guess sometimes, but that should be defined as part of the format for the Body, note everything else is 64 bit aligned

  70. pstratem commented at 7:01 am on May 19, 2015: contributor

    @jonasschnelli

    but wouldn’t the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?

    you’re gonna have to explain that one

  71. jonasschnelli commented at 7:06 am on May 19, 2015: contributor

    but wouldn’t the 2nd and 3rd element (uint64_t Counter and uint64_t Length) break the advantage of a append only format?

    you’re gonna have to explain that one

    Ah now i see! I first thought the counter and length is at the beginning of the file and contains the combined length/counter of all frames within the filestore. But as i read again i saw that you propose the counter and length per frame.

  72. in src/logdb.cpp: in bde59e1974
    0@@ -0,0 +1,614 @@
    1+// Copyright (c) 2012 The Bitcoin developers
    2+// Distributed under the MIT/X11 software license, see the accompanying
    3+// file license.txt or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <stdio.h>
    6+#include <unistd.h>
    7+#include <sys/stat.h>
    8+
    9+#include "logdb.h"
    


    Diapolo commented at 12:41 pm on May 19, 2015:
    Nit: Include group ordering (own headers before system ones). Edit: Also obsolete file/license header…
  73. in src/logdb.cpp: in bde59e1974
     9+#include "logdb.h"
    10+
    11+#define LOGDB_MAX_KEY_SIZE 0x1000
    12+#define LOGDB_MAX_VALUE_SIZE 0x100000
    13+
    14+static const unsigned char logdb_frameheader_magic[4] = {0xB1,0xA0,0xEE,0xC9};
    


    Diapolo commented at 12:42 pm on May 19, 2015:
    Was this introduced here and what is the meaning of this constant?

    jonasschnelli commented at 12:43 pm on May 19, 2015:
  74. pstratem commented at 11:04 pm on May 20, 2015: contributor
    This kind of stuff is the reason for the frame to be more robust. https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg886512.html
  75. jgarzik commented at 5:59 pm on September 15, 2015: contributor

    Leaning towards closing as Work In Progress. General IRC consensus has been “we want this” for a long time, so concept ACK from most developers seems implied, myself included.

    However, this seems more of a longer term project resulting in a long lived PR. Recommend storing this on a branch and opening a mailing list thread to follow development.

  76. jonasschnelli commented at 6:10 pm on September 15, 2015: contributor
    Closing. This is included in my core wallet fork an will be further developed there with a chance of allowing a merge to this repository if it’s once is stable and tested enough.
  77. jonasschnelli closed this on Sep 15, 2015

  78. jonasschnelli commented at 10:21 am on March 9, 2016: contributor

    I started an append only key/value file format in pure C, based on @sipa concept, that aims to be storage format for wallets (https://github.com/libbtc/libbtc/pull/41), suitable for MCUs, smartphones, desktop, etc.

    Just in case someone wants to contribute.

  79. halvors commented at 10:06 pm on April 2, 2017: none
    What’s the status on this? It’s been over a year now, is this idea dead?
  80. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me