Sqlite wallet storage

Sjors commented at 4:25 pm on May 8, 2020: member

This has been brought up a few times since at least 2018, e.g. here, here, but I never bothered to open an issue.

Some cherry-picked arguments in favor:

19:47:38 sqlite is also nice in that they provide a monolithic source file and encourage direct integration.

19:54:05 integrating sqlite into a project is trivial, indeed can be done as a single .cpp file if that’s desirable

19:54:33 any reason to not consider postgres for instance? 19:54:38 AHHHH 19:54:41 promag: god why

19:55:45 Also BDB is a compile pitfall

We could make this opt-in in a first release. Then after a few releases make it the default. That way users can still downgrade their node without losing access to the wallet.

It should be possible to ./configure without BDB, but we’ll probably have to ship with BDB 4.8 (at least the ability to read) until the heat death of the universe.

Sjors added the label Feature on May 8, 2020

achow101 commented at 4:37 pm on May 8, 2020: member

This is something I’m actively working on.

Before we can move to a new datbase, we need to cleanup and refactor our database handling code. It’s really convoluted right now and not well encapsulated.

ryanofsky commented at 6:10 pm on May 8, 2020: member

This is something I’m actively working on.

Before we can move to a new datbase, we need to cleanup and refactor our database handling code. It’s really convoluted right now and not well encapsulated.

This is great! On the code cleanup, I’d be curious to know what some ideas are there, and in particular if you think a change like #18608, specifically the walletdb.h / walletdb.cpp changes there goes in the right direction or wrong direction. (You can ignore the discussion in that PR where I lose my sanity arguing with luke 😁 ).

I think the originally intended structure where db.h / db.cpp provides an API for reading and writing key value blobs, and walletdb.h / walletdb.cpp handles turning higher types like CKeyMetadata, CPubKey, CKeyMetadata, CMasterKey, CWalletTx, CTxDestination into blobs on behalf of wallet application code is a sane basic structure, that we’re not just following very well right now. The startup / verify / recovery code is probably the worst of it.

This is just one model though. Another type of model I could think of (not pushing for it) is one where instead of the data layer only handling persistent storage, and application code indexing its own state in memory, the data layer handles persistent storage and memory, and the wallet code queries and writes information to that.

achow101 commented at 6:43 pm on May 8, 2020: member

This is great! On the code cleanup, I’d be curious to know what some ideas are there, and in particular if you think a change like #18608, specifically the walletdb.h / walletdb.cpp changes there goes in the right direction or wrong direction. (You can ignore the discussion in that PR where I lose my sanity arguing with luke grin ).

I think its in the right direction, but that’s also an application level change that is orthogonal to what I’d like to do.

I think the originally intended structure where db.h / db.cpp provides an API for reading and writing key value blobs, and walletdb.h / walletdb.cpp handles turning higher types like CKeyMetadata, CPubKey, CKeyMetadata, CMasterKey, CWalletTx, CTxDestination into blobs on behalf of wallet application code is a sane basic structure, that we’re not just following very well right now. The startup / verify / recovery code is probably the worst of it.

Indeed. I’d like to move towards that. I have some notes on the startup/verify/recovery stuff: https://gist.github.com/achow101/cda3ea1bb07f585e56caaf969e842188 and how I would like to change it (most of that is just me trying to follow how it all works).

ryanofsky commented at 6:57 pm on May 8, 2020: member

Indeed. I’d like to move towards that. I have some notes on the startup/verify/recovery stuff: https://gist.github.com/achow101/cda3ea1bb07f585e56caaf969e842188 and how I would like to change it (most of that is just me trying to follow how it all works).

Wow, I love this! Description of current code was surprisingly fun to read, and in your plans I see you’re focusing on the blob storage layer so #18608 is orthogonal. Only feedback on the gist is I’m not sure what you get out of combining the Database & Batch classes (obviously the Environment class is a Berkeley specific detail and should go away or be hidden in the bdb implementation). But I guess having just Database objects without Batch objects is not a problem, as nothing in the wallet really relies on atomic writes.

achow101 commented at 8:15 pm on May 8, 2020: member

Only feedback on the gist is I’m not sure what you get out of combining the Database & Batch classes

The goal is to kind of emulate the typical database api where reads and writes are functions of the database. Essentially just to have it be a wrapper around Db for now. But what we have now with Database kind of doesn’t make sense. Batch uses the Db from Database. It’s the one that opens and closes it for some reason. It also calls private functions in Database. So what we have now kind of feels like the same class anyways.

ryanofsky commented at 8:35 pm on May 8, 2020: member

Yes current code is a little unusual because Batch doesn’t just do a transaction, but actually opens and closes the db object.

The goal is to kind of emulate the typical database api where reads and writes are functions of the database.

It is pretty normal part of database APIs to have one object connecting to the database, and a different object doing reads and writes and fancier things like creating transactions and batching operations. E.g. in the sqlite python API you have connection and cursor objects, and in the sqlite C API you have connection and statement objects.

But that’s just explaining my reaction, no need to overthink this

Sjors commented at 9:42 am on May 9, 2020: member

There was discussion in yesterdays IRC wallet meeting about Sqlite vs. Berkeley DB, forward and backwards compatibility, whether dead projects are good, etc: http://www.erisian.com.au/meetbot/bitcoin-core-dev/2020/bitcoin-core-dev.2020-05-08-19.00.log.html

jonatack commented at 12:37 pm on May 9, 2020: member

There was discussion in yesterdays IRC wallet meeting about Sqlite vs. Berkeley DB, forward and backwards compatibility, whether dead projects are good, etc: http://www.erisian.com.au/meetbot/bitcoin-core-dev/2020/bitcoin-core-dev.2020-05-08-19.00.log.html @Sjors, perhaps add that link to the PR description, as well as http://www.erisian.com.au/bitcoin-core-dev/log-2020-05-05.html#l-100

TheBlueMatt commented at 0:47 am on May 10, 2020: member

None of the “arguments in favor” listed in the opener (or, seemingly, in the discussion on IRC) there seem to be “arguments in favor”. They’re more like “if bdb is causing issues, this is a good alternative because its designed more for our use-case”. Of course, that is predicated on BDB causing issues. I’m not sure that the existing issues that we’ve seemingly pretty much fully worked around are worth the legwork to switch? What am I missing?

pstratem commented at 3:39 pm on May 10, 2020: contributor

The current model is to load the entire contents of the wallet database into memory on load. Supporting sqlite only really makes sense if this model is changed. However it doesn’t seem likely to me that performance can be held anywhere near close to current if we’re actively querying a sqlite database.

I can’t find it now but there was an issue opened about using an append only journal file for the wallet database. That seems to match our usage much better than sqlite does.

achow101 commented at 4:58 pm on May 11, 2020: member

None of the “arguments in favor” listed in the opener (or, seemingly, in the discussion on IRC) there seem to be “arguments in favor”. They’re more like “if bdb is causing issues, this is a good alternative because its designed more for our use-case”. Of course, that is predicated on BDB causing issues. I’m not sure that the existing issues that we’ve seemingly pretty much fully worked around are worth the legwork to switch? What am I missing?

I would agree if this was just brought up without other wallet changes occurring simultaneously.

The reason I had originally suggested this on IRC recently was because we had recently merged descriptor wallets. My idea was to introduce a new storage database along with descriptor wallets for descriptor wallets only. The reason is that changing BDB for something else is something that we had been wanting to do for a while. It seemed reasonable to me to do this at the same time we introduced a wholly new, backwards incompatible wallet feature like descriptor wallets. So what I had wanted was this to be for descriptor wallets only to go along with the fact that descriptor wallets were completely new.

The current model is to load the entire contents of the wallet database into memory on load. Supporting sqlite only really makes sense if this model is changed. However it doesn’t seem likely to me that performance can be held anywhere near close to current if we’re actively querying a sqlite database.

I would like for us to use some features provided by sqlite and relational databases in general. Not initially, but in the future.

I can’t find it now but there was an issue opened about using an append only journal file for the wallet database. That seems to match our usage much better than sqlite does.

#5686 is the previous PR. But I’m not convinced that append only really fits what we do as we don’t use the database strictly append only. There are times when things are being rewritten and updated. Additionally I’m not convinced that a homegrown append only format provides enough guarantees of consistency and non-corruption. And I couldn’t find any such database in existence already.

MarcoFalke added the label Wallet on May 19, 2020

MarcoFalke added this to the milestone 0.21.0 on May 19, 2020

TheBlueMatt commented at 7:20 pm on May 21, 2020: member

The reason is that changing BDB for something else is something that we had been wanting to do for a while

It is? Why? Its come up, but never with any real justification that I’ve seen. Only “wouldnt it be nice”. There doesn’t appear to be a feature that we want from a non-BDB database, nor any issues we’ve having with BDB. Its not like we can drop the BDB dependency, either, given we’d need to be able to load old wallets, so its just yet another thing.

jgarzik commented at 7:28 pm on May 21, 2020: contributor

Some old historical info, Mailing list thread: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011604.html Code reference: https://github.com/jgarzik/bitcoin/tree/2015_sqlite

It’s not addressing the same goals as this issue, but worth including for background.

With regards to BDB dependency, technically it can be dropped from the main binary via a conversion tool, possibly even the repo, if the tool is in a separate repo. Regardless, @TheBlueMatt ’s point is valid.

sipa commented at 7:29 pm on May 21, 2020: member

I think it’s concerning that the BDB we’re using now is 10 years old, and even the latest version we can reasonably support (5.3) is 7 years old. I think that on its own makes it worth considering alternatives.

TheBlueMatt commented at 7:46 pm on May 21, 2020: member

I think it’s concerning that the BDB we’re using now is 10 years old

I don’t entirely disagree, but I also haven’t heard of any issues with it? If we were using a bunch of its features like cross-process db use etc or seeing real performance drag from it I might agree, but, to my knowledge, we haven’t seen a single issue that appears likely to have been cause by BDB (the possible interaction of log files not being copied with the wallet.dat aside, and we’ve largely fixed that with constant flushing and wallet folders instead of files).

achow101 commented at 8:40 pm on May 21, 2020: member

It is? Why?

I think there are three primary reasons to move off of BDB.

It does not fit our use case and there are still some issues with potential corruption and data loss.

BDB seems to be more designed for large, multithreaded and multiprocess applications with multiple things trying to access the database at once. It isn’t designed to be a single portable application file format like we are trying to do with the wallet file. It isn’t designed for the idea of having a single file. This means that for us to achieve this goal, we are constantly flushing and have to have workarounds to achieve this goal. It would be better if we could use something that is designed to be a single portable application file format.

Even with all of the workarounds that we do, there are still some issues with flushing (albeit in very edge cases). An example of this came up in #18844 which tried to bump BDB to 5.3. There, the feature_backwards_compatibility.py test was copying a wallet.dat file to be opened in a previous release, but without unloading the wallet file. What happened was that even though we have aggressive flushing, not all the changes were actually being flushed to the wallet.dat file. So the wallet.dat file that was copied didn’t actually have the correct data written to it before being loaded into the previous release. Because the log files were not compatible, the logs which contained what had actually been written were being removed so none of the data ended up being persisted and readable by the previous release. This is, of course, data loss and also resulted in the test failing. While this is an edge case that we can’t see in a release, it just shows how BDB doesn’t fit our usage.

Because we are trying to use it in a way it perhaps wasn’t intended to be used, we have tons of workarounds and magic that makes the code developer unfriendly and scary to touch

Furthermore, the way that we use BDB is very fragile. Changing the implementation would probably result in a regression. This makes it difficult to improve and change things. It’s just workarounds on workarounds on workarounds which makes it a scary thing to even consider touching. I don’t think we should have stuff like that in the codebase. We shouldn’t be have something that a refactor could accidentally introduce a regression that is both hard to test and debug. The complexity of our implementation seems to be largely due to the workarounds that we have in place to try to make BDB do what we want it to do.

For example, in trying to refactor the storage code so that we can support other storage backends, I’ve somehow caused travis to fail even though the refactor shouldn’t be changing behavior and is largely a bunch of moves.

It’s old, unsupported, and unmaintained.

Some people argue that this is a good thing because it’s a stable API and that moving to something that is maintained would mean we risk having changing APIs and behavior changing unexpectedly. But we can simply pin to a specific version of whatever library and get the same thing. But I think that being able to use up to date and maintained versions of a library is a good thing. The fact that BDB 4.8 is now over 10 years old should itself be something to consider. Additionally, we have to apply patches to BDB just to make it compile in the depends system with C++11. I can only imagine that this will get worse as C++ and compilers change. I would rather we use a library that has the requisite changes to work with modern libraries and has had testing to ensure that the behavior is still the same.

As for why SQLite specifically, it is designed for our use case (application file format is an advertised use case of SQLite) and makes the workarounds that we have had with BDB unnecessary. I’ve already done most of the implementation of a SQLite storage backend and it’s much simpler and easier to follow than the BDB stuff. Because SQLite does not require the use of a database environment like BDB does, and it does not use a shared cache by default, we don’t need the workarounds of separate directories and unique fileids that we currently use for BDB. This itself would reduce a lot of complexity.

Because SQLite doesn’t use a persistent log file, we don’t have the issue of users forgetting log files when copying their wallets. The log files that SQLite generates are cleaned up after each transaction. This means that once everything is flushed, it all ends up in the database file, not just in the log file as BDB does. There is additionally no need for force flushing or periodic flushing. This would let us remove a scheduler thread for periodic flushes when we remove BDB. The edge case I mentioned above would no longer be an issue as once SQLite says the data is written, it is actually written to the database file, not the log file. So copying a wallet file while it is still in use, while still not recommended, wouldn’t result in data loss.

And of course SQLite is still maintained with the code released in the public domain so Licensing shouldn’t be an issue for us as it was with BDB.

While we would still need to keep BDB around for a while after introducing and switching to a new database system, we can still allow for migration without needing to have BDB. I’ve done some digging into the database file format and it’s actually fairly simple. We could implement our own BDB file deserializer and just pull the key/value pairs from the file directly and then import them into whatever new database we use. I’ve even implemented such a deserializer in python for tests in #18836. So we can still do migrations without needing BDB. BDB would only be necessary if we want to continue allowing people to keep using old wallet files without migrating.

MarcoFalke commented at 0:04 am on May 22, 2020: member

There are quite a few known issues with bdb:

Memory leaks, data races, and undefined behaviour. I don’t think anyone actually took the time to confirm if those are false positives are actual issues. However, the general workflow when running bitcoin core with sanitizers enabled is to disable the wallet. In light of the database corruptions we see in wallets, I wouldn’t be surprised if this was caused by bdb itself and not necessarily bad hardware. Just three recent issues from the last two weeks for reference: #19034, #19036 #18944 (comment). Someone should probably go back further than two weeks to see if there are more wallet related issues that hint to file corruption.
Then there is the known issue that salvagewallet will throw away privkeys when it shouldn’t. #7379 #7463 #10991 …
Then the disk grinding issue, writing the whole wallet file every 2 seconds to disk. #7475 (comment) #2511 (comment) #10236 …

I am happy to dig out more bug reports, but as an illustration those should suffice.

fanquake referenced this in commit 5879bfa9a5 on Jun 2, 2020

sidhujag referenced this in commit e2b11597e6 on Jun 2, 2020

ComputerCraftr referenced this in commit 48199f2dc7 on Jun 16, 2020

meshcollider added this to the "Design" column in a project

gburd commented at 1:58 pm on August 6, 2020: none

As a former employee for 10 years of Sleepycat Software (5 before Oracle, 5 after acquisition) consider me a resource regarding BerkeleyDB. My feelings are mixed, but I support a move away from BerkeleyDB as Oracle’s stewardship seems to be at odds with the goals for this software (read: AGPLv3 ain’t going to work and no one is maintaining the last 5.x release or any previous release for that matter AFAIK).

Historically, as is the case here, most issues with BDB are due more to misunderstanding the API in one way or another. Those are real bugs in that they are repeated by many developers (read: the API is complex, so hard to get right). BerkeleyDB’s locking subsystem is fantastic, but built when single core CPUs were the norm and cache lines we’re not something one had to worry about. WiredTiger, authored by primary engineers from Sleepycat after leaving post-acquisition, sorted this out using a different approach to coordination, but that library is also AGPLv3, so not an option.

LMDB is about as close to a solid drop-in replacement for BDB as you can find today, Howard Chu will certainly back that up (he’s vocal, but a super solid person/engineer). Howard has spent time replicating the BDB API and includes features for transactions. He’s also used a locking approach that fundamentally works better (read: scales) on today’s CPUs. As he’s using it in OpenLDAP, an 99/1 read/write workload, it should be very optimized for storage of the chain data which is also a very read-skewed workload. I think it’s used in Monero today for such things.

Sqlite is certainly an amazing piece of software, Dr Hipp and team are wonderful and talented people and their work has more than achieved critical mass and quality to be considered a solid replacement. I had the pleasure of working with him while at Oracle as we tried to port his SQL layer onto BerkeleyDB’s BTREE and create a “DBSQL” product.

The use of LevelDB for other parts of this project makes me shudder a bit due to my time at Basho working on our fork of that codebase and my knowledge of its shortcomings when compared to LMDB, BerkeleyDB or its more mature fork: RocksDB.

If I were the decision maker in this regard I’d migrate this code away from LevelDB and BerkeleyDB to either: a) LMDB for all the storage if SQL queries and relational storage isn’t required, but only a “nice to have” b) LMDB or RocksDB where LevelDB is used today and Sqlite for the wallet if SQL is somehow a requirement

Feel free to contact me personally to chat or ignore me entirely, just thought I’d drop in and leave some friendly unsolicited advice. :)

achow101 commented at 4:41 pm on August 6, 2020: member

@gburd Thanks for the insight! I had considered LMDB but it seems like the database format isn’t portable. IIUC it’s architecture dependent, but for the wallet we need this to be architecture independent, so LMDB doesn’t work for us there.

hyc commented at 6:50 pm on August 6, 2020: none

@gburd thanks for the ping. LMDB’s file format is mainly dependent on endianness, but since all CPUs in common use today are Little-Endian it doesn’t really matter. The default build on 32bit machines also uses smaller page numbers than on 64bit machines, but e.g. in Monero we build it with 64bit support across 32bit and 64bit machines, so the format is fully compatible across architectures. In short, the concern you raise is a non-issue.

achow101 commented at 7:37 pm on August 6, 2020: member

but since all CPUs in common use today are Little-Endian it doesn’t really matter

Even so, we do officially support big endian systems, we even have a Travis machine for that!

Edit: There’s also plans to make PowerPC builds for both little and big endian: #14066

hyc commented at 7:42 pm on August 6, 2020: none

Note that you can use mdb_dump on the original machine and mdb_load on the target machine to import across architectures.

Sjors commented at 4:29 pm on August 7, 2020: member

I think #19137 reduces the need for architecture-independence.

hyc commented at 4:49 pm on August 7, 2020: none

That’s definitely the better approach. It’s stupid to pay the performance penalty 100% of the time on the less-favored architecture, for a compatibility issue that only arises once in a very long while.

Side note: LMDB’s mdb_load is fully compatible with BDB’s db_dump output.

gburd commented at 4:52 pm on August 7, 2020: none

I’ll translate this aside from Howard, I think he’s saying that “migration from BDB to LMDB would be fairly straight forward as you can dump a BDB database using db_dump tool or API and then LMDB can load that into a new LMDB format database with all the same data.”

-greg

On Aug 7, 2020, at 12:49 PM, hyc notifications@github.com wrote:

Side note: LMDB’s mdb_load is fully compatible with BDB’s db_dump output.

achow101 commented at 6:49 pm on August 7, 2020: member

@gburd @hyc I appreciate the input, but at this point, I think we will stick to our original plan of using sqlite for the wallet. There has already been significant work put into implementing sqlite storage and the potential for incompatible data files with LMDB presents a significant challenge in convincing other contributors that it is suitable for use in the wallet (there was a long discussion about this on IRC several months ago when I first proposed switching off of BDB). Additionally, the wallet barely uses the database so performance isn’t really a concern to us right now (everything is read into memory at loading time so there are basically no reads during normal operation. Writes happen fairly infrequently too).

At the very least, the way that we have restructured the database storage code in the wallet will let us introduce further database systems if we want to. So it shouldn’t be that hard to add/switch to LMDB if we find that the performance is needed.

sipa commented at 8:18 pm on August 7, 2020: member

@gburd @hyc Thank you both for chiming in!

I feel perhaps this discussion is confusing the requirements for the wallet file vs. our other database uses (chainstate, indexes).

For the wallet:

Don’t need performance. Right now, the entire wallet is effectively loaded in memory, and writes are rare. The everything-in-memory loading may change at some point, but even if that happens, it won’t be for frequently-read things.
Backward compatibility. Wallet files often live years, can be restored from long-forgotten backups, … which ideally Just Work(tm). For BDB this is effectively solved by fixing the library version used. Before that happened, the db_dump + db_load that was sometimes necessary actually caused problems.
Forward compatibility. To a lesser extent, people do sometimes downgrade software when some incompatibility with their setup was used.
Portability/convenience. Moving wallet files between systems does happen, even across architectures. The current hack we need where BDB’s log files are erased on shutdown, and that it effectively means wallets are directories rather than single files is somewhat annoying.
Features. Currently the wallet is a dumb key/value store, but I believe @achow101 likes keeping the possibility open to use more advanced queries at some point.
Stability. Files are ideally resilient against hardware failures etc.

I believe sqlite is a pretty good choice for all of this.

Things are different for the chainstate and other application data, where performance is very important, and with the assumeutxo work we’ll get the ability to serialize/deserialize chainstate data in a neutral dump format, reducing the need for cross-platform compatibility.

I think LMDB and RocksDb are reasonable candidates to consider for this at some point.

hyc commented at 10:14 pm on August 7, 2020: none

Thanks for clarifying. Fwiw, the Monero wallet currently works the same way - read entirely into RAM, rewritten entirely out to disk whenever anything is changed. We have an open issue to port it to LMDB as well, because rewriting the entire file is proving to be unwieldy for users with large/busy wallets (e.g. exchanges). One key factor to consider here is that BDB supports page-level encryption, and there’s an unreleased patch for LMDB that does also. The LMDB-based Monero wallet will be using this feature.

Not to be too prejudicial, but I think you’ll find that RocksDB is unworkable. Its memory/cache/tuning requirements are ridiculous, and its need for huge numbers of open file descriptors (same as LevelDB) is ludicrous for a process that already needs a large number of descriptors for network sockets. (But feel free to disregard my obviously biased opinion and test this for yourselves…)

meshcollider commented at 7:18 am on October 15, 2020: contributor

#19077 is merged so this can be closed 🎉

meshcollider moved this from the "Design" to the "Done" column in a project

meshcollider closed this on Oct 15, 2020

DrahtBot locked this on Feb 15, 2022

Sqlite wallet storage #18916