Isolate the storage abstraction layer from the application/serialization layer

TheQuantumPhysicist commented at 2:53 pm on April 19, 2022: contributor

This PR separates the storage abstraction layer (e.g., leveldb layer) from the bitcoin core application layer. This has many benefits:

Easier testing for the bitcoin core application layer without requiring an open database
Easier replacement for the storage library in case it’s decided to do it in the future with a simple implementation of an interface
Ensure that any other future implementations can live to the same standard we have now by testing different implementations the same way
Better interface description to understand the expectations from the implementations of the interface
Much easier bench-marking separate from serialization

This change has caused zero modification to the bitcoin core application source code, as I deliberately maintained everything from the outside. This change is a drop-in replacement to everything that uses leveldb.

Things I failed to do without breaking backwards compatibility, so I skipped:

Use DBIndex to decide the prefix of the storage. The problem here is that some key/value database systems provide other mechanisms for separating between different database “tables”, so to say. In our case, we use a char as a prefix. For example, lmdb provides an integer index. The problem here is that when keys are submitted as a Span, prepending a prefix is a relatively expensive operation as it requires copying the whole key. I don’t see how we can do this without that. This is a conflict between the serialization and the storage layer that I couldn’t resolve.

Things I’d love to achieve in the next PR(s):

~~Create an in-memory database that can easily replace leveldb and assist in tests~~ (Given that tests use in-memory db as was pointed out in the comments, we may or may not do this)
Move the block and block-undo storage behind this interface and use DBIndex as a way to tell the implementation how to store the data (in files or in leveldb)

jamesob commented at 3:25 pm on April 19, 2022: member

Create an in-memory database that can easily replace leveldb and assist in tests

FWIW, unittests already run leveldb in-memory: https://github.com/bitcoin/bitcoin/blob/master/src/test/util/setup_common.cpp#L207-L208

DrahtBot added the label Build system on Apr 19, 2022

DrahtBot added the label UTXO Db and Indexes on Apr 19, 2022

DrahtBot commented at 4:56 pm on April 19, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#25202 (log: Use severity-based logging for leveldb/libevent messages, reverse LogPrintLevel order by laanwj)
#25172 (refactor: use std:: prefix for std lib funcs by fanquake)
#24232 (assumeutxo: add init and completion logic by jamesob)
#22663 (dbwrapper: properly destroy objects in case CDBWrapper ctor throws by Crypt-iQ)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

TheQuantumPhysicist commented at 6:31 pm on April 19, 2022: contributor

The functional test wallet_send.py seems to be flaky, unrelated to this PR.

luke-jr commented at 7:16 pm on April 19, 2022: member

Not sure if this is desirable. LevelDB is de facto part of the consensus layer, and may not be safe to just substitute willy-nilly.

TheQuantumPhysicist commented at 8:18 pm on April 19, 2022: contributor

LevelDB is de facto part of the consensus layer

In fact, this PR is the first step to solve that problem. LevelDB should be only a key/value store. While I totally understand your concern, this PR will make it possible to quantify our reliance on LevelDB specifics, if any. In no way is this PR meant to advocate for substituting LevelDB in a negligent fashion. We advocate only for a correct design to better understand the bitcoin protocol.

luke-jr commented at 8:54 pm on April 19, 2022: member

It’s NOT solvable. Adding another layer in the middle can only introduce problems, not solve it.

TheQuantumPhysicist commented at 5:43 am on April 20, 2022: contributor

It’s NOT solvable. Adding another layer in the middle can only introduce problems, not solve it.

Even though I understand your concerns, the claim that this is not solvable requires evidence, which is lacking here. I’m not even fully onboard on the idea that LevelDB is part of the “consensus layer”, but I understand that LevelDB is part of the consensus library; after all it’s a key/value store and it should be just that. The solution to this fear, however, is not keeping it untouched (because it was touched before, quite often over the last decade). But the solution is to quantify how it works and make it part of the protocol and run proper tests, like it has always been done in this repo.

And I’m happy to do more to make this PR more appealing and address any concerns, whether it’s tests or something else.

DrahtBot added the label Needs rebase on Apr 25, 2022

TheQuantumPhysicist force-pushed on May 2, 2022

DrahtBot removed the label Needs rebase on May 2, 2022

Introduce isolation between storage backend and application storage layer 0aa0464c58

TheQuantumPhysicist force-pushed on May 2, 2022

DrahtBot added the label Needs rebase on May 26, 2022

DrahtBot commented at 1:35 pm on May 26, 2022: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

achow101 commented at 7:13 pm on October 12, 2022: member

Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.

achow101 closed this on Oct 12, 2022

TheQuantumPhysicist commented at 7:17 pm on October 12, 2022: contributor

Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.

I am interested. But no one seems to have said anything for me to react.

ryanofsky commented at 7:36 pm on October 12, 2022: contributor

Reopening so this can be discussed. The code change seems very simple, but more reviewers need to weigh in on whether having this abstraction is good or bad. @TheQuantumPhysicist, since the first listed benefit of the change is “Easier testing for the bitcoin core application layer” maybe you could motivate this change a little more by adding new tests that take advantage of the abstraction.

ryanofsky reopened this on Oct 12, 2022

TheQuantumPhysicist commented at 8:55 pm on October 12, 2022: contributor

Reopening so this can be discussed. The code change seems very simple, but more reviewers need to weigh in on whether having this abstraction is good or bad.

@TheQuantumPhysicist, since the first listed benefit of the change is “Easier testing for the bitcoin core application layer” maybe you could motivate this change a little more by adding new tests that take advantage of the abstraction.

I’m more than happy to add tests once there’s agreement on the PR. Quite frankly I’m a little disappointed that no one cared, which is OK, but let’s not get ahead of ourselves adding more code and work before people show the least interest.

And to get on the practicality train, the reviewer can see now there’s abstract interfaces now for storage and iteration over storage, which can be used to define behavior or even mock the storage to simulate errors with something like gmock. It’s not hard to see how interface-behavior can be well-defined and tested.

Cheers!

achow101 commented at 9:37 pm on October 25, 2022: member

I am interested. But no one seems to have said anything for me to react.

This PR has needed rebase for a while. Often reviewers will not look at PRs that need rebasing as such review will be invalidated following rebase, and may also no longer be applicable.

TheQuantumPhysicist commented at 5:46 am on October 26, 2022: contributor

I am interested. But no one seems to have said anything for me to react.

This PR has needed rebase for a while. Often reviewers will not look at PRs that need rebasing as such review will be invalidated following rebase, and may also no longer be applicable.

I’ll take a look and see if It’s not needed anymore.

TheQuantumPhysicist closed this on Oct 29, 2022

bitcoin locked this on Oct 29, 2023

Isolate the storage abstraction layer from the application/serialization layer #24922

Conflicts