Split from #24914 as suggested in #24914#pullrequestreview-1442091917
This PR adds a wallet database cursor that gives a view over all of the records beginning with the same prefix.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--021abf342d371248e50ceaed478a90ca-->
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
<!--174a7506f384e20aa4161008e828411d-->
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
496 | @@ -496,6 +497,9 @@ DatabaseCursor::Status SQLiteCursor::Next(DataStream& key, DataStream& value) 497 | return Status::FAIL; 498 | } 499 | 500 | + key.clear(); 501 | + value.clear();
In commit "wallet: Add GetPrefixCursor to DatabaseBatch" (011087176e8aa3ec86f032c3e4c6bac432d0ac3e)
This change seems like it might belong in the previous commit "walletdb: Consistently clear key and value streams before writing" (5b5c131f9665e21f6dcf109e400926d054dd1fb5)
Done
105 | + 106 | + // Test each supported db 107 | + for (const auto& database : dbs) { 108 | + BOOST_ASSERT(database); 109 | + 110 | + std::vector<std::string> prefixes = {"FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
In commit "walletdb: Consistently clear key and value streams before writing" (5d74d702c16f4e34c96e9573049a2fe930ac935c)
Right now it seems like there is not test coverage for calling GetNewPrefixCursor with an empty prefix. Also, the \xff\xff test case I previously suggested is not testing what I originally thought it would test. Because of the way strings are serialized, there's a compact int value before the prefix so the prefix has other characters beside \xff.
Seeing these things, I wrote a new test case to add more coverage, and wound up finding two corner case bugs in bdb and sqlite code when empty spans are passed around. I think the bugs cannot have real-world side effects, but I'm not sure about this, and think they are probably worth fixing to avoid surprises in the future.
Here are all the changes I would suggest including the new test and two bugfixes
diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
index 8d764911ca5b..68abdcd81e9e 100644
--- a/src/wallet/bdb.cpp
+++ b/src/wallet/bdb.cpp
@@ -698,7 +698,7 @@ DatabaseCursor::Status BerkeleyCursor::Next(DataStream& ssKey, DataStream& ssVal
if (ret == DB_NOTFOUND) {
return Status::DONE;
}
- if (ret != 0 || datKey.get_data() == nullptr || datValue.get_data() == nullptr) {
+ if (ret != 0) {
return Status::FAIL;
}
diff --git a/src/wallet/db.h b/src/wallet/db.h
index 9d684225c343..9d7bceb893f3 100644
--- a/src/wallet/db.h
+++ b/src/wallet/db.h
@@ -49,6 +49,7 @@ private:
virtual bool WriteKey(DataStream&& key, DataStream&& value, bool overwrite = true) = 0;
virtual bool EraseKey(DataStream&& key) = 0;
virtual bool HasKey(DataStream&& key) = 0;
+ friend class DatabaseBatchTest;
public:
explicit DatabaseBatch() {}
diff --git a/src/wallet/sqlite.cpp b/src/wallet/sqlite.cpp
index 9c7ec9721ce2..fe10f911c4ce 100644
--- a/src/wallet/sqlite.cpp
+++ b/src/wallet/sqlite.cpp
@@ -40,7 +40,11 @@ static bool BindBlobToStatement(sqlite3_stmt* stmt,
Span<const std::byte> blob,
const std::string& description)
{
- int res = sqlite3_bind_blob(stmt, index, blob.data(), blob.size(), SQLITE_STATIC);
+ // Pass a pointer to the empty string "" below instead of passing the
+ // blob.data() pointer if the blob.data() pointer is null. Passing a null
+ // data pointer to bind_blob would cause sqlite to bind the SQL NULL value
+ // instead of the empty blob value X'', which would mess up SQL comparisons.
+ int res = sqlite3_bind_blob(stmt, index, blob.data() ? static_cast<const void*>(blob.data()) : "", blob.size(), SQLITE_STATIC);
if (res != SQLITE_OK) {
LogPrintf("Unable to bind %s to statement: %s\n", description, sqlite3_errstr(res));
sqlite3_clear_bindings(stmt);
diff --git a/src/wallet/test/db_tests.cpp b/src/wallet/test/db_tests.cpp
index 14272e2fe441..a22e1381437c 100644
--- a/src/wallet/test/db_tests.cpp
+++ b/src/wallet/test/db_tests.cpp
@@ -20,7 +20,52 @@
#include <memory>
#include <string>
+inline std::ostream& operator<<(std::ostream& os, const std::pair<const SerializeData, SerializeData>& kv)
+{
+ Span key{kv.first}, value{kv.second};
+ os << "(\"" << std::string_view{reinterpret_cast<const char*>(key.data()), key.size()} << "\", \""
+ << std::string_view{reinterpret_cast<const char*>(key.data()), key.size()} << "\")";
+ return os;
+}
+
namespace wallet {
+
+class DatabaseBatchTest
+{
+public:
+ static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value,
+ bool overwrite = true)
+ {
+ return batch.Write(DataStream{key}, DataStream{value}, overwrite);
+ }
+};
+
+static Span<const std::byte> StringBytes(std::string_view str)
+{
+ return AsBytes<const char>({str.data(), str.size()});
+}
+
+static SerializeData StringData(std::string_view str)
+{
+ auto bytes = StringBytes(str);
+ return SerializeData{bytes.begin(), bytes.end()};
+}
+
+static void CheckPrefix(DatabaseBatch& batch, Span<const std::byte> prefix, MockableData expected)
+{
+ std::unique_ptr<DatabaseCursor> cursor = batch.GetNewPrefixCursor(prefix);
+ MockableData actual;
+ while (true) {
+ DataStream key, value;
+ DatabaseCursor::Status status = cursor->Next(key, value);
+ if (status == DatabaseCursor::Status::DONE) break;
+ BOOST_CHECK(status == DatabaseCursor::Status::MORE);
+ BOOST_CHECK(
+ actual.emplace(SerializeData(key.begin(), key.end()), SerializeData(value.begin(), value.end())).second);
+ }
+ BOOST_CHECK_EQUAL_COLLECTIONS(actual.begin(), actual.end(), expected.begin(), expected.end());
+}
+
BOOST_FIXTURE_TEST_SUITE(db_tests, BasicTestingSetup)
static std::shared_ptr<BerkeleyEnvironment> GetWalletEnv(const fs::path& path, fs::path& database_filename)
@@ -86,28 +131,29 @@ BOOST_AUTO_TEST_CASE(getwalletenv_g_dbenvs_free_instance)
BOOST_CHECK(env_2_a == env_2_b);
}
-BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
+static std::vector<std::unique_ptr<WalletDatabase>> TestDatabases(const fs::path& path_root)
{
std::vector<std::unique_ptr<WalletDatabase>> dbs;
-
- // Create dbs
DatabaseOptions options;
DatabaseStatus status;
bilingual_str error;
- std::vector<bilingual_str> warnings;
#ifdef USE_BDB
- dbs.emplace_back(MakeBerkeleyDatabase(m_path_root / "bdb", options, status, error));
+ dbs.emplace_back(MakeBerkeleyDatabase(path_root / "bdb", options, status, error));
#endif
#ifdef USE_SQLITE
- dbs.emplace_back(MakeSQLiteDatabase(m_path_root / "sqlite", options, status, error));
+ dbs.emplace_back(MakeSQLiteDatabase(path_root / "sqlite", options, status, error));
#endif
dbs.emplace_back(CreateMockableWalletDatabase());
+ return dbs;
+}
+BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
+{
// Test each supported db
- for (const auto& database : dbs) {
+ for (const auto& database : TestDatabases(m_path_root)) {
BOOST_ASSERT(database);
- std::vector<std::string> prefixes = {"FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
+ std::vector<std::string> prefixes = {"", "FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
// Write elements to it
std::unique_ptr<DatabaseBatch> handler = database->MakeBatch();
@@ -143,5 +189,31 @@ BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
}
}
+// Lower level DatabaseBase::GetNewPrefixCursor test, to cover cases that aren't
+// covered in the higher level test above. The higher level test uses
+// serialized strings which are prefixed with string length, so it doesn't test
+// truly empty prefixes or prefixes that begin with \xff
+BOOST_AUTO_TEST_CASE(db_cursor_prefix_byte_test)
+{
+ const MockableData::value_type
+ e{StringData(""), StringData("e")},
+ p{StringData("prefix"), StringData("p")},
+ ps{StringData("prefixsuffix"), StringData("ps")},
+ f{StringData("\xff"), StringData("f")},
+ fs{StringData("\xffsuffix"), StringData("fs")},
+ ff{StringData("\xff\xff"), StringData("ff")},
+ ffs{StringData("\xff\xffsuffix"), StringData("ffs")};
+ for (const auto& database : TestDatabases(m_path_root)) {
+ std::unique_ptr<DatabaseBatch> batch = database->MakeBatch();
+ for (const auto& [k, v] : {e, p, ps, f, fs, ff, ffs}) {
+ DatabaseBatchTest::WriteBytes(*batch, k, v);
+ }
+ CheckPrefix(*batch, StringBytes(""), {e, p, ps, f, fs, ff, ffs});
+ CheckPrefix(*batch, StringBytes("prefix"), {p, ps});
+ CheckPrefix(*batch, StringBytes("\xff"), {f, fs, ff, ffs});
+ CheckPrefix(*batch, StringBytes("\xff\xff"), {ff, ffs});
+ }
+}
+
BOOST_AUTO_TEST_SUITE_END()
} // namespace wallet
Adopted these suggestions. I've put the bug fixes in an separate commit.
132 | + key >> key_back; 133 | + BOOST_CHECK_EQUAL(key_back, prefix); 134 | + 135 | + unsigned int value_back; 136 | + value >> value_back; 137 | + BOOST_CHECK_EQUAL(value_back, i);
In commit "walletdb: Consistently clear key and value streams before writing" (5d74d702c16f4e34c96e9573049a2fe930ac935c)
This test case seems to rely on getting values back in a particular order, but I don't think the sqlite implementation actually guarantees rows will be returned in any particular order, since no sorting is requested.
Hmm, indeed. I've changed the test to check against the i stored in the key.
Before writing data to the output key and value streams, make sure they
are cleared.
32 | + 33 | +static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value, 34 | + bool overwrite = true) 35 | +{ 36 | + return batch.Write(DataStream{key}, DataStream{value}, overwrite); 37 | +}
In commit "wallet: Add GetPrefixCursor to DatabaseBatch" (13476fe7bebdbf51e09821850b2c808c8ecf116a)
Maybe add a comment above the batch.Write() call like "Convert the key and value to DataStream objects in order to bypass serialization. We want raw bytes to be written to the database, not serialized byte strings. The DatabaseBatch::Write template method normally serializes its arguments, but because DataStream has a Serialize method that does concatenation instead of serialization, it can be used to bypass serialization."
I was very surprised that this worked and it took me a while to figure out what was happening. I do think it would probably be more straightforward to call the private DatabaseBatch::WriteKey method instead of DatabaseBatch::Write which is why I added the friend class in my earlier suggestion. But I guess this approach should be ok as long it is explained.
It does seem pretty crazy to me that DataStream has a Serialize method that concatenates instead of serializing, and can't be unserialized. But I guess it is handy in this situation and maybe others.
Is the same not true for span itself? The only reason it doesn't work is that it is missing a std::byte specialization?
Is the same not true for span itself?
No because Span<char> is fixed length and can be deserialized, while DataStream is variable length and can't be deserialized.
Reasons why I think it would be good to get rid of DataStream::Serialize method:
DataStream has no Unserialize method and no reasonable way of adding one that would be consistent with the existing Serialize implementation.Serialize method has surprising behavior. If DataStream was going to have a serialize method I would expect to behave same was as serialize methods as other variable length objects (strings, vectors, maps) and be prefixed with a size field and be deserializable into an empty object. I suggested adding a comment here because code using it was not working the way I expected it to, and it took me a while to figure out how it was working at all.Serialize method. If we want a concatenation operator, += should work perfectly well. << is good to use for serialization and formatting, and should in principle be reversible with >>. Concatenation is a different thing.If I remove DataStream::Serialize method in master there are only a few compilation errors, so I think I should be able to make a simple PR to clean things up
I think @MarcoFalke's point was that Spans are serialized without length prefixes, so this function could just pass key and value directly to DatabaseBatch::Write rather than going through a DataStream, as long as there is a Serialize method for Span<const std::byte>. This seems like a more intuitive solution, so I've implemented that.
Oops, I guess I was objecting to a point Marco wasn't making. But I think I would still like to avoid overloading Serialize for std::byte spans. The whole point of std::byte is that it's supposed to be a very safe type which requires you to be deliberate and explicit about conversions. So I'm not sure it would be great to serialize std::byte spans (or types which can be converted to byte spans) as raw bytes without requiring a more explicit cast.
Maybe it would make sense to change serialize.h in the future, but I think it would be best to leave it alone in this PR. Would suggest reverting serialize.h and just doing:
--- a/src/wallet/test/db_tests.cpp
+++ b/src/wallet/test/db_tests.cpp
@@ -30,12 +30,6 @@ inline std::ostream& operator<<(std::ostream& os, const std::pair<const Serializ
namespace wallet {
-static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value,
- bool overwrite = true)
-{
- return batch.Write(key, value, overwrite);
-}
-
static Span<const std::byte> StringBytes(std::string_view str)
{
return AsBytes<const char>({str.data(), str.size()});
@@ -203,7 +197,7 @@ BOOST_AUTO_TEST_CASE(db_cursor_prefix_byte_test)
for (const auto& database : TestDatabases(m_path_root)) {
std::unique_ptr<DatabaseBatch> batch = database->MakeBatch();
for (const auto& [k, v] : {e, p, ps, f, fs, ff, ffs}) {
- WriteBytes(*batch, k, v);
+ batch->Write(MakeUCharSpan(k), MakeUCharSpan(v));
}
CheckPrefix(*batch, StringBytes(""), {e, p, ps, f, fs, ff, ffs});
CheckPrefix(*batch, StringBytes("prefix"), {p, ps});
Good point, done as suggested.
But I think I would still like to avoid overloading Serialize for std::byte spans. The whole point of std::byte is that it's supposed to be a very safe type which requires you to be deliberate and explicit about conversions.
Agree on this. For raw C-Arrays where the length is denoted in the type, it shouldn't cause any issues. Probably the same for C++20 spans that are fixed size (but our Span doesn't have that feature).
No because
Span<char>is fixed length
Are you sure? One can call subspan(1) to change the length.
So I'm not sure it would be great to serialize std::byte spans (or types which can be converted to byte spans) as raw bytes without requiring a more explicit cast.
What about removing the Span overloads and replacing them with one that forces the call site to do a conversion, such as MakeSerSpan(a)? Though, that will probably require introducing a 4th type-safe type for a byte.
I think I agree that current Serialize(Span<char>) overloads could be a little dangerous:
Because if a variable-length container could be implicitly converted to span of these types, it might incorrectly match one of these overloads and be serialized as a fixed length span without a length prefix, and no longer be deserializable as the original container.
If that is a real problem, we could probably fix it by replacing those overloads with:
template<typename Stream, typename T> inline void Serialize(Stream& s, const Span<T>& span);
template<typename Stream, typename T> inline void Unserialize(Stream& s, Span<T>& span);
Since these would only match actual Span objects, and would not match arguments implicitly convertible to Span. I think this would be as safe as your MakeSerSpan idea and not require introducing a new type.
I also agree that the Span<char> char type is different char[N] type because one is a fixed length type and the other just has fixed length objects (subspan method returns a new object, it doesn't change the size of an existing object). But the point about deserialization applies to both of cases, because you can deserialize both char[N] objects and Span<char> objects without a length prefix.
Makes sense. The same approach is used by AsBytes or std::as_bytes
Actually, I don't think a silent unintended conversion can happen, because if there is no exact match, it will fallback to calling the Serialize method, see https://github.com/bitcoin/bitcoin/blob/7d65e3372fc7e08fccd7babb3aa66351e0e9442d/src/serialize.h#L693-L700
So I think we should just go with my original suggestion to enable Span<std::byte> serialization?
Span<std::byte> serialization added in https://github.com/bitcoin/bitcoin/pull/27927
Code review ACK 13476fe7bebdbf51e09821850b2c808c8ecf116a
163 | + for (const auto& prefix : prefixes) { 164 | + DataStream s_prefix; 165 | + s_prefix << prefix; 166 | + std::unique_ptr<DatabaseCursor> cursor = handler->GetNewPrefixCursor(s_prefix); 167 | + DataStream key; 168 | + CDataStream value(SER_DISK, CLIENT_VERSION);
DataStream value;
nit (feel free to ignore)
Done
In order to get records beginning with a prefix, we will need a cursor
specifically for that prefix. So add a GetPrefixCursor function and
DatabaseCursor classes for dealing with those prefixes.
Tested on each supported db engine.
1) Write two different key->value elements to db.
2) Create a new prefix cursor and walk-through every returned element,
verifying that it gets parsed properly.
3) Try to move the cursor outside the filtered range: expect failure
and flag complete=true.
Co-Authored-By: Ryan Ofsky <ryan@ofsky.org>
Co-Authored-By: furszy <matiasfurszyfer@protonmail.com>
Code review ACK ba616b932cb9e9adb7eb9f1826caa62ce422a22d. Just suggested changes since last review
130 | +#ifdef USE_BDB 131 | + dbs.emplace_back(MakeBerkeleyDatabase(path_root / "bdb", options, status, error)); 132 | +#endif 133 | +#ifdef USE_SQLITE 134 | + dbs.emplace_back(MakeSQLiteDatabase(path_root / "sqlite", options, status, error)); 135 | +#endif
Could:
for (const DatabaseFormat& db_format : DATABASE_FORMATS) {
dbs.emplace_back(MakeDatabase(path_root / strprintf("%d", db_format).c_str(), options, status, error));
BOOST_ASSERT(status == DatabaseStatus::SUCCESS);
}
559 | + break; 560 | + } 561 | + if (it == end_range.rend()) { 562 | + // If the prefix is all 0xff bytes, clear end_range as we won't need it 563 | + end_range.clear(); 564 | + }
Noticed that this isn't covered by the tests.
In the test, all prefixes are serialized into a DataStream and then provided to GetNewPrefixCursor, so the size is always part of the data. Therefore, we never get up to this point.
Still, it doesn't seems to be reachable right now; As the db handler write function always serializes data, the size is always there.
But.. for the sake of test coverage completeness and leave nothing to chance, made a commit that exercises it: https://github.com/furszy/bitcoin-core/commit/1ace124d2e5dd35b3bbe882512a425810b0fb83e. Which is passing, so great.
Not sure if it worth to include it, as it's including a new RawWrite function to skip the serialization step to not add the size..
we might get rid of it by subclassing the sqlite in the test and adding a custom Write function and also cleanup the code further by sharing part of the code with the loop that is above but.. still not sure.
Noticed that this isn't covered by the tests.
If this is true, it seems like a bug in the test because two of the CheckPrefix lines were written specifically to cover this case.
I think there is no need for a RawWrite method because spans of bytes should be serialized by just appending the bytes.
oh well, I noticed it while was checking the first test, and went deeper without noticing the second test existence :man_facepalming:. What a day.. forget all what I said above. Thanks for the quick heads up.
ACK ba616b93
Nice db_cursor_prefix_byte_test test case 👌🏼.