Split from #24914 as suggested in #24914#pullrequestreview-1442091917
This PR adds a wallet database cursor that gives a view over all of the records beginning with the same prefix.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
496@@ -496,6 +497,9 @@ DatabaseCursor::Status SQLiteCursor::Next(DataStream& key, DataStream& value)
497 return Status::FAIL;
498 }
499
500+ key.clear();
501+ value.clear();
In commit “wallet: Add GetPrefixCursor to DatabaseBatch” (011087176e8aa3ec86f032c3e4c6bac432d0ac3e)
This change seems like it might belong in the previous commit “walletdb: Consistently clear key and value streams before writing” (5b5c131f9665e21f6dcf109e400926d054dd1fb5)
105+
106+ // Test each supported db
107+ for (const auto& database : dbs) {
108+ BOOST_ASSERT(database);
109+
110+ std::vector<std::string> prefixes = {"FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
In commit “walletdb: Consistently clear key and value streams before writing” (5d74d702c16f4e34c96e9573049a2fe930ac935c)
Right now it seems like there is not test coverage for calling GetNewPrefixCursor with an empty prefix. Also, the \xff\xff
test case I previously suggested is not testing what I originally thought it would test. Because of the way strings are serialized, there’s a compact int value before the prefix so the prefix has other characters beside \xff
.
Seeing these things, I wrote a new test case to add more coverage, and wound up finding two corner case bugs in bdb and sqlite code when empty spans are passed around. I think the bugs cannot have real-world side effects, but I’m not sure about this, and think they are probably worth fixing to avoid surprises in the future.
Here are all the changes I would suggest including the new test and two bugfixes
0diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
1index 8d764911ca5b..68abdcd81e9e 100644
2--- a/src/wallet/bdb.cpp
3+++ b/src/wallet/bdb.cpp
4@@ -698,7 +698,7 @@ DatabaseCursor::Status BerkeleyCursor::Next(DataStream& ssKey, DataStream& ssVal
5 if (ret == DB_NOTFOUND) {
6 return Status::DONE;
7 }
8- if (ret != 0 || datKey.get_data() == nullptr || datValue.get_data() == nullptr) {
9+ if (ret != 0) {
10 return Status::FAIL;
11 }
12
13diff --git a/src/wallet/db.h b/src/wallet/db.h
14index 9d684225c343..9d7bceb893f3 100644
15--- a/src/wallet/db.h
16+++ b/src/wallet/db.h
17@@ -49,6 +49,7 @@ private:
18 virtual bool WriteKey(DataStream&& key, DataStream&& value, bool overwrite = true) = 0;
19 virtual bool EraseKey(DataStream&& key) = 0;
20 virtual bool HasKey(DataStream&& key) = 0;
21+ friend class DatabaseBatchTest;
22
23 public:
24 explicit DatabaseBatch() {}
25diff --git a/src/wallet/sqlite.cpp b/src/wallet/sqlite.cpp
26index 9c7ec9721ce2..fe10f911c4ce 100644
27--- a/src/wallet/sqlite.cpp
28+++ b/src/wallet/sqlite.cpp
29@@ -40,7 +40,11 @@ static bool BindBlobToStatement(sqlite3_stmt* stmt,
30 Span<const std::byte> blob,
31 const std::string& description)
32 {
33- int res = sqlite3_bind_blob(stmt, index, blob.data(), blob.size(), SQLITE_STATIC);
34+ // Pass a pointer to the empty string "" below instead of passing the
35+ // blob.data() pointer if the blob.data() pointer is null. Passing a null
36+ // data pointer to bind_blob would cause sqlite to bind the SQL NULL value
37+ // instead of the empty blob value X'', which would mess up SQL comparisons.
38+ int res = sqlite3_bind_blob(stmt, index, blob.data() ? static_cast<const void*>(blob.data()) : "", blob.size(), SQLITE_STATIC);
39 if (res != SQLITE_OK) {
40 LogPrintf("Unable to bind %s to statement: %s\n", description, sqlite3_errstr(res));
41 sqlite3_clear_bindings(stmt);
42diff --git a/src/wallet/test/db_tests.cpp b/src/wallet/test/db_tests.cpp
43index 14272e2fe441..a22e1381437c 100644
44--- a/src/wallet/test/db_tests.cpp
45+++ b/src/wallet/test/db_tests.cpp
46@@ -20,7 +20,52 @@
47 #include <memory>
48 #include <string>
49
50+inline std::ostream& operator<<(std::ostream& os, const std::pair<const SerializeData, SerializeData>& kv)
51+{
52+ Span key{kv.first}, value{kv.second};
53+ os << "(\"" << std::string_view{reinterpret_cast<const char*>(key.data()), key.size()} << "\", \""
54+ << std::string_view{reinterpret_cast<const char*>(key.data()), key.size()} << "\")";
55+ return os;
56+}
57+
58 namespace wallet {
59+
60+class DatabaseBatchTest
61+{
62+public:
63+ static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value,
64+ bool overwrite = true)
65+ {
66+ return batch.Write(DataStream{key}, DataStream{value}, overwrite);
67+ }
68+};
69+
70+static Span<const std::byte> StringBytes(std::string_view str)
71+{
72+ return AsBytes<const char>({str.data(), str.size()});
73+}
74+
75+static SerializeData StringData(std::string_view str)
76+{
77+ auto bytes = StringBytes(str);
78+ return SerializeData{bytes.begin(), bytes.end()};
79+}
80+
81+static void CheckPrefix(DatabaseBatch& batch, Span<const std::byte> prefix, MockableData expected)
82+{
83+ std::unique_ptr<DatabaseCursor> cursor = batch.GetNewPrefixCursor(prefix);
84+ MockableData actual;
85+ while (true) {
86+ DataStream key, value;
87+ DatabaseCursor::Status status = cursor->Next(key, value);
88+ if (status == DatabaseCursor::Status::DONE) break;
89+ BOOST_CHECK(status == DatabaseCursor::Status::MORE);
90+ BOOST_CHECK(
91+ actual.emplace(SerializeData(key.begin(), key.end()), SerializeData(value.begin(), value.end())).second);
92+ }
93+ BOOST_CHECK_EQUAL_COLLECTIONS(actual.begin(), actual.end(), expected.begin(), expected.end());
94+}
95+
96 BOOST_FIXTURE_TEST_SUITE(db_tests, BasicTestingSetup)
97
98 static std::shared_ptr<BerkeleyEnvironment> GetWalletEnv(const fs::path& path, fs::path& database_filename)
99@@ -86,28 +131,29 @@ BOOST_AUTO_TEST_CASE(getwalletenv_g_dbenvs_free_instance)
100 BOOST_CHECK(env_2_a == env_2_b);
101 }
102
103-BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
104+static std::vector<std::unique_ptr<WalletDatabase>> TestDatabases(const fs::path& path_root)
105 {
106 std::vector<std::unique_ptr<WalletDatabase>> dbs;
107-
108- // Create dbs
109 DatabaseOptions options;
110 DatabaseStatus status;
111 bilingual_str error;
112- std::vector<bilingual_str> warnings;
113 #ifdef USE_BDB
114- dbs.emplace_back(MakeBerkeleyDatabase(m_path_root / "bdb", options, status, error));
115+ dbs.emplace_back(MakeBerkeleyDatabase(path_root / "bdb", options, status, error));
116 #endif
117 #ifdef USE_SQLITE
118- dbs.emplace_back(MakeSQLiteDatabase(m_path_root / "sqlite", options, status, error));
119+ dbs.emplace_back(MakeSQLiteDatabase(path_root / "sqlite", options, status, error));
120 #endif
121 dbs.emplace_back(CreateMockableWalletDatabase());
122+ return dbs;
123+}
124
125+BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
126+{
127 // Test each supported db
128- for (const auto& database : dbs) {
129+ for (const auto& database : TestDatabases(m_path_root)) {
130 BOOST_ASSERT(database);
131
132- std::vector<std::string> prefixes = {"FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
133+ std::vector<std::string> prefixes = {"", "FIRST", "SECOND", "P\xfe\xff", "P\xff\x01", "\xff\xff"};
134
135 // Write elements to it
136 std::unique_ptr<DatabaseBatch> handler = database->MakeBatch();
137@@ -143,5 +189,31 @@ BOOST_AUTO_TEST_CASE(db_cursor_prefix_range_test)
138 }
139 }
140
141+// Lower level DatabaseBase::GetNewPrefixCursor test, to cover cases that aren't
142+// covered in the higher level test above. The higher level test uses
143+// serialized strings which are prefixed with string length, so it doesn't test
144+// truly empty prefixes or prefixes that begin with \xff
145+BOOST_AUTO_TEST_CASE(db_cursor_prefix_byte_test)
146+{
147+ const MockableData::value_type
148+ e{StringData(""), StringData("e")},
149+ p{StringData("prefix"), StringData("p")},
150+ ps{StringData("prefixsuffix"), StringData("ps")},
151+ f{StringData("\xff"), StringData("f")},
152+ fs{StringData("\xffsuffix"), StringData("fs")},
153+ ff{StringData("\xff\xff"), StringData("ff")},
154+ ffs{StringData("\xff\xffsuffix"), StringData("ffs")};
155+ for (const auto& database : TestDatabases(m_path_root)) {
156+ std::unique_ptr<DatabaseBatch> batch = database->MakeBatch();
157+ for (const auto& [k, v] : {e, p, ps, f, fs, ff, ffs}) {
158+ DatabaseBatchTest::WriteBytes(*batch, k, v);
159+ }
160+ CheckPrefix(*batch, StringBytes(""), {e, p, ps, f, fs, ff, ffs});
161+ CheckPrefix(*batch, StringBytes("prefix"), {p, ps});
162+ CheckPrefix(*batch, StringBytes("\xff"), {f, fs, ff, ffs});
163+ CheckPrefix(*batch, StringBytes("\xff\xff"), {ff, ffs});
164+ }
165+}
166+
167 BOOST_AUTO_TEST_SUITE_END()
168 } // namespace wallet
132+ key >> key_back;
133+ BOOST_CHECK_EQUAL(key_back, prefix);
134+
135+ unsigned int value_back;
136+ value >> value_back;
137+ BOOST_CHECK_EQUAL(value_back, i);
In commit “walletdb: Consistently clear key and value streams before writing” (5d74d702c16f4e34c96e9573049a2fe930ac935c)
This test case seems to rely on getting values back in a particular order, but I don’t think the sqlite implementation actually guarantees rows will be returned in any particular order, since no sorting is requested.
i
stored in the key.
Before writing data to the output key and value streams, make sure they
are cleared.
32+
33+static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value,
34+ bool overwrite = true)
35+{
36+ return batch.Write(DataStream{key}, DataStream{value}, overwrite);
37+}
In commit “wallet: Add GetPrefixCursor to DatabaseBatch” (13476fe7bebdbf51e09821850b2c808c8ecf116a)
Maybe add a comment above the batch.Write() call like “Convert the key and value to DataStream objects in order to bypass serialization. We want raw bytes to be written to the database, not serialized byte strings. The DatabaseBatch::Write template method normally serializes its arguments, but because DataStream has a Serialize method that does concatenation instead of serialization, it can be used to bypass serialization.”
I was very surprised that this worked and it took me a while to figure out what was happening. I do think it would probably be more straightforward to call the private DatabaseBatch::WriteKey
method instead of DatabaseBatch::Write
which is why I added the friend class in my earlier suggestion. But I guess this approach should be ok as long it is explained.
It does seem pretty crazy to me that DataStream has a Serialize method that concatenates instead of serializing, and can’t be unserialized. But I guess it is handy in this situation and maybe others.
Is the same not true for span itself? The only reason it doesn’t work is that it is missing a std::byte
specialization?
Is the same not true for span itself?
No because Span<char>
is fixed length and can be deserialized, while DataStream
is variable length and can’t be deserialized.
Reasons why I think it would be good to get rid of DataStream::Serialize
method:
DataStream
has no Unserialize
method and no reasonable way of adding one that would be consistent with the existing Serialize
implementation.Serialize
method has surprising behavior. If DataStream
was going to have a serialize method I would expect to behave same was as serialize methods as other variable length objects (strings, vectors, maps) and be prefixed with a size field and be deserializable into an empty object. I suggested adding a comment here because code using it was not working the way I expected it to, and it took me a while to figure out how it was working at all.Serialize
method. If we want a concatenation operator, +=
should work perfectly well. <<
is good to use for serialization and formatting, and should in principle be reversible with >>
. Concatenation is a different thing.If I remove DataStream::Serialize
method in master there are only a few compilation errors, so I think I should be able to make a simple PR to clean things up
Span
s are serialized without length prefixes, so this function could just pass key
and value
directly to DatabaseBatch::Write
rather than going through a DataStream
, as long as there is a Serialize
method for Span<const std::byte>
. This seems like a more intuitive solution, so I’ve implemented that.
Oops, I guess I was objecting to a point Marco wasn’t making. But I think I would still like to avoid overloading Serialize for std::byte spans. The whole point of std::byte is that it’s supposed to be a very safe type which requires you to be deliberate and explicit about conversions. So I’m not sure it would be great to serialize std::byte spans (or types which can be converted to byte spans) as raw bytes without requiring a more explicit cast.
Maybe it would make sense to change serialize.h in the future, but I think it would be best to leave it alone in this PR. Would suggest reverting serialize.h and just doing:
0--- a/src/wallet/test/db_tests.cpp
1+++ b/src/wallet/test/db_tests.cpp
2@@ -30,12 +30,6 @@ inline std::ostream& operator<<(std::ostream& os, const std::pair<const Serializ
3
4 namespace wallet {
5
6-static bool WriteBytes(DatabaseBatch& batch, Span<const std::byte> key, Span<const std::byte> value,
7- bool overwrite = true)
8-{
9- return batch.Write(key, value, overwrite);
10-}
11-
12 static Span<const std::byte> StringBytes(std::string_view str)
13 {
14 return AsBytes<const char>({str.data(), str.size()});
15@@ -203,7 +197,7 @@ BOOST_AUTO_TEST_CASE(db_cursor_prefix_byte_test)
16 for (const auto& database : TestDatabases(m_path_root)) {
17 std::unique_ptr<DatabaseBatch> batch = database->MakeBatch();
18 for (const auto& [k, v] : {e, p, ps, f, fs, ff, ffs}) {
19- WriteBytes(*batch, k, v);
20+ batch->Write(MakeUCharSpan(k), MakeUCharSpan(v));
21 }
22 CheckPrefix(*batch, StringBytes(""), {e, p, ps, f, fs, ff, ffs});
23 CheckPrefix(*batch, StringBytes("prefix"), {p, ps});
But I think I would still like to avoid overloading Serialize for std::byte spans. The whole point of std::byte is that it’s supposed to be a very safe type which requires you to be deliberate and explicit about conversions.
Agree on this. For raw C-Arrays where the length is denoted in the type, it shouldn’t cause any issues. Probably the same for C++20 spans that are fixed size (but our Span
doesn’t have that feature).
No because
Span<char>
is fixed length
Are you sure? One can call subspan(1)
to change the length.
So I’m not sure it would be great to serialize std::byte spans (or types which can be converted to byte spans) as raw bytes without requiring a more explicit cast.
What about removing the Span overloads and replacing them with one that forces the call site to do a conversion, such as MakeSerSpan(a)
? Though, that will probably require introducing a 4th type-safe type for a byte.
I think I agree that current Serialize(Span<char>)
overloads could be a little dangerous:
Because if a variable-length container could be implicitly converted to span of these types, it might incorrectly match one of these overloads and be serialized as a fixed length span without a length prefix, and no longer be deserializable as the original container.
If that is a real problem, we could probably fix it by replacing those overloads with:
0template<typename Stream, typename T> inline void Serialize(Stream& s, const Span<T>& span);
1template<typename Stream, typename T> inline void Unserialize(Stream& s, Span<T>& span);
Since these would only match actual Span
objects, and would not match arguments implicitly convertible to Span. I think this would be as safe as your MakeSerSpan
idea and not require introducing a new type.
I also agree that the Span<char>
char type is different char[N]
type because one is a fixed length type and the other just has fixed length objects (subspan
method returns a new object, it doesn’t change the size of an existing object). But the point about deserialization applies to both of cases, because you can deserialize both char[N]
objects and Span<char>
objects without a length prefix.
AsBytes
or std::as_bytes
Actually, I don’t think a silent unintended conversion can happen, because if there is no exact match, it will fallback to calling the Serialize
method, see https://github.com/bitcoin/bitcoin/blob/7d65e3372fc7e08fccd7babb3aa66351e0e9442d/src/serialize.h#L693-L700
So I think we should just go with my original suggestion to enable Span<std::byte>
serialization?
Span<std::byte>
serialization added in https://github.com/bitcoin/bitcoin/pull/27927
163+ for (const auto& prefix : prefixes) {
164+ DataStream s_prefix;
165+ s_prefix << prefix;
166+ std::unique_ptr<DatabaseCursor> cursor = handler->GetNewPrefixCursor(s_prefix);
167+ DataStream key;
168+ CDataStream value(SER_DISK, CLIENT_VERSION);
0 DataStream value;
nit (feel free to ignore)
In order to get records beginning with a prefix, we will need a cursor
specifically for that prefix. So add a GetPrefixCursor function and
DatabaseCursor classes for dealing with those prefixes.
Tested on each supported db engine.
1) Write two different key->value elements to db.
2) Create a new prefix cursor and walk-through every returned element,
verifying that it gets parsed properly.
3) Try to move the cursor outside the filtered range: expect failure
and flag complete=true.
Co-Authored-By: Ryan Ofsky <ryan@ofsky.org>
Co-Authored-By: furszy <matiasfurszyfer@protonmail.com>
130+#ifdef USE_BDB
131+ dbs.emplace_back(MakeBerkeleyDatabase(path_root / "bdb", options, status, error));
132+#endif
133+#ifdef USE_SQLITE
134+ dbs.emplace_back(MakeSQLiteDatabase(path_root / "sqlite", options, status, error));
135+#endif
Could:
0for (const DatabaseFormat& db_format : DATABASE_FORMATS) {
1 dbs.emplace_back(MakeDatabase(path_root / strprintf("%d", db_format).c_str(), options, status, error));
2 BOOST_ASSERT(status == DatabaseStatus::SUCCESS);
3}
559+ break;
560+ }
561+ if (it == end_range.rend()) {
562+ // If the prefix is all 0xff bytes, clear end_range as we won't need it
563+ end_range.clear();
564+ }
Noticed that this isn’t covered by the tests.
In the test, all prefixes are serialized into a DataStream
and then provided to GetNewPrefixCursor
, so the size is always part of the data. Therefore, we never get up to this point.
Still, it doesn’t seems to be reachable right now; As the db handler write function always serializes data, the size is always there.
But.. for the sake of test coverage completeness and leave nothing to chance, made a commit that exercises it: https://github.com/furszy/bitcoin-core/commit/1ace124d2e5dd35b3bbe882512a425810b0fb83e. Which is passing, so great.
Not sure if it worth to include it, as it’s including a new RawWrite
function to skip the serialization step to not add the size..
we might get rid of it by subclassing the sqlite in the test and adding a custom Write
function and also cleanup the code further by sharing part of the code with the loop that is above but.. still not sure.
Noticed that this isn’t covered by the tests.
If this is true, it seems like a bug in the test because two of the CheckPrefix lines were written specifically to cover this case.
I think there is no need for a RawWrite method because spans of bytes should be serialized by just appending the bytes.
ACK ba616b93
Nice db_cursor_prefix_byte_test
test case 👌🏼.