wallet: Implement independent BDB parser #26606

pull achow101 wants to merge 16 commits into bitcoin:master from achow101:berkeleyro changing 23 files +1270 −19
  1. achow101 commented at 9:50 pm on November 29, 2022: member

    Split from #26596

    This PR adds BerkeleyRODatabase which is an independent implementation of a BDB file parser. It provides read only access to a BDB file, and can therefore be used as a read only database backend for wallets. This will be used for dumping legacy wallet records and migrating legacy wallets without the need for BDB itself.

    Wallettool’s dump command is changed to use BerkeleyRODatabase instead of BerkeleyDatabase (and CWallet itself) to demonstrate that this parser works and to test it against the existing wallettool functional tests.

  2. DrahtBot commented at 9:50 pm on November 29, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, josibake, furszy, laanwj, theStack
    Concept ACK darosior, Sjors

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29071 (refactor: Remove Span operator==, Use std::ranges::equal by maflcko)
    • #28236 (fuzz: wallet, add target for Spend by Ayush170-Future)
    • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Wallet on Nov 29, 2022
  4. achow101 force-pushed on Nov 30, 2022
  5. achow101 force-pushed on Nov 30, 2022
  6. achow101 force-pushed on Nov 30, 2022
  7. Marek777777 approved
  8. achow101 force-pushed on Dec 2, 2022
  9. Marek777777 changes_requested
  10. Marek777777 commented at 2:11 am on December 3, 2022: none
    .
  11. DrahtBot added the label Needs rebase on Feb 17, 2023
  12. achow101 force-pushed on Feb 17, 2023
  13. DrahtBot removed the label Needs rebase on Feb 17, 2023
  14. in src/streams.h:535 in 1d093bbf97 outdated
    529@@ -530,6 +530,14 @@ class AutoFile
    530     //
    531     // Stream subset
    532     //
    533+    void seek(int64_t offset, int origin)
    534+    {
    535+        if (!file)
    


    Sjors commented at 4:02 pm on March 1, 2023:

    1d093bbf97b8d4ec5413ca560ac79e56f23d7f1d {

    Could also use the IsNull() helper.


    achow101 commented at 6:09 pm on March 15, 2023:
    Done
  15. in src/streams.h:525 in 1d093bbf97 outdated
    529@@ -530,6 +530,14 @@ class AutoFile
    530     //
    531     // Stream subset
    532     //
    533+    void seek(int64_t offset, int origin)
    


    Sjors commented at 4:05 pm on March 1, 2023:

    1d093bbf97b8d4ec5413ca560ac79e56f23d7f1d: Note for other reviewers: yes

    https://en.cppreference.com/w/c/io/fseek

  16. in src/streams.h:531 in 1d093bbf97 outdated
    529@@ -530,6 +530,14 @@ class AutoFile
    530     //
    531     // Stream subset
    532     //
    533+    void seek(int64_t offset, int origin)
    534+    {
    535+        if (!file)
    536+            throw std::ios_base::failure("CAutoFile::read: file handle is nullptr");
    537+        if (fseek(file, offset, origin) != 0)
    538+            throw std::ios_base::failure(feof(file) ? "CAutoFile::seek: end of file" : "CAutoFile::seek: fseek failed");
    


    Sjors commented at 4:07 pm on March 1, 2023:
    1d093bbf97b8d4ec5413ca560ac79e56f23d7f1d: maybe log the exact error code.

    achow101 commented at 6:09 pm on March 15, 2023:
    It’s kind of annoying to do that, so I’ll leave it as is. The other functions do the same.
  17. in src/wallet/migrate.h:6 in 1224f45a92 outdated
    0@@ -0,0 +1,104 @@
    1+// Copyright (c) 2021 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifndef BITCOIN_WALLET_MIGRATE_H
    6+#define BITCOIN_WALLET_MIGRATE_H
    


    Sjors commented at 4:32 pm on March 1, 2023:
    1224f45a92972199799d1ea33700eee567dc66fe: maybe call this file bdb_ro_wallet_db.h?

    achow101 commented at 6:10 pm on March 15, 2023:
    The intent is to have all of the migration stuff in these files, so I will leave it as is.
  18. in src/wallet/migrate.cpp:24 in d6c5aa0f67 outdated
    20@@ -21,17 +21,37 @@ bool BerkeleyRODatabase::Backup(const std::string& dest) const
    21 
    22 bool BerkeleyROBatch::ReadKey(DataStream&& key, DataStream& value)
    23 {
    24-    return false;
    


    Sjors commented at 4:35 pm on March 1, 2023:
    d6c5aa0f6783ef9f26f93a583d2fad5a943e23ec: commit description also mentions StartCursor, ReadAtCursor, CloseCursor.

    achow101 commented at 6:10 pm on March 15, 2023:
    Updated the message.
  19. Sjors commented at 4:42 pm on March 1, 2023: member
    Some review nibbles.
  20. achow101 force-pushed on Mar 15, 2023
  21. darosior commented at 10:29 am on March 22, 2023: member
    Concept ACK. Have you looked into a fuzz target for this parser?
  22. achow101 commented at 6:02 pm on March 22, 2023: member

    Have you looked into a fuzz target for this parser?

    I haven’t. If someone would like to write one, I’d be happy to add it.

  23. DrahtBot added the label CI failed on May 1, 2023
  24. achow101 force-pushed on May 2, 2023
  25. achow101 force-pushed on May 2, 2023
  26. DrahtBot removed the label CI failed on May 2, 2023
  27. Sjors commented at 3:40 pm on June 5, 2023: member

    Adding a fuzz target that compares this to the real thing seems useful. Given that this PR changes dumpwallet to rely on the new parser, I think we should do that sooner rather than later. Otherwise I suggest changing e88ad38ac2f3e43de54a332965b13eb7cc27b91f so that the read-only parser is optional, and then change the tests to always use it.

    It seems a bit daunting to compare the low level DBD parsing code in this PR to the real thing. Having a fuzzer also reduces the need to be very thorough in that. I’m not sure how much confidence I have in the existing test coverage for dumpwallet.

  28. DrahtBot added the label CI failed on Jun 7, 2023
  29. achow101 force-pushed on Jun 8, 2023
  30. achow101 force-pushed on Jun 8, 2023
  31. DrahtBot removed the label CI failed on Jun 8, 2023
  32. TheCharlatan commented at 4:52 pm on June 10, 2023: contributor
    Concept ACK
  33. in src/wallet/migrate.h:96 in f108c6b565 outdated
    91+{
    92+private:
    93+    const BerkeleyRODatabase& m_database;
    94+
    95+    bool ReadKey(DataStream&& key, DataStream& value) override;
    96+    bool WriteKey(DataStream&& key, DataStream&& value, bool overwrite=true) override { return true; }
    


    TheCharlatan commented at 10:11 am on June 14, 2023:
    Why is this method returning true, but false for the erase methods?

    achow101 commented at 4:20 pm on June 30, 2023:
    IIRC I did this to avoid issues with old wallets that the wallet loading logic would do automatic upgrading of which expects writing to not fail. It actually should be fine for those to not write anything.

    josibake commented at 9:03 am on May 13, 2024:
    nit: might be worth adding a comment here explaining this. I was also surprised to see this not returning false and couldn’t understand why this was different from other methods until I read your comment here.

    achow101 commented at 3:17 am on May 14, 2024:
    Added a comment.
  34. TheCharlatan commented at 9:21 pm on June 14, 2023: contributor

    Approach ACK

    I’m testing this against my hand-rolled databases, so far this looks solid.

  35. TheCharlatan commented at 10:58 am on June 21, 2023: contributor
    Wrote my first fuzz test (I really don’t know what I am doing so not even sure this is useful), and managed to hit: fuzz: wallet/migrate.cpp:384: void wallet::RecordsPage::Unserialize(Stream &) [Stream = CAutoFile]: Assertion false failed.
  36. maflcko commented at 11:04 am on June 21, 2023: member

    wallet/migrate.cpp:384: void wallet::RecordsPage::Unserialize(Stream &) [Stream = CAutoFile]: Assertion false failed

    Sound like a real bug, because assert should generally not be used to validate user input or files read from disk?

  37. achow101 commented at 4:27 pm on June 21, 2023: member

    Wrote my first fuzz test (I really don’t know what I am doing so not even sure this is useful), and managed to hit: fuzz: wallet/migrate.cpp:384: void wallet::RecordsPage::Unserialize(Stream &) [Stream = CAutoFile]: Assertion false failed.

    Do you have the input that hit that?

  38. TheCharlatan commented at 6:33 pm on June 21, 2023: contributor

    Re #26606 (comment)

    Do you have the input that hit that?

    Yes, pasting as base64 here, since I’m not sure about the practice of attaching random binary files:

    0echo "HQABAAAAAQAAAACPAAUxYgAAAAmK/////wkAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAKysrAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/////gAAAAAAz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Xz8/Pz8/Pz8/Pz8/Pz8/Pz88Az8/Pz8/PADqKIIr/////////EwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAArKysAAAAAAAAAAAAAAAAAAADPz8/Pz8/PxXt7e3t7e3t7AABPAAUxz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/S0tLS0tLS0tLS0tLS0tLS0tLS0tLS0nh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eNLS0tLS0tLSKtLS0tLS0tLS0tLS0tLS0tDS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0s/Pz8/Pz8/Pz8/Pz8/PAAB7e3t7e3t7e3sBAAABAAACK08ABQAgXFxcXFxcAQAAAACPAAUxYgAAAAmK/////////////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAKysrAAAAAAAAAAAAAAAAAAAAz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/PzwAAAP/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pz8/Pzw==" | base64 -d > crash-assert
    
  39. achow101 force-pushed on Jun 28, 2023
  40. achow101 commented at 10:28 pm on June 28, 2023: member
    Found an issue with prefix cursor handling, updated the tests to check for that. Also pulled in @TheCharlatan’s fuzz test and changed several asserts to instead throw exceptions which resolves the fuzz crash.
  41. DrahtBot added the label CI failed on Jun 29, 2023
  42. maflcko commented at 6:49 am on June 29, 2023: member
    Looks like the fuzz target doesn’t compile on windows?
  43. achow101 force-pushed on Jun 29, 2023
  44. DrahtBot removed the label CI failed on Jun 29, 2023
  45. in src/streams.h:528 in 5b3573e201 outdated
    521@@ -522,6 +522,16 @@ class AutoFile
    522     //
    523     // Stream subset
    524     //
    525+    void seek(int64_t offset, int origin)
    526+    {
    527+        if (IsNull()) {
    528+            throw std::ios_base::failure("CAutoFile::read: file handle is nullptr");
    


    maflcko commented at 8:21 pm on June 29, 2023:
    0            throw std::ios_base::failure("AutoFile::seek: file handle is nullptr");
    

    ?


    achow101 commented at 9:02 pm on June 29, 2023:
    Done
  46. achow101 force-pushed on Jun 29, 2023
  47. in src/streams.h:528 in 6d2f8d9990 outdated
    521@@ -522,6 +522,16 @@ class AutoFile
    522     //
    523     // Stream subset
    524     //
    525+    void seek(int64_t offset, int origin)
    526+    {
    527+        if (IsNull()) {
    528+            throw std::ios_base::failure("AutoFile::read: file handle is nullptr");
    


    maflcko commented at 6:00 am on June 30, 2023:
    0            throw std::ios_base::failure("AutoFile::seek: file handle is nullptr");
    

    achow101 commented at 3:43 pm on June 30, 2023:
    Done
  48. maflcko commented at 6:02 am on June 30, 2023: member
    (nit)
  49. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:22 in 6d2f8d9990 outdated
    17+
    18+using wallet::DatabaseOptions;
    19+using wallet::DatabaseStatus;
    20+
    21+namespace {
    22+const TestingSetup* g_setup;
    


    TheCharlatan commented at 9:58 am on June 30, 2023:

    After testing with a valid database in the fuzzing corpus, I noticed that I messed up the m_args, m_node.args, gArgs division again (see here) - sorry about that. I have a basic patch here, but feel free to commit a more elegant fix:

     0diff --git a/src/wallet/test/fuzz/wallet_bdb_parser.cpp b/src/wallet/test/fuzz/wallet_bdb_parser.cpp
     1index aabde093b6..a1acb95178 100644
     2--- a/src/wallet/test/fuzz/wallet_bdb_parser.cpp
     3+++ b/src/wallet/test/fuzz/wallet_bdb_parser.cpp
     4@@ -19,23 +19,15 @@ using wallet::DatabaseOptions;
     5 using wallet::DatabaseStatus;
     6 
     7 namespace {
     8-const TestingSetup* g_setup;
     9+TestingSetup* g_setup;
    10 } // namespace
    11 
    12 void initialize_wallet_bdb_parser()
    13 {
    14-    static const auto testing_setup = MakeNoLogFileContext<const TestingSetup>();
    15+    static auto testing_setup = MakeNoLogFileContext<TestingSetup>();
    16     g_setup = testing_setup.get();
    17 }
    18 
    19-void SetDumpFile(ArgsManager& args) {
    20-    auto dumpfile{args.GetDataDirNet() / "fuzzed_dumpfile.dat"};
    21-    if (fs::exists(dumpfile)) { // Writing into an existing dump file will throw an exception
    22-        remove(dumpfile);
    23-    }
    24-    args.ForceSetArg("-dumpfile", fs::PathToString(args.GetDataDirNet() / "fuzzed_dumpfile.dat"));
    25-}
    26-
    27 FUZZ_TARGET_INIT(wallet_bdb_parser, initialize_wallet_bdb_parser)
    28 {
    29     FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    30@@ -52,10 +44,14 @@ FUZZ_TARGET_INIT(wallet_bdb_parser, initialize_wallet_bdb_parser)
    31     DatabaseStatus status;
    32     bilingual_str error;
    33 
    34+    auto dumpfile{g_setup->m_args.GetDataDirNet() / "fuzzed_dumpfile.dat"};
    35+    if (fs::exists(dumpfile)) { // Writing into an existing dump file will throw an exception
    36+        remove(dumpfile);
    37+    }
    38+    g_setup->m_args.ForceSetArg("-dumpfile", dumpfile);
    39+
    40     try {
    41         auto db{MakeBerkeleyRODatabase(wallet_path, options, status, error)};
    42-        const auto& node = g_setup->m_node;
    43-        SetDumpFile(*node.args);
    44         assert(DumpWallet(g_setup->m_args, *db, error));
    45     }
    46     catch (const std::runtime_error& e) {
    

    achow101 commented at 3:43 pm on June 30, 2023:
    Done as suggested
  50. in src/wallet/migrate.cpp:680 in 0b8eac6a78 outdated
    581@@ -519,7 +582,7 @@ void BerkeleyRODatabase::Open()
    582                     uint32_t next_page = orec->page_number;
    583                     while (next_page != 0) {
    584                         SeekToPage(db_file, next_page, page_size);
    585-                        PageHeader opage_header;
    586+                        PageHeader opage_header(inner_meta.other_endian);
    587                         db_file >> opage_header;
    588                         OverflowPage opage(opage_header);
    589                         db_file >> opage;
    


    TheCharlatan commented at 1:11 pm on June 30, 2023:

    In commit “Implement handling of other endianness in BerkeleyRODatabase” 0b8eac6a78351a68c1b5a6126564493ab50031dc

    I might be reading something wrong, or confusing something, but I don’t quite understand why the record data itself does not have to get its endianness adjusted.


    achow101 commented at 3:37 pm on June 30, 2023:
    The record data should just be arbitrary byte strings so they don’t have an endianness.

    TheCharlatan commented at 8:01 pm on July 4, 2023:
    ACK, please resolve.
  51. in src/wallet/migrate.cpp:224 in d5fe130106 outdated
    219+    uint32_t page_number; // Page number where data begins
    220+    uint32_t item_len;    // Total length of item
    221+
    222+    static constexpr size_t SIZE = 9; // Overflow record is always 9 bytes
    223+
    224+    std::vector<std::byte> data; // Data from all of the overflow pages
    


    TheCharlatan commented at 1:15 pm on June 30, 2023:

    In commit “wallet: implement independent BDB deserializer in BerkeleyRODatabase” d5fe130106bdf367c0bca157b02d12254590b585

    Can this data field be removed? Seems to be unused.


    achow101 commented at 3:44 pm on June 30, 2023:
    Removed. I think originally I though I could put the overflow page data into this field, but I didn’t end up doing it that way.
  52. achow101 force-pushed on Jun 30, 2023
  53. achow101 force-pushed on Jun 30, 2023
  54. DrahtBot added the label CI failed on Jun 30, 2023
  55. achow101 force-pushed on Jul 21, 2023
  56. achow101 force-pushed on Jul 21, 2023
  57. DrahtBot removed the label CI failed on Jul 21, 2023
  58. in src/wallet/migrate.cpp:527 in 6e28ff97a9 outdated
    522+    }
    523+    RecordsPage page(header);
    524+    db_file >> page;
    525+
    526+    // First record should be the string "main"
    527+    if (!std::holds_alternative<DataRecord>(page.records.at(0)) || std::get_if<DataRecord>(&page.records.at(0))->data != std::vector<std::byte>({std::byte('m'), std::byte('a'), std::byte('i'), std::byte('n')})) {
    


    TheCharlatan commented at 7:30 pm on July 25, 2023:
    This would be more readable if the main bytes were stuffed into a constant somewhere.

    achow101 commented at 7:30 pm on August 1, 2023:
    Done
  59. in src/wallet/migrate.cpp:570 in 6e28ff97a9 outdated
    525+
    526+    // First record should be the string "main"
    527+    if (!std::holds_alternative<DataRecord>(page.records.at(0)) || std::get_if<DataRecord>(&page.records.at(0))->data != std::vector<std::byte>({std::byte('m'), std::byte('a'), std::byte('i'), std::byte('n')})) {
    528+        throw std::runtime_error("Subdatabase has an unexpected name");
    529+    }
    530+    if (!std::holds_alternative<DataRecord>(page.records.at(1)) || std::get_if<DataRecord>(&page.records.at(1))->m_header.len != 4) {
    


    TheCharlatan commented at 7:43 pm on July 25, 2023:
    Can you add a comment explaining why we check this?

    achow101 commented at 7:30 pm on August 1, 2023:
    Added a comment
  60. TheCharlatan commented at 7:58 pm on July 25, 2023: contributor

    ACK 6e28ff97a99519ec8b50123bc1177084bba68f96 (besides the two small comments)

    I’m sure I could spend many hours more verifying the parser, but at this point I’d like to encourage other reviewers to take a look. I think if this is merged soon and given the opportunity to be fuzzed against a large corpus of both random seeds, bdb databases, and valid wallets we could also gain some confidence in the code.

  61. Sjors commented at 2:20 pm on July 31, 2023: member

    Concept ACK

    I still suggest changing fc810c1df899e5b42fe7658ebdddc6b4320240a6 so that dumpwallet does not use the read-only format by default, and instead change the tests use it. ~However with the new fuzz target commit I’m less worried about it.~

    The new fuzz target helps, but currently doesn’t compare between BDB and the RO implementation. I’m also not sure how to see if we’re covering all code paths. Should we manually produce some wallets to bootstrap a good corpus or is the fuzzer smart enough to find it?

  62. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:36 in 6e28ff97a9 outdated
    30+
    31+FUZZ_TARGET(wallet_bdb_parser, .init = initialize_wallet_bdb_parser)
    32+{
    33+    FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    34+
    35+    const auto wallet_path = g_setup->m_args.GetDataDirNet() / "fuzzed_wallet.dat";
    


    Sjors commented at 3:07 pm on July 31, 2023:
    Maybe add a recommendation to the fuzz doc to mount /tmp to tmpfs, or ensure we don’t use the file system.

    maflcko commented at 3:09 pm on July 31, 2023:

    This is already in the test docs, no?

    0$ git grep tmpfs test/README.md
    1test/README.md:sudo mount -t tmpfs -o size=4g tmpfs /mnt/tmp/
    

    Sjors commented at 3:16 pm on July 31, 2023:
    Not in the fuzz docs, but just having it in the test docs is probably fine.
  63. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:59 in 6e28ff97a9 outdated
    50+    }
    51+    g_setup->m_args.ForceSetArg("-dumpfile", fs::PathToString(dumpfile));
    52+
    53+    try {
    54+        auto db{MakeBerkeleyRODatabase(wallet_path, options, status, error)};
    55+        assert(DumpWallet(g_setup->m_args, *db, error));
    


    Sjors commented at 3:14 pm on July 31, 2023:

    It would be useful here - if BDB is compiled - to also open it with BDB, dump and then compare both dumps.

    I’m not sure what to do for cases where BDB and RO have slightly different opinions on what they’re willing to open at all. It’s probably fine to skip comparing files where either refuses to open.


    achow101 commented at 7:30 pm on August 1, 2023:
    Added a comparison

    Sjors commented at 4:26 pm on August 2, 2023:
    Nice! I’ll run the fuzzer against that…
  64. DrahtBot added the label Needs rebase on Aug 1, 2023
  65. achow101 force-pushed on Aug 1, 2023
  66. achow101 commented at 7:32 pm on August 1, 2023: member

    I still suggest changing fc810c1 so that dumpwallet does not use the read-only format by default, and instead change the tests use it.

    Since dumping should be a read-only operation, I think it’s better to use the read-only database since it definitely won’t write to it like BDB does. Also it provides a good mechanism for testing.

  67. DrahtBot added the label CI failed on Aug 1, 2023
  68. DrahtBot removed the label Needs rebase on Aug 1, 2023
  69. in src/streams.h:531 in 0dd59f3769 outdated
    527@@ -528,6 +528,7 @@ class AutoFile
    528     //
    529     // Stream subset
    530     //
    531+    void seek(int64_t offset, int origin);
    


    maflcko commented at 5:26 am on August 2, 2023:
    style nit: I don’t think this method is a subset of our streams. Many streams can’t seek to the beginning after a read. It may be better to move this up by 3 lines of code (or more if you want)

    achow101 commented at 5:29 pm on August 2, 2023:
    Done
  70. maflcko commented at 5:26 am on August 2, 2023: member
    Looks like CI is red?
  71. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:41 in 0dd59f3769 outdated
    36+
    37+    const auto wallet_path = g_setup->m_args.GetDataDirNet() / "fuzzed_wallet.dat";
    38+
    39+    {
    40+        AutoFile outfile{fsbridge::fopen(wallet_path, "wb")};
    41+        const auto file_data{ConsumeRandomLengthByteVector(fuzzed_data_provider)};
    


    maflcko commented at 5:30 am on August 2, 2023:

    style nit: If you remove this, it may be possible to feed in a “happy” wallet.dat (created with vanilla Bitcoin Core) as a fuzz input. See https://github.com/bitcoin-core/qa-assets/pull/140#issuecomment-1660088315

    Though, I haven’t tested this.


    Sjors commented at 5:20 pm on August 2, 2023:
    I simply copied a wallet.dat into the corpus directory. That resulted in a bunch of extra corpus entities and almost immediately a crash.

    Sjors commented at 5:23 pm on August 2, 2023:
    Though the file seems to get ignored when trying to merge.

    Sjors commented at 5:36 pm on August 2, 2023:
    Ah, figured it out! Will push some new entries to the qa-assets PR that should reproduce the crash.
  72. Sjors commented at 4:25 pm on August 2, 2023: member

    Since dumping should be a read-only operation, I think it’s better to use the read-only database since it definitely won’t write to it like BDB does.

    But it could produce a corrupt dump, which someone might rely on for their backup (even if they shouldn’t).

  73. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:95 in 0dd59f3769 outdated
    73+        if (std::string(e.what()) == "Unknown record type in records page") return;
    74+        if (std::string(e.what()) == "Unknown record type in internal page") return;
    75+        if (std::string(e.what()) == "Unexpected page size") return;
    76+        if (std::string(e.what()) == "Unexpected page type") return;
    77+        throw e;
    78+    }
    


    Sjors commented at 5:05 pm on August 2, 2023:
    So far my corpus doesn’t make it here, as tested by sprinkling assert(false). It may indeed need some help with a real wallet.
  74. achow101 force-pushed on Aug 2, 2023
  75. achow101 force-pushed on Aug 2, 2023
  76. Sjors commented at 6:14 pm on August 2, 2023: member

    I tried adding if (bdb_ro_err) return; above #ifdef USE_BDB (although that does reduce coverage of potentially valid wallets). I also added in the second catch:

    0// We are not interested in fuzzing the original BDB implementation
    1if (std::string(e.what()) == "BerkeleyDatabase: Error 22, can't open database fuzzed_wallet.dat") return;
    

    That helps a bit, but still leads to a “SEGV on unknown address 0x000000000000” in DumpWallet very quickly. I haven’t looked at what the fuzzer came up with. Perhaps it can be dealt with by making the RO parser more strict?

    Or can we tell the fuzzer to ignore crashes in the original BDB entirely? We only care about comparing dumps for wallets that the original implementation can load.

  77. DrahtBot removed the label CI failed on Aug 2, 2023
  78. achow101 force-pushed on Aug 2, 2023
  79. achow101 commented at 8:22 pm on August 2, 2023: member

    I’ve fixed at least one fuzzer crash. BDB verifies a couple of other things that RO was not.

    I also changed the fuzzer to make sure that BDB fails to open the database if RO failed to open it, and vice versa.

    There’s still another crash that I have, but it’s something where RO is being stricter than BDB.

  80. DrahtBot added the label CI failed on Aug 2, 2023
  81. achow101 force-pushed on Aug 2, 2023
  82. achow101 force-pushed on Aug 3, 2023
  83. DrahtBot removed the label CI failed on Aug 3, 2023
  84. Sjors commented at 8:50 am on August 3, 2023: member

    There’s still another crash that I have

    Not sure if that was fixed in your later push, but in any case, I pushed another crash to the qa-assets PR.

  85. in src/streams.cpp:39 in 167cbf7b18 outdated
    34+int64_t AutoFile::tell()
    35+{
    36+    if (IsNull()) {
    37+        throw std::ios_base::failure("AutoFile::tell: file handle is nullptr");
    38+    }
    39+    int64_t r = ftell(m_file);
    


    maflcko commented at 8:55 am on August 3, 2023:

    nit in 167cbf7b1851eeeb28937bd8e241e1f865b813f8:

    0    int64_t r{std::ftell(m_file)};
    
  86. in src/streams.cpp:29 in 167cbf7b18 outdated
    20@@ -21,6 +21,28 @@ std::size_t AutoFile::detail_fread(Span<std::byte> dst)
    21     }
    22 }
    23 
    24+void AutoFile::seek(int64_t offset, int origin)
    25+{
    26+    if (IsNull()) {
    27+        throw std::ios_base::failure("AutoFile::seek: file handle is nullptr");
    28+    }
    29+    if (fseek(m_file, offset, origin) != 0) {
    


    maflcko commented at 8:56 am on August 3, 2023:

    nit in 167cbf7b1851eeeb28937bd8e241e1f865b813f8:

    Could add std:: before fseek? Also, the commit message has a typo:

    usefule -> useful (remove e)

  87. maflcko commented at 8:56 am on August 3, 2023: member
    (nits only)
  88. achow101 force-pushed on Aug 3, 2023
  89. achow101 force-pushed on Aug 3, 2023
  90. achow101 commented at 4:26 pm on August 3, 2023: member

    But it could produce a corrupt dump, which someone might rely on for their backup (even if they shouldn’t).

    I’ve changed it to be optional and enabled with the argument -withinternalbdb.

    Not sure if that was fixed in your later push, but in any case,

    Yes, I fixed that

    I pushed another crash to the qa-assets PR.

    This seems to be crash in BDB itself rather than in our parser. Not entirely sure how to handle that since we don’t care about fuzzing BDB itself.

  91. Sjors commented at 5:15 pm on August 3, 2023: member

    Not entirely sure how to handle that since we don’t care about fuzzing BDB itself.

    Indeed, @MarcoFalke any idea on how to make the fuzzer ignore that?

  92. maflcko commented at 10:33 am on August 8, 2023: member
    Can you give more context? What are the steps to reproduce? My understanding is that the fuzz CI tasks don’t install and use bdb. And you can put a comment in the fuzz test to document that running this with bdb can be done at the users own risk. Outside of that, my recommendations would be to use bdb-5.3, which fixed a few bugs (or maybe it is possible to use an even later version, just for fuzzing?).
  93. DrahtBot added the label CI failed on Sep 17, 2023
  94. achow101 force-pushed on Sep 27, 2023
  95. achow101 force-pushed on Sep 27, 2023
  96. DrahtBot removed the label CI failed on Sep 28, 2023
  97. achow101 force-pushed on Oct 24, 2023
  98. DrahtBot added the label CI failed on Oct 24, 2023
  99. DrahtBot removed the label CI failed on Oct 25, 2023
  100. in src/wallet/migrate.cpp:28 in b7b90daf46 outdated
    20@@ -21,17 +21,41 @@ bool BerkeleyRODatabase::Backup(const std::string& dest) const
    21 
    22 bool BerkeleyROBatch::ReadKey(DataStream&& key, DataStream& value)
    23 {
    24-    return false;
    25+    SerializeData key_data{key.begin(), key.end()};
    26+    if (m_database.m_records.count(key_data) == 0) {
    27+        return false;
    28+    }
    29+    auto val = m_database.m_records.at(key_data);
    


    theStack commented at 0:53 am on November 5, 2023:

    (in commit b7b90daf46ae840d9ab089ac438f71f7b79dbf5a) could avoid duplicate std::map lookup by using .find instead:

    0    const auto it{m_database.m_records.find(key_data)};
    1    if (it == m_database.m_records.end()) {
    2         return false;
    3    }
    4    auto val = it->second;
    

    achow101 commented at 11:42 pm on November 13, 2023:
    Done as suggested
  101. in src/wallet/migrate.cpp:505 in a02db44419 outdated
    500+    }
    501+    RecordsPage page(header);
    502+    db_file >> page;
    503+
    504+    // First record should be the string "main"
    505+    if (!std::holds_alternative<DataRecord>(page.records.at(0)) || std::get_if<DataRecord>(&page.records.at(0))->data != SUBDATABASE_NAME) {
    


    theStack commented at 0:59 am on November 5, 2023:

    (in commit a02db44419148984d726f95fd66030b9a40fc0bb) readability nit: could also use std::get here, as the RHS of the OR conditoin is only executed if the variant holds the right data type:

    0    if (!std::holds_alternative<DataRecord>(page.records.at(0)) || std::get<DataRecord>(page.records.at(0)).data != SUBDATABASE_NAME) {
    

    (same for the two page.records.at(1) instances a few lines below)


    achow101 commented at 11:42 pm on November 13, 2023:
    Done as suggested
  102. theStack commented at 1:15 am on November 5, 2023: contributor

    Concept ACK

    As a follow-up, it might be nice if we could somehow add coverage for the “other endianness” code path in the tests. Did that manually with the following patch (creates the BDB database in big-endian format, while I’m running a little-endian system):

     0diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
     1index 9ea43ca67c..945e12ebdb 100644
     2--- a/src/wallet/bdb.cpp
     3+++ b/src/wallet/bdb.cpp
     4@@ -389,6 +389,7 @@ void BerkeleyDatabase::Open()
     5                 }
     6             }
     7 
     8+            pdb_temp->set_lorder(4321);
     9             ret = pdb_temp->open(nullptr,                             // Txn pointer
    10                             fMockDb ? nullptr : strFile.c_str(),      // Filename
    11                             fMockDb ? strFile.c_str() : "main",       // Logical db name
    

    Tests still passed, only had to adapt the BDB magic number check in wallet_migration.py which currently assumes LE byte order.

  103. TheCharlatan commented at 7:50 am on November 6, 2023: contributor

    Re #26606 (comment)

    What are the steps to reproduce?

    Run the fuzzer on the fuzz test introduced in this PR seeded with Sjors’ inputs from his qa repo PR and compiled with bdb enabled.

    Outside of that, my recommendations would be to use bdb-5.3

    The bugs persist even with bdb 5.3.

  104. maflcko commented at 9:16 am on November 6, 2023: member

    Ok, but then I think this is a problem for the qa-assets repo to solve. As I said, BDB is not enabled in the fuzz CI, so this shouldn’t cause CI issues.

    One way for qa-assets to solve this would be to exclude inputs that crash bdb.

  105. achow101 force-pushed on Nov 13, 2023
  106. achow101 force-pushed on Nov 28, 2023
  107. in src/wallet/walletdb.cpp:1480 in f18d35ec78 outdated
    1476@@ -1476,6 +1477,11 @@ std::unique_ptr<WalletDatabase> MakeDatabase(const fs::path& path, const Databas
    1477         return nullptr;
    1478     }
    1479 
    1480+    // If BERKELEY_RO was the format, then change the format from BERKELEY to BERKELEY_RO
    


    TheCharlatan commented at 3:15 pm on December 14, 2023:
    This comment is confusing, aren’t we going from BERKELEY to BERKELEY_RO?

    achow101 commented at 9:36 pm on January 2, 2024:
    Indeed, fixed.
  108. TheCharlatan approved
  109. TheCharlatan commented at 4:54 pm on December 14, 2023: contributor
    Re-ACK 3c91202c5a03c57c9f83186de9d3caf337b78d45
  110. DrahtBot requested review from Sjors on Dec 14, 2023
  111. DrahtBot requested review from darosior on Dec 14, 2023
  112. DrahtBot requested review from theStack on Dec 14, 2023
  113. achow101 force-pushed on Jan 2, 2024
  114. DrahtBot added the label Needs rebase on Jan 5, 2024
  115. achow101 force-pushed on Jan 5, 2024
  116. DrahtBot removed the label Needs rebase on Jan 6, 2024
  117. DrahtBot added the label CI failed on Jan 13, 2024
  118. achow101 force-pushed on Feb 1, 2024
  119. DrahtBot removed the label CI failed on Feb 1, 2024
  120. DrahtBot added the label Needs rebase on Feb 6, 2024
  121. achow101 force-pushed on Feb 8, 2024
  122. DrahtBot removed the label Needs rebase on Feb 8, 2024
  123. DrahtBot added the label CI failed on Feb 18, 2024
  124. achow101 force-pushed on Feb 20, 2024
  125. DrahtBot removed the label CI failed on Feb 20, 2024
  126. DrahtBot added the label CI failed on Feb 28, 2024
  127. DrahtBot commented at 6:27 am on February 28, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/21780695174

  128. achow101 force-pushed on Feb 29, 2024
  129. achow101 force-pushed on Mar 11, 2024
  130. achow101 commented at 3:44 pm on March 11, 2024: member
    Rebased for silent merge conflict
  131. achow101 force-pushed on Mar 11, 2024
  132. DrahtBot removed the label CI failed on Mar 12, 2024
  133. Add AutoFile::seek and tell
    It's useful to be able to seek to a specific position in a file. Allow
    AutoFile to seek by using fseek.
    
    It's also useful to be able to get the current position in a file.
    Allow AutoFile to tell by using ftell.
    ca18aea5c4
  134. achow101 force-pushed on Apr 1, 2024
  135. DrahtBot added the label CI failed on Apr 2, 2024
  136. DrahtBot removed the label CI failed on Apr 5, 2024
  137. in src/wallet/migrate.cpp:16 in 7f8d4c963c outdated
    11+#include <optional>
    12+#include <variant>
    13+
    14+namespace wallet {
    15+// Magic bytes in both endianness's
    16+constexpr uint32_t BTREE_MAGIC = 0x00053162; // If the file endianness matches our system, we see this magic
    


    laanwj commented at 12:04 pm on April 9, 2024:
    Just a note for other reviewers: This comment mentioning “our system” was slightly confusing to me: our serialization framework always serializes in LE, so if we see BTREE_MAGIC_OE it means the file is in big-endian. Which means we always need to swap it, independent of system’s endianness.
  138. in src/wallet/migrate.cpp:81 in 7f8d4c963c outdated
    76+    uint32_t last_page;       // Page number of last page in db
    77+    uint32_t partitions;      // Number of partitions
    78+    uint32_t key_count;       // Cached key count
    79+    uint32_t record_count;    // Cached record count
    80+    BTreeFlags flags;         // Flags
    81+    uint160 uid;              // Unique file ID (20 bytes, fits in uint16)
    


    laanwj commented at 12:16 pm on April 9, 2024:
    i don’t think importing uint160 is necessary here–the actual value isnt used, we’re just using it to consume some bytes

    achow101 commented at 7:47 pm on April 15, 2024:
    Dropped uint160, changed to use an array.
  139. in src/wallet/migrate.cpp:454 in 7707db3ad5 outdated
    449+    }
    450+};
    451+
    452+static void SeekToPage(AutoFile& s, uint32_t page_num, uint32_t page_size)
    453+{
    454+    size_t pos = page_num * page_size;
    


    murchandamus commented at 12:49 pm on April 9, 2024:
    calculation of two 32 bit integers, you need to use a cast

    sipa commented at 12:52 pm on April 9, 2024:
    You probably want size_t pos = size_t{page_num} * page_size;, or you’ll get an integer overflow for files over 4 GiB.

    laanwj commented at 12:53 pm on April 9, 2024:
    aside: might be better to not use size_t at all for file offsets, as the type is 32 bit on usual 32 bit platforms edit: AutoFile::seek uses int64_t internally, let’s ust that

    achow101 commented at 7:48 pm on April 15, 2024:
    Changed to int64_t with casting.
  140. in src/wallet/migrate.cpp:114 in 7f8d4c963c outdated
    102+        s >> lsn;
    103+        s >> page_num;
    104+        s >> magic;
    105+        s >> version;
    106+        s >> pagesize;
    107+        s >> encrypt_algo;
    


    laanwj commented at 12:59 pm on April 9, 2024:
    do we want to sanity check for encryption to be disabled? (didn’t know bdb supported db-level encryption in the first place) edit: i checked BDB, apparently this value is 0 for non-encrypted databases, otherwise some other value

    achow101 commented at 7:48 pm on April 15, 2024:
    Yes, I’ve added a check for that.
  141. laanwj commented at 3:59 pm on April 15, 2024: member

    i tried some old wallets, and there was a problem with parsing the headers of overflow pages (which have btree level 0), the following makes the check pass and adds a further check that overflow records correspond to overflow pages:

     0diff --git a/src/wallet/migrate.cpp b/src/wallet/migrate.cpp
     1index be54b58f67c9bc961f8df306fe4bc0bf810b41f3..1c4ce1c1ed8d44221626613df2cd0d4712fc5c85 100644
     2--- a/src/wallet/migrate.cpp
     3+++ b/src/wallet/migrate.cpp
     4@@ -361,7 +361,7 @@ public:
     5         if (expected_page_num != page_num) {
     6             throw std::runtime_error("Page number mismatch");
     7         }
     8-        if (level < 1) {
     9+        if ((type != PageType::OVERFLOW_DATA && level < 1) || (type == PageType::OVERFLOW_DATA && level != 0)) {
    10             throw std::runtime_error("Bad btree level");
    11         }
    12     }
    13@@ -636,6 +636,9 @@ void BerkeleyRODatabase::Open()
    14                         SeekToPage(db_file, next_page, page_size);
    15                         PageHeader opage_header(next_page, inner_meta.other_endian);
    16                         db_file >> opage_header;
    17+                        if (opage_header.type != PageType::OVERFLOW_DATA) {
    18+                            throw std::runtime_error("Bad overflow record page type");
    19+                        }
    20                         OverflowPage opage(opage_header);
    21                         db_file >> opage;
    22                         data.insert(data.end(), opage.data.begin(), opage.data.end());
    
  142. achow101 force-pushed on Apr 15, 2024
  143. achow101 force-pushed on Apr 15, 2024
  144. DrahtBot added the label CI failed on Apr 15, 2024
  145. DrahtBot commented at 7:47 pm on April 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/23845360026

  146. achow101 commented at 7:54 pm on April 15, 2024: member

    parsing the headers of overflow pages (which have btree level 0)

    Interesting, did not know that they would use an invalid btree level. Fixed as suggested.


    I’ve also commented out the enums that we do not use. I decided to leave them there for completeness, but they won’t be used.


    For help with reviewing, I suggest looking at the BDB source code. I’ve uploaded a copy of the version that we use at https://github.com/achow101/bdb-sources/tree/main/4.8.30.NC. The files that I used for referencing and checking that things were implemented correctly are dbinc/db_page.h, btree/bt_open.c, and btree/bt_put.c. That repo also has a couple of other BDB versions that people may have used for their wallets (except 18.x), so you may want to compare with those versions and check that the file format did not change between them.

  147. achow101 commented at 9:01 pm on April 15, 2024: member
    There was a comment at CoreDev that this should check that the file is compacted, so I’ve added a commit that does that. To check that this is correct, see the implementation of lsn_reset, which uses LSN_NOT_LOGGED. The definition of DB_LSN is also useful.
  148. laanwj commented at 6:13 am on April 16, 2024: member

    There was a comment at CoreDev that this should check that the file is compacted, so I’ve added a commit that does that.

    That’s good to have. It does look like the newly added check trips up in the CI in some places:

    0unknown location(0): fatal error: in "db_tests/db_cursor_prefix_range_test": class std::runtime_error: LSNs are not reset, this database is not completely flushed. Please reopen then close the database with a version that has BDB support
    
  149. achow101 commented at 2:22 pm on April 16, 2024: member

    That’s good to have. It does look like the newly added check trips up in the CI in some places:

    0unknown location(0): fatal error: in "db_tests/db_cursor_prefix_range_test": class std::runtime_error: LSNs are not reset, this database is not completely flushed. Please reopen then close the database with a version that has BDB support
    

    Fixed

  150. achow101 force-pushed on Apr 16, 2024
  151. DrahtBot removed the label CI failed on Apr 16, 2024
  152. in src/wallet/migrate.cpp:638 in ae58ab0a1d outdated
    633+        pages.pop_back();
    634+        SeekToPage(db_file, curr_page, page_size);
    635+        PageHeader header(curr_page, inner_meta.other_endian);
    636+        db_file >> header;
    637+        if (header.type != PageType::BTREE_LEAF && header.type != PageType::BTREE_INTERNAL) {
    638+            throw std::runtime_error("Unexpected inner database page type");
    


    laanwj commented at 6:13 pm on April 16, 2024:
    header.type is compared against BTREE_LEAF and BTREE_INTERNAL, which are the same cases as handled in the switch() statement below, which also errors in the default page (though with a slightly different error), so this check is redundant?

    achow101 commented at 8:43 pm on April 16, 2024:
    Good point. Removed this check.
  153. in src/wallet/migrate.cpp:407 in ae58ab0a1d outdated
    403+            indexes.push_back(index);
    404+            pos += sizeof(uint16_t);
    405+
    406+            // Go to the offset from the index
    407+            int64_t to_jump = index - pos;
    408+            s.ignore(to_jump);
    


    laanwj commented at 6:33 pm on April 16, 2024:
    Note that the argument to ignore() is unsigned (a size_t), might want to error out here if somehow index < pos. (same for the occurence in InternalPage)

    achow101 commented at 8:43 pm on April 16, 2024:
    Added a check
  154. in src/wallet/migrate.cpp:531 in ae58ab0a1d outdated
    526+{
    527+    // Open the file
    528+    FILE *file = fsbridge::fopen(m_filepath, "rb");
    529+    AutoFile db_file(file);
    530+    if (db_file.IsNull()) {
    531+        db_file.fclose();
    


    laanwj commented at 6:41 pm on April 16, 2024:
    It looks like fclose() does nothing if db_file.IsNull() already holds.

    achow101 commented at 8:43 pm on April 16, 2024:
    Removed
  155. in src/wallet/migrate.cpp:693 in ae58ab0a1d outdated
    681+                } else {
    682+                    m_records.emplace(SerializeData{key.begin(), key.end()}, SerializeData{data.begin(), data.end()});
    683+                    key.clear();
    684+                }
    685+                is_key = !is_key;
    686+            }
    


    laanwj commented at 7:11 pm on April 16, 2024:
    Could add a check that is_key is true at the end. If not, there’s an odd number of values, which would be odd?

    achow101 commented at 8:44 pm on April 16, 2024:
    Good point. I’ve added a check earlier that the number of records is even to avoid this kind of issue.
  156. in src/wallet/migrate.cpp:216 in ae58ab0a1d outdated
    211+    RecordHeader(bool other_endian) : other_endian(other_endian) {}
    212+    RecordHeader() = delete;
    213+
    214+    RecordType GetRealType() const
    215+    {
    216+        return static_cast<RecordType>(static_cast<uint8_t>(type) & ~static_cast<uint8_t>(RecordType::DELETE));
    


    laanwj commented at 7:24 pm on April 16, 2024:

    Instead of casting back and forth and having a GetRealType() it would, imo, be more straightforward to split the delete flag off from the type in Unserialize before casting to RecordType. So to have

    0RecordType type;
    1bool deleted;
    

    Then these accessors could go.


    achow101 commented at 8:44 pm on April 16, 2024:
    Done as suggested.
  157. achow101 force-pushed on Apr 16, 2024
  158. laanwj commented at 8:43 am on April 18, 2024: member

    ACK f4f7eda0bf0e798d305df331c1d94443b81dff22

    i’ve extensively reviewed the code and tested bitcoin-wallet -withinternalbdb dump ... on a fair number of historical wallet database backups, getting the exact same dump as BerkeleyDB, or in one case the “LSNs are not reset” error (because it wasn’t cleanly closed at the time).

    There are a lot of checks and i’m reasonably confident that this won’t silently drop or corrupt any data (in one way safer than BDB: it truly opens read-only), instead fails when it sees something is outside expectations. It’s of course possible some legit wallet will trigger an edge case exception, but that will only become evident after this reaches wider testing. As this doesn’t remove the BerkeleyDB dependency yet, it’s not something to be excessively worried about for this PR.

  159. DrahtBot requested review from TheCharlatan on Apr 18, 2024
  160. in src/wallet/migrate.cpp:572 in f4f7eda0bf outdated
    563+        // The LSN is composed of 2 32-bit ints, the first is a file id, the second an offset
    564+        // It will always be the first 8 bytes of a page, so we deserialize it directly for every page
    565+        uint32_t file;
    566+        uint32_t offset;
    567+        SeekToPage(db_file, i, page_size);
    568+        db_file >> file >> offset;
    


    laanwj commented at 8:46 am on April 21, 2024:

    The check currently always fails for other-endian wallets, needs something like:

     0diff --git a/src/wallet/migrate.cpp b/src/wallet/migrate.cpp
     1index 15478e8bc5bfe39e04b400fa475215e25eef6452..0a41ae199e1950bf372bae90b8a88fccca2b88f0 100644
     2--- a/src/wallet/migrate.cpp
     3+++ b/src/wallet/migrate.cpp
     4@@ -566,6 +566,10 @@ void BerkeleyRODatabase::Open()
     5         uint32_t offset;
     6         SeekToPage(db_file, i, page_size);
     7         db_file >> file >> offset;
     8+        if (outer_meta.other_endian) {
     9+            file = internal_bswap_32(file);
    10+            offset = internal_bswap_32(offset);
    11+        }
    12         if (file != 0 || offset != 1) {
    13             throw std::runtime_error("LSNs are not reset, this database is not completely flushed. Please reopen then close the database with a version that has BDB support");
    14         }
    

    achow101 commented at 8:07 pm on April 22, 2024:
    Good catch, fixed
  161. in src/wallet/migrate.cpp:727 in a0b4c1b7a3 outdated
    34+        LogPrintf("copied %s to %s\n", fs::PathToString(m_filepath), fs::PathToString(dst));
    35+        return true;
    36+    } catch (const fs::filesystem_error& e) {
    37+        LogPrintf("error copying %s to %s - %s\n", fs::PathToString(m_filepath), fs::PathToString(dst), fsbridge::get_filesystem_error_message(e));
    38+        return false;
    39+    }
    


    maflcko commented at 9:03 am on April 21, 2024:
    nit in a0b4c1b7a3bfeaeae0277fe151cd5ef9b31ccf15: Is this used in this pull request? I wonder if it could make sense to return the error string to the caller, assuming this is called by RPC eventually?

    achow101 commented at 8:13 pm on April 22, 2024:

    Is this used in this pull request?

    No, it’s used in the followup.

    I wonder if it could make sense to return the error string to the caller, assuming this is called by RPC eventually?

    Backup is a part of the WalletDatabase parent class, so changing this would also mean changing the Backup of existing database classes, and all of their callers. I think that’s something that would be worth doing in a followup, but not this PR.

  162. achow101 force-pushed on Apr 22, 2024
  163. achow101 commented at 8:11 pm on April 22, 2024: member

    For testing big endian wallets, you can use db_dump wallet.dat | db_load -c db_lorder=4321 wallet_big_endian.dat to make big endian databasese using BDB’s tools.

    However, I also realized that BDB lets us set the endianness to use for a new database, so I’ve added a bdb_swap format to createfromdump that will make a BDB database with the other endianness. This will flip it depending on the system that you use, so on little endian, it will make a big endian wallet, and on big endian, it will make a little endian wallet. Adding this lets us have a test of BerkeleyRO dumping other endian wallets.

    I’ve also added a test for the overflow page issue that was discovered earlier, and another test for the LSN reset check.

  164. achow101 force-pushed on Apr 22, 2024
  165. DrahtBot commented at 9:15 pm on April 22, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24124143992

  166. DrahtBot added the label CI failed on Apr 22, 2024
  167. in src/wallet/init.cpp:92 in dc938e9032 outdated
    86@@ -87,8 +87,9 @@ void WalletInit::AddWalletOptions(ArgsManager& argsman) const
    87     argsman.AddArg("-dblogsize=<n>", strprintf("Flush wallet database activity from memory to disk log every <n> megabytes (default: %u)", DatabaseOptions().max_log_mb), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::WALLET_DEBUG_TEST);
    88     argsman.AddArg("-flushwallet", strprintf("Run a thread to flush wallet periodically (default: %u)", DEFAULT_FLUSHWALLET), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::WALLET_DEBUG_TEST);
    89     argsman.AddArg("-privdb", strprintf("Sets the DB_PRIVATE flag in the wallet db environment (default: %u)", !DatabaseOptions().use_shared_memory), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::WALLET_DEBUG_TEST);
    90+    argsman.AddArg("-swapbdbendian", "Swaps the internal endianness of BDB wallet databases (default: false)", ArgsManager:: ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::WALLET_DEBUG_TEST);
    91 #else
    92-    argsman.AddHiddenArgs({"-dblogsize", "-flushwallet", "-privdb"});
    93+    argsman.AddHiddenArgs({"-dblogsize", "-flushwallet", "-privdb", "swapbdbendian"});
    


    laanwj commented at 2:05 pm on April 23, 2024:

    CI linter complains, apparently the option needs to be added in another place:

    0AssertionError: Please add {'-swapbdbendian'} to the hidden args in DummyWalletInit::AddWalletOptions
    

    Edit: maybe forgotten -?


    achow101 commented at 3:41 pm on April 23, 2024:
    Oops fixed.
  168. achow101 force-pushed on Apr 23, 2024
  169. achow101 force-pushed on Apr 23, 2024
  170. DrahtBot removed the label CI failed on Apr 23, 2024
  171. achow101 force-pushed on Apr 25, 2024
  172. achow101 force-pushed on Apr 25, 2024
  173. achow101 force-pushed on Apr 25, 2024
  174. achow101 force-pushed on Apr 25, 2024
  175. DrahtBot added the label CI failed on Apr 25, 2024
  176. DrahtBot commented at 11:16 pm on April 25, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24275967354

  177. achow101 force-pushed on Apr 25, 2024
  178. achow101 force-pushed on Apr 26, 2024
  179. DrahtBot removed the label CI failed on Apr 26, 2024
  180. laanwj commented at 8:54 am on April 28, 2024: member
    New tests LGTM Re-ACK a0943b812ef38826a4ee2732af5f24e470281117
  181. DrahtBot added the label CI failed on May 2, 2024
  182. furszy commented at 9:24 pm on May 6, 2024: member
    Going slowly here, I want to go through the bdb code diffs too. But, so far, so good. Benchmark 8f2267ff24acdc shows no diff with master and also have created and migrated a v0.6.3 wallet without any problem.
  183. josibake commented at 8:49 am on May 9, 2024: member

    Concept ACK

    Mostly focusing my review on https://github.com/bitcoin/bitcoin/pull/26606/commits/5f0cddb4f66f2a174babdfb17bc31a3011e5cc20. Regarding testing, it occurred to me that we had some large / interesting wallet files from when we were doing wallet simulations for coin selection. @murchandamus @achow101 any chance those wallets were BDB and if so, would be useful to test here?

  184. in src/wallet/bdb.cpp:304 in 9dd2b9aa22 outdated
    299@@ -300,7 +300,11 @@ static Span<const std::byte> SpanFromDbt(const SafeDbt& dbt)
    300 }
    301 
    302 BerkeleyDatabase::BerkeleyDatabase(std::shared_ptr<BerkeleyEnvironment> env, fs::path filename, const DatabaseOptions& options) :
    


    TheCharlatan commented at 9:00 am on May 12, 2024:
    In commit 9dd2b9aa228f5d30b3620840d893b5a50daafe49: Nit: Can you also say why this database type is generated in the commit message? Something like “This is added to make testing other endian databases easier.”.

    achow101 commented at 3:17 am on May 14, 2024:
    Done
  185. in src/wallet/migrate.cpp:567 in 77765c4703 outdated
    563@@ -564,6 +564,24 @@ void BerkeleyRODatabase::Open()
    564         throw std::runtime_error("BDB builtin encryption is not supported");
    565     }
    566 
    567+    // Check all LSNs point to file 0 and offset 1 which indicates that the LSNs were
    


    TheCharlatan commented at 10:13 am on May 12, 2024:
    Nit: Would be nice to say what LSN means, e.g. “Check all LSNs (log sequence numbers)…”

    achow101 commented at 3:17 am on May 14, 2024:
    Done
  186. TheCharlatan approved
  187. TheCharlatan commented at 10:49 am on May 12, 2024: contributor

    Re-ACK a0943b812ef38826a4ee2732af5f24e470281117

    The test improvements look good and give some more confidence.

    Can you run the commits with newly added files through contrib/devtools/clang-format-diff? There are a bunch of simple whitespace issues.

    It would also be nice to mention the byte-swapped database type and that it is only used for testing purposes until the final deprecation in the pull request description.

  188. DrahtBot requested review from josibake on May 12, 2024
  189. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:36 in 1009235087 outdated
    31+    g_setup = testing_setup.get();
    32+}
    33+
    34+FUZZ_TARGET(wallet_bdb_parser, .init = initialize_wallet_bdb_parser)
    35+{
    36+    FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    


    furszy commented at 0:50 am on May 13, 2024:
    nit: unused

    achow101 commented at 6:19 pm on May 14, 2024:
    Done
  190. DrahtBot removed the label CI failed on May 13, 2024
  191. in src/wallet/migrate.cpp:490 in 5f0cddb4f6 outdated
    485+    int64_t size = db_file.tell();
    486+    /* BDB doesn't actually error on this
    487+    if (size % page_size != 0) {
    488+        throw std::runtime_error("File size is not a multiple of page size");
    489+    }
    490+    */
    


    josibake commented at 9:12 am on May 13, 2024:

    in https://github.com/bitcoin/bitcoin/pull/26606/commits/5f0cddb4f66f2a174babdfb17bc31a3011e5cc20 (“wallet: implement independent BDB deserializer in BerkeleyRODatabase”):

    nit: I somewhat know whats going here on based on previous discussions about this PR, but without that prior context this being commented out is pretty confusing and I don’t think the comment above provides enough explanation. Perhaps add more context in the comment or just remove this snippet entirely?


    achow101 commented at 3:17 am on May 14, 2024:
    Expanded the comment.
  192. in src/wallet/migrate.cpp:781 in e643cac230 outdated
    760+        status = DatabaseStatus::SUCCESS;
    761+        return db;
    762+    } catch (const std::runtime_error& e) {
    763+        error.original = e.what();
    764+        status = DatabaseStatus::FAILED_LOAD;
    765+        return nullptr;
    


    josibake commented at 9:21 am on May 13, 2024:

    in https://github.com/bitcoin/bitcoin/pull/26606/commits/e643cac230fc2ba059847bd4e196bddfd97b1e8e (“Add MakeBerkeleyRODatabase”):

    Seems like we should have some sort of call to action here prompting the user to open an issue / reach out on IRC if this fails? If we fail to open a bdb database here, the database itself could be fine and it instead might be an issue with our parser.


    achow101 commented at 3:17 am on May 14, 2024:
    I don’t think that’s really necessary? I would expect that users would complain if a wallet suddenly was not able to be opened.
  193. josibake approved
  194. josibake commented at 9:25 am on May 13, 2024: member

    ACK https://github.com/bitcoin/bitcoin/pull/26606/commits/a0943b812ef38826a4ee2732af5f24e470281117

    Spent some time comparing this to the bdb header files and LGTM. Test coverage also looks good. Left a few nits regarding comments. Overall, I do think we should prompt users to file an issue specifically for this if loading a database fails, considering it might be nothing wrong with their database and instead an edge case being hit in the parser.

  195. wallet: add dummy BerkeleyRODatabase and BerkeleyROBatch classes
    BerkeleyRODatabase and BerkeleyROBatch will be used to access a BDB file
    without the use of BDB. For now, these are dummy classes.
    756ff9b478
  196. wallet: implement BerkeleyROBatch
    Implement ReadKey and HasKey of BerkeleyROBatch, and Next of BerkeleyROCursor.
    Also adds the containers for records to BerkeleyRODatabase so that
    BerkeleyROBatch will be able to access the records.
    0c8e728476
  197. wallet: implement BerkeleyRODatabase::Backup ecba230979
  198. achow101 force-pushed on May 14, 2024
  199. DrahtBot commented at 5:22 am on May 14, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24929953857

  200. DrahtBot added the label CI failed on May 14, 2024
  201. in src/wallet/bdb.cpp:397 in abbbb1a38d outdated
    392@@ -389,6 +393,10 @@ void BerkeleyDatabase::Open()
    393                 }
    394             }
    395 
    396+            if (m_byteswap) {
    397+                pdb_temp->set_lorder(std::endian::native == std::endian::little ? 4321 : 1234);
    


    maflcko commented at 6:38 am on May 14, 2024:

    To fix the warning, maybe this can be pulled into a constexpr context? Otherwise, unreachable-code can be disabled.

    0wallet/bdb.cpp:397:90: error: code will never be executed [-Werror,-Wunreachable-code]
    1  397 |                 pdb_temp->set_lorder(std::endian::native == std::endian::little ? 4321 : 1234);
    2      |                                                                                          ^~~~
    3wallet/bdb.cpp:533:97: error: code will never be executed [-Werror,-Wunreachable-code]
    4  533 |                         pdbCopy->set_lorder(std::endian::native == std::endian::little ? 4321 : 1234);
    5      |                                                                                                 ^~~~
    62 errors generated.
    
    0constexpr auto lorder{std::endian::native == std::endian::little ? 4321 : 1234};
    1            if (m_byteswap) {
    2                pdb_temp->set_lorder(lorder);
    

    achow101 commented at 6:18 pm on May 14, 2024:
    Done

    josibake commented at 9:37 am on May 15, 2024:
    Looks like this warning is still failing the CI, perhaps just disabling the unreachable-code warning here is the simplest?
  202. furszy commented at 5:58 pm on May 14, 2024: member
    Despite the CI failures, light ACK abbbb1a38d28bf365fb0f219fc60993e48627631.
  203. DrahtBot requested review from laanwj on May 14, 2024
  204. DrahtBot requested review from TheCharlatan on May 14, 2024
  205. DrahtBot requested review from josibake on May 14, 2024
  206. achow101 force-pushed on May 14, 2024
  207. in src/wallet/bdb.cpp:397 in 143db620d3 outdated
    392@@ -389,6 +393,11 @@ void BerkeleyDatabase::Open()
    393                 }
    394             }
    395 
    396+            if (m_byteswap) {
    397+                constexpr auto lorder{std::endian::native == std::endian::little ? 4321 : 1234};
    


    maflcko commented at 9:47 am on May 15, 2024:

    Ok, it is a bit stupid to fight this compile warning, but apparently moving lorder out of the function scope fixes it :shrug:

    It only happens with -stdlib=libc++.

    See https://godbolt.org/z/vjEM7zjzb


    achow101 commented at 6:49 pm on May 15, 2024:
    I think it’s fixed this time.
  208. achow101 force-pushed on May 15, 2024
  209. DrahtBot removed the label CI failed on May 16, 2024
  210. DrahtBot requested review from furszy on May 16, 2024
  211. josibake approved
  212. TheCharlatan approved
  213. TheCharlatan commented at 10:32 am on May 16, 2024: contributor
    Re-ACK c0e0a0affea3e3cee9a8910de4bae55a9bdd24cf
  214. in src/wallet/migrate.cpp:555 in 0481e79a5b outdated
    550+    std::vector<uint32_t> pages{inner_meta.root};
    551+    while (pages.size() > 0) {
    552+        uint32_t curr_page = pages.back();
    553+        if (curr_page > inner_meta.last_page) {
    554+            std::runtime_error("Page number is greater than subdatabase last page");
    555+        }
    


    theStack commented at 11:07 am on May 16, 2024:

    With the databases I tested with, this branch is always executed, as inner_meta.last_page is 2, and each processed page curr_page is >= 3. The reason that we don’t notice is that there is a throw missing here. I assume we should do this instead:

    0        if (curr_page > outer_meta.last_page) {
    1            throw std::runtime_error("Page number is greater than database last page");
    2        }
    

    achow101 commented at 7:03 pm on May 16, 2024:
    Good catch. It looks like BDB doesn’t actually set last_pgno to the correct value, so this check is too strict. I’ve commented it out entirely.
  215. DrahtBot requested review from theStack on May 16, 2024
  216. theStack referenced this in commit 776674076a on May 16, 2024
  217. wallet: implement independent BDB deserializer in BerkeleyRODatabase
    BerkeleyRODatabase is intended for use after BDB is removed, so it needs
    to be able to read all of the records from a BDB file. Thus an
    independent deserializer for BDB data files is implemented in it. This
    deserializer is targeted towards the data files that Bitcoin Core
    creates so it does not fully support all of BDB's features (e.g. other
    database types, encryption, etc.).
    cdd61c9cc1
  218. Implement handling of other endianness in BerkeleyRODatabase 6e50bee67d
  219. Add MakeBerkeleyRODatabase
    Implements MakeBerkeleyRODatabase and adds DatabaseFormat::BERKELEY_RO
    so that MakeDatabase can use BerkeleyRO as the backend database.
    dd57713f6e
  220. wallettool: Optionally use BERKELEY_RO as format when dumping BDB wallets
    In order to ease the transition to not having BDB, make the dump tool
    use DatabaseFormmat::BERKELEY_RO when -withinternalbdb is set.
    70cfbfdadf
  221. tests: Add BerkeleyRO to db prefix tests 3568dce9e9
  222. Berkeley RO Database fuzz test 4d7a3ae78e
  223. Error if LSNs are not reset d9878903fb
  224. bdb: Be able to make byteswapped databases
    Byteswapped databases make it easier to test opening and deserializing
    other endian databases.
    6ace3e953f
  225. test: Test dumps of other endian BDB files fd7b16e391
  226. test: Test dumping dbs with overflow pages c1984f1282
  227. test: Test bdb_ro dump of wallet without reset LSNs 0b753156ce
  228. wallet, test: Be able to always swap BDB endianness d51fbab4b3
  229. achow101 force-pushed on May 16, 2024
  230. TheCharlatan approved
  231. TheCharlatan commented at 8:41 pm on May 16, 2024: contributor
    Re-ACK d51fbab4b32d56765e8faab6ad01245fb259b0ca
  232. DrahtBot requested review from josibake on May 16, 2024
  233. hebasto added the label Needs CMake port on May 17, 2024
  234. furszy commented at 7:53 pm on May 17, 2024: member
    reACK d51fbab4b32d56765e8faab6ad01245fb259b0ca
  235. laanwj approved
  236. laanwj commented at 8:05 pm on May 17, 2024: member
    re-ACK d51fbab4b32d56765e8faab6ad01245fb259b0ca
  237. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:82 in 4d7a3ae78e outdated
    77+            error.original == "Unexpected page size" ||
    78+            error.original == "Unexpected page type" ||
    79+            error.original == "Page number mismatch" ||
    80+            error.original == "Bad btree level" ||
    81+            error.original == "Bad page size" ||
    82+            error.original == "File size is not a multiple of page size" ||
    


    theStack commented at 9:12 pm on May 17, 2024:

    nit: isn’t thrown in the parser (commented out), hence can be removed here

  238. in src/wallet/test/fuzz/wallet_bdb_parser.cpp:74 in 4d7a3ae78e outdated
    69+            error.original == "Unexpected database flags, should only be 0x20 (subdatabases)" ||
    70+            error.original == "Unexpected outer database root page type" ||
    71+            error.original == "Unexpected number of entries in outer database root page" ||
    72+            error.original == "Subdatabase has an unexpected name" ||
    73+            error.original == "Subdatabase page number has unexpected length" ||
    74+            error.original == "Unexpected inner database page type" ||
    


    theStack commented at 9:14 pm on May 17, 2024:

    nit: isn’t thrown anywhere, can be removed

  239. in test/functional/tool_wallet.py:27 in d51fbab4b3
    23@@ -24,11 +24,14 @@ class ToolWalletTest(BitcoinTestFramework):
    24     def add_options(self, parser):
    25         self.add_wallet_options(parser)
    26         parser.add_argument("--bdbro", action="store_true", help="Use the BerkeleyRO internal parser when dumping a Berkeley DB wallet file")
    27+        parser.add_argument("--swap-bdb-endian", action="store_true",help="When making Legacy BDB wallets, always make then byte swapped internally")
    


    theStack commented at 10:15 pm on May 17, 2024:

    typo nit

    0        parser.add_argument("--swap-bdb-endian", action="store_true",help="When making Legacy BDB wallets, always make them byte swapped internally")
    
  240. theStack approved
  241. theStack commented at 10:25 pm on May 17, 2024: contributor

    ACK d51fbab4b32d56765e8faab6ad01245fb259b0ca

    Convinced myself that this parser works reliably partly by looking at the original code, partly by extending the test framework’s BDB parser to match this PR’s logic (see #30125), which helped a lot in understanding the structure. Also tested with a few random signet legacy wallets that I still had around, though those were all quite small.

    Btw, I couldn’t manage to find or create a BDB wallet where the DELETE flag is set in a record header (would be nice to exercise that in a test, though it’s hard to verify that it’s really set without inspecting manually, or e.g. adding debug messages on deserialization), but the code to handle that is simple enough and looks correct.

  242. fanquake merged this on May 21, 2024
  243. fanquake closed this on May 21, 2024

  244. maflcko commented at 9:29 am on May 21, 2024: member
    I wonder if it makes sense to collect the fuzz inputs to qa-assets, from testers that still have them. However, the fuzz CI does not compile with BDB, so that’ll probably need to be adjusted first.
  245. theStack referenced this in commit 648236f793 on May 21, 2024
  246. hebasto commented at 11:07 am on June 5, 2024: member
    Ported to the CMake-based build system in https://github.com/hebasto/bitcoin/pull/220.
  247. hebasto removed the label Needs CMake port on Jun 5, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me