test: improve BDB parser (handle internal/overflow pages, support all page sizes) #30125

pull theStack wants to merge 2 commits into bitcoin:master from theStack:complete_bdb-ro_python_parser2 changing 2 files +144 −39
  1. theStack commented at 5:48 pm on May 16, 2024: contributor

    This PR adds missing features to our test framework’s BDB parser with the goal of hopefully being able to read all legacy wallets that are created with current and past versions of Bitcoin Core. This could be useful both for making review of #26606 easier and to also possibly improve our functional tests for the wallet BDB-ro parser by additionally validating it with an alternative implementation. The second commits introduces a test that create a legacy wallet with huge label strings (in order to create overflow pages, i.e. pages needed for key/value data than is larger than the page size) and compares the dump outputs of wallet tool and the extended test framework BDB parser. It can be exercised via $ ./test/functional/tool_wallet.py --legacy. BDB support has to be compiled in (obviously).

    For some manual tests regarding different page sizes, the following patch can be used:

     0diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
     1index 38cca32f80..1bf39323d3 100644
     2--- a/src/wallet/bdb.cpp
     3+++ b/src/wallet/bdb.cpp
     4@@ -395,6 +395,7 @@ void BerkeleyDatabase::Open()
     5                             DB_BTREE,                                 // Database type
     6                             nFlags,                                   // Flags
     7                             0);
     8+            pdb_temp->set_pagesize(1<<9); /* valid BDB pagesizes are from 1<<9 (=512) to <<16 (=65536) */
     9 
    10             if (ret != 0) {
    11                 throw std::runtime_error(strprintf("BerkeleyDatabase: Error %d, can't open database %s", ret, strFile));
    

    I verified that the newly introduced test passes with all valid page sizes between 512 and 65536.

  2. DrahtBot commented at 5:48 pm on May 16, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30125.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK achow101
    Concept ACK laanwj, BrandonOdiwuor

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31375 (multiprocess: Add bitcoin wrapper executable by ryanofsky)
    • #31250 (wallet: Disable creating and loading legacy wallets by achow101)
    • #28710 (Remove the legacy wallet and BDB dependency by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on May 16, 2024
  4. in test/functional/test_framework/bdb.py:17 in d166e8886c outdated
    15-Each key-value pair is two entries in a btree leaf. The first is the key, the one that follows
    16+Each key-value pair is two entries in a btree leaf, which optionally refers to overflow pages
    17+if the data doesn't fit into a single page. The first entry is the key, the one that follows
    18 is the value. And so on. Note that the entry data is itself not in the correct order. Instead
    19 entry offsets are stored in the correct order and those offsets are needed to then retrieve
    20 the data itself.
    


    laanwj commented at 7:30 pm on May 16, 2024:
    Might want to note that the implemenation handles only bdb files with the same endian as the host.

    theStack commented at 5:18 pm on May 21, 2024:
    Good idea, done.
  5. in test/functional/test_framework/bdb.py:82 in d166e8886c outdated
    86-        entry['len'] = e_len
    87-        entry['pg_type'] = pg_type
    88-        entry['data'] = data[offset + 3:offset + 3 + e_len]
    89+        record_header = data[offset:offset + 3]
    90+        offset += 3
    91+        e_len, record_type = struct.unpack('HB', record_header)
    


    laanwj commented at 7:31 pm on May 16, 2024:
    Forgot = Edit: doesn’t matter, it only affects alignment versus the default of @, and a byte after a short will always have the same alignment
  6. laanwj commented at 7:37 pm on May 16, 2024: member
    Code review ACK, from what i learned about how BDB databases work internally, implementation looks correct on first glance.
  7. laanwj added the label Wallet on May 17, 2024
  8. DrahtBot added the label Needs rebase on May 21, 2024
  9. theStack force-pushed on May 21, 2024
  10. theStack commented at 5:19 pm on May 21, 2024: contributor
    Rebased on master (necessary since #26606 was merged and also touched the same functional test file).
  11. DrahtBot removed the label Needs rebase on May 21, 2024
  12. DrahtBot added the label CI failed on May 21, 2024
  13. theStack force-pushed on May 22, 2024
  14. DrahtBot removed the label CI failed on May 22, 2024
  15. BrandonOdiwuor commented at 11:36 am on May 29, 2024: contributor
    Concept ACK
  16. in test/functional/test_framework/bdb.py:82 in 648236f793 outdated
    87-        entry['len'] = e_len
    88-        entry['pg_type'] = pg_type
    89-        entry['data'] = data[offset + 3:offset + 3 + e_len]
    90+        record_header = data[offset:offset + 3]
    91+        offset += 3
    92+        e_len, record_type = struct.unpack('HB', record_header)
    


    rkrux commented at 9:30 am on May 30, 2024:
    There’s a separate effort from @maflcko regarding removing the usage of struct.pack from all places in test. Though here it’s struct.unpack but I guess the reasoning for the removal of struct.pack is still applicable. https://github.com/bitcoin/bitcoin/pull/29401

    maflcko commented at 12:57 pm on May 30, 2024:
    There is a small benefit to struct.pack being less code when more than one item is packed or unpacked. Seems fine to use either in this case, up to the author. However, in most other cases where only a single item is (un)packed, the modern alternatives are less code, and clearer.

    theStack commented at 12:13 pm on June 2, 2024:
    The main reason for using struct.unpack in this PR is to be consistent with the already existing code in the bdb module (i.e. avoiding to mix up different serialization methods), and I didn’t want to introduce large refactoring changes in this PR. I strongly agree though that int.{to,from}_bytes is preferred and support #29401.
  17. in test/functional/test_framework/bdb.py:175 in 648236f793 outdated
    177-            pages.append(data)
    178-            data = f.read(PAGESIZE)
    179+        # Determine pagesize first
    180+        data = f.read(PAGE_HEADER_SIZE)
    181+        pagesize = struct.unpack('I', data[20:24])[0]
    182+        assert pagesize in (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)
    


    rkrux commented at 9:35 am on May 30, 2024:
    WDYT about creating a constant tuple for these values? They seem pretty generic enough.

    laanwj commented at 10:11 pm on November 6, 2024:

    WDYT about creating a constant tuple for these values? They seem pretty generic enough.

    are they needed anywhere else? in #31222 i suggested

    0MIN_PAGESIZE = 512
    1MAX_PAGESIZE = 65536
    2
    3assert pagesize >= MIN_PAGESIZE and pagesize <= MAX_PAGESIZE and (pagesize & (pagesize-1) == 0)
    

    but honestly, enumerating all 8 possible values here seems fine


    rkrux commented at 12:35 pm on November 8, 2024:
    Yeah now that I think about my comment again, I feel creating a constant is not necessary. Instead, I like your suggestion more specially with that bit hack. If other usages of is-power-of-two are anticipated in the functional tests, then probably a function can be added like so: https://stackoverflow.com/a/600306. Not necessary in this PR though.
  18. rkrux commented at 12:12 pm on May 30, 2024: none
    Incomplete review and untested yet, left couple comments off the top of my head atm. Will do another round thoroughly in the next few days.
  19. achow101 requested review from achow101 on Oct 15, 2024
  20. in test/functional/tool_wallet.py:568 in ad0a8ffcea outdated
    563+
    564+        wallet_dumpfile = self.nodes[0].datadir_path / "bdb_ro_test.dump"
    565+        args = ["-wallet={}".format(wallet_name), "-dumpfile={}".format(wallet_dumpfile), "dump"]
    566+        p = self.bitcoin_wallet_process(*args)
    567+        _, _ = p.communicate()
    568+        assert_equal(p.poll(), 0)
    


    achow101 commented at 10:25 pm on October 23, 2024:

    In ad0a8ffceacfd5ce37826b2882b0e51a14c5a6bf “test: compare BDB dumps of test framework parser and wallet tool”

    Could use self.assert_tool_output, see how the other calls to dump are done in this test.


    theStack commented at 1:58 pm on October 24, 2024:
    Done.
  21. in test/functional/tool_wallet.py:580 in ad0a8ffcea outdated
    572+        del expected_dump['BITCOIN_CORE_WALLET_DUMP']
    573+        del expected_dump['format']
    574+        del expected_dump['checksum']
    575+        bdb_ro_parser_dump_raw = dump_bdb_kv(self.nodes[0].wallets_path / wallet_name / "wallet.dat")
    576+        bdb_ro_parser_dump = OrderedDict()
    577+        assert any([len(bytes.fromhex(value)) >= 150000 for value in expected_dump.values()])
    


    achow101 commented at 10:31 pm on October 23, 2024:

    In ad0a8ffceacfd5ce37826b2882b0e51a14c5a6bf “test: compare BDB dumps of test framework parser and wallet tool”

    Since we generate the large labels, this could also check that all show up with the correct address.


    theStack commented at 1:58 pm on October 24, 2024:
    Done. The whole dump is already compared a few lines below, but I guess it can’t hurt to do the detailed address->label entry checks in addition.
  22. in test/functional/tool_wallet.py:557 in ad0a8ffcea outdated
    548@@ -545,6 +549,36 @@ def test_dump_unclean_lsns(self):
    549         self.stop_node(0)
    550         self.assert_tool_output("The dumpfile may contain private keys. To ensure the safety of your Bitcoin, do not share the dumpfile.\n", "-wallet=unclean_lsn", f"-dumpfile={wallet_dump}", "dump")
    551 
    552+    def test_compare_legacy_dump_with_framework_bdb_parser(self):
    553+        self.log.info("Verify that legacy wallet database dump matches the one from the test framework's BDB parser")
    554+        wallet_name = "bdb_ro_test"
    555+        self.start_node(0)
    556+        # add some really large labels (above twice the largest valid page size) to create BDB overflow pages
    


    achow101 commented at 10:33 pm on October 23, 2024:

    In ad0a8ffceacfd5ce37826b2882b0e51a14c5a6bf “test: compare BDB dumps of test framework parser and wallet tool”

    In addition to testing overflow pages, we can also test internal pages by generating a bunch of keys with keypoolrefill.


    theStack commented at 1:58 pm on October 24, 2024:
    Done.
  23. test: complete BDB parser (handle internal/overflow pages, support all page sizes)
    This aims to complete our test framework BDB parser to reflect
    our read-only BDB parser in the wallet codebase. This could be
    useful both for making review of #26606 easier and to also possibly
    improve our functional tests for the BDB parser by comparing with
    an alternative implementation.
    01ddd9f646
  24. test: compare BDB dumps of test framework parser and wallet tool d45eb3964f
  25. theStack force-pushed on Oct 24, 2024
  26. theStack commented at 1:59 pm on October 24, 2024: contributor
    Thanks for the review! Rebased on master (as I suspect CI would be very unhappy otherwise) and addressed comments #30125 (review), #30125 (review) and #30125 (review)
  27. achow101 commented at 9:13 pm on November 6, 2024: member
    ACK d45eb3964f693eddcf96f1e4083cf19d327be989
  28. DrahtBot requested review from BrandonOdiwuor on Nov 6, 2024
  29. DrahtBot requested review from laanwj on Nov 6, 2024
  30. maflcko added this to the milestone 29.0 on Nov 11, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me