test: improve BDB parser (handle internal/overflow pages, support all page sizes) #30125

pull theStack wants to merge 2 commits into bitcoin:master from theStack:complete_bdb-ro_python_parser2 changing 2 files +135 −39
  1. theStack commented at 5:48 pm on May 16, 2024: contributor

    This PR adds missing features to our test framework’s BDB parser with the goal of hopefully being able to read all legacy wallets that are created with current and past versions of Bitcoin Core. This could be useful both for making review of #26606 easier and to also possibly improve our functional tests for the wallet BDB-ro parser by additionally validating it with an alternative implementation. The second commits introduces a test that create a legacy wallet with huge label strings (in order to create overflow pages, i.e. pages needed for key/value data than is larger than the page size) and compares the dump outputs of wallet tool and the extended test framework BDB parser. It can be exercised via $ ./test/functional/tool_wallet.py --legacy. BDB support has to be compiled in (obviously).

    For some manual tests regarding different page sizes, the following patch can be used:

     0diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
     1index 38cca32f80..1bf39323d3 100644
     2--- a/src/wallet/bdb.cpp
     3+++ b/src/wallet/bdb.cpp
     4@@ -395,6 +395,7 @@ void BerkeleyDatabase::Open()
     5                             DB_BTREE,                                 // Database type
     6                             nFlags,                                   // Flags
     7                             0);
     8+            pdb_temp->set_pagesize(1<<9); /* valid BDB pagesizes are from 1<<9 (=512) to <<16 (=65536) */
     9 
    10             if (ret != 0) {
    11                 throw std::runtime_error(strprintf("BerkeleyDatabase: Error %d, can't open database %s", ret, strFile));
    

    I verified that the newly introduced test passes with all valid page sizes between 512 and 65536.

  2. DrahtBot commented at 5:48 pm on May 16, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK laanwj, BrandonOdiwuor

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28710 (Remove the legacy wallet and BDB dependency by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on May 16, 2024
  4. in test/functional/test_framework/bdb.py:17 in d166e8886c outdated
    15-Each key-value pair is two entries in a btree leaf. The first is the key, the one that follows
    16+Each key-value pair is two entries in a btree leaf, which optionally refers to overflow pages
    17+if the data doesn't fit into a single page. The first entry is the key, the one that follows
    18 is the value. And so on. Note that the entry data is itself not in the correct order. Instead
    19 entry offsets are stored in the correct order and those offsets are needed to then retrieve
    20 the data itself.
    


    laanwj commented at 7:30 pm on May 16, 2024:
    Might want to note that the implemenation handles only bdb files with the same endian as the host.

    theStack commented at 5:18 pm on May 21, 2024:
    Good idea, done.
  5. in test/functional/test_framework/bdb.py:82 in d166e8886c outdated
    86-        entry['len'] = e_len
    87-        entry['pg_type'] = pg_type
    88-        entry['data'] = data[offset + 3:offset + 3 + e_len]
    89+        record_header = data[offset:offset + 3]
    90+        offset += 3
    91+        e_len, record_type = struct.unpack('HB', record_header)
    


    laanwj commented at 7:31 pm on May 16, 2024:
    Forgot = Edit: doesn’t matter, it only affects alignment versus the default of @, and a byte after a short will always have the same alignment
  6. laanwj commented at 7:37 pm on May 16, 2024: member
    Code review ACK, from what i learned about how BDB databases work internally, implementation looks correct on first glance.
  7. laanwj added the label Wallet on May 17, 2024
  8. DrahtBot added the label Needs rebase on May 21, 2024
  9. test: complete BDB parser (handle internal/overflow pages, support all page sizes)
    This aims to complete our test framework BDB parser to reflect
    our read-only BDB parser in the wallet codebase. This could be
    useful both for making review of #26606 easier and to also possibly
    improve our functional tests for the BDB parser by comparing with
    an alternative implementation.
    648236f793
  10. theStack force-pushed on May 21, 2024
  11. theStack commented at 5:19 pm on May 21, 2024: contributor
    Rebased on master (necessary since #26606 was merged and also touched the same functional test file).
  12. DrahtBot removed the label Needs rebase on May 21, 2024
  13. DrahtBot added the label CI failed on May 21, 2024
  14. test: compare BDB dumps of test framework parser and wallet tool ad0a8ffcea
  15. theStack force-pushed on May 22, 2024
  16. DrahtBot removed the label CI failed on May 22, 2024
  17. BrandonOdiwuor commented at 11:36 am on May 29, 2024: contributor
    Concept ACK
  18. in test/functional/test_framework/bdb.py:82 in 648236f793 outdated
    87-        entry['len'] = e_len
    88-        entry['pg_type'] = pg_type
    89-        entry['data'] = data[offset + 3:offset + 3 + e_len]
    90+        record_header = data[offset:offset + 3]
    91+        offset += 3
    92+        e_len, record_type = struct.unpack('HB', record_header)
    


    rkrux commented at 9:30 am on May 30, 2024:
    There’s a separate effort from @maflcko regarding removing the usage of struct.pack from all places in test. Though here it’s struct.unpack but I guess the reasoning for the removal of struct.pack is still applicable. https://github.com/bitcoin/bitcoin/pull/29401

    maflcko commented at 12:57 pm on May 30, 2024:
    There is a small benefit to struct.pack being less code when more than one item is packed or unpacked. Seems fine to use either in this case, up to the author. However, in most other cases where only a single item is (un)packed, the modern alternatives are less code, and clearer.

    theStack commented at 12:13 pm on June 2, 2024:
    The main reason for using struct.unpack in this PR is to be consistent with the already existing code in the bdb module (i.e. avoiding to mix up different serialization methods), and I didn’t want to introduce large refactoring changes in this PR. I strongly agree though that int.{to,from}_bytes is preferred and support #29401.
  19. in test/functional/test_framework/bdb.py:175 in 648236f793 outdated
    177-            pages.append(data)
    178-            data = f.read(PAGESIZE)
    179+        # Determine pagesize first
    180+        data = f.read(PAGE_HEADER_SIZE)
    181+        pagesize = struct.unpack('I', data[20:24])[0]
    182+        assert pagesize in (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)
    


    rkrux commented at 9:35 am on May 30, 2024:
    WDYT about creating a constant tuple for these values? They seem pretty generic enough.
  20. rkrux commented at 12:12 pm on May 30, 2024: none
    Incomplete review and untested yet, left couple comments off the top of my head atm. Will do another round thoroughly in the next few days.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me