coins: replace manual CDBBatch size estimation with LevelDB’s native ApproximateSize #32185

pull l0rinc wants to merge 3 commits into bitcoin:master from l0rinc:l0rinc/cdbatch-size-estimation changing 3 files +18 −24
  1. l0rinc commented at 2:06 pm on April 1, 2025: contributor

    Summary

    The manual batch size estimation of CDBBatch serialized size was added when LevelDB didn’t expose this functionality yet. The PR refactors the logic to use the native leveldb::WriteBatch::ApproximateSize() function, structured in 3 focused commits to incrementally replace the old behavior safely.

    Context

    The previous manual size calculation initialized the estimate to 0, instead of LevelDB’s header size (containing an 8-byte sequence number followed by a 4-byte count). This PR corrects that and transitions to the now-available native LevelDB function for improved accuracy and maintainability.

    Approach

    The fix and refactor follow a strangle pattern over three commits:

    • correct the initialization bug in the existing manual calculation, isolating the fix and ensuring the subsequent assertions use the corrected logic;
    • introduce the native ApproximateSize() method alongside the corrected manual one, adding assertions to verify their equivalence at runtime;
    • remove the verified manual calculation logic and assertions, leaving only the native method.
  2. DrahtBot commented at 2:06 pm on April 1, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32185.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa, TheCharlatan, laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31668 (Added rescan option for import descriptors by saikiran57)
    • #30673 (coins: remove logic for spent-and-FRESH cache entries and writing non-DIRTY entries by andrewtoth)
    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on Apr 1, 2025
  4. l0rinc force-pushed on Apr 1, 2025
  5. in src/dbwrapper.cpp:171 in 10e88dcd62 outdated
    168 
    169 void CDBBatch::Clear()
    170 {
    171     m_impl_batch->batch.Clear();
    172-    size_estimate = 0;
    173+    assert(m_impl_batch->batch.ApproximateSize() == kHeader);
    


    laanwj commented at 10:26 am on April 7, 2025:
    i don’t think it’s necessary to leave this assertion in the final code. (maybe i’m misunderstanding something but it doesn’t seem critical to functionality to assert this?)

    l0rinc commented at 11:38 am on April 7, 2025:
  6. laanwj commented at 10:26 am on April 7, 2025: member
    Concept ACK, leveldb provides internal functionality for this it’s better to use that instead of maintaining our own estimate.
  7. Coins: Add `kHeader` to `CDBBatch::size_estimate`
    The initialization of the manual `size_estimate` in `CDBBatch::Clear()` is corrected from `0` to `kHeader` (LevelDB's fixed batch header size).
    This aligns the manual estimate with LevelDB's actual size immediately after clearing, fixing discrepancies that would otherwise be caught by tests in the next commit (e.g., `coins_tests`, `validation_chainstatemanager_tests`).
    751077c6e2
  8. refactor: Delegate to LevelDB for CDBBatch size estimation
    Serialized batch size can be queried via the underlying LevelDB implementation calling the native `leveldb::WriteBatch::ApproximateSize()`.
    
    The previous manual calculation was added in https://github.com/bitcoin/bitcoin/pull/10195/commits/e66dbde6d14cb5f253b3bf8850a98f4fda2d9f49 as part of https://github.com/bitcoin/bitcoin/pull/10195. At that time (April 2017), the version of LevelDB used by Bitcoin Core (and even the latest source) lacked a native function for this. LevelDB added this capability in https://github.com/google/leveldb/commit/69e2bd224b7f11e021527cb95bab18f1ee6e1b3b, merged later that year.
    
    The old manual estimation method (`SizeEstimate()`) is kept temporarily in this commit, and assertions are added in `txdb.cpp` to verify its results against `ApproximateSize()` during batch writes. This ensures the native function behaves as expected before removing the manual calculation in the subsequent commit.
    8b5e19d8b5
  9. l0rinc force-pushed on Apr 7, 2025
  10. l0rinc force-pushed on Apr 7, 2025
  11. l0rinc requested review from laanwj on Apr 7, 2025
  12. in src/dbwrapper.cpp:163 in aeda60c600 outdated
    157@@ -158,14 +158,16 @@ struct CDBBatch::WriteBatchImpl {
    158 
    159 CDBBatch::CDBBatch(const CDBWrapper& _parent)
    160     : parent{_parent},
    161-      m_impl_batch{std::make_unique<CDBBatch::WriteBatchImpl>()} {};
    162+      m_impl_batch{std::make_unique<CDBBatch::WriteBatchImpl>()}
    163+{
    164+    Clear();
    


    laanwj commented at 1:10 pm on April 7, 2025:
    nit: Is the Clear() here still necessary, now that Clear doesn’t do anything but call Clear on the internal batch?

    l0rinc commented at 2:08 pm on April 7, 2025:
    We’re calling batch.Clear() after the first batch is written, which does rep_.resize(kHeader) (hence the original assert). For consistency we need this presizing for the very first batch as well.
  13. refactor: Remove manual CDBBatch size estimation
    Remove the manual batch size estimation logic (`SizeEstimate()` method and `size_estimate` member) from `CDBBatch`.
    Size is now determined solely by the `ApproximateSize()` method introduced in the previous commit, which delegates to the native LevelDB function.
    
    The manual calculation is no longer necessary as LevelDB now provides this functionality directly, and the previous commit verified that the native function's results matched the manual estimation.
    
    Assertions comparing the two methods are removed from `txdb.cpp`.
    
    Co-authored-by: Wladimir J. van der Laan <laanwj@protonmail.com>
    e419b0e17f
  14. in src/dbwrapper.h:78 in aeda60c600 outdated
    74@@ -75,6 +75,8 @@ class CDBBatch
    75     friend class CDBWrapper;
    76 
    77 private:
    78+    static constexpr size_t kHeader{12}; // See: src/leveldb/db/write_batch.cc#L27
    


    laanwj commented at 1:10 pm on April 7, 2025:
    This constant is unused now.

    l0rinc commented at 2:10 pm on April 7, 2025:
    Indeed, removed
  15. l0rinc force-pushed on Apr 7, 2025
  16. sipa commented at 2:31 pm on April 7, 2025: member
    utACK e419b0e17f8acfe577c35c62a8a71a19aad249f3
  17. DrahtBot requested review from laanwj on Apr 7, 2025
  18. TheCharlatan commented at 4:14 pm on April 7, 2025: contributor

    Concept ACK

    Is there a reason you only assert the equivalence in batch sizes in txdb? I think it would be better to do them at the start and end of the methods than at a few of their call sites.

  19. l0rinc commented at 4:57 pm on April 7, 2025: contributor

    Is there a reason you only assert the equivalence in batch sizes in txdb?

    I have asserted equivalence at every invocation of SizeEstimate, since that’s what I’m replacing here. I asserted both before and after to make sure I don’t introduce a bias (e.g. “I know that this method won’t change it, so there’s no need to assert”) and because it was temporary.

    I think it would be better to do them at the start and end of the methods than at a few of their call sites.

    Can you show me an example, I’m not sure I follow.

  20. TheCharlatan approved
  21. TheCharlatan commented at 8:09 am on April 8, 2025: contributor

    ACK e419b0e17f8acfe577c35c62a8a71a19aad249f3

    I have asserted equivalence at every invocation of SizeEstimate, since that’s what I’m replacing here. I asserted both before and after to make sure I don’t introduce a bias (e.g. “I know that this method won’t change it, so there’s no need to assert”) and because it was temporary.

    I missed that the estimate is only used in txdb. Seems fine as is then.

  22. laanwj approved
  23. laanwj commented at 8:17 am on April 8, 2025: member
    Code review ACK e419b0e17f8acfe577c35c62a8a71a19aad249f3
  24. ryanofsky assigned ryanofsky on Apr 8, 2025
  25. ryanofsky merged this on Apr 8, 2025
  26. ryanofsky closed this on Apr 8, 2025

  27. l0rinc deleted the branch on Apr 8, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-16 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me