RFC: Remove boost usage from kernel api / headers #28335

pull TheCharlatan wants to merge 15 commits into bitcoin:master from TheCharlatan:mempoolBoostHeaderPrune changing 18 files +913 −816
  1. TheCharlatan commented at 5:46 am on August 24, 2023: contributor

    Similarly to #28327 I wanted to open this PR to receive some opinions and better approaches.

    The kernel library is currently at the stage where unwanted headers are removed from its set of headers. In practice, this means we are reducing the number of includes that are required for compiling the experimental bitcoin-chainstate binary. This is described in stage 1 step 3 of the project tracking issue.

    Currently the mempool is part of the kernel library. The mempool headers include the boost multi index headers. Thus any application wanting to use the kernel library and its headers will have to include the boost headers too. This is not only undesirable because of the sheer size of these headers, but also might lead to conflicts if the including application uses a different boost version.

    In the approach laid out by this PR, mempool member variables and methods are declared in the header without having to include boost by either wrapping them in a struct and pimpling them, or making methods static implementation functions. The boost definitions are gathered into separate header (mempool_set_definitions.h) that is only included by implementation files that require definitions of the boost types. This allows us to retain the current architecture with roughly the same interfaces.

    The approach laid out by this PR also has some, albeit small, compilation speed and size benefits. Averaged over a few of compilation runs I consistently observe faster compilation by a couple of seconds and some smaller pre-processed and compiled object sizes. The main detractor of this method is obviously the number of lines touched. However it also has the benefit of inventorizing all the files that require direct access to the mempool data structures as well getting rid of boost multi index includes in non-kernel implementation files that include the mempool, but don’t directly manipulate its data structures (e.g. wallet.cpp).

    A much simpler alternative approach, at least on the surface, would be removing all txmempool.h includes from kernel library headers (see this branch). Currently this is only validation.h. Due to the mutex member of CTxMemPool and the correspondingly defined lock decorators on the chainstate methods this becomes a bit more complicated though and I am not sure how this might be possible with the current architecture.

    A discussion of how and if to remove the mempool from the kernel library has so far been intentionally punted to the next stage of the kernel library development. Pimpling the mempool itself precludes this discussion, since the library could never be shipped with the CTxMemPool headers. Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.

  2. DrahtBot commented at 5:46 am on August 24, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK theuni

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28960 (kernel: Remove dependency on CScheduler by TheCharlatan)
    • #28948 (v3 transaction policy for anti-pinning by glozow)
    • #28368 (Fee Estimator updates from Validation Interface/CScheduler thread by ismaelsadeeq)
    • #26593 (tracing: Only prepare tracepoint arguments when actually tracing by 0xB10C)
    • #25038 (policy: nVersion=3 and Package RBF by glozow)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Aug 24, 2023
  4. ryanofsky commented at 4:02 pm on August 29, 2023: contributor

    This is a pretty big change, but looking over it, it seems all straightforward and mechanical, and should not be very hard to review.

    I’d be interested in opinions from developers who work more with closely with mempool code about whether this improves the mempool code, or if it creates any issues.

    My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

    I don’t think the stated goals of this PR are very compelling on their own if they are just about speeding up compilation times, or making it easier to compile the kernel library and kernel applications with different versions of boost. (I doubt there are a lot of applications that would opt for using multiple versions of boost in the same build, and if there are it would seem more straightforward for them to put an #include boundary between their application code and libbitcoin_kernel than would be is for this PR to put a boundary between mempool code and the rest of the kernel.)

    But despite the large diff and long description, the changes in the PR do seem pretty straightforward, so I don’t think the benefits need to be that big to justify the changes.

  5. theuni commented at 2:15 pm on August 30, 2023: member

    Concept ACK. I looked at an early version of this and thought generally the same: it’s mostly mechanical and not too bad to review.

    My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

    Generally agree. I’d also like to hear from some mempool folks. @glozow ?

    making it easier to compile the kernel library and kernel applications with different versions of boost. (I doubt there are a lot of applications that would opt for using multiple versions of boost in the same build, and if there are it would seem more straightforward for them to put an #include boundary between their application code and libbitcoin_kernel than would be is for this PR to put a boundary between mempool code and the rest of the kernel.)

    I think this glosses over the larger result of this work: boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That’s a huge achievement imo. Barring us vendoring Boost, it’s the difference between: “take this lib and build a hello world app in your point-and-click IDE” to “use this docker image which includes our recommended version of boost headers to build from a script”. I know, I know, the latter is an exaggeration and wouldn’t be the end of the world. But without boost, as-is right now, we’d be able to ship a libbitcoinkernel that depends on only these headers (though obviously paring down that list is a next step) and nothing from the outside world. To me that’s very significant.

    But despite the large diff and long description, the changes in the PR do seem pretty straightforward, so I don’t think the benefits need to be that big to justify the changes.

    +1 though as stated above, to me the benefits are that big :)

  6. theuni commented at 2:26 pm on August 30, 2023: member

    PR title is misleading btw. Boost is not removed from the kernel, it’s removed from the kernel’s api.

    I think conceptually this is the commit that really demonstrates the change here: bitcoin-chainstate no longer has any external dependencies.

  7. TheCharlatan renamed this:
    RFC: Remove boost usage from kernel
    RFC: Remove boost usage from kernel api / headers
    on Aug 30, 2023
  8. in src/txmempool.cpp:91 in e59f92518c outdated
    86+ *     exceed ancestor limits. It's the responsibility of the caller to
    87+ *     removeRecursive them.
    88+ * @param[in,out] mapTx a reference to the complete mempool map.
    89+ * @param[in] limits the mempool limits configuration.
    90+ */
    91+static void UpdateForDescendants(
    


    glozow commented at 4:19 pm on August 30, 2023:
    e59f92518c831f0da237ac83875709b4174b55b2 Instead of making this static, perhaps you could inline this into UpdateTransactionsFromBlock? Dropping the thread-safety annotations is a bit of a negative imo.
  9. glozow commented at 4:29 pm on August 30, 2023: member

    boost is now not a requirement for using libbitcoinkernel at all. It removes our last external dependency. That’s a huge achievement imo.

    Sounds pretty cool to me.

    I’m also quite interested in this advertised benefit:

    Pimpling the mempool members (like done in this PR) might also make a future splitting of block and mempool validation logic into separate compilation units easier.

    My opinion is would be that if this improves mempool code it should be merged, and if it makes mempool code more complicated or harder to maintain, it should not be merged.

    I think the size of the diff is a symptom of lots of code external to txmempool knowing a bit too much about how txmempool is implemented, i.e. accessing mapTx when it should probably use a different CTxMemPool method to get the information it needs. Adding .impl in those areas seems a bit ugly tbh - I think a cleaner approach would be to have those bits of code not use txiter/mapTx/setEntries. Apart from that, it seems generally good to move more of the implementation inward. I just had 1 comment about UpdateForDescendants.

    This branch refactors some of these mapTx accesses to use things like infoAll() and GetIter() and size(), which imo are the more appropriate functions to be calling there. The result is that a lot of things don’t need to be pimpl’d anymore or fewer places are using the pimpl’d data structures. I rebased this PR on top of that branch, which resulted in 7 commits being unnecessary and the pimpling commits touching fewer files. The overall diff isn’t that much smaller, but I think it achieves the intended result + independently nice changes.

    I sorted the commits from most trivial to least trivial - the first ones are very straightforward things like “call pool.size() instead of pool.mapTx.size().” The last part re-implements DisconnectedBlockTransactions to not need a boost multi index container, which I suppose could be controversial. Would you maybe be interested in taking any of these changes?

  10. ryanofsky commented at 4:41 pm on August 30, 2023: contributor

    I think this glosses over the larger result of this work: boost is now not a requirement for using libbitcoinkernel at all.

    Yes this is a distinction I glossed over and didn’t think much about. With this PR, boost is still a requirement for building libbitcoinkernel, but not a requirement for using it.

    But I don’t understand what the big advantage is (other than the ones already mentioned about speeding up build times or mixing versions of boost). And I don’t undestand or what you are saying about the docker image and IDE. To my mind there are two type of developers who might want to use libbitcoinkernel:

    1. Developers who will want to use a packaged version of libbitcoinkernel in the form of a vcpkg, conan, nix, or wherever other package.
    2. Developers who will take a more DIY approach and put libbitcoinkernel in a vendor branch or monorepo or something like that.

    The effect of this PR seems negligible in both of this cases, probably resulting in literal 1-line change like fd49dc99398a3ec331f393f877ec9ae804373043 removing a path from the include list.

    So I’m guessing maybe the thing you are enthusiastic about is supporting a third type of developer who (1) doesn’t want to use a package and (2) doesn’t want to build from source, but instead wants to download a tarball containing a precompiled libbitcoinkernel library binary and some headers, and use that. But the difference is pretty small in this case too, no? Without this PR the tarball needs to contain a few boost header files, and with this PR, it doesn’t.

    In any case, I’m happy about the benefits of this PR, but just think the main consideration here should be whether it improves the mempool code or not, and glad to see glozow had some suggestions for simplifying it.

  11. theuni commented at 5:12 pm on August 30, 2023: member

    In any case, I’m happy about the benefits of this PR, but just think the main consideration here should be whether it improves the mempool code or not, and glad to see glozow had some suggestions for simplifying it.

    I think we should leave it at this. I’ll admit to complete tunnel-vision on this particular boost removal. It’s been ~an obsession~ a goal of mine for long enough that I’m probably too biased to judge the code changes required for it. And as you said, I don’t think i need to belabor that point anyway when there’s other merit to the changes. @glozow thanks for taking a look! Seems getting rid of some of the direct uses of mapTx would be a nice cleanup even outside of this PR?

  12. TheCharlatan commented at 8:54 pm on August 30, 2023: contributor

    Re #28335#pullrequestreview-1603077161

    Thank you for taking so much consideration for this change! I think your additional patches are well worth reducing the amount of code that has to directly access the boost data structure from outside of the CTxMemPool class.

    Would you maybe be interested in taking any of these changes?

    Most of the changes (as you mentioned) should be uncontroversial, e.g. just directly using existing helpers, or the vTxHashes changes. I’ll gladly integrate these here.

    I think the DisconnectedBlockTransactions change would merit its own PRs, since it does change the internal behavior and performance somewhat.

    The other two cases I am not sure about is using infoAll() (as done here) to create a vector of TxMemPoolInfo instead of directly iterating over mapTx and construction of a new GenTxid here. In both cases it would be fairly straight forward to replace them with new CTxMemPool helper methods if they prove too controversial.

  13. glozow added the label Refactoring on Sep 1, 2023
  14. TheCharlatan force-pushed on Sep 1, 2023
  15. TheCharlatan commented at 9:24 am on September 1, 2023: contributor

    Updated with @glozow’s patches. I think these are well worth it for both cleaning up the code to not needlessly reach into complex mempool member variables as well as reducing the number of source files that need to include the multiindex headers.

    The order of the commits was slightly changed, the heavier DisconnectedBlockTransactions refactor is now the first commit. As a further approach I think two separate PRs should be opened; one for the heavier DisconnectedBlockTransactions and related changes as well as another one for the easier refactors. Once these are processed, this current PR can be rebased and opened for code review.

  16. DrahtBot removed the label CI failed on Sep 1, 2023
  17. in src/txmempool.h:36 in bdbc5093c2 outdated
    33@@ -34,6 +34,16 @@
    34 
    35 class CChain;
    36 
    37+#ifndef BITCOIN_MEMPOOL_SET_DEFINITIONS_H
    


    theuni commented at 2:21 pm on September 1, 2023:
    1. I don’t think this guard is necessary.
    2. DisconnectedTransactionsIteratorImpl and IndexedDisconnectedTransactionsImpl don’t seem to exist
    3. Need to remove mempool_set_definitions.h to actually get boost out of the headers, no?

    TheCharlatan commented at 2:29 pm on September 1, 2023:

    Will fixup points 2 and 3, must have slipped through the rebase.

    I don’t think this guard is necessary.

    The reason I added this here is that if there is an implementation that first includes the mempool_set_definitions and then the txmempool we don’t redefine these data structures. Is there a better way to do this?


    theuni commented at 2:33 pm on September 1, 2023:

    Correct me if I’m misunderstanding, but I believe this is the pattern?

    0struct foo;
    1struct foo
    2{
    3    int bar;
    4};
    5struct foo;
    6int main(){}
    

    If so, I (and gcc/clang :) don’t see an issue with that.


    theuni commented at 2:36 pm on September 1, 2023:
    (Didn’t mean to get in the weeds with review, I realize you’re still looking for concept ACKs. The mempool_set_definitions.h thing is all I really care about at this stage)

    TheCharlatan commented at 2:48 pm on September 1, 2023:

    If so, I (and gcc/clang :) don’t see an issue with that.

    I thought for some reason that this was discouraged, but you’re obviously right.

    Didn’t mean to get in the weeds with review,

    np, any engagement here is much appreciated :)

  18. in src/Makefile.am:897 in 8e125c0c84 outdated
    889@@ -890,7 +890,7 @@ bitcoin_util_LDADD = \
    890 
    891 # bitcoin-chainstate binary #
    892 bitcoin_chainstate_SOURCES = bitcoin-chainstate.cpp
    893-bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) $(BOOST_CPPFLAGS)
    894+bitcoin_chainstate_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES)
    


    theuni commented at 2:28 pm on September 1, 2023:

    FWIW this won’t have any effect on c-i currently because boost is installed into the common depends prefix.

    As a quick hack to demonstrate the functionality of this PR, you could add: https://github.com/theuni/bitcoin/commit/e131d8f1e33baa9a9499d2e1a4a99b171566c5a5 and drop it before merge. That puts boost in a different path, so BOOST_CPPFLAGS becomes a requirement as-intended. (We should also fix that for good in a less hackish way.)

    As far as I can tell this PR should currently fail with the above patch because the removal is not yet done as mentioned here: https://github.com/bitcoin/bitcoin/pull/28335/commits/bdbc5093c226a8f843f938cc51e2077c4b4623a0#r1313092304


    hebasto commented at 2:34 pm on September 1, 2023:

    As a quick hack to demonstrate the functionality of this PR, you could add: theuni@e131d8f and drop it before merge. That puts boost in a different path, so BOOST_CPPFLAGS becomes a requirement as-intended.

    IIRC, this issue is recurring, no?

    We should also fix that for good in a less hackish way.

    Agree with it :)


    theuni commented at 2:38 pm on September 1, 2023:
    Heh, yes, I think this is the 3rd time we’ve needed this to demonstrate a boost dependency removal. Will do.

    fanquake commented at 3:19 pm on September 1, 2023:
    Note that this was covered by CI until we dropped the native macOS ARM builds.
  19. TheCharlatan force-pushed on Sep 1, 2023
  20. TheCharlatan force-pushed on Sep 1, 2023
  21. glozow commented at 4:24 pm on September 1, 2023: member

    As a further approach I think two separate PRs should be opened; one for the heavier DisconnectedBlockTransactions and related changes as well as another one for the easier refactors.

    The other two cases I am not sure about is using infoAll() (as done here) to create a vector of TxMemPoolInfo instead of directly iterating over mapTx and construction of a new GenTxid here. In both cases it would be fairly straight forward to replace them with new CTxMemPool helper methods if they prove too controversial.

    Opened #28385 with the less straightforward changes.

  22. DrahtBot added the label CI failed on Sep 3, 2023
  23. DrahtBot removed the label CI failed on Sep 5, 2023
  24. DrahtBot added the label Needs rebase on Sep 14, 2023
  25. fanquake referenced this in commit ac9fa6ec78 on Sep 23, 2023
  26. theuni commented at 5:33 pm on October 11, 2023: member
    Needs rebase after #28385.
  27. TheCharlatan force-pushed on Oct 12, 2023
  28. DrahtBot removed the label Needs rebase on Oct 12, 2023
  29. DrahtBot added the label CI failed on Oct 12, 2023
  30. TheCharlatan force-pushed on Oct 13, 2023
  31. DrahtBot removed the label CI failed on Oct 13, 2023
  32. DrahtBot added the label Needs rebase on Oct 30, 2023
  33. TheCharlatan force-pushed on Nov 3, 2023
  34. DrahtBot removed the label Needs rebase on Nov 3, 2023
  35. DrahtBot added the label Needs rebase on Nov 9, 2023
  36. TheCharlatan force-pushed on Nov 10, 2023
  37. DrahtBot removed the label Needs rebase on Nov 10, 2023
  38. fanquake referenced this in commit dd5f5713bc on Nov 13, 2023
  39. TheCharlatan force-pushed on Nov 13, 2023
  40. TheCharlatan force-pushed on Nov 15, 2023
  41. TheCharlatan commented at 10:11 pm on November 15, 2023: contributor
    Rebased on #28886, which allowed for dropping one of the big commits in the previous patchset as well as slimming down the other patches.
  42. DrahtBot added the label CI failed on Nov 15, 2023
  43. in src/txmempool.h:212 in 433a7265b7 outdated
    395@@ -396,15 +396,18 @@ class CTxMemPool
    396 
    397     typedef std::set<txiter, CompareIteratorByHash> setEntries;
    398 
    399+    typedef std::set<CTxMemPoolEntryRef, CompareIteratorByHash> setEntryRefs;
    


    maflcko commented at 7:06 am on November 16, 2023:
    use using for new code? Also could remove the duplicate CTxMemPoolEntryRef decl in this pull (see my diff from the last pull)

    TheCharlatan commented at 12:58 pm on November 16, 2023:
    Added a new commit for this.

    maflcko commented at 1:04 pm on November 16, 2023:
    Thanks, I guess it could be moved to #28886 as well, if you don’t mind and have to retouch for other reasons.
  44. TheCharlatan force-pushed on Nov 16, 2023
  45. DrahtBot removed the label CI failed on Nov 16, 2023
  46. Maytow approved
  47. DrahtBot added the label Needs rebase on Nov 24, 2023
  48. TheCharlatan force-pushed on Nov 24, 2023
  49. DrahtBot removed the label Needs rebase on Nov 24, 2023
  50. Maytow approved
  51. Maytow commented at 3:45 am on November 25, 2023: none
    src/txmempool.cpp
  52. bitcoin deleted a comment on Nov 25, 2023
  53. mempool: Replace sets of txiter with CTxMemPoolEntryRefs
    Currently the mempool returns and consumes sets of multiindex iterators
    in its public API. A likely motivation for this over working with
    references to the actual values is that the multi index interface works
    with these iterators and not with pointers or references to the actual
    values.
    
    However, the iterator type provides little benefit in practice as
    applied currently. Its purpose, ownership, and safety semantics often
    remain ambiguous, and it is hardly used for actually iterating through
    the data structures. So replace it where possible with
    CTxMemPoolEntryRefs.
    
    Since CTxMemPoolEntry itself refers to its Parents and Children by
    CTxMemPoolEntryRef and not txiter, this allowed for an overall
    reduction of calls to iterator_to.
    
    This also makes the long term goal of eliminating boost types from the
    headers more feasible.
    342cce4d17
  54. mempool: Remove dead code bb336762cb
  55. mempool: Remove txiter/setEntries from public interface 068c279e22
  56. doc: Document CTxMemPoolEntry copy and move constructors 2685c9a228
  57. DrahtBot added the label Needs rebase on Nov 28, 2023
  58. [refactor] Move boost txmempool definitions to their own header
    Introduces mempool_set_definitions.h which collects the boost types that
    were previously defined in the txmempool header.
    
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    8aa6c68098
  59. [refactor] Move boost miner definitions to miner implementation
    They need not be in the header, so move them to the implementation.
    5e342299c7
  60. [refactor] Make txmempool's UpdateForDescendants static
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    1335fc061d
  61. [refactor] Wrap txmempool's txiter in a struct
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    e973d65f1b
  62. [refactor] Pimpl txmempool's mapTx
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    4936cf4269
  63. [refactor] Move some CTxMemPool methods to implementation.
    This allows them to use the mapTx without including the full mapTx
    definition in the header.
    
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    0ee51b4121
  64. [refactor] Make txmempool's GetSortedDepthAndScore static
    This removes the need for defining its complex boost multi index return
    type in the header.
    
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    855b05ea0b
  65. [refactor] include cmath in init.cpp
    This missing include was found in the following commit.
    f9efc43d76
  66. [refactor] Complete ripping boost includes from txmempool.h
    All the header files should now no longer contain boost multi index
    includes. Only the implementation files directly include the multi index
    types for manipulating mempool data structures.
    
    The context of this change is an effort towards removing the boost multi
    index includes from the header tree. The goal is that the experimental
    `bitcoin-chainstate` binary can be compiled without requiring boost
    includes.
    0b78f0c7db
  67. build: Remove boost from bitcoin-chainstate CPPFLAGS
    This is possible as of the previous commit.
    2cda968d24
  68. refactor: Remove duplicate CTxMemPoolEntryRef decl 55d99c2c47
  69. TheCharlatan force-pushed on Nov 29, 2023
  70. DrahtBot removed the label Needs rebase on Nov 29, 2023
  71. DrahtBot added the label Needs rebase on Dec 1, 2023
  72. DrahtBot commented at 9:17 pm on December 1, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  73. DrahtBot commented at 1:37 am on February 29, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  74. DrahtBot commented at 0:34 am on May 28, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  75. TheCharlatan commented at 8:09 am on June 2, 2024: contributor
    Closing this. There is some work being done towards removing boost from the mempool wholesale. Since that would be preferable, pruning just the headers can be revisited in case that effort proves unsuccessful.
  76. TheCharlatan closed this on Jun 2, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me