RPC: add new listmempooltransactions #29016

pull niftynei wants to merge 2 commits into bitcoin:master from niftynei:nifty/listmempoolentry changing 8 files +160 −1
  1. niftynei commented at 10:32 pm on December 6, 2023: contributor

    Proposed Update

    Add a new RPC endpoint, listmempooltransactions. Takes as args a start_sequence and verbose.

    Returns:

    • if verbose false, list of txids + their entry_sequence where each entry’s entry_sequence >= the provided `start_sequence.
    • if verbose true, raw tx output info including each entry’s entry_sequence.

    Builds on work done in #19572.

    Rationale

    The current mempool RPCs are lacking an ability to scan for updates in a more efficient manner. You can subscribe for updates via the ZMQ pipeline, but even this is inefficient to recover from if your consumer app falls offline.

    ZMQ also is a push protocol, and doesn’t provide a rate-limiting mechanism.

    In the case of core-lightning, we don’t assume access to bitcoin-core/ZMQ, so we’re unable to do efficient mempool querying. There have been some recent CVEs come to light where having optional access to mempool updates may prove useful.

    Paging, filtering by last seen, and the addition of full tx data in the call makes mempool data more readily available to any client app. This is good for self-sovereignty as it reduces the need for additional dependencies and app data sources to efficiently query and parse mempool data.

    Other Things I Considered

    My initial attempt at this modified getrawmempool to

    • take a start_sequence to allow for filtering, and
    • adding the txdata (inputs outpoints + outputs (amount + scriptPubKeys)).

    That was pretty ugly however, given that the current data model for getrawmempool is around a concept of “mempool entry” data. This new RPC in contrast only returns transaction data (and does not report on information about other mempool dependencies etc; for that you should still call getrawmempool or getmempoolentry).

    You could add a start_sequence to the existing getrawmempool to help with entry data paging/querying but that’s secondary to my goals for fetching relevant txdata from the mempool.

    Additions/Changes

    This PR could be further improved by

    • Add a page_size argument which allows a calling application to limit the number of results returned.

    Future Work

    The current RPC only supports finding added mempool transactions. It’d be interesting to experiment with keeping track of evicted/not mined transactions and adding them to the results.

    This would require:

    • An additional memory buffer (perhaps a configurable memory limited ring buffer?) for evicted tx data and the mempool sequence of the eviction.
    • Adding a sequence_action field for results, which indicates whether the sequence is for the tx’s addition or eviction.

    You could also keep track of tx movements into blocks, but this seems less useful/urgent in general.

    Usage

    You can see an example of this implemented and used in CLN for finding paid onchain invoices in this branch: https://github.com/ElementsProject/lightning/compare/master...niftynei:lightning:nifty/mempool-scan

    Note that the CLN implementation doesn’t currently keep the mempool_sequence in disk, so it’ll reload/rescan the mempool at start. This may be desirable?

    Here’s how a caller would use this

    01) start node
    12) listmempooltransactions 0 to get backlog
    23) call listmempooltransactions again with `start_sequence` set to `mempool_sequence` result from last call
    

    Note that this works well with ZMQ as the mempool_sequence number is identical.

  2. rpc: add new `listmempooltransactions`
    Allow bitcoin-core clients to get more interesting information out
    of the mempool more easily, including by filtering for 'latest txs'.
    
    Eventually would be a good interface for grabbing all mempool
    transaction data, including information about evicted transactions
    (which would be useful for avoiding situations where removed
    transactions contained data important for bitcoin applications). This
    would require the addition of a mempool tx log, however the
    `mempool_sequence` id plus this call and the use of a ring buffer(?)
    would allow for clients to get complete scans of anything a bitcoin
    core node had seen in its mempool.
    
    That's future work. For now we start with the interface over existing
    information.
    cb466f4451
  3. rest: add `listmempooltransactions` to the REST API
    Make the `listmempooltransactions` rpc available via the REST api
    07008477b8
  4. DrahtBot commented at 10:32 pm on December 6, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK kristapsk, bordalix

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24748 (test/BIP324: functional tests for v2 P2P encryption by stratospher)
    • #19463 (Prune locks by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. DrahtBot added the label RPC/REST/ZMQ on Dec 6, 2023
  6. DrahtBot added the label CI failed on Dec 6, 2023
  7. in src/rpc/mempool.cpp:462 in 07008477b8
    457+
    458+        txentry.pushKV("entry_sequence", e.GetSequence());
    459+
    460+        if (verbose) {
    461+            // We could also calculate fees etc for this transaction, but yolo.
    462+            TxToUniv(e.GetTx(), /*block_hash=*/uint256::ZERO, /*entry=*/txentry, /*inclue_hex=*/false);
    


    kevkevinpal commented at 2:17 am on December 7, 2023:

    looks like the tidy ci is getting upset here

    0rpc/mempool.cpp:462:82: error: argument name 'inclue_hex' in comment does not match parameter name 'include_hex' [bugprone-argument-comment,-warnings-as-errors]
    1  462 |             TxToUniv(e.GetTx(), /*block_hash=*/uint256::ZERO, /*entry=*/txentry, /*inclue_hex=*/false);
    2      |                                                                                  ^~~~~~~~~~~~~~~
    3      |                                                                                  /*include_hex=*/
    4./core_io.h:57:88: note: 'include_hex' declared here
    5   57 | void TxToUniv(const CTransaction& tx, const uint256& block_hash, UniValue& entry, bool include_hex = true, bool without_witness = false, const CTxUndo* txundo = nullptr, TxVerbosity verbosity = TxVerbosity::SHOW_DETAILS);
    
  8. pablomartin4btc commented at 3:33 am on December 7, 2023: member

    Also, missing test coverage for the new command:

    0Uncovered RPC commands:
    1  - listmempooltransactions
    
  9. glozow commented at 4:00 pm on December 18, 2023: member

    My initial attempt at this modified getrawmempool to

    * take a `start_sequence` to allow for filtering, and
    
    * adding the txdata (inputs outpoints + outputs (amount + scriptPubKeys)).
    

    This seems like the best option imo. If we want to return the transaction hex (iiuc that should suffice if you are looking to get the tx data itself) we could add another level of verbosity to getrawmempool / getmempoolentry.

    That was pretty ugly however, given that the current data model for getrawmempool is around a concept of “mempool entry” data. This returns transaction data (and does not report on information about other mempool dependencies etc; for that you should still call getrawmempool or getmempoolentry).

    Not sure I understand what the difficulty was? getrawmempool should give dependencies if you use verbose (see the “depends” field).

    Note that this works well with ZMQ as the mempool_sequence number is identical.

    Assuming you mean the sequence emitted from Transaction{AddedTo,RemovedFrom}Mempool, note that this is not actually the case (they’re usually off by 1 or more due to when the sequence increments happen).

  10. niftynei commented at 3:23 am on January 7, 2024: contributor

    Assuming you mean the sequence emitted from Transaction{AddedTo,RemovedFrom}Mempool, note that this is not actually the case (they’re usually off by 1 or more due to when the sequence increments happen).

    Can you say more about this divergence or point me to a doc that contains more info?

    I should add that one problem with the existing getrawmempool RPC semantics is that currently requesting the mempool_sequence and asking for verbose results is specifically disallowed.

    I attempted to find a rationale for this but was unsuccessful (some code reorgs had removed the original code commit in the blame).

    Re: returning the raw tx: that suffices but is suboptimal for an application that is scanning for particular output creation or input inclusion in a proposed tx as it creates additional work on the client.

    Speaking of computation, note that this approach omits the computation required for finding ancestor/descendent txs in the mempool; that seemed like unnecessary work for clients that are solely interested in just the tx info.

  11. niftynei commented at 3:25 am on January 7, 2024: contributor

    Also, missing test coverage for the new command:

    Yes, will add test coverage when this approach gets a concept ack.

    Instead, working usage of this proposed code change has been included in the original PR

  12. kristapsk commented at 3:44 pm on January 7, 2024: contributor
    Concept ACK
  13. bordalix commented at 9:42 pm on January 7, 2024: none
    Concept ACK
  14. in src/rpc/mempool.cpp:456 in 07008477b8
    451+    for (const CTxMemPoolEntry& e : pool.entryAll()) {
    452+        UniValue txentry(UniValue::VOBJ);
    453+
    454+        // We skip anything not requested.
    455+        if (e.GetSequence() < sequence_start)
    456+            continue;
    


    glozow commented at 10:01 am on January 11, 2024:
    Do note that a mempool tx from a recently disconnected block would be skipped here
  15. glozow commented at 10:04 am on January 11, 2024: member

    Assuming you mean the sequence emitted from Transaction{AddedTo,RemovedFrom}Mempool, note that this is not actually the case (they’re usually off by 1 or more due to when the sequence increments happen).

    Can you say more about this divergence or point me to a doc that contains more info?

    It’s more that they just happen to use the same counter, but that doesn’t mean they match. The mempool entry’s sequence number is assigned here; https://github.com/bitcoin/bitcoin/blob/fcacbab4878e10946c518970b630e7dccbbd2d45/src/validation.cpp#L854-L856 And the notification fires along with a counter increment here: https://github.com/bitcoin/bitcoin/blob/fcacbab4878e10946c518970b630e7dccbbd2d45/src/validation.cpp#L1272

    ~This means the notification is +1 of the entry sequence, +n if it’s in a package.~ edit: Woops I might be wrong and that’s the same number. A better example is the entry sequences are all the same but are incremented for the notifications in a package. The entry sequence is also hard coded to 0 for a transaction came from a reorg. It doesn’t look like they were meant to match up exactly, though this potential confusion was acknowledged prior to merge.

    I should add that one problem with the existing getrawmempool RPC semantics is that currently requesting the mempool_sequence and asking for verbose results is specifically disallowed.

    I attempted to find a rationale for this but was unsuccessful (some code reorgs had removed the original code commit in the blame).

    I’m not sure why that’s the case either, the only context I can see is the edits in this comment after pushing those lines, which indicate it was causing a test failure. I can’t open the travis link anymore. Pinging @instagibbs in case he remembers what happened 3+ years ago. Maybe there was a subtle bug, or maybe it was the easiest way to make the tests pass and return the results to the client without adding more params.

    Re: returning the raw tx: that suffices but is suboptimal for an application that is scanning for particular output creation or input inclusion in a proposed tx as it creates additional work on the client.

    Note that you can use gettxspendingprevout to search for mempool transactions spending a particular output, since the mempool does index by that. A watchonly wallet will also scan for its transactions in mempool as they appear.

    Speaking of computation, note that this approach omits the computation required for finding ancestor/descendent txs in the mempool; that seemed like unnecessary work for clients that are solely interested in just the tx info.

    Don’t worry, there’s no extra computation, it’s all cached in the mempool entry which is already pulled in.

    Anyway, I think it makes sense to have more params/levels to control filters and results for getrawmempool. It seems reasonable to have a higher verbosity level or more granular controls to give you the results you want.

    Can you describe in more detail what args you want to pass and what results you want returned? I have no objection to making life easier for clients, it’s just that the PR as it looks right now seems like we’d have 2 RPCs that do very similar things, and uses mempool entry sequence in a way that might be buggy.

  16. instagibbs commented at 1:42 pm on January 11, 2024: member

    what happened 3+ years ago

    I have some bad news, I do not.

  17. DrahtBot added the label Needs rebase on Jan 29, 2024
  18. DrahtBot commented at 6:12 pm on January 29, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  19. maflcko commented at 6:05 pm on February 22, 2024: member
    Are you still working on this? Looks like there are outstanding questions.
  20. niftynei commented at 9:59 pm on February 23, 2024: contributor

    Yes, haven’t had time recently due to bitcoin++ work.

    Note that you can use gettxspendingprevout to search for mempool transactions spending a particular output

    My use case is to scan for new transactions depositing to addresses or new scripts pertaining to a wallet which is not managed by core. Please see linked PR for CLN.

    to have more params/levels to control filters and results for getrawmempool.

    As mentioned in the PR, the data model that this returns is distinct from getrawmempool. This is specifically for transaction model data; getrawmempool’s data model is around mempool entries.

    Note that this works well with ZMQ as the mempool_sequence number is identical.

    Found an exception to this: package acceptance sequencing is broken and breaks this assumption.

    After some research, it turns out that we can’t expose the mempool_sequence until we fix an existing bug in the package acceptance’s mempool sequence assignment code.

    Currently, we assign every tx in a package the same sequence number, but notify via ZMQ using an incrmenting one. The problem stems from copying the behavior the single tx acceptance used for sequence assignment (which is quite well done) to the package context, which broke assumptions the single tx was able to rely on. This should be fixed regardless of the status of this PR, and should be considered a blocker imo for this being shipped. If someone else wants to fix it, that’d be great.

  21. luke-jr referenced this in commit 8c8f3163f2 on Mar 13, 2024
  22. luke-jr referenced this in commit a4fe132d56 on Mar 13, 2024
  23. achow101 commented at 2:26 pm on April 9, 2024: member
    Are you still working on this?
  24. niftynei commented at 9:49 pm on April 10, 2024: contributor
    Yes but won’t have any progress to show til May or so. Thanks
  25. fanquake marked this as a draft on Apr 11, 2024
  26. DrahtBot commented at 1:16 am on July 9, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me