rpc: improve `getreceivedby{address,label}` performance #23662

theStack commented at 5:12 PM on December 3, 2021: member

The RPC calls getreceivedbyaddress/getreceivedbylabel both use the internal helper function GetReceived which was introduced in PR #17579 to deduplicate tallying code. For every wallet-related transaction output, the following unnecessary operations are currently performed in the tally loop, leading to a quite bad performance (as reported in #23645):

converting from CScript -> TxDestination (ExtractDestination(...)), converting from TxDestination -> CScript (CWallet::IsMine(const CTxDestination& dest)); this can be avoided by directly using output scripts in the search set instead of addresses (first commit)
checking if the iterated output script belongs to the wallet by calling IsMine; this can be avoided by only adding addresses to the search set which fulfil IsMine in the first place (second commit)

Benchmark results

The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:

5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received)
5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent)
5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent)

Then, the time is measured for calling getreceivedbyaddress and getreceivedbylabel, the latter with two variants. Results on my machine:

branch	`getreceivedbyaddress` (single)	`getreceivedbylabel` (single)	`getreceivedbylabel` (10000)
master	406.13ms	425.33ms	446.58ms
PR (first commit)	367.18ms	365.81ms	426.33ms
PR (second commit)	3.96ms	4.83ms	339.69ms

Fixes #23645.

kristapsk commented at 5:19 PM on December 3, 2021: contributor

if users put a huge number of addresses under the same label. Not sure how that is used in production, I always assumed that there is a one-by-one label->address relation and having more than one address referred to by a label doesn't make too much sense.

JoinMarket users labels to mark addresses in a Bitcoin Core watchonly wallet as belonging to a specific JoinMarket wallet. All addresses of the same JoinMarket wallet will have the same label in Bitcoin Core watchonly wallet. It used accounts before, until they were deprecated.

DrahtBot added the label RPC/REST/ZMQ on Dec 3, 2021

DrahtBot added the label Wallet on Dec 3, 2021

DrahtBot commented at 5:07 AM on December 4, 2021: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#25122 (rpc: getreceivedbylabel, return early if no addresses were found in the address book. by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

beegater commented at 11:46 PM on December 4, 2021: none

ACK!

Great, thanks a lot for addressing this so quickly!

alirezaayande approved

fanquake deleted a comment on Dec 6, 2021

in src/wallet/rpcwallet.cpp:512 in d1f056fddc outdated

 509 |  
 510 |      if (by_label) {
 511 |          // Get the set of addresses assigned to label
 512 |          std::string label = LabelFromValue(params[0]);
 513 | -        address_set = wallet.GetLabelAddresses(label);
 514 | +        for (auto& address : wallet.GetLabelAddresses(label)) {

MarcoFalke commented at 9:01 AM on December 6, 2021:

why is this a "mutable" reference?

theStack commented at 10:16 AM on December 6, 2021:

Thanks, good catch, fixed.

MarcoFalke approved

MarcoFalke commented at 9:01 AM on December 6, 2021: member

lgtm

theStack force-pushed on Dec 6, 2021

DrahtBot added the label Needs rebase on Dec 8, 2021

shaavan approved

shaavan commented at 7:11 AM on December 8, 2021: contributor

crACK 7589305280a7b0f43a4dd500b9fc72e070104bd2

Thanks for addressing it. These updates make the code more clear to understand and straightforward to run.

as it could actually hurt performance if users put a huge number of addresses under the same label.

Though I think this refactoring would be very advantageous for an average user, this statement still worries me a little. Maybe if someone might have some insights into how frequent a user is to put multiple addresses under the same label, they could share them.

rpc: avoid scriptPubKey<->CTxDestination conversions in `GetReceived` tally a7b65af2a4

rpc: avoid expensive `IsMine` calls in `GetReceived` tally f336ff7f21

theStack force-pushed on Dec 8, 2021

theStack commented at 5:36 PM on December 8, 2021: member

Rebased on master (necessary as the function modified in this PR has moved from ./src/wallet/rpcwallet.cpp to ./src/wallet/rpc/coins.cpp). @shaavan: Thanks for your review! The performance worries on the second commit concerns transactions that are in wallet.mapWallet but don't satisfy the IsMine predicate (which should be all txs sent from the wallet to an address not owned by the wallet) ; if the search in the output script set (output_scripts.count(...)) takes longer than the IsMine check, then on master the loop iteration would be faster for those transactions. On the other hand a search in std::set takes O(log n), so it shouldn't be that bad.

Would be very interested in further opinions.

DrahtBot removed the label Needs rebase on Dec 8, 2021

amadeuszpawlik approved

amadeuszpawlik commented at 3:20 PM on December 12, 2021: contributor

Concept & approach ACK

stratospher commented at 6:18 PM on December 12, 2021: contributor

Concept ACK.

a7b65af
- Output scripts are used instead of addresses to perform the operations.
- For the 'getreceivedbylabel' case, all the addresses with the given label are iterated through to insert all the output_scripts. For the 'getreceivedbyaddress' case, the required address's output script is inserted into output_scripts.
- When tallying all the transactions, output scripts can be directly used to compute the total amount at an address/label in the wallet.
f336ff7
- For the 'getreceivedbylabel' case, when iterating through all the addresses with the given label to insert the output_scripts, only output_scripts that belong to the wallet are added.
- When tallying all the transactions, output scripts are used to compute the total amount at an address/label(and it's already known that they belong to the wallet).

achow101 commented at 6:40 PM on December 15, 2021: member

I don't think it's even necessary to use IsMine. We already filter for addresses that are ours within the address book itself by setting CAddressBookData.purpose to receive. So you can just check for that string rather than doing IsMine. It will certainly be more efficient.

theStack commented at 11:47 AM on December 16, 2021: member

I don't think it's even necessary to use IsMine. We already filter for addresses that are ours within the address book itself by setting CAddressBookData.purpose to receive. So you can just check for that string rather than doing IsMine. It will certainly be more efficient.

Thanks for the hint! In the current version of the PR though, IsMine is not called in the hot spot (mapWallet/CTxOut iteration loop) anymore. It may still make sense to check for the CAddressBookData.purpose instead of using IsMine in the preparation loop, which creates the "needles" of the search. Will take a deeper look on the weekend.

beegater commented at 5:53 PM on April 1, 2022: none

Can this please be merged? It would help a lot!

kristapsk commented at 6:34 PM on April 1, 2022: contributor

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

beegater commented at 6:53 PM on April 5, 2022: none

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

For the second commit this may be true. But the version that has now been on master for several years is FAR slower under any scenario than the solution in this initial commit. And just pushing the merge forward until some benchmarking is set up or not will just mean the slow solution will stay in all upcoming versions. Since nobody else has risen awareness to this bug since all that time it is on master, I'd say that the 10x degraded performance of this rpc was not relevant to anyone anyway, so merging this PR which provides a speed up for most scenarios should be fine.

theStack referenced this in commit a63a87b2e0 on Apr 14, 2022

theStack commented at 12:10 PM on April 14, 2022: member

Finally did some benchmarking. The functional test wallet_pr23662.py (not part of this PR) creates transactions with 15000 different addresses:

5000 outputs (500 txs with 10 outputs each) with label set, IsMine set (received)
5000 outputs (500 txs with 10 outputs each) with label set, IsMine not set (sent)
5000 outputs (500 txs with 10 outputs each) without label set, IsMine not set (sent)

Then, the time is measured for calling getreceivedbyaddress and getreceivedbylabel, the latter with two variants. Results on my machine:

branch	`getreceivedbyaddress` (single)	`getreceivedbylabel` (single)	`getreceivedbylabel` (10000)
master	406.13ms	425.33ms	446.58ms
PR (first commit)	367.18ms	365.81ms	426.33ms
PR (second commit)	3.96ms	4.83ms	339.69ms

Seems like my worries ("the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label") are not justified and most of the performance gain is indeed achieved by the second commit. Improvement ideas to the benchmark are very welcome, though I'd consider this pretty safe to merge at this point.

achow101 commented at 3:19 PM on April 19, 2022: member

I'm worried about "the second commit is potentially more controversial, as it could actually hurt performance if users put a huge number of addresses under the same label", as this is how JoinMarket uses Core wallets. Guess some benchmarking is needed here, how much is performance gains / losses in different scenarios.

Performance would only be hurt if a label had a huge number of addressees that did not belong to the wallet and for that to matter, there would need to be way more addresses than outputs stored in the wallet. Even for JoinMarket users (probably especially for JoinMarket users too), it will be faster because instead of doing IsMine on every single output of every single transaction in a wallet (which for coinjoins the vast majority will be irrelevant to the wallet), it is doing IsMine on things that are in the address book. For most users, the set of addresses in the address book will be smaller than the number of transaction outputs in a wallet.

achow101 commented at 3:49 PM on April 19, 2022: member

ACK f336ff7f213564909cf5f9742618cc6ec87600fd

w0xlt approved

w0xlt commented at 5:25 PM on April 19, 2022: contributor

ACK https://github.com/bitcoin/bitcoin/pull/23662/commits/f336ff7f213564909cf5f9742618cc6ec87600fd

This change makes the code clearer and better performing.

kristapsk commented at 7:05 PM on April 19, 2022: contributor

Comments by @theStack and @achow101 calmed me down, but will try to benchmark myself just in case too.

MarcoFalke added this to the milestone 24.0 on Apr 27, 2022

laanwj added this to the "Blockers" column in a project

furszy approved

furszy commented at 8:17 PM on May 12, 2022: member

Code ACK f336ff7f

Plus, probably something like https://github.com/furszy/bitcoin-core/commit/e8d860742ca1c834c0e4a31a38bd14ab3d681f5d would be good to include here or on a follow-up PR.

Basically, inside GetReceived, we can return early if the label provided by getreceivedbylabel is not in the address book. Otherwise, we walk through all the wallet txs + outputs for no reason (output_scripts is empty).

theStack commented at 9:02 PM on May 12, 2022: member

@furszy: Thanks for reviewing!

Plus, probably something like furszy@e8d8607 would be good to include here or on a follow-up PR.

Basically, inside GetReceived, we can return early if the label provided by getreceivedbylabel is not in the address book. Otherwise, we walk through all the wallet txs + outputs for no reason (output_scripts is empty).

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours), hence I think this would fit better for a follow-up PR. Would you mind opening one? If you want, you can already do that now, branching off of this PR.

furszy commented at 9:20 PM on May 12, 2022: member

Hey, sure. Will do it.

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours)

Actually, can happen when the user inputs a label that is not in the address book as well (eg: wallet.GetLabelAddresses(label) returning an empty set).

theStack commented at 9:33 PM on May 12, 2022: member

This seems to be an optimization for a very special case (passing a label that is in the address book, but doesn't apply to any address that is ours)

Actually, can happen when the user inputs a label that is not in the address book as well (eg: wallet.GetLabelAddresses(label) returning an empty set).

Good point. So it would be an improvement of the user-interface, which is maybe even more important; right now if the passed label doesn't exist (due to e.g. a typo from the user), the amount of 0 is returned instead of an error.

furszy commented at 9:44 PM on May 12, 2022: member

yeah, for that I added an error there "Address/es not found in wallet" (a general error for both getreceivedbylabel and getreceivedbyaddress if the address/label is not in the wallet).

But we could customize it a bit more too. Have an error string targeting getreceivedbyaddress -> "Address not found in wallet" and another one for getreceivedbylabel -> "label not found in wallet".

In any case, just opened #25122. Will rebase it on master once this one gets merged.

MarcoFalke assigned achow101 on May 13, 2022

achow101 merged this on May 16, 2022

achow101 closed this on May 16, 2022

laanwj removed this from the "Blockers" column in a project

achow101 referenced this in commit 5ebff43025 on May 23, 2022

sidhujag referenced this in commit c217be178c on May 28, 2022

sidhujag referenced this in commit 99ee44cab2 on May 28, 2022

DrahtBot locked this on May 16, 2023