[WIP] wallet: standardize change output detection process

furszy commented at 8:40 pm on September 1, 2022: member

Depends on #27601, please go there first.

This work aims to define, and implement a base standard mechanism to detect individual change outputs.

Context

Currently, the wallet detects whether an output is change or not based on data stored in the address book.

There is no notion of “change outputs”, the wallet detects change scripts.

Connoting that any address book record modification has implications on all the historical outputs related to that particular destination. Meaning that all those outputs can either be change or not. There is no middle-ground granular distinction.

How Change Detection Currently Works?

The wallet walks-through the transaction outputs, extracts the script destination and verify the following two points:

If the destination doesn’t exist in the address book, then the script is a “change address”.
If the destination exists in the address book, but it doesn’t have a label, then the script is a “change address”.

Motivation

There are a good number of problems in the current approach:

We make the wallet dependent on an external structure, with separate storage. Which has to be updated and maintained along with the wallet state.
It cannot be maintained nor recovered across different wallet instances. Cannot re-create the, possibly custom, address book data only by importing the wallet descriptor string.
As the address book is an structure that the user can freely modify, the change detection process might differ through different wallets.
The current rudimentary assumptions of “no address book entry” or “no label set for the address book entry” to denote that certain script destination is change or not can easily be broken: E.g. derive an address from one of the wallet’s external paths manually. Then send coins to it. As the receive destination wasn’t created inside the wallet, the wallet has no associated address book entry. So, the reception is invalidly detected as change (added a test case for it).
The wallet can’t detect change outputs on more complex scripts such as multi-sig change outputs.
The wallet is not able to detect change outputs going to an internal address if the internal address has a label. (E.g. the user can manually set a label for the internal address and, doing that, make that all the change outputs, in the wallet history, that were sent to the destination are no longer detected as change).
There isn’t a way to distinguish the external reception of coins into an internal address. Coins reception on any internal address are always detected as change.

New Change Detection Mechanism Goals

Aiming to:

Define a base mechanism to align different wallet implementations. Preventing each piece of software from diverging on the basic change outputs distinction.
Detect change outputs on-demand without requiring to maintain an external data structure synced with the latest wallet state.
Independently, and accurately, detect change outputs regardless data stored in structures that the user can freely modify.
Granular distinction between change vs non-change outputs that were sent to the same internal address. E.g. the reception of coins, from an external source, on internal addresses will not longer be detected as change anymore.
Expand the change detection to more complex scripts such as a multi-sig protected addresses. (While they are added into the wallet on an internal spkm)

Change Output Detection Rules

A transaction output is change if it fulfills the following points:

At least one of the parent transaction inputs is from the wallet. (If none of them are, then the wallet is receiving coins on an internal address).
The script extracted destination is from the wallet and is located in one of the internal script pub key manager. (e.g. derived from an internal derivation path)

What about legacy wallets?

If the legacy wallet is HD post-split, we have an internal derivation path, so we can follow the same process as descriptors wallet. Unless the destination is on the pre-split key pool, in which case, we fallback to the follow-up case.

If the legacy wallet is pre-split, we continue using the address book as we either have an HD wallet with keys derived only on the external path, or we are using raw keypool.

———————————————————————

Extra Note

This PR, in about 85% at least, is about expanding the current test coverage for the change output detection area.

TO DO (still WIP):

Save “internal” flag on non-active descriptors so the wallet can use them on the change detection process. (which will fix the currently failing test cases).
Re-organize commits so tests always pass.
Verify backwards compatibility.

DrahtBot added the label Wallet on Sep 1, 2022

furszy renamed this:
~~wallet: standardize change output detection process~~
[WIP] wallet: standardize change output detection process
on Sep 1, 2022

DrahtBot commented at 9:23 pm on September 1, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/25979.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	ghost

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29936 (fuzz: wallet: add target for CreateTransaction by brunoerg)
#29136 (wallet: addhdkey RPC to add just keys to wallets via new void(KEY) descriptor by achow101)
#28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ghost commented at 2:31 am on September 2, 2022: none

Interesting pull request although I haven’t tested it yet.

Does this also fix #20935 and #20795 ?

How Change Detection Currently Works? The wallet walks-through the transaction outputs, extracts the script destination and verify the following two points:

If the destination doesn’t exist in the address book, then the script is a “change address”.

If the destination exists in the address book, but it doesn’t have a label, then the script is a “change address”.

I was assuming change address had different derivation path. Is that false?

furszy commented at 4:09 pm on September 2, 2022: member

Does this also fix #20935 and #20795 ?

Yes for the change outputs detection part. It will allow us to properly detect each individual output as change or not, independently on what is stored in the address book entry (making the process deterministic across instances, and not dependent on a data structure that the user can freely modify).

As an edge case example; with this, users could even receive coins on an internal address and the new mechanism will properly detect it as an external reception, not a change output (the transaction inputs aren’t from the wallet, so no coins are returning to it). Still, this behavior is obviously not encouraged, nor should be easy to do in the wallet, as it breaks the whole idea of internal/external derivation paths.

Plus, would recommend you to check #25685 as well. The first commit there fixes a misleading wallet behavior where even if you set add_inputs=false (telling the wallet to disallow automatic coin selection), the wallet will still fetch coins internally and use them in the Coin Selection process (if you don’t pre-set inputs manually).

How Change Detection Currently Works? The wallet walks-through the transaction outputs, extracts the script destination and verify the following two points:

If the destination doesn’t exist in the address book, then the script is a “change address”.

If the destination exists in the address book, but it doesn’t have a label, then the script is a “change address”.

I was assuming change address had different derivation path. Is that false?

Not entirely, the assumption is most of the time correct. It breaks on pre-split legacy wallets, where we don’t have an internal derivation path. In those wallet versions we (1) only use an external path to derive public and change addresses, or if we go further back in time, (2) we use a raw pool of keys with no derivation paths at all. (Thus why the initial change detection process was implemented using the address book).

ghost commented at 5:41 am on September 3, 2022: none

Concept ACK

achow101 commented at 7:53 pm on October 31, 2022: member

While I like the idea this is going for, I don’t think it is sufficient. It relies on m_internal_spk_managers which only contains the currently active SPKMs. If a user were to remove a SPKM from being active, all of the addresses that were produced by that would no longer be detected as change. This is especially problematic with #25907 which rotates all of the currently active descriptors.

In general, I’m not sure that we can determine which output is change without storing additional metadata. What has been suggested before is to explicitly store an “IsChange” value in address book entries, with a “smart” fallback like what this PR does.

DrahtBot added the label Needs rebase on Dec 6, 2022

furszy force-pushed on Jan 10, 2023

DrahtBot removed the label Needs rebase on Jan 10, 2023

furszy commented at 7:07 pm on January 11, 2023: member

finally here. Thanks for the input achow101

While I like the idea this is going for, I don’t think it is sufficient. It relies on m_internal_spk_managers which only contains the currently active SPKMs. If a user were to remove a SPKM from being active, all of the addresses that were produced by that would no longer be detected as change. This is especially problematic with #25907 which rotates all of the currently active descriptors.

I’m probably missing some context but.. wouldn’t that be solvable by adding an internal field to the WalletDescriptor class? (32a990a). So we can keep track of non-active internal descriptors too. It should work fine while we don’t have any descriptor removal functionality (which would also mean to remove txs from the wallet etc) and require to keep historical records.

In general, I’m not sure that we can determine which output is change without storing additional metadata. What has been suggested before is to explicitly store an “IsChange” value in address book entries, with a “smart” fallback like what this PR does.

I think that we should have a more granular distinction and move from “change addresses” to “change outputs”. Storing any extra metadata, if needed, inside the wallet transaction class.

A good use case for this is the reception of coins from an external source on internal addresses, which shouldn’t be detected as change as it could be a simple reception or part of a dust attack. if the tx has no inputs belonging to the wallet, then no output should be labeled as change.

DrahtBot added the label Needs rebase on Jan 18, 2023

furszy force-pushed on Jan 19, 2023

DrahtBot removed the label Needs rebase on Jan 19, 2023

DrahtBot added the label Needs rebase on Feb 16, 2023

furszy force-pushed on Feb 20, 2023

DrahtBot removed the label Needs rebase on Feb 20, 2023

DrahtBot added the label Needs rebase on Apr 11, 2023

furszy force-pushed on Apr 12, 2023

DrahtBot removed the label Needs rebase on Apr 12, 2023

DrahtBot added the label Needs rebase on Apr 15, 2023

furszy force-pushed on Apr 20, 2023

DrahtBot removed the label Needs rebase on Apr 21, 2023

DrahtBot added the label CI failed on Apr 21, 2023

DrahtBot removed the label CI failed on Apr 23, 2023

DrahtBot added the label CI failed on May 3, 2023

furszy renamed this:
~~[WIP] wallet: standardize change output detection process~~
wallet: standardize change output detection process
on May 19, 2023

furszy force-pushed on May 19, 2023

furszy renamed this:
~~wallet: standardize change output detection process~~
[WIP] wallet: standardize change output detection process
on May 19, 2023

furszy commented at 1:42 pm on May 19, 2023: member

Thanks for the ping marko, rebased. Yeah, this is still relevant. Change detection is still an issue.

The base work is done, just need some extra love. Will keep it rebased, so anyone can review/test it meanwhile.

furszy marked this as a draft on May 19, 2023

furszy commented at 1:52 pm on May 19, 2023: member

This now depends on #27601. Moved to draft until that one gets merged.

DrahtBot added the label Needs rebase on Jul 20, 2023

Sadiq272 changes_requested

Sadiq272 commented at 1:39 am on January 16, 2024: none

Oba

bitcoin deleted a comment on Jan 16, 2024

furszy commented at 3:00 am on January 16, 2024: member

This is on a long-working path. Will try to rebase it in the coming days or close it until we proceed with the preceding PRs if it is too behind.

furszy force-pushed on Jan 21, 2024

DrahtBot removed the label Needs rebase on Jan 21, 2024

DrahtBot removed the label CI failed on Jan 21, 2024

DrahtBot added the label CI failed on Feb 23, 2024

DrahtBot commented at 10:34 am on February 23, 2024: contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/20703001941

DrahtBot added the label Needs rebase on Mar 29, 2024

furszy force-pushed on Mar 29, 2024

DrahtBot removed the label Needs rebase on Mar 29, 2024

DrahtBot added the label Needs rebase on May 23, 2024

furszy force-pushed on Aug 20, 2024

DrahtBot removed the label CI failed on Aug 20, 2024

DrahtBot removed the label Needs rebase on Aug 20, 2024

DrahtBot added the label Needs rebase on Aug 28, 2024

wallet: add test coverage for change output detection

General concept:
 Create transactions using the local wallet to different destinations provided by an external wallet.
 Verifying that the local wallet can detect the change output prior and post the tx is added to the wallet.

The following cases are covered for a descriptor wallet:
 * 1) Create tx that sends to a legacy p2pkh addr and verify change detection.
 * 2) Create tx that sends to a p2wpkh addr and verify change detection.
 * 3) Create tx that sends to a wrapped p2wpkh addr and verify change detection.
 * 4) Create tx that sends to a taproot addr and verify change detection.

And the following ones for a legacy-only wallet:
 * 1) Create tx that sends to a legacy p2pkh addr and verify change detection.
 * 2) Create tx that sends to a p2wpkh addr and verify change detection.
 * 3) Create tx that sends to a wrapped p2wpkh addr and verify change detection.

547db5f820

wallet: change output detection coverage for tx created outside the wallet

The current change detection process assumption can easily be broken by creating an address manually,
from an external derivation path, and send coins to it. As the address was not created by the
wallet, it will not be in the addressbook, there by will be treated as change when it's clearly not.

The wallet will properly detect it once the transaction gets added to the wallet and the address
(and all the previous unused address) are derived and stored in the address book.

e04a606813

wallet: test change detection negative paths

1) The change output goes to an address in one of the wallet external path.
2) The change goes back to the source.
   As the source is an external destination, and we are currently detecting
   change through it output script, the change will be marked as external
   (not change).
3) The user setting an address book label to a destination created from an
   internal key.

39bb367db8

wallet: standardize change output detection process

Currently, the wallet detects whether an output is change or not based
on data stored in the address book.

There is no notion of “change outputs”, the wallet detects change scripts.

Meaning that any address book data modification has implications
on all the historical outputs related to a particular destination as all of
them can either be change or not. There is no middle-ground.

The wallet walks-through the transaction outputs, extracts the script
destination and verify the following two points:

1) If the destination doesn't exist in the address book, then the script
    is a "change address".

2) If the destination exists in the address book, but it doesn't have a
     label, then the script is a "change address".

There are a good number of problems in the current approach:

- We make the wallet dependent on an external structure, with separate storage.
   Which has to be updated and maintained along with the wallet state.

- It cannot be maintained nor recovered across different wallet instances.
   Cannot re-create the address book data only by importing the wallet descriptor string.

- As the address book is an structure that the user can freely modify, the change
   detection process might differ through different wallets.

- The current rudimentary assumptions of "no address book entry" or "no label set for the address book entry"
  to denote that certain script destination is change or not can easily be broken:
  E.g. derive an address from one of the wallet’s external paths manually. Then send coins to it.
  As the receive destination wasn't created inside the wallet, the wallet has no associated address book entry.
  So, the reception is invalidly detected as change (added a test case for it).

- The wallet can't detect change outputs on more complex scripts such as multi-sig change outputs.

- The wallet is not able to detect any change output going to an internal address if the internal address has a label.
  (E.g. the user can manually set a label for the internal address and, doing that, make that all the change outputs,
   in the wallet history, that were sent to the destination are no longer detected as change).

- There isn’t a way to distinguish the external reception of coins into an internal address. Coins reception on any
   internal address are always detected as change.

Aiming to:

* Define a base mechanism to align different wallet implementations. Preventing each piece of software
  from diverging on the basic change outputs distinction.

* Detect change outputs on-demand without requiring to maintain an external data structure synced with the
   latest wallet state.

* Independently, and accurately, detect change outputs regardless data stored in structures that the user
   can freely modify.

* Granular distinction between change vs non-change outputs that were sent to the same internal address.
  E.g. the reception of coins, from an external source, on internal addresses will not longer be detected
  as change anymore.

* Expand the change detection to more complex scripts such as a multi-sig protected addresses. (While they
  are added into the wallet on an internal spkm)

A transaction output is change if it fulfills the following points:

1) At least one of the parent transaction inputs is from the wallet. (If none of them are, then the wallet is receiving coins
    on an internal address).

2) The script extracted destination is from the wallet and is located in one of the internal script pub key manager.
    (e.g. derived from an internal derivation path)

If the legacy wallet is HD post-split, we have an internal derivation path, so we can follow the same process as
descriptors wallet. Unless the destination is on the pre-split key pool, in which case, we fallback to the follow-up
case.

If the legacy wallet is pre-split, we continue using the address book as we either have an HD wallet with keys
derived only on the external path, or we are using raw keypool.

2221ea11da

wallet: remove old ScriptIsChange function 25fd5fea6b

wallet: coverage for coins reception into an internal address

As the wallet is receiving those coins from outside, them should
not be detected as change.

E.g. the user manually obtained and shared one of the internal addresses
and received coins there.

12723da5b6

RPC: add 'include_change' arg to 'listtransactions' and 'gettransaction' df638b80af

wallet: granular distinction between change vs non-change outputs sent to the same internal address

A transaction output is change if it fulfills the following rules:

1) At least one of the transaction inputs is from the wallet.
   (If none of them are, then the wallet is receiving coins
    on an internal address).

2) The script extracted destination is from the wallet and is
   located in one of the internal script pub key manager.
   (e.g. derived from an internal derivation path)

0c26b6dc12

[fixup] tests, internal address with addrbook label detected as change 59de009ba7

wallet: move OutputIsChange(txout) to IsOutputChange(tx, pos)

Keep behavior consistent

dae621e5cc

wallet: add 'internal' field to WalletDescriptor

To detect whether active and non-active descriptors are internal or not.

ea33406be5

wallet: GetAllScriptPubKeyMans, return non-active internal descriptors fb59e1c881

test: coverage for change detection on inactive internal descriptors 679c9188b3

furszy force-pushed on Oct 12, 2024

DrahtBot removed the label Needs rebase on Oct 12, 2024

DrahtBot added the label Needs rebase on Oct 25, 2024

DrahtBot commented at 9:23 am on October 25, 2024: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

murchandamus commented at 8:45 pm on November 4, 2024: contributor

Hey @furszy, is this ready for review?

DrahtBot commented at 1:07 am on February 1, 2025: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

DrahtBot commented at 0:33 am on May 1, 2025: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

maflcko commented at 5:20 pm on May 4, 2025: member

The dependency was closed more than a year ago: #27601 (comment)

furszy commented at 10:53 pm on May 4, 2025: member