test: add assertions to SRD max weight test

yancyribbens commented at 5:56 pm on July 24, 2025: contributor

Replace generic assertion with a result specific assertion showing the correctness of the solution found. If the max weight parameter is exceeded, the least valuable UTXOs are removed from the result. Therefore, only the most valued encountered UTXO's are selected. While the smallest set would include all the most valued UTXO's, in the case of the test there is one high value UTXO that is never found before the target value is reached.

Correct the test comment to be more specific about why the assertion is a good result.

DrahtBot added the label Tests on Jul 24, 2025

DrahtBot commented at 5:56 pm on July 24, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33058.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	murchandamus, furszy

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

yancyribbens force-pushed on Jul 24, 2025

yancyribbens commented at 5:59 pm on July 24, 2025: contributor

cc @murchandamus

yancyribbens force-pushed on Jul 24, 2025

DrahtBot added the label CI failed on Jul 24, 2025

DrahtBot commented at 6:10 pm on July 24, 2025: contributor

🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/46670755797 LLM reason (✨ experimental): The failure is due to a lint check identifying trailing whitespace in the source code.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

yancyribbens commented at 6:24 pm on July 24, 2025: contributor

All unit tests pass for me locally, so I’m not sure what CI is trying to tell me.

maflcko commented at 8:10 pm on July 24, 2025: member

[18:13:33.790] ./wallet/test/coinselector_tests.cpp(1247): [1;31;49merror: in “coinselector_tests/srd_tests”: check EquivalentResult(expected_result, *res) has failed[0;39;49m

https://api.cirrus-ci.com/v1/task/6251945985310720/logs/ci.log

yancyribbens force-pushed on Jul 24, 2025

yancyribbens commented at 8:39 pm on July 24, 2025: contributor

[18:13:33.790] ./wallet/test/coinselector_tests.cpp(1247): �[1;31;49merror: in “coinselector_tests/srd_tests”: check EquivalentResult(expected_result, *res) has failed�[0;39;49m

That is the unit test, right? It passes for me locally. I just re-based and pushed again and now that job is passing. I wonder now if there’s a possible random seed that could cause SRD to fail.

yancyribbens commented at 9:38 pm on July 24, 2025: contributor

Oh I see, SRD generates a different solution each time in the test-suit. That’s annoying.

DrahtBot removed the label CI failed on Jul 25, 2025

in src/wallet/test/coinselector_tests.cpp:1247 in bf2990faa9 outdated

1243+            add_coin(2 * COIN, i, expected_result);
1244+        }
1245+        for (int j = 0; j < 27; ++j) {
1246+            add_coin(0.33 * COIN, j + 10, expected_result);
1247+        }
1248+        BOOST_CHECK(EquivalentResult(expected_result, *res));

murchandamus commented at 8:13 pm on July 25, 2025:

If the intent is to check that the result adheres to the max_selection_weight limitation, you could check for that directly:

0        BOOST_CHECK(res);
1        BOOST_CHECK(res->GetWeight() <= max_selection_weight);

yancyribbens commented at 4:13 pm on July 26, 2025:

This doesn’t show the behavior of the re-ordering done by the min-heap but it’s better than before. And I think it would be a lot of work to make this test deterministic.

yancyribbens commented at 4:31 pm on July 26, 2025:

I removed the check for res since it’s implied by the second assertion.

murchandamus commented at 8:30 pm on July 26, 2025:

That’s a disimprovement. Please leave both, because the test failing on res being falsy indicates other issues than it failing on the weight check.

yancyribbens commented at 9:11 pm on July 26, 2025:

Alright I added it back. However in the past we have preferred not to have this extra check: https://github.com/bitcoin/bitcoin/pull/31656

yancyribbens commented at 9:13 pm on July 26, 2025:

nit: Maybe this BOOST_CHECK is not needed anymore since you’re checking the result itself.

in src/wallet/test/coinselector_tests.cpp:1226 in bf2990faa9 outdated

1222@@ -1223,7 +1223,7 @@ BOOST_AUTO_TEST_CASE(srd_tests)
1223 
1224     {
1225         // ################################################################################################################
1226-        // 3) Test selection when some coins surpass the max allowed weight while others not. --> must find a good solution
1227+        // 3) Test that when the max weight is exceeded, only the most valuable encountered UTXOs are returned

murchandamus commented at 8:16 pm on July 25, 2025:

What SRD does is a bit different:

0        // 3) Test that SRD result does not exceed the max weight

yancyribbens commented at 9:29 pm on July 25, 2025:

Is it not true that when the max weight is encountered, the least valuable utxo is removed from the selection via the min-heap? Therefore the most valuable are returned?

murchandamus commented at 11:34 pm on July 25, 2025:

The phrasing “Test that when the max weight is exceeded, only the most valuable encountered UTXOs are returned” sounds potentially misleading to me, because it could be understood as the algorithm changing to largest-first if the max weight is exceeded or the input set getting pruned beyond dropping below the weight limit, but this only tests that a solution is found that stays within the limit.

yancyribbens commented at 4:08 pm on July 26, 2025:

Well since I see you changed the assertion, then I think your rewording also makes sense.

murchandamus commented at 8:25 pm on July 25, 2025: contributor

Yeah, as you noted, Single Random Draw is non-deterministic, so you would only get a predictable input set if there is only exactly one composition permitted. I don’t think that’s necessary to test what you want, though:

yancyribbens commented at 4:11 pm on July 26, 2025: contributor

Yeah, as you noted, Single Random Draw is non-deterministic, so you would only get a predictable input set if there is only exactly one composition permitted. I don’t think that’s necessary to test what you want, though:

Well we all know the algorithm is non-deterministic, however ideally it would be possible to make deterministic test cases :).

I see you slightly weakened the assertion to handle the non-determinism which is a definite improvement over the current test.

yancyribbens force-pushed on Jul 26, 2025

murchandamus commented at 8:28 pm on July 26, 2025: contributor

Well we all know the algorithm is non-deterministic, however ideally it would be possible to make deterministic test cases :).

Such a test would be brittle and tied to the exact implementation details of the algorithm rather than the characteristics of the algorithm. That doesn’t seem like an obvious improvement.

yancyribbens commented at 8:56 pm on July 26, 2025: contributor

Such a test would be brittle and tied to the exact implementation details of the algorithm rather than the characteristics of the algorithm. That doesn’t seem like an obvious improvement.

This test case has a number of invaluable utxos and some valuable ones. Would it not be better to show that after the min-heap is added, that the implementation has a bias towards selecting valuable utxos? I think these types of implementation details would be valuable to test.

test: improve assertion for SRD max weight test

Previously, the assertion only showed that a result was found, however
made no assertion about the quality of the result.

Remove comment about what UTXOs are selected and what are not
since the test does not reflect that.

Co-authored-by: Mark "Murch" Erhardt <murch@murch.one>

cc33e45789

yancyribbens force-pushed on Jul 26, 2025

murchandamus commented at 1:44 am on August 2, 2025: contributor

ACK cc33e4578946c68d6d333f35c48f8cbf3f75f6cf

This test case has a number of invaluable utxos and some valuable ones. Would it not be better to show that after the min-heap is added, that the implementation has a bias towards selecting valuable utxos? I think these types of implementation details would be valuable to test.

No, a good test checks that the tested function produces the expected outcomes without relying on knowledge of implementation details.

In this case, the relevant feature of SRD is that it is capable of producing input sets that adhere to max_selection_weight restrictions even when a purely random selection on the corresponding UTXO pool would make it extremely unlikely to produce one. This characteristic of SRD is the subject of the test, not how SRD achieves that property.

A test that relies on the implementation details of SRD would likely break when implementation details of SRD change even while the relevant properties remain the same, e.g., if we start shuffling differently or start passing only a subset of the available coins, a test that checks for the property would continue to pass, while a test that seeds randomness and expects a specific set of inputs would break and need to be amended. Tests should treat the tested functionality as black boxes, merely knowing the expected characteristics of the outcomes. In this case, the expected outcome is variable, and therefore the test should accept variable outcomes as long as they meet expectations.

Beside increasing the maintenance burden, tests that are based on exact implementation details of the functionality they test, tend to reimplement the functionality in another manner (or worse, in the same manner), so that test and functionality are more likely to share the same flaws in reasoning or that it may no longer be obvious that the test in itself is correct.

A test being based on knowing that SRD uses the min-heap approach internally would be 1) more complicated, 2) harder to read, 3) more error prone, and 4) a layer violation.

yancyribbens commented at 11:30 am on August 2, 2025: contributor

A test being based on knowing that SRD uses the min-heap approach internally would be 1) more complicated, 2) harder to read, 3) more error prone, and 4) a layer violation.

I’m not advocating to test that SRD uses a min-heap. However, the min-heap was added #26720 because:

As the selection of the 1515 small UTXOs exceeds the max allowed tx size, the expectation here is to receive a selection result that only contain the big UTXO. Which is not happening for the following reason:

A test that tests the behavior ^ above as a black-box is what I was thinking of. Agreed that we are not trying to test specifically that SRD uses a min-heap. However, to test the above behavior, I think it would be difficult using a non-deterministic test.

furszy commented at 11:51 am on August 2, 2025: member

ACK cc33e4578946c68d6d333f35c48f8cbf3f75f6cf

glozow merged this on Aug 4, 2025

glozow closed this on Aug 4, 2025

murchandamus commented at 11:59 pm on August 4, 2025: contributor

This test case has a number of invaluable utxos and some valuable ones. Would it not be better to show that after the min-heap is added, that the implementation has a bias towards selecting valuable utxos? I think these types of implementation details would be valuable to test.

[…]

However, to test the above behavior, I think it would be difficult using a non-deterministic test.

That’s exactly what the test shows: in order to stay under the max_selection_weight, SRD has to select a higher than average value density, i.e., the result is biased towards selecting the more valuable UTXOs.

yancyribbens commented at 4:48 pm on August 6, 2025: contributor

That’s exactly what the test shows: in order to stay under the max_selection_weight, BnB has to select a higher than average value density, i.e., the result is biased towards selecting the more valuable UTXOs.

Nit, you meant to say SRD algorithm, not BnB. Yes, I see that now. That’s a clever way to test the behavior observing that without using the min-heap strategy, a preponderance of low value UTXOs leads to no solution being found before hitting the weight limit. However I think that’s not immediately obvious and some type of comment to that affect would be helpful, preferable spelled out in the commit message.

test: add assertions to SRD max weight test #33058

Code Coverage & Benchmarks

Reviews