Adding tests in `./test/functional/wallet_assumeutxo.py` to cover the following scenarios:

- test import descriptors while background sync is in progress
- test loading a wallet (backup) on a pruned node
| Type | Reviewers |
| --- | --- |
| Concept ACK | fjahr |
```diff
         including blocks the other hasn't yet seen."""
-        self.add_nodes(2)
+        self.add_nodes(3)
         self.start_nodes(extra_args=self.extra_args)

+    def test_descriptor_import(self, node, wallet_name, key, timestamp, should_succeed=True, expected_error_message=None):
```

`should_succeed` is always equal to `not expected_error_message`, so the extra flag could be dropped:

```suggestion
    def test_descriptor_import(self, node, wallet_name, key, timestamp, expected_error_message=None):
```
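For illustration, a minimal sketch of the helper with the flag dropped, reusing the names from the diff hunks in this thread; the `success`/`error.message` fields are the usual shape of the `importdescriptors` result entries, not something taken from this diff:

```python
def test_descriptor_import(self, node, wallet_name, key, timestamp, expected_error_message=None):
    # Success is implied by the absence of an expected error message.
    import_request = [{"desc": descsum_create("pkh(" + key.pubkey + ")"),
                       "timestamp": timestamp,
                       "label": "Descriptor import test"}]
    wrpc = node.get_wallet_rpc(wallet_name)
    result = wrpc.importdescriptors(import_request)
    if expected_error_message is None:
        assert_equal(result[0]['success'], True)
    else:
        assert_equal(result[0]['success'], False)
        assert_equal(result[0]['error']['message'], expected_error_message)
```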
```diff
     assert_equal,
     assert_raises_rpc_error,
 )
 from test_framework.wallet import MiniWallet
+from test_framework.wallet_util import get_generate_key
+from test_framework.descriptors import descsum_create
```

```diff
+        self.num_nodes = 3
         self.rpc_timeout = 120
         self.extra_args = [
             [],
             [],
+            ["-prune=1"]
```
You could use the `-fastprune` option as well and check whether some pruning has actually happened by the time the test runs, i.e. check that there is some pruneheight. There isn't much of a difference if the option is set but nothing has actually been pruned yet.
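For illustration, roughly what that could look like (a sketch; the check at the end is just one way to confirm that pruning really happened):

```python
# Sketch: give the pruned node -fastprune so block files stay tiny and
# pruning can actually kick in on a regtest-sized chain.
self.extra_args = [
    [],
    [],
    ["-prune=1", "-fastprune"],
]

# Later in run_test, once enough blocks have been mined/synced on the pruned
# node, confirm that some pruning has really happened, not just been enabled:
assert self.nodes[2].getblockchaininfo()["pruneheight"] > 0
```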
Thanks for the feedback. I added the `-fastprune` option to my local copy and I'm working on understanding how pruning operates.
When I tried loading the wallet during background synchronization (here), I noticed that there wasn't a pruneheight set yet. So I did a manual prune using `n2.pruneblockchain(START_HEIGHT)`. After pruning, I saw a pruneheight of 300, and the test continued as expected.
However, after the background validation and sync to the tip (here), the pruneheight was reset to 0. When I tried to prune the blockchain again, the pruneheight remained at 0, no matter which value I passed to `pruneblockchain`. To investigate further, I mined an additional 500 blocks on n2 and manually pruned again with `n2.pruneblockchain(FINAL_HEIGHT)`. This time, the pruneheight was 299.
Despite this, the wallet restoration failed with this message:
```text
Wallet loading failed. Prune: last wallet synchronization goes beyond pruned data. You need to -reindex (download the whole blockchain again in case of pruned node)
```
I understand that a pruneheight of 299 means that the node keeps blocks from 299 onwards. Since the last processed block of the wallet is also 299 and the block hash matches, I expected the restoration to work. This issue also happens with lower pruneheight values, like 242.
Is this a bug, or am I missing something?
I can confirm that I am able to retrieve the block at height `pruneheight`. When the pruneheight was 299, I validated that block 299 was still stored on the node. The same validation was done when the pruneheight was 242, confirming block 242 was also stored. So I am not sure why the wallet restoration fails.
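Roughly the check in question (a sketch; `n2` is the pruned node from the setup above):

```python
# The block at the reported pruneheight should still be retrievable from disk.
prune_height = n2.getblockchaininfo()["pruneheight"]   # 299 (or 242) in the runs above
block_hash = n2.getblockhash(prune_height)
assert_equal(n2.getblock(block_hash)["height"], prune_height)
```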
> Thanks for the feedback. I added the `-fastprune` option to my local copy and I'm working on understanding how pruning operates.
In general, check `feature_index_prune.py`; there are extensive tests using pruning there, and I think you can learn a bit more from them about how much you have to mine for fast pruning to work the way you need it.
> However, after the background validation and sync to the tip (here), the pruneheight was reset to 0. When I tried to prune the blockchain again, the pruneheight remained at 0, no matter which value I passed to `pruneblockchain`. To investigate further, I mined an additional 500 blocks on n2 and manually pruned again with `n2.pruneblockchain(FINAL_HEIGHT)`. This time, the pruneheight was 299.
Pruning can feel weird when you operate with such low numbers; the answer is probably to play around with the number of mined blocks, and the test mentioned above should help you with that. In general, you have to understand that pruning means we delete block files we no longer need (and fastprune makes them very small, so we stop needing them at a much faster rate). We no longer need a file when nothing in it is still required, i.e. it contains no blocks that we might still need. Blocks are appended to these files as they come in, so during normal IBD (in theory at least) we can always delete the block files in the order they were created (FIFO), because the blocks come in order. But with assumeutxo the blocks don't come in order anymore. So that alone is weird and you need to keep it in mind. For the same reason `getblockfrompeer` can be hard to grasp; #23813 might be an interesting read about that. I think this is the reason you don't see results from `pruneblockchain`: there are no block files to be found that contain only blocks we no longer need. Mining more blocks in the test should fix this overall though, as you have seen already.
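A rough sketch of the "mine more and prune manually" approach (the block count of 500 is just the figure mentioned earlier in this thread, not a recommendation):

```python
# With -fastprune enabled, mining a larger batch of blocks on the pruned node
# makes older block files fully disposable, so a manual prune has an effect.
self.generate(n2, 500, sync_fun=self.no_op)
n2.pruneblockchain(n2.getblockcount())
assert n2.getblockchaininfo()["pruneheight"] > 0
```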
> Is this a bug, or am I missing something?
Hard to say for me with just what you have written. Can you push your current status to a branch somewhere and post the link, then I can take a look and play around with it.
Thank you very much for your explanation and suggestions! I'll definitely check `feature_index_prune.py` and #23813 to better understand how this works.
Here is a branch where I pushed my current status: debugging_assumeutxo_tests
You are looking at `wallet_last_processed_block`, but that is coming out of node 0. When you load the wallet into node 2, it doesn't know anything about it, so this can be ignored. What you might be thinking of instead is the birthdate of the wallet; that's part of the backup afaict, so it should be relevant. But the birthdate is earlier than 299 because the wallet seems to be created before a bunch of blocks are generated. So the error seems to make sense to me, and you might want to create another wallet with a later birthdate, which should then load successfully at the place where you currently get the error with the older wallet.
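For illustration, what the "newer wallet" variant might look like (a sketch; the wallet and backup file names are made up):

```python
# Create a second wallet only after the later blocks exist, so its birthdate
# is recent enough, then back it up and restore that backup on the pruned node.
n0.createwallet("w_late")                              # hypothetical wallet name
w_late = n0.get_wallet_rpc("w_late")
backup_path = n0.datadir_path / "backup_w_late.dat"    # hypothetical file name
w_late.backupwallet(str(backup_path))
n2.restorewallet("w_late", str(backup_path))
```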
Hey @fjahr! Thank you for taking the time to review this. I haven’t had much time to work on it lately, but I do recall that during my debugging, I found the `rescan_height` was 199 at this point, while the `block_height` was 299 (the oldest block stored on disk), hence triggering the error mentioned.
The `rescan_height` of 199 seems to match the wallet’s birthday, given that the wallet was created at the beginning of the test with the default chain of 199 blocks.
However, my test works if I create (and restore) a much newer backup of the same wallet. These are the scenarios I tried:
1. `n0`, wallet `w` (birthday 199), backup height: `199`, restoration on `n2`: fails
2. `n0`, wallet `w` (birthday 199), backup height: `299`, restoration on `n2`: fails
3. `n0`, wallet `w` (birthday 199), backup height: `399`, restoration on `n2`: succeeds

I know I’m being repetitive in the list, but I wanted to emphasize that I am doing backups on the same node and wallet, but at different heights. In all cases, the `pruneheight` on `n2` is 299.
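For reference, a sketch of how the three backups might be produced (the `generate` calls and file names are illustrative, not taken from the branch):

```python
# Wallet w lives on n0 and was created at the default chain height of 199.
w = n0.get_wallet_rpc("w")
w.backupwallet(str(n0.datadir_path / "backup_199.dat"))   # backup at height 199

self.generate(n0, 100, sync_fun=self.no_op)                # n0 now at height 299
w.backupwallet(str(n0.datadir_path / "backup_299.dat"))

self.generate(n0, 100, sync_fun=self.no_op)                # n0 now at height 399
w.backupwallet(str(n0.datadir_path / "backup_399.dat"))

# Restoring backup_199.dat or backup_299.dat on n2 (pruneheight 299) fails;
# restoring backup_399.dat succeeds.
```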
I know why the code fails in scenarios (1, 2), and it is because the `rescan_height` ends up being 199 while `block_height` is 299. Scenario 3 works because the `rescan_height` gets a value of 299, matching the `block_height`. However, I don’t get it conceptually.
So my question is: are you referring to the wallet’s birthday or the backup’s birthday? The height of the backup seems to matter, and that’s why I was looking at the `last_processed_block` as a hint.
Thanks
The `rescan_height` is coming from the block locator stored on disk. This is a different value than `last_processed_block`. I found that it is not guaranteed that the latest block is written when a backup is created. Since the full database is included in the backup, there is indeed something in there that would cause the wallet to only need to rescan after 299. So the correct behavior IMO would be that your case 2 shouldn’t fail. I confirmed this with the following change: #30678. I am suggesting there that this is changed, but I am not sure this behavior is really so bad in practice that it will get merged. But at least I hope that answers your question.
So a backup at height `399` (scenario 3) would be enough for the purpose of this test?
```diff
+
+        assert_equal(node.getblockchaininfo()["blocks"], SNAPSHOT_BASE_HEIGHT)
+
     def run_test(self):
         """
         Bring up two (disconnected) nodes, mine some new blocks on the first,
```

```diff
+                           "timestamp": timestamp,
+                           "label": "Descriptor import test"}]
+        wrpc = node.get_wallet_rpc(wallet_name)
+        result = wrpc.importdescriptors(import_request)
+
+        if expected_error_message is None:
```
You could remove `test_` from the name as well then.
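For illustration, the renamed helper could then just perform the import and return the result, leaving assertions to the caller (a sketch; names follow the diffs in this thread):

```python
def import_descriptor(self, node, wallet_name, key, timestamp):
    # Perform the import and hand the raw RPC result back to the caller.
    import_request = [{"desc": descsum_create("pkh(" + key.pubkey + ")"),
                       "timestamp": timestamp,
                       "label": "Descriptor import test"}]
    wrpc = node.get_wallet_rpc(wallet_name)
    return wrpc.importdescriptors(import_request)
```

A caller inside `run_test` would then do `result = self.import_descriptor(n1, wallet_name, key, timestamp)` and assert on `result[0]` as needed.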
```diff
@@ -185,6 +192,14 @@ def run_test(self):
         self.log.info("Ensuring wallet can be restored from backup")
         n1.restorewallet("w", "backup_w.dat")

+        # Now make sure background validation succeeds on n2
+        # and try restoring the wallet
+        self.connect_nodes(0, 2)
+        self.sync_blocks(nodes=(n0, n2))
+        self.wait_until(lambda: len(n2.getchainstates()['chainstates']) == 1)
+        self.log.info("Ensuring wallet can be restored from backup (pruned node)")
```
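A sketch of the restore step that could follow the new log line, using the same names as the diff (the `getwalletinfo` check is just a basic sanity check, not from the diff):

```python
# Restore the backup on the pruned node once background validation is done,
# then sanity-check that the wallet actually loaded.
n2.restorewallet("w", "backup_w.dat")
w2 = n2.get_wallet_rpc("w")
assert_equal(w2.getwalletinfo()["walletname"], "w")
```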
CI seems to have an issue with the import descriptor test:
```text
test 2024-08-22T15:25:46.161000Z TestFramework (ERROR): Unexpected exception caught during testing
Traceback (most recent call last):
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
    self.run_test()
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/wallet_assumeutxo.py", line 167, in run_test
    result = self.import_descriptor(n1, wallet_name, key, timestamp)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/wallet_assumeutxo.py", line 51, in import_descriptor
    return wrpc.importdescriptors(import_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/coverage.py", line 50, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 127, in __call__
    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 106, in _request
    return self._get_response()
           ^^^^^^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/authproxy.py", line 164, in _get_response
    http_response = self.__conn.getresponse()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/usr/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
```
Sorry for not providing an update earlier. But yes, I’m continuing to work on this. I am trying to debug the CI issues, with no success so far.
I set up a virtual machine with Ubuntu 22.04 (8 GB RAM), and after dealing with some library and memory issues I was able to run the CI job as described here.
However, the job takes around 4-5 hours to run in the VM, and I have not been able to reproduce the issue yet. I was using the root user to run the command.
I am running it on a MacBook Pro with an M2 Pro chip and 16 GB of RAM.
Any recommendations on how to test this on my Mac? I was using Parallels Desktop to virtualize, but my license expired, so I will move on to another alternative like VirtualBox.
Ugh, the error is https://cirrus-ci.com/task/5110045812195328?logs=ci#L2535
```text
node1 stderr ./src/validation.cpp:5569 GuessVerificationProgress: Assertion `pindex->m_chain_tx_count > 0' failed.
```
This means that this is a real issue that should be fixed in the real code. This test is just surfacing it.
> Ugh, the error is https://cirrus-ci.com/task/5110045812195328?logs=ci#L2535
>
> node1 stderr ./src/validation.cpp:5569 GuessVerificationProgress: Assertion `pindex->m_chain_tx_count > 0' failed.
>
> This means that this is a real issue that should be fixed in the real code. This test is just surfacing it.
Thanks for pointing this out. I hadn’t noticed the assertion was failing.
It seems the assertion was introduced here, and my branch doesn’t include it.
I think I’ll need to rebase my local branch to see if I can reproduce the issue.
> It seems the assertion was introduced here, and my branch doesn’t include it.
No, that’s just a commit where the variable is renamed. The `Assume` was introduced here: https://github.com/bitcoin/bitcoin/pull/29370/commits/0fd915ee6bef63bb360ccc5c039a3c11676c38e3. The assumption (pun intended) was that `GuessVerificationProgress` wouldn’t be called in the context of assumeutxo, but that was obviously not right.
I think we’ll want to do something like this for now: https://github.com/fjahr/bitcoin/commit/97a7261c8afbe4a488e6ec185cd9926e278ac3c9. Every other fix I could think of seems much more invasive, and I don’t think we need a perfect solution for this edge case right now.
I am wondering whether this should still be fixed in v28, so that users don’t see this scary `Internal bug detected` log. What do you think, @maflcko?
> I am wondering whether this should still be fixed in v28, so that users don’t see this scary `Internal bug detected` log. What do you think, @maflcko?
Yeah, it should probably be fixed or worked around in v28. But AU is marked experimental (and if it isn’t, it should be, for at least one release), so a fix for a debug log warning doesn’t have to be rushed and can be done for rc3, or even 28.1, imo.
I have opened a separate PR, #30909, with my suggested fix and your test commit cherry-picked on top of it, @alfonsoromanz.
Thanks a lot @fjahr !
Should I modify this PR now to be on top of yours, including only the second test? Or would it be better to wait until your PR gets merged and then modify mine?
> Should I modify this PR now to be on top of yours, including only the second test? Or would it be better to wait until your PR gets merged and then modify mine?
I would recommend you wait a bit to see what the feedback on that PR is.
🐙 This pull request conflicts with the target branch and needs rebase.
⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?