wallet: migration, avoid loading legacy wallet after failure when BDB isn't compiled

furszy commented at 7:24 pm on December 9, 2024: member

During migration failure, only load wallet back into memory when the wallet was loaded prior to migration. This fixes the case where BDB is not supported, which implies that no legacy wallet can be loaded into memory due to the lack of db writing functionality.

Link to error description #31447 (comment).

This PR also improves migration backup related comments to better document the current workflow.

DrahtBot commented at 7:24 pm on December 9, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31451.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	rkrux, achow101, pablomartin4btc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31250 (wallet: Disable creating and loading legacy wallets by achow101)
#28710 (Remove the legacy wallet and BDB dependency by achow101)
#25722 (refactor: Use util::Result class for wallet loading by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Wallet on Dec 9, 2024

furszy force-pushed on Dec 9, 2024

DrahtBot added the label CI failed on Dec 9, 2024

DrahtBot commented at 7:39 pm on December 9, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/34150908075

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

furszy force-pushed on Dec 9, 2024

DrahtBot removed the label CI failed on Dec 9, 2024

furszy force-pushed on Dec 10, 2024

achow101 commented at 6:17 pm on December 10, 2024: member

Maybe also add a BDB only test that a when a loaded wallet is migrated, it is loaded after migration?

furszy force-pushed on Dec 10, 2024

furszy commented at 8:12 pm on December 10, 2024: member

Maybe also add a BDB only test that a when a loaded wallet is migrated, it is loaded after migration?

I assume you meant a BDB only test verifying that the legacy wallet remains loaded after a migration failure –> just pushed it.

wallet: migration, avoid loading wallet after failure when it wasn't loaded before

During migration failure, only load wallet back into memory when the
wallet was loaded prior to migration. This fixes the case where BDB
is not supported, which implies that no legacy wallet can be loaded
into memory due to the lack of db writing functionality.

This commit also improves migration backup related comments to better
document the current workflow.

Co-authored-by: Ava Chow <github@achow101.com>

589ed1a8ea

furszy force-pushed on Dec 12, 2024

in test/functional/wallet_migration.py:897 in 589ed1a8ea

893@@ -894,9 +894,7 @@ def test_failed_migration_cleanup(self):
894         shutil.copytree(self.old_node.wallets_path / "failed", self.master_node.wallets_path / "failed")
895         assert_raises_rpc_error(-4, "Failed to create database", self.master_node.migratewallet, "failed")
896 
897-        assert "failed" in self.master_node.listwallets()
898-        assert "failed_watchonly" not in self.master_node.listwallets()
899-        assert "failed_solvables" not in self.master_node.listwallets()
900+        assert all(wallet not in self.master_node.listwallets() for wallet in ["failed", "failed_watchonly", "failed_solvables"])

rkrux commented at 6:30 am on December 12, 2024:

+1 for getting rid of 2 extra listwallets calls here!

in src/wallet/wallet.cpp:4569 in 589ed1a8ea

4565@@ -4560,17 +4566,24 @@ util::Result<MigrationResult> MigrateLegacyToDescriptor(const std::string& walle
4566         }
4567 
4568         // Restore the backup
4569-        DatabaseStatus status;
4570-        std::vector<bilingual_str> warnings;
4571-        if (!RestoreWallet(context, temp_backup_location, wallet_name, /*load_on_start=*/std::nullopt, status, error, warnings)) {
4572-            error += _("\nUnable to restore backup of wallet.");
4573+        // Convert the backup file to the wallet db file by renaming it and moving it into the wallet's directory.

rkrux commented at 7:54 am on December 12, 2024:

Nit

0        // Convert the backup file in the top-level wallets dir to the wallet db file by renaming it and moving it into the wallet's directory.

in src/wallet/wallet.cpp:4583 in 589ed1a8ea

4583+        // The wallet directory has been restored, but just in case, copy the previously created backup to the wallet dir
4584         fs::copy_file(temp_backup_location, backup_path, fs::copy_options::none);
4585         fs::remove(temp_backup_location);
4586 
4587+        // Verify that there is no dangling wallet: when the wallet wasn't loaded before, expect null.
4588+        // This check is performed after restoration to avoid an early error before saving the backup.

rkrux commented at 7:58 am on December 12, 2024:

before saving the backup.

Isn’t the backup saved already at this* point?

furszy commented at 4:09 pm on December 12, 2024:

before saving the backup.

Isn’t the backup saved already at the point?

Yes, the sentence basically means “this assertion was specifically placed here, and not before the fs::copy_file line that is one line above, because we want the backup to be stored in the correct location regardless of whether the wallet was invalidly loaded or not”

in src/wallet/wallet.cpp:4585 in 589ed1a8ea

4585         fs::remove(temp_backup_location);
4586 
4587+        // Verify that there is no dangling wallet: when the wallet wasn't loaded before, expect null.
4588+        // This check is performed after restoration to avoid an early error before saving the backup.
4589+        bool wallet_reloaded = ptr_wallet != nullptr;
4590+        assert(was_loaded == wallet_reloaded);

rkrux commented at 7:59 am on December 12, 2024:

Nice to have this assert imo.

in test/functional/wallet_migration.py:912 in 589ed1a8ea

907@@ -910,6 +908,22 @@ def test_failed_migration_cleanup(self):
908             _, _, magic = struct.unpack("QII", data)
909             assert_equal(magic, BTREE_MAGIC)
910 
911+        ####################################################
912+        # Perform the same test with a loaded legacy wallet.

rkrux commented at 8:21 am on December 12, 2024:

Should we also not add the checks from lines 899 to 909 in the loaded wallet test below? If it makes sense, then maybe we can consider calling test_failed_migration_cleanup twice from run_test with boolean loaded_wallet as an argument to the test.

furszy commented at 4:42 pm on December 12, 2024:

Should we also not add the checks from lines 899 to 909 in the loaded wallet test below? If it makes sense, then maybe we can consider calling test_failed_migration_cleanup twice from run_test with boolean loaded_wallet as an argument to the test.

Yeah, that’s possible, but I’m not sure it’s worth the refactoring since we’re planning to remove BDB soon (this extra test will be removed too). Also, running it from run_test in two different ways would require creating a different wallet for each run (because create_legacy_wallet can’t be called with the same name twice), which would mean decoupling the wallet name into a variable, which requires a good number of lines changes. Another option is to create a test_failed_migration_cleanup inner function post-setup and call it twice, e.g. 393c506533a1183225d767d18dd54c3d961e1a28.

rkrux commented at 6:08 am on December 13, 2024:

The approach shared in https://github.com/bitcoin/bitcoin/commit/393c506533a1183225d767d18dd54c3d961e1a28#diff-571cc226b2c9fcca7adef3916d9328f24948edf3bf0b5e66b8b45dcf75cbe502R896 looks good to me - increases assertions while getting rid of duplication as well!

Yes BDB will be removed but I’d assume the wallet_migration tests won’t be? That’s why these tests now use previous releases to create legacy wallets from what I understand: #31248

This extra test can also stick around imo, it’s quite a specific case that led to the bug.

furszy commented at 2:43 pm on December 13, 2024:

Yes BDB will be removed but I’d assume the wallet_migration tests won’t be? That’s why these tests now use previous releases to create legacy wallets from what I understand: #31248

This extra test can also stick around imo, it’s quite a specific case that led to the bug.

Yes, wallet_migration.py uses a previous release binary to create the legacy wallet and the latest version binary to perform migration. Since BDB support will be removed, the latest version binary will not allow loading a legacy wallet, a prerequisite for this test, thus making it impossible to run this test case.

rkrux commented at 2:53 pm on December 13, 2024:

That makes sense, due to the check on line 918. Got it, thanks.

rkrux commented at 8:32 am on December 12, 2024: contributor

Nice catching the bug, especially helpful for me because the migration test fails in my system as well! The diff looks quite different and shortened compared to what I was looking at day before yesterday. Mostly LGTM.

Attaching a failed wallet snapshot of the newly added part functional test in my system. I bumped the mock time by 10000 ms to make the file modified time difference apparent.

 0master node, failed wallet, before migration:
 1node0/regtest/wallets/failed/failed_1733991924.legacy.bak Thu, 12 Dec 2024 08:25:24 16384 bytes
 2node0/regtest/wallets/failed/database Thu, 12 Dec 2024 08:25:24 96 bytes
 3node0/regtest/wallets/failed/wallet.dat Thu, 12 Dec 2024 08:25:24 16384 bytes
 4node0/regtest/wallets/failed/db.log Thu, 12 Dec 2024 08:25:24 0 bytes
 5node0/regtest/wallets/failed/.walletlock Thu, 12 Dec 2024 08:25:24 0 bytes
 6master node, failed wallet, after migration:
 7node0/regtest/wallets/failed/database Thu, 12 Dec 2024 08:25:25 96 bytes
 8node0/regtest/wallets/failed/wallet.dat Thu, 12 Dec 2024 08:25:25 16384 bytes
 9node0/regtest/wallets/failed/failed_1734001939.legacy.bak Thu, 12 Dec 2024 08:25:25 16384 bytes
10node0/regtest/wallets/failed/db.log Thu, 12 Dec 2024 08:25:25 0 bytes
11node0/regtest/wallets/failed/.walletlock Thu, 12 Dec 2024 08:25:25 0 bytes

in src/wallet/wallet.cpp:4574 in 589ed1a8ea

4573+        // Convert the backup file to the wallet db file by renaming it and moving it into the wallet's directory.
4574+        // Reload it into memory if the wallet was previously loaded.
4575+        bilingual_str restore_error;
4576+        const auto& ptr_wallet = RestoreWallet(context, temp_backup_location, wallet_name, /*load_on_start=*/std::nullopt, status, restore_error, warnings, /*load_after_restore=*/was_loaded);
4577+        if (!restore_error.empty()) {
4578+            error += restore_error + _("\nUnable to restore backup of wallet.");

furszy commented at 3:02 pm on December 13, 2024:

Note for a follow-up (up-for-grabs): In a previous version of this PR, as well as in #31250 (9beab081e977e), there was a bug here. We were passing the existing error field to RestoreWallet() and then checking error.empty() afterward. This was problematic because this is the error recovery path, meaning error was already set at this point.

This issue no longer exists because we now pass a new error object to RestoreWallet() and concatenate the outcomes. However, it would be good to verify that the message “Unable to restore backup of wallet” is not returned inside the test_failed_migration_cleanup test case. This will require modifying assert_raises_rpc_error to confirm that the specific message is not part of the error response.

pablomartin4btc commented at 4:30 pm on March 28, 2025:

@furszy, let me know if this is what you meant… https://github.com/bitcoin/bitcoin/commit/401121530c0f0331ad73938f30bf18f11d759bfc (you can comment there if you like before I open a PR)

rkrux commented at 10:10 am on March 31, 2025:

I’ll let furszy confirm but I believe conceptually this^ is what he meant. You can open a PR for others to review and provide suggestions. Also, you can keep the PR in draft mode if you prefer so, which lets the reviewers know the PR is not ready for review yet.

But it’s easier and better to give suggestions in a PR than it is on a standalone commit - you will also get CI feedback while you’re waiting for a response.

rkrux approved

rkrux commented at 3:24 am on December 20, 2024: contributor

ACK 589ed1a8eafe1daed379ebb383602c8f220aef19

on top of previous review

achow101 commented at 7:49 pm on December 30, 2024: member

ACK 589ed1a8eafe1daed379ebb383602c8f220aef19

pablomartin4btc approved

pablomartin4btc commented at 2:19 am on January 7, 2025: member

tACK 589ed1a8eafe1daed379ebb383602c8f220aef19

I’ve reproduced the issue described in #31447 (sqllite-only or BDB not compiled). This PR fixed the test issue and it temporarily adds (later will be removed with BDB) a validation when the legacy wallet is loaded (with BDB actually compiled; cmake -B build -DWITH_BDB=ON).

achow101 merged this on Jan 9, 2025

achow101 closed this on Jan 9, 2025

TheCharlatan referenced this in commit 230a439a4a on Jan 17, 2025

furszy deleted the branch on Feb 12, 2025

luke-jr referenced this in commit 6bd2a83270 on Feb 22, 2025

luke-jr referenced this in commit 3eead328db on Feb 24, 2025

stickies-v referenced this in commit d760fd3dda on Mar 17, 2025

stickies-v referenced this in commit cc83553352 on Mar 17, 2025

stickies-v referenced this in commit 2614933f06 on Mar 17, 2025

stickies-v referenced this in commit b70418c5fc on Mar 17, 2025

stickies-v referenced this in commit 69f8a1fe50 on Mar 17, 2025

pablomartin4btc commented at 3:58 pm on April 24, 2025: member

I think there’s an issue with this fix which is that when we call the restorewallet RPC, before this PR the wallet was loaded and the “load_on_startup” argument was set, this is no longer the case unless we pass load_after_restore=true. Having said that, if that gets fixed, since we are disabling loading legacy wallets (#31250) if someone calls restorewallet RPC with load_on_startup=true on a legacy wallet, next time the node or QT starts it would fail and the app will be closed. Another place with similar issue (that’s not happening due to this PR) is when we select “Restore Wallet…” from the File menu in QT for a legacy wallet, we set load_on_startup=true in the wallet interface.. False alarm, I just saw load_after_restore, added in this PR, was set to true by default.

wallet: migration, avoid loading legacy wallet after failure when BDB isn’t compiled #31451

Code Coverage & Benchmarks

Reviews

Conflicts