wallet: guard against dangling to-be-reverted db transactions

furszy commented at 0:41 am on January 16, 2024: member

Discovered while was reviewing #29112, specifically #29112#pullrequestreview-1821862931.

If the db handler that initiated the database transaction is destroyed, the ongoing transaction cannot be left dangling when the db txn fails to abort. It must be forcefully reverted; otherwise, any subsequent db handler executing a write operation will dump the dangling, to-be-reverted transaction data to disk.

This not only breaks the isolation property but also results in the improper storage of incomplete information on disk, impacting the wallet consistency.

This PR fixes the issue by resetting the db connection, automatically rolling back the transaction (per https://www.sqlite.org/c3ref/close.html) when the handler object is being destroyed and the txn abortion failed.

Testing Notes Can verify the failure by reverting the fix e5217fea and running the test. It will fail without e5217fea and pass with it.

test: wallet db, exercise deadlock after write failure fdf9f66909

DrahtBot commented at 0:41 am on January 16, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29112 (sqlite: Disallow writing from multiple SQLiteBatchs by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Wallet on Jan 16, 2024

in src/wallet/sqlite.cpp:637 in 83c807a380 outdated

616@@ -612,7 +617,7 @@ std::unique_ptr<DatabaseCursor> SQLiteBatch::GetNewPrefixCursor(Span<const std::
617 
618 bool SQLiteBatch::TxnBegin()
619 {
620-    if (!m_database.m_db || sqlite3_get_autocommit(m_database.m_db) == 0) return false;
621+    if (!m_database.m_db || m_database.HasActiveTxn()) return false;

achow101 commented at 8:10 pm on January 16, 2024:

In 83c807a380b41787214806c6440236ea1a9dabd7 “sqlite: introduce HasActiveTxn method”

HasActiveTxn() checks for m_db already, so no need to do check that here too.

Same for TxnCommit() and TxnAbort().

furszy commented at 8:22 pm on January 16, 2024:

upps done

furszy force-pushed on Jan 16, 2024

achow101 added this to the milestone 27.0 on Jan 16, 2024

bitcoin deleted a comment on Jan 22, 2024

in src/wallet/sqlite.cpp:498 in ca39d812e4 outdated

492@@ -493,6 +493,12 @@ bool SQLiteBatch::ExecStatement(sqlite3_stmt* stmt, Span<const std::byte> blob)
493     return res == SQLITE_DONE;
494 }
495 
496+int SQLiteBatch::Exec(const std::string& statement)
497+{
498+    if (m_exec_handler && !m_exec_handler->CanExecute(statement)) return -999;

ryanofsky commented at 2:36 pm on January 23, 2024:

In commit “sqlite: add ability to interrupt statements” (ca39d812e4766575e25721296fa2d43dfe7eba7e)

It would be good to define -999 as a constant so callers could check for this error. Or if there probably won’t be a need for callers to do that, it would be clearer to return SQLITE_ERROR.

furszy commented at 1:10 pm on January 26, 2024:

Sure. Created a constant for the -999 error in the new approach.

ryanofsky commented at 3:09 pm on January 23, 2024: contributor

I’m trying to understand this, but confused about why executing “ROLLBACK TRANSACTION” would be expected to fail in sqlite. It seems like the fix in e5217fea1516120643f4a7a55a474648e21e2269 is handling failure to roll back by trying to close and reopen the database. But I don’t understand why rolling back a transaction would fail to begin with, unless there was some serious error like disk corruption. Probably to make the this fix clearer and safer it would be good to check the specific error code from the failed rollback instead of just checking res == SQLITE_OK. But I’m generally just confused.

furszy commented at 3:02 pm on January 25, 2024: member

I’m trying to understand this, but confused about why executing “ROLLBACK TRANSACTION” would be expected to fail in sqlite. It seems like the fix in e5217fe is handling failure to roll back by trying to close and reopen the database. But I don’t understand why rolling back a transaction would fail to begin with, unless there was some serious error like disk corruption. Probably to make the this fix clearer and safer it would be good to check the specific error code from the failed rollback instead of just checking res == SQLITE_OK. But I’m generally just confused.

The current sqlite configuration doesn’t allow concurrent database transactions (see #29112). This means that if there is a failure during the WalletBatch destruction (independently on if it is due to a fatal or transient and recoverable error, like a busy db or a loss of the db connection), the created transaction will remain active indefinitely, contrary to what all workflows expect. This situation provoques that any subsequent write operation on a new database handler (WalletBatch; it doesn’t necessarily need to begin a new transaction, could also be a direct write) will dump information that was intended to be rolled back to disk.

Check #29112#pullrequestreview-1821862931 to see the test that originated this work.

ryanofsky commented at 4:09 pm on January 25, 2024: contributor

if there is a failure during the WalletBatch destruction (independently on if it is due to a fatal or transient and recoverable error, like a busy db or a loss of the db connection)

Yes I’ve started reviewing both PR’s and the code changes seem reasonable, I just don’t understand what motivated this PR, because “failure during the WalletBatch destruction” seems like something I would never expect to happen unless someone modified the code or manually corrupted the data on disk. So I’m wondering did we accidentally cause a breakage like that during development of these PRs? Or did you just notice that the error handling in the destructor wasn’t very good, and decide to fix it? Or can failures to roll back actually happen in more cases than it seems, and this is a bug that could happen during normal operation without data corruption?

achow101 commented at 4:33 pm on January 25, 2024: member

seems like something I would never expect to happen unless someone modified the code or manually corrupted the data on disk. So I’m wondering did we accidentally cause a breakage like that during development of these PRs?

It shouldn’t ever happen.

In general, we don’t have very robust error handling for when the database fails to read or write something. I think this should actually be more like an assert, but one that doesn’t kill the whole software, maybe just unloads the wallet?

in src/wallet/sqlite.h:41 in ca39d812e4 outdated

35@@ -36,11 +36,20 @@ class SQLiteCursor : public DatabaseCursor
36     Status Next(DataStream& key, DataStream& value) override;
37 };
38 
39+class SQliteExecHandler

ryanofsky commented at 8:34 pm on January 25, 2024:

In commit “sqlite: add ability to interrupt statements” (ca39d812e4766575e25721296fa2d43dfe7eba7e)

Would suggest calling this SQliteExecBlocker, or maybe SQliteExecHook or SQliteTestHook if you anticipate it being used more broadly for other things in the future. “ExecHandler” seems like the wrong name because the class itself isn’t executing anything and doesn’t call sqlite3_exec.

Would also suggest adding a comment saying this is used for testing, because otherwise it’s not clear why CanExecute would ever return false.

Also, in case you don’t anticipate the class being extended used for other things in the future, you could replace it with a std::function, since it does only have one virtual method.

furszy commented at 1:17 pm on January 26, 2024:

Yeah, this was meant to be a class encapsulating all sqlite functions so that we can trigger different errors and improve and test our error-handling code. I’m not sure why I implemented CanExecute instead of decoupling the Exec(), but well.. I just changed the implementation to follow the initial idea. Thanks for pointing this out.

in src/wallet/sqlite.cpp:383 in 5670a21f5e outdated

376@@ -377,6 +377,11 @@ void SQLiteDatabase::Close()
377     m_db = nullptr;
378 }
379 
380+bool SQLiteDatabase::HasActiveTxn()
381+{
382+    return m_db && sqlite3_get_autocommit(m_db) == 0;

ryanofsky commented at 8:50 pm on January 25, 2024:

In commit “sqlite: introduce HasActiveTxn method” (5670a21f5ef28748b6b7242c03b82f39dcbc983f)

It would be nice to have a comment explaining the use of sqlite3_get_autocommit, since the name doesn’t really explain what it is doing. Maybe “// sqlite3_get_autocommit returns true by default, and false if a transaction has begun and not been committed or rolled back.”

Also, I noticed https://www.sqlite.org/c3ref/get_autocommit.html says “If another thread changes the autocommit status of the database connection while this routine is running, then the return value is undefined” so it seems like we probably need to be using a mutex for this. (That would be a pre-existing issue not related to this PR.)

furszy commented at 1:05 pm on January 26, 2024:

Sure, done. Comment added.

in src/wallet/sqlite.cpp:620 in 5670a21f5e outdated

616@@ -612,7 +617,7 @@ std::unique_ptr<DatabaseCursor> SQLiteBatch::GetNewPrefixCursor(Span<const std::
617 
618 bool SQLiteBatch::TxnBegin()
619 {
620-    if (!m_database.m_db || sqlite3_get_autocommit(m_database.m_db) == 0) return false;
621+    if (m_database.HasActiveTxn()) return false;

ryanofsky commented at 8:58 pm on January 25, 2024:

In commit “sqlite: introduce HasActiveTxn method” (5670a21f5ef28748b6b7242c03b82f39dcbc983f)

There appears to be a bug here. Previously it would return false if m_db is null. Now it will try to execute begin transaction. I think you need to keep the !m_database.m_db condition

furszy commented at 1:10 pm on January 26, 2024:

upps thanks. Fixed.

in src/wallet/sqlite.cpp:402 in 5670a21f5e outdated

399@@ -395,7 +400,7 @@ SQLiteBatch::SQLiteBatch(SQLiteDatabase& database)
400 void SQLiteBatch::Close()
401 {
402     // If m_db is in a transaction (i.e. not in autocommit mode), then abort the transaction in progress

ryanofsky commented at 9:00 pm on January 25, 2024:

In commit “sqlite: introduce HasActiveTxn method” (5670a21f5ef28748b6b7242c03b82f39dcbc983f)

Would be good to drop the parenthetical about autocommit mode here, since that’s now an implementation detail of HasActiveTxn()

furszy commented at 1:05 pm on January 26, 2024:

done

in src/wallet/sqlite.cpp:408 in 508fda2fdd outdated

404     if (m_database.HasActiveTxn()) {
405         if (TxnAbort()) {
406             LogPrintf("SQLiteBatch: Batch closed unexpectedly without the transaction being explicitly committed or aborted\n");
407         } else {
408-            LogPrintf("SQLiteBatch: Batch closed and failed to abort transaction\n");
409+            // In case we cannot abort the transaction, prevent other handler/s from dumping the ongoing

ryanofsky commented at 9:14 pm on January 25, 2024:

In commit “sqlite: guard against dangling to-be-reverted db transactions” (508fda2fddfef0cefedde9777c914aec72adf7fb)

I found this comment confusing because I couldn’t figure out when failing to abort a transaction would ever be expected to happen, and what the error handling strategy was supposed to be when it did happen. Would maybe suggest: “// If transaction cannot be aborted, it means there is a bug or there has been data corruption. Try to recover in this case by closing and reopening the database. Closing the database should also ensure that any changes made since the transaction was opened will be rolled back.”

furszy commented at 1:08 pm on January 26, 2024:

Sure, done. Have slightly rephrased the suggestion. Thanks

in src/wallet/test/db_tests.cpp:254 in d4ebac8381 outdated

249+    std::unique_ptr<SQLiteDatabase> database = MakeSQLiteDatabase(m_path_root / "sqlite", options, status, error);
250+
251+    std::string key = "key";
252+    std::string value = "value";
253+
254+    std::unique_ptr<SQLiteBatch> handler = std::make_unique<SQLiteBatch>(*database);

ryanofsky commented at 9:16 pm on January 25, 2024:

In commit “test: sqlite, add coverage for dangling to-be-reverted db txns” (d4ebac8381801d024b69d5f1f5a13246c61edbe4)

I think it would be a little more consistent with other code to call this variable “batch” instead of “handler.” Also calling it handler gets a little confusing when code below sets a handler on the handler.

furszy commented at 1:21 pm on January 26, 2024:

So true that it hurts. I tend to call the WalletBatch handler because ‘batch’ refers to a class that groups operations and dumps the result all at once atomically at the end of the process. This is something that the WalletBatch class doesn’t do unless we explicitly begin and finish the transaction.

ryanofsky commented at 6:36 pm on January 29, 2024:

re: #29253 (review)

I tend to call the WalletBatch handler because ‘batch’ refers to a class that groups operations and dumps the result all at once atomically at the end of the process.

:grinning: yeah very used to seeing bland names that don’t mean anything because someone rejected an an actually descriptive name for a pedantic reason. In this case, I think a batch is just a collection of actions done together. Metaphors are ok once in a while.

furszy commented at 1:28 pm on January 30, 2024:

Done as suggested.

ryanofsky commented at 9:30 pm on January 25, 2024: contributor

Code review d4ebac8381801d024b69d5f1f5a13246c61edbe4. This seems like a nice change that could make the wallet more reliable, and adds good cleanup and test coverage.

Did not ack just because I noticed a bug in SQLiteBatch::TxnBegin if m_db is null, but none of my other comments are too important, so feel free to ignore them.

re: #29253 (comment)

In general, we don’t have very robust error handling for when the database fails to read or write something. I think this should actually be more like an assert, but one that doesn’t kill the whole software, maybe just unloads the wallet?

Yeah that sounds pretty close to ideal. Maybe not an urgent problem to solve, but could be nice.

furszy commented at 10:47 pm on January 25, 2024: member

I just don’t understand what motivated this PR, because “failure during the WalletBatch destruction” seems like something I would never expect to happen unless someone modified the code or manually corrupted the data on disk. So I’m wondering did we accidentally cause a breakage like that during development of these PRs? Or did you just notice that the error handling in the destructor wasn’t very good, and decide to fix it? Or can failures to roll back actually happen in more cases than it seems, and this is a bug that could happen during normal operation without data corruption?

Yeah, I just noted that the error handling in the destructor isn’t optimal. That if this ever occurs in #29112, the introduced database mutex would lock indefinitely. Then, I considered the same event in current master: which could lead to the writing of a dangling transaction to disk, which would corrupt the wallet and leave it at a non-recoverable point. Which is quite bad. And that motivated me to split the PR and force the txn rollback within the WalletBatch context as a last resource safety measure.

In general, we don’t have very robust error handling for when the database fails to read or write something. I think this should actually be more like an assert, but one that doesn’t kill the whole software, maybe just unloads the wallet?

It wouldn’t be ideal if the error is still recoverable (e.g. SQLITE_BUSY if we ever introduce concurrent txns). At least in this scenario, where we are destroying the object that created the transaction, I believe triggering the database connection reset is acceptable as a first measure. If it fails, then yeah, I agree that unloading the wallet instead of exiting the node would be desirable.

furszy force-pushed on Jan 26, 2024

furszy commented at 1:32 pm on January 26, 2024: member

Updated per feedback, thanks @ryanofsky!

The main modification resides in the ExecHandler class, which is now designed to encapsulate all sqlite functions. This enables us to trigger different errors across the database sources; letting us improve the error-handling code and expand its test coverage. Furthermore, future encapsulation of sqlite functions within this class will contribute to minimizing the impact of any future API changes.

in src/wallet/sqlite.h:39 in 53c4f1334e outdated

35@@ -36,11 +36,26 @@ class SQLiteCursor : public DatabaseCursor
36     Status Next(DataStream& key, DataStream& value) override;
37 };
38 
39+class ExecHandler

ryanofsky commented at 5:36 pm on January 29, 2024:

In commit “sqlite: add ability to interrupt statements” (53c4f1334ee300e2228974b4abca51c38087c51c)

Would suggest a rename and comment.

This class is SQLite specific, but unlike other other classes in this file it does not have SQLite in the name. Would suggest calling it SQliteExecInterface since it is an abstract class with all virtual methods.

Also, it’s not initially clear why this class and the class below exist. Would suggest adding a doxygen comment like //* Abstract class responsible for executing SQL statements in SQLite databases. It is abstract so it can be overridden by unit tests testing unusual database conditions. */

EDIT: Alternately you could combine ExecHandler and SQliteExecHandler into a single class. This would also simplify the unit test because it could just inherit from SQliteExecHandler instead of inheriting from ExecHandler and containing SQliteExecHandler.

furszy commented at 1:25 pm on January 30, 2024:

Great, thanks 👌🏼. Unified ExecHandler and SQliteExecHandler and added your doxygen comment.

in src/wallet/sqlite.cpp:418 in 0bf3a7356d outdated

414         } else {
415-            LogPrintf("SQLiteBatch: Batch closed and failed to abort transaction\n");
416+            // If the transaction cannot be aborted, it may be due to the database connection being "busy", indicating a potential bug or data corruption.
417+            // In such cases, attempt recovery by closing and reopening the database. This action will automatically roll back the transaction,
418+            // ensuring that any changes made since the transaction was opened will be reverted. This prevents dangling transaction data
419+            // outside the context of the 'SQLiteBatch' that initiated such a transaction.

ryanofsky commented at 6:28 pm on January 29, 2024:

In commit “sqlite: guard against dangling to-be-reverted db transactions” (0bf3a7356db224d28a86194b004ca30549197f27)

I think this comment might be more harmful than helpful in its current form, because it isn’t communicating the most important thing, which is that this code path is never expected to be hit under normal conditions, and if it is hit, it means something is seriously wrong and there has probably been data loss, if not data corruption. By saying this failure can happen when the connection is “busy”, it makes it sound like database could just be waiting for a lock. But locking alone could never trigger this, so suggesting that it could raises doubts about the rest of the code, and makes the handling here seem to be unnecessary.

Also saying “this action will automatically rollback the transaction ensuring…” seems like an overstatement. Even if it’s probably true, we are in the middle of recovering from some other significant bug or I/O error, so this comment seems to be going out on a limb to make a guarantee that isn’t clearly significant. It is also unclear to me what “transaction data outside the context of the ‘SQLiteBatch’” would refer to if SQLiteBatch is the class responsible for creating transactions.

Considering all this, I’d probably suggest still going with my original comment from #29253 (review) which was “// If transaction cannot be aborted, it means there is a bug or there has been data corruption. Try to recover in this case by closing and reopening the database. Closing the database should also ensure that any changes made since the transaction was opened will be rolled back.”

But maybe this comment is missing some other information that you wanted to get across?

furszy commented at 1:21 pm on January 30, 2024:

It is also unclear to me what “transaction data outside the context of the ‘SQLiteBatch’” would refer to if SQLiteBatch is the class responsible for creating transactions.

The purpose of this sentence was to describe what could happen if we don’t rollback the transaction and explain why we rollback the transaction inside the SQLiteBatch destructor.

The complete sentence, “This prevents dangling transaction data outside the context of the ‘SQLiteBatch’ that initiated such a transaction,” refers to the SQLiteBatch instance, not the class. Each SQLiteBatch can create a single transaction at a time; no concurrent SQLiteBatch instances are allowed (#29112). Thus, the code rolls back the db txn within the SQLiteBatch object destructor. This links the db txn to the SQLiteBatch instance lifecycle, isolating data changes and preventing other SQLiteBatch instances from dumping data that does not correspond to them.

But maybe this comment is missing some other information that you wanted to get across?

I wanted to emphasize the importance of rolling back the transaction by explaining the consequences of leaving the transaction dangling. Also, wanted to mention that the error could be transient (so we don’t forget about this possibility) and describe why we want to keep the transaction data changes isolated within the SQLiteBatch instance that initiated the transaction.

If, after reading this response, you still have doubts and find the comment confusing, I can replace it with your comment. No problem; the idea was to provide a clearer description of the situation.

ryanofsky commented at 7:44 pm on January 30, 2024:

re: #29253 (review)

Thanks for clarifying.

I wanted to emphasize the importance of rolling back the transaction by explaining the consequences of leaving the transaction dangling.

That seems ok. Would maybe suggest adding to the previously suggested comment:

// If transaction cannot be aborted, it means there is a bug or there has been data corruption. Try to recover in this case by closing and reopening the database. Closing the database should also ensure that any changes made since the transaction was opened will be rolled back and future transactions can succeed without committing old data.

Also, wanted to mention that the error could be transient (so we don’t forget about this possibility)

This is exactly what I think the comment should not be saying because the error cannot be transient. And it’s confusing to suggest that it could be transient, because it either makes the rest of the code seem broken, or it makes this code seem unnecessary.

If you want talk about a theoretical future state, in some future version of sqlite or future version of the wallet codebase where a rollback statement could return SQLITE_BUSY, I guess you could do that but it seems speculative and confusing if you don’t distinguish that scenario with what is actually happening in the current code.

and describe why we want to keep the transaction data changes isolated within the SQLiteBatch instance that initiated the transaction.

I think that’s mostly clear from the context but if you want to say more about it, the best place to put it would be on line 410 before the TxnAbort() call is made, to explain why the TxnAbort() call is being made and why it’s important. Trying to get into why the rollback is important after the rolllback fails seems like going down a tangent and not putting information where it is most helpful.

furszy commented at 8:52 pm on January 30, 2024:

All good, applied your suggestion. Thanks!

This is exactly what I think the comment should not be saying because the error cannot be transient. And it’s confusing to suggest that it could be transient, because it either makes the rest of the code seem broken, or it makes this code seem unnecessary.

I was also thinking about an I/O hardware malfunctioning error, which “could” be transient; sqlite has return codes for it. But it’s probably better to abort the program if something like this ever happens.

ryanofsky approved

ryanofsky commented at 7:32 pm on January 29, 2024: contributor

Code review ACK 493cfc728aa7835527b1eab179b8cb36edd60946. I left more more comments about comments (feel free to ignore), but otherwise this looks good.

re: #29253 (comment)

In general, we don’t have very robust error handling for when the database fails to read or write something. I think this should actually be more like an assert, but one that doesn’t kill the whole software, maybe just unloads the wallet?

It wouldn’t be ideal if the error is still recoverable (e.g. SQLITE_BUSY if we ever introduce concurrent txns).

In this case do we ever think we should see SQLITE_BUSY while trying to abandon a transaction? But yes in general we should treat expected errors and unexpected errors differently, and not handle expected errors by unloading the wallet.

furszy force-pushed on Jan 30, 2024

in src/wallet/sqlite.h:40 in 177ff954c1 outdated

35@@ -36,11 +36,21 @@ class SQLiteCursor : public DatabaseCursor
36     Status Next(DataStream& key, DataStream& value) override;
37 };
38 
39+/** Abstract class responsible for executing SQL statements in SQLite databases.
40+ *  It is abstract so it can be overridden by unit tests testing unusual database conditions. */

ryanofsky commented at 7:29 pm on January 30, 2024:

In commit “sqlite: add ability to interrupt statements” (177ff954c1221a00d4bf5ea63e6bfc415ba706c8)

It’s technically not an abstract class anymore since it doesn’t contain any = 0 pure virtual methods. Would maybe suggest “Class responsible for executing SQL statements in SQLite databases. Methods are virtual so they can be overridden by unit tests testing unusual database conditions.”

furszy commented at 8:33 pm on January 30, 2024:

yeah, fixed. I pasted your doc suggestion first, then made the classes consolidation..

in src/wallet/sqlite.cpp:615 in 177ff954c1 outdated

611@@ -607,7 +612,7 @@ std::unique_ptr<DatabaseCursor> SQLiteBatch::GetNewPrefixCursor(Span<const std::
612 bool SQLiteBatch::TxnBegin()
613 {
614     if (!m_database.m_db || sqlite3_get_autocommit(m_database.m_db) == 0) return false;
615-    int res = sqlite3_exec(m_database.m_db, "BEGIN TRANSACTION", nullptr, nullptr, nullptr);
616+    int res = m_exec_handler && m_exec_handler->Exec(m_database, "BEGIN TRANSACTION");

ryanofsky commented at 7:34 pm on January 30, 2024:

In commit “sqlite: add ability to interrupt statements” (177ff954c1221a00d4bf5ea63e6bfc415ba706c8)

This doesn’t seem right because the && operator will convert the integer return value into a bool. Also if m_exec_handler is null the res value will be 0 which is SQLITE_OK so the call will appear to succed.

It probably would make more sense to do int res = Assert(m_exec_handler)->Exec(...)

furszy commented at 8:32 pm on January 30, 2024:

yes, fixed. Thanks. God knows why I did that.. it seems that I wasn’t focused.

ryanofsky approved

ryanofsky commented at 7:47 pm on January 30, 2024: contributor

Code review ACK 756bed0529b46944162c02c783ecc804f802b637. New update consolidating the exec classes looks good.

sqlite: add ability to interrupt statements

By encapsulating sqlite3_exec into its own standalone method
and introducing the 'SQliteExecHandler' class, we enable the
ability to test db statements execution failures within the
unit test framework.

This is used in the following-up commit to exercise a deadlock
and improve our wallet db error handling code.

Moreover, the future encapsulation of other sqlite functions
within this class will contribute to minimize the impact of
any future API changes.

dca874e838

sqlite: introduce HasActiveTxn method

Util function to clean up code and let us
verify, in the following-up commit, that dangling,
to-be-reverted db transactions cannot occur anymore.

472d2ca981

sqlite: guard against dangling to-be-reverted db transactions

If the handler that initiated the database transaction is destroyed,
the ongoing transaction cannot be left dangling when the db txn fails
to abort. It must be forcefully reversed; otherwise, any subsequent
db handler executing a write operation will dump the dangling,
to-be-reverted transaction data to disk.

This not only breaks the database isolation property but also results
in the improper storage of incomplete information on disk, impacting
the wallet consistency.

fc0e747192

test: sqlite, add coverage for dangling to-be-reverted db txns b298242c8d

furszy force-pushed on Jan 30, 2024

furszy commented at 8:58 pm on January 30, 2024: member

Updated per feedback, thanks ryanofsky! Docstrings were improved, and a bug related to a null SQLiteExecHandler has been fixed. Small diff.

DrahtBot added the label CI failed on Jan 30, 2024

in src/wallet/sqlite.h:74 in dca874e838 outdated

70@@ -61,6 +71,8 @@ class SQLiteBatch : public DatabaseBatch
71     explicit SQLiteBatch(SQLiteDatabase& database);
72     ~SQLiteBatch() override { Close(); }
73 
74+    void SetExecHandler(std::unique_ptr<SQliteExecHandler>&& handler) { m_exec_handler = std::move(handler); }

ryanofsky commented at 4:11 pm on January 31, 2024:

In commit “sqlite: add ability to interrupt statements” (dca874e838c2599bd24433675b291168f8e7b055)

It’s a little better to pass plain unique_ptr instead of unique_ptr&&, because former guarantees unique_ptr argument will be moved-from and null afterward, while latter means the move may or may not happen depending on internals of the function. In general you want explicit && references for template programming and perfect forwarding, but should use values and & references otherwise.. Can see https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-conventional for guidelines.

furszy commented at 9:44 pm on January 31, 2024:

ok noted, thanks!

ryanofsky approved

ryanofsky commented at 5:02 pm on January 31, 2024: contributor

Code review ACK b298242c8d495c36072415e1b95eaa7bf485a38a. Just fix for exec result codes and comment update since last review.

Overall this change should help wallet recover and not cause more problems if a rollback fails, and it adds some nice test coverage.

achow101 commented at 6:55 pm on January 31, 2024: member

It wouldn’t be ideal if the error is still recoverable (e.g. SQLITE_BUSY if we ever introduce concurrent txns). At least in this scenario, where we are destroying the object that created the transaction, I believe triggering the database connection reset is acceptable as a first measure.

If we do have recoverable errors, then we should be handling them, not hitting them with the hammer of closing and reopening the database.

This PR feels like it’s the wrong approach to dealing with this particular database failure. It shouldn’t be possible for a transaction commit or abort to currently fail in way that isn’t catastrophic. In that case, we should be shutting the wallet rather than reopening the database connection and pretending that nothing happened. This could potentially lead to a state where the wallet’s in-memory state does not match the database, which is really bad and could lead to loss of funds.

ryanofsky commented at 7:24 pm on January 31, 2024: contributor

This PR feels like it’s the wrong approach to dealing with this particular database failure. It shouldn’t be possible for a transaction commit or abort to currently fail in way that isn’t catastrophic. In that case, we should be shutting the wallet rather than reopening the database connection and pretending that nothing happened. This could potentially lead to a state where the wallet’s in-memory state does not match the database, which is really bad and could lead to loss of funds.

It seems like if there is failure to rollback a transaction one way, with a rollback statement, this PR is just retrying again a different way, by closing and reopening the database. In theory the code could always roll back transactions by closing and reopening the database, so the new behavior in this PR doesn’t seem worse than current behavior.

One way the PR seems better than the current code is that the current code just logs “SQLiteBatch: Batch closed and failed to abort transaction” if the rollback fails, which is not even an obvious error message, and it continues like nothing happened. The new code at least raises an exception if the rollback fails.

All of this seems very theoretical to me since I have no idea why a rollback statement would ever fail. But I think this PR seems reasonable, adds some test coverage, and probably is a small improvement even if better approaches are also possible.

achow101 commented at 8:02 pm on January 31, 2024: member

ACK b298242c8d495c36072415e1b95eaa7bf485a38a

achow101 merged this on Jan 31, 2024

achow101 closed this on Jan 31, 2024

furszy deleted the branch on Feb 1, 2024

luke-jr referenced this in commit 967571ddc8 on Mar 14, 2024

luke-jr referenced this in commit a1f955eac6 on Mar 14, 2024

luke-jr referenced this in commit 9a238c50a3 on Mar 14, 2024

luke-jr referenced this in commit cb49b56f95 on Mar 14, 2024

luke-jr referenced this in commit 09d1192801 on Mar 14, 2024

luke-jr referenced this in commit bb3bb898c3 on Mar 14, 2024

bitcoin locked this on Jan 31, 2025

wallet: guard against dangling to-be-reverted db transactions #29253

Code Coverage

Reviews

Conflicts