wallet: fix deadlock in bdb read write operation #27556

pull furszy wants to merge 3 commits into bitcoin:master from furszy:2023_wallet_db_deadlock changing 6 files +56 −18
  1. furszy commented at 6:09 pm on May 2, 2023: member

    Basically, with bdb, we can’t make a write operation while we are traversing the db with the same db handler. These two operations are performed in different txn contexts and cause a deadlock.

    Added coverage by using EraseRecords() which is the simplest function that executes this process.

    To replicate it, need bdb support and drop the first commit. The test will run forever.

  2. DrahtBot commented at 6:09 pm on May 2, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hebasto, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27286 (wallet: Keep track of the wallet’s own transaction outputs in memory by achow101)
    • #24914 (wallet: Load database records in a particular order by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Wallet on May 2, 2023
  4. theStack commented at 3:18 pm on May 3, 2023: contributor

    Didn’t look closer at the changes yet, but can confirm that running the newly introduced unit test via

    0$ ./src/test/test_bitcoin --log_level=all --run_test=walletdb_tests/walletdb_read_write_deadlock
    

    succeeds on the PR branch head and hangs if the bugfix commit bda562ddba13ee8f94147734ab9901ab07357f2f is reverted.

  5. achow101 commented at 3:34 pm on May 3, 2023: member
    ACK d3419f4108669d37a4ccecab77750e70c2f6237c
  6. achow101 commented at 3:35 pm on May 3, 2023: member
    Backport shouldn’t be required as I don’t think there’s any user action that can trigger the deadlock.
  7. in src/wallet/bdb.cpp:716 in d3419f4108 outdated
    712@@ -713,7 +713,7 @@ BerkeleyCursor::~BerkeleyCursor()
    713 std::unique_ptr<DatabaseCursor> BerkeleyBatch::GetNewCursor()
    714 {
    715     if (!pdb) return nullptr;
    716-    return std::make_unique<BerkeleyCursor>(m_database);
    717+    return std::make_unique<BerkeleyCursor>(m_database, this);
    


    achow101 commented at 3:43 pm on May 3, 2023:
    We should just make the batch argument required to avoid this kind of issue in the future.

    furszy commented at 3:54 pm on May 3, 2023:
    sure
  8. furszy force-pushed on May 3, 2023
  9. furszy commented at 4:09 pm on May 3, 2023: member

    Updated per feedback. Thanks achow101.

    Only diff is BerkeleyCursor requiring a WalletBatch ref instead of a pointer.

  10. achow101 commented at 5:55 pm on May 3, 2023: member
    ACK de202f860b3f79774d5235dec59a140fa5c82fa7
  11. hebasto commented at 4:15 pm on May 11, 2023: member
    Concept ACK.
  12. in src/wallet/bdb.cpp:677 in de202f860b outdated
    673 {
    674     if (!database.m_db.get()) {
    675         throw std::runtime_error(STR_INTERNAL_BUG("BerkeleyDatabase does not exist"));
    676     }
    677     // Transaction argument to cursor is only needed when using the cursor to
    678     // write to the database. Read-only cursors do not need a txn pointer.
    


    hebasto commented at 4:30 pm on May 11, 2023:
    Does this comment remain true?

    furszy commented at 9:02 pm on May 11, 2023:
    It is true while you use the db handler only for reading operations at that point. In that case, you don’t need the transaction argument and batch.txn() returns a nullptr. If you mix an already open read-only cursor with some write operations, then the code is no longer performing a read-only operation, so it will need to provide the transaction arg (probably by re-arranging the code in the same way as I did with EraseRecords).
  13. in src/wallet/walletdb.cpp:1158 in de202f860b outdated
    1155         DataStream value{};
    1156         DatabaseCursor::Status status = cursor->Next(key, value);
    1157         if (status == DatabaseCursor::Status::DONE) {
    1158             break;
    1159         } else if (status == DatabaseCursor::Status::FAIL) {
    1160+            cursor.reset(nullptr);
    


    hebasto commented at 4:53 pm on May 11, 2023:
    A note for myself: Seems unfortunate to reset a std::unique_ptr instance explicitly instead of using its RAII capabilities.

    furszy commented at 9:05 pm on May 11, 2023:

    Yeah, I explained the reason inside the commit description ca1871d48558b356257d2a95b046f63739dfe3a0:

    extra note from the Db.cursor doc: “If transaction protection is enabled, cursors must be opened and closed within the context of a transaction”

    thus why added a CloseCursor call before calling to TxnAbort/TxnCommit.

    The CloseCursor call is the ptr reset. Just extra safety measures.

  14. hebasto approved
  15. hebasto commented at 5:27 pm on May 11, 2023: member
    Approach ACK de202f860b3f79774d5235dec59a140fa5c82fa7.
  16. furszy commented at 12:50 pm on May 12, 2023: member

    Does this PR makes

    https://github.com/bitcoin/bitcoin/blob/137a98c5a22e058ed7a7997a0a4dbd75301de51e/test/sanitizer_suppressions/tsan#L28

    outdated? @hebasto hmm, interesting. It seems to be outdated in master. Just ran the test suite there, without the suppression, and it passed.

  17. hebasto approved
  18. hebasto commented at 1:56 pm on May 12, 2023: member

    ACK de202f860b3f79774d5235dec59a140fa5c82fa7, tested on Ubutnu 23.04.

    To replicate it, need bdb support and drop the first commit. The test will run forever.

    The test is deadlocked with the second commit dropped as well.

  19. fanquake requested review from achow101 on May 12, 2023
  20. DrahtBot added the label Needs rebase on May 15, 2023
  21. wallet: bugfix, GetNewCursor() misses to provide batch ptr to BerkeleyCursor
    If the batch ptr is not passed, the cursor will not use the db active
    txn context which could lead to a deadlock if the code tries to modify
    the db while it is traversing it.
    
    E.g. the 'EraseRecords()' function.
    043fcb0b05
  22. walletdb: scope bdb::EraseRecords under a single db txn
    so we erase all the records atomically or abort the entire
    procedure.
    
    and, at the same time, we can share the same db txn context
    for the db cursor and the erase functionality.
    
    extra note from the Db.cursor doc:
    "If transaction protection is enabled, cursors must be
    opened and closed within the context of a transaction"
    
    thus why added a `CloseCursor` call before calling to
    `TxnAbort/TxnCommit`.
    12daf6fcdc
  23. test: add coverage for wallet read write db deadlock 69d43905b7
  24. furszy force-pushed on May 15, 2023
  25. DrahtBot removed the label Needs rebase on May 15, 2023
  26. furszy commented at 4:37 pm on May 15, 2023: member
    rebased, conflicts solved.
  27. DrahtBot added the label CI failed on May 15, 2023
  28. hebasto approved
  29. hebasto commented at 2:07 pm on May 16, 2023: member
    re-ACK 69d43905b7f1d4849dfaea1b5744ee473ccc8744
  30. DrahtBot removed the label CI failed on May 16, 2023
  31. achow101 commented at 3:01 pm on May 18, 2023: member
    ACK 69d43905b7f1d4849dfaea1b5744ee473ccc8744
  32. DrahtBot removed review request from achow101 on May 18, 2023
  33. achow101 merged this on May 18, 2023
  34. achow101 closed this on May 18, 2023

  35. sidhujag referenced this in commit 33242acf61 on May 19, 2023
  36. furszy deleted the branch on Aug 9, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me