wallet: Cleanup and separate BerkeleyDatabase and BerkeleyBatch #19335

pull achow101 wants to merge 5 commits into bitcoin:master from achow101:bdb-cleanup-refactors changing 3 files +74 −90
  1. achow101 commented at 2:29 am on June 20, 2020: member

    BerkeleyBatch and BerkeleyDatabase are kind of messy. The goal of this is to clean up them up so that they are logically separated.

    BerkeleyBatch currently handles the creation of the BerkeleyDatabase’s Db handle. This is instead moved into BerkeleyDatabase and is called by BerkeleyBatch.

    Instead of having BerkeleyEnvironment track each database’s usage, have BerkeleyDatabase track this usage itself with the m_refcount variable that is present in WalletDatabase.

    Lastly, instead of having each BerkeleyEnvironment store the fileids of the databases open in it, have a global g_fileids to track those fileids. We were already checking fileid uniqueness globally (by checking the fileids in every environment when opening a database) so it’s cleaner to do this with a global variable.

    All of these changes allow us to make BerkeleyBatch and BerkeleyDatabase no longer be friend classes.

    The diff of this PR is currently the same as in ##18971

    Requires #19334

  2. fanquake added the label Wallet on Jun 20, 2020
  3. DrahtBot commented at 3:33 am on June 20, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19102 (wallet: Introduce and use DummyDatabase instead of dummy BerkeleyDatabase by achow101)
    • #18904 (Don’t call lsn_reset in periodic flush by bvbfan)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. achow101 force-pushed on Jun 22, 2020
  5. DrahtBot added the label Needs rebase on Jul 1, 2020
  6. achow101 force-pushed on Jul 1, 2020
  7. achow101 force-pushed on Jul 1, 2020
  8. DrahtBot removed the label Needs rebase on Jul 1, 2020
  9. DrahtBot added the label Needs rebase on Jul 5, 2020
  10. achow101 force-pushed on Jul 6, 2020
  11. DrahtBot removed the label Needs rebase on Jul 6, 2020
  12. DrahtBot added the label Needs rebase on Jul 8, 2020
  13. achow101 force-pushed on Jul 9, 2020
  14. DrahtBot removed the label Needs rebase on Jul 9, 2020
  15. DrahtBot added the label Needs rebase on Jul 11, 2020
  16. achow101 force-pushed on Jul 11, 2020
  17. DrahtBot removed the label Needs rebase on Jul 11, 2020
  18. achow101 force-pushed on Jul 14, 2020
  19. ryanofsky approved
  20. ryanofsky commented at 8:21 pm on July 15, 2020: member
    Code review ACK d416ae560e46a4846a3fd5990b7d390d2ef30ec8. Only changes since last review were rebasing after base PR #19334 merge, and adding cs_db lock in BerkeleyDatabase destructor, which should avoid races accessing env->m_databases and env->m_fileids
  21. in src/wallet/bdb.cpp:238 in 61da2fa6a1 outdated
    243@@ -245,9 +244,6 @@ BerkeleyEnvironment::BerkeleyEnvironment()
    244 
    245 bool BerkeleyEnvironment::Verify(const std::string& strFile)
    246 {
    247-    LOCK(cs_db);
    248-    assert(mapFileUseCount.count(strFile) == 0);
    


    ryanofsky commented at 11:08 pm on July 15, 2020:

    In commit “walletdb: track database file use as m_refcount within BerkeleyDatabase” (61da2fa6a159b0d318978cb8713f83705bb97035)

    Notes for reviewers: Assert here is moving to BerkeleyDatabase::Verify below. The reason the assert is currently true is because of the IsWalletLoaded check and cs_wallets lock in wallets.cpp. The lock is just being dropped and not moving, but should not be needed because m_refcount is std::atomic

    Suggestion: BerkeleyEnvironment::Verify is only called one place in BerkeleyDatabase::Verify. Would suggest just removing and inlining the whole method instead of only moving the assert


    achow101 commented at 6:30 pm on July 16, 2020:
    I’ve added a commit to merge BerkeleyEnvironment::Verify and BerkeleyDatabase::Verify
  22. in src/wallet/bdb.cpp:570 in 61da2fa6a1 outdated
    587@@ -591,10 +588,10 @@ void BerkeleyEnvironment::Flush(bool fShutdown)
    588         return;
    589     {
    590         LOCK(cs_db);
    591-        std::map<std::string, int>::iterator mi = mapFileUseCount.begin();
    592-        while (mi != mapFileUseCount.end()) {
    593-            std::string strFile = (*mi).first;
    594-            int nRefCount = (*mi).second;
    595+        bool no_dbs_accessed = true;
    596+        for (auto& db_it : m_databases) {
    


    ryanofsky commented at 11:30 pm on July 15, 2020:

    In commit “walletdb: track database file use as m_refcount within BerkeleyDatabase” (61da2fa6a159b0d318978cb8713f83705bb97035)

    It looks like there’s a minor change behavior here could lead to extra logging and checkpoint and lsn reset calls. Previously this only iterated over unflushed databases, now it iterates over all databases even if they were already flushed.

    Would be good to either mention behavior change in commit message or possibly add line like if (db_it.second.get().m_db)) continue; to restore previous behavior.

    Change happens because previously 0 use count meant database was not in use but not yet flushed, and no use count meant database was actually flushed. (See previous note #18971 (review))


    achow101 commented at 6:32 pm on July 16, 2020:
    I’ve changed m_refcount to track when the database has been flushed by setting it to -1. This is being done everywhere mapFileUseCount.erase() was happening. This allows us to restore the original behavior by checking for m_refcount < 0.

    ryanofsky commented at 8:02 pm on July 16, 2020:

    re: #19335 (review)

    In commit “walletdb: track database file use as m_refcount within BerkeleyDatabase” (516084a2738364f3922c08d8703fd737541bb2da)

    I’ve changed m_refcount to track when the database has been flushed by setting it to -1. This is being done everywhere mapFileUseCount.erase() was happening. This allows us to restore the original behavior by checking for m_refcount < 0.

    Note: I think there is still small change in behavior because m_refcount is initialized to 0 not -1, so never used databases are now treated like used and not yet flushed databases. I don’t think the difference actually matters, though.

  23. in src/wallet/bdb.cpp:638 in 61da2fa6a1 outdated
    637     }
    638 
    639     // Don't flush if there haven't been any batch writes for this database.
    640-    auto it = env->mapFileUseCount.find(strFile);
    641-    if (it == env->mapFileUseCount.end()) return false;
    642+    if (m_refcount > 0) return false;
    


    ryanofsky commented at 11:44 pm on July 15, 2020:

    In commit “walletdb: track database file use as m_refcount within BerkeleyDatabase” (61da2fa6a159b0d318978cb8713f83705bb97035)

    This looks like a bug that causes flushes to no longer happen when needed. Previous code checked if there were any batch operations since the last flush. New code is checking if a batch operation is currently in progress. This new check also appears to be always false due to the line immediately above.

    I think you might be able to replace this line with if (!m_db) return false to restore previous behavior


    achow101 commented at 6:33 pm on July 16, 2020:
    Changed to use m_refcount < 0 as mentioned in a previous comment.
  24. in src/wallet/bdb.cpp:322 in 33554edc24 outdated
    333@@ -334,13 +334,27 @@ BerkeleyDatabase::~BerkeleyDatabase()
    334 
    335 BerkeleyBatch::BerkeleyBatch(BerkeleyDatabase& database, const char* pszMode, bool fFlushOnCloseIn) : pdb(nullptr), activeTxn(nullptr), m_cursor(nullptr), m_database(database)
    336 {
    337+    database.AddRef();
    


    ryanofsky commented at 0:21 am on July 16, 2020:

    In commit “walletdb: Move Db->open to BerkeleyDatabase::Open” (33554edc24a51685cfc3ca0053372c253d1fb4b9)

    This is annoying, but I think there is another bug here. It’s necessary to move the AddRef call before the Open call to avoid a race condition: #18971 (review). But now there is another problem because if Open throws, then the reference count will never be decremented in the batch destructor because Batch::pdb will be null.

    Easiest way to fix this is probably to move the the AddRef() call here back after the Open() call but lock cs_db while calling both.

    An alternate fix would move m_database.RemoveRef() from the batch close method to the batch destructor and make it unconditional the same way the database.AddRef() call is unconditional in the constructor right now.

    A third alternative would make reference counting internal to the database object and just give it symmetric Open and CloseIfUnused methods like I suggested twice previously #19334 (review) and #18971 (review), but this is a bigger change.


    achow101 commented at 6:33 pm on July 16, 2020:
    I’ve moved m_database.RemoveRef() to the destructor.
  25. in src/wallet/bdb.cpp:386 in 8a98a83232 outdated
    427-            // different backup copies of a wallet at the same time. Maybe even
    428-            // more ideally, an exclusive lock for accessing the database could
    429-            // be implemented, so no equality checks are needed at all. (Newer
    430-            // versions of BDB have an set_lk_exclusive method for this
    431-            // purpose, but the older version we use does not.)
    432-            for (const auto& env : g_dbenvs) {
    


    ryanofsky commented at 0:59 am on July 16, 2020:

    In commit “walletdb: Use a global g_fileids instead of m_fileids for each env” (8a98a83232fab6ff653c9475bb6e01249eadfa5c)

    I think now that we have .walletlock files, the “in the future a more relaxed check” TODO above has been implemented, and neither having the for g_dbenvs loop nor having a global fileid list make sense anymore. Like the comment and #18971 (review) say, the global fileid checking was a kludge to work around lack of locking. Since #11904, locking is implemented and the kludge should no longer necessary. I think this commit could be simplified to just drop the for loop and long comment here and call CheckUniqueFileid(*env, strFilename, *pdb_temp, env->m_fileids[strFile]);` here to delete the kludge instead of reimplementing and moving it.


    achow101 commented at 6:34 pm on July 16, 2020:
    Replaced that commit to do as you suggest.
  26. ryanofsky commented at 1:12 am on July 16, 2020: member

    Conditional code review ACK e73d0a1f9e28f303024d6ddec2b4e7e980783c9c if some problems below are fixed. Just reviewed last 4 commits here

    • 61da2fa6a159b0d318978cb8713f83705bb97035 walletdb: track database file use as m_refcount within BerkeleyDatabase (5/8)
    • 33554edc24a51685cfc3ca0053372c253d1fb4b9 walletdb: Move Db->open to BerkeleyDatabase::Open (6/8)
    • 8a98a83232fab6ff653c9475bb6e01249eadfa5c walletdb: Use a global g_fileids instead of m_fileids for each env (7/8)
    • e73d0a1f9e28f303024d6ddec2b4e7e980783c9c walletdb: Remove BerkeleyBatch friend class from BerkeleyDatabase (8/8)
  27. achow101 force-pushed on Jul 16, 2020
  28. ryanofsky approved
  29. ryanofsky commented at 8:03 pm on July 16, 2020: member

    Code review ACK ccfd611ed1bb45874a2558390de5597dd32d2772 last 5 commits

    • 747ca18d5b5f6dd3780de0f543b8a06d52c74169 Combine BerkeleyEnvironment::Verify into BerkeleyDatabase::Verify (5/9)
    • 516084a2738364f3922c08d8703fd737541bb2da walletdb: track database file use as m_refcount within BerkeleyDatabase (6/9)
    • 93b5f0442bada706aba561dfb5bf3da78101522b walletdb: Move Db->open to BerkeleyDatabase::Open (7/9)
    • 0a01ba79a4f7076593b6f65cb3e227455a487219 No need to check for duplicate fileids in all dbenvs (8/9)
    • ccfd611ed1bb45874a2558390de5597dd32d2772 walletdb: Remove BerkeleyBatch friend class from BerkeleyDatabase (9/9)

    Only changes since previous review are followups from #19335#pullrequestreview-449402796. This is a nice cleanup and everything here is an improvement. I do think Batch/Database refcounting interaction is too complicated though and could be made simpler later (https://github.com/bitcoin/bitcoin/pull/19335#discussion_r455438675).

  30. meshcollider added this to the "PRs" column in a project

  31. DrahtBot added the label Needs rebase on Jul 22, 2020
  32. achow101 force-pushed on Jul 22, 2020
  33. DrahtBot removed the label Needs rebase on Jul 22, 2020
  34. Combine BerkeleyEnvironment::Verify into BerkeleyDatabase::Verify 65fb8807ac
  35. walletdb: track database file use as m_refcount within BerkeleyDatabase
    Instead of having BerkeleyEnvironment track the file use count, make
    BerkeleyDatabase do it itself.
    4fe4b3bf1b
  36. walletdb: Move Db->open to BerkeleyDatabase::Open
    Instead of opening the Db handle in BerkeleyBatch, make BerkeleyDatabase
    do that.
    d86efab370
  37. No need to check for duplicate fileids in all dbenvs
    Since we have .walletlock in each directory, we don't need the duplicate
    fileid checks across all dbenvs as it shouldn't be possible anyways.
    00f0041351
  38. walletdb: Remove BerkeleyBatch friend class from BerkeleyDatabase 74507ce71e
  39. achow101 force-pushed on Jul 23, 2020
  40. laanwj added this to the "Blockers" column in a project

  41. ryanofsky approved
  42. ryanofsky commented at 7:36 pm on July 24, 2020: member
    Code review ACK 74507ce71eb61105fb3ae8460999099234ca7b8b. No changes since last review other than rebase
  43. in src/wallet/bdb.cpp:288 in 4fe4b3bf1b outdated
    283@@ -285,8 +284,7 @@ bool BerkeleyDatabase::Verify(bilingual_str& errorStr)
    284 
    285     if (fs::exists(file_path))
    286     {
    287-        LOCK(cs_db);
    


    promag commented at 0:11 am on July 25, 2020:
    Why was this removed?

    achow101 commented at 3:31 pm on July 25, 2020:
    It was there to guard mapFileUseCount which we’ve now removed.
  44. laanwj commented at 4:02 pm on July 29, 2020: member
    Code review ACK 74507ce71eb61105fb3ae8460999099234ca7b8b
  45. laanwj merged this on Jul 29, 2020
  46. laanwj closed this on Jul 29, 2020

  47. laanwj removed this from the "Blockers" column in a project

  48. sidhujag referenced this in commit fbe47717c0 on Jul 31, 2020
  49. meshcollider moved this from the "PRs" to the "Done" column in a project

  50. Fabcien referenced this in commit 07db5362b0 on Sep 6, 2021
  51. Fabcien referenced this in commit 45210e0f0d on Sep 6, 2021
  52. deadalnix referenced this in commit 2a0157d45a on Sep 6, 2021
  53. deadalnix referenced this in commit 0b1e3bf1a7 on Sep 6, 2021
  54. deadalnix referenced this in commit 5cbb600772 on Sep 6, 2021
  55. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me