Handle corrupt wallets gracefully. #1895

pull gavinandresen wants to merge 3 commits into bitcoin:master from gavinandresen:wallet_exceptions changing 8 files +531 −219
  1. gavinandresen commented at 7:25 pm on October 1, 2012: contributor

    Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a crash. This commit does three things:

    1. Runs a BDB verify early in the startup process, and if there is a low-level problem with the database:
    • Moves the bad wallet.dat to wallet.timestamp.bak
    • Runs a ‘salvage’ operation to get key/value pairs, and writes them to a new wallet.dat
    • Continues with startup.
    1. Much more tolerant of serialization errors. All errors in deserialization are tolerated EXCEPT for errors related to reading keypairs or master key records– those are reported and then shut down, so the user can get help (or recover from a backup).

    2. Adds a new -salvagewallet option, which:

    • Moves the wallet.dat to wallet.timestamp.bak
    • extracts ONLY keypairs and master keys into a new wallet.dat
    • soft-sets -rescan, to recreate transaction history

    This was tested by randomly corrupting testnet wallets using a little python script I wrote (https://gist.github.com/3812689)

  2. in src/walletdb.cpp: in 959b9ad926 outdated
    608+                if (IsKeyType(strType))
    609+                    result = DB_CORRUPT;
    610+                else if (strType == "tx")
    611+                    // Rescan if there is a bad transaction record:
    612+                    SoftSetBoolArg("-rescan", true);
    613+                // Leave other errors alone, if we try to fix them we might make things worse.
    


    laanwj commented at 6:34 am on October 2, 2012:

    The result stays DB_LOAD_OK here, so in case of corruptions in non-key/tx records, it silently continues also with the upgrading below at line 458. Is this desired behavior?

    Or should we set some flag, log a message and show a popup to the user (uiInterface.ThreadSafeMessageBox) at the end of the function that recovery has taken place and that some wallet data (such as address book entries, details can be found in debug.log…) might be corrupt?


    gavinandresen commented at 1:45 pm on October 2, 2012:

    An earlier version of this code extended DBErrors to have different levels of error, but the code started getting complicated and confusing (e.g. you could have a wallet that had a key error, a non-key error, AND needed upgrading… maybe DBErrors should be a bitmask, etc).

    But telling the user that there is something wrong is definitely a good idea, I’ll make that so.


    Diapolo commented at 5:01 pm on October 2, 2012:
    I like the idea to have bitmasks to handle error codes btw.
  3. in src/init.cpp: in 648e148a7e outdated
    488@@ -480,7 +489,34 @@ bool AppInit2()
    489 
    490     int64 nStart;
    491 
    492-    // ********************************************************* Step 5: network initialization
    493+    // ********************************************************* Step 5: verify database integrity
    494+
    


    Diapolo commented at 5:07 pm on October 3, 2012:
    If this process takes a few seconds (or more), it’s IMO worth adding a uiInterface.InitMessage(_("Verifying database integrity...")); message here.
  4. Diapolo commented at 5:15 pm on October 3, 2012: none

    I know you will for sure dislike the following comment, but I’ll try for the last time (you won’t get any further comments on strings in your pulls, if you want) as the brave knight for unified string usage ^^. Can you change your Warning messages to the following format:

    “Warning: First sentence! Second sentence.”

    • start with Warning:
    • First sentence (if a sentence) finished with a !
    • Further sentences finished with a .
  5. gavinandresen commented at 2:36 pm on October 4, 2012: contributor

    @Diapolo : good idea on the Verifying message. And ok, I’ll change the first period to an exclamation mark.

    I’m finding serious bugs doing more testing; writing here so I don’t lose track of them:

    1. Getting a crash on my main wallet, bdb complaining about out of memory (out of mutexes).

    2. Getting this weirdness switching from newer bitcoind to older: 10/04/12 14:16:00 nFileVersion = 70003 10/04/12 14:16:00 Performing wallet upgrade to 60000

    3. Crash-at-shutdown due to the printf-in-global-destructor bug

  6. gavinandresen commented at 7:45 pm on October 4, 2012: contributor

    Updated to not “pre-verify” blkindex.dat which fixes the ‘out of mutexes’ problem (looks like bdb does not clean up after a ->verify() ?), pick up some changes from @jgarzik version of DBEnv::RemoveDB (kept RemoveDB as the name, though, since it removes a database not a dbenv), and tweaked Warning! messages.

    I’ll investigate the downgrade weirdness separately, I’m afraid that might be another bug introduced in 0.7.0.

  7. in src/db.cpp: in 86d793ba8d outdated
    180+    }
    181+    catch (DbException& e) {
    182+        printf("ERROR: db salvage failed\n");
    183+        return false;
    184+    }
    185+
    


    sipa commented at 9:08 pm on October 4, 2012:

    My god… is that was is necessary to recover from BDB? Manually parse the hex dump created by a library?

    I want to get rid of BDB yesterday.


    laanwj commented at 5:51 am on October 5, 2012:

    Recovery is never a nice process, if things are broken enough you always get to a level where you have to look at hexdumps of the raw file to salvage anything. At least you still get delimited keys/values here.

    Is it any prettier for leveldb, for example?


    sipa commented at 8:45 am on October 5, 2012:
    There are no leveldb tools; I think ‘recovery’ is the same as ‘opening’, and ‘crashing’ is the same as ‘closing’ in LevelDB. There are a few flags to set the degree of checksum verification or paranoidness when opening, though.
  8. sipa commented at 1:16 pm on October 7, 2012: member
    @gavinandresen Do you consider this pull ready now?
  9. gavinandresen commented at 4:27 pm on October 7, 2012: contributor

    Yes, this is pull-ready now.

    I’d like some help with more thorough testing.

  10. gmaxwell commented at 5:23 pm on October 7, 2012: contributor

    So, this can just cause your balance to go to zero with no notice if you’re not watching the logs/console output carefully. Perhaps get getinfo error field should get something?

    Here is what I tested:

    Using gavin’s testnet-in-a-box wallet. zzuf -I ‘wallet.dat’ -s 0:1000 ./bitcoind -daemon=0

    Seed 0 fails with Db::open: Invalid argument. In log I see Salvage(aggressive) found 2372 records. Restarting without fuzzing gives me a successful start but zero balance.

    Recover original wallet, then run starting with seed 1: zzuf -I ‘wallet.dat’ -s 1:1000 ./bitcoind -daemon=0

    Fails at seed 1 with “DbEnv::open: DB_RUNRECOVERY: Fatal error, run database recovery” No salvage run.

    Starting without fuzzing gives the correct balance.

    Starting again at seed 2: zzuf -I ‘wallet.dat’ -s 2:1000 ./bitcoind -daemon=0 Throws “Bitcoin: Warning: wallet.dat corrupt, data salvaged! Original wallet.dat saved as wallet.{timestamp}.bak in /home/gmaxwell/.bitcoin/testnet3; if your balance or transactions are incorrect you should restore from a backup.” at the console. (first time I’ve seen that)

    Log shows: Renamed wallet.dat to wallet.1349630310.bak Salvage(aggressive) found 2372 records

    And a bunch of nice addwallets.

    But calling getinfo triggers segfault.

    – still, this pull is a massive improvement over default. Now that we’ve got a case where there could be backup wallet files laying around perhaps we should go all the way and keep a couple wallet rotation even when there isn’t corruption?

    Perhaps the fuzzing is a little too nasty to be a realistic test. Though if we ever change to our own append only format, I absolutely expect it to survive this kind of test.

  11. luke-jr commented at 5:26 pm on October 7, 2012: member
    How does this handle encrypted wallets?
  12. Handle incompatible BDB environments
    Before, opening a -datadir that was created with a new
    version of Berkeley DB would result in an un-caught DB_RUNRECOVERY
    exception.
    
    After these changes, the error is caught and the user is told
    that there is a problem and is told how to try to recover from
    it.
    8d5f461cb6
  13. gavinandresen commented at 9:28 pm on October 8, 2012: contributor
    @luke-jr it handles encrypted wallets as well as might be expected. It works on the bdb level, salvaging as many key/value pairs as it can from the backed-up wallet.dat. If it encounters a database-level error reading keys (private keys, encrypted or not, or master keys) it tells the user to try to recover from a backup.
  14. Handle corrupt wallets gracefully.
    Corrupt wallets used to cause a DB_RUNRECOVERY uncaught exception and a
    crash. This commit does three things:
    
    1) Runs a BDB verify early in the startup process, and if there is a
    low-level problem with the database:
      + Moves the bad wallet.dat to wallet.timestamp.bak
      + Runs a 'salvage' operation to get key/value pairs, and
        writes them to a new wallet.dat
      + Continues with startup.
    
    2) Much more tolerant of serialization errors. All errors in deserialization
    are reported by tolerated EXCEPT for errors related to reading keypairs
    or master key records-- those are reported and then shut down, so the user
    can get help (or recover from a backup).
    
    3) Adds a new -salvagewallet option, which:
     + Moves the wallet.dat to wallet.timestamp.bak
     + extracts ONLY keypairs and master keys into a new wallet.dat
     + soft-sets -rescan, to recreate transaction history
    
    This was tested by randomly corrupting testnet wallets using a little
    python script I wrote (https://gist.github.com/3812689)
    eed1785f70
  15. Don't try to verify a non-existent wallet.dat d0b3e77a08
  16. gavinandresen commented at 11:59 pm on October 8, 2012: contributor
    Rebased on top of #1917; changed error handling from bdb methods from exceptions to returned error codes.
  17. gavinandresen merged this on Oct 9, 2012
  18. gavinandresen closed this on Oct 9, 2012

  19. KolbyML referenced this in commit a418e9f1a3 on Dec 5, 2020
  20. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-06 04:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me