issue
MarcoFalke
openend this issue on
February 3, 2016
MarcoFalke
commented at 8:30 pm on February 3, 2016:
member
This is somewhat rare but still worth to take a look sometime. I can reproduce locally and travis complains in wallet.py:
0check -salvagewallet
12start_node: bitcoind started, calling bitcoin-cli -rpcwait getblockcount
34No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
56The build has been terminated
Edit:
This will now show up as
0check -salvagewallet
1Unexpected exception caught during testing: Exception('bitcoind exited with status 1 during initialization',)
laanwj added the label
Wallet
on Feb 4, 2016
laanwj
commented at 1:12 pm on February 15, 2016:
member
laanwj referenced this in commit
85695802e2
on Feb 15, 2016
laanwj referenced this in commit
ca8fb59ae1
on Feb 15, 2016
laanwj
commented at 2:02 pm on March 15, 2016:
member
I’ve noticed that this mostly happens on the 32 bit travis build (.3)
laanwj
commented at 10:14 am on March 25, 2016:
member
I don’t think we’re even sure whether -salvagewallet terminates. In this case every time the last line seem to call some bitcoin-cli command with -rpcwait. In case the client would crash, or exit, it would just wait forever.
I so wish we had full access to the VM image that runs the tests in these cases.
MarcoFalke
commented at 11:24 am on March 25, 2016:
member
In case the client would crash, or exit, it would just wait forever.
The client does not crash nor exit, iIrc. salvagewallet returns some sort of error and the client tries to shutdown. Somehow this fails and the “write-peers.dat-thread” just continues to run. Haven’t had the time to look into that yet…
laanwj
commented at 11:26 am on March 25, 2016:
member
So you managed to reproduce this locally?
dohh you say so in the OP
MarcoFalke
commented at 11:29 am on March 25, 2016:
member
You can just run
0$ for i in {0..10000}; do qa/pull-tester/rpc-tests.py wallet --nocleanup;if[$? -ne 0];then echo; echo "fail"; sleep 1d;fi ;done
and come back 20 minutes later to dig through the dumps.
laanwj
commented at 12:00 pm on March 25, 2016:
member
I did a similar thing, I put the part of wallet.py that does the salvage in an infinite loop.
Managed to make it hang after ~100 iterations.
Looking at the debug.log it didn’t hang during the salvage, but manages to go into normal operation. And the node didn’t start shutting down at all. I could even succesfully send it a stop command and it stopped running. This won’t set the python script in motion again though. Curious.
At least it looks like a problem in the script framework, not in the salvage process itself, that does put my concerns to rest a bit.
Oh, I didn’t notice it spawns three nodes. The third one indeed has:
02016-03-25 10:41:54 Renamed wallet.dat to wallet.1458902514.bak
12016-03-25 10:41:54 CDBEnv::Salvage: Database salvage found errors, all data may not be recoverable.
22016-03-25 10:41:54 Salvage(aggressive) found 295 records
laanwj
commented at 12:32 pm on March 25, 2016:
member
Ok, I was right in my gist above. I can reproduce this 100% of the time now with a specific wallet:
After the salvage goes wrong (for whatever reason) the node will just exit.
The test framework somehow doesn’t detect this sudden exit. I think it will try to send a stop command (with -rpcwait) which never arrives, because the node is not running anymore in the first place. It will try forever to send it.
That, of course, leaves the question why the wallet ends up in a non-salvageable state. But at least the test should not hang in this case, but fail with a clearer error.
laanwj referenced this in commit
81b0587ba6
on Mar 25, 2016
laanwj referenced this in commit
f22f8ed8c0
on Mar 25, 2016
laanwj referenced this in commit
8869c3bbbb
on Mar 25, 2016
laanwj
commented at 2:46 pm on March 25, 2016:
member
That, of course, leaves the question why the wallet ends up in a non-salvageable state.
The underlying problem is weird. It seems that randomly, after exit, the wallet database fails verification with DB_SALVAGE|DB_AGRESSIVE but otherwise works fine. E.g.
0int result = db.verify(strFile.c_str(), NULL, &strDump, flags);
returns DB_VERIFY_BAD.
However, the wallet works as normally. Non -salvagewallet doesn’t see any problems at all.
Unlike the error message would have you believe, the salvage successfully recovers the entire wallet.
laanwj referenced this in commit
018b60c5ea
on Mar 26, 2016
MarcoFalke renamed this:
salvagewallet sometimes never terminates
salvagewallet fails verification
on Apr 16, 2016
laanwj
commented at 9:33 am on May 12, 2016:
member
This test will be disabled in #8038 until this is be fixed.
laanwj closed this
on May 12, 2016
laanwj reopened this
on May 12, 2016
makevoid referenced this in commit
ae90831ab7
on Jun 13, 2016
ftrader referenced this in commit
5ba022fab1
on Dec 9, 2016
schinzelh referenced this in commit
61d56ee22e
on Mar 14, 2017
UdjinM6 referenced this in commit
d6f3ae95db
on May 30, 2017
jnewbery
commented at 11:11 pm on June 5, 2017:
member
I’ve been digging into this failure over the last couple of days. This is what I’ve found so far:
This is fairly easy to reproduce. Just uncomment the -salvagewallet test and set BASE_SCRIPTS in test_runner.py to be a list [["wallet.py"] * 50]. In a run of 50, I usually get one failure.
the problem is caused by running Berkeley DB’s Db::verify() with the DB_AGGRESSIVE flag. The documentation for that is (emphasis mine):
Output all the key/data pairs in the file that can be found. By default, Db::verify() does not assume corruption. For example, if a key/data pair on a page is marked as deleted, it is not then written to the output file. When DB_AGGRESSIVE is specified, corruption is assumed, and any key/data pair that can be found is written. In this case, key/data pairs that are corrupted or have been deleted may appear in the output (even if the file being salvaged is in no way corrupt), and the output will almost certainly require editing before being loaded into a database.
you can repro the corruption manually by running the db_dump utility function, which is bundled with BDB. -r runs salvage and -R runs salvage with aggressive.
This is the wallet.dat file that I was using in my tests (renamed wallet.dat.txt as github has rules about file types): wallet.dat.txt. If I run db_dump on this .dat file with -R, the output ends:
if I run db_dump with -r, then I don’t see the DB_VERIFY_BAD message.
piping db_dump through to db_load rebuilds a database. If I use -R on db_dump, then this fails, because the database dump is corrupt. If I use -r, on db_dump, then it succeeds, and most (all?) of the key-value pairs appear to be intact.
Digging a bit more into the corruption, if I look at the output of db_dump -R, it’s a superset of the output of db_dump -r. There are some extra rows that look like normal keys/values, but there are also a bunch of these:
554e4b4e4f574e5f4b4559 is ascii for UNKNOWN_KEY. You can see where that string is output in btree/bt_verify.c in the BDB source.
Critically, in the first group, we have (with my commentary):
0554e4b4e4f574e5f4b4559 // UNKNOWN_KEY
1 020000000183365de8669a47426115d8224516599ce66f9e8dd75738e30b1c1812cc65210d010000006b483045022100fbec000102000000010000000000000000000000000000000000000000000000000000000000000000ffffffff0502d8000101ffffffff0200f902950000000023210230734f482295f0361adc9fdd518a17ce3914e30c5944e266c9c2d62a81bfabc0ac0000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c690689799962b48bebd836974e8cf900000000717b8547d59a8248a8f337f18b7ff9740c60ad7661a034534a3dc8b5ca85cb75000000000000030b66726f6d6163636f756e7400016e0232320974696d65736d6172740a313439363637373630380000000000e87c35590000732300010274783e056c621538aad4f361910ae102f93a9ca6b669 // some value
2554e4b4e4f574e5f4b4559 // UNKNOWN_KEY
3 02000000010817df69d135686315416c5c2ab83d740882b5c733a2e24762dc4a1b52640e2000000000494830450221008732d20ce6ec67b39bbd6065149d66cd15fa10b1666512d2f72c315dadb4a5a802202de2365122b8f014994bdd8a98955f7ade117a1aadebd2b7769d1325f53c15e801ffffffff0200000000000000001976a914954ff91555b588a81796e3d1ae6313339ce28e4788acc0e4022a010000001976a914c58fb7d7c9407cf2eab0a7a52677869334c2c46488ac00000000ccf2bff45a3b0101020000000106e45a28096dd10e52d073e6a93ed5ae884aa1dc69fb4106df2656396da5d75f000000006b483045022100ede821e3abae05f5b94454bfe6e3cdf78525206355929b3bdc1fd11b5dc8d59102 // some value
4554e4b4e4f574e5f4b4559 // UNKNOWN_KEY
5 036b65792103c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef // bitcoin pub key key
6 d63081d302010104202941b6af510f195ac980a8d5218d06b870d18ba057a2e8b3f63a9eb9be6d9cf7a08185308182020101302c06072a8648ce3d0101022100fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f300604010004010704210279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798022100fffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141020101a12403220003c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef60d62c885546d4410667ec2ce0cf0562b90fa067e6a8cd3dc20f127d659300d1 // bitcoin pubkey value
7 036b65792103c9c89b90f2ae69722c1ec7cea861a0c925bce1dfd996467aa7eb3fe887450655
8 d63081d302010104204b1b5c9cc37717ac4ed9f4376fa93bea36e8f065d4ea0f9991eec9905e916ecba08185308182020101302c06072a8648ce3d0101022100fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f300604010004010704210279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798022100fffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141020101a12403220003c9c89b90f2ae69722c1ec7cea861a0c925bce1dfd996467aa7eb3fe887450655b92751b48a09b2bff3ebf0ca0d40af28c2d3aa229711abd6967a5fa6c43d61c7
the output format for BDB should be key in one row followed by value in the next row. You can see here that the final 554e4b4e4f574e5f4b4559 key is followed by 036b65792103c58cdbfc2b4cedd96e56d2714aa980a1478feb0bc5f9f7b6036e075eb974faef, which is actually another key row (for a bitcoin public key). That means that the output of db_dump is out of alignment - there’s actually an odd number of rows, so after this line, parsing keys/values gets mixed up.
I added some logging into ReadKeyValue() in bitcoind and see exactly the same thing when running with -salvagewallet. The key/values get misaligned when re-reading after calling Db::verify():
02017-06-05 19:57:03 Renamed wallet.dat to wallet.1496692623.bak
12017-06-05 19:57:03 CDBEnv::Salvage: Database salvage found errors, all data may not be recoverable.
22017-06-05 19:57:03 CDBEnv::Salvage: WARNING: Number of keys in data does not match number of values.
32017-06-05 19:57:03 CDBEnv::Salvage: WARNING: Unexpected end of file while reading salvage output.
42017-06-05 19:57:03 Salvage(aggressive) found 102 records
52017-06-05 19:57:03 ReadKeyValue: Reading record type tx
62017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: aa2f899c1b2fcf578522467a4d36d46b43d2ba318d9d3c17ddf0679630e67d02
72017-06-05 19:57:03 ReadKeyValue: Reading record type tx
82017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 5fd7a56d395626df0641fb69dca14a88aed53ea9e673d0520ed16d09285ae406
92017-06-05 19:57:03 ReadKeyValue: Reading record type tx
102017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 2f66a28ccfbbf6fccad3d168cbef6ffd36ed14d2f7303ca797708aebf3deb510
112017-06-05 19:57:03 ReadKeyValue: Reading record type tx
122017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: aada45522a31c27cab548975be7f5282471f484258451916bcad018b694dc311
132017-06-05 19:57:03 ReadKeyValue: Reading record type tx
142017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 580ce8c795c5e3c5eaa946da7c054e6cd6336739c730c4fb87c9fa735632de15
152017-06-05 19:57:03 ReadKeyValue: Reading record type tx
162017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: ddae1c851b2275a9bb12d53bfe8a1f0ff9a7cdb149d8734042fe439e72cdd420
172017-06-05 19:57:03 ReadKeyValue: Reading record type tx
182017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: f3450597d463df2edd39cd7f0f0aa6e07801e143c2ecb3a96f33ba62bc1aa429
192017-06-05 19:57:03 ReadKeyValue: Reading record type tx
202017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: 31f5262195fd40bce453e44d69b6a69c3af902e10a9161f3d4aa3815626c053e
212017-06-05 19:57:03 ReadKeyValue: Reading record type tx
222017-06-05 19:57:03 ReadKeyValue: Reading tx record. hash: b6aa0e873ad8854e15855f9caf2a932edef07bfb5a261172351959defd83144d
232017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D )A<B6><AF>Q^O^YZɀ<A8><D5>!<8D>^F<B8>pы<A0>W<A2><E8><B3><F6>:<9E><B9><BE>m<9C>����0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^CŌ<DB><FC>+L<ED><D9>nV<D2>qJ<A9><80><A1>G<8F><EB>^K<C5><F9><F7><B6>^Cn^G^<B9>t<FA><EF>
242017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D )A<B6><AF>Q^O^YZɀ<A8><D5>!<8D>^F<B8>pы<A0>W<A2><E8><B3><F6>:<9E><B9><BE>m<9C>����0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^CŌ<DB><FC>+L<ED>
25<D9>nV<D2>qJ<A9><80><A1>G<8F><EB>^K<C5><F9><F7><B6>^Cn^G^<B9>t<FA><EF> record (no-op)
262017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D KESC\<9C><C3>w^W<AC>N<D9><F4>7o<A9>;<EA>6<E8><F0>e<D4><EA>^O<99><91><EE>ɐ^<91>nˠ<81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
27<FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<C9>ț<90><F2><AE>ir,^^<C7>Ψa<A0><C9>%<BC><E1><DF>ٖFz<A7><EB>?<E8><87>E^FU
282017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D KESC\<9C><C3>w^W<AC>N<D9><F4>7o<A9>;<EA>6<E8><F0>e<D4><EA>^O<99><91><EE>ɐ^<91>nˠ<81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
29<FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<C9>ț<90><F2><AE>ir,^^<C7>Ψa<A0><C9>%<BC><E1><DF>ٖFz<A7><EB>?<E8><87>E^FU record (no-op)
302017-06-05 19:57:03 ReadKeyValue: Reading record type 0<81><D3>^B^A^A^D u<9C><E8><AB><F4>4D7<F2><D4>i<A9>uN<F7>3<C1>^HW<8C>0<B9><F2>ݑ<B4>^YN<FE><E0>@W<A0><81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<DE>u^XN"m<81><B8><9E>p<97><C4>5a<CA><C6>^V<A9>*<B6>p<99><AA><A5><B5><DC><D3>>N`(^Z
312017-06-05 19:57:03 ReadKeyValue: Reading 0<81><D3>^B^A^A^D u<9C><E8><AB><F4>4D7<F2><D4>i<A9>uN<F7>3<C1>^HW<8C>0<B9><F2>ݑ<B4>^YN<FE><E0>@W<A0><81><85>0<81><82>^B^A^A0,^F^G*<86>H<CE>=^A^A^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><FF><FF><FC>/0^F^D^A^@^D^A^G^D!^By<BE>f~<F9>ܻ<AC>U<A0>b<95>·^K^G^B<9B><FC><DB>-<CE>(<D9>Y<F2><81>[^V<F8>^W<98>^B!^@<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FE><BA><AE><DC><E6><AF>H<A0>;<BF><D2>^<8C><D0>6AA^B^A^A<A1>$^C"^@^C<DE>u^XN"m<81><B8><9E>p<97><C4>5a<CA><C6>^V<A9>*<B6>p<99><AA><A5><B5><DC><D3>>N`(^Z record (no-op)
322017-06-05 19:57:03 ReadKeyValue: Reading record type test
332017-06-05 19:57:03 ReadKeyValue: Reading test record (no-op)
342017-06-05 19:57:03 ReadKeyValue: Reading record type
352017-06-05 19:57:03 ReadKeyValue: Reading record (no-op)
362017-06-05 19:57:03 ReadKeyValue: Reading record type рыба
372017-06-05 19:57:03 ReadKeyValue: Reading рыба record (no-op)
382017-06-05 19:57:03 ReadKeyValue: Reading record type
392017-06-05 19:57:03 ReadKeyValue: Reading record (no-op)
402017-06-05 19:57:03 ReadKeyValue: Reading record type
412017-06-05 19:57:03 ReadKeyValue: Reading record (no-op)
422017-06-05 19:57:03 ReadKeyValue: Reading record type 𝅘𝅥𝅯
432017-06-05 19:57:03 ReadKeyValue: Reading 𝅘𝅥𝅯 record (no-op)
442017-06-05 19:57:03 ReadKeyValue: Reading record type ^@^@
452017-06-05 19:57:03 ReadKeyValue: Reading ^@^@ record (no-op)
(you can see at the end that it’s reading keys as values and vice versa).
After running with -salvagewallet, the wallet.dat file is empty. All key-value pairs have gone. The wallet backup file wallet.<timestamp>.bak contains the db from before the salvage operation so still has all the key-value pairs.
Next steps:
-salvagewallet should be changed to not set the aggressive flag. Note the documentation for that flag: the output will almost certainly require editing before being loaded into a database. Currently the salvagewallet operation just directly reads that output, so it’s not entirely surprising that it occasionally fails.
we may want an option to run salvage wallet with the aggressive flag set, but that should only be once we have the bitcoin_wallet_tool from #8745.
once #8745 is merged, we should add targeted testing of salvagewallet (rather then the incidental testing we have in wallet.py). Gavin had some testing here: #1895 / https://gist.github.com/gavinandresen/3812689 which we might use as a start.
TBD whether we want to merge my extra logging on BDB reads. We could also add extra logging on writes.
we should try to figure out what BD writes are causing the DB to get into a state where Db::verify() with the DB_AGGRESSIVE flag fails. Additional logging will help there. I can also add an option to the integration tests which verifies the wallet.dat files of every node at the end of every test.
jnewbery
commented at 3:10 pm on June 6, 2017:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-12-18 15:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me