util: Close txindex LDB before exiting bitcoind #17852

paulyc commented at 1:52 AM on January 3, 2020: none

Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind, which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create. This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time bitcoind was started.

util: Close txindex LDB before exiting bitcoind

Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind,
which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create.
This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time
bitcoind was started.

ed9180fc28

fanquake added the label UTXO Db and Indexes on Jan 3, 2020

fanquake requested review from jimpo on Jan 3, 2020

in src/index/txindex.cpp:233 in ed9180fc28

 227 | @@ -228,7 +228,10 @@ TxIndex::TxIndex(size_t n_cache_size, bool f_memory, bool f_wipe)
 228 |      : m_db(MakeUnique<TxIndex::DB>(n_cache_size, f_memory, f_wipe))
 229 |  {}
 230 |  
 231 | -TxIndex::~TxIndex() {}
 232 | +TxIndex::~TxIndex()
 233 | +{
 234 | +    m_db.reset();

MarcoFalke commented at 11:18 AM on January 3, 2020:

Why is this needed? The pointer should be reset when the TxIndex is destructed

paulyc commented at 2:34 PM on January 3, 2020:

Good point, it should, (shouldn't it???) (clearly both C++ and myself are too old for me to be writing it), so it would seem that, at least in some corner case, ~TxIndex isn't getting called at all. So I will close this for now and reopen it if I figure out what's going on further investigation. Thanks

paulyc commented at 3:20 PM on January 3, 2020:

I wonder, but wouldn't really know, if the default destructor call isn't somehow ordered, or being reordered, such that it doesn't finish before the process exits, and the explicit call here maybe allows it to finish, but I'd still have to test it more and maybe figure out a way to write an integration test that verifies all the leveldbs are properly closed (or not). Will investigate further

paulyc commented at 3:22 PM on January 3, 2020:

Particularly since leveldb is rather poorly designed imo to put some very important closing the database code in the destructor, since, after all, the compiler (and programmer, and OS), don't necessarily know or care that any of those objects still in memory that are about to be wiped out by the OS when the process exits, actually do important things that need to happen before the process does exit! And thus, quite often, do not bother actually destructing any of those objects, since nobody cares, (but this is exactly why they should care), about memory that leaks when the process is about to exit.

paulyc commented at 3:37 PM on January 3, 2020:

It certainly seems like the patch works, but, I can neither explain why nor prove that it does definitively, so, not sure if that quite reaches the merge bar, in light of it being such a minor change to such a minor subsystem that isn't functionally broken, rather just a little bit inconvenient having to wait a while, while it reindexes the same blocks ;) but it's been bugging me (no pun intended) for years..

MarcoFalke commented at 4:18 PM on January 3, 2020:

What is the issue you are seeing? Any stacktraces or anything?

Reminds me of #13894, #14071, #15410

paulyc commented at 2:31 PM on January 4, 2020:

Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment.

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently.

So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one.

I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info

paulyc closed this on Jan 3, 2020

paulyc commented at 2:35 PM on January 4, 2020: none

Whoa i dunno what just happened. Was just adding a comment but i inadvertently hit "close conversation" not as if i know what that button does but everything disappeared. Good thing i always write everything in a text editor because I'm so used to Android killing my apps and taking my drafts with them i dont like any ux that just deletes what you were working on because you hit one button. People accidentally hit buttons all the time. Anyway back on subject:

Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment.

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently.

So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one.

I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info

MarcoFalke commented at 9:30 PM on January 6, 2020: member

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process

Txindex is a background thread, so it might fall back arbitrarily. Though it shouldn't lose progress from where it was in normal operation.

jimpo commented at 2:57 AM on January 7, 2020: contributor

@paulyc Oh, thank you, very nice catch. This is happening because the txindex (and the other indices) are stopped in Shutdown (init.cpp) before ForceFlushStateToDisk is called. ForceFlushStateToDisk triggers the ChainStateFlushed callback on the indices which commits the indexed state to disk. So probably the best solution is to move the index stops after that. Alternately, BaseIndex::Stop can just call Commit directly. Or do both, because I don't see a downside to either approach.

paulyc commented at 2:23 PM on January 7, 2020: none

Ah, @jimpo thanks for tracking that down as well. Must not have had enough coffee or something when I submitted this, but glad it helped find the issue, which clearly had to be a little more complex. ;) In the immortal words of Samwise Gamgee all's well as ends well, or at least it will be when a working PR is merged. Not sure if I'll have time to do it soon, but it's been there long enough, it can wait a little longer and I can test it properly, (why they do the tests...), unless someone else has time to fit it in somewhere but I wouldn't know

jimpo cross-referenced this on Jan 8, 2020 from issue init: Stop indexes on shutdown after ChainStateFlushed callback. by jimpo

fanquake referenced this in commit a51aa2880d on Jan 22, 2020

sidhujag referenced this in commit 8c909d56ea on Jan 24, 2020

sidhujag referenced this in commit 6271403c3d on Nov 10, 2020

bitcoin locked this on Feb 15, 2022