Yeah, I wonder if we can reliably test anything other than the happy path anyway. Generally, once UB happened, there is no reliable way to recover from it. While in practise here, most likely you’ll end up with negative values, I don’t think this is guaranteed by the C++ standard and probably not something to be able to rely on.
Sure, there is no guarantee that the check that the corruption test is testing will actually be hit for every corrupt index. I still think it would be good to have test coverage here but I am also happy to remove it (and implement the rest with a previous release) if other reviewers agree we only need the happy path.
Taking a step back, the size of the index is probably small and it is not too hard to re-create it. So an alternative to attempting to migrate it, could be to just have the index sit under a new name/folder for new releases from now. This should allow downgrades to previous releases to continue using the prior index on a best-effort-basis without user involvement. On upgrades, the new location is used without user involvement. A warning could be printed on every start to offer the user to delete the old index, if they don’t need it for use with previous releases?
Hmm, the size of the mainnet coinstats index on my machine is 160MB, but I’m not sure I like any amount of data that will be left over in the users datadir potentially forever. Worst case the user has also used the node for Testnet3 and Signet and it may be over 500MB total. I am not sure how many people notice such warnings and act on them, so I am kind of on the fence between these two solutions. Is there any precedent for something like this that could give some guidance @maflcko ? Nevertheless, it would indeed be a way simpler fix, I have sketched this out here: https://github.com/fjahr/bitcoin/tree/2024-07-csi-overflow-2-keep The fix commit is almost the same except for the folder name change. And then there is a simple compatibility test. The versioning code and migration are scrapped.
I am currently undecided between (potentially) leaving data on users disk that may not be used and making downgrading possible. I will have to think about this a little bit more I would be happy to hear more feedback what other reviewers think. Maybe it’s even worth bringing it up on IRC to get some more input.