index: Fix coinstats overflow #30469

pull fjahr wants to merge 2 commits into bitcoin:master from fjahr:2024-07-csi-overflow-2 changing 7 files +202 −125
  1. fjahr commented at 10:46 am on July 17, 2024: contributor

    Closes #26362

    This continues the work that was started with #26426. It fixes the overflow issue by switching most of the values tracked in the index from historical totals to values per block. The only effect of this is that it requires iterating over all the entries to get to these historical values, however nothing changes for the capabilities of the RPCs we have today because the switched values were always report per block and the totals were used in concepts like #19792.

    The current approach opts for a simple solution: The new version of the index goes into a separate location in the datadir (index/coinstatsindex/ rather than index/coinstats/ before, the new naming is more consistent with the naming of the other indexes). There is no explicit concept of versioning of the index which earlier versions of this PR had. Having the two different versions of the index in separate places allows for downgrading of the node without having to rebuild the index. However, there will be a warning printed in the logs if the new code (v30) detects the old index still being present. Beyond v31 the old index will be automatically deleted.

  2. DrahtBot commented at 10:46 am on July 17, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30469.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan
    Concept ACK mzumsande

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on Jul 17, 2024
  4. DrahtBot added the label CI failed on Jul 17, 2024
  5. fjahr force-pushed on Jul 23, 2024
  6. fjahr force-pushed on Jul 23, 2024
  7. mzumsande commented at 6:07 pm on July 23, 2024: contributor

    Concept ACK

    Haven’t tested anything yet, but did you check what happens in case of a downgrade? (an index generated with this PR is loaded with master)

  8. DrahtBot removed the label CI failed on Jul 23, 2024
  9. mzumsande commented at 5:09 pm on July 24, 2024: contributor
    Since rebuilding the coinstatsindex takes a long time and it’s not corrupted (yet) on mainnet I thought that auto-migrating to the new format instead of making the user reindex might be more user-friendly: I tried that out in https://github.com/mzumsande/bitcoin/commits/202407_coinstats_v1_automigrate/ (just an untested POC - also doesn’t process any blocks saved by hash instead of height yet, but that should be possible to add.) What do you think of that option?
  10. DrahtBot added the label CI failed on Sep 8, 2024
  11. DrahtBot removed the label CI failed on Sep 14, 2024
  12. DrahtBot added the label CI failed on Oct 13, 2024
  13. achow101 requested review from TheCharlatan on Oct 15, 2024
  14. DrahtBot removed the label CI failed on Oct 19, 2024
  15. DrahtBot added the label CI failed on Mar 17, 2025
  16. fjahr force-pushed on Mar 30, 2025
  17. fjahr commented at 8:59 pm on March 30, 2025: contributor

    Picking this up again after this has gone a bit quiet for a while. After having a brief conversation about it with @mzumsande at CoreDev I took another look at his migration idea. Initially I was a bit undecided honestly but I now think that it’s reasonable to try this and I would like to open this up for additional approach feedback from other reviewers. Aside from a rebase this has now cherry-picked @mzumsande ’s draft commit with minor changes by me and I adapted the tests to check the new behavior is a separate commit. Test-each-commit CI will probably fail right now but I plan to make a clean history once I have some additional feedback that including the migration is the way to go.

    I have also tested the migration behavior on Signet again and it looks good to me.

     02025-03-30T18:08:00Z Opened LevelDB successfully
     12025-03-30T18:08:00Z Using obfuscation key for /Users/FJ/Library/Application Support/Bitcoin/signet/indexes/coinstats/db: 0000000000000000
     22025-03-30T18:08:00Z Migrating coinstatsindex to new format (v1). This might take a few minutes.
     32025-03-30T18:08:00Z Migrating block at height 240000
     42025-03-30T18:08:00Z Migrating block at height 230000
     52025-03-30T18:08:00Z Migrating block at height 220000
     62025-03-30T18:08:00Z Migrating block at height 210000
     72025-03-30T18:08:00Z Migrating block at height 200000
     82025-03-30T18:08:00Z Migrating block at height 190000
     92025-03-30T18:08:00Z Migrating block at height 180000
    102025-03-30T18:08:00Z Migrating block at height 170000
    112025-03-30T18:08:00Z Migrating block at height 160000
    122025-03-30T18:08:00Z Migrating block at height 150000
    132025-03-30T18:08:00Z Migrating block at height 140000
    142025-03-30T18:08:00Z Migrating block at height 130000
    152025-03-30T18:08:00Z Migrating block at height 120000
    162025-03-30T18:08:00Z [error] Coinstatsindex is corrupted at height 112516
    172025-03-30T18:08:00Z [error] coinstatsindex migration: Error while migrating to v1. In order to rebuild the index, remove the indexes/coinstats directory in your datadir
    
  18. DrahtBot removed the label CI failed on Mar 30, 2025
  19. fjahr force-pushed on Apr 24, 2025
  20. fjahr force-pushed on Apr 25, 2025
  21. fjahr commented at 9:20 pm on April 25, 2025: contributor
    Needed to reorg the commits a bit to get the test-every-commit CI to pass
  22. fjahr commented at 9:26 pm on April 25, 2025: contributor
    I guess this could use a bug label too if someone with permission can get around to that :)
  23. achow101 added this to the milestone 30.0 on Apr 25, 2025
  24. DrahtBot added the label CI failed on May 10, 2025
  25. maflcko commented at 8:25 pm on May 20, 2025: member
    Looks like CI failed
  26. fjahr force-pushed on May 21, 2025
  27. DrahtBot removed the label CI failed on May 22, 2025
  28. fjahr commented at 12:47 pm on May 22, 2025: contributor

    Looks like CI failed

    Thanks, I had missed that. Seems to have been a silent merge conflict. Kind of weird that only one environment failed but I have come up with an easier way to produce the corrupted db failure case and added some documentation on that as well.

  29. maflcko commented at 1:07 pm on May 22, 2025: member

    Kind of weird that only one environment failed

    There are not enough CI resources to re-run every task on every pull request every few days. The CI resources would have to be ~doubled for that. For now, only the lint and previous-releases task are re-run.

  30. fjahr commented at 1:17 pm on May 22, 2025: contributor

    Kind of weird that only one environment failed

    There are not enough CI resources to re-run every task on every pull request every few days. The CI resources would have to be ~doubled for that. For now, only the lint and previous-releases task are re-run.

    Good to know, thanks!

  31. in src/index/coinstatsindex.cpp:556 in 8031519207 outdated
    551+        return false;
    552+    }
    553+
    554+    // Loop backwards until we hit the genesis block which doesn't need to be updated
    555+    while (pindex->pprev) {
    556+        if (pindex->nHeight % 10000 == 0) LogInfo("Migrating block at height %i\n", pindex->nHeight);
    


    TheCharlatan commented at 8:28 pm on May 23, 2025:
    Nit: I think this message should be a little clearer. How about: “Migrating coinstatsindex, current block height %s”.
  32. TheCharlatan commented at 8:40 pm on May 23, 2025: contributor

    Approach ACK

    This all looks good to me, but I am not sure if embedding the database in our functional test suite is a good idea. I don’t think it is too much data, but I am not sure how stable it is in terms of file formats. Is this really compatible with all posix systems? Then again it could just be removed once/if it starts becoming unreliable.

  33. in test/functional/data/coinstatsindex_v0_corrupt/000006.log:1 in 73571b9c19


    maflcko commented at 7:41 am on May 26, 2025:

    It is “just” 20kB, but I think we’d want to avoid adding binary blobs to the repo at this point. At a minimum, it would be good to have a reason to do this.

    The alternative of just using a previous release to create it on-demand has many benefits:

    • It works on Windows as well
    • It doesn’t bloat the git history by 20+ kB forever
    • It is easier to review and easier to adapt

    fjahr commented at 10:29 am on May 26, 2025:

    This all looks good to me, but I am not sure if embedding the database in our functional test suite is a good idea. I don’t think it is too much data, but I am not sure how stable it is in terms of file formats. Is this really compatible with all posix systems? Then again it could just be removed once/if it starts becoming unreliable.

    It is “just” 20kB, but I think we’d want to avoid adding binary blobs to the repo at this point. At a minimum, it would be good to have a reason to do this.

    Happy to try other approaches that are more stable and easier to grasp, I did think about alternative approaches too. My personal “best” alternative I could come up with was to add a leveldb python library to our test framework and edit the db on the fly with it, e.g. plyvel. On the one hand, I am not sure if it’s worth adding such a dependency for this test case, on the other hand this could be used to simulate leveldb corruption for all kinds of tests all over the codebase so that may make it worth it. What are your thoughts on that?

    The alternative of just using a previous release to create it on-demand

    Maybe I am not getting it but how would you create the corrupt index with the previous release in the test? The only way I can think of is to mine a deterministic chain long enough to overflow the values just like it happened in signet. But that would require a lot of blocks to be mined and this would make the test extremely slow, probably prohibiting it from being run on the CI. And for being run locally the previous releases would be needed so I fear this test would hardly be run ever. So if this is the way to do it the downsides outweigh the benefits. But you probably have something else in mind that I am not able to think of right now…

    Or maybe you would discard the corruption test and only do the happy path test?


    maflcko commented at 10:56 am on May 26, 2025:

    Or maybe you would discard the corruption test and only do the happy path test?

    Yeah, I wonder if we can reliably test anything other than the happy path anyway. Generally, once UB happened, there is no reliable way to recover from it. While in practise here, most likely you’ll end up with negative values, I don’t think this is guaranteed by the C++ standard and probably not something to be able to rely on.

    Taking a step back, the size of the index is probably small and it is not too hard to re-create it. So an alternative to attempting to migrate it, could be to just have the index sit under a new name/folder for new releases from now. This should allow downgrades to previous releases to continue using the prior index on a best-effort-basis without user involvement. On upgrades, the new location is used without user involvement. A warning could be printed on every start to offer the user to delete the old index, if they don’t need it for use with previous releases?


    fjahr commented at 8:54 pm on June 2, 2025:

    Yeah, I wonder if we can reliably test anything other than the happy path anyway. Generally, once UB happened, there is no reliable way to recover from it. While in practise here, most likely you’ll end up with negative values, I don’t think this is guaranteed by the C++ standard and probably not something to be able to rely on.

    Sure, there is no guarantee that the check that the corruption test is testing will actually be hit for every corrupt index. I still think it would be good to have test coverage here but I am also happy to remove it (and implement the rest with a previous release) if other reviewers agree we only need the happy path.

    Taking a step back, the size of the index is probably small and it is not too hard to re-create it. So an alternative to attempting to migrate it, could be to just have the index sit under a new name/folder for new releases from now. This should allow downgrades to previous releases to continue using the prior index on a best-effort-basis without user involvement. On upgrades, the new location is used without user involvement. A warning could be printed on every start to offer the user to delete the old index, if they don’t need it for use with previous releases?

    Hmm, the size of the mainnet coinstats index on my machine is 160MB, but I’m not sure I like any amount of data that will be left over in the users datadir potentially forever. Worst case the user has also used the node for Testnet3 and Signet and it may be over 500MB total. I am not sure how many people notice such warnings and act on them, so I am kind of on the fence between these two solutions. Is there any precedent for something like this that could give some guidance @maflcko ? Nevertheless, it would indeed be a way simpler fix, I have sketched this out here: https://github.com/fjahr/bitcoin/tree/2024-07-csi-overflow-2-keep The fix commit is almost the same except for the folder name change. And then there is a simple compatibility test. The versioning code and migration are scrapped.

    I am currently undecided between (potentially) leaving data on users disk that may not be used and making downgrading possible. I will have to think about this a little bit more I would be happy to hear more feedback what other reviewers think. Maybe it’s even worth bringing it up on IRC to get some more input.


    maflcko commented at 5:31 am on June 3, 2025:
    Deleting the folder right away should be fine, too. (This is what this pull is doing?) An alternative would be to delay the deletion until the next major release and keep the old data for one release.

    fjahr commented at 10:46 am on June 4, 2025:
    No, it wasn’t deleting it because it sounded like it should be kept around but yeah, deleting makes sense startin from a future version. I will just go with >=31 for now, if this gets backported the version check can be adapted accordingly.
  34. fjahr force-pushed on Jun 4, 2025
  35. fjahr commented at 10:49 am on June 4, 2025: contributor
    I have pushed a new version with a simplified approach as suggested by @maflcko. This puts the fixed version into a new path and removes the old version after a grace period. This makes downgrading possible and uses less code. I think this approach isn’t too far off from the migration since syncing the index and running the migration are both processes that take some time.
  36. fjahr force-pushed on Jun 4, 2025
  37. DrahtBot added the label CI failed on Jun 4, 2025
  38. DrahtBot commented at 10:50 am on June 4, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/43456257400 LLM reason (✨ experimental): Lint check failure due to unused import errors detected by ruff.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  39. fjahr force-pushed on Jun 4, 2025
  40. DrahtBot removed the label CI failed on Jun 4, 2025
  41. DrahtBot added the label Needs rebase on Jun 12, 2025
  42. fjahr force-pushed on Jun 13, 2025
  43. fjahr renamed this:
    index: Fix coinstats overflow and introduce index versioning
    index: Fix coinstats overflow
    on Jun 13, 2025
  44. DrahtBot removed the label Needs rebase on Jun 13, 2025
  45. TheCharlatan commented at 12:13 pm on June 15, 2025: contributor
    The new approach seems fine to me, but the pull request description needs updating. In general I’m not too big a fan of leaving unused data lying around, but waiting for a deprecation cycle seems like a good compromise.
  46. fjahr commented at 7:35 pm on June 15, 2025: contributor
    Updated the description
  47. in src/index/coinstatsindex.cpp:122 in 1a1b7cf3c5 outdated
    118+        if (CLIENT_VERSION >= 310000) {
    119+            fs::remove_all(old_path);
    120+            LogInfo("Old version of coinstatsindex was found at %s and deleted.", fs::PathToString(old_path));
    121+        } else {
    122+            LogWarning("Old version of coinstatsindex found at %s. This folder can be safely deleted unless you " \
    123+                "plan to downgrade your node to version 29 or lower.", fs::PathToString(old_path));
    


    maflcko commented at 6:56 am on June 16, 2025:
    not sure about changing the software behavior based on the version number. This will cause test failures when the version is bumped after the branch off (or it will require to change the tests in the same commit that bumps the version). This seems odd. Also, at least one of the branches will be dead code. So I think it would be better to just inline the non-dead code and have the dead code in a code comment (or other comment) with a note to adjust it after the branch-off.

    fjahr commented at 11:01 pm on June 17, 2025:
    Removed the version specific removal and replaced it with a TODO to delete the index in the future.
  48. index: Fix coinstatsindex overflow issue
    The index originally stored cummulative values but this allowed for
    potential overflow issues which were observed on Signet. Fix this by
    storing per-block instead of cummulative values and use a different
    folder for this new version of the index.
    c32b174713
  49. test: Add coinstatsindex compatibility test 33668270ee
  50. in src/index/coinstatsindex.cpp:111 in 1a1b7cf3c5 outdated
    106@@ -106,16 +107,41 @@ std::unique_ptr<CoinStatsIndex> g_coin_stats_index;
    107 CoinStatsIndex::CoinStatsIndex(std::unique_ptr<interfaces::Chain> chain, size_t n_cache_size, bool f_memory, bool f_wipe)
    108     : BaseIndex(std::move(chain), "coinstatsindex")
    109 {
    110-    fs::path path{gArgs.GetDataDirNet() / "indexes" / "coinstats"};
    111+    // An earlier version of the index used "indexes/coinstats" but it contained
    112+    // a bug and is superceded by a fixed version at "indexes/coinstatsindex".
    


    maflcko commented at 7:50 am on June 16, 2025:
    superceded -> superseded [correct spelling of “superseded”]
    

    fjahr commented at 11:02 pm on June 17, 2025:
    fixed
  51. fjahr force-pushed on Jun 17, 2025
  52. TheCharlatan approved
  53. TheCharlatan commented at 9:01 am on June 18, 2025: contributor
    ACK 33668270eeecab4ec2a47d92a71f97abf9d0aceb
  54. DrahtBot requested review from mzumsande on Jun 18, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-28 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me