rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612

pull fjahr wants to merge 4 commits into bitcoin:master from fjahr:2024-03-pr26045-reopen changing 9 files +260 −75
  1. fjahr commented at 4:31 pm on March 10, 2024: contributor

    The second attempt at implementing the dumptxoutset space optimization as suggested in #25675. Closes #25675.

    This builds on the work done in #26045, addresses open feedback, adds some further improvements (most importantly usage of compact size), documentation, and an additional test.

    The original snapshot at height 830,000 came in at 10.82 GB. With this change, the same snapshot is 8.94 GB, a reduction of 17.4%.

    This also enhances the metadata of the output file and adds the following data to allow for better error handling and make future upgrades easier:

    • A newly introduced utxo set magic
    • A version number
    • The network magic
    • The block height
  2. DrahtBot commented at 4:31 pm on March 10, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, theStack, achow101
    Stale ACK mzumsande, Sjors

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29775 (Testnet4 including PoW difficulty adjustment fix by fjahr)
    • #29307 (util: check for errors after close and read in AutoFile by vasild)
    • #28670 (assumeutxo, rpc: Improve EOF error when reading snapshot metadata in loadtxoutset by pablomartin4btc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label RPC/REST/ZMQ on Mar 10, 2024
  4. DrahtBot added the label CI failed on Mar 10, 2024
  5. DrahtBot commented at 5:25 pm on March 10, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22485931752

  6. fjahr force-pushed on Mar 10, 2024
  7. fjahr force-pushed on Mar 10, 2024
  8. DrahtBot removed the label CI failed on Mar 10, 2024
  9. Sjors commented at 11:21 am on March 11, 2024: member

    我最近正在休假中

    Well, I’m not on holiday :-)

    tACK b0cfbce17a84158973ebba16f9801bd57cb70fe6

    I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

    To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

    I generated a snapshot and loaded it using the patch in #29551 (only waited for it to reach the tip, not a full background sync).

    For those testing who don’t want to wait out the long flush, see #28358.

  10. fjahr force-pushed on Mar 11, 2024
  11. fjahr force-pushed on Mar 11, 2024
  12. fjahr commented at 9:00 pm on March 11, 2024: contributor

    Thanks for the review and testing!

    I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

    To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

    As it is worded, wouldn’t this be more appropriate to add in CreateUTXOSnapshot? That’s where we are iterating with the pcursor, leveraging the sorting, and writing the coins to file, so I would say that’s also where the space savings are realized.

    I have now amended the comment a bit and added a version of it to both CreateUTXOSnapshot and PopulateAndValidateSnapshot. Please let me know if you think this is alright: https://github.com/bitcoin/bitcoin/commit/923ec9043af668f3a342422d32c1c28d29018ed0

  13. fjahr force-pushed on Mar 11, 2024
  14. Sjors commented at 8:24 am on March 12, 2024: member

    re-ACK 923ec9043af668f3a342422d32c1c28d29018ed0

    That looks good.

    The reason I suggested (also) writing something in the header is because it’s more likely to be noticed by some future dev looking through existing leveldb uses, seeing if they can replace it.

  15. in src/rpc/blockchain.cpp:2705 in 923ec9043a outdated
    2676     Coin coin;
    2677     unsigned int iter{0};
    2678+    std::vector<std::pair<uint32_t, Coin>> coins;
    2679+
    2680+    // To reduce space the serialization format of the snapshot avoids
    2681+    // dublication of tx hashes. The code takes advantage of the guarantee by
    


    TheCharlatan commented at 7:08 am on March 13, 2024:
    Nit: s/dublication/duplication/ (same in validation.h)

    fjahr commented at 12:58 pm on March 13, 2024:
    That is my favorite typo, can’t get enough of it ;D

    fjahr commented at 10:58 am on March 14, 2024:
    fixed
  16. in src/rpc/blockchain.cpp:2709 in 923ec9043a outdated
    2680+    // To reduce space the serialization format of the snapshot avoids
    2681+    // dublication of tx hashes. The code takes advantage of the guarantee by
    2682+    // leveldb that keys are lexicographically sorted.
    2683+    // In the coins vector we collect all coins that belong to a certain tx hash
    2684+    // (key.hash) and when we have them all (key.hash != last_hash) we write
    2685+    // them to file using the below lamda function.
    


    TheCharlatan commented at 7:08 am on March 13, 2024:
    Nit: s/lamda/lambda/

    fjahr commented at 10:58 am on March 14, 2024:
    fixed
  17. in src/rpc/blockchain.cpp:2705 in 923ec9043a outdated
    2675+    uint256 last_hash;
    2676     Coin coin;
    2677     unsigned int iter{0};
    2678+    std::vector<std::pair<uint32_t, Coin>> coins;
    2679+
    2680+    // To reduce space the serialization format of the snapshot avoids
    


    TheCharlatan commented at 7:31 am on March 13, 2024:
    Nit: Might it be useful to add a small diagram of the format to the dumptxoutset help output?

    fjahr commented at 1:01 pm on March 13, 2024:
    Sounds good, will think about what that could look like if I have to retouch or in the follow-up.

    fjahr commented at 10:58 am on March 14, 2024:
    I spent some time thinking about a diagram that would work but I wasn’t very happy what I could come up with. I instead copied the expression used in the issue. It shows the format in a concise way that I find understandable: list(Txid, list((vout,Coin)). Let me know what you think.

    TheCharlatan commented at 3:03 pm on March 20, 2024:
    Maybe also add the size of the coins list list(Txid, number_of_coins, list((vout,Coin)))?

    fjahr commented at 3:33 pm on March 20, 2024:
    I will add this if I have to retouch, but if not I can do this as a follow-up in one of my other assumeutxo PRs.
  18. in test/functional/feature_assumeutxo.py:107 in 923ec9043a outdated
    106-            [(1).to_bytes(4, "little"), 32, "9f4d897031ab8547665b4153317ae2fdbf0130c7840b66427ebc48b881cb80ad"],  # wrong outpoint index
    107-            [b"\x81", 36, "3da966ba9826fb6d2604260e01607b55ba44e1a5de298606b08704bc62570ea8"],  # wrong coin code VARINT((coinbase ? 1 : 0) | (height << 1))
    108-            [b"\x80", 36, "091e893b3ccb4334378709578025356c8bcb0a623f37c7c4e493133c988648e5"],  # another wrong coin code
    109+            # The coin size byte is skipped here and its manipulation is tested
    110+            # below because this will always cause parsing to fail, resulting in
    111+            # and error that is not a mismatch of the expected hash.
    


    TheCharlatan commented at 7:46 am on March 13, 2024:
    Nit: I’m a bit confused about this sentence. Is this supposed to be resulting in an error?

    fjahr commented at 12:57 pm on March 13, 2024:
    yepp, in an error would be right

    fjahr commented at 12:58 pm on March 13, 2024:
    Do you think it is clear otherwise?

    TheCharlatan commented at 1:11 pm on March 13, 2024:
    Could be a bit clearer I guess, but I don’t have a suggestion :P.

    fjahr commented at 10:59 am on March 14, 2024:
    Fixed and tried a different wording to make it more clear
  19. TheCharlatan approved
  20. TheCharlatan commented at 7:49 am on March 13, 2024: contributor

    Nice, ACK 923ec9043af668f3a342422d32c1c28d29018ed0

    Just some comment nits that could easily be done in a follow-up too.

  21. DrahtBot added the label Needs rebase on Mar 13, 2024
  22. fjahr force-pushed on Mar 14, 2024
  23. fjahr force-pushed on Mar 14, 2024
  24. fjahr force-pushed on Mar 14, 2024
  25. DrahtBot removed the label Needs rebase on Mar 14, 2024
  26. TheCharlatan approved
  27. TheCharlatan commented at 3:03 pm on March 20, 2024: contributor
    Re-ACK 71d41d536bc2f47a014c898891d77d58502c9554
  28. DrahtBot requested review from Sjors on Mar 20, 2024
  29. bitcoin deleted a comment on Mar 21, 2024
  30. in src/rpc/blockchain.cpp:2699 in e788980599 outdated
    2693@@ -2693,21 +2694,42 @@ UniValue CreateUTXOSnapshot(
    2694 
    2695     afile << metadata;
    2696 
    2697+    std::map<uint256, std::vector<std::pair<uint32_t, Coin>>> mapCoins;
    2698     COutPoint key;
    2699+    uint256 last_hash;
    


    theStack commented at 4:37 pm on March 25, 2024:

    nit, if you retouch: slightly clearer imho, that’s the type and naming also used in the snapshot loading routine (ChainstateManager::PopulateAndValidateSnapshot)

    0    Txid last_txid;
    

    fjahr commented at 10:36 pm on April 21, 2024:
    done
  31. theStack commented at 4:40 pm on March 25, 2024: contributor
    Concept ACK
  32. in src/rpc/blockchain.cpp:2733 in e788980599 outdated
    2721+            if (key.hash != last_hash) {
    2722+                write_coins_to_file(afile, last_hash, coins);
    2723+                last_hash = key.hash;
    2724+                coins.clear();
    2725+            }
    2726+            coins.emplace_back(key.n, coin);
    


    mzumsande commented at 9:35 pm on March 27, 2024:
    slightly unrelated nit: might be useful to assert that we actually wrote the right number of coins (we just take maybe_stats->coins_count and wouldn’t abort if the actual number of coins written was different becaus the db changed for some reason). I’m sure #15606 (review) is correct, but checking it explicitly seems better.

    fjahr commented at 10:37 pm on April 21, 2024:
    I added a counter and check it at the end
  33. mzumsande commented at 9:43 pm on March 27, 2024: contributor

    Light ACK 71d41d536bc2f47a014c898891d77d58502c9554

    I tested this on signet and it worked well, but I’m not very familiar with our serialization code.

  34. DrahtBot requested review from mzumsande on Mar 27, 2024
  35. DrahtBot requested review from theStack on Mar 27, 2024
  36. in src/validation.cpp:5580 in e788980599 outdated
    5597+            Txid txid;
    5598+            coins_file >> txid;
    5599+            size_t size{0};
    5600+            size = ReadCompactSize(coins_file);
    5601+
    5602+            if(size > coins_left) {
    


    theStack commented at 9:24 pm on March 31, 2024:

    spacing nit:

    0            if (size > coins_left) {
    

    fjahr commented at 10:37 pm on April 21, 2024:
    fixed
  37. in src/validation.cpp:5578 in e788980599 outdated
    5595-            return false;
    5596-        }
    5597+            Txid txid;
    5598+            coins_file >> txid;
    5599+            size_t size{0};
    5600+            size = ReadCompactSize(coins_file);
    


    theStack commented at 9:28 pm on March 31, 2024:

    nit:

    0            size_t size{ReadCompactSize(coins_file)};
    

    also, size is a quite generic name, would prefer something more explicit like coins_per_txid


    fjahr commented at 10:37 pm on April 21, 2024:
    renamed
  38. theStack approved
  39. theStack commented at 9:35 pm on March 31, 2024: contributor

    Code-review ACK 71d41d536bc2f47a014c898891d77d58502c9554

    (Fwiw, I rebased #27432 on this PR and adapted the utxo-to-sqlite3 tool to the new serialization format.)

  40. Sjors commented at 10:35 am on April 2, 2024: member

    re-utACK 71d41d536bc2f47a014c898891d77d58502c9554

    Only test (rebase) and documentation changes since my last review.

  41. fjahr commented at 1:26 pm on April 2, 2024: contributor
    Thanks all for the review, unless I have to retouch (I hope not since this has collected 4 ACKs) I will address the open comments in a follow-up.
  42. in src/rpc/blockchain.cpp:2697 in e788980599 outdated
    2693@@ -2693,21 +2694,42 @@ UniValue CreateUTXOSnapshot(
    2694 
    2695     afile << metadata;
    2696 
    2697+    std::map<uint256, std::vector<std::pair<uint32_t, Coin>>> mapCoins;
    


    achow101 commented at 7:24 pm on April 2, 2024:

    In e7889805999ed312a5af12a2c08c548c19403bd9 “rpc: Optimize serialization disk space of dumptxoutset”

    mapCoins is never used.


    fjahr commented at 10:37 pm on April 21, 2024:
    leftover from a previous approach, removed
  43. achow101 commented at 8:08 pm on April 2, 2024: member

    dumptxoutset is a RPC that has been available for several releases, including those prior to the completion of assumeutxo validation code. Changing the format now is a breaking change - utxo sets dumped prior to the PR are incompatible with any version with this PR merged, and utxo sets dumped after this PR are incompatible with all prior versions.

    Conceivably, in the future, someone could run an older node and create a utxo set snapshot to setup a new node with more recent software. This would then fail as the formats are incompatible. Currently the failure does not provide much detail on what actually went wrong. The RPC error is just “Unable to load UTXO snapshot” while the log does say “bad snapshot format or truncated snapshot after deserializing 0 coins”. However that is not an indication of a version incompatibility, and I think it would be better if the new format could include a version number so that this could be properly reported.

    Having a version number would also make it easy to distinguish which snapshot format to deserialize as so we can avoid these kinds of errors. We could detect which format was used for serialization and proceed to deserialize as necessary.

    The reverse is also a possibility - a new node creates a snapshot with the new format and tries to load it into older software, but this seems unlikely to actually happen. But, it would also be trivial to add an option that allows for serializing in the old format for those that need it.

    I think this also needs a release note about the format change as utxo set snapshots are useful for more than just snapshot loading.

  44. Sjors commented at 7:46 am on April 3, 2024: member

    I think it would be better if the new format could include a version number so that this could be properly reported.

    This seems useful in general.

    We could detect which format was used for serialization and proceed to deserialize as necessary.

    Yes, but I don’t think we need to support the format from before this PR - once we have the mainnet params, the instruction could simply be to use v28.0 or newer to generate the snapshot.

  45. mzumsande commented at 9:14 am on April 3, 2024: contributor

    utxo set snapshots are useful for more than just snapshot loading.

    Do you know of any examples of the utxo dumps already being used for something else?

  46. Sjors commented at 2:38 pm on April 3, 2024: member
    @mzumsande I don’t know, but it seems such applications should use a more portable format like SQL, see #27432
  47. theStack commented at 6:48 pm on April 3, 2024: contributor

    I think it would be better if the new format could include a version number so that this could be properly reported.

    If we decide to do that, I’d suggest to use the chance and go one step further, by also adding a magic number to the start of the file. This way, it can be quickly identified as being / not being an UTXO dump (for better error reporting, but longer-term also potentially for external identifying tools like file). Then we could provide more helpful error message categories like

    • “this is not an UTXO dump”
    • “this is an UTXO dump, but in a version that is not supported”
    • “this is a supported UTXO dump file, but something else is wrong”

    as opposed to the rather confusing “hash is not in the chain” message that currently appears for pretty much all files passed to loadtxoutset that are larger than a certain amount of bytes. Random example with a text file (both on master and this PR):

    0$ ./src/bitcoin-cli loadtxoutset /home/thestack/.vimrc
    1error code: -32603
    2error message:
    3Unable to load UTXO snapshot, assumeutxo block hash in snapshot metadata not recognized (626174207465730a7265626d756e207465730a656c62616e65207861746e7973)
    
  48. achow101 commented at 8:16 pm on April 3, 2024: member

    Do you know of any examples of the utxo dumps already being used for something else?

    For myself, I have a couple of random scripts that read utxo dumps in order to compute some statistics and other info on the utxo set. I’m not aware of any actual projects using utxo snapshots, but I also haven’t actively looked.

  49. Sjors commented at 8:16 am on April 5, 2024: member

    I like magic bytes. For PSBT it’s just “psbt” in ascii followed by 0xff: https://en.bitcoin.it/wiki/BIP_0174

    So you could use “utxo” in ascii: 0x75 0x74 0x78 0x6F 0xFF

    But then while you’re at it, you could also append:

    • ChainTypeToString (“main”, “testnet”, “signet”, “regtest”)
    • height (so we can have an error like “you tried to load a snapshot at height X, but only heights Y, Z are supported by this version of PACKAGE_NAME. Newer releases may support more recent snapshots.”)
    • for signet: the signet challenge bytes
  50. fjahr force-pushed on Apr 21, 2024
  51. fjahr commented at 10:40 pm on April 21, 2024: contributor

    So far I have addressed the minor feedback in the latest push and I am now working on the extended metadata for the snapshot. I am currently planning to add the following:

    1. magic bytes - the ones suggested by @Sjors
    2. snapshot version
    3. Network magic (pchMessageStart) - This lets us distinguish between the different networks and since it’s generated from the challenge for signets, we can deduce that it’s probably a custom signet if we don’t recognize them and give an appropriate feedback to the user
    4. block height

    Let me know if you disagree with this choice or have something else to add, otherwise I will have the code ready soon.

  52. DrahtBot added the label CI failed on Apr 21, 2024
  53. DrahtBot commented at 11:43 pm on April 21, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/24078947679

  54. Sjors commented at 9:13 am on April 22, 2024: member
    @fjahr sounds good. Let me know when that’s ready and I’ll bake a halving block snapshot…
  55. fjahr force-pushed on Apr 26, 2024
  56. fjahr force-pushed on Apr 26, 2024
  57. fjahr force-pushed on Apr 26, 2024
  58. fjahr force-pushed on Apr 26, 2024
  59. fjahr commented at 11:16 pm on April 26, 2024: contributor
    Alright, ready for review again. I hope I have addressed all the ideas that were mentioned as intended. Sorry but the enhancement of the metadata is now one big commit for now, I can try to split it up if it makes review easier.
  60. fjahr renamed this:
    rpc: Optimize serialization disk space of dumptxoutset - Reloaded
    rpc: Optimize serialization and enhance metadata of dumptxoutset output
    on Apr 26, 2024
  61. fjahr force-pushed on Apr 27, 2024
  62. DrahtBot removed the label CI failed on Apr 27, 2024
  63. in src/kernel/chainparams.h:188 in 6ed6bffcbe outdated
    184@@ -183,4 +185,6 @@ class CChainParams
    185     ChainTxData chainTxData;
    186 };
    187 
    188+std::optional<std::string> GetNetworkForMagic(MessageStartChars& pchMessageStart);
    


    Sjors commented at 9:05 am on April 29, 2024:
    6ed6bffcbedb6a28c3578721cc0a08086a15e417: this doesn’t seem kernel-worthy. chaintype.h is a better home for it, next to ChainTypeToString.

    TheCharlatan commented at 1:51 pm on April 30, 2024:
    I think this is fine where it is, I like keeping these single type utilities small.
  64. Sjors commented at 10:09 am on April 29, 2024: member

    I can try to split it up if it makes review easier.

    Doesn’t matter that much, but it’s probably to put the kernel/chainparams.h change in its own commit so e.g. @TheCharlatan can do a drive-by review.

    Here’s some new snapshot torrents (not sure if I’m seeding them correctly…).

    Torrent for testnet: magnet:?xt=urn:btih:787f5917029876c83301c49dc97bb69224597285&dn=utxo-testnet-2500000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

    Torrent for signet: magnet:?xt=urn:btih:715a8797d6154a2474f225f6019364b6230eed17&dn=utxo-signet-160000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

    Torrent for mainnet at the halving block (I’ll update #28553 to use this): magnet:?xt=urn:btih:55cb2ea9b6c48aa11d676de871f9639e3fa3dd7e&dn=utxo-840000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

  65. in src/kernel/chainparams.cpp:566 in 6ed6bffcbe outdated
    561+    const auto regtest_msg = CChainParams::RegTest(regtest_opts)->MessageStart();
    562+    auto signet_opts = CChainParams::SigNetOptions{};
    563+    const auto signet_msg = CChainParams::SigNet(signet_opts)->MessageStart();
    564+
    565+    if (std::equal(message.begin(), message.end(), mainnet_msg.data())) {
    566+        return "Mainnet";
    


    TheCharlatan commented at 1:49 pm on April 30, 2024:
    I’m surprised we never had to match the magic bytes to a chaintype. Instead of defining new strings here though, could you use ChainTypeToString(ChainType::MAIN)?

    fjahr commented at 4:59 am on May 5, 2024:
    Me too :) Yes, that makes sense. I opted for the function to return the ChainType and then I use ChainTypeToString on the result.
  66. in src/node/utxo_snapshot.h:22 in 6ed6bffcbe outdated
    17@@ -16,17 +18,24 @@
    18 #include <optional>
    19 #include <string_view>
    20 
    21+// UTXO set snapshot magic bytes
    22+static constexpr uint8_t SNAPSHOT_MAGIC_BYTES[5] = {'u', 't', 'x', 'o', 0xff};
    


    theStack commented at 3:16 am on May 2, 2024:

    nit: could use std::array, and use its methods (.begin(), .end(), .size()) in Unserialize below to avoid magic numbers

    0static constexpr std::array<uint8_t, 5> SNAPSHOT_MAGIC_BYTES = {'u', 't', 'x', 'o', 0xff};
    

    fjahr commented at 5:00 am on May 5, 2024:
    done
  67. in src/node/utxo_snapshot.h:37 in 6ed6bffcbe outdated
    32+    const std::set<uint16_t> m_supported_versions{1};
    33 public:
    34     //! The hash of the block that reflects the tip of the chain for the
    35     //! UTXO set contained in this snapshot.
    36     uint256 m_base_blockhash;
    37+    int m_base_blockheight;
    


    theStack commented at 3:50 am on May 2, 2024:

    nit: for variables that are (de)serialized, it’s imho preferred if the data type size is explicit

    0    uint32_t m_base_blockheight;
    

    fjahr commented at 5:00 am on May 5, 2024:
    done
  68. theStack commented at 3:55 am on May 2, 2024: contributor
    Concept ACK on the enhanced metadata, I think this includes now everything we need. Will update #27432 in a bit to the new format and test.
  69. DrahtBot added the label Needs rebase on May 2, 2024
  70. fjahr force-pushed on May 5, 2024
  71. fjahr commented at 5:02 am on May 5, 2024: contributor
    Rebased, resolved conflicts (the functional tests have changed a bit now), and addressed the open feedback. Thanks everyone for reviewing!
  72. DrahtBot removed the label Needs rebase on May 5, 2024
  73. in src/kernel/chainparams.cpp:561 in b2cf3f77b1 outdated
    556+
    557+std::optional<ChainType> GetNetworkForMagic(MessageStartChars& message) {
    558+    const auto mainnet_msg = CChainParams::Main()->MessageStart();
    559+    const auto testnet_msg = CChainParams::TestNet()->MessageStart();
    560+    auto regtest_opts = CChainParams::RegTestOptions{};
    561+    const auto regtest_msg = CChainParams::RegTest(regtest_opts)->MessageStart();
    


    TheCharlatan commented at 1:31 pm on May 11, 2024:
    In commit b2cf3f77b14bccf11ad5bc211db7f746ae5073fb: Nit: Could just do const auto regtest_msg = CChainParams::RegTest({})->MessageStart(); here and for the signet_msg.

    fjahr commented at 12:16 pm on May 21, 2024:
    done
  74. in src/kernel/chainparams.cpp:17 in b2cf3f77b1 outdated
    13@@ -14,6 +14,7 @@
    14 #include <logging.h>
    15 #include <primitives/block.h>
    16 #include <primitives/transaction.h>
    17+#include <protocol.h>
    


    TheCharlatan commented at 1:36 pm on May 11, 2024:
    In commit b2cf3f77b14bccf11ad5bc211db7f746ae5073fb: Is this include from a prior iteration? It seems unused (also in the header file).

    fjahr commented at 12:16 pm on May 21, 2024:
    Yeah, seems so, done
  75. in src/rpc/blockchain.cpp:2715 in b2cf3f77b1 outdated
    2711@@ -2712,7 +2712,7 @@ UniValue CreateUTXOSnapshot(
    2712     auto write_coins_to_file = [&](AutoFile& afile, const Txid& last_hash, const std::vector<std::pair<uint32_t, Coin>>& coins, size_t& written_coins_count) {
    2713         afile << last_hash;
    2714         WriteCompactSize(afile, coins.size());
    2715-        for (auto [n, coin] : coins) {
    2716+        for (const auto& [n, coin] : coins) {
    


    TheCharlatan commented at 2:00 pm on May 11, 2024:
    In commit b2cf3f77b14bccf11ad5bc211db7f746ae5073fb: Nit: I think this should be moved to the first commit.

    fjahr commented at 12:16 pm on May 21, 2024:
    done
  76. in src/kernel/chainparams.cpp:547 in b2cf3f77b1 outdated
    542@@ -542,3 +543,33 @@ std::unique_ptr<const CChainParams> CChainParams::TestNet()
    543 {
    544     return std::make_unique<const CTestNetParams>();
    545 }
    546+
    547+std::vector<int> CChainParams::GetAvailableSnapshotHeights() const {
    


    TheCharlatan commented at 2:13 pm on May 11, 2024:
    In commit b2cf3f77b14bccf11ad5bc211db7f746ae5073fb Nit (clang-format-diff): Open braces on new line (here and the other new functions and methods below).

    fjahr commented at 12:16 pm on May 21, 2024:
    done
  77. TheCharlatan commented at 2:18 pm on May 11, 2024: contributor
    Nice, just some small comments, will review again quickly once they are addressed.
  78. in src/node/utxo_snapshot.h:70 in b2cf3f77b1 outdated
    66+    inline void Unserialize(Stream& s) {
    67+        // Read the snapshot magic bytes
    68+        std::array<uint8_t, SNAPSHOT_MAGIC_BYTES.size()> snapshot_magic;
    69+        s >> snapshot_magic;
    70+        if (snapshot_magic != SNAPSHOT_MAGIC_BYTES) {
    71+            throw std::ios_base::failure("Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format.", std::error_code{});
    


    theStack commented at 8:30 pm on May 13, 2024:

    nit: seems like the second parameter (std::error_code{}) isn’t needed (here and in all other instances below)

    0            throw std::ios_base::failure("Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format.");
    

    fjahr commented at 12:16 pm on May 21, 2024:
    done
  79. in src/rpc/blockchain.cpp:2821 in b2cf3f77b1 outdated
    2817+        auto available_heights = chainman.GetParams().GetAvailableSnapshotHeights();
    2818+        std::ostringstream oss;
    2819+        for (auto it = available_heights.begin(); it != available_heights.end(); ++it) {
    2820+            oss << (it != available_heights.begin() ? ", " : "") << *it;
    2821+        }
    2822+        std::string heights_formatted = oss.str();
    


    theStack commented at 9:00 pm on May 13, 2024:

    nit: could use our fancy Join helper here (available in util/string.h)

    0        std::string heights_formatted = Join(available_heights, ", ", [&](const auto& i) { return ToString(i); });
    

    fjahr commented at 12:16 pm on May 21, 2024:
    Cool, done!
  80. rpc: Optimize serialization disk space of dumptxoutset
    Co-authored-by: Aurèle Oulès <aurele@oules.com>
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    de95953d87
  81. assumeutxo: Add test for changed coin size value c14ed7f384
  82. assumeutxo: Add documentation on dumptxoutset serialization format 4d8e5edbaa
  83. rpc: Enhance metadata of the dumptxoutset output
    The following data is added:
    - A newly introduced utxo set magic
    - A version number
    - The network magic
    - The block height
    542e13b293
  84. fjahr force-pushed on May 21, 2024
  85. fjahr commented at 12:17 pm on May 21, 2024: contributor

    Addressed feedback from @TheCharlatan and @theStack , thank you!

    Also changed the error code for the parsing of the metadata which was feedback in another PR: #28670 (review)

  86. TheCharlatan approved
  87. TheCharlatan commented at 1:15 pm on May 21, 2024: contributor
    Re-ACK 542e13b2937356810bda2c41be83c3b1675e2f2f
  88. DrahtBot requested review from theStack on May 21, 2024
  89. DrahtBot requested review from Sjors on May 21, 2024
  90. in src/rpc/blockchain.cpp:2699 in de95953d87 outdated
    2695@@ -2695,24 +2696,48 @@ UniValue CreateUTXOSnapshot(
    2696     afile << metadata;
    2697 
    2698     COutPoint key;
    2699+    Txid last_hash;
    


    theStack commented at 9:59 pm on May 21, 2024:
    nit: would still prefer last_txid over last_hash as it’s more expressive (OTOH, in COutPoint, it’s also called hash)

    fjahr commented at 7:01 am on May 22, 2024:
    Sorry, I must have overlooked that if you already mentioned it before when I changed the type. I will address it if I have to retouch.
  91. theStack approved
  92. theStack commented at 9:59 pm on May 21, 2024: contributor
    ACK 542e13b2937356810bda2c41be83c3b1675e2f2f
  93. achow101 commented at 4:21 pm on May 23, 2024: member
    ACK 542e13b2937356810bda2c41be83c3b1675e2f2f
  94. achow101 added the label Needs release note on May 23, 2024
  95. achow101 merged this on May 23, 2024
  96. achow101 closed this on May 23, 2024

  97. in src/node/utxo_snapshot.h:58 in 542e13b293
    54-    SERIALIZE_METHODS(SnapshotMetadata, obj) { READWRITE(obj.m_base_blockhash, obj.m_coins_count); }
    55+    template <typename Stream>
    56+    inline void Serialize(Stream& s) const {
    57+        s << SNAPSHOT_MAGIC_BYTES;
    58+        s << m_version;
    59+        s << Params().MessageStart();
    


    maflcko commented at 4:49 pm on May 23, 2024:
    nit: It would be nice, if new code didn’t use globals implicitly. What about storing a reference to the prams (or the magic) in a member field of this struct and using it here? An alternative would be to use params-serialization and embed the magic bytes during serialization.

    fjahr commented at 10:45 am on May 24, 2024:
    I addressed this in #30167 using the member field. Using params-serialization wouldn’t have made much of a difference given how we use this code right now, I think.
  98. in src/validation.cpp:5750 in 542e13b293
    5746@@ -5735,7 +5747,8 @@ bool ChainstateManager::PopulateAndValidateSnapshot(
    5747 
    5748     bool out_of_coins{false};
    5749     try {
    5750-        coins_file >> outpoint;
    5751+        Txid txid;
    


    maflcko commented at 5:05 pm on May 23, 2024:
    nit: I think this should check that the stream is completely empty, no? I suspect that a trailing byte will be left undetected, so it may be better to deserialize a std::byte here?

    fjahr commented at 10:45 am on May 24, 2024:
    done in #30167
  99. in src/validation.cpp:5677 in 542e13b293
    5694+            coins_file >> txid;
    5695+            size_t coins_per_txid{0};
    5696+            coins_per_txid = ReadCompactSize(coins_file);
    5697+
    5698+            if (coins_per_txid > coins_left) {
    5699+                LogPrintf("[snapshot] mismatch in coins count in snapshot metadata and actual snapshot data\n");
    


    maflcko commented at 5:06 pm on May 23, 2024:
    Looks like this is new code, but uncovered. It would be nice to have a test for this.

    fjahr commented at 10:45 am on May 24, 2024:
    Added a test for this in #30167
  100. fjahr referenced this in commit de32700dc9 on May 24, 2024
  101. fjahr referenced this in commit 359967e310 on May 24, 2024
  102. fjahr commented at 10:45 am on May 24, 2024: contributor
    Added release notes and dddressed post-merge comments in #30167
  103. fanquake referenced this in commit 80bdd4b6be on Jun 3, 2024
  104. fanquake removed the label Needs release note on Jun 10, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me