Switch BlockMap to use an unordered_set under the hood #19677

pull JeremyRubin wants to merge 1 commits into bitcoin:master from JeremyRubin:refactor-blockmap-blockset changing 9 files +93 −83
  1. JeremyRubin commented at 5:22 am on August 7, 2020: contributor

    Currently we use an unordered_map<uint256, CBlockIndex*> for our BlockMap. This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex. It would be conceptually simpler if we used a unordered_map<CBlockIndex> and modified CBlockIndex to directly own the uint256 for the phashBlock.

    That’s what this patch attempts doing. The only additional complexity is where we do want to query the map by key, the key equivalence stuff is only in C++20, so we add a helper that lets us construct a mock element easily for using with find.

    The result is nicer since we get rid of an additional indirection for accessing the phashblock and we get rid of a lot of std::pair de-structuring. We also save a bit of memory per index (I haven’t computed it precisely, but my guess is we save something like 24 bytes total per index by eliminating 1 pointer, which saves 8 bytes, and then could save us something like 16 bytes of padding). We also no longer can be in an inconsistent state where phashblock does not point to the entries own hash (although modifying phashblock once it is inserted into the map would be invalid – it must be removed, modified, and reinserted).

    Future work can likely further improve on this by making CBlockIndex owned* by the unordered_set directly, which will eliminate even more indirection/pointer chasing and make a lot of the code using BlockMap simpler (I don’t think we ever rely on CBlockIndex being non-owned anywhere, nor should we have reason to in the future).

    0     References and pointers to data stored in the container are only invalidated by erasing that element, even when the corresponding iterator is invalidated. 
    
  2. fanquake added the label Refactoring on Aug 7, 2020
  3. fanquake added the label Validation on Aug 7, 2020
  4. in src/validation.cpp:1879 in c61c74edd6 outdated
    1875@@ -1875,8 +1876,8 @@ static unsigned int GetBlockScriptFlags(const CBlockIndex* pindex, const Consens
    1876     // mainnet and testnet), so for simplicity, always leave P2SH
    1877     // on except for the one violating block.
    1878     if (consensusparams.BIP16Exception.IsNull() || // no bip16 exception on this chain
    1879-        pindex->phashBlock == nullptr || // this is a new candidate block, eg from TestBlockValidity()
    1880-        *pindex->phashBlock != consensusparams.BIP16Exception) // this block isn't the historical exception
    1881+        pindex->m_hash_block.IsNull() || // this is a new candidate block, eg from TestBlockValidity()
    


    JeremyRubin commented at 5:24 am on August 7, 2020:
    is this kosher 100% of the time as a replacement for nullptr?

    MarcoFalke commented at 3:28 pm on March 25, 2022:
    This should fix itself in the next rebase?
  5. JeremyRubin force-pushed on Aug 7, 2020
  6. DrahtBot commented at 7:15 am on August 7, 2020: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19865 (scripted-diff: Restore AssertLockHeld after #19668, remove LockAssertion by ryanofsky)
    • #19556 (Remove mempool global by MarcoFalke)
    • #19438 (Introduce deploymentstatus by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. JeremyRubin force-pushed on Aug 7, 2020
  8. JeremyRubin force-pushed on Aug 7, 2020
  9. JeremyRubin force-pushed on Aug 7, 2020
  10. JeremyRubin force-pushed on Aug 7, 2020
  11. in src/chain.h:233 in 99a12742de outdated
    231@@ -232,7 +232,7 @@ class CBlockIndex
    232 
    233     uint256 GetBlockHash() const
    


    promag commented at 8:22 am on August 10, 2020:
    nit, could change to const uint256&? Or drop it?

    JeremyRubin commented at 5:20 pm on August 10, 2020:
    done!
  12. promag commented at 8:26 am on August 10, 2020: member

    Concept ACK!

    This is a bit of a peculiar choice because we then store a pointer to the uint256 from the pair inside of the CBlockIndex

    Indeed.

  13. JeremyRubin force-pushed on Aug 10, 2020
  14. JeremyRubin renamed this:
    WIP: Switch BlockMap to use an unordered_set under the hood
    Switch BlockMap to use an unordered_set under the hood
    on Aug 10, 2020
  15. JeremyRubin commented at 7:29 pm on August 10, 2020: contributor

    Thanks for the review! Converted GetBlockHash to be a const ref, good suggestion!

    I’ve removed the WIP status here as I eliminated the earlier bug.

  16. in src/bench/rpc_blockchain.cpp:25 in 0aa7e944b8 outdated
    21@@ -22,7 +22,7 @@ static void BlockToJsonVerbose(benchmark::Bench& bench)
    22 
    23     CBlockIndex blockindex;
    24     const uint256 blockHash = block.GetHash();
    25-    blockindex.phashBlock = &blockHash;
    26+    blockindex.m_hash_block = blockHash;
    


    promag commented at 10:55 am on August 11, 2020:
    0blockindex.m_hash_block = block.GetHash();
    

    JeremyRubin commented at 5:05 pm on August 11, 2020:
    done
  17. in src/chain.h:140 in 0aa7e944b8 outdated
    137@@ -138,7 +138,7 @@ class CBlockIndex
    138 {
    139 public:
    140     //! pointer to the hash of the block, if any. Memory is owned by this CBlockIndex
    


    promag commented at 10:56 am on August 11, 2020:
    Update comment.

    JeremyRubin commented at 5:05 pm on August 11, 2020:
    done
  18. in src/chain.h:233 in 0aa7e944b8 outdated
    229@@ -230,9 +230,9 @@ class CBlockIndex
    230         return block;
    231     }
    232 
    233-    uint256 GetBlockHash() const
    234+    const uint256& GetBlockHash() const
    


    promag commented at 10:57 am on August 11, 2020:
    Honestly I think this could be dropped. It was a convenience getter to dereference the pointer. And it’s confusing whether this should be used instead of m_hash_block.

    JeremyRubin commented at 4:59 pm on August 11, 2020:
    Better for follow up work; this is called in a huge number of places throughout the codebase.
  19. in src/test/fuzz/chain.cpp:23 in 0aa7e944b8 outdated
    19@@ -20,7 +20,7 @@ void test_one_input(const std::vector<uint8_t>& buffer)
    20     }
    21 
    22     const uint256 zero{};
    23-    disk_block_index->phashBlock = &zero;
    24+    disk_block_index->m_hash_block = zero;
    


    promag commented at 11:00 am on August 11, 2020:
    Could remove? Already null right?

    JeremyRubin commented at 5:05 pm on August 11, 2020:
    done
  20. in src/test/fuzz/chain.cpp:62 in 0aa7e944b8 outdated
    58@@ -59,7 +59,7 @@ void test_one_input(const std::vector<uint8_t>& buffer)
    59     }
    60 
    61     CBlockIndex block_index{block_header};
    62-    block_index.phashBlock = &zero;
    63+    block_index.m_hash_block = zero;
    


    promag commented at 11:04 am on August 11, 2020:
    Could be removed.

    JeremyRubin commented at 5:05 pm on August 11, 2020:
    done
  21. JeremyRubin force-pushed on Aug 11, 2020
  22. Refactor BlockMap to use an unordered_set instead of an unordered_map fdc2f15c6b
  23. JeremyRubin force-pushed on Aug 11, 2020
  24. DrahtBot added the label Needs rebase on Sep 7, 2020
  25. DrahtBot commented at 9:15 am on September 7, 2020: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  26. DrahtBot commented at 11:21 am on December 15, 2021: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  27. JeremyRubin commented at 8:19 pm on December 15, 2021: contributor
    if there is a concept ACK from someone who would merge it on this approach I will rebase it, but since it touches a lot of places I don’t have time to rebase it continually.
  28. MarcoFalke commented at 8:38 pm on December 15, 2021: member

    Yeah, it is non-trivial to review so hard to tell.

    I wonder if this tiny layers can be peeled off of this or if this needs to be one chunk. Maybe the phashBlock change can be split out?

  29. JeremyRubin commented at 8:45 pm on December 15, 2021: contributor

    hmmm…

    it’s possible? but why?

    I see two split-up paths:

    Make phashblock m_hash_block and keep the map structure first

    This uses 24 bytes more memory at first, but doesn’t change the lookup data structure.

    Next, we could change the data structure to an indirect_map type and change the key to a pointer.

    Then, we could change to a set.

    Make the data structure a set first and keep phashblock

    This changes the data structure, and changes all the lookup sites.

    No extra storage, no savings.

    Uses of phashblock remain consistent, although it’s just a self-pointer.

    phashblock could become an inline method too, so you don’t change the memory usage, but do need to add ().

    Eventually switch to using m_hash_block.

  30. DrahtBot commented at 1:07 pm on March 21, 2022: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  31. MarcoFalke commented at 2:04 pm on March 21, 2022: member
    Is this still relevant after #24050?
  32. JeremyRubin commented at 2:47 pm on March 21, 2022: contributor
    yes, it’s completely orthogonal to https://github.com/bitcoin/bitcoin/pull/24050
  33. in src/validation.h:109 in fdc2f15c6b
    105+    mutable CBlockIndex mock;
    106+    BlockHasher() : mock() {};
    107+    size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
    108+    // Helper for querying by hash
    109+    // e.g., map.find(map.hash_function()(h))
    110+    CBlockIndex* operator()(const uint256& hash) {
    


    ryanofsky commented at 7:26 pm on March 21, 2022:

    In commit “Refactor BlockMap to use an unordered_set instead of an unordered_map” (fdc2f15c6b821f8ac0936cd35272a37305801723)

    I was confused by the idx.hash_function()(hash) stuff initially, and think it could be clearer if you gave this method a different name like FakeBlockIndex instead of operator(). idx.hash_function().FakeBlockIndex(hash) would give a clue that the method is returning a fake block index pointer, not taking the hash of a hash, and also avoid overloading the same operator in two ways that do different things conceptually.

    Also I guess you are using a shared mock CBlockIndex instance rather than creating temporary CBlockIndex objects for performance reasons, but it it would be interesting to know if taking a more straightforward with temporary mock objects would actually hurt performance:

    0struct FakeBlockIndex : public CBlockIndex
    1{
    2    FakeBlockIndex(const uint256& hash) { m_hash_block = hash; }
    3    FakeBlockIndex* pointer() { return this; }
    4};
    

    Since it would allow simplifying:

    0-        BlockMap::const_iterator  it = m_blockman.m_block_index.find(m_blockman.m_block_index.hash_function()(hashAssumeValid));
    1+        BlockMap::const_iterator  it = m_blockman.m_block_index.find(FakeBlockIndex(hashAssumeValid).pointer());
    

    and:

     0 struct BlockHasher
     1 {
     2     // this used to call `GetCheapHash()` in uint256, which was later moved; the
     3     // cheap hash function simply calls ReadLE64() however, so the end result is
     4     // identical
     5-    mutable CBlockIndex mock;
     6-    BlockHasher() : mock() {};
     7     size_t operator()(CBlockIndex* const& ptr) const { return ReadLE64(ptr->GetBlockHash().begin()); }
     8-    // Helper for querying by hash
     9-    // e.g., map.find(map.hash_function()(h))
    10-    CBlockIndex* operator()(const uint256& hash) {
    11-        mock.m_hash_block = hash;
    12-        return &mock;
    13-    }
    14 };
    

    JeremyRubin commented at 8:31 pm on March 21, 2022:

    I think it would since CBlockIndex is a very big struct?

    but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere


    ryanofsky commented at 9:41 pm on March 21, 2022:

    I think it would since CBlockIndex is a very big struct?

    but maybe we can make CBlockIndex have an abstract base class with just m_block_hash and then inherit from that for both? but then you have overhead everywhere

    I would probably not do the abstract base thing, since the reason for my suggestion was to simplify code, and trying to overload find that way would seem to add more complexity. Current approach here does seems fine. If dropping the hasher method would be bad for performance, I’d just rename it from operator() to FakeBlockIndex or something for clarity.


    JeremyRubin commented at 9:59 pm on March 21, 2022:

    i probably won’t have time to work on this until mid april FYI, but i can pick it up sometime then.

    if you’d like to pick it off and rebase it and push it through i would review it + ack.


    ryanofsky commented at 10:06 pm on March 21, 2022:
    Good to know. I’m working on some other CBlockIndex changes so might want to do this. No hurry though!
  34. ryanofsky commented at 7:35 pm on March 21, 2022: contributor
    Concept ACK. I started making the same change after reviewing #24050 and #24199. As noted in the PR description, C++20 will allow a simpler implementation of this because it adds unordered_set find and count method overloads that can accept block hashes, instead of CBlockIndex pointers (heterogenous lookups, https://www.cppstories.com/2021/heterogeneous-access-cpp20/). But since the find method is mostly wrapped anyway and not called directly, this is not a big deal.
  35. DrahtBot commented at 7:03 am on July 25, 2022: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  36. glozow commented at 10:01 am on September 26, 2022: member
    Closing as this has needed rebase for more than 2 years. Feel free to reopen if you get a chance to work on this again in the future, thanks!
  37. glozow closed this on Sep 26, 2022

  38. Rspigler commented at 10:34 pm on September 26, 2022: contributor
    Mark up for grabs?
  39. glozow added the label Up for grabs on Sep 27, 2022
  40. bitcoin locked this on Sep 27, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 13:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me