indexes: Add compact block filter headers cache #18960

pull jnewbery wants to merge 1 commit into bitcoin:master from jnewbery:2020-05-cfcheckpts-cache changing 3 files +40 −6
  1. jnewbery commented at 3:04 pm on May 12, 2020: member

    Cache block filter headers at heights of multiples of 1000 in memory.

    Block filter headers at height 1000x are checkpointed, and will be the most frequently requested. Cache them in memory to avoid costly disk reads.
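
    For illustration, a minimal self-contained sketch of the idea (an editor’s sketch, not the PR’s exact code; Bitcoin Core’s uint256 and Mutex are stood in for by std types):

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <mutex>

    using Hash256 = std::array<uint8_t, 32>; // stand-in for uint256

    constexpr int CFCHECKPT_INTERVAL{1000};         // checkpoint spacing in blocks
    constexpr size_t CF_HEADERS_CACHE_MAX_SZ{2000}; // hard cap on cache entries

    std::mutex g_cache_mutex;
    std::map<Hash256, Hash256> g_headers_cache; // block hash -> filter header

    // Cache a filter header, but only at checkpoint heights and only while
    // the cache is below its cap (entries are never evicted).
    void MaybeCacheHeader(int height, const Hash256& block_hash, const Hash256& header)
    {
        if (height % CFCHECKPT_INTERVAL != 0) return;
        std::lock_guard<std::mutex> lock{g_cache_mutex};
        if (g_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
            g_headers_cache.emplace(block_hash, header);
        }
    }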

  2. fanquake added the label UTXO Db and Indexes on May 12, 2020
  3. jnewbery commented at 3:06 pm on May 12, 2020: member
    This takes a different approach from the caching in #16442. I prefer it because it isolates all of the caching logic inside the indexer rather than in net processing. I’ll put the other approach up as a different branch for comparison.
  4. in src/index/blockfilterindex.cpp:39 in 6fe202209e outdated
    @@ -31,6 +31,8 @@ constexpr char DB_FILTER_POS = 'P';
     constexpr unsigned int MAX_FLTR_FILE_SIZE = 0x1000000; // 16 MiB
     /** The pre-allocation chunk size for fltr?????.dat files */
     constexpr unsigned int FLTR_FILE_CHUNK_SIZE = 0x100000; // 1 MiB
    +/** Maximum size of the cfheaders cache */
    +constexpr size_t CF_HEADERS_CACHE_MAX_SZ{2000};
    


    theStack commented at 3:18 pm on May 12, 2020:
    Is there any particular reason for limiting the size of the cache? With these numbers it would be full at block 2000000, i.e. in (2000000-630000)/(6 * 24 * 365) ~ 26 years.

    jnewbery commented at 8:08 pm on May 12, 2020:
    I wanted to have some limit so that a bug doesn’t turn into a memory exhaustion possibility (since we only ever add to this map, and never remove). I don’t think the exact number matters. Presumably in 26 years, anyone who wants to serve compact block filters from a v0.21 Bitcoin Core server will be able to patch their own code.

    theStack commented at 11:26 am on May 13, 2020:
    I agree that when in doubt, it’s better to be overcautious than to risk introducing a memory exhaustion bug – I just wanted to make sure there are no other reasons for the limit that I’m not aware of.
  5. theStack commented at 3:21 pm on May 12, 2020: member
    Concept ACK – I didn’t look very deeply into the proposed solution from the original PR (#16442, commit https://github.com/bitcoin/bitcoin/pull/16442/commits/be5f7c5f8e98416f95d3bd91ab5639b67b28eef2), but this solution here seems to be way simpler.
  6. DrahtBot commented at 7:52 am on May 13, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18354 (Protect wallet by using shared pointers by bvbfan)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. theStack commented at 12:10 pm on May 13, 2020: member

    I’m almost ready to ACK this PR; I just have some minor suggestion/discussion points:

    • Would it make sense to also only check the cache if we have checkpoints, i.e. a method structure like
    bool is_checkpoint = (block_index->nHeight % CFCHECKPT_INTERVAL == 0);
    if (is_checkpoint)
        // Check the headers cache

    // Regular lookup

    if (is_checkpoint)
        // Add to the headers cache
    

    ? Of course, the drawback of this approach is that the checking condition needs to be removed if the cache is ever used for filter headers other than those at the checkpoints. I’m also not sure if it’s really worth it from a performance point of view, but in 999 out of 1000 cases the lookup in the map could be skipped.

    • In the regular case the cache should never get full, considering that by design this would only happen in the far future (block 2000000, >25 years from now, with the current maximum size constant of 2000). If it does get full, however, should we take some action like emitting a debug log message or even an assert, so that we notice that we either have a memory exhaustion bug or that the maximum size needs to be increased?
    • Any idea how to effectively test performance improvements like this? I don’t know of anything better than putting some LogPrintf calls in the cache-hit / cache-add paths, running the (maybe slightly modified) functional test and analyzing the log output to check that it behaves as expected.
  8. in src/index/blockfilterindex.h:36 in 6fe202209e outdated
    @@ -30,6 +33,9 @@ class BlockFilterIndex final : public BaseIndex
         bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const;
         size_t WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter);

    +    RecursiveMutex m_cs_headers_cache;
    


    MarcoFalke commented at 12:43 pm on May 13, 2020:
        Mutex m_cs_headers_cache;
    

    nit (feel free to ignore). I haven’t tested this, but from reading the code, a non-recursive mutex should be sufficient.


    jnewbery commented at 7:30 pm on May 13, 2020:
    Agree. I’ll change to Mutex.

    jnewbery commented at 10:44 pm on May 13, 2020:
    Done
  9. MarcoFalke commented at 12:46 pm on May 13, 2020: member
    Agree with @theStack that performance improvements should be benchmarked. This could turn into another improvement that is actually a slow-down, see #18352.
  10. in src/index/blockfilterindex.cpp:414 in 6fe202209e outdated

    +    if (block_index->nHeight % CFCHECKPT_INTERVAL == 0) {
    +        // Add to the headers cache
    +        LOCK(m_cs_headers_cache);
    +        if (m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
    +            m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
    


    fjahr commented at 2:59 pm on May 13, 2020:
    This is a very minor issue, but if we use the hash instead of the height as the key, we could end up with headers of orphaned blocks in the cache, since they will not be overwritten or removed. This could also be the case if CFCHECKPT_INTERVAL were ever to change. I know the chances of that happening are slim, and if it happens we can deal with it then; I just wanted to put it out there.

    clarkmoody commented at 6:14 pm on May 13, 2020:

    The test suite explicitly checks that filters for stale blocks may still be fetched.

    https://github.com/jnewbery/bitcoin/blob/2020-05-cfcheckpts-cache/test/functional/p2p_blockfilters.py#L89-L101


    jnewbery commented at 7:29 pm on May 13, 2020:

    we could end up with headers of orphaned blocks in the cache

    Correct. If there’s a re-org across a checkpoint block, then the disconnected block’s filter header stays in the cache. That’s an explicit design decision, since:

    if CFCHECKPT_INTERVAL would ever change

    In that case, we’d need to change the software, which would involve a stop/start. This is an in-memory cache, so it gets dropped on shutdown.

  11. fjahr commented at 3:31 pm on May 13, 2020: member
    Agree with previous comments: I prefer this to the previous approach. I think it would be good to have some benchmarks. Otherwise, code looks very good to me.
  12. in src/index/blockfilterindex.cpp:413 in 6fe202209e outdated
         }

    +    if (block_index->nHeight % CFCHECKPT_INTERVAL == 0) {
    +        // Add to the headers cache
    +        LOCK(m_cs_headers_cache);
    +        if (m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
    


    jonasschnelli commented at 6:24 pm on May 13, 2020:
    Doesn’t this statically cap (and freeze forever) the cache after inserting 2000 entries? Wouldn’t it make sense to have some sort of invalidation logic? (I can’t come up with a good one though.)

    MarcoFalke commented at 6:32 pm on May 13, 2020:

    If someone can mine enough blocks to fill up this cache, we have other problems. I think an adversary needs to mine on the order of the 2000th triangular number of blocks (2000 · 2001 / 2 ≈ 2 million) to fill this up.

    And when this cache is full due to natural reasons (in 20 years), disks will be fast enough that we won’t need a cache.


    MarcoFalke commented at 6:33 pm on May 13, 2020:
    Maybe disks are fast enough or smart enough with caching today already, so it might not be needed after all. A benchmark would be nice…

    jonasschnelli commented at 6:38 pm on May 13, 2020:
    Oh. You’re right. I missed the % CFCHECKPT_INTERVAL == 0.

    jnewbery commented at 7:31 pm on May 13, 2020:

    Remind me to change this to 4000 in 2026 :)

    I agree that a benchmark would be nice.


    MarcoFalke commented at 7:33 pm on May 13, 2020:
    // TODO change to 4000 in 2026
    

    ;)


    jnewbery commented at 7:38 pm on May 13, 2020:
    oops sorry, I meant 2046
  13. jonasschnelli commented at 6:25 pm on May 13, 2020: contributor
    Concept ACK
  14. jnewbery force-pushed on May 13, 2020
  15. jnewbery force-pushed on May 13, 2020
  16. jnewbery commented at 10:48 pm on May 13, 2020: member

    @theStack

    Would it make sense to also only check the cache if we have checkpoints, i.e. a method structure like

    Done.

    @MarcoFalke

    a non-recursive mutex should be sufficient.

    Changed RecursiveMutex -> Mutex

    I’m still working on adding a microbench, although it’s not clear to me that it’ll show any improvement since the disk reads may be cached by the OS for the microbench.

  17. in src/index/blockfilterindex.h:37 in 7ad83ed252 outdated
    @@ -30,6 +33,9 @@ class BlockFilterIndex final : public BaseIndex
         bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const;
         size_t WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter);

    +    Mutex m_cs_headers_cache;
    +    std::map<uint256, uint256> m_headers_cache GUARDED_BY(m_cs_headers_cache);
    


    jkczyz commented at 6:00 am on May 14, 2020:
    Any reason std::unordered_map can’t be used? I see that the tree-based std::map and std::set are used primarily throughout the codebase, but figured it may just be historical. Is there another reason these are preferred?

    jnewbery commented at 2:07 am on May 15, 2020:

    I expect mostly laziness. With an unordered_map, you need to provide a hash function. In this case, it’s easy. The data is all hash digests from block chain data, so I can just take the first 8 bytes from the uint256 filter header.

    I’ll make that change tomorrow.
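
    For illustration, such a hasher can be as simple as this (an editor’s sketch: a std::array stands in for uint256, whereas Bitcoin Core’s actual hasher reads from uint256::begin()):

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    using Hash256 = std::array<uint8_t, 32>; // stand-in for uint256

    struct FilterHeaderHasher
    {
        // The keys are already uniformly distributed hash digests, so the
        // first 8 bytes can serve directly as the bucket hash.
        size_t operator()(const Hash256& h) const
        {
            uint64_t v;
            std::memcpy(&v, h.data(), sizeof(v));
            return static_cast<size_t>(v);
        }
    };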


    MarcoFalke commented at 10:23 am on May 15, 2020:
    Just as a reminder: There is no free lunch and unordered_map comes with a size penalty

    jnewbery commented at 4:17 pm on May 15, 2020:
    I’ve reimplemented this as an unordered_map
  18. jkczyz commented at 6:05 am on May 14, 2020: contributor

    Concept ACK.

    Kudos on the simplicity of this approach!

  19. theStack referenced this in commit e7bb43ec6d on May 14, 2020
  20. theStack referenced this in commit 1d504570d1 on May 14, 2020
  21. theStack commented at 4:56 pm on May 14, 2020: member

    Using jnewbery’s nice node-shell module (a modification of test-shell that allows connecting the functional test framework to a running full node), I crafted a quick-and-dirty test script (see branch https://github.com/theStack/bitcoin/tree/pr18960_test) that sends getcfcheckpt requests with the stop hash of the halving block 630000 and measures the processing time.

    Freshly started bitcoind from master branch (w/o checkpoint headers cache implementation), after mempool imports are done:

    getcfcheckpt request processing times
    =====================================
    -> first:     53.75ms
    -> following: 50.70ms (average)
    =====================================
    

    Freshly started bitcoind from PR branch (with checkpoint headers cache implementation), after mempool imports are done:

    getcfcheckpt request processing times
    =====================================
    -> first:     50.92ms
    -> following: 50.80ms (average)
    =====================================
    

    There doesn’t seem to be a noticeable difference, though I don’t know if my methodology is flawed. Maybe the surrounding processing takes so much more time that the actual fetching of the header (from the cache or from disk) is negligible. If anyone has an idea how to improve the benchmark script, I’m happy to hear suggestions!

    // EDIT: As jnewbery found out in the course of today’s PR review club meeting, there is a good explanation of why the values are always close to 50ms:

    19:36 < jnewbery> There are 100ms sleeps in the messagehandler thread between looping through all peers:
                      https://github.com/jnewbery/bitcoin/blob/7ad83ed252914d6d5b8ed81f103aecf052c68fb7/src/net.cpp#L2061
    19:37 < jnewbery> on a non-busy node, 50ms is the average time you have to wait for the message handler thread to wake up
    
  22. michaelfolkson commented at 6:36 pm on May 14, 2020: contributor
    Approach ACK
  23. laanwj added this to the "Blockers" column in a project

  24. MarcoFalke commented at 10:26 pm on May 14, 2020: member
    @theStack I believe in between the requests you can drop the cache to simulate eviction (which may occasionally happen during normal operation). See https://unix.stackexchange.com/questions/17936/setting-proc-sys-vm-drop-caches-to-clear-cache
  25. jnewbery commented at 1:36 am on May 15, 2020: member
    @narula suggested in today’s PR Review Club meeting that it’d be less cognitive overhead to simply take the m_cs_headers_cache for the entirety of the LookupFilterHeader() function, rather than taking, releasing and taking it again. That seems like a reasonable request to me. We can’t ever block important work with this lock, so holding it a bit longer than strictly necessary is not an issue.
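
    For illustration, the single-lock structure suggested there looks roughly like this (an editor’s sketch assembled from the code fragments quoted elsewhere in this thread, not the exact commit; LookupOne() stands in for the index-DB read):

    bool BlockFilterIndex::LookupFilterHeader(const CBlockIndex* block_index, uint256& header_out)
    {
        LOCK(m_cs_headers_cache);

        bool is_checkpoint{block_index->nHeight % CFCHECKPT_INTERVAL == 0};

        if (is_checkpoint) {
            // Try the in-memory cache first.
            auto header = m_headers_cache.find(block_index->GetBlockHash());
            if (header != m_headers_cache.end()) {
                header_out = header->second;
                return true;
            }
        }

        DBVal entry;
        if (!LookupOne(*m_db, block_index, entry)) return false;

        if (is_checkpoint && m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
            // Cache checkpoint headers; the cap guards against unbounded growth.
            m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
        }

        header_out = entry.header;
        return true;
    }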
  26. jnewbery commented at 1:56 am on May 15, 2020: member

    @theStack - thank you for doing some benchmarking of this. Your approach inspired me to do something similar myself, using debug logs to find the exact time to construct a cfcheckpt response. I’ve added a debug_log_delta context manager to the node_shell, which allows me to provide two debug log messages, then do actions on the node and extract the time delta between those two logs. I applied that patch to:

    • without a cache
    • with this cache
    • with @narula’s suggestion of taking the lock only once (I was interested to see if there were any performance differences).

    And then ran the following node-shell script:

    REPO_PATH="<path_to_repo>"
    DATADIR_PATH="<path_to_datadir>"
    import sys
    sys.path.insert(0, f"{REPO_PATH}/test/functional")
    from test_framework.node_shell import NodeShell
    test = NodeShell()
    test.setup(datadir=DATADIR_PATH)
    # <test_framework.node_shell.NodeShell.__TestShell object at 0x7f7704820490>
    node = test.nodes[0]
    bb = node.getbestblockhash()
    from test_framework.messages import FILTER_TYPE_BASIC, msg_getcfcheckpt
    request = msg_getcfcheckpt(filter_type=FILTER_TYPE_BASIC, stop_hash=int(bb, 16))
    for i in range(15):
        with node.debug_log_delta("getcfcheckpt request received", "cfcheckpt response constructed"):
            node.p2ps[0].send_and_ping(request)
    test.shutdown()
    

    no cache

    getcfcheckpt request received to cfcheckpt response constructed: 7.824ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.054ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.021ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.303ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.355ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.26ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.169ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.166ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.194ms
    getcfcheckpt request received to cfcheckpt response constructed: 8.574ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.119ms
    getcfcheckpt request received to cfcheckpt response constructed: 7.156ms
    getcfcheckpt request received to cfcheckpt response constructed: 4.153ms
    getcfcheckpt request received to cfcheckpt response constructed: 8.162ms
    getcfcheckpt request received to cfcheckpt response constructed: 6.485ms
    

    There does seem to be a very slight performance improvement after the first request. Maybe around 5-10% from the mean, but variance is quite high, so it might be noise.

    cache (taking locks twice)

    getcfcheckpt request received to cfcheckpt response constructed: 12.2ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.855ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.992ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.869ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.141ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.692ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.452ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.916ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.413ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.648ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.816ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.84ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.812ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.869ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.445ms
    

    The initial request is slower (presumably because the cache needs to be warmed), but subsequent requests are on average >50% faster than with no cache.

    cache (taking lock once)

    getcfcheckpt request received to cfcheckpt response constructed: 11.785ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.35ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.19ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.475ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.175ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.345ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.274ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.161ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.259ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.217ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.238ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.253ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.33ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.32ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.36ms
    

    The first request is slightly faster than when taking locks twice (although that might be noise). Subsequent requests are the same speed as the ’taking locks twice’ branch, which is what we’d expect, because locks are only taken once in that branch if the header is found in the cache. @MarcoFalke - I think these manual full-node benchmarks are more meaningful than microbenches, so I’m not planning to add any automated benchmarks to /src/bench.

  27. jnewbery commented at 2:08 am on May 15, 2020: member

    @narula suggested in today’s PR Review Club meeting that it’d be less cognitive overhead to simply take the m_cs_headers_cache for the entirety of the LookupFilterHeader() function, rather than taking, releasing and taking it again.

    I’ll make this change tomorrow.

  28. theStack commented at 10:37 am on May 15, 2020: member

    In yesterday’s review club meeting on this PR, it was suggested to put LogPrintf() calls directly into the network processing code to measure the execution time of getcfcheckpt requests. (My previous approach of measuring the request response time from the test framework was not precise enough, as the time also included Python overhead and the time for the message handler to wake up.) I did some testing with this refined approach and can roughly confirm jnewbery’s results (see commit https://github.com/theStack/bitcoin/commit/48cff1bb4d27ef0040b18a860a3c897752e3c296):

    without cache:

    2020-05-14T20:15:11Z getcfcheckpt processing took 10247us
    2020-05-14T20:15:11Z getcfcheckpt processing took 6105us
    2020-05-14T20:15:11Z getcfcheckpt processing took 6437us
    2020-05-14T20:15:11Z getcfcheckpt processing took 5330us
    2020-05-14T20:15:11Z getcfcheckpt processing took 6964us
    2020-05-14T20:15:11Z getcfcheckpt processing took 5523us
    2020-05-14T20:15:12Z getcfcheckpt processing took 5413us
    2020-05-14T20:15:12Z getcfcheckpt processing took 5843us
    2020-05-14T20:15:12Z getcfcheckpt processing took 7060us
    2020-05-14T20:15:12Z getcfcheckpt processing took 6165us
    ...
    

    with cache:

    2020-05-14T20:21:26Z getcfcheckpt processing took 9801us
    2020-05-14T20:21:26Z getcfcheckpt processing took 2619us
    2020-05-14T20:21:26Z getcfcheckpt processing took 3338us
    2020-05-14T20:21:26Z getcfcheckpt processing took 2446us
    2020-05-14T20:21:26Z getcfcheckpt processing took 2560us
    2020-05-14T20:21:27Z getcfcheckpt processing took 2331us
    2020-05-14T20:21:27Z getcfcheckpt processing took 2703us
    2020-05-14T20:21:27Z getcfcheckpt processing took 2614us
    2020-05-14T20:21:27Z getcfcheckpt processing took 2598us
    2020-05-14T20:21:27Z getcfcheckpt processing took 2444us
    ...
    

    One can see that there’s a speedup of at least 2x with the cache.

    Note that my approach to get these numbers was slightly different, as I directly measured the time in bitcoind via GetTimeMicros() (commit https://github.com/theStack/bitcoin/commit/49f5f3d5444ed6c872dd41be910e2149308b35e7):

    diff --git a/src/net_processing.cpp b/src/net_processing.cpp
    index 1df1fab..6e620d9 100644
    --- a/src/net_processing.cpp
    +++ b/src/net_processing.cpp
    @@ -3380,7 +3380,10 @@ bool ProcessMessage(CNode* pfrom, const std::string& msg_type, CDataStream& vRec
         }

         if (msg_type == NetMsgType::GETCFCHECKPT) {
    +        int64_t start_time = GetTimeMicros();
             ProcessGetCFCheckPt(pfrom, vRecv, chainparams, connman);
    +        int64_t processing_duration = GetTimeMicros() - start_time;
    +        LogPrintf("getcfcheckpt processing took %lldus\n", processing_duration);
             return true;
         }
    

    but the idea is the same.

    Now I’m very curious whether the change from std::map to std::unordered_map shows a noticeable difference in performance. @MarcoFalke: Nice idea to drop the cache between the requests. Unfortunately, I don’t have the permissions (root rights) on my current development box to do this :/ if anyone has the ability to try it, the results would be very interesting.

  29. jnewbery force-pushed on May 15, 2020
  30. jnewbery commented at 4:17 pm on May 15, 2020: member

    @narula

    it’d be less cognitive overhead to simply take the m_cs_headers_cache for the entirety of the LookupFilterHeader() function

    Done.

  31. in src/index/blockfilterindex.cpp:34 in 4018953ee9 outdated
    @@ -31,6 +31,8 @@ constexpr char DB_FILTER_POS = 'P';
     constexpr unsigned int MAX_FLTR_FILE_SIZE = 0x1000000; // 16 MiB
     /** The pre-allocation chunk size for fltr?????.dat files */
     constexpr unsigned int FLTR_FILE_CHUNK_SIZE = 0x100000; // 1 MiB
    +/** Maximum size of the cfheaders cache */
    


    narula commented at 4:24 pm on May 15, 2020:
    Nit: give justification in comments for magic numbers, so future readers have some idea of how the value was chosen and if/when it will need to be adjusted.

    jnewbery commented at 4:48 pm on May 15, 2020:
    done
  32. in src/index/blockfilterindex.cpp:421 in 4018953ee9 outdated

    +    // Add to the headers cache
    +    if (m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
    +        m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
    +    }
    +
    


    jkczyz commented at 4:34 pm on May 15, 2020:
    You’ll still want to condition this on block_index->nHeight % CFCHECKPT_INTERVAL == 0, otherwise the most recent 2000 requested headers will be cached rather than the desired ones.

    jnewbery commented at 4:35 pm on May 15, 2020:
    yikes. Thanks

    jnewbery commented at 4:48 pm on May 15, 2020:
    done
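
    The corrected insertion path thus gates on the checkpoint interval again; roughly (an editor’s sketch of the fix, not the exact commit):

    // Only checkpoint-height headers enter the cache, and only while the
    // cache is below its cap; otherwise the most recently requested headers
    // would be cached instead of the checkpointed ones.
    if (block_index->nHeight % CFCHECKPT_INTERVAL == 0 &&
        m_headers_cache.size() < CF_HEADERS_CACHE_MAX_SZ) {
        m_headers_cache.emplace(block_index->GetBlockHash(), entry.header);
    }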
  33. narula commented at 4:40 pm on May 15, 2020: contributor
    utACK
  34. jnewbery commented at 4:41 pm on May 15, 2020: member

    I’ve force-pushed a branch that changes this to use an unordered_map and only lock once.

    Comparing the unordered map implementation to map:

    unordered_map

    getcfcheckpt request received to cfcheckpt response constructed: 12.808ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.783ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.712ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.656ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.672ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.82ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.994ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.068ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.335ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.444ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.159ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.194ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.601ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.692ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.982ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.209ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.116ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.136ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.666ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.091ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.444ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.385ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.147ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.145ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.053ms
    

    map

    getcfcheckpt request received to cfcheckpt response constructed: 12.959ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.001ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.656ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.07ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.537ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.838ms
    getcfcheckpt request received to cfcheckpt response constructed: 2.994ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.143ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.383ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.666ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.889ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.553ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.394ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.3ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.5ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.372ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.329ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.308ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.564ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.746ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.714ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.423ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.517ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.636ms
    getcfcheckpt request received to cfcheckpt response constructed: 3.595ms
    

    So from this single test, it appears that an unordered_map is indeed faster. That’s what we’d expect: a single lookup is O(1) and the number of lookups required for a getcfcheckpt request is O(n), so the total is O(n). For a map, a single lookup is O(log n), so the total is O(n log n).

    The original implementation is faster still, since it stores the checkpoints in a single vector, but the aim of this PR is to avoid disk reads for getcfcheckpt messages, which the simple unordered_map implementation achieves. Further performance improvements could be made as a follow-up if required.
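
    For comparison, a vector-based layout along those lines might look like this (an editor’s sketch under the assumption that checkpoint i, i.e. height i·1000, sits at index i−1; not the actual #16442 code):

    std::vector<uint256> m_checkpoint_headers; // header for height (i+1)*1000 at index i

    bool LookupCheckpointHeader(int height, uint256& header_out)
    {
        if (height <= 0 || height % CFCHECKPT_INTERVAL != 0) return false;
        const size_t idx = static_cast<size_t>(height / CFCHECKPT_INTERVAL) - 1;
        if (idx >= m_checkpoint_headers.size()) return false;
        header_out = m_checkpoint_headers[idx]; // O(1), no hashing needed
        return true;
    }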

  35. jnewbery force-pushed on May 15, 2020
  36. jnewbery commented at 4:48 pm on May 15, 2020: member
    Feedback from @narula and @jkczyz addressed. Thanks!
  37. jkczyz commented at 5:55 pm on May 15, 2020: contributor

    ACK b6eba9e492e1dfb38baec894bfda1cfb4d709e5d

    Ran all tests.

  38. fjahr commented at 7:30 pm on May 15, 2020: member
    utACK b6eba9e492e1dfb38baec894bfda1cfb4d709e5d
  39. in src/index/blockfilterindex.cpp:399 in b6eba9e492 outdated
    @@ -387,13 +393,26 @@ bool BlockFilterIndex::LookupFilter(const CBlockIndex* block_index, BlockFilter&
         return ReadFilterFromDisk(entry.pos, filter_out);
     }

    -bool BlockFilterIndex::LookupFilterHeader(const CBlockIndex* block_index, uint256& header_out) const
    +bool BlockFilterIndex::LookupFilterHeader(const CBlockIndex* block_index, uint256& header_out)
     {
    +    LOCK(m_cs_headers_cache);
    +    auto header = m_headers_cache.find(block_index->GetBlockHash());
    


    theStack commented at 10:49 am on May 17, 2020:
    Before the last change, the cache lookup only happened for checkpoint blocks (i.e. block height modulo 1000 is zero). Is the removal of this condition intended?

    jnewbery commented at 4:54 pm on May 18, 2020:
    No. Not intentional. I just messed up splitting out the LOCK(m_cs_headers_cache). Thanks for catching this.
  40. jnewbery force-pushed on May 18, 2020
  41. [indexes] Add compact block filter headers cache
    Cache block filter headers at heights of multiples of 1000 in memory.
    
    Block filter headers at height 1000x are checkpointed, and will be the
    most frequently requested. Cache them in memory to avoid costly disk
    reads.
    0187d4c118
  42. jnewbery force-pushed on May 18, 2020
  43. jnewbery commented at 4:55 pm on May 18, 2020: member
    Thanks for the re-review @jkczyz and @theStack, and for catching the problems with checking checkpoint heights. Should be fixed now.
  44. jkczyz commented at 5:22 pm on May 18, 2020: contributor
    ACK 0187d4c118ab4c0f5c2d4fb180c2a8dea8ac53cf
  45. jnewbery renamed this:
    [indexes] Add compact block filter headers cache
    indexes: Add compact block filter headers cache
    on May 18, 2020
  46. theStack commented at 9:33 am on May 19, 2020: member

    ACK 0187d4c118ab4c0f5c2d4fb180c2a8dea8ac53cf 🎉

    By the way, I’m neutral on the question of whether to use std::map or std::unordered_map for the cache – std::unordered_map does indeed seem to lead to some performance improvement (my tests show a rough 10% improvement, see numbers below; jnewbery’s tests show an even higher improvement of close to 20%), but on the other hand it takes both more space (as pointed out by MarcoFalke) and more code, due to the need to provide a hash function. Maybe someone has a clearer opinion on the choice; personally I’m okay with ACKing either version.

    Test with std::map:

    2020-05-19T09:09:01Z getcfcheckpt processing took 13817us
    2020-05-19T09:09:01Z getcfcheckpt processing took 2855us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3573us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3444us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3236us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3385us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3543us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3378us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3450us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3041us
    2020-05-19T09:09:01Z getcfcheckpt processing took 3561us
    ...
    

    → Average subsequent lookup: 3.346ms

    Test with std::unordered_map:

    2020-05-19T08:59:53Z getcfcheckpt processing took 12205us
    2020-05-19T08:59:53Z getcfcheckpt processing took 3650us
    2020-05-19T08:59:53Z getcfcheckpt processing took 4660us
    2020-05-19T08:59:53Z getcfcheckpt processing took 3393us
    2020-05-19T08:59:53Z getcfcheckpt processing took 3669us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2919us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2565us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2804us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2348us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2511us
    2020-05-19T08:59:53Z getcfcheckpt processing took 2264us
    ...
    

    → Average subsequent lookup: 3.078ms

  47. fjahr commented at 2:08 pm on May 19, 2020: member

    re-utACK 0187d4c118ab4c0f5c2d4fb180c2a8dea8ac53cf

    Compared to my last review, an is_checkpoint variable is introduced for better readability, and the if statement that was removed in error is added back: https://github.com/bitcoin/bitcoin/compare/b6eba9e492e1dfb38baec894bfda1cfb4d709e5d..0187d4c118ab4c0f5c2d4fb180c2a8dea8ac53cf

  48. in src/index/blockfilterindex.h:42 in 0187d4c118
    @@ -30,6 +38,9 @@ class BlockFilterIndex final : public BaseIndex
         bool ReadFilterFromDisk(const FlatFilePos& pos, BlockFilter& filter) const;
         size_t WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter);

    +    Mutex m_cs_headers_cache;
    +    std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);
    


    ariard commented at 1:25 am on May 21, 2020:
    If you have to change the PR again, you may want to add a comment noting that the key is the block hash and the mapped value is the filter header; since both are of the same type, it may be confusing at first look.

    jnewbery commented at 3:06 pm on May 21, 2020:
    Good point. I’ll add a comment if I need to retouch the branch.
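
    Something along these lines, for instance (an editor’s sketch of the suggested comment, not an actual commit):

    /** Cache of block hash -> filter header, to avoid disk access when
     *  responding to getcfcheckpt requests. The key is the block hash and the
     *  mapped value is the filter header; both are uint256, hence the comment. */
    std::unordered_map<uint256, uint256, FilterHeaderHasher> m_headers_cache GUARDED_BY(m_cs_headers_cache);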
  49. ariard commented at 1:29 am on May 21, 2020: member

    Code Review ACK 0187d4c.

    I prefer this approach of caching inside the index instead of in net_processing, compared to #16442; it reduces cognitive load. I also agree with ignoring reorgs as argued: the probability of the event is low enough and the overhead cheap enough to justify having some junk in the cache.

  50. laanwj commented at 5:34 pm on May 21, 2020: member
    code review ACK 0187d4c118ab4c0f5c2d4fb180c2a8dea8ac53cf
  51. laanwj merged this on May 21, 2020
  52. laanwj closed this on May 21, 2020

  53. laanwj removed this from the "Blockers" column in a project

  54. jnewbery deleted the branch on May 21, 2020
  55. sidhujag referenced this in commit ba41194e48 on May 21, 2020
  56. jasonbcox referenced this in commit a4c5ff8ec7 on Nov 19, 2020
  57. kittywhiskers referenced this in commit 3dfe7bd4d5 on Aug 22, 2021
  58. kittywhiskers referenced this in commit bd73a6b004 on Aug 22, 2021
  59. kittywhiskers referenced this in commit 26f05134c8 on Sep 16, 2021
  60. kittywhiskers referenced this in commit b5c3500676 on Sep 18, 2021
  61. kittywhiskers referenced this in commit 5996dbe686 on Sep 19, 2021
  62. UdjinM6 referenced this in commit 4ffd42de63 on Sep 21, 2021
  63. thelazier referenced this in commit 853c6f587b on Sep 25, 2021
  64. kittywhiskers referenced this in commit 6cedb20e18 on Oct 12, 2021
  65. DrahtBot locked this on Feb 15, 2022
