sdaftuar
commented at 8:35 pm on July 26, 2022:
member
New nodes starting up for the first time lack protection against DoS from low-difficulty headers. While checkpoints serve as our protection against headers that fork from the main chain below the known checkpointed values, this protection only applies to nodes that have been able to download the honest chain to the checkpointed heights.
We can protect all nodes from DoS from low-difficulty headers by adopting a different strategy: before we commit to storing a header in permanent storage, first verify that the header is part of a chain that has sufficiently high work (either nMinimumChainWork, or something comparable to our tip). This means that we will download headers from a given peer twice: once to verify the work on the chain, and a second time when permanently storing the headers.
The p2p protocol doesn’t provide an easy way for us to ensure that we receive the same headers during the second download of a peer’s headers chain. To ensure that a peer doesn’t (say) give us the main chain in phase 1 and then trick us into permanently storing an alternate, low-work chain in phase 2, we store commitments to the headers during our first download, which we validate during the second.
Some parameters must be chosen for the commitment size/frequency in phase 1, and for validation of commitments in phase 2. In this PR, those parameters are chosen to both (a) minimize the per-peer memory usage that an attacker could utilize, and (b) bound the expected amount of permanent memory that an attacker could get us to use well below the memory growth we’d get from the honest chain (where we expect one new block header every 10 minutes).
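To make the commitment scheme concrete, here is a minimal sketch of the two-phase bookkeeping (not the PR’s actual code: the sampling frequency matches the PR’s HEADER_COMMITMENT_FREQUENCY, but the salted hash and types are stand-ins for what the implementation uses):

#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <random>
#include <string>

using Hash = std::string; // stand-in for a block hash (uint256 in the real code)

class CommitmentTracker {
    uint64_t m_salt;                // per-sync salt, so bits can't be precomputed
    std::deque<bool> m_commitments; // ~1 bit per FREQUENCY headers downloaded
    static constexpr size_t FREQUENCY{571};

    bool CommitmentBit(const Hash& block_hash) const {
        // Stand-in for the salted hasher; the PR uses a keyed hash of the block hash.
        return (std::hash<Hash>{}(block_hash) ^ m_salt) & 1;
    }

public:
    CommitmentTracker() : m_salt{std::random_device{}()} {}

    // Phase 1: while measuring the chain's work, remember one bit per FREQUENCY headers.
    void RecordPhase1(size_t height, const Hash& block_hash) {
        if (height % FREQUENCY == 0) m_commitments.push_back(CommitmentBit(block_hash));
    }

    // Phase 2: the redownloaded chain must reproduce every remembered bit, in
    // order; a mismatch means the peer switched chains between the two phases.
    bool CheckPhase2(size_t height, const Hash& block_hash) {
        if (height % FREQUENCY != 0) return true;
        if (m_commitments.empty()) return false;
        const bool expected{m_commitments.front()};
        m_commitments.pop_front();
        return expected == CommitmentBit(block_hash);
    }
};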
After this PR, we should be able to remove checkpoints from our code, which is a nice philosophical change for us to make as well, as there has been confusion over the years about the role checkpoints play in Bitcoin’s consensus algorithm.
Thanks to Pieter Wuille for collaborating on this design.
DrahtBot added the label
P2P
on Jul 26, 2022
sdaftuar force-pushed
on Jul 26, 2022
jamesob
commented at 11:32 pm on July 26, 2022:
member
Concept ACK, will review soon
lish2099 approved
DrahtBot
commented at 6:03 am on July 27, 2022:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#25923 (p2p, bugfix: always provide CNodeStateStats, ensure v23 getpeerinfo API for v24 by jonatack)
#25725 (consensus: Remove mainnet checkpoints by sdaftuar)
#25673 (refactor: make member functions const when applicable by aureleoules)
#25555 (refactor: Move m_num_unconnecting_headers_msgs out of cs_main guard by MarcoFalke)
#24571 (p2p: Prevent block index fingerprinting by sending additional getheaders messages by dergoegge)
#23443 (p2p: Erlay support signaling by naumenkogs)
#23352 (test: Extend stale_tip_peer_management test by MarcoFalke)
#18933 (rpc: Add submit option to generateblock by MarcoFalke)
#18470 (test: Extend stale tip test by MarcoFalke)
#17860 (fuzz: BIP 42, BIP 30, CVE-2018-17144 by MarcoFalke)
#16981 (Improve runtime performance of -reindex by LarryRuane)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
glozow requested review from dergoegge
on Jul 27, 2022
glozow requested review from mzumsande
on Jul 27, 2022
glozow requested review from ajtowns
on Jul 27, 2022
sdaftuar force-pushed
on Jul 27, 2022
sdaftuar force-pushed
on Jul 27, 2022
sdaftuar force-pushed
on Jul 27, 2022
sdaftuar force-pushed
on Jul 27, 2022
sdaftuar force-pushed
on Jul 27, 2022
sdaftuar
commented at 3:59 pm on July 27, 2022:
member
It’s worth pointing out that initial headers sync will be slowed down by this PR, and not just because we download headers twice before storing them. The added latency before we make progress with our initial headers-sync peer also makes it more likely that a block announcement arrives while headers sync is still in progress, which in turn triggers a headers sync with every peer that announces the block. (And our starting point for those syncs will be further behind, because we don’t make progress on adding headers to our block index until phase 2.)
I opened #25720 as a mitigation for this effect; many strategies are possible and I think they are orthogonal to the change proposed here, so I’d prefer to leave the discussion of this particular issue to that PR.
luke-jr
commented at 7:53 pm on July 27, 2022:
member
This seems to be based on the assumption that the DoS is attacking disk space, but bandwidth tends to be more limited than space, and it makes that worse even in the best scenario…?
sipa
commented at 8:24 pm on July 27, 2022:
member
@luke-jr The DoS concern is primarily about memory usage: filling mapBlockIndex and other data structures with low-difficulty headers before a checkpoint is reached.
Bandwidth is a concern for sure, but:
There are many more effective ways to perform bandwidth DoS (like spamming INV messages, ADDR messages, PING messages, continuously requesting non-existing transactions or blocks, …).
The proper solution to bandwidth DoS is just throttling peers that waste too much of it.
In non-attack scenarios, the added bandwidth by this PR (especially when combined with #25720) is in the order of 10s of MB over a node’s lifetime, which is negligible compared to the full block download.
luke-jr
commented at 8:33 pm on July 27, 2022:
member
Is there a reason to not just prune the block index under some conditions?
sipa
commented at 9:14 pm on July 27, 2022:
member
It may make sense to include the script in the repository (as I can imagine it being re-run to tune things occasionally, but there should not be a need to do it more than every few years). It could also live on the devwiki or just linked-to in the code. Opinions?
sipa
commented at 10:29 pm on July 27, 2022:
member
@luke-jr I think that’s a potentially useful but orthogonal improvement, though there is probably much less need for it after this PR.
215+ if ((current_height - m_chain_start->nHeight) % HEADER_COMMITMENT_FREQUENCY == 0) {
216+ // Try to add a commitment.
217+ m_header_commitments.push_back(m_hasher(current.GetHash()) & 1);
218+ if (m_header_commitments.size() > m_max_commitments) {
219+ // The peer's chain is too long; give up.
220+ // TODO: disconnect this peer.
Why not do that in this PR? Maybe add a new state FAILED (or an extra bool on HeadersSyncState) that signals to net_processing that the peer should be disconnected? That seems simple enough and in obvious cases like this one (i.e. too many commitments or e.g. invalid PoW) the peer definitely behaved funky and should be punished.
Now that I think about it, it’s possible that if the peer’s chain has grown (and the sync takes very long), you could get to this condition; the right thing to do in that scenario (assuming our peer actually has a more-work chain) is to try to sync with this peer again later. (Obviously this is a pathological case, but I think we should still be able to sync such a chain eventually.)
Fixed the braces. Should I also fix all the variable names? (Saw @instagibbs’ comment along those lines just now, so if you agree then I’ll modernize the whole thing.)
0@@ -0,0 +1,318 @@
1+#include <headerssync.h>
2+#include <logging.h>
3+#include <util/check.h>
4+#include <pow.h>
5+#include <timedata.h>
6+
7+// Store a commitment to a header every HEADER_COMMITMENT_FREQUENCY blocks
Nit in “Utilize anti-DoS headers download strategy”:
Perhaps use doxygen comments for comments on a function/variable definition or declaration? (either /** ... */ or //! ...). That way they get picked up by doxygen for generating documentation.
19+HeadersSyncState::HeadersSyncState(NodeId id, const Consensus::Params& consensus_params) :
20+ m_id(id), m_consensus_params(consensus_params)
21+{ }
22+
23+// Free any memory in use, and mark this object as no longer usable.
24+void HeadersSyncState::Finalize()
56+ // chain using 6 blocks/second (fastest blockrate given the MTP rule) times
57+ // the number of seconds from the last allowed block until today. This
58+ // serves as a memory bound on how many commitments we might store from
59+ // this peer, and we can safely give up syncing if the peer exceeds this
60+ // bound, because it's not possible for a consensus-valid chain to be
61+ // longer than this.
In commit “Utilize anti-DoS headers download strategy”
Arguably, it’s not exactly true that no consensus-valid chain can be longer, but it’s not possible for the peer to have such a chain at the time the sync starts (it’d need a tip timestamp that is in the future at the time the sync starts).
In commit “Utilize anti-DoS headers download strategy”
Any reason to do this here, and not in StartInitialDownload? It seems a bit cleaner to set these values as soon as they’re known, rather than on the fly.
42+ const std::vector<CBlockHeader>& initial_headers, const arith_uint256&
43+ minimum_required_work, CBlockLocator&& chain_start_locator)
44+{
45+ // A new instance of this object should be instantiated for every headers
46+ // sync, so that we don't reuse our salted hasher between syncs.
47+ assert(m_download_state == State::INITIAL_DOWNLOAD);
174+ // Doesn't connect to what we expect
175+ Finalize();
176+ return false;
177+ }
178+
179+ // If it does connect, (minimally) validate and store a commitment to each one.
in test/functional/p2p_headers_sync_with_minchainwork.py:30 (commit 7bed25cbd3, outdated)
25+ def test_chains_sync_when_long_enough(self):
26+ self.log.info("Generate blocks on the node with no required chainwork")
27+ self.generate(self.nodes[0], NODE1_BLOCKS_REQUIRED-2, sync_fun=self.no_op)
28+
29+ self.log.info("Verify nodes 1 and 2 have no new headers in their headers tree")
30+ time.sleep(2)
Hm, will assert_debug_log work properly here if there might be multiple “[net] Ignoring low-work chain” lines in our debug.log? (Perhaps that would work if I generated one block at a time in a loop, rather than invoke generate once with multiple blocks?)
in test/functional/p2p_headers_sync_with_minchainwork.py:29 (commit 7bed25cbd3, outdated)
24+
25+ def test_chains_sync_when_long_enough(self):
26+ self.log.info("Generate blocks on the node with no required chainwork")
27+ self.generate(self.nodes[0], NODE1_BLOCKS_REQUIRED-2, sync_fun=self.no_op)
28+
29+ self.log.info("Verify nodes 1 and 2 have no new headers in their headers tree")
+1, it looks like node2 currently doesn’t receive any headers because it’s not connected to node0?
IIRC the default is that 0-1 and 1-2 are connected. My suggestion would be to override setup_network so that node0 is connected directly to both node1 and node2.
in test/functional/p2p_headers_sync_with_minchainwork.py:46 (commit 7bed25cbd3, outdated)
42+ self.log.info("Verify that node1 syncs node0's chain")
43+ time.sleep(2)
44+ self.sync_blocks(self.nodes[0:2])
45+
46+ self.log.info("Verify that node2 still has no new headers")
47+ time.sleep(2)
In commit “Utilize anti-DoS headers download strategy”
Just a thought for later, but it’s rather ugly to have to construct a CBlockIndex object just to be able to call GetBlockProof. I think GetBlockProof should work (or have a variant that works) with a CBlockHeader too, or even just the nBits value.
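For illustration, a variant taking just nBits could mirror the arithmetic of GetBlockProof in chain.cpp (a sketch, not something the PR adds):

#include <arith_uint256.h>

// Hypothetical nBits-only variant of GetBlockProof.
arith_uint256 GetBlockProofFromNBits(uint32_t nBits)
{
    arith_uint256 target;
    bool negative, overflow;
    target.SetCompact(nBits, &negative, &overflow);
    if (negative || overflow || target == 0) return 0;
    // Work is floor(2^256 / (target + 1)); since 2^256 doesn't fit in 256
    // bits, compute the equivalent ~target / (target + 1) + 1.
    return (~target / (target + 1)) + 1;
}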
in src/headerssync.cpp:241 (commit de72d0867d, outdated)
236+ if (m_download_state == State::FINAL) return false;
237+
238+ // Ensure that we're working on a header that connects to the chain we're
239+ // downloading.
240+ if (m_redownloaded_headers.empty()) {
241+ if (header.hashPrevBlock != m_chain_start->GetBlockHash()) {
In commit “Utilize anti-DoS headers download strategy”
Is it possible to set this m_redownload_buffer_last_hash = m_chain_start->GetBlockHash(); m_redownload_buffer_last_height = m_chain_start->nHeight; in ValidateAndStoreHeadersCommitments already when the transition to REDOWNLOAD state is made?
267+ // Store this header for later processing.
268+ m_redownloaded_headers.push_back(header);
269+ m_redownload_buffer_last_height = next_height;
270+ m_redownload_buffer_last_hash = header.GetHash();
271+
272+ // If we're processing our target block header, which we verified has
In commit “Utilize anti-DoS headers download strategy”
Could there be a concern that the last few headers get reorganized during the second phase, resulting in a mismatch at the end? If so, perhaps it’s possible to instead keep track of chainwork again in the second phase, rather than remembering the exact hash at which the threshold was reached in the first phase?
I added a commit that tries to improve this. Left it unsquashed for now so that it’s easier to review; I don’t believe there are any material DoS risks introduced by this change but just want to make sure we get enough eyes on this logic.
dergoegge
commented at 2:41 pm on July 28, 2022:
member
Concept & Approach ACK
Left some comments (mostly about the added functional test)
135+ * can process and validate now (because these returned
136+ * headers are on a chain with sufficient work)
137+ * processing_success: set to false if an error is detected and the sync is
138+ * aborted; true otherwise.
139+ */
140+ std::optional<CBlockLocator> ProcessNextHeaders(const std::vector<CBlockHeader>& headers,
In commit “Utilize anti-DoS headers download strategy”
This function really returns 3 things: an optional block locator, a vector of headers to process, and a bool processing_success. Having those spread over return values and mutable arguments is a bit ugly.
How about returning a typedef’d std::optional<std::pair<std::optional<CBlockLocator>, std::vector<CBlockHeader>>>, or making a simple custom struct to return the results in?
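A rough sketch of the custom-struct option (the PR later settled on a ProcessingResult along these lines, per the doc comment quoted further below; the full_headers_message parameter is assumed):

#include <optional>
#include <vector>

// Sketch only; CBlockHeader and CBlockLocator are the existing Core types.
struct ProcessingResult {
    std::vector<CBlockHeader> pow_validated_headers; //!< headers the caller can accept now
    std::optional<CBlockLocator> locator;            //!< next getheaders locator, if any
    bool success{false};                             //!< false if the sync was aborted
    bool request_more{false};                        //!< caller should request more headers
};

ProcessingResult ProcessNextHeaders(const std::vector<CBlockHeader>& received_headers,
                                    bool full_headers_message);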
I’m not sure there’s a great way to do this without essentially duplicating the code from pow.cpp; is that what you have in mind? (That might be a reasonable suggestion, I’m just not sure what makes the most sense.)
instagibbs
commented at 12:59 pm on July 29, 2022:
Not convinced either way, maybe just comment on the relationship here?
Well, it could be, but the test would still be correct if pindexLast.nBits were modified to be within a factor of 4 from the max difficulty target.
So it seems to me that having expected_nbits = 0x1d00ffffU is the more important line as it captures exactly what the test case is trying to exercise here (and pindexLast.nBits could be set from that, or not).
in src/test/pow_tests.cpp:73 (commit cbf3069c7d, outdated)
@instagibbs In the old Satoshi-era naming convention, it referred to “bignum” (there was a CBigNum wrapper around the OpenSSL BIGNUM type, which was used for various big integer operations, both inside script and for PoW purposes).
For the purposes of what is needed for the security analysis of the overall header commitment scheme, the only requirements are that this verifies (a) that the difficulty doesn’t change on non-2016-multiple blocks and (b) doesn’t go up or down too quickly. Its goal is forcing the attacker to spread out their attempted PoW over many blocks, rather than just one or a few (because creating N blocks with each difficulty D is much harder than creating one block with difficulty N*D, if the hashpower available to the attacker is less than the expected value for an N*D difficulty block).
No objection to checking whatever can be checked with the provided arguments, but I think the current code just chooses to check exactly what is needed, and document it, rather than verify everything possible.
I think I prefer to just not check anything for chains where proof-of-work isn’t an anti-DoS mechanism for us; seems like it avoids a false sense of security (and it doesn’t accomplish anything anyway, as an attacker would just avoid failing such a check without expending any real resources).
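A rough sketch of the two checks described above (ignoring compact-encoding rounding; not the PR’s exact code):

#include <arith_uint256.h>
#include <consensus/params.h>

bool PermittedDifficultyTransitionSketch(const Consensus::Params& params, int64_t height,
                                         uint32_t old_nbits, uint32_t new_nbits)
{
    // No check on networks where min-difficulty blocks are allowed, since
    // PoW is not an anti-DoS mechanism for us there.
    if (params.fPowAllowMinDifficultyBlocks) return true;

    if (height % params.DifficultyAdjustmentInterval() == 0) {
        // (b) At a retarget boundary, the target may move by at most a
        // factor of 4 in either direction (the consensus retargeting limit).
        arith_uint256 old_target, new_target;
        old_target.SetCompact(old_nbits);
        new_target.SetCompact(new_nbits);
        if ((new_target >> 2) > old_target) return false; // eased more than 4x
        if ((old_target >> 2) > new_target) return false; // tightened more than 4x
        return true;
    }
    // (a) Off-boundary blocks must not change the difficulty at all.
    return old_nbits == new_nbits;
}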
sdaftuar force-pushed
on Jul 28, 2022
in test/functional/p2p_headers_sync_with_minchainwork.py:23 (commit ec2723b39f, outdated)
18+ self.num_nodes = 3
19+ # Node0 has no required chainwork; node1 requires 15 blocks on top of the genesis block; node2 requires 2047
20+ self.extra_args = [["-minimumchainwork=0x0"], ["-minimumchainwork=0x1f"], ["-minimumchainwork=0x1000"]]
21+
22+ def skip_test_if_missing_module(self):
23+ self.skip_if_no_wallet()
in test/functional/p2p_headers_sync_with_minchainwork.py:53 (commit ec2723b39f, outdated)
26+ self.log.info("Generate blocks on the node with no required chainwork")
27+ self.generate(self.nodes[0], NODE1_BLOCKS_REQUIRED-2, sync_fun=self.no_op)
28+
29+ self.log.info("Verify nodes 1 and 2 have no new headers in their headers tree")
30+ time.sleep(2)
31+ for node in self.nodes[1:]:
I agree with removing the sleep() since it’s not a very reliable way to wait for things to happen in the functional tests (which are often run in parallel). Not sure about relying on assert_debug_log.
I’d recommend wait_until the headers are received, for example:
0 self.log.info("Generate blocks on the node with no required chainwork")
1 node1_recv_headers = self.nodes[1].getpeerinfo()[0]["bytesrecv_per_msg"]["headers"]
2 node2_recv_headers = self.nodes[2].getpeerinfo()[0]["bytesrecv_per_msg"]["headers"]
3 self.generate(self.nodes[0], NODE1_BLOCKS_REQUIRED-2, sync_fun=self.no_op)
4
5 self.log.info("Verify nodes 1 and 2 have no new headers in their headers tree")
6 self.wait_until(lambda: self.nodes[1].getpeerinfo()[0]["bytesrecv_per_msg"]["headers"] > node1_recv_headers)
7 self.wait_until(lambda: self.nodes[2].getpeerinfo()[0]["bytesrecv_per_msg"]["headers"] > node2_recv_headers)
8 for node in self.nodes[1:]:
(this is using getpeerinfo()[0] assuming node1 and node2 have node0 as their first peer)
Ah, that’s clever, but I think the problem is that the number of headers messages that come in after a call to generate is variable, based on the particular blocks that get INV’ed and the responses they generate.
I took an approach of changing the logging to include the height of the chain, and then used the assert_debug_log approach to check for it. This gets rid of all the sleeps and I think ought to work?
in src/test/fuzz/bitdeque.cpp:19 (commit 5072054428, outdated)
14+namespace {
15+
16+constexpr int LEN_BITS = 16;
17+constexpr int RANDDATA_BITS = 20;
18+
19+using bitdeque_type = bitdeque<128>;
@glozow The testing power is significantly less if you need 16384 booleans before another allocation is made (because interactions that involve many separate allocations are much more likely to trigger bugs).
101+ // sending a new getheaders immediately could trigger an infinite
102+ // loop. Just give up for now; if our peer ever gives us an block
103+ // INV later we will fetch headers then, and likely retrigger this
104+ // logic.
105+ Finalize();
106+ return ret;
There are lots of return ret; statements in this function now. Perhaps it’s cleaner to restructure it so that there is only a single return statement at the end, and all the branches just modify the values in ret?
5+#include <util/check.h>
6+
7+//! Store a commitment to a header every HEADER_COMMITMENT_FREQUENCY blocks
8+constexpr size_t HEADER_COMMITMENT_FREQUENCY = 571;
9+//! Pull out headers for acceptance during redownload once our buffer exceeds this amount
10+constexpr size_t REDOWNLOAD_HEADERS_THRESHOLD = 25*HEADER_COMMITMENT_FREQUENCY;
Perhaps 25 should be named, e.g. REDOWNLOAD_BUFFER_THRESHOLD? IIUC it’s possible to change this in the future if the optimization script outputs a different rsize?
0//! Store a commitment to a header every HEADER_COMMITMENT_FREQUENCY blocks
1constexpr size_t HEADER_COMMITMENT_FREQUENCY{571};
2//! Pull out headers for acceptance during redownload once our buffer exceeds this number of headers
3constexpr size_t REDOWNLOAD_HEADERS_THRESHOLD{25*HEADER_COMMITMENT_FREQUENCY};
@glozow REDOWNLOAD_HEADERS_THRESHOLD will (in optimal configurations) always be a multiple of HEADER_COMMITMENT_FREQUENCY. It could have been written as just = 14275;, but that would make this multiple-of relation less clear.
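For example, the relation could be made explicit with a named factor (hypothetical name):

//! Store a commitment to a header every HEADER_COMMITMENT_FREQUENCY blocks
constexpr size_t HEADER_COMMITMENT_FREQUENCY{571};
//! Size of the redownload buffer, in commitment periods (hypothetical name)
constexpr size_t REDOWNLOAD_BUFFER_PERIODS{25};
//! Pull out headers for acceptance during redownload once our buffer exceeds this amount
constexpr size_t REDOWNLOAD_HEADERS_THRESHOLD{REDOWNLOAD_BUFFER_PERIODS * HEADER_COMMITMENT_FREQUENCY};
static_assert(REDOWNLOAD_HEADERS_THRESHOLD % HEADER_COMMITMENT_FREQUENCY == 0);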
28+ * This function only checks that the new value is within a factor of 4 of the
29+ * old value for blocks at the difficulty adjustment interval, and otherwise
30+ * requires the values to be the same.
31+ *
32+ * Always returns true on networks where min difficulty blocks are allowed,
33+ * such as regtest/testnet.
@glozow No, on non-difficulty-adjustment blocks it returns whether the difficulty is unchanged.
glozow
commented at 4:19 pm on August 1, 2022:
member
Approach ACK, still reviewing the code. Ran this on mainnet a few times as a sanity check, synced fine. Ran the fuzzer for a while.
It may make sense to include the script in the repository (as I can imagine it being re-run to tune things occasionally, but there should not be a need to do it more than every few years). It could also live on the devwiki or just linked-to in the code. Opinions?
Maybe contrib/devtools? Devwiki seems fine too, no strong opinion.
I have some questions trying to understand headerssync_params.py. Why is BLOCK_INTERVAL = timedelta(seconds=598, microseconds=800000) rather than exactly 10min (link)? Is it for expected hashrate increase? And based on “especially as this attack is only possible before the victim has learned about the honest chain” (link) does this mean we should always attempt to sync headers from all/multiple outbound peers at once?
sipa
commented at 9:56 pm on August 1, 2022:
member
It may make sense to include the script in the repository (as I can imagine it being re-run to tune things occasionally, but there should not be a need to do it more than every few years). It could also live on the devwiki or just linked-to in the code. Opinions?
Maybe contrib/devtools? Devwiki seems fine too, no strong opinion.
My current thinking is to add it to contrib/devtools in a follow-up PR, together with instructions in the release-process to run/update it. No need to add the review burden of that here, as the current values are likely more than sufficient for at least a few years.
I have some questions trying to understand headerssync_params.py.
Why is BLOCK_INTERVAL = timedelta(seconds=598, microseconds=800000) rather than exactly 10min (link)? Is it for expected hashrate increase?
Yes, the number is just the average block interval in 2021, approximately. It barely matters, so I’ve changed it to just 600 seconds. A few percent off on this value isn’t going to change the result.
And based on “especially as this attack is only possible before the victim has learned about the honest chain” (link) does this mean we should always attempt to sync headers from all/multiple outbound peers at once?
I don’t think so; that also worsens the attack potential in addition to reducing it, because it increases the chance that at least one of the nodes synced from will be an attacker, and gives them a window while the first (eventually successful) sync hasn’t reached the second stage yet.
And syncing from all peers at once is a pretty extreme position to take from a bandwidth optimization perspective (in non-attack scenarios). For most aspects of the P2P protocol, we attempt to never request the same thing twice simultaneously (this is true for transactions and blocks, except the high-bandwidth compact block mode which makes a bounded number of exceptions). Headers are small, so strictly requiring only a single header sync in flight is pretty extreme and opens up the ability for peers to stall the sync for a long time, but fetching from all at once means wasting possibly several GB of volume. #25720 picks something in between: start one sync for each new block announcement.
glozow
commented at 4:03 pm on August 2, 2022:
member
mzumsande
commented at 10:17 pm on August 2, 2022:
It might save a few operations to first check if the new target is larger or smaller than the old one, and conditional on that run only one of the two checks.
I made an attempt at this suggestion to see if the code would look any simpler, but it just seems to make it slightly more complex to think about (and not any shorter, at least the way I wrote it, which is mimicking the code in CalculateNextWorkRequired()).
At any rate, I also think the performance effect should be very minor, since we only do this once every 2000 headers (and nothing here should be all that slow to begin with, I think?).
I’ll leave this as-is, and if there’s some reason to make further improvements here, perhaps someone can pick it up in a followup PR.
in src/net_processing.cpp:5074 (commit c64d02a21d, outdated)
5071@@ -5076,6 +5072,24 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
5072
5073 MaybeSendAddr(*pto, *peer, current_time);
5074
5075+ // Delay sending SENDHEADERS (BIP 130) until we're done with an
5076+ // initial-headers-sync with this peer. Receiving headers announcements for
mzumsande
commented at 10:45 pm on August 2, 2022:
Would we actually receive headers messages during initial-headers-sync without this commit? As I wrote here, I got the impression that other peers revert to Inv when they found a new block during headers sync and won’t send us unrequested headers messages anyway, so I wonder if delaying SENDHEADERS actually has an effect.
I believe it is possible, for 2 main reasons: (1) BIP 130 has no explicit rules around when headers may be used to announce a block, so in theory any software implementing BIP 130 might be free to send a header to us regardless of whether we’ve completed header sync; (2) Bitcoin Core implementations would send us announcements via header at the end of our phase 1 download (if we reach their tip at the end of that phase, which is certainly possible). So I think delaying sendheaders helps eliminate the common cases of how we might get a spurious headers message during our sync.
in src/net_processing.cpp:2826 (commit 6335a03b9e, outdated)
2682@@ -2483,25 +2683,31 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
2683 }
2684
2685 // At this point, the headers connect to something in our block index.
2686- if (!CheckHeadersAreContinuous(headers)) {
2687- Misbehaving(peer, 20, "non-continuous headers sequence");
2688+ // Do anti-DoS checks to determine if we should process or store for later
2689+ // processing.
2690+ if (!already_validated_work && TryLowWorkHeadersSync(peer, pfrom,
I think it would be good to have some unconditional logging during the first phase, because the node is currently quiet for possibly minutes after startup, which might lead users to think the node is stuck. Currently, the “Synchronizing blockheaders, height: (…) (~(…)%)” messages only start in the redownload phase. We should probably not log too much to prevent possible disk-filling attacks caused by a low-work header chain, but at least announcing the start of each phase would be nice.
So if I add unconditional logging on starting a sync, then in the absence of a rate limiter I think it’d be easy for someone to fill up our disk by starting and restarting syncs with us. Perhaps I can log at most once per peer connection when we start a low-work headers sync, what do you think?
instagibbs
commented at 4:27 pm on August 3, 2022:
maybe once per connection per X seconds, which would give consistent feedback that’s rate-limited?
Ideally, I’d say, we produce header sync progress updates based on the longest currently-ongoing header sync across all peers. That sounds like a pain to implement, though.
An additional complication in logging is that the first phase takes place entirely in net_processing, in per-peer datastructures. There is no interaction with validation, which currently emits the header sync progress updates, until the second phase.
Maybe once per connection, and excluding inbound peers would be safe? Alternatively, we could log something once on startup if our best chain is below nMinimumChainWork. I’m thinking of the typical case of a new node without any headers starting up.
in src/headerssync.h:95 (commit 6335a03b9e, outdated)
90+ * that we see. With a reasonable choice of N, this uses relatively little
91+ * memory even for a very long chain.
92+ *
93+ * - In phase 2 (redownload), keep a lookahead buffer of size H, and only
94+ * accept a batch of N (N < H) headers to memory once we've verified that H/N =
95+ * S commitments have all passed verification. With this parametrization, we
I can’t really relate this comment to the implementation. From what I understand we accept all received headers in phase 2 to memory (m_redownloaded_headers), but only accept them to the block index once there are REDOWNLOAD_HEADERS_CHECK_BITS correct commitments on top of them - with the batch size N of headers that are being accepted to the block index being just an (unimportant) consequence of the batch size in which we happen to receive headers from peers (2000 once the lookahead buffer is filled)?
mzumsande
commented at 5:03 am on August 3, 2022:
contributor
Concept ACK
Not finished with the review yet, but I left some comments I had so far.
(With the latest push, feature_block.py and rpc_blockchain.py currently fail)
sdaftuar force-pushed
on Aug 3, 2022
sdaftuar force-pushed
on Aug 3, 2022
in src/pow.cpp:76 (commit 68c05de96a, outdated)
70@@ -71,6 +71,57 @@ unsigned int CalculateNextWorkRequired(const CBlockIndex* pindexLast, int64_t nF
71 return bnNew.GetCompact();
72 }
7374+// Check that on difficulty adjustments, the new difficulty does not increase
75+// or decrease beyond the permitted limits.
76+bool PermittedDifficultyTransition(const Consensus::Params& params, int64_t height, uint32_t old_nbits, uint32_t new_nbits)
Would it be possible to also add PermittedDifficultyTransition() coverage to the fuzz test src/test/fuzz/pow.cpp? I’m thinking of something like combining it with the new target from GetNextWorkRequired(), and then asserting that the resulting transition passes PermittedDifficultyTransition().
64+ // today. This serves as a memory bound on how many commitments we might
65+ // store from this peer, and we can safely give up syncing if the peer
66+ // exceeds this bound, because it's not possible for a consensus-valid
67+ // chain to be longer than this (at the current time -- in the future we
68+ // could try again, if necessary, to sync a longer chain).
69+ m_max_commitments = 6*(GetAdjustedTime() - chain_start->GetMedianTimePast() + MAX_FUTURE_BLOCK_TIME) / HEADER_COMMITMENT_FREQUENCY;
5691c174c5cb2f9947e6cc2db8d32b4b4715e47d: note to other reviewers: when using this code to sync from the genesis block in the year 2100, m_max_commitments would be about 32 million. That’s less than 4 MB per peer, assuming bitdeque has low overhead.
Of course by that time the baked in value of nMinimumChainWork might be too low.
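The arithmetic behind that estimate, roughly (assumptions: sync starts at the 2009 genesis block, one commitment bit per 571 headers):

// Back-of-envelope only; all constants are approximations.
constexpr double SECONDS_TO_2100{91 * 365.25 * 24 * 3600}; // 2009 -> 2100, ~2.87e9 s
constexpr double MAX_HEADERS{6 * SECONDS_TO_2100};         // 6 blocks/s bound, ~1.7e10
constexpr double MAX_COMMITMENTS{MAX_HEADERS / 571};       // ~3.0e7 (order of the ~32M above)
constexpr double MEMORY_BYTES{MAX_COMMITMENTS / 8};        // 1 bit each, ~3.8 MB per peer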
in src/headerssync.cpp:8 (commit 5691c174c5, outdated)
0@@ -0,0 +1,318 @@
1+#include <headerssync.h>
2+#include <logging.h>
3+#include <pow.h>
4+#include <timedata.h>
5+#include <util/check.h>
6+
7+//! Store a commitment to a header every HEADER_COMMITMENT_FREQUENCY blocks
8+constexpr size_t HEADER_COMMITMENT_FREQUENCY{571};
105+ return ret;
106+ } else {
107+ // If we're in INITIAL_DOWNLOAD and we get a non-full headers
108+ // message, then the peer's chain has ended and definitely doesn't
109+ // have enough work, so we can stop our sync.
110+ LogPrint(BCLog::NET, "Ignoring low-work chain (height=%u) from peer=%d\n", m_current_height, m_id);
Just to make sure I understand your question – I think you’re wondering why this log message appears here as well as in net_processing.cpp at line 2473?
The issue is that in net_processing we won’t bother even starting a headers sync using the logic in headerssync.cpp if we receive a headers message that (a) isn’t full and (b) which connects to something we know in our block index and (c) has too little work, because we know in this case that the headers don’t lead to a high work chain (if it did, the headers message would have been full, indicating the peer has more to give us).
However, in the case that we start a headers sync using this module, then net_processing will be receiving headers that don’t connect to the block index, and those messages can only be properly processed by this logic which keeps track of where we expect the headers to connect. So in this logic we also are checking whether the sync has to end because our peer has no more headers to give us (ie the headers message we processed from them was not full, and the chain has too little work).
The other confusing bit is the comment “and definitely doesn’t have enough work”. When looking at this function in isolation, it’s not obvious why this would be the case, since we don’t even check the work. It’s also not obvious if I look at where this is called from.
5691c174c5cb2f9947e6cc2db8d32b4b4715e47d: can you add the height to this message? That, plus a message when the state changes, would make it easier to follow what’s happening in the -debug=net log.
Sjors
commented at 4:16 pm on August 5, 2022:
member
Very cool stuff. Some initial comments. Review test hint: -debug=net -stopatheight=100
Update 2022-08-06: why do we need bitdeque instead of just std::vector<bool>? Is it only because we call m_header_commitments.pop_front()? Couldn’t we also just keep the commitments, track our index with an integer and then delete the whole thing when we’re done?
in test/functional/p2p_dos_header_tree.py:66 (commit 5691c174c5, outdated)
62@@ -62,7 +63,7 @@ def run_test(self):
63
64 self.log.info("Feed all fork headers (succeeds without checkpoint)")
65 # On node 0 it succeeds because checkpoints are disabled
66- self.restart_node(0, extra_args=['-nocheckpoints'])
67+ self.restart_node(0, extra_args=['-nocheckpoints', "-minimumchainwork=0x0"])
5691c174c5cb2f9947e6cc2db8d32b4b4715e47d: maybe add additional tests where you set minimumchainwork to the chainwork at the original checkpoint (0x0000000000000000000000000000000000000000000000000000022302230223).
0
1 self.log.info("Feed all fork headers (fails due to minimum chain work)")
2 # Disable checkpoint, but require its chainwork as the minimum.
3 self.restart_node(0, extra_args=['-nocheckpoints', "-minimumchainwork=0x0000000000000000000000000000000000000000000000000000022302230223"])
4 peer_minimumchainwork = self.nodes[0].add_p2p_connection(P2PInterface())
5 peer_minimumchainwork.send_and_ping(msg_headers(self.headers_fork))
6 with self.nodes[0].assert_debug_log(['Ignoring low-work chain']):
7     peer_minimumchainwork.send_message(msg_headers(self.headers_fork))
8 # We don't disconnect
Note that if you also set the minimum chainwork for node 0 for the entire test, it will stall at “Feed all fork headers (fails due to checkpoint)”, because we no longer disconnect.
Not sure if the above covers more ground than the new test file introduced in 7be81ce6257b1940d633b1311432c09fbbe09a0b, but it might still be useful to illustrate the behavior change. Then again, in the real world our minimum chainwork is much higher than that of the most recent checkpoints, so this example may be more confusing than illustrating?
Do we need to rethink our headers sync timeout and/or disconnect logic? What happens when the first peer we try to do a headers sync with sends a low work chain? Since we don’t disconnect, is there another mechanism that causes us to try with another peer, or do we wait for a timeout or new block announcement? (trying to figure this out from just reading net_processing.cpp is daunting…)
But still, if the first peer gives us low PoW headers - and we don’t disconnect from it as we do now when it doesn’t contain a checkpoint - that would cause us to do nothing for quite a while. At least IIUC.
Actually new blocks, once every 10 minutes. That’s the current protection in practice against peers stalling headers sync forever by being very slow - once a new block is found on the network, we effectively start headers sync with all peers that announce it to us (unintentionally). #25720 changes it to just one new peer every block.
So currently peers can stall us by sending valid headers slowly (ensuring the first checkpoint is not reached within ~10 minutes). After this PR they can stall us regardless of how fast the headers are sent. But just sending headers slowly is simple enough to implement, so I suppose this PR does not make that problem worse.
So currently peers can stall us by sending valid headers slowly (ensuring the first checkpoint is not reached within ~10 minutes). After this PR they can stall us regardless of how fast the headers are sent. But just sending headers slowly is simple enough to implement, so I suppose this PR does not make that problem worse.
This is not my understanding. Peers that we initially choose for headers sync currently have a time window of ~20 minutes (15min HEADERS_DOWNLOAD_TIMEOUT_BASE + a component based on the expected number of headers). If we don’t have an up-to-date best header by then (no matter if we even received that from them or another peer), then they get disconnected. This logic is unaffected by this PR, although peers now have to serve the headers chain twice within these 20 minutes, so they need to be twice as fast.
Maybe it could make sense to add some criteria based on actual progress to this - if a peer has failed to serve us a single header for 10 minutes, does it really make sense to give them another 10 minutes?
The fact that we may also pick additional peers to sync headers from whenever a new block is found (10 minutes on average, but obviously very fluctuating) is independent from that, and also independent from the behavior of a potentially stalling peer. It could sometimes happen a few seconds after the headerssync start if a lucky block is mined then.
That’s still different from the case where a peer feeds us a set of headers that doesn’t include a checkpoint. In that case we immediately disconnect and move on the next peer. That behavior goes away with this PR (though it’s probably not a big deal).
Do we need to rethink our headers sync timeout and/or disconnect logic? What happens when the first peer we try to do a headers sync with sends a low work chain? Since we don’t disconnect, is there another mechanism that causes us to try with another peer, or do we wait for a timeout or new block announcement? (trying to figure this out from just reading net_processing.cpp is daunting…)
So currently peers can stall us by sending valid headers slowly (ensuring the first checkpoint is not reached within ~10 minutes). After this PR they can stall us regardless of how fast the headers are sent. But just sending headers slowly is simple enough to implement, so I suppose this PR does not make that problem worse.
Note that ever since #5927, checkpoints only affect block/block header validation AFTER we have downloaded a headers chain that includes the checkpointed hashes. So when a new node is starting up for the first time, if its initial headers sync peer serves a bogus chain, it would still be validated and accepted (even if it includes blocks at heights that have different block hashes checkpointed at those heights), because the main chain with the checkpointed block hashes wouldn’t have been downloaded yet.
But that distinction is not so important, because for the purposes of (1) determining when the initial headers sync peer is bad/broken, or (2) determining whether a new node starting up can be flooded with low difficulty headers, an adversary has other options to cause us the same problems. In the case of issue (1), as @mzumsande explained, for bandwidth reasons we give our initial headers sync peer roughly 20 minutes before we’ll disconnect them if we haven’t yet seen a chain that looks close to caught up. So a peer looking to stall us can just give us no headers at all (rather than give us a bogus headers chain) and can still stall us for the same amount of time, under that check. In the case of issue (2), even if block hashes were being checked at the checkpointed heights prior to downloading a chain with those hashes, an adversary looking to waste our memory could just give us lots of chains that start at genesis and build up to the first checkpoint, and then restart from genesis again with a new chain.
I believe this PR completely solves issue (2).
Regarding issue (1): for a very long time, our protection against a stalling peer has been that we will initiate headers sync (via a getheaders message) with any peer sending us a block INV. This is overkill and made somewhat more bandwidth-wasteful after this PR, so #25720 tries to strike a better balance between robustness in the face of stalling peers, and bandwidth usage. However I believe exactly how we strike that balance is orthogonal to the behavior introduced here, so it makes more sense to discuss strategies for when we start to sync headers with a peer in that PR, rather than here.
in src/net_processing.cpp:2467 (commit 5691c174c5, outdated)
5691c174c5cb2f9947e6cc2db8d32b4b4715e47d: I was a bit surprised this never triggered an unused variable warning, but that’s apparently possible with non-trivial types: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55203
in src/net_processing.cpp:2386 (commit 5691c174c5, outdated)
2318+ arith_uint256 near_chaintip_work = 0;
2319+ LOCK(cs_main);
2320+ if (m_chainman.ActiveChain().Tip() != nullptr) {
2321+ const CBlockIndex *tip = m_chainman.ActiveChain().Tip();
2322+ // Use a 144 block buffer, so that we'll accept headers that fork from
2323+ // near our tip.
5691c174c5cb2f9947e6cc2db8d32b4b4715e47d: I assume we’re interested in those, because although their total work is lower, there’s a chance a miner builds on top of them. In that case, if we receive only that new block (header), we already have the parent? Otherwise we’d need a peer to send multiple headers to learn about this new block.
Yeah, so when I started writing this code I think I had no buffer at all, which turns out to cause problems when there’s a small reorg (requiring extra round-trips for blocks to successfully relay). I think this also caused zillions of our functional tests to fail.
Upon further thought, I reasoned that for anti-DoS purposes, it’s enough to have any kind of work threshold that grows with the tip, rather than is pegged at the tip’s work exactly. So I picked something big enough that we should never have a problem with headers not relaying if they’re at all close to the tip, while small enough that I figured an attacker would not be able to keep up for very long unless they had a huge amount of hashpower. The exact value here is pretty arbitrary though.
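Putting the quoted snippet together with that explanation, the threshold computation is roughly (a simplified sketch of what the PR computes; the function name and shape are assumed):

#include <algorithm>
#include <arith_uint256.h>
#include <chain.h>

arith_uint256 AntiDoSWorkThreshold(const CBlockIndex& tip, const arith_uint256& minimum_chain_work)
{
    // Subtract 144 blocks' worth of work from the tip (clamped so we don't
    // underflow), so that headers forking from near the tip still qualify.
    const arith_uint256 buffer{std::min<arith_uint256>(144 * GetBlockProof(tip), tip.nChainWork)};
    const arith_uint256 near_chaintip_work{tip.nChainWork - buffer};
    // Never accept chains with less than the compiled-in minimum chain work.
    return std::max(near_chaintip_work, minimum_chain_work);
}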
in src/headerssync.cpp:80 (commit 5691c174c5, outdated)
89+ if (m_download_state == State::FINAL) return ret;
90+
91+ if (m_download_state == State::INITIAL_DOWNLOAD) {
92+ // During INITIAL_DOWNLOAD, we minimally validate block headers and
93+ // occasionally add commitments to them, until we reach our work
94+ // threshold.
In this paper the author suggested a solution (section 4.3) using proof-of-work to limit the rate of downloading headers (as well as pruning alternate header chain branches aka “chain width”).
I wonder how this strategy compares to the double-download strategy in this PR. They might be close in total elapsed time (slow downloading vs. downloading twice) but would obviously save on bandwidth.
Conceptually I love the idea of replacing hard-coded checkpoints, I’m just curious how this alternate solution measures up.
Sjors
commented at 8:36 am on August 8, 2022:
member
@pinheadmz of the three attack scenarios in the paper, 1.3 is prevented because we ignore unsolicited blocks with less work than the tip (see p2p_unrequested_blocks.py). The paper also points this out in (2). So the two attacks to worry about are 1.1 and 1.2. If you can solve 1.2 without the use of checkpoints, then 1.1 is automatically solved.
So the comparison boils down to: how does this PR stack up against the proposal in the paper in fixing 1.2.
Specifically the solutions we’re looking at are 4.2 and 4.3 in the paper, since 4.1 (headers-first sync) is already implemented.
Note that bandwidth for headers is only a fraction of bandwidth for blocks. All things equal less bandwidth is better, but it’s not a huge factor on its own when comparing the approaches.
The same goes for elapsed time. Downloading headers takes only a fraction of the time of downloading blocks.
Also, because we can receive headers unsolicited, the amount of time and bandwidth we spend downloading useless headers is somewhat out of our control (rate limiting based on PoW could help though, see my comment on 4.3).
I think there are two metrics we should care about, given an ongoing spam attack and a single honest node whispering the truth:
1. how long it takes until we know enough to go ahead and download the corresponding blocks
2. how much disk space gets wasted with spam headers (and whether we can prune that before it becomes a serious problem)
One meta issue is that afaik there’s no Bitcoin Core pull request of 4.1 and 4.2 from the paper, so that works in favor of this PR which does have working code (from a rfc7282 point of view). But the paper states there is an implementation for Bcoin, so this is not a strong argument against. What do we know of how these implementations work in practice?
With all that out the way, how do the proposals 4.2 and 4.3 from the paper actually compare to this PR?
4.2 suggests picking a maximum number of header chains to store, before pruning the shortest. It suggests storing 10 chains for a total of 480MB.
a) that’s far more memory use than this PR requires (~700 KiB per peer, regardless of the amount of spam fired at us)
b) In terms of bandwidth, under nice circumstances the approach uses less bandwidth, because headers are only downloaded once. But under attack circumstance most of the bandwidth is spent on downloading spam headers, in which case the redownload of correct headers is negligible. That goes for 4.2 as well as this PR.
c) I’m not sure how the proposal holds up in a timewarp attack; I suppose it could perform the sane initial sanity checks on all header chains, e.g. checking the difficulty adjustment is allowed. Dealing with a timewarp attack is important because it would dramatically increase the space needed to store a header chain.
4.3 Rate limiting header requests: I’m not sure how to evaluate that (others on the mailinglist also expressed difficulty with that). I do notice the paper points to advantages for CPU use, but I’m more worried about storage. I’m also not sure if this approach would put an “honest” massive reorg (by alien attacker) at an unfair disadvantage, e.g. if said reorg started with lots of low difficulty blocks. But I guess eventually we’d find it.
Sjors
commented at 10:19 am on August 8, 2022:
member
But the paper states there is an implementation for Bcoin […] What do we know of how these implementations work in practice?
147+ * ProcessingResult.success: set to false if an error is detected and the sync is
148+ * aborted; true otherwise.
149+ * ProcessingResult.locator: if present, the next locator to send in a
150+ * getheaders message
151+ */
152+ ProcessingResult ProcessNextHeaders(const std::vector<CBlockHeader>&
Passing headers in and out of this function is a bit hard to follow. I think we need a more descriptive term than “process” and “to process”. You could rename the input headers to received_headers, and the return value headers_to_process to pow_checked_headers or filtered_headers.
But perhaps it’s even more clear if you decouple the feeding of headers into HeadersSyncState from the step of extracting filtered headers. Though I suppose these do have to happen in lock step.
I’m doing the rename you suggest, but yes pulling headers out of HeadersSyncState needs to happen in lockstep with processing, so I think rather than add another API call to extract them, it makes the most sense to just return them.
in src/net_processing.cpp:5077 (commit 2444ce2f8c, outdated)
5071@@ -5076,6 +5072,24 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
5072
5073 MaybeSendAddr(*pto, *peer, current_time);
5074
5075+ // Delay sending SENDHEADERS (BIP 130) until we're done with an
5076+ // initial-headers-sync with this peer. Receiving headers announcements for
5077+ // new blocks while trying to sync their headers chain is problematic,
5078+ // because of the state tracking done.
5079+ if (!peer->m_sent_sendheaders && pto->GetCommonVersion() >= SENDHEADERS_VERSION) {
2444ce2f8cb1330ba1dc88b7e5407a3efcfd9aeb: The above MaybeSendPing and MaybeSendAddr suggest we should put this in a helper function MaybeSendSendHeaders().
Switch to randomized commitment offset (which the simulation script shows reduces the attacker’s power a bit, permitting lower per-peer memory usage)
Send locators that contain mixes of the best-known global state (m_chainman.m_best_header) and the current per-peer headerssync state. This seems a very robust way to avoid a whole lot of duplicate work/bandwidth. It means that if we’ve received (in 2nd phase) headers from peer A up to height H, while we’re simultaneously syncing headers (phase 1 or 2) from peer B at height below H, the locator we send to B will make them instantly skip ahead to wherever we already are with A (starting a new phase 1 there). If we’ve finished syncing with A, it will cause an abort with B if they only have the same headers.
Add a bunch of logging
Interface changes:
The StartInitialSynchronization function is gone; some of its arguments moved to the constructor, and the initial headers are now passed by the caller via ProcessNextHeaders. This removes some duplicated functionality and simplifies the implementation, as PeerManagerImpl::TryLowWorkHeadersSync can now invoke PeerManagerImpl::IsContinuationOfLowWorkHeadersSync to actually deal with the headers, removing duplication that was introduced by the tip-mixing above.
The MakeNextHeadersRequest function is now part of the public interface, and is expected to be invoked by the caller when ProcessNextHeaders reports request_more (rather than being invoked in-line by ProcessNextHeaders directly). This avoids the need to lock cs_main when nothing is to be requested (to get the global best header tip).
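Under that reworked interface, the caller-side flow would look roughly like this (a sketch using the ProcessingResult fields mentioned earlier; names and details assumed):

void HandleHeadersForSync(HeadersSyncState& headers_sync,
                          const std::vector<CBlockHeader>& received_headers,
                          bool full_headers_message)
{
    const auto result{headers_sync.ProcessNextHeaders(received_headers, full_headers_message)};
    if (!result.success) return; // sync aborted; the state has finalized itself
    for (const CBlockHeader& header : result.pow_validated_headers) {
        // Accept into the block index / hand off to validation.
    }
    if (result.request_more) {
        // Only now is cs_main needed, to mix the global best header tip
        // into the locator.
        const CBlockLocator locator{headers_sync.MakeNextHeadersRequest()};
        // ... send getheaders(locator) to the peer ...
    }
}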
Sjors
commented at 10:13 am on August 10, 2022:
member
@sipa here’s a log from when I tried your rebased branch on mainnet. Just as it reached 90% synced headers with one of the peers, it started with another peer, quickly reached the sufficient work and started redownloading. But then it aborted with “non-continuous headers at height=680041 (redownload phase)”. It does eventually complete the download with this new peer.
There was no new block produced at the time where it started to sync with the new peer. Unfortunately I didn’t log enough other things to see what could have triggered it.
sipa
commented at 3:42 pm on August 10, 2022:
member
@Sjors Cool, that abort/restart of peer=3 at the same height was surprising to me, but I’ve reproduced it.
sipa
commented at 7:19 pm on August 10, 2022:
member
@Sjors Fixed it in both branches. The problem was that when asking for the initial headers from a peer (outside of the HeadersSyncState logic), we ask for one block earlier than our actual tip, so that we always get a meaningful result back even if the peer has the same tip already. However, in my logic to mix the tip into HeadersSyncState getheaders requests, I didn’t implement this one-step-back. The result was that if you transitioned from one peer to another during headers sync, you’d initialize one block backward, but the first headers arriving after that would start one block further ahead, causing an abort and restart.
Sjors
commented at 8:18 am on August 11, 2022:
member
Aside from the bug, what triggers the transition to another peer? It definitely wasn’t a timeout.
sipa
commented at 12:06 pm on August 11, 2022:
member
@Sjors They may have just disconnected for whatever reason.
sdaftuar
commented at 2:38 pm on August 11, 2022:
member
4.2 suggests picking a maximum number of header chains to store, before pruning the shortest. It suggests storing 10 chains for a total of 480MB.
@pinheadmz@sjors Absent other details, I think this approach would fail at guaranteeing that we eventually reach consensus with our peers. Imagine that some adversary has given us 10 bogus chains, and we connect to an honest peer for the first time. What do we do during headers sync with that peer, in the window of time that the headers we’re downloading are far behind (in work) from the work on the bogus chains we have stored already? If we prune as we go, then we will make no progress towards downloading the honest chain; but if we don’t, then we’re open to an attack from an adversary that would serve us a huge, low-work chain (eg due to timewarp).
sdaftuar force-pushed
on Aug 11, 2022
laanwj added this to the milestone 24.0
on Aug 11, 2022
sdaftuar force-pushed
on Aug 11, 2022
sdaftuar
commented at 7:39 pm on August 11, 2022:
member
Update 2022-08-06: why do we need bitdeque instead of just std::vector<bool>? Is it only because we call m_header_commitments.pop_front()? Couldn’t we also just keep the commitments, track our index with an integer and then delete the whole thing when we’re done?
Two reasons, I think – one is that we use more memory that way, the other is that the bookkeeping seemed more complicated to think about than if we pop entries as we go.
sipa
commented at 8:09 pm on August 11, 2022:
member
Two reasons, I think – one is that we use more memory that way, the other is that the bookkeeping seemed more complicated to think about than if we pop entries as we go.
Another is that with an anticipated (but not currently implemented) optimization, the m_chain_start point may move forward while commitments are being added, necessitating popping pieces off the front of m_commitment_bits.
sdaftuar force-pushed
on Aug 11, 2022
sdaftuar
commented at 8:44 pm on August 11, 2022:
member
Updated this PR to include the many improvements @sipa drafted in his branch (thank you!). Also included a fuzz test by @mzumsande, and addressed many of the review comments from @sjors.
instagibbs
commented at 8:58 pm on August 11, 2022:
member
Would be nice to give some log feedback that the anti-DoS header sync has at least started. Currently for testnet it’s ~3 minutes of nothing of use being printed.
related question: Is the sync state exposed anywhere via RPC? getchaintips showed nothing happening either.
sipa
commented at 9:04 pm on August 11, 2022:
member
@instagibbs Yeah, there was some chatter about that, but it’s genuinely hard to do: there isn’t any global state that is even aware of the progress made until the second phase is hit (as the first phase takes place entirely in per-peer structures).
My best idea (but I haven’t gotten past an idea):
Make net_processing keep track of the per-peer 1st phase progress, and e.g. aggregate it into a maximum.
Make net_processing report that to validation (because validation is currently responsible for emitting callbacks about sync progress).
Change the callback interface for headers to not use CBlockIndex* anymore, but height/time/progress base.
Add ways to notify 1st/2nd phase along with this.
Make all listeners of these notifications aware of it.
Just logging is a lot easier of course, but it’s unclear how much should be unconditionally logged. I’m not sure a per-headerblob logging is even appropriate.
ajtowns
commented at 3:29 am on August 12, 2022:
contributor
related question: Is the sync state exposed anywhere via RPC? getchaintips showed nothing happening either.
Seems like it might be good to at least report the per-peer value of m_current_height in getpeerinfo? Perhaps reporting the nTime or tracking the median time of the best initial sync block would be nice, then you could report the best one via the gui to give people some idea of progress?
Also, I’m not sure it makes sense to have the initial header sync phase limited to a single peer?
After adding the above to getpeerinfo, and watching a header sync of testnet from scratch, it first did an initial sync up to about block 1M with peer 0, then peer 0 disconnected, and it immediately started an initial header sync with all peers, with peer=8 eventually winning the race and being the one to send all the actual headers.
It looks to me like a new block being found means I’ll receive an INV from all nodes, which will then trigger me sending a GETHEADERS to all nodes, which then gets me doing an initial sync with all nodes anyway.
I’m seeing some nodes get through initial sync at about twice the speed of others, so seems like it might be a worthwhile speedup even if your first node doesn’t decide to disconnect.
in
src/headerssync.cpp:191
in
6862e74c70outdated
186+ // work chain if they compress the work into as few blocks as possible,
187+ // so don't let anyone give a chain that would violate the difficulty
188+ // adjustment maximum.
189+ if (!PermittedDifficultyTransition(m_consensus_params, next_height,
190+ previous.nBits, current.nBits)) {
191+ LogPrint(BCLog::HEADERSSYNC, "Initial headers sync aborted with peer=%d: invalid difficulty transition at height=%i (commitment phase)\n", m_id, next_height);
I think just for code-review/testing’s sake – NET gets a lot of traffic.
sdaftuar force-pushed
on Aug 12, 2022
sdaftuar
commented at 2:16 pm on August 12, 2022:
member
related question: Is the sync state exposed anywhere via RPC? getchaintips showed nothing happening either.
Seems like it might be good to at least report the per-peer value of m_current_height in getpeerinfo? Perhaps reporting the nTime or tracking the median time of the best initial sync block would be nice, then you could report the best one via the gui to give people some idea of progress?
Just added a commit that exposes m_current_height in getpeerinfo(), and also updated the branch to include tests for the handling of low-work compact blocks and unrequested blocks.
Also, I’m not sure it makes sense to have the initial header sync phase limited to a single peer?
Did you see Pieter’s comments on the bandwidth issue here? Also, I think it makes most sense to discuss this topic in #25720, as the bandwidth vs. robustness issues around initial headers sync are largely independent of this change (though this PR does increase the bandwidth we’d expect to use).
At this point I think I’ve responded to or addressed all the comments so far (thank you reviewers!), with the exception of the logging issue that has been raised. I will continue to think about that, but given how invasive it may be to address this properly, I wonder what people think of deferring that improvement to a followup PR?
sipa
commented at 11:24 pm on August 13, 2022:
member
There is a lock ordering bug in the current branch, which can lead to a deadlock when opening the peer view window in the GUI. There was a crossing of m_header_sync_mutex and cs_main: GetNodeStateStats is sometimes called while holding cs_main, and it needs m_header_sync_mutex, which means that elsewhere we cannot grab cs_main while already holding m_header_sync_mutex. There is a commit that fixes this by pushing down the m_header_sync_mutex locking; an alternative was grabbing cs_main for all of the headers sync processing, but the more fine-grained lock approach is cleaner overall, I think.
My branch also adds progress reporting for the first phase:
Aggregate statistics about (the first phase of) low-work headers sync across all peers in net_processing
Report those statistics to validation
Let validation rate-limit those reports, and then use them for logging and emitting signals with progress updates
Let the GUI use those for reporting as well.
In the logs/UI and code related to it I’ve picked the term “headers pre-synchronization” to refer to the first phase of low-work headers sync (as that’s a bit of a mouthful). Bikeshedding welcome - but whatever we pick it may be useful to be consistent in naming in other places too.
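(For reviewers trying to picture the plumbing, a minimal sketch of the per-peer aggregation described above; the types and the aggregation helper are simplified assumptions, not the branch’s exact code:)

#include <cstdint>
#include <map>
#include <optional>
#include <utility>

using NodeId = int64_t;
// Best presync progress a peer has shown so far: (height, header nTime).
using HeadersPresyncStats = std::pair<int64_t, uint32_t>;

std::map<NodeId, HeadersPresyncStats> m_headers_presync_stats;

// Aggregate to a single best value that net_processing can hand to validation.
std::optional<HeadersPresyncStats> BestPresyncProgress()
{
    std::optional<HeadersPresyncStats> best;
    for (const auto& [id, stats] : m_headers_presync_stats) {
        if (!best || stats > *best) best = stats;
    }
    return best;
}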
sipa
commented at 5:02 pm on August 14, 2022:
member
So a few open questions regarding the logging/progress:
Is it necessary to address logging/progress before 24.0? It’s a somewhat bad UX if we don’t, as there is no feedback indicating anything is happening at all, for possibly several minutes. Addressing it properly does seem to require some amount of complexity, though.
If Qt GUI changes for reporting 1st phase headers sync are considered needed, there is the concern that it adds a few more translation strings. Not having any reporting at all is however worse than untranslated ones, I would guess?
Should the GUI changes be done through a separate PR to the GUI repository?
My branch above doesn’t add a GUI element in the peer view details for first-phase headers height yet; I’d consider that much lower priority than just having progress bar updates for it, and probably better done by someone familiar with Qt.
hebasto
commented at 6:08 pm on August 14, 2022:
member
My branch above doesn’t add a GUI element in the peer view details for first-phase headers height yet; I’d consider that much lower priority than just having progress bar updates for it, and probably better done by someone familiar with Qt.
Agree.
Should the GUI changes be done through a separate PR to the GUI repository?
I assume we have no such strict policy. If reviewers are ok to review and test Qt code here, why not? But separating low-priority (in comparison to the new headers sync logic) GUI changes into a dedicated PR in the GUI repo sounds better.
If Qt GUI changes for reporting 1st phase headers sync are considered needed, there is the concern that it adds a few more translation strings. Not having any reporting at all is however worse than untranslated ones, I would guess?
It depends on how many “a few”. I guess an additional week will be enough for translators to update their translations with a couple of new strings.
sipa
commented at 6:52 pm on August 14, 2022:
member
It depends on how many “a few”. I guess an additional week will be enough for translators to update their translations with a couple of new strings.
Two, both variations of “Pre-syncing headers” (assuming that’s the terminology we’ll settle on).
in
src/net_processing.cpp:2515
in
6862e74c70outdated
I think it might make sense to add an if (!result.success) return true; when you end up in FINAL state – otherwise (at least for me) it looks like ordinary header processing then immediately triggers a restart.
As mentioned here, the intent is indeed for ordinary headers processing – that is, a headers message which connects to a different point in the locator we send than continuing from the last header they sent us – to trigger a restart of this logic (hopefully at a later point in the sync, but potentially anywhere, e.g. if the peer underwent a reorg during sync).
I assume this is also moot, now that mixing in locators from m_best_header has been removed.
in
src/net_processing.cpp:2663
in
6862e74c70outdated
2659+ // (which is done separately inside m_headers_sync)
2660+ bool already_validated_work = false;
2661+
2662+ // If we're in the middle of headers sync, let it do its magic.
2663+ if (IsContinuationOfLowWorkHeadersSync(peer, pfrom, headers, previously_downloaded_headers)) {
2664+ return;
If the suggestion above is adopted, then header sync failure will result in exiting here without a request for more headers being sent out to this peer. In that case, I think you’d also want to add code to toggle fSyncStarted and decrement nSyncStarted to allow another peer to pick up where this one left off. Needs some care to avoid cs_main lock ordering problems of course.
in
src/headerssync.cpp:105
in
6862e74c70outdated
97+ // receive, and add headers to our redownload buffer. When the buffer
98+ // gets big enough (meaning that we've checked enough commitments),
99+ // we'll return a batch of headers to the caller for processing.
100+ ret.success = true;
101+ for (const auto& hdr : received_headers) {
102+ if (!ValidateAndStoreRedownloadedHeader(hdr)) {
We expect to skip headers here due to including locators based on our tip, but ValidateAndStoreRedownloadedHeader can’t cope with skipped headers. This doesn’t seem sensible to me.
Again see #25717 (comment); the intent is that by using a locator which mixes in entries from m_best_header, the peer can skip ahead in the sync (in a scenario where our best header is updating from other peers) to reduce overall bandwidth. In such an event, this code would return an error and we’d drop into starting a new sync with the peer.
This should be moot now that mixing in locator entries from m_best_header has been removed.
ajtowns
commented at 11:14 am on August 15, 2022:
contributor
With this commit (ac4cd62b3abfb4b28135e4f126fc672ca606b643) and #25720, I’m seeing failures after one peer completes its initial download, and the tip progresses past either where other nodes are up to in their initial download, or (less commonly) in their redownload.
In the former case, I think the only real bug is that they then immediately restart their initial download which makes for noisy logs and wastes bandwidth (since they’re getting headers more slowly than the peer that’s redownloading).
In the latter case, that wastes the entire initial download, which seems more wasteful, and seems like it might introduce some new (mild?) DoS behaviours.
Not necessarily blocking, but it feels buggy/ugly to me as it stands.
sipa
commented at 2:59 pm on August 15, 2022:
member
@ajtowns I believe you’re seeing the effects of the mixing-in-m_best_header-based-skipping, but may be misunderstanding what is happening.
First of all, ValidateAndStoreRedownloadedHeader indeed doesn’t deal with the peer deciding to continue from one of the mixed-in locator entries, but it doesn’t need to; we will, in the same processing of the received headers message, abort the headers sync, and start a new one, with the first header in that received message as starting point. Assuming it is an honest peer, doing so is always advantageous as it means it’s skipping some headers sync work with that peer (the locator entries are sorted in such a way that this is the case). Note that while it’s possible during INITIAL_DOWNLOAD phase that the actual height of synchronization goes backwards, this may still be advantageous as it means less work to redo during REDOWNLOAD (because while the tip moves backwards, the starting point moves forward).
The intention is that once we start synchronizing headers with a peer, we don’t ever stop doing so; it is up to them to stop giving us headers (by reaching the sync end). We mix in headers from the m_best_header tip to give the peer a chance to skip ahead if there is some part both sides already have, but it is up to them. If there is one peer faster than all others, moving our m_best_header tip forward with every received headers blob, then all other peers will continuously be skipping ahead to keep up, but always stay behind this faster peer, until the faster peer reaches the end of the synchronization, at which point all other peers will skip ahead to the end as well, and stop. So in short: while headers sync is ongoing, we will effectively be continuously asking for headers from all headers-syncing peers, until it’s done. This is already the case in the current codebase actually, though this PR makes it worse by (a) taking longer overall before synchronization completes even with a single peer and (b) only making progress in synchronization visible to other, slower peers during the second phase. Some of this downside is mitigated by this mixing in, causing headers sync with other (honest) peers to stop as soon as sync with one peer completes, rather than needing to wait for them to get to the end on their own.
It’s not impossible to detect certain conditions, like a peer giving us things we’re clearly not interested in anymore, to just give up on the synchronization with them, but there are some scary edge cases to think about then. For example, imagine a peer who has a deep reorg of the honest chain (however unlikely that may be, we should be able to deal with it), which forks off more than 2000 headers past the highest previous locator entry in the m_best_header based locators. A newly started header sync at that point will just result in 2000 headers we already have, and aborting the sync entirely and starting over later from m_best_header again will not make any progress towards ever actually getting the peer’s chain. For that reason, it seems safer to just keep fetching, and letting the “skipping” be done by the remote party - they know best what they’re trying to give us.
There are two other improvements possible to consider that can reduce bandwidth waste:
Being more careful about which peers we start synchronizing with, in a controlled way that still results in us eventually trying all peers, so we’ll certainly eventually get whatever headers any and all of them have to give us. #25720 is a step in that direction.
Moving the start point forward: if we receive headers from a peer that we already have, in some cases it is possible to just move the HeadersSyncState m_chain_start point forward (and correspondingly pop off commitments from the buffers in it), so that we restart redownload further ahead when we get there. I have a half-finished commit that does this, but it needs thinking about correctness (again related to this deep reorg scenario above), and adds a bunch of complexity we may not want in this PR.
sdaftuar force-pushed
on Aug 15, 2022
sdaftuar
commented at 4:17 pm on August 15, 2022:
member
Thanks @sipa, I have pushed your latest branch with the logging code here (and I squashed the lock-inversion fix into the commit where I had introduced the bug).
in
src/headerssync.h:238
in
57d8791463outdated
225+
226+ /** Store the latest header received while in INITIAL_DOWNLOAD */
227+ CBlockHeader m_last_header_received;
228+
229+ /** Height of m_last_header_received */
230+ int64_t m_current_height{0}; // height of last header received
mzumsande
commented at 5:12 pm on August 15, 2022:
in
src/net_processing.cpp:2758
in
57d8791463outdated
2641+
2642+ // Before we do any processing, make sure these pass basic sanity checks.
2643+ // We'll rely on headers having valid proof-of-work further down, as an
2644+ // anti-DoS criteria.
2645+ if (!CheckHeadersPoW(headers, m_chainparams.GetConsensus(), peer)) {
2646 return;
mzumsande
commented at 5:34 pm on August 15, 2022:
Should we call Misbehaving() if the PoW is invalid?
ajtowns
commented at 8:52 pm on August 15, 2022:
contributor
First of all, ValidateAndStoreRedownloadedHeader indeed doesn’t deal with the peer deciding to continue from one of the mixed-in locator entries, but it doesn’t need to; we will, in the same processing of the received headers message, abort the headers sync, and start a new one, with the first header in that received message as starting point. Assuming it is an honest peer, doing so is always advantageous […]
I don’t think that’s correct. Restarting headers sync pushes you back into initial download, so the behaviour on testnet for a peer that’s finished initial sync but skips some blocks during redownload looks like:
download 2M headers for initial sync
download 500k headers for redownload phase
find that the tip is up to height 550k
download headers from 550k to 552k that don’t connect at 500k, and restart
find that in the meantime the tip is now up to height 560k, and your faster peer has died
download 1.44M headers (for blocks at height 560k to 2M) for initial sync
redownload the 1.44M headers to actually update the tip
That is, you’ve saved downloading 50k headers, at a cost of downloading 1.44M headers in the redundant initial sync. That’s not advantageous…
Being able to drop another peer from redownload phase back to initial sync seems like it has scary dos opportunities to me, and just searching through the history of the tip that we sent to them seems a very easy way of mitigating the problem in the honest-peer case.
It’s definitely less important if you’re only in the initial sync phase, while some other peer is further along in the redownload phase; you are only getting pushed forward there.
But at that point you’ve already been “lapped”, and the other peer is demonstrably currently supplying valid headers faster than you. In the normal case there, you’re not going to win; and even if you’re assuming the other node is an attacker, their attack is just “delivering valid headers quickly”… Continuing to try initial headers normally in those circumstances just seems like a waste of bandwidth – so waiting until the good nodes disconnect or a new block comes in seems like it would be better logic for the non-normal cases.
I guess at the very least, if that path is normal and expected, then constantly logging it as “aborted non-connecting // headers sync started” doesn’t seem helpful.
It’s not impossible to detect certain conditions, like a peer giving us things we’re clearly not interested in anymore, to just give up on the synchronization with them, but there are some scary edge cases to think about then. For example, imagine a peer who has a deep reorg of the honest chain (however unlikely that may be, we should be able to deal with it), which forks off more than 2000 headers past the highest previous locator entry in the m_best_header based locators. A newly started header sync at that point will just result in 2000 headers we already have,
But we don’t consider repeated headers a problem, just ones that don’t connect? And if we’ve already accepted them, they’ll connect fine?
sipa
commented at 8:58 pm on August 15, 2022:
member
Restarting headers sync pushes you back into initial download, so the behaviour on testnet for a peer that’s finished initial sync but skips some blocks during redownload looks like:
That’s not what the code is supposed to do. The double-height computation should in the scenario you describe treat continuing the current sync as better than restarting, order the locator entries to account for that, and assuming the peer picks the first locator entry of the chain they want to send, just continue syncing.
Specifically, once 2M headers have downloaded in INITIAL_DOWNLOAD, and 500k headers have downloaded in REDOWNLOAD, the double-height score for continuing should be 2500000. The double-height score for restarting from 550k should be 1100000. Continuing has a higher score, so should be placed first in the locator.
Is that not what you’re seeing? The code may be buggy of course, or is it possible you were using an older version (this branch briefly had mixing-in logic without the double-height weighing last week, which would behave as you describe here).
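(To make the double-height weighing concrete, a rough sketch of the scoring just described; the helper names here are mine, not the PR’s:)

// Score each candidate continuation point by the total download it preserves.
// Continuing counts both completed phases; restarting at height h counts h
// twice, since headers below h must be both presynced and redownloaded again.
int64_t ScoreContinue(int64_t presync_height, int64_t redownload_height)
{
    return presync_height + redownload_height; // 2000000 + 500000 = 2500000
}
int64_t ScoreRestart(int64_t restart_height)
{
    return 2 * restart_height; // 2 * 550000 = 1100000
}
// Locator entries are ordered by descending score, so an honest peer that
// continues from the first entry it recognizes keeps the ongoing sync going.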
In 1178015251732b2db87c0b84dd7ebe3f435a6771 “Utilize anti-DoS headers download strategy”
ValidateAndStoreHeadersCommitments, ValidateAndProcessSingleHeader, ValidateAndStoreRedownloadedHeader, and PopHeadersReadyForAcceptance all have this same check. However, they really only have a single valid state they should be operating in, so I think it would be better to check whether the current state is what is expected (INITIAL_DOWNLOAD for the first two, and REDOWNLOAD for the latter two) rather than that it is not FINAL.
mzumsande
commented at 9:29 pm on August 15, 2022:
We already check the proof of work as part of the basic sanity checks in net_processing (ProcessHeadersMessage), so I don’t think the check could ever fail at this later point.
I was nervous about omitting such an important check in the module that is designed to handle this, but you’re right of course. I added a comment indicating the caller is required to do this, and removed this call.
ajtowns
commented at 9:31 pm on August 15, 2022:
contributor
Is that not what you’re seeing?
Here’s some logs (for commit fc2e19e61261cedade18d1a5b9de2075523fa6aa merged onto 22d96d76ab02fc73e7fe0d810bacee4c982df085). I reset chainwork so that 1,156,000 testnet headers is enough. Peers 0 and 1 managed to complete initial sync. Peer 0’s log entries look like:
2022-08-15T21:14:38Z [headerssync] Initial headers sync started with peer=0: height=0, max_commitments=3738869, min_work=00000000000000000000000000000000000000000000002830dab7f76dbb7d63
2022-08-15T21:17:30Z [headerssync] Initial headers sync transition with peer=0: reached sufficient work at height=1156000, redownloading from height=0
2022-08-15T21:19:57Z [headerssync] Initial headers sync complete with peer=0: releasing all at height=1156000 (redownload phase)
Peer 1’s log entries look like this:
2022-08-15T21:15:12Z [headerssync] Initial headers sync started with peer=1: height=0, max_commitments=3738870, min_work=00000000000000000000000000000000000000000000002830dab7f76dbb7d63
2022-08-15T21:18:26Z [headerssync] Initial headers sync transition with peer=1: reached sufficient work at height=1156000, redownloading from height=0
2022-08-15T21:18:57Z [headerssync] Initial headers sync aborted with peer=1: non-continuous headers at height=176001 (redownload phase)
2022-08-15T21:18:57Z [headerssync] Initial headers sync started with peer=1: height=666040, max_commitments=2124319, min_work=00000000000000000000000000000000000000000000002830dab7f76dbb7d63
2022-08-15T21:18:57Z [headerssync] Initial headers sync aborted with peer=1: non-continuous headers at height=668040 (commitment phase)
along with another ~70 started/aborted pairs, as peer 1’s initial sync keeps getting reset to match peer 0’s successful redownload.
Peers 3 and 5 contribute another 379 aborted/started pairs; they never made it out of commitment phase though, so aren’t particularly interesting.
I don’t think the double-height counts matter here at all [EDIT: I think the double height logic is delaying the restart as designed; but there’s still no reason to restart; I’ve got the data the peer skipped, and can easily validate it against the peer’s commitment, so there’s no need to restart] – peer 1 is just failing due to the if (header.hashPrevBlock != m_redownload_buffer_last_hash) check in ValidateAndStoreRedownloadedHeader.
in
src/headerssync.cpp:158
in
1178015251outdated
153+ // logic.
154+ LogPrint(BCLog::HEADERSSYNC, "Initial headers sync aborted with peer=%d: non-continuous headers at height=%i (commitment phase)\n", m_id, m_current_height);
155+ return false;
156+ }
157+
158+ // If it does connect, (minimally) validate and store a commitment to each one.
mzumsande
commented at 9:32 pm on August 15, 2022:
nit: it only stores a commitment to some headers (depending on HEADER_COMMITMENT_PERIOD and m_commit_offset), not to each one.
sipa
commented at 9:41 pm on August 15, 2022:
member
2022-08-15T21:18:57Z [headerssync] Initial headers sync aborted with peer=1: non-continuous headers at height=176001 (redownload phase)
2022-08-15T21:18:57Z [headerssync] Initial headers sync started with peer=1: height=666040, max_commitments=2124319, min_work=00000000000000000000000000000000000000000000002830dab7f76dbb7d63
So, this is good. Peer 1 used to be in a state where it would still need to redownload another 1156000-176001=979999 blocks. By restarting at height 666040, it only has to download 1156000-666040=489960 headers (twice, once in commitment phase, once in redownload phase), or 979920 headers in total. It’s only a tiny improvement, but it’s a strict reduction in the amount of downloading left.
peer 1 is just failing due to the if (header.hashPrevBlock != m_redownload_buffer_last_hash) check in ValidateAndStoreRedownloadedHeader.
I think you’re confusing cause and effect. That check is failing because the peer sent us non-continuous headers. And the peer is doing that because we want it to: restarting is less work than continuing.
sdaftuar force-pushed
on Aug 16, 2022
sdaftuar force-pushed
on Aug 16, 2022
sdaftuar force-pushed
on Aug 16, 2022
sdaftuar force-pushed
on Aug 16, 2022
sdaftuar
commented at 4:10 pm on August 16, 2022:
member
After seeing the discussion between @ajtowns and @sipa regarding bandwidth optimization involving what locators we send and how we process “skipping ahead” in our headers download (in the situation where one of our peers is serving us headers at a faster rate than another), I think it makes sense to drop any such optimization from this PR. Here’s my thinking:
There are a number of ways (such as was done in #25720) to reduce our bandwidth usage during initial headers sync. While this PR increases bandwidth used by a node starting up for the first time, the overall bandwidth usage is still far below what a new node will end up using during block download, particularly now that #25720 has been merged (mitigating bad behavior that existed prior to this PR). Further improvements to bandwidth usage are certainly possible, but the set of strategies we might consider for doing so goes beyond the code touched here.
That is, while we might consider using tricks with this logic to reduce bandwidth usage – whether by mixing in hashes from m_best_header into the locators we send, and/or using headers that are either received from a peer or accepted into our block index to move a given peer’s sync forward – there are also strategies that may be even more effective that don’t involve code changed by this PR. For example, a refinement to #25720 that further reduces the likelihood of performing initial sync with more than one peer simultaneously (such as by introducing a minimum amount of time that we allow our initial headers-sync peer to complete sync, before we add any other peers, even via the block INV mechanism) would materially affect the impact of optimizations designed to improve the behavior when we have multiple sync peers.
Alternatively, deploying a p2p protocol change to compress headers message (as has been discussed in the past, such as by omitting prevHash to save 32 bytes, and compressing other fields such as nVersion and nTime) could lead to bandwidth savings of close to 50%, which would eliminate most of the overhead introduced by this PR, at least for upgraded software that supports such a proposed new protocol.
So given that we have alternatives like these to consider when thinking about bandwidth usage, and particularly given how complex this PR is already, I think deferring bandwidth minimization strategies to future work makes the most sense. For now I’ve removed the optimization that had been in this PR to mix-in locators from m_best_header, which I hope makes the logic easier to reason about.
I’ve also addressed the latest review feedback from @achow101 and @mzumsande, and incorporated the fixups from @sipa to use the PRESYNC terminology in place of INITIAL_DOWNLOAD.
in
src/net_processing.cpp:2447
in
c600a94390outdated
2442@@ -2421,6 +2443,47 @@ bool PeerManagerImpl::IsContinuationOfLowWorkHeadersSync(Peer& peer, CNode& pfro
24432444 if (peer.m_headers_sync->GetState() == HeadersSyncState::State::FINAL) {
2445 peer.m_headers_sync.reset(nullptr);
2446+
2447+ // Update statistics: delete this peer's entry in allstats, and if this was the
mzumsande
commented at 6:27 pm on August 16, 2022:
Comment seems out of date: allstats should be m_headers_presync_stats, and recomputing is not necessary here, because it’s done in the else branch.
sdaftuar
commented at 10:48 pm on August 16, 2022:
Updated comment.
in
src/net_processing.cpp:854
in
85e4b9e8edoutdated
850@@ -851,7 +851,7 @@ class PeerManagerImpl final : public PeerManager
851 EXCLUSIVE_LOCKS_REQUIRED(!m_most_recent_block_mutex, peer.m_getdata_requests_mutex) LOCKS_EXCLUDED(::cs_main);
852853 /** Process a new block. Perform any post-processing housekeeping */
854- void ProcessBlock(CNode& node, const std::shared_ptr<const CBlock>& block, bool force_processing);
855+ void ProcessBlock(CNode& node, const std::shared_ptr<const CBlock>& block, bool force_processing, const bool min_pow_checked);
In commit “Require callers of AcceptBlockHeader() to perform anti-dos checks”
Nit: having const for a by-value function parameter in a declaration is meaningless (it’s not part of the function signature, so even if you have const on the implementation code, it’s not required to be present on the declaration).
sdaftuar
commented at 10:48 pm on August 16, 2022:
Removed.
in
src/headerssync.cpp:151
in
e778d4f995outdated
146+ // Somehow our peer gave us a header that doesn't connect.
147+ // This might be benign -- perhaps we issued an extra getheaders
148+ // message, such as after a block INV was received.
149+ // Or it could be that our peer is broken or malicious. If broken,
150+ // sending a new getheaders immediately could trigger an infinite
151+ // loop. Just give up for now; if our peer ever gives us a block
In commit “Utilize anti-DoS headers download strategy”
Nit: I think this comment is outdated; we don’t actually give up in this scenario, but return false to the net_processing calling code, which then continues different interpretations with it (possible starting a new sync with it, even).
sdaftuar
commented at 10:48 pm on August 16, 2022:
Fixed
in
src/net_processing.cpp:2742
in
e778d4f995outdated
2625+ // If we were in the middle of headers sync, receiving an empty headers
2626+ // message suggests that the peer suddenly has nothing to give us
2627+ // (perhaps it reorged to our chain). Clear download state for this peer.
2628+ LOCK(peer.m_headers_sync_mutex);
2629+ if (peer.m_headers_sync) {
2630+ peer.m_headers_sync.reset(nullptr);
mzumsande
commented at 7:05 pm on August 16, 2022:
I think we should remove a possible m_headers_presync_stats entry here too.
sdaftuar
commented at 10:48 pm on August 16, 2022:
Nit: this can be std::next(first_chain.begin()) or even just first_chain.begin() + 1.
sdaftuar
commented at 10:48 pm on August 16, 2022:
Fixed.
sipa
commented at 7:45 pm on August 16, 2022:
member
ACK, for the code not written by me here.
I’ve reviewed the code, did several experiments on testnet and mainnet, and observed behavior with -debug=headerssync.
A few nits below, which are non-blocking.
sdaftuar force-pushed
on Aug 16, 2022
ajtowns
commented at 4:49 am on August 17, 2022:
contributor
After seeing the discussion between @ajtowns and @sipa regarding bandwidth optimization involving what locators we send and how we process “skipping ahead” in our headers download (in the situation where one of our peers is serving us headers at a faster rate than another), I think it makes sense to drop any such optimization from this PR. Here’s my thinking:
I think that results in the following observable differences:
more bandwidth usage
if the fastest peer you’re syncing with disconnects while in redownload, it will take some time for the next fastest peer to catch up with your best header and you won’t be making any progress until then
previously, once your best header reached min chainwork, you’d start syncing blocks with all nodes; now you’ll have to wait until you actually catch up with the tip, or until each peer meets min chainwork independently
Those don’t seem too bad to me. :+1:
Alternatively, deploying a p2p protocol change to compress headers message (as has been discussed in the past, such as by omitting prevHash to save 32 bytes, and compressing other fields such as nVersion and nTime)
FWIW, since BIP320 has been adopted by miners (effectively providing 16 extra nonce bits), I think it’s hard to compress nVersion below 2 bytes, and nTime seems to need at least 12 bits; not sure the difference between 3.5 bytes and 8 bytes in the normal case is worth the complexity of dealing with the special cases? Batching headers with a common nBits together would save ~4B per header though, since it only changes every 2016 blocks. (If you did a consensus change to have difficulty change every block, you could still have nBits as a constant value over x,000 blocks describing the minimum allowed difficulty in that period)
could lead to bandwidth savings of close to 50%, which would eliminate most of the overhead introduced by this PR, at least for upgraded software that supports such a proposed new protocol.
Upping the number of headers per message from 2000 to 4000 would halve the number of round-trips necessary (or more? if you increased the message size, 22k headers would fit in a 1MB message at 44B/header, and you’d only need ~35 round trips to sync mainnet instead of ~350…), which I think would cut header sync times down pretty dramatically.
Sjors
commented at 2:04 pm on August 17, 2022:
member
The pre-sync phase looks like this:
I don’t think it’s critical to translate. As long as something is making progress, I suspect most new users will be happy :-)
in
src/headerssync.cpp:314
in
07b068e156outdated
306+ if (m_download_state == State::REDOWNLOAD) {
307+ // During redownload, we will download from the last received header that we stored.
308+ locator.push_back(m_redownload_buffer_last_hash);
309+ }
310+
311+ locator.insert(locator.end(), chain_start_locator.begin(), chain_start_locator.end());
07b068e1568efdb6473f1d1eb4b870eb339c748b: do we need the chain_start_locator at all? We know the peer has the hash in locator, because they sent it to us.
We include it to better handle the case where our peer may reorg during the sync. (This way, they won’t go all the way back to genesis just because we omitted locator entries.)
Sjors
commented at 4:10 pm on August 17, 2022:
member
The LocatorEntries helper function in 5ea6f9a34bf3e8751fbca8f6107e99a9f553e31f was introduced in this push: #25717#event-7174395344. It looks correct to me, but what was the motivation for adding it? I guess we were using a CChain object in an earlier version somewhere?
sipa
commented at 5:27 pm on August 17, 2022:
member
@Sjors An earlier version of the code had net_processing construct a CBlockLocator normally, hand that to HeadersSyncState at creation time which would remember it, and then the locator returned was constructed by concatenating the presync/redownload tip hash with the entries in the locator that was passed down from net_processing at construction.
I later introduced the commit you’re referring to (LocatorEntries etc) for the purpose of being able to do tip header mixing, as it needed a way to compute the locator entries, plus it needed more information about those entries (their height).
Tip header mixing was reverted, and some of the complexity in that commit with it. Arguably we don’t strictly need LocatorEntries() anymore; it could just use GetLocator and grab the innards of the returned locator. It doesn’t hurt though: it’s a bit cleaner I think, and a simpler change later if we want to introduce the header tip mixing again.
Sjors
commented at 7:43 pm on August 17, 2022:
member
it’s a bit cleaner I think
Agree in the context it’s used here.
It doesn’t hurt though
Locators are used in various places. I’m not sure how to confirm that the performance decrease from O(1) to O(log n) is negligible in all the places it’s used.
sipa
commented at 9:54 pm on August 17, 2022:
member
I’m not sure how to confirm that the performance decrease from O(1) to O(log n) is negligible in all the places it’s used.
As far as I know, they’re only constructed (a) in order to be sent to the network for requesting headers, or (b) when storing in the wallet or indexes when flushing what the current sync position is. Chasing a few dozen pointers is a sub-microsecond operation. In both scenarios, that’s dwarfed by just network processing times or the disk I/O that also needs to happen.
in
src/headerssync.h:96
in
07b068e156outdated
91+ * little memory even for a very long chain.
92+ *
93+ * - In phase 2 (redownload), keep a lookahead buffer and only accept headers
94+ * from that buffer into the block index (permanent memory usage) once they
95+ * have some target number of verified commitments on top of them. With this
96+ * parametrization, we can achieve a given security target for potential
Yeah, I just like the separation between member functions and data variables, but if this is bothersome I can remove it.
in
src/test/headers_sync_chainwork_tests.cpp:143
in
f4d912a8e4outdated
135+ result = hss->ProcessNextHeaders(headers_batch, false);
136+ BOOST_CHECK(hss->GetState() == HeadersSyncState::State::FINAL);
137+ BOOST_CHECK(result.pow_validated_headers.empty());
138+ // Nevertheless, no validation errors should have been detected with the
139+ // chain:
140+ BOOST_CHECK(result.success);
in
src/test/headers_sync_chainwork_tests.cpp:102
in
f4d912a8e4outdated
97+ // Pretend the first header is still "full", so we don't abort.
98+ auto result = hss->ProcessNextHeaders(headers_batch, true);
99+
100+ // This chain should look valid, and we should have met the proof-of-work
101+ // requirement.
102+ BOOST_CHECK(result.success);
36+ // today. This serves as a memory bound on how many commitments we might
37+ // store from this peer, and we can safely give up syncing if the peer
38+ // exceeds this bound, because it's not possible for a consensus-valid
39+ // chain to be longer than this (at the current time -- in the future we
40+ // could try again, if necessary, to sync a longer chain).
41+ m_max_commitments = 6*(GetAdjustedTime() - chain_start->GetMedianTimePast() + MAX_FUTURE_BLOCK_TIME) / HEADER_COMMITMENT_PERIOD;
In 956109788e, is there a reason to put these assignments in the function body instead of member initializer list? Could make m_minimum_required_work and m_max_commitments const members?
Done, except I left m_max_commitments, just because the calculation and comments are so long that it seems unwieldy to put in the initializer list.
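(As a sanity check on the memory bound, plugging ballpark mainnet numbers into that expression; the time span here is my rough assumption, not a figure from the PR:)

// Roughly 13.5 years between the chain start's median-time-past and now:
const int64_t time_span{430'000'000}; // seconds (assumption)
// Consensus permits at most ~6 blocks per second of MTP advance, so:
const int64_t max_headers{6 * time_span};         // ~2.6 billion headers
const int64_t max_commitments{max_headers / 584}; // HEADER_COMMITMENT_PERIOD
// ~4.4 million commitments; at one bit each, roughly 550 KiB per peer worst case.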
in
src/chain.h:487
in
5ea6f9a34boutdated
476@@ -477,4 +477,10 @@ class CChain
477 CBlockIndex* FindEarliestAtLeast(int64_t nTime, int height) const;
478 };
479480+/** Get a locator for a block index entry. */
481+CBlockLocator GetLocator(const CBlockIndex* index);
mzumsande
commented at 3:14 pm on August 22, 2022:
Having both GetLocator() and a CChain::GetLocator() with identical names, but slightly different behavior (if a nullptr is passed, the latter returns a locator to the tip, the former returns an empty locator) seems like a possible source for errors. Maybe rename CChain::GetLocator() to CChain::GetLocatorToIndexOrTip() (could be done as a follow-up)?
I think it would still be valuable for long-term codebase readability to rename the class method GetLocator() to something like GetLocatorAtTip(), to dissociate it well from the new method in the same file (note, we have yet another GetLocator() in src/index/base.cpp). There could be a concern of a validation/chain method misusage inducing some safety bug, especially with the ongoing libbitcoinkernel and assumeutxo works.
Beyond that, it could be valuable to document CBlockLocator in src/primitives/block.h. This data struct is used by GETHEADERS and GETBLOCKS, but iirc its usage, the way we construct it with steps back to genesis, and the sanity bounds we enforce on our side (like MAX_LOCATOR_SIZE) are only loosely documented across the codebase. That would make it hard to reuse for future p2p extensions or for experimentation with other chain-syncing strategies.
All suggestions.
in
src/net_processing.cpp:4259
in
cdc1dd66a3outdated
4119@@ -4111,7 +4120,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
4120 // we have a chain with at least nMinimumChainWork), and we ignore
4121 // compact blocks with less work than our tip, it is safe to treat
4122 // reconstructed compact blocks as having been requested.
4123- ProcessBlock(pfrom, pblock, /*force_processing=*/true);
4124+ ProcessBlock(pfrom, pblock, /*force_processing=*/true, true);
mzumsande
commented at 3:54 pm on August 22, 2022:
nit (here and elsewhere): would be nice to have more named arguments, at least in all the places where they are already used for other args.
990@@ -987,7 +991,7 @@ class ChainstateManager
991 * @param[out] new_block A boolean which is set to indicate if the block was first received via this call
992 * @returns If the block was processed, independently of block validity
993 */
994- bool ProcessNewBlock(const std::shared_ptr<const CBlock>& block, bool force_processing, bool* new_block) LOCKS_EXCLUDED(cs_main);
995+ bool ProcessNewBlock(const std::shared_ptr<const CBlock>& block, bool force_processing, bool* new_block, const bool min_pow_checked) LOCKS_EXCLUDED(cs_main);
mzumsande
commented at 4:11 pm on August 22, 2022:
doc above needs an update, plus the ordering is no longer (in, in-out, out) per developer notes.
1002@@ -999,7 +1003,7 @@ class ChainstateManager
1003 * @param[out] state This may be set to an Error state if any error occurred processing them
1004 * @param[out] ppindex If set, the pointer will be set to point to the last new block index object for the given headers
1005 */
1006- bool ProcessNewBlockHeaders(const std::vector<CBlockHeader>& block, BlockValidationState& state, const CBlockIndex** ppindex = nullptr) LOCKS_EXCLUDED(cs_main);
1007+ bool ProcessNewBlockHeaders(const std::vector<CBlockHeader>& block, BlockValidationState& state, const bool min_pow_checked, const CBlockIndex** ppindex = nullptr) LOCKS_EXCLUDED(cs_main);
mzumsande
commented at 4:12 pm on August 22, 2022:
doc above needs an update, plus the ordering is no longer (in, in-out, out) per developer notes.
sdaftuar
commented at 0:30 am on August 23, 2022:
member
Pushed two commits to address the following:
If a peer has a chain whose work is less than minchainwork, but is a subset of the main chain (m_best_header or chainActive), then we wouldn’t ever process the peer’s headers and update pindexBestKnownBlock, preventing us from being able to use the peer as a download peer for the portion of the chain they have. Fix this by checking to see if the last header in a headers message is (a) known to us and (b) an ancestor of our best header or tip, and if so, bypass the DoS checks (which doesn’t cost us anything because processing these headers wouldn’t use more memory anyway).
This change means that for a node which has synced the honest chain, it’s no longer possible for an attacker to try to achieve a collision during REDOWNLOAD by feeding us headers we already have during PRESYNC. This also means that we would not initiate a low-work headers sync until receiving a headers message that contains something new, which I think is conceptually simpler than what was happening previously.
I noticed a bug in ProcessHeadersMessage() where, in a couple of places, we would be operating on the headers received off the network instead of the headers returned by the headers sync module. The second commit tries to clean this logic up in a less error-prone way by eliminating the headers_to_process pointer, in favor of just changing the headers vector directly when previously_downloaded_headers are returned.
Finally, I’m still contemplating whether any logic change is warranted if we receive a low-work header via compact block (right now we would ignore them; I think this may be fine, but since this is a behavior change it’s worth considering whether this could be problematic at all).
(Will squash these commits into place once reviewers have had a chance to take a look)
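(A rough sketch of that ancestor check, assuming a hypothetical helper; the actual commit may structure it differently:)

// If a known header lies on the chain leading to our best header or to our
// active tip, accepting it cannot add new block index entries, so the
// anti-DoS work checks can safely be bypassed for it.
bool IsAncestorOfBestHeaderOrTip(const CBlockIndex* header)
{
    for (const CBlockIndex* chain_tip : {m_chainman.m_best_header, m_chainman.ActiveTip()}) {
        if (chain_tip && chain_tip->GetAncestor(header->nHeight) == header) {
            return true;
        }
    }
    return false;
}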
fanquake
commented at 7:54 am on August 23, 2022:
member
@glozow The Qt signals framework can’t enqueue enum values, unfortunately. The alternative is creating some wrapper class etc, but that seemed overkill. I agree this is suboptimal, but I think cleaner alternatives aren’t worth the effort here.
The rule against difficulty adjustments changing by more than a factor of 4 can
be helpful for anti-DoS measures in contexts where we lack a full headers
chain, so expose this functionality separately and in the narrow case where we
only know the height, new value, and old value.
Includes fuzz test by Martin Zumsande.
1d4cfa4272
Add bitdeque, an std::deque<bool> analogue that does bit packing.
84852bb6bb
in
src/net_processing.cpp:2574
in
96140e9479outdated
glozow
commented at 4:48 pm on August 23, 2022:
member
Have stared at each commit enough to convince myself this is safe. Tested this a few times on mainnet and testnet with some sanity check assertions thrown in. I don’t know much about qt but the code looks correct, and when testing, both gui and debug.log seemed to display information at a reasonable cadence.
sdaftuar force-pushed
on Aug 23, 2022
sdaftuar force-pushed
on Aug 23, 2022
Add functions to construct locators without CChain
This introduces an insignificant performance penalty, as it means locator
construction needs to use the skiplist-based CBlockIndex::GetAncestor()
function instead of the lookup-based CChain, but avoids the need for
callers to have access to a relevant CChain object.
ed470940cd
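(For readers unfamiliar with locator construction, the skiplist-based approach looks roughly like this; a simplified sketch, not the commit verbatim:)

// Walk back from `index` with exponentially growing steps, using the
// skiplist-based CBlockIndex::GetAncestor() instead of a CChain height lookup.
std::vector<uint256> LocatorEntries(const CBlockIndex* index)
{
    int step{1};
    std::vector<uint256> have;
    while (index != nullptr) {
        have.emplace_back(index->GetBlockHash());
        if (index->nHeight == 0) break;
        // Jump back `step` blocks; GetAncestor() walks the skiplist, so this
        // is O(log n) rather than a direct O(1) CChain lookup.
        index = index->GetAncestor(std::max(index->nHeight - step, 0));
        // Keep the first 10 entries dense, then double the step each time.
        if (have.size() > 10) step *= 2;
    }
    return have;
}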
sdaftuar force-pushed
on Aug 23, 2022
sdaftuar
commented at 8:10 pm on August 23, 2022:
member
Addressed review feedback and included some changes to further clean up the logic in ProcessHeadersMessage(). Also, I included a new test for large reorg handling.
nit: Could an implicit bool to int conversion be avoided?
hebasto
commented at 11:45 am on August 24, 2022:
member
Tested bbef38fdac90a03020ad70cbe6b97fbf09fccbc6 on the GUI:
in
src/net_processing.cpp:598
in
1fca7ed49eoutdated
587@@ -581,18 +588,59 @@ class PeerManagerImpl final : public PeerManager
588589 void ProcessOrphanTx(std::set<uint256>& orphan_work_set) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans)
590 EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
591- /** Process a single headers message from a peer. */
592+ /** Process a single headers message from a peer.
593+ *
594+ * @param[in] pfrom CNode of the peer
595+ * @param[in] peer The peer sending us the headers
596+ * @param[in] headers The headers received. Note that this may be modified within ProcessHeadersMessage.
Hmm, interesting point. I think the intention is not to think of the headers in the vector after calling ProcessHeadersMessage as the output of the function, but rather just that the caller shouldn’t assume anything about what is in the vector. So I think it’s better to not have the docs say “[in,out]”, given that understanding?
With the recent changes in net_processing, I think it’s easier to think about it as “reference to a list of headers to be processed, and which may be modified”. All the caller would do with an output argument is overwrite the passed-in headers variable anyway.
Why modify it if you don’t want the caller to use it afterwards?
Is it only modified by the headers.swap(result.pow_validated_headers) in IsContinuationOfLowWorkHeadersSync() and all callers ignore the result? If yes, then that swap() can be removed and the parameters made const?
Edit: the callers of IsContinuationOfLowWorkHeadersSync() don’t ignore the result, I was confused, sorry. Maybe document what is expected as in and what is returned as out for the headers parameter in both PeerManagerImpl::ProcessHeadersMessage() and IsContinuationOfLowWorkHeadersSync().
@vasild For IsContinuationOfLowWorkHeadersSync, the comment currently says:
 * @param[in,out] headers The headers to be processed.
 * @return True if the passed in headers were successfully processed
 *         as the continuation of a low-work headers sync in progress;
 *         false otherwise.
 *         If false, the passed in headers will be returned back to
 *         the caller.
 *         If true, the returned headers may be empty, indicating
 *         there is no more work for the caller to do; or the headers
 *         may be populated with entries that have passed anti-DoS
 *         checks (and therefore may be validated for block index
 *         acceptance by the caller).
For ProcessHeadersMessage, the comment reads:
 * @param[in] headers The headers received. Note that this may be modified within ProcessHeadersMessage.
Can you clarify what documentation you think is missing?
The headers argument to ProcessHeadersMessage should probably be of type std::vector<CBlockHeader>&&, making the caller indicate that the referenced vector isn’t going to be used anymore. Callers can pass something in with std::move(headers). The compact block call site can turn into return ProcessHeadersMessage(pfrom, *peer, {cmpctblock.header}, /*via_compact_block=*/true); then even, without temporary headers variable.
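(For illustration, the suggested signature change would look something like this sketch:)

// Taking the vector by rvalue reference documents that the caller's copy is
// consumed; a temporary like {cmpctblock.header} then binds directly.
void ProcessHeadersMessage(CNode& pfrom, Peer& peer,
                           std::vector<CBlockHeader>&& headers,
                           bool via_compact_block);

// Call sites hand off ownership explicitly:
//   ProcessHeadersMessage(pfrom, *peer, std::move(headers), /*via_compact_block=*/false);
//   ProcessHeadersMessage(pfrom, *peer, {cmpctblock.header}, /*via_compact_block=*/true);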
sdaftuar
commented at 11:58 pm on August 25, 2022:
@sdaftuar, I was just briefly looking around and I missed the explanations in IsContinuationOfLowWorkHeadersSync()’s @return. The comment in ProcessHeadersMessage() is too scarce - “may be modified”, but in which way?
Anyway, too minor, and I guess @sipa’s suggestion above has improved it enough.
in
src/net_processing.cpp:616
in
1fca7ed49eoutdated
610 * occasional non-connecting header (this can happen due to BIP 130 headers
611 * announcements for blocks interacting with the 2hr (MAX_FUTURE_BLOCK_TIME) rule). */
612 void HandleFewUnconnectingHeaders(CNode& pfrom, Peer& peer, const std::vector<CBlockHeader>& headers);
613 /** Return true if the headers connect to each other, false otherwise */
614 bool CheckHeadersAreContinuous(const std::vector<CBlockHeader>& headers) const;
615+ /** Try to continue a low-work headers sync that has already begun.
1fca7ed49eca6c2cd71d37415fa2d419779cf212 Given that this calls ProcessNextHeaders() we should probably repeat the requirement for that function:
Assumes the caller has already verified the headers connect, and has checked that each header satisfies proof-of-work target included in the header
sdaftuar
commented at 11:23 pm on August 24, 2022:
Done.
in
src/net_processing.cpp:617
in
1fca7ed49eoutdated
614 bool CheckHeadersAreContinuous(const std::vector<CBlockHeader>& headers) const;
615+ /** Try to continue a low-work headers sync that has already begun.
616+ * @param[in] peer The peer we're syncing with.
617+ * @param[in] pfrom CNode of the peer
618+ * @param[in,out] headers The headers to be processed.
619+ * @return True if the passed in headers were successfully processed; false otherwise.
1fca7ed49eca6c2cd71d37415fa2d419779cf212 to avoid using the ambiguous term “processed” twice, maybe replace “were successfully processed” with “passed the checks in ValidateAndStoreHeadersCommitments” or “connect and their difficulty adjustment is within bounds”.
sdaftuar
commented at 11:23 pm on August 24, 2022:
Reworked this comment to be clearer.
in
src/headerssync.h:151
in
1fca7ed49eoutdated
137+ };
138+
139+ /** Process a batch of headers, once a sync via this mechanism has started
140+ *
141+ * received_headers: headers that were received over the network for processing.
142+ * Assumes the caller has already verified the headers connect,
1fca7ed49eca6c2cd71d37415fa2d419779cf212: ProcessNextHeaders calls ValidateAndStoreHeadersCommitments, which checks that headers connect. So does it really assume it?
That’s a fair point; I think what I was thinking here is that headers within a headers message that don’t connect aren’t going to be well-handled by this logic, because that is a level of brokenness that we’d want to detect at the net_processing level and just disconnect the peer (certainly it’s not a type of failure that we would handle by issuing another getheaders and hoping for a different response).
The logic does (or should!) correctly deal with a case where the first header in a message doesn’t connect with the last header in a prior message, which is not necessarily indicative of a broken peer (it could happen due to the peer reorging, for instance). In that type of situation, we want to return a failure to the caller that indicates the caller could try again with a new headers sync from a different starting point.
So in total, I think the idea that the caller should be checking for this error is still the right idea to get across, but perhaps this comment is slightly misleading.
headers within a headers message that don’t connect
I’ve been assuming that in the source code comments “connect” means the first header fits onto something we have, and “continuous” means all headers inside the package connect (with no forks?). For each function we should probably be clear which of the two we care about.
The logic does (or should!) correctly deal with a case where the first header in a message doesn’t connect with the last header in a prior message,
1fca7ed49eca6c2cd71d37415fa2d419779cf212 probably worth logging if this does go wrong.
sdaftuar
commented at 11:24 pm on August 24, 2022:
Done.
in
src/headerssync.h:166
in
1fca7ed49eoutdated
161+ /** Issue the next GETHEADERS message to our peer.
162+ *
163+ * This will return a locator appropriate for the current sync object, to continue the
164+ * synchronization phase it is in.
165+ */
166+ CBlockLocator MakeNextHeadersRequest() const;
1fca7ed49eca6c2cd71d37415fa2d419779cf212: NextHeadersRequestLocator might be a better name
sdaftuar
commented at 11:24 pm on August 24, 2022:
Agreed, done.
in
src/net_processing.cpp:622
in
1fca7ed49eoutdated
613 /** Return true if the headers connect to each other, false otherwise */
614 bool CheckHeadersAreContinuous(const std::vector<CBlockHeader>& headers) const;
615+ /** Try to continue a low-work headers sync that has already begun.
616+ * @param[in] peer The peer we're syncing with.
617+ * @param[in] pfrom CNode of the peer
618+ * @param[in,out] headers The headers to be processed.
1fca7ed49eca6c2cd71d37415fa2d419779cf212: is there any downside to just having a const headers_in and a separate headers_out param? Inside IsContinuationOfLowWorkHeadersSync you’re not editing headers in place anyway, using swap instead.
This was the result of a couple iterations that I went through with @sipa. I believe @sipa pointed out to me that we could push the headers manipulation down another level into the headers sync object itself, and then we could save an allocation – but I was going to save further refinements of this code to a future PR at this point.
in
src/net_processing.cpp:2659
in
1fca7ed49eoutdated
2656+ }
2657+
2658+ // Before we do any processing, make sure these pass basic sanity checks.
2659+ // We'll rely on headers having valid proof-of-work further down, as an
2660+ // anti-DoS criteria (note: this check is required before passing any
2661+ // headers/ into HeadersSyncState).
sdaftuar
commented at 11:25 pm on August 24, 2022:
Fixed.
in
src/net_processing.cpp:2775
in
1fca7ed49eoutdated
2674+ // If we're in the middle of headers sync, let it do its magic.
2675+ bool have_headers_sync = false;
2676+ {
2677+ LOCK(peer.m_headers_sync_mutex);
2678+
2679+ already_validated_work = IsContinuationOfLowWorkHeadersSync(peer, pfrom, headers);
1fca7ed49eca6c2cd71d37415fa2d419779cf212 This can be true if we’re still still in the PRESYNC state, even though we haven’t checked against the minimum work yet. In that case this function returns early, so it’s fine. But a clarifying comment would be good here.
sdaftuar
commented at 11:25 pm on August 24, 2022:
Updated the comments around this variable.
in
src/net_processing.cpp:2668
in
1fca7ed49eoutdated
2665 }
26662667 const CBlockIndex *pindexLast = nullptr;
26682669+ // We'll set already_validated_work to true if the headers-sync logic
2670+ // returns headers for us to process, to bypass the minimum work check
1fca7ed49eca6c2cd71d37415fa2d419779cf212 Maybe say “minimum total work check”, so it’s not confused with a minimal check on the work.
sdaftuar
commented at 11:26 pm on August 24, 2022:
Reworked this comment.
in
src/net_processing.cpp:2757
in
1fca7ed49eoutdated
2658+ // Before we do any processing, make sure these pass basic sanity checks.
2659+ // We'll rely on headers having valid proof-of-work further down, as an
2660+ // anti-DoS criteria (note: this check is required before passing any
2661+ // headers/ into HeadersSyncState).
2662+ if (!CheckHeadersPoW(headers, m_chainparams.GetConsensus(), peer)) {
2663+ Misbehaving(peer, 100, "invalid proof-of-work");
Will add. Also, you just made me realize that the anti-DoS work calculation to avoid processing compact blocks that are low-work doesn’t first verify the proof-of-work on the header before doing the work check. After giving this more thought, I don’t think it’s a problem, because if the claimed work is low we ignore it, and if the claimed work is high then we might try to validate it, which would catch any problems – but wanted to mention it in case I am overlooking something.
in
src/net_processing.cpp:4157
in
1fca7ed49eoutdated
4153@@ -3913,7 +4154,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
4154 // the peer if the header turns out to be for an invalid block.
4155 // Note that if a peer tries to build on an invalid chain, that
4156 // will be detected and the peer will be disconnected/discouraged.
4157- return ProcessHeadersMessage(pfrom, *peer, {cmpctblock.header}, /*via_compact_block=*/true);
4158+ std::vector<CBlockHeader> dummy{cmpctblock.header};
1fca7ed49eca6c2cd71d37415fa2d419779cf212: it’s not really a dummy is it? E.g. the proof of work gets checked by ProcessHeadersMessage()
sdaftuar
commented at 11:27 pm on August 24, 2022:
Fixed.
in
src/headerssync.cpp:19
in
1fca7ed49e (outdated)
14+//! Store a commitment to a header every HEADER_COMMITMENT_PERIOD blocks.
15+constexpr size_t HEADER_COMMITMENT_PERIOD{584};
16+
17+//! Only feed headers to validation once this many headers on top have been
18+//! received and validated against commitments.
19+constexpr size_t REDOWNLOAD_BUFFER_SIZE{13959};
1fca7ed49eca6c2cd71d37415fa2d419779cf212: so this doesn’t need to be a multiple of HEADER_COMMITMENT_PERIOD? Maybe mention that it’s approximately 24 commitment periods worth of headers, i.e. 24 bits of security (? Or ~9 more because an attacker has to account for every possible offset)
sdaftuar
commented at 11:27 pm on August 24, 2022:
I’m not sure “bits of security” is quite the right way to think about this, but I added a comment mentioning this is about 23.9 commitment bits that will fit in the buffer.
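For concreteness, the arithmetic behind that 23.9 figure, using the two constants from the snippet above; a back-of-envelope check, not code from the PR:

    // REDOWNLOAD_BUFFER_SIZE / HEADER_COMMITMENT_PERIOD commitment bits fit in
    // the buffer before any headers are released to validation.
    static_assert(13959 / 584 == 23, "23 full commitment periods fit in the buffer");
    // More precisely, 13959 / 584.0 ≈ 23.9 commitment bits.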
Not necessarily true I think, such as if we are destroying the object early (such as due to peer disconnection, or being in a unit test environment, etc).
Wouldn’t it be safer to always require a call to Finalize() in these cases? Alternatively, to have a State::NEW where we know nothing has been allocated yet.
I believe clear() does not actually free memory, but the swap() here does. (May not matter much since the caller will just be deleting us anyway, but seems better to free resources as soon as we can.)
This seems based on the (somewhat surprising) std::vector<T>::clear() behavior which does not release memory, just sets the size of the used area of it to 0. That doesn’t apply here though, because std::deque is not an std::vector and I don’t think it exhibits that behavior.
Also, even if it was, I think m_redownloaded_headers = std::deque<CompressedHeader>{}; is cleaner (since C++11 move semantics that’s equivalent to that swap statement, but a bit more readable).
Seems I’m wrong; std::deque<T>::clear() does not guarantee memory is released.
sdaftuar
commented at 11:45 pm on August 24, 2022:
Rewrote this as m_redownloaded_headers = std::deque<CompressedHeader>{};
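For readers following this exchange, a minimal self-contained sketch (not the PR’s code; the element type is a stand-in) of why move-assigning a fresh deque releases memory while clear() need not:

    #include <deque>

    struct CompressedHeaderStandIn { unsigned char data[48]; };

    void ReleaseMemoryExample(std::deque<CompressedHeaderStandIn>& dq)
    {
        dq.clear(); // size() becomes 0, but the implementation may retain the storage
        dq = std::deque<CompressedHeaderStandIn>{}; // move-assign a fresh deque: the old
        // representation is destroyed, so its memory is actually released
    }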
in
src/headerssync.cpp:90
in
1fca7ed49e (outdated)
85+ if (full_headers_message || m_download_state == State::REDOWNLOAD) {
86+ // A full headers message means the peer may have more to give us;
87+ // also if we just switched to REDOWNLOAD then we need to re-request
88+ // headers from the beginning.
89+ ret.request_more = true;
90+ } else {
1fca7ed49eca6c2cd71d37415fa2d419779cf212: calling Finalize() here is safe, because we’ve already popped the headers we need into ret.
in
src/headerssync.h:229
in
1fca7ed49e (outdated)
224+ *
225+ * Any peer giving us more headers than this will have its sync aborted. This serves as a
226+ * memory bound on m_header_commitments. */
227+ uint64_t m_max_commitments{0};
228+
229+ /** Store the latest header received while in PRESYNC */
1fca7ed49eca6c2cd71d37415fa2d419779cf212 or the hash of m_chain_start.
sdaftuar
commented at 11:27 pm on August 24, 2022:
Updated comment.
in
src/headerssync.h:184
in
1fca7ed49e (outdated)
179+ * On failure, this invokes Finalize() and returns false.
180+ */
181+ bool ValidateAndStoreHeadersCommitments(const std::vector<CBlockHeader>& headers);
182+
183+ /** In PRESYNC, process and update state for a single header */
184+ bool ValidateAndProcessSingleHeader(const CBlockHeader& previous, const
1fca7ed49eca6c2cd71d37415fa2d419779cf212: the previous argument is a bit pointless, since it’s a member variable.
sdaftuar
commented at 11:27 pm on August 24, 2022:
Eliminated the extra argument.
in
src/headerssync.h:243
in
1fca7ed49e (outdated)
238+ std::deque<CompressedHeader> m_redownloaded_headers;
239+
240+ /** Height of last header in m_redownloaded_headers */
241+ int64_t m_redownload_buffer_last_height{0};
242+
243+ /** Hash of last header in m_redownloaded_headers (we have to cache it
1fca7ed49eca6c2cd71d37415fa2d419779cf212 or the hash of m_chain_start.
sdaftuar
commented at 11:27 pm on August 24, 2022:
Updated comment.
in
src/headerssync.cpp:201
in
1fca7ed49e (outdated)
195+ // Add a commitment.
196+ m_header_commitments.push_back(m_hasher(current.GetHash()) & 1);
197+ if (m_header_commitments.size() > m_max_commitments) {
198+ // The peer's chain is too long; give up.
199+ // It's possible the chain grew since we started the sync; so
200+ // potentially we could succeed in syncing the peer's chain if we
1fca7ed49eca6c2cd71d37415fa2d419779cf212: in the case of IBD this is just a remote theoretical possibility, right? But for a shorter catchup there’s at least a less-remote chance someone produced lots of blocks with timestamps progressing only 1 second per 6 blocks, e.g. as part of an ongoing timewarp attack.
I think even in that scenario, this code ought to succeed, because we establish the minimum work requirement to leave this logic at the start of a sync, and not as we go. So if a peer had a high-enough-work chain at the start of our sync, then this calculation should correctly capture the number of valid headers they might have to satisfy that work requirement.
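A rough sketch of the bound under discussion, fixed at sync start as described above (the function and parameter names are illustrative, not the PR’s): it relies on the consensus rule that median-time-past must advance, which caps a valid chain at roughly 6 blocks per second of elapsed time, plus the two-hour allowance for future-dated timestamps.

    #include <cstdint>

    uint64_t MaxCommitmentsSketch(int64_t now, int64_t chain_start_mtp,
                                  int64_t max_future_block_time, // e.g. 2 hours
                                  uint64_t commitment_period)
    {
        // At most ~6 consensus-valid blocks per second of elapsed time.
        const int64_t max_blocks = 6 * (now - chain_start_mtp + max_future_block_time);
        return static_cast<uint64_t>(max_blocks) / commitment_period + 1;
    }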
in
src/headerssync.cpp:256
in
1fca7ed49e (outdated)
251+ // Also, don't check commitments once we've gotten to our target blockhash;
252+ // it's possible our peer has extended its chain between our first sync and
253+ // our second, and we don't want to return failure after we've seen our
254+ // target blockhash just because we ran out of commitments.
255+ if (!m_process_all_remaining_headers && next_height % HEADER_COMMITMENT_PERIOD == m_commit_offset) {
256+ bool commitment = m_hasher(header.GetHash()) & 1;
2022-08-25T14:41:14.204000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_zotpggyy
2022-08-25T14:41:14.791000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/home/sdaftuar/ccl-bitcoin/2022-02-headers-dos-prevention/test/functional/test_framework/test_framework.py", line 133, in main
    self.run_test()
  File "/home/sdaftuar/ccl-bitcoin/2022-02-headers-dos-prevention/test/functional/p2p_unrequested_blocks.py", line 105, in run_test
    assert_equal(self.check_hash_in_chaintips(self.nodes[1], blocks_h2[1].hash), False)
  File "/home/sdaftuar/ccl-bitcoin/2022-02-headers-dos-prevention/test/functional/test_framework/util.py", line 56, in assert_equal
    raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
AssertionError: not(True == False)
2022-08-25T14:41:14.842000Z TestFramework (INFO): Stopping nodes
2022-08-25T14:41:14.996000Z TestFramework (WARNING): Not cleaning up dir /tmp/bitcoin_func_test_zotpggyy
2022-08-25T14:41:14.996000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/bitcoin_func_test_zotpggyy/test_framework.log
2022-08-25T14:41:14.996000Z TestFramework (ERROR):
2022-08-25T14:41:14.996000Z TestFramework (ERROR): Hint: Call /home/sdaftuar/ccl-bitcoin/2022-02-headers-dos-prevention/test/functional/combine_logs.py '/tmp/bitcoin_func_test_zotpggyy' to consolidate all logs
2022-08-25T14:41:14.996000Z TestFramework (ERROR):
2022-08-25T14:41:14.997000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2022-08-25T14:41:14.997000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
2022-08-25T14:41:14.997000Z TestFramework (ERROR):
in
src/net_processing.cpp:4436
in
01a9c4dc28 (outdated)
4345 @@ -4336,8 +4346,14 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
4346 // which peers send us compact blocks, so the race between here and
4347 // cs_main in ProcessNewBlock is fine.
4348 mapBlockSource.emplace(hash, std::make_pair(pfrom.GetId(), true));
4349+
4350+ // Check work on this block against our anti-dos thresholds.
4351+ const CBlockIndex* prev_block = m_chainman.m_blockman.LookupBlockIndex(pblock->hashPrevBlock);
4352+ if (prev_block && prev_block->nChainWork + CalculateHeadersWork({pblock->GetBlockHeader()}) >= GetAntiDoSWorkThreshold()) {
4353+ min_pow_checked = true;
01a9c4dc2869708957ce3d9d4a391740e074363c: so in the else case min_pow_checked would remain false and thus trigger the BLOCK_HEADER_LOW_WORK condition, can we cover that in a test?
Sjors
commented at 9:15 am on August 25, 2022:
member
01a9c4dc2869708957ce3d9d4a391740e074363c: I’m not sure how robust the min_pow_checked “attestation” is against breaking things in the future (it looks correct now). It’s important to have test coverage for low-work headers trying to sneak in via the various call sites. This commit adds a few, so that’s good.
in
src/headerssync.h:124
in
bb60d93657 (outdated)
120 @@ -121,6 +121,12 @@ class HeadersSyncState {
121 /** Return the height reached during the PRESYNC phase */
122 int64_t GetPresyncHeight() const { return m_current_height; }
123
124+ /** Return the block timestamp of the last block received during the PRESYNC phase. */
Sjors
commented at 11:51 am on August 25, 2022:
member
tACK acee94b4fdb1a28db1c217d82ca0611d7ba5a56d
did not review the deque implementation and fuzzer code
did not check the actual per peer memory usage
only lightly reviewed the stats logging code in bb60d93657df10702a48ea3094d2eee732a80e52
sdaftuar
commented at 3:23 pm on August 25, 2022:
member
Addressed latest review feedback from @sjors, unsquashed branch is here for comparison.
sdaftuar force-pushed
on Aug 25, 2022
mzumsande approved
mzumsande
commented at 6:26 pm on August 25, 2022:
contributor
ACK bae408fc87b3111cf93d26db66f1947430b04f34
I reviewed the code (minus the bitdeque implementation, where I just ran the extensive fuzz test for a while) and tested this multiple times on mainnet, signet and testnet, both with bitcoind and the GUI.
in
src/net_processing.cpp:2489
in
0f24c9a924 (outdated)
2484+ // process the headers using it as normal.
2485+ return IsContinuationOfLowWorkHeadersSync(peer, pfrom, headers);
2486+ } else {
2487+ LogPrint(BCLog::NET, "Ignoring low-work chain (height=%u) from peer=%d\n", chain_start_header->nHeight + headers.size(), pfrom.GetId());
2488+ // Since this is a low-work headers chain, no further processing is required.
2489+ std::vector<CBlockHeader>().swap(headers);
sdaftuar
commented at 11:59 pm on August 25, 2022:
Done.
in
src/headerssync.cpp:54
in
0f24c9a924 (outdated)
49+ * required to guarantee that we won't reuse this object with the same
50+ * SaltedTxidHasher for another sync. */
51+void HeadersSyncState::Finalize()
52+{
53+ Assume(m_download_state != State::FINAL);
54+ m_header_commitments.clear();
If you want to guarantee memory is released here, use m_header_commitments = {}; as well.
sdaftuar
commented at 11:59 pm on August 25, 2022:
Done.
in
src/net_processing.cpp:2552
in
bae408fc87 (outdated)
2547+ // Note: we could advance to the last header in this set that is
2548+ // known to us, rather than starting at the first header (which we
2549+ // may already have). But we actually might have all the headers in
2550+ // this set already, because headers sync can be imprecise when a
2551+ // peer has to serve us a long chain (due to imprecision in the way
2552+ // locators are calculated). So philosophically, if we wanted to
I think this comment is outdated now. We’re actually not starting low-work headers sync until the peer gives a headers message which contains at least one header not in m_best_header/m_active_chain already.
1003 @@ -991,10 +1004,11 @@ class ChainstateManager
1004 *
1005 * @param[in] block The block we want to process.
1006 * @param[in] force_processing Process this block even if unrequested; used for non-network block sources.
1007+ * @param[in] min_pow_checked True if proof-of-work anti-DoS checks have been done by caller for headers chain
Maybe point out that this only affects situations where a new header is being added along with the block. If we already had the header, this parameter has no effect.
sdaftuar
commented at 11:59 pm on August 25, 2022:
Done.
sipa
commented at 9:18 pm on August 25, 2022:
member
ACK bae408fc87b3111cf93d26db66f1947430b04f34, after squashing (with the caveat that some of the code changes are mine).
I’ve re-reviewed all commits, and separately reviewed code changes since my last review (by comparing with a rebased version of the old commit hash). I’ve also done several more synchronizations from scratch, on mainnet and testnet.
Only non-blocking nits below.
sdaftuar force-pushed
on Aug 25, 2022
sdaftuar force-pushed
on Aug 26, 2022
sdaftuar
commented at 0:07 am on August 26, 2022:
member
Updated to address @sipa’s latest comments; unsquashed version is here.
sipa
commented at 3:57 am on August 26, 2022:
member
ACK f0dae7c6fa54d4d00489da02eebfa3e4ef40cb63 (for the parts that weren’t authored by me).
Doing a testnet sync, the headerssync logs look like:
2022-08-26T04:50:14Z [headerssync] Initial headers sync started with peer=0: height=0, max_commitments=3748027, min_work=00000000000000000000000000000000000000000000064728c7be6fe4b2f961
2022-08-26T04:52:45Z [headerssync] Initial headers sync started with peer=3: height=0, max_commitments=3748029, min_work=00000000000000000000000000000000000000000000064728c7be6fe4b2f961
2022-08-26T04:56:48Z [headerssync] Initial headers sync transition with peer=0: reached sufficient work at height=2144000, redownloading from height=0
2022-08-26T04:58:35Z [headerssync] Initial headers sync transition with peer=3: reached sufficient work at height=2144000, redownloading from height=0
2022-08-26T05:02:25Z [headerssync] Initial headers sync complete with peer=0: releasing all at height=2144000 (redownload phase)
which isn’t very spammy, so I think it would make sense to merge this back into BCLog::NET in a followup. (The main rationale being that if you’re trying to debug weird p2p behaviour, it’s better to see what’s going on with header syncing in case that happens to be the cause, without having to realise the possibility after the fact and then start debugging again after adding the extra debug category.)
in
src/net_processing.cpp:2791
in
9a0d196a9c (outdated)
2706+ // or not.
2707+ if (headers.empty()) {
2708+ return;
2709+ }
2710+
2711+ have_headers_sync = !!peer.m_headers_sync;
Not quite sure where to add this note; but having dropped the optimisation, this results in slightly weird behaviour: if you’re doing headers sync with two peers, A and B, and A completes redownload and then syncs up to the tip, then A, B and all other nodes will report synced_headers as matching the tip, and all nodes will start downloading blocks, but B will continue to download every header until it re-confirms that min work has been reached. Adding m_redownload_buffer_last_height to getpeerinfo makes this behaviour more easily observable.
Probably not worth optimising for this particular case rather than doing something more general and with greater impact, though.
in
src/net_processing.cpp:2542
in
f0dae7c6fa (outdated)
2537+ // before we'll store it)
2538+ arith_uint256 minimum_chain_work = GetAntiDoSWorkThreshold();
2539+
2540+ // Avoid DoS via low-difficulty-headers by only processing if the headers
2541+ // are part of a chain with sufficient work.
2542+ if (total_work < minimum_chain_work) {
Perhaps add an exception here for peers with some NetPermissionsFlag set (reuse noban or download?) to allow you to easily avoid downloading headers twice if you have a trusted node to connect to?
ajtowns
commented at 3:52 pm on August 26, 2022:
contributor
utACKf0dae7c6fa54d4d00489da02eebfa3e4ef40cb63 – logic review more than code review
sdaftuar
commented at 6:27 pm on August 26, 2022:
member
@ajtowns I think both of your suggestions – dropping the HEADERSSYNC log category in favor of using NET, and dropping the anti-DoS check for NoBan peers – make sense, but I think both of those improvements are minor at this point and I’d prefer to postpone to a followup PR, just to minimize additional review burden for this.
Creating a checklist so that these suggestions don’t get forgotten:
I wonder if it wouldn’t be better to make m_max_commitments a function of the time at which we evaluate this check (e.g. GetMaxCommitments(NodeClock::time_point_now)), rather than of when we initiate headers sync. One could argue that the most-work consensus-valid chain could be ignored by this logic, as the chain might have grown since the start of the sync and be artificially sorted out by DoS mitigations. I concede I don’t think it matters in practice, as if we reach the rate of 6 blocks/second something is already wrong; however, I think ideally the consensus outcome should be abstracted out from our headers/blocks syncing strategy.
Is there a plausible scenario where this would matter? We also specify at the start of sync the minimum work required; fixing the maximum chain length just has the effect of requiring that a peer’s chain has enough work at the time the sync is initiated to pass anti-DoS checks.
Failure to sync because of a low-work chain will not cause a peer to be dropped, so if their chain grows later we’ll try syncing again. And in the specific case of a peer announcing many blocks to us (like approaching the 6 blocks/second theoretical limit), this should not take long at all, because as soon as the sync ends and we get one more block announcement, we’ll resume headers sync.
As far as I can tell, I don’t see a plausible scenario where it matters. Ultimately, the historical chain has never come close to the 6 blocks/second rate, so the bound effectively includes an additional chain-length buffer that absorbs any discrepancy between the HeadersSyncState initialization time point and the present, I think. From my understanding, the maximum chain length is a protection against memory DoS of m_header_commitments, enforcing a constraint on the number of blocks allowed to satisfy the minimum work required.
I can’t imagine this matters, because it would require the peer to have a chain which is invalid at the time the sync starts, which becomes valid while we’re busy learning about it. If that were the case, we’ll still synchronize it later.
in
src/net_processing.cpp:2374
in
f0dae7c6fa (outdated)
2369+ }
2370+
2371+ // Are these headers connected to each other?
2372+ if (!CheckHeadersAreContinuous(headers)) {
2373+ Misbehaving(peer, 20, "non-continuous headers sequence");
2374+ return false;
I’m not sure why a misbehaving score of 20 is assigned here, considering that if CheckHeadersPoW returns false in ProcessHeadersMessage(), an additional misbehaving score of 100 is assigned there (L2757), which together goes beyond DISCOURAGEMENT_THRESHOLD. I don’t know if it makes sense.
sdaftuar
commented at 12:02 pm on August 29, 2022:
Thanks, I think this may have been an oversight when I added the Misbehaving() to the caller. Looks like the score of 20 is how things worked historically; I guess I’ll just leave that unchanged but remove the Misbehaving() from the caller.
in
src/headerssync.cpp:263
in
f0dae7c6fa (outdated)
258+ LogPrint(BCLog::HEADERSSYNC, "Initial headers sync aborted with peer=%d: commitment overrun at height=%i (redownload phase)\n", m_id, next_height);
259+ // Somehow our peer managed to feed us a different chain and
260+ // we've run out of commitments.
261+ return false;
262+ }
263+ bool commitment = m_hasher(header.GetHash()) & 1;
Considering the 1-bit commitment, I wonder how hard it would be for an attacker to feed us a high-work chain of headers during PRESYNC, to substitute a second low-work chain of headers during REDOWNLOAD satisfying the exact same commitment pattern.
From browsing the simulation, the 1-bit commitment is assumed as a system parameter at the beginning of the demonstration, from which the commitment parameters like frequency of commitments and size of the redownload buffer are derived. I wonder if an N-bit commitment could be more information-efficient, though we’re free to explore that again in the future if the commitment storage memory usage becomes an issue (probably not).
We’ve analyzed half a dozen variations before ending up with this approach, including:
(1) Just (2000/N) bits every 2000 headers
(2) 1 bit every N headers, with a fixed offset
(3) 1 bit every N headers, with a random offset (approach from this PR)
(4) 1 bit every N headers, with an offset chosen such that it minimizes the attacker’s power
(5) A 1 in N probability for every header to have a 1-bit commitment
(6) A 1/N-bit commitment for every header, using an entropy coder.
(1) and (2) are equivalent, as we’re only going to verify commitments after blobs of 2000 anyway. (3) is better, and destroys the equivalence with (1) because the attacker doesn’t know when to start. (4) is very slightly worse than (3), but still significantly better than (1) and (2). (5) and (6) are significantly worse.
If you have other ideas for commitment structures, I’m happy to try simulating them.
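To make scheme (3) concrete, a self-contained toy sketch; the 1-bit hash here is a trivial stand-in for the salted SipHash used in the PR, and all names are illustrative:

    #include <array>
    #include <cstdint>
    #include <deque>
    #include <random>

    using Hash256 = std::array<uint8_t, 32>; // stand-in for a block hash

    struct CommitmentSketch {
        static constexpr int64_t PERIOD{584};
        const uint64_t salt{std::random_device{}()};  // per-sync secret salt
        const int64_t offset{static_cast<int64_t>(std::random_device{}() % PERIOD)}; // secret offset
        std::deque<bool> bits;

        // PRESYNC: store one bit at heights matching the secret offset.
        void MaybeCommit(int64_t height, const Hash256& hash) {
            if (height % PERIOD == offset) bits.push_back(((hash[0] ^ salt) & 1) != 0);
        }

        // REDOWNLOAD: check the same bit; a mismatch means the peer substituted
        // a different chain between the two download phases.
        bool Verify(int64_t height, const Hash256& hash) {
            if (height % PERIOD != offset) return true;
            if (bits.empty()) return false; // commitment overrun
            const bool expected = bits.front();
            bits.pop_front();
            return expected == (((hash[0] ^ salt) & 1) != 0);
        }
    };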
ariard
commented at 2:38 am on August 29, 2022:
member
Will still review post-merge if this PR is evaluated as mature enough to land soon.
Utilize anti-DoS headers download strategy
Avoid permanently storing headers from a peer, unless the headers are part of a
chain with sufficiently high work. This prevents memory attacks using low-work
headers.
Designed and co-authored with Pieter Wuille.
551a8d957c
Require callers of AcceptBlockHeader() to perform anti-dos checks
In order to prevent memory DoS, we must ensure that we don't accept a new
header into memory until we've performed anti-DoS checks, such as verifying
that the header is part of a sufficiently high work chain. This commit adds a
new argument to AcceptBlockHeader() so that we can ensure that all call-sites
which might cause a new header to be accepted into memory have to grapple with
the question of whether the header is safe to accept, or needs further
validation.
This patch also fixes two places where low-difficulty-headers could have been
processed without such validation (processing an unrequested block from the
network, and processing a compact block).
Credit to Niklas Gögge for noticing this issue, and thanks to Sjors Provoost
for test code.
ed6cddd98e
Reduce spurious messages during headers sync
Delay sending SENDHEADERS (BIP 130) message until we know our peer's best
header's chain has more than nMinimumChainWork. This reduces inadvertent
headers messages received during initial headers sync due to block
announcements, which throw off our sync algorithm.
83c6a0c524
Add unit test for HeadersSyncState 0b6aa826b5
Test headers sync using minchainwork threshold 150a5486db
Expose HeadersSyncState::m_current_height in getpeerinfo() 03712dddfb
Track headers presync progress and log it 355547334f
Test large reorgs with headerssync logic 93eae27031
Make validation interface capable of signalling header presync
This makes a number of changes:
- Get rid of the verification_progress argument in the node interface
NotifyHeaderTip (it was always 0.0).
- Instead of passing a CBlockIndex* in the UI interface's NotifyHeaderTip,
send separate height, timestamp fields. This is because in headers presync,
no actual CBlockIndex object is available.
- Add a bool presync argument to both of the above, to identify signals
pertaining to the first headers sync phase.
376086fc5a
Emit NotifyHeaderTip signals for pre-synchronization progress 738421c50f
ui: show header pre-synchronization progress 3add234546
sdaftuar force-pushed
on Aug 29, 2022
sdaftuar
commented at 12:13 pm on August 29, 2022:
member
Updated to drop an extra Misbehaving() call that was unnecessary; unsquashed version for comparing against prior branch is here.
Sjors
commented at 4:53 pm on August 29, 2022:
member
Agree that merging the new log into NET makes sense. In fact it’s useful to look at both of them, e.g. to see that sendheaders is sent at the right time.
mzumsande
commented at 8:06 pm on August 29, 2022:
contributor
re-ACK 3add23454624c4c79c9eebc060b6fbed4e3131a7
sipa
commented at 11:38 am on August 30, 2022:
member
re-ACK 3add23454624c4c79c9eebc060b6fbed4e3131a7
glozow
commented at 12:33 pm on August 30, 2022:
member
ACK 3add234546
I can’t say I have a complete mental model of every possible interaction during IBD, end of IBD, reorgs, etc., but whiteboarded the syncing logic to the best of my ability. The comments are very helpful, and I reviewed the code to verify that it’s doing what the comments say it does.
Tested on {mainnet, signet, testnet} {with, without} gui to manually check getpeerinfo “presynced_headers,” logging, ui
Reviewed the code, a few notes if they help anyone. I did look at the bitdeque, though lightly. Fuzz looks pretty comprehensive to me.
in
src/validation.cpp:3721
in
355547334foutdated
3716+ AssertLockNotHeld(cs_main);
3717+ const auto& chainstate = ActiveChainstate();
3718+ {
3719+ LOCK(cs_main);
3720+ // Don't report headers presync progress if we already have a post-minchainwork header chain.
3721+ // This means we lose reporting for potentially legimate, but unlikely, deep reorgs, but
instagibbs
commented at 1:58 pm on August 30, 2022:
legitimate
instagibbs
commented at 2:09 pm on August 30, 2022:
member
Logging looks great!
edit: Gah, it just took my node forever to find a peer to give me all the headers after a disconnect I guess. peerinfo is showing a new peer is almost caught up and it should start printing again…
edit2: everything normal and logging continuing as expected by the code. All good here.
I guess the note is that if you have a flapping connection during presync, you’ll waste some bandwidth and time.
fanquake merged this
on Aug 30, 2022
fanquake closed this
on Aug 30, 2022
sidhujag referenced this in commit
c19a7c43d0
on Aug 30, 2022
in
src/pow.cpp:102
in
3add234546
97+
98+ // Round and then compare this new calculated value to what is
99+ // observed.
100+ arith_uint256 maximum_new_target;
101+ maximum_new_target.SetCompact(largest_difficulty_target.GetCompact());
102+ if (maximum_new_target < observed_new_target) return false;
I think a comment could be added that the last received header has been detected to be consensus-invalid, as the new target is above what is allowed by the difficulty adjustment bounds in CalculateNextWorkRequired(). Same for the lower bound, if my understanding is correct.
headerssync.cpp logs this case with an “invalid difficulty transition” message.
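For intuition, a simplified sketch of the bound this check enforces; the real code works on 256-bit targets and additionally accounts for compact-encoding rounding, as the snippet above shows:

    // At a difficulty adjustment, the actual timespan used is clamped to
    // [target_timespan/4, target_timespan*4], so a valid new target can differ
    // from the old target by at most a factor of 4 in either direction.
    bool WithinFactorOfFourSketch(double old_target, double new_target)
    {
        return new_target <= old_target * 4.0 && new_target >= old_target / 4.0;
    }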
in
src/pow.h:30
in
3add234546
25+ * given height is not possible, given the proof-of-work on the prior block as
26+ * specified by old_nbits.
27+ *
28+ * This function only checks that the new value is within a factor of 4 of the
29+ * old value for blocks at the difficulty adjustment interval, and otherwise
30+ * requires the values to be the same.
Minor comment: while this function is only used for p2p DoS protection, and the difficulty adjustment bounds checked here should reject invalid headers, this distinction could be made more precise.
Bitcoin Core is used as reference software by many bitcoin libraries across the ecosystem, and there is often confusion between what is policy, what is consensus, and what is an optimization based on historical consensus state like BIP90.
in
src/util/bitdeque.h:18
in
84852bb6bboutdated
13+#include <tuple>
14+
15+/** Class that mimics std::deque<bool>, but with std::vector<bool>'s bit packing.
16+ *
17+ * BlobSize selects the (minimum) number of bits that are allocated at once.
18+ * Larger values reduce the asymptotic memory usage overhead, at the cost of
I think it could be valuable to document which kind of workload this container is designed to hold, or whether it is independent of the element-insertion usage pattern. It could also be made explicit whether the implementation was written from scratch against the standard library specification, or inspired by an existing implementation, to ease maintenance if there are relevant changes upstream. We do so in LDK.
Note, I think it would be valuable to document the interactions between ProcessNewBlockHeaders() and pindexLast. The method documentation says “ppindex If set, the pointer will be set to point to the last new block index object for the given headers”. At that stage, L2509 (net_processing.cpp), I understand that pindexLast should always be set, as the unset case happens only if accepted=false, L3696 (validation.cpp); afaict accepted=false implies that BlockValidationState is invalid from AcceptBlockHeader, and therefore we should return, L2506 (validation.cpp). If there is a refactor of the chain of nested calls and their return values, we could have a silent break.
Do we have cases where a) accepted=false, b) BlockValidationState is valid and c) pindexLast unset? I don’t think so.
ariard
commented at 3:05 am on August 31, 2022:
member
Ongoing review commit by commit until 551a8d957. When done, will do another protocol-wise.
vasild
commented at 6:45 am on August 31, 2022:
contributor
Coverage report for modified or added lines by this PR and not covered by unit or functional tests.
in
test/functional/p2p_headers_sync_with_minchainwork.py:107
in
3add234546
jonatack
commented at 10:02 am on August 31, 2022:
Suggested test assertion
--- a/test/functional/p2p_headers_sync_with_minchainwork.py
+++ b/test/functional/p2p_headers_sync_with_minchainwork.py
@@ -104,6 +104,9 @@ class RejectLowDifficultyHeadersTest(BitcoinTestFramework):
     new_blocks.append(block)
     hashPrevBlock = block.sha256

+    # no low-work sync is in progress
+    assert_equal(node.getpeerinfo()[0]['presynced_headers'], -1)
+
     headers_message = msg_headers(headers=new_blocks)
     p2p.send_and_ping(headers_message)
in
src/headerssync.h:67
in
551a8d957c (outdated)
62+ * something we know. If a peer's chain has more than 2000 blocks, then we need
63+ * a way to verify that the chain actually has enough work on it to be useful to
64+ * us -- by being above our anti-DoS minimum-chain-work threshold -- before we
65+ * commit to storing those headers in memory. Otherwise, it would be cheap for
66+ * an attacker to waste all our memory by serving us low-work headers
67+ * (particularly for a new node coming online for the first time).
So I was wondering how our overall headers-sync logic would perform against potential best-chain blinding attacks, where an adversary with strong hashrate capabilities feeds us a quasi-unlimited number of minimum-chain-work headers chains to prevent the node from ever discovering the most-work valid chain (or at least significantly delay it), as we would continually iterate through the “fake” chain space.
During IBD, we sync from a single peer (L5301, in net_processing.cpp), and for each new inv block received we’ll do initial-sync headers with one more peer (L3621, in net_processing.cpp). So assuming the first peer picked is controlled by the adversary and feeds us one element of the “fake” chain space, each new block gives us a chance to escape the space by syncing with an honest peer. That said, the adversary might control a number of inbound/outbound slots, each granting one more ~10 min of blinding (I believe the criteria at L3621 on which we pick the one-more peer is falsifiable, as it’s just an inv(block)).
Once out of IBD, we’ll send a GETHEADERS to all our peers (L5312, in net_processing.cpp), so this blinding attack surface is reduced.
I think nMinimumChainWork puts a high bar in terms of the hashrate capabilities required; moreover, our peer selection strategy would also need to be bypassed, or a high number of puppet nodes deployed by the adversary, AND the first preferred peer we select for initial headers sync would have to be controlled by the attacker.
If this issue is concerning, I think we could require that the “one-more” initial-headers-sync peers added by 05f7f31598 be fPreferredDownload (outbound or blessed peers) as a slight hardening.
in
src/headerssync.h:216
in
551a8d957c (outdated)
211+ const SaltedTxidHasher m_hasher;
212+
213+ /** A queue of commitment bits, created during the 1st phase, and verified during the 2nd. */
214+ bitdeque<> m_header_commitments;
215+
216+ /** The (secret) offset on the heights for which to create commitments.
I wonder what is meant here by “(secret)”, i.e. whether it should be interpreted as the offset not being known or observable by our peers. As the period is a software-defined value (HEADER_COMMITMENT_PERIOD) in this module, I don’t know if there is a secrecy goal achieved.
The period is not secret, the offset is secret and non-observable (until it is too late for the attacker).
in
src/headerssync.cpp:224
in
551a8d957c (outdated)
219+
220+ int64_t next_height = m_redownload_buffer_last_height + 1;
221+
222+ // Ensure that we're working on a header that connects to the chain we're
223+ // downloading.
224+ if (header.hashPrevBlock != m_redownload_buffer_last_hash) {
Note, one of the requirements of ValidateAndStoreRedownloadedHeader is that the caller check the continuity of the headers chain (“Assumes the caller has already verified the headers are continuous”). This requirement is enforced at L2372 in net_processing.cpp, so I wonder whether this check can effectively be hit, and whether it is redundant.
ariard
commented at 2:59 am on September 1, 2022:
member
Still reviewing 551a8d95
hebasto referenced this in commit
f79d612fba
on Sep 1, 2022
in
src/headerssync.cpp:142
in
3add234546
137+}
138+
139+bool HeadersSyncState::ValidateAndStoreHeadersCommitments(const std::vector<CBlockHeader>& headers)
140+{
141+ // The caller should not give us an empty set of headers.
142+ Assume(headers.size() > 0);
255+ // target blockhash just because we ran out of commitments.
256+ if (!m_process_all_remaining_headers && next_height % HEADER_COMMITMENT_PERIOD == m_commit_offset) {
257+ if (m_header_commitments.size() == 0) {
258+ LogPrint(BCLog::HEADERSSYNC, "Initial headers sync aborted with peer=%d: commitment overrun at height=%i (redownload phase)\n", m_id, next_height);
259+ // Somehow our peer managed to feed us a different chain and
260+ // we've run out of commitments.
Here we could invite the node operator to submit a bug report, as a peer effectively managing to feed us a different low-work chain and exhaust our commitments would be a data point that a) this anti-DoS headers sync mitigation can be falsified and b) we have adversaries in the wild with hashing capability actively targeting nodes.
in
src/net_processing.cpp:388
in
3add234546
381 @@ -381,6 +382,15 @@ struct Peer {
382 /** Time of the last getheaders message to this peer */
383 NodeClock::time_point m_last_getheaders_timestamp{};
384
385+ /** Protects m_headers_sync **/
386+ Mutex m_headers_sync_mutex;
387+ /** Headers-sync state for this peer (eg for initial sync, or syncing large
388+ * reorgs) **/
IIUC, syncing large reorgs is defined here as receiving chain headers that fork more than the 144-block buffer documented in GetAntiDoSWorkThreshold(). Any fork under that threshold should have its headers accepted without the anti-DoS headers mechanism playing out.
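A simplified sketch of the threshold described above; work values are collapsed to 64-bit integers here, whereas Bitcoin Core uses arith_uint256, and the name is illustrative:

    #include <algorithm>
    #include <cstdint>

    uint64_t AntiDoSWorkThresholdSketch(uint64_t tip_chain_work,
                                        uint64_t tip_block_proof, // work of one block at the tip
                                        uint64_t min_chain_work)
    {
        // Accept headers chains within ~144 blocks' worth of work of our tip,
        // or with at least the hardcoded minimum chain work, whichever is larger.
        const uint64_t buffer = std::min(144 * tip_block_proof, tip_chain_work);
        return std::max(tip_chain_work - buffer, min_chain_work);
    }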
ariard
commented at 6:11 pm on September 1, 2022:
member
Good with commit 551a8d957
in
src/net_processing.cpp:878
in
ed6cddd98e (outdated)
874 @@ -875,7 +875,7 @@ class PeerManagerImpl final : public PeerManager
875 EXCLUSIVE_LOCKS_REQUIRED(!m_most_recent_block_mutex, peer.m_getdata_requests_mutex) LOCKS_EXCLUDED(::cs_main);
876
877 /** Process a new block. Perform any post-processing housekeeping */
878- void ProcessBlock(CNode& node, const std::shared_ptr<const CBlock>& block, bool force_processing);
879+ void ProcessBlock(CNode& node, const std::shared_ptr<const CBlock>& block, bool force_processing, bool min_pow_checked);
A rationale for the conditions under which min_pow_checked can be set to true could be added. E.g., when anti-DoS checks have been carried out, as in compact block processing, or when the block has been submitted by an RPC caller (as in generateblock() or submitheader()). Or replicate the comments added for AcceptBlockHeader(), ProcessNewBlock(), ProcessNewBlockHeaders().
in
src/net_processing.cpp:5041
in
83c6a0c524 (outdated)
5032 @@ -5034,6 +5033,27 @@ void PeerManagerImpl::MaybeSendAddr(CNode& node, Peer& peer, std::chrono::micros
5033 }
5034 }
5035
5036+void PeerManagerImpl::MaybeSendSendHeaders(CNode& node, Peer& peer)
5037+{
5038+ // Delay sending SENDHEADERS (BIP 130) until we're done with an
5039+ // initial-headers-sync with this peer. Receiving headers announcements for
5040+ // new blocks while trying to sync their headers chain is problematic,
5041+ // because of the state tracking done.
IIUC the rationale: absent this delay of sending SENDHEADERS, we could have duplicated headers-sync with a peer. One, with HEADERS reception triggering continuous GETHEADERS in ProcessHeadersMessage() (L2854). A second, when we receive an INV(block_hash), via the “1 new headers-sync peer every new block” logic (L3622).
Note, delaying sending SENDHEADERS might interfere with old versions; at least BIP130 mentions “As support for sendheaders is optional, software that implements this may also optionally impose additional constraints, such as only honoring sendheaders messages shortly after a connection is established”.
Even if it’s a validation-related log, the peer id could be provided to indicate the headers source, in case of anomaly detection.
ariard
commented at 6:21 pm on September 2, 2022:
member
Reviewed until tip commit 3add2345
ariard
commented at 2:33 am on September 3, 2022:
member
Post-Merge ACK 3add2345
Reviewed the code, the HeadersSyncState state machine correctness, the soundness of the anti-DoS mechanism design, and partially the test coverage.
From my understanding, the new bitdeque<> m_header_commitments has been deliberately designed to avoid memory DoS, and it should scale well if we had more outbound full-relay/block-relay-only peers in the future.
The commitment scheme relies on SipHash, which has been deliberately designed for performance, and combined with m_max_commitments this should avoid new CPU DoS.
I don’t think the commitment scheme could be abused for node fingerprinting, as even though the random offset is unique per node, it should be cleaned up once the peer is disconnected, and as such is non-observable.
This anti-DoS mechanism should not introduce any message amplification attacks, as we increase neither the volume nor the frequency of our p2p messages based on the result of those anti-DoS checks; nor should it introduce a network-partition vector, as we stop processing on invalid/non-continuous headers.
domob1812 referenced this in commit
047aebfeda
on Sep 5, 2022
fresheneesz
commented at 3:04 pm on September 7, 2022:
none
If the primary attack being prevented is on memory usage, why require downloading the headers twice? Why not simply ensure the headers are removed from memory and store them on disk for later access? Easy enough to delete them if it turns out not enough PoW can be accumulated on that chain.
The larger question I have is: why are we putting in effort to remove the IBD checkpoint? The checkpoint is effective, efficient, and secure. “Confusion about checkpoints” is an absolutely terrible reason to remove them. Technical design decisions should not be made because some people are confused about “the role” effective solutions play. Please do not remove the checkpoint; it’s a critical piece of efficient IBD.
instagibbs
commented at 3:17 pm on September 7, 2022:
member
Why not simply ensure the headers are removed from memory and store them on disk for later access?
“simply” generally is not as simple as we’d like, and doesn’t solve disk-filling attacks
Please do not remove the checkpoint, its a critical piece of efficient IBD.
AFAIK it doesn’t speed up sync in non-adversarial situations; that would be the “assumevalid”-type parameters, which aren’t being discussed for removal.
sdaftuar
commented at 3:26 pm on September 7, 2022:
member
If the primary attack being prevented is on memory usage, why require downloading the headers twice? Why not simply ensure the headers are removed from memory and store them on disk for later access? Easy enough to delete them if it turns out not enough PoW can be accumulated on that chain.
To expand on what @instagibbs wrote, this would turn a memory attack into a disk-filling attack. Moreover, deleting headers is not simple; in order to ensure we always can arrive at consensus, we need an algorithm for determining when it is safe to do so. If a peer is serving us a chain that seems to have low-work, how do we know that it’s safe to delete, and that if we wait long enough we won’t eventually learn a high-work chain from this peer? This PR implements an algorithm for making that determination in a robust way that should guarantee that we can always reach consensus with the network.
(I do think that we could prune headers on startup/shutdown, though I imagine that after this PR there will be much less need to consider doing so.)
The larger question I have is: why are we putting in effort to remove the IBD checkpoint?
This PR does not touch the checkpoints; it solves a problem that the checkpoints do not solve (preventing DoS attacks against new nodes), and happens to do so in a way that would make checkpoints no longer necessary. However it’s more appropriate to discuss this in #25725, the PR where that is proposed.
As an aside for reviewers who may not be familiar with how checkpoints work: there is not a single “IBD checkpoint”, there are many heights for which we have cached blockhashes which we expect at those heights, and once we download those block headers matching those checkpointed hashes, we no longer accept new headers that would fork below those heights. Note that this is not a performance optimization during IBD, it is merely a protection against spam from low-difficulty headers.
fresheneesz
commented at 4:49 am on September 8, 2022:
none
this would turn a memory attack into a disk-filling attack
The block headers are currently 60 megabytes. In another 15 years they’ll be 120 megabytes. That doesn’t seem like enough to credibly attack any full node. Plus, this PR is about only downloading “enough PoW” which means you don’t need anywhere near that much disk space. Seems like the ideal way to do this is to download the latest X block headers, since those are the ones with the most PoW per bytes of block headers.
deleting headers is not simple; in order to ensure we always can arrive at consensus, we need an algorithm for determining when it is safe to do so.
This PR already deletes headers as soon as they’re received. If it simply stores them somewhere (and I do mean simply, ie not connecting it up with anything that would make it hard to safely delete) then they can be retrieved from that location instead of redownloading them.
But maybe I just don’t understand the attack that deleting the headers upon receipt is intended to prevent. Is that attack detailed anywhere? Why would the node need to download more than one set of headers at a time? A node should query its peers for the longest chain, and if there are competing longest chains, it should choose the longest/heaviest chain and verify via the headers that the chain is as heavy as it says, then start verifying the actual blocks.
Under what circumstance is there a possibility of a disk-filling attack here?
sdaftuar
commented at 2:00 pm on September 8, 2022:
member
The block headers are currently 60 megabytes. In another 15 years they’ll be 120 megabytes. That doesn’t seem like enough to credibly attack any full node.
Due to the timewarp bug[1], it is inexpensive to construct a consensus-valid blockchain where block timestamps advance by one second every 6 blocks. Starting at the genesis block timestamp (January 3, 2009), it’s possible to construct a chain that has over 2.5 billion blocks, where the last block’s timestamp is less than 2 hours in the future from the current time (September 2022). If an adversary produced such a chain and fed it to a new node, then storing it on disk while we calculate the accumulated work would mean we store almost 100 GB of data for a single peer (assuming we could, say, compress the headers from 80 bytes to 40 bytes). Of course, nodes can have many peers, up to 100 or more – so without complex logic to limit the number of concurrent headers syncs (which would also need to be carefully designed to thwart adversaries that seek to block with headers sync from an honest peer) – that could be 10TB of data being stored on disk.
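A back-of-envelope check of those figures; the timestamps and the 40-byte compressed-header size are approximations, and this is not code from the PR:

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        const int64_t genesis_time = 1231006505;             // 2009-01-03
        const int64_t now = 1662000000;                      // ~September 2022
        const int64_t max_blocks = 6 * (now - genesis_time); // timewarp rate bound
        const double gb = max_blocks * 40.0 / 1e9;           // ~40-byte compressed headers
        std::printf("~%lld blocks, ~%.0f GB per peer\n", (long long)max_blocks, gb);
        // ~2.59 billion blocks, ~103 GB per peer; ~100 peers -> ~10 TB.
    }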
Plus, this PR is about only downloading “enough PoW” which means you don’t need anywhere near that much disk space. Seems like the ideal way to do this is to download the latest X block headers, since those are the ones with the most PoW per bytes of block headers.
There is no functionality in the p2p protocol to “download the latest X block headers”.[2]
It would be great to design a protocol that would allow for proving the work on a chain in a compact/efficient way[3], but even if we had such a proposal ready to go, that wouldn’t solve our problem – in order to ensure we don’t break consensus with old nodes, we cannot deploy a new p2p protocol and then only sync headers with new software that implements the protocol (at least, not without a sufficiently long period of time to ensure adequate deployment of the new software). Thus we’d remain open to the DoS vector fixed in this PR until such time that we turned off headers sync from older software – and I think that is inferior to just fixing this issue for new software in a backwards compatible way, as this PR does.
But maybe I just don’t understand the attack that deleting the headers upon receipt is intended to prevent. Is that attack detailed anywhere?
I don’t know if my description above of the time warp bug is enough for you to understand the issue, but if you want some more background perhaps you can try the bitcoin stack exchange? There is also a code comment in #25970 that you might find helpful, starting here.
Why would the node need to download more than one set of headers at a time?
Different peers might be on different chain tips, and the simplest behavior we can have is to sync headers with all of our peers, without regard to one another. This can be bandwidth-wasteful when peers are on the same tip, so we have optimizations on startup to try to only sync with a single peer at first. But if that peer is adversarial or broken then they might not give us the honest chain, so at some point we have to try other peers as well, to ensure that as long as we have at least one honest peer, we’ll eventually sync the consensus chain (see #25720 for more discussion of this bandwidth tradeoff). While you could try to design a sync behavior that would throttle or rate-limit headers sync in some way, so that we don’t have multiple peers with which we’re syncing headers at the same time, that is a more complex logic than always being willing to sync headers with all peers that might have something to offer us (which only requires per-peer logic that is independent of what other peers might be giving us).
A node should query its peers for the longest chain, and if there are competing longest chains, it should choose the longest/heaviest chain and verify via the headers that the chain is as heavy as it says, then start verifying the actual blocks.
Yes, obviously. The details around how that is accomplished are what is being discussed, and the way we had been doing it prior to this PR was vulnerable to DoS, and hopefully now that this PR has been merged that should no longer be the case. If you have a concrete proposal for another way we could implement DoS protection while syncing headers in a way that will always keep us in consensus with the network, please suggest some code, but vague comments like this are not specific enough to discuss.
[2] The way headers are requested is via a “block locator”, which is a set of block hashes that describe the chain that a node is on. When our peer receives the locator, they are expected to find the first block hash in the locator that is on their chain, and then start sending us headers from that point on, towards their tip. Headers messages have a maximum of 2000 headers in them, so the node receiving a full headers message in response is expected to continue requesting headers from the last header in the message.
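For background, a sketch of how locator entries are typically chosen, mirroring the behavior described above; heights stand in for block hashes, and the exact step schedule is illustrative:

    #include <vector>

    std::vector<int> LocatorHeightsSketch(int tip_height)
    {
        std::vector<int> heights;
        int step = 1;
        for (int h = tip_height; h > 0; h -= step) {
            heights.push_back(h);
            // After the first ~10 recent blocks, back off exponentially so the
            // locator stays small even for very long chains.
            if (heights.size() >= 10) step *= 2;
        }
        heights.push_back(0); // always include the genesis block
        return heights;
    }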
[3] Many years ago, there was a proposal to revamp the way headers are requested to try to allow for something like this, but this effort stalled and was not implemented or deployed. This might be an interesting avenue for future work.
instagibbs
commented at 2:09 pm on September 8, 2022:
member
Plus, this PR is about only downloading “enough PoW” which means you don’t need anywhere near that much disk space. Seems like the ideal way to do this is to download the latest X block headers, since those are the ones with the most PoW per bytes of block headers.
@sdaftuar comments aside, this is simply not true. Mining difficulty can drop, and we don’t want to lose consensus due to this, with nodes online following lower PoW headers, and newly syncing nodes deciding it’s not high enough.
jonatack
commented at 7:20 am on September 16, 2022:
contributor
2022-09-15 <sipa> For 25717 observing the pre-syncing phase is one thing (it should be there), but arguably the more interesting property is that syncing still works at all. It’s only triggered when syncing a new node from scratch, or one that is ~months or more behind.