p2p: Avoid multiple getheaders messages in flight to the same peer #25454

pull sdaftuar wants to merge 9 commits into bitcoin:master from sdaftuar:2022-06-single-getheaders changing 3 files +292 −177
  1. sdaftuar commented at 8:25 pm on June 22, 2022: member

    Change getheaders messages so that we wait up to 2 minutes for a response to a prior getheaders message before issuing a new one.

    Also change the handling of the getheaders message sent in response to a block INV, so that we no longer use the hashstop variable (including the hash stop will just mean that if our peer’s headers chain is longer, then we won’t learn it, so there’s no benefit to using hashstop).

    Also, now respond to a getheaders during IBD with an empty headers message (rather than nothing) – this better conforms to the intent of the new logic that it’s better to not ignore a peer’s getheaders message, even if you have nothing to give. This also avoids a lot of functional tests breaking.

    This PR also reworks the headers processing logic to make it more readable.
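The throttling described above can be sketched as follows. This is a minimal illustration with invented, simplified names (`PeerState`, a free-function `MaybeSendGetHeaders`), not the PR's actual code, which tracks the timestamp per `Peer` and resets it when a headers message arrives:

```cpp
// Sketch of the rate limit: a new getheaders goes out only if the previous
// one was sent more than HEADERS_RESPONSE_TIME ago. Names are illustrative.
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// How long to wait for a response to a prior getheaders before sending another.
constexpr auto HEADERS_RESPONSE_TIME = 2min;

struct PeerState {
    // Time of the last getheaders message sent to this peer.
    std::chrono::seconds last_getheaders_timestamp{0s};
};

// Returns true (and records the send time) only if no getheaders request is
// considered outstanding; otherwise the caller suppresses the message.
bool MaybeSendGetHeaders(PeerState& peer, std::chrono::seconds now)
{
    if (now - peer.last_getheaders_timestamp <= HEADERS_RESPONSE_TIME) {
        return false; // a prior request is still in flight
    }
    peer.last_getheaders_timestamp = now;
    return true;
}
```

On receipt of a headers message the real code clears the timestamp, so a follow-up getheaders can be issued immediately rather than after the full two minutes.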

  2. fanquake added the label P2P on Jun 22, 2022
  3. sdaftuar commented at 8:26 pm on June 22, 2022: member
    @ajtowns This should fix the issue you’ve seen in functional tests, where generating a bunch of blocks at once can result in O(n^2) headers downloads.
  4. joani24 approved
  5. sdaftuar force-pushed on Jun 23, 2022
  6. DrahtBot commented at 3:37 am on June 23, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25515 ([draft] PeerManager unit tests by dergoegge)
    • #25514 (net processing: Move CNode::nServices and CNode::nLocalServices to Peer by dergoegge)
    • #25268 (refactor: Introduce EvictionManager by dergoegge)
    • #25203 (logging: update to severity-based logging by jonatack)
    • #24571 (p2p: Prevent block index fingerprinting by sending additional getheaders messages by dergoegge)
    • #24008 (assumeutxo: net_processing changes by jamesob)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. laanwj commented at 7:56 am on June 23, 2022: member
    Concept ACK, great catch
  8. in src/net_processing.cpp:3414 in f757d2a471 outdated
    3411@@ -3403,6 +3412,9 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3412         if (m_chainman.ActiveTip() == nullptr ||
    3413                 (m_chainman.ActiveTip()->nChainWork < nMinimumChainWork && !pfrom.HasPermission(NetPermissionFlags::Download))) {
    3414             LogPrint(BCLog::NET, "Ignoring getheaders from peer=%d because active chain has too little work\n", pfrom.GetId());
    


    dergoegge commented at 7:13 pm on June 23, 2022:
    Maybe change this log message? Because we now send something in response instead of ignoring it.

    sdaftuar commented at 12:23 pm on June 24, 2022:
    Fixed.
  9. in src/net_processing.cpp:364 in f757d2a471 outdated
    356@@ -355,8 +357,11 @@ struct Peer {
    357     /** Work queue of items requested by this peer **/
    358     std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex);
    359 
    360+    /** Time of the last getheaders message to this peer */
    361+    std::atomic<std::chrono::seconds> m_last_getheaders_timestamp{0s};
    362+
    363     Peer(NodeId id)
    364-        : m_id{id}
    365+        : m_id(id)
    


    dergoegge commented at 7:17 pm on June 23, 2022:
    nit: this looks like an accidental change, or do you have a reasoning for this?

    sdaftuar commented at 12:22 pm on June 24, 2022:
    Yep this was a rebase error, fixed!
  10. in src/net_processing.cpp:2271 in f757d2a471 outdated
    2274- * of a known chain (such as our tip) we don't yet know where the peer's chain
    2275- * might fork from what we know, so we continue exactly from where the peer
    2276- * left off.
    2277- */
    2278-void PeerManagerImpl::FetchMoreHeaders(CNode& pfrom, const CBlockIndex *pindexLast, const Peer& peer)
    2279+bool PeerManagerImpl::MaybeSendGetHeaders(CNode& pfrom, const CBlockLocator& locator, Peer& peer)
    


    dergoegge commented at 7:31 pm on June 23, 2022:
    Why did you change it to receive a CBlockLocator? I don’t think we ever create locators that are not part of the main chain. Also (maybe a bit nitpicky) we don’t need to compute the locator if we don’t send the getheaders message, so doing that inside of MaybeSendGetHeaders makes more sense to me.

    sdaftuar commented at 12:10 pm on June 24, 2022:
    The reason I introduced this interface change here is that this branch is a precursor to another change I will be proposing to the headers sync logic, in which we might send getheaders messages based on locators that are not calculated from a single CBlockIndex entry.
  11. dergoegge commented at 7:37 pm on June 23, 2022: member
    Concept ACK
  12. in src/net_processing.cpp:2448 in 64deef68be outdated
    2452-            received_new_header = true;
    2453-        }
    2454-    }
    2455+    // If we don't have the last header, then this peer will have given us
    2456+    // something new (if these headers are valid).
    2457+    received_new_header = WITH_LOCK(::cs_main, return m_chainman.m_blockman.LookupBlockIndex(headers.back().GetHash()) == nullptr);
    


    mzumsande commented at 10:13 pm on June 23, 2022:
    nit: could just use bool received_new_header = ... , remove the initialization a few lines above and move this down a few lines to where it is used.

    sdaftuar commented at 4:29 pm on June 24, 2022:
    We need to do the calculation here before invoking ProcessNewBlockHeaders() because that will cause headers to be added to the block index, but I’ll get rid of the declaration at line 2423, as we can just declare and initialize here instead.
  13. sdaftuar force-pushed on Jun 24, 2022
  14. in src/net_processing.cpp:4455 in f2e3f6d732 outdated
    4460+                    // Bump the timeout to allow a response, which could clear the timeout
    4461+                    // (if the response shows the peer has synced), reset the timeout (if
    4462+                    // the peer syncs to the required work but not to our tip), or result
    4463+                    // in disconnect (if we advance to the timeout and pindexBestKnownBlock
    4464+                    // has not sufficiently progressed)
    4465+                    state.m_chain_sync.m_timeout = time_in_seconds + HEADERS_RESPONSE_TIME;
    


    mzumsande commented at 3:51 pm on June 24, 2022:
    Could this be abused by a peer to evade eviction by making us send GetHeaders() from other places (e.g. sending unconnecting CMPCTBLOCK messages) and sometimes sending stale headers? That might restart the m_last_getheaders_timestamp timer so that we never get to set state.m_chain_sync.m_sent_getheaders and state.m_chain_sync.m_timeout here and therefore won’t disconnect.

    sdaftuar commented at 4:38 pm on June 24, 2022:

    Thanks for thinking about this. I think you’re right to raise this concern, and after re-reviewing this logic, I think the best course of action is to just have this call-site assume the getheaders message goes out (rather than branch on the return value) – because if it doesn’t, it means we’ve already sent a getheaders, and so the peer should have provided us with a sufficiently high work chain anyway.

    Does that reasoning sound right to you as well?

    (Edit: I guess a downside to this is that if a peer is in the middle of big reorg, that this logic might cause them to be evicted, because our timeouts aren’t long enough to necessarily deliver a huge reorg – however that is a pre-existing issue with this strategy, and we have other mitigations in place like protecting some of our outbound peers from eviction under this logic.)


    mzumsande commented at 5:29 pm on June 24, 2022:

    Does that reasoning sound right to you as well?

    Yes, that’s also the solution I had thought of.

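The resolution agreed on above can be modeled as a small sketch (hypothetical, simplified names; the real logic lives in the eviction code quoted earlier): once the chain-sync deadline is hit, assume the getheaders went out and arm the response timeout unconditionally, rather than branching on whether the message was actually sent.

```cpp
// Hedged model of the fix: when the chain-sync timeout fires, record that a
// getheaders was requested and always extend the deadline, so a peer cannot
// dodge eviction by keeping our getheaders timer busy with unrelated
// requests. Names are illustrative only.
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

constexpr auto HEADERS_RESPONSE_TIME = 2min;

struct ChainSyncTimeoutState {
    std::chrono::seconds m_timeout{0s};
    bool m_sent_getheaders{false};
};

// Invoked once the peer has failed to show sufficient work by the initial
// deadline. Even if the getheaders is suppressed because one is already in
// flight, the peer has had its chance to deliver a high-work chain, so the
// disconnect countdown starts regardless.
void BumpChainSyncTimeout(ChainSyncTimeoutState& state, std::chrono::seconds now)
{
    state.m_sent_getheaders = true;
    state.m_timeout = now + HEADERS_RESPONSE_TIME;
}
```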
  15. mzumsande commented at 4:20 pm on June 24, 2022: contributor
    Concept ACK
  16. sdaftuar force-pushed on Jun 24, 2022
  17. DrahtBot added the label Needs rebase on Jun 27, 2022
  18. sdaftuar force-pushed on Jun 27, 2022
  19. DrahtBot removed the label Needs rebase on Jun 27, 2022
  20. DrahtBot added the label Needs rebase on Jun 27, 2022
  21. sdaftuar force-pushed on Jun 27, 2022
  22. DrahtBot removed the label Needs rebase on Jun 27, 2022
  23. in src/net_processing.cpp:2276 in e3b2185e70 outdated
    2271+
    2272+bool PeerManagerImpl::MaybeSendGetHeaders(CNode& pfrom, const CBlockLocator& locator, Peer& peer)
    2273+{
    2274+    const CNetMsgMaker msgMaker(pfrom.GetCommonVersion());
    2275+
    2276+    const auto current_time = GetTime<std::chrono::seconds>();
    


    dergoegge commented at 11:45 am on June 28, 2022:
    Looking at the documentation for GetTime it appears that NodeClock::now() would be preferred here.

    sdaftuar commented at 2:55 pm on June 28, 2022:
    Thanks, I tried to make this switch in the last commit but it took me several tries to get it to compile, so please review and let me know if I’m using NodeClock correctly!

    sdaftuar commented at 7:15 pm on June 28, 2022:
    Dropping the std::atomic seems to have made things work with both clang and g++, so I think this is probably right now (thanks to @dongcarl and @sipa for helping me sort through this!).

    dergoegge commented at 7:28 pm on June 28, 2022:
    Interesting, the std::atomic version did work for me locally (compiled with clang). What you have now looks good to me!

    maflcko commented at 7:29 pm on June 28, 2022:
    It should be possible to use std::atomic on this (if you want to). Though, you may have to invoke the time_point constructor before assigning a value to the atomic. This can be done, for example, by calling NodeSeconds{} from https://github.com/bitcoin/bitcoin/blob/480d8069d7da954301fbb5c96b766e91b139fe5b/src/util/time.h#L25

    sdaftuar commented at 7:49 pm on June 28, 2022:
    @MarcoFalke I think I’ll probably give up on trying to use std::atomic because all my efforts seem to fail, but if you feel like writing the code and can share a diff that compiles I’d be happy to include it.

    maflcko commented at 3:40 pm on June 30, 2022:

    I presume the issue was g++-8 in https://cirrus-ci.com/task/5966736871653376?logs=ci#L2017

    I tried my proposal (calling NodeSeconds{}) and it worked on top of your commit.

    # git log -1 && git diff
    commit fa24d37213cede7136ed34d481091b84642c62a6 (HEAD)
    Author: Suhas Daftuar <sdaftuar@gmail.com>
    Date:   Tue Jun 28 10:53:02 2022 -0400
    
        Replace GetTime() with NodeClock in MaybeSendGetHeaders()
    diff --git a/src/net_processing.cpp b/src/net_processing.cpp
    index a75e3bd7f..6fc3009cb 100644
    --- a/src/net_processing.cpp
    +++ b/src/net_processing.cpp
    @@ -358,7 +358,7 @@ struct Peer {
         std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex);
     
         /** Time of the last getheaders message to this peer */
    -    std::atomic<NodeClock::time_point> m_last_getheaders_timestamp{};
    +    std::atomic<NodeClock::time_point> m_last_getheaders_timestamp{NodeSeconds{}};
     
         Peer(NodeId id)
             : m_id{id}
    

    An equivalent would be to call NodeClock::time_point{} instead.


    ajtowns commented at 10:23 am on July 1, 2022:

    I think the problem here is std::atomic<T> requires T to be trivially copyable, but unlike duration, time_point isn’t implemented as trivially copyable (and none of time_point, duration, or the underlying rep are specified as being trivially copyable either). Since we already assume duration is trivially copyable, last time I messed with this I wrote a wrapper, something like:

    template<typename TP>
    class atomic_time_point {
    private:
        using dur = typename TP::duration;
        std::atomic<dur> d;
    public:
        atomic_time_point() noexcept : d{dur{0}} { }
        atomic_time_point(const atomic_time_point&) = delete;
        atomic_time_point(const TP& t) noexcept : d{t.time_since_epoch()} { }
        TP load() { return TP{d.load()}; }
        TP operator=(const TP& t) { d = t.time_since_epoch(); return t; }
    };
    using AtomicNodeTime = atomic_time_point<NodeClock::time_point>;
    using AtomicNodeSeconds = atomic_time_point<NodeSeconds>;
    

    which would let you write AtomicNodeTime m_last_getheaders_timestamp{}; without relying on time_point implementation details.


    maflcko commented at 10:55 am on July 1, 2022:

    time_point isn’t implemented as trivially copyable

    Can you add references to your claim that they are not trivially copyable?

    See also the cpp program which passes on gcc7/clang5:

    #include <chrono>
    
    template <class T>
    void CanUseInAtomic() {
        static_assert(std::is_trivially_copyable_v<T>);
        static_assert(std::is_copy_constructible_v<T>);
        static_assert(std::is_move_constructible_v<T>);
        static_assert(std::is_copy_assignable_v<T>);
        static_assert(std::is_move_assignable_v<T>);
    }
    
    int main() {
        CanUseInAtomic<bool>();
        CanUseInAtomic<std::chrono::seconds>();      // duration
        CanUseInAtomic<std::chrono::system_clock::time_point>();  // time_point
    }
    

    https://godbolt.org/z/3nKc43cPv


    ajtowns commented at 12:23 pm on July 1, 2022:

    Hmm, fair point. static_assert(std::is_nothrow_default_constructible_v<T>); catches it for clang 5, but not for gcc 7.1 for what that’s worth.

    But I don’t think the pre-C++20 behaviour of default std::atomic initialization is ever what we want anyway? Per https://en.cppreference.com/w/cpp/atomic/atomic/atomic

    1. The default constructor is trivial: no initialization takes place other than zero initialization of static and thread-local objects. std::atomic_init may be used to complete initialization.

    that will just leave things undefined, like int x; ?

    (Anyway, no need for further changes in this PR, as far as I can see)


    sdaftuar commented at 12:28 pm on July 1, 2022:
    I can’t say I’m exactly following everything you guys are talking about, but @MarcoFalke thank you – I updated the PR and the compilers all seem happy now. :). Will squash that last fixup commit now…
  24. in src/net_processing.cpp:574 in e3b2185e70 outdated
    564@@ -560,6 +565,22 @@ class PeerManagerImpl final : public PeerManager
    565                                const std::vector<CBlockHeader>& headers,
    566                                bool via_compact_block)
    567         EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
    568+    /** Various helpers for headers processing, invoked by ProcessHeadersMessage() */
    569+    /** Deal with state tracking and headers sync for peers that send the
    570+     * occasional non-connecting header (this can happen due to BIP 130 headers
    571+     * announcements for blocks interacting with the 2hr rule). */
    


    dergoegge commented at 11:53 am on June 28, 2022:
    It’s not clear to me what 2hr rule you are referring to

    sdaftuar commented at 6:30 pm on June 28, 2022:
    One of the most unusual validation rules is that we don’t accept a block that is more than 2 hours in the future, see: https://github.com/bitcoin/bitcoin/blob/5bf65ec66e5986c9188e3f6234f1c5c0f8dc7f90/src/validation.cpp#L3469

    dergoegge commented at 7:23 pm on June 28, 2022:
    Ah ok, thanks for clarifying! Maybe mention the MAX_FUTURE_BLOCK_TIME constant here to make it obvious?

    sdaftuar commented at 7:54 pm on June 28, 2022:
    Done!
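For reference, the rule being discussed can be illustrated with a small standalone check. This is a simplified sketch, not the validation code itself; `MAX_FUTURE_BLOCK_TIME` is the constant referenced above (two hours), and the helper name is invented:

```cpp
// Illustration of the "2hr rule": a header whose timestamp is more than
// MAX_FUTURE_BLOCK_TIME ahead of our adjusted network time is rejected.
// The actual check lives in validation.cpp.
#include <cassert>
#include <cstdint>

constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60; // 2 hours, in seconds

bool TooFarInFuture(int64_t header_time, int64_t adjusted_time)
{
    return header_time > adjusted_time + MAX_FUTURE_BLOCK_TIME;
}
```

This is what makes the unconnecting-headers case benign: a BIP 130 announcement can build on a header we rejected earlier for being too far in the future, so the gap is not necessarily misbehavior.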
  25. in src/net_processing.cpp:2441 in e3b2185e70 outdated
    2473+            // could be benign.
    2474+            HandleFewUnconnectingHeaders(pfrom, peer, headers);
    2475+        } else {
    2476+            BlockValidationState state;
    2477+            state.Invalid(BlockValidationResult::BLOCK_MISSING_PREV, "prev-blk-not-found");
    2478+            MaybePunishNodeForBlock(pfrom.GetId(), state, via_compact_block, "invalid header received");
    


    dergoegge commented at 11:59 am on June 28, 2022:
    I think a direct call to Misbehaving() is better here, see: #24571 (review)

    sdaftuar commented at 6:32 pm on June 28, 2022:
    Done.
  26. in src/net_processing.cpp:2257 in e3b2185e70 outdated
    2252+    // The peer may just be broken, so periodically assign DoS points if this
    2253+    // condition persists.
    2254+    if (nodestate->nUnconnectingHeaders % MAX_UNCONNECTING_HEADERS == 0) {
    2255+        Misbehaving(peer, 20, strprintf("%d non-connecting headers", nodestate->nUnconnectingHeaders));
    2256+    }
    2257+    return;
    


    dergoegge commented at 12:04 pm on June 28, 2022:

    sdaftuar commented at 6:32 pm on June 28, 2022:
    Done.
  27. in src/net_processing.cpp:2454 in e3b2185e70 outdated
    2491+        return;
    2492     }
    2493 
    2494+    // If we don't have the last header, then this peer will have given us
    2495+    // something new (if these headers are valid).
    2496+    bool received_new_header = WITH_LOCK(::cs_main, return m_chainman.m_blockman.LookupBlockIndex(headers.back().GetHash()) == nullptr);
    


    dergoegge commented at 12:14 pm on June 28, 2022:
    bool received_new_header{WITH_LOCK(::cs_main, return m_chainman.m_blockman.LookupBlockIndex(headers.back().GetHash()) == nullptr)};
    

    sdaftuar commented at 6:33 pm on June 28, 2022:
    Took this change and also switched to this type of initialization for headers_connect_blockindex as well.
  28. sdaftuar force-pushed on Jun 28, 2022
  29. sdaftuar force-pushed on Jun 28, 2022
  30. sdaftuar force-pushed on Jun 28, 2022
  31. Move handling of unconnecting headers into own function 7f2450871b
  32. Add helper function for checking header continuity 9492e93bf9
  33. Move additional headers fetching to own function
    Also moves the call to happen directly after validation of a headers message
    (rather than mixed in with other state updates for the peer), and removes an
    incorrect comment in favor of one that explains why headers sync must continue
    from the last header a peer has sent.
    bf8ea6df75
  34. Move headers-direct-fetch logic into own function 29c4518522
  35. Move headers direct fetch to end of ProcessHeadersMessage 2b341db731
  36. Move peer state updates from headers message into separate function 6d95cd3e74
  37. Cleanup received_new_header calculation to use WITH_LOCK ffe87db247
  38. Don't send getheaders message when another request is outstanding
    Change getheaders messages so that we wait up to 2 minutes for a response to a
    prior getheaders message before issuing a new one.
    
    Also change the handling of the getheaders message sent in response to a block
    INV, so that we no longer use the hashstop variable (including the hash stop
    will just mean that if our peer's headers chain is longer, then we won't learn
    it, so there's no benefit to using hashstop).
    
    Also, now respond to a getheaders during IBD with an empty headers message
    (rather than nothing) -- this better conforms to the intent of the new logic
    that it's better to not ignore a peer's getheaders message, even if you have
    nothing to give. This also avoids a lot of functional tests breaking.
    
    p2p_segwit.py is modified to use this same strategy, as the test logic (of
    expecting a getheaders after a block inv) would otherwise be broken.
    abf5d16c24
  39. sdaftuar force-pushed on Jun 28, 2022
  40. dergoegge commented at 6:43 pm on June 30, 2022: member
    Code review ACK b6f63bd991569d45b19c4f2d83b223e6e7cdcff3
  41. Replace GetTime() with NodeClock in MaybeSendGetHeaders() 99f4785cad
  42. sdaftuar force-pushed on Jul 1, 2022
  43. ajtowns commented at 3:28 am on July 4, 2022: contributor

    @ajtowns This should fix the issue you’ve seen in functional tests, where generating a bunch of blocks at once can result in O(n^2) headers downloads.

    Can confirm that it seems to fix the problem, and that the debug logs for p2p_blockheaders show far fewer “getheaders” requests.

  44. ajtowns commented at 11:46 am on July 4, 2022: contributor
    ACK 99f4785cad94657dcf349d00fdd6f1d44cac9bb0 ; code review, check over new logic of when to send getheaders messages
  45. in src/net_processing.cpp:2209 in 7f2450871b outdated
    2204+ * announcement.
    2205+ *
    2206+ * We'll send a getheaders message in response to try to connect the chain.
    2207+ *
    2208+ * The peer can send up to MAX_UNCONNECTING_HEADERS in a row that
    2209+ * don't connect before given DoS points.
    


    sipa commented at 11:55 am on July 4, 2022:
    Nit, typo: “before being given DoS points” ?

    maflcko commented at 7:57 am on July 14, 2022:
  46. in test/functional/feature_minchainwork.py:85 in abf5d16c24 outdated
    81@@ -82,7 +82,7 @@ def run_test(self):
    82         msg.hashstop = 0
    83         peer.send_and_ping(msg)
    84         time.sleep(5)
    85-        assert "headers" not in peer.last_message
    86+        assert ("headers" not in peer.last_message or len(peer.last_message["headers"].headers) == 0)
    


    sipa commented at 1:21 pm on July 4, 2022:
    Nit: braces are unnecessary, I think.

    maflcko commented at 7:57 am on July 14, 2022:
    Done as part of 8efa73e7ce4ea0c1b7ad5c3947a5ecf9fb6361d3
  47. sipa commented at 1:25 pm on July 4, 2022: member

    utACK 99f4785cad94657dcf349d00fdd6f1d44cac9bb0

    Nits are very non-blocking

  48. dergoegge commented at 1:40 pm on July 4, 2022: member
    Code review ACK 99f4785cad94657dcf349d00fdd6f1d44cac9bb0
  49. maflcko renamed this:
    p2p, refactor: Avoid multiple getheaders messages in flight to the same peer
    p2p: Avoid multiple getheaders messages in flight to the same peer
    on Jul 4, 2022
  50. w0xlt approved
  51. w0xlt commented at 5:20 pm on July 4, 2022: contributor
  52. mzumsande commented at 7:07 pm on July 4, 2022: contributor
    Code Review ACK 99f4785cad94657dcf349d00fdd6f1d44cac9bb0
  53. fanquake merged this on Jul 4, 2022
  54. fanquake closed this on Jul 4, 2022

  55. sidhujag referenced this in commit 35150b06a7 on Jul 4, 2022
  56. in src/net_processing.cpp:2208 in 7f2450871b outdated
    2203+ * Special handling for unconnecting headers that might be part of a block
    2204+ * announcement.
    2205+ *
    2206+ * We'll send a getheaders message in response to try to connect the chain.
    2207+ *
    2208+ * The peer can send up to MAX_UNCONNECTING_HEADERS in a row that
    


    maflcko commented at 7:51 am on July 5, 2022:
    7f2450871b3ea0b4d02d56bd2ca365fcc25cf90e: Pretty sure this counts the headers msgs, not the headers?

    sdaftuar commented at 2:09 pm on July 6, 2022:
    Sure, I can update the comment in a followup (there’s a sense in which what is written is true, in that we only are checking the first header in a message to see if it connects, but that’s probably an overly convoluted way to think about things).

    maflcko commented at 3:08 pm on July 6, 2022:
    Did that in #25555. Feel free to NACK or ACK
  57. in src/net_processing.cpp:2269 in 7f2450871b outdated
    2292+            // If this looks like it could be a BIP 130 block announcement, use
    2293+            // special logic for handling headers that don't connect, as this
    2294+            // could be benign.
    2295+            HandleFewUnconnectingHeaders(pfrom, peer, headers);
    2296+        } else {
    2297+            Misbehaving(peer, 10, "invalid header received");
    


    maflcko commented at 7:53 am on July 5, 2022:

    7f2450871b3ea0b4d02d56bd2ca365fcc25cf90e It might be good to mention in the commit message that you are changing the points in this “move” commit.

    Previously a non-connecting non-continuous header sequence of size MAX_BLOCKS_TO_ANNOUNCE+1 was assigned 20. Now it is assigned 10, no?


    ajtowns commented at 11:19 am on July 6, 2022:
    It still gets a score of 20 via HandleFewUnconnectingHeaders() when nCount <= MAX_BLOCKS_TO_ANNOUNCE; this adds an unavoidable score of 10 for unconnecting headers when nCount > MAX_BLOCKS_TO_ANNOUNCE which previously was more or less ignored…

    jnewbery commented at 12:19 pm on July 6, 2022:

    Previously a non-connecting non-continuous header sequence of size MAX_UNCONNECTING_HEADERS would be submitted to validation through ProcessNewBlockHeaders(), since the if (!m_chainman.m_blockman.LookupBlockIndex(headers[0].hashPrevBlock) && nCount <= MAX_BLOCKS_TO_ANNOUNCE) block would be skipped.

    That would return BlockValidationResult::BLOCK_MISSING_PREV:

    https://github.com/bitcoin/bitcoin/blob/691a08718beff31d1b821b192609ea3bfdb24d41/src/validation.cpp#L3590-L3593

    which would result in 10 misbehaviour points in MaybePunishNodeForBlock


    maflcko commented at 1:09 pm on July 6, 2022:

    this adds an unavoidable score of 10 for unconnecting headers when nCount > MAX_BLOCKS_TO_ANNOUNCE which previously was more or less ignored…

    Can you explain a bit more why this was “more or less ignored”?

    Previously it was deterministically and consistently assigned a Misbehaving(peer, 20, "non-continuous headers sequence");.

    Now it is deterministically and consistently assigned a Misbehaving(peer, 10, "invalid header received");.

    Previously a non-connecting non-continuous header sequence of size MAX_UNCONNECTING_HEADERS would be submitted to validation through ProcessNewBlockHeaders()

    No, it wouldn’t, as it would be rejected by the Misbehaving(peer, 20, "non-continuous headers sequence"); check?

    What am I missing?


    jnewbery commented at 1:22 pm on July 6, 2022:

    What am I missing?

    Nothing. I misread non-continuous as continuous. You’re right that this would be rejected with Misbehaving(peer, 20, "non-continuous headers sequence")


    sdaftuar commented at 2:20 pm on July 6, 2022:

    Not sure what to say here other than that yes, there are minor behavior changes in this PR. I don’t think any are significant as the DoS points here are pretty arbitrary and when checking for different failures, it shouldn’t matter too much which order we do things. I would have been happy to update the commit message as you suggest but now that this is merged there’s not much I can do!

    It should be the case that any failure before that resulted in DoS points would still result in DoS points now, I think – if that is not the case, then that would be an oversight to fix. The motivation for making these kinds of minor changes is to make the headers processing easier to reason about, which is also in preparation for another change I’ll be proposing to headers sync in the future (which I think will be easier to understand with the headers processing logic that is introduced in this PR).


    maflcko commented at 2:59 pm on July 6, 2022:
    Yeah, it wasn’t meant as a criticism of https://github.com/bitcoin/bitcoin/commit/7f2450871b3ea0b4d02d56bd2ca365fcc25cf90e more as a nit to keep in mind in the future. I think the code changes in this commit are perfectly fine. Generally I just prefer to split behaviour changes from refactoring changes, or at least explain the behaviour changes in the commit message briefly. Otherwise, what seems obvious to the author is easily missed by reviewers. (I think this discussion is supporting evidence enough).

    ajtowns commented at 3:36 pm on July 6, 2022:

    Previously a non-connecting non-continuous header sequence of size MAX_BLOCKS_TO_ANNOUNCE+1 was assigned 20. Now it is assigned 10, no? Previously it was deterministically and consistently assigned a Misbehaving(peer, 20, "non-continuous headers sequence");.

    Ah, sorry, I was only thinking of the non-connecting continuous case, not the non-connecting, non-continuous case.

  58. in src/net_processing.cpp:2448 in ffe87db247 outdated
    2452-            received_new_header = true;
    2453-        }
    2454-    }
    2455+    // If we don't have the last header, then this peer will have given us
    2456+    // something new (if these headers are valid).
    2457+    bool received_new_header{WITH_LOCK(::cs_main, return m_chainman.m_blockman.LookupBlockIndex(headers.back().GetHash()) == nullptr)};
    


    maflcko commented at 9:54 am on July 5, 2022:

    ffe87db247b19ffb8bfba329c5dd0be39ef5a53f: It looks like you are splitting one LOCK of cs_main in master into at least 3 in this pull. It would seem plausible that under no load this has no impact, as only one thread will be active at most. However at higher load, this may cause net_processing to be (minimally) slowed down with the changes here?

    Not a big deal, but maybe at least this cs_main can be avoided by opportunistically calculating the result in the previous WITH_LOCK ?


    sdaftuar commented at 2:23 pm on July 6, 2022:
    I suppose! My view is that the performance here is extremely minor compared to code readability and organizing the ideas in this function together, so I prefer it this way.
  59. maflcko commented at 12:53 pm on July 5, 2022: member
    left some questions
  60. in src/net_processing.cpp:361 in 99f4785cad
    357@@ -358,7 +358,7 @@ struct Peer {
    358     std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex);
    359 
    360     /** Time of the last getheaders message to this peer */
    361-    std::atomic<std::chrono::seconds> m_last_getheaders_timestamp{0s};
    362+    std::atomic<NodeClock::time_point> m_last_getheaders_timestamp{NodeSeconds{}};
    


    maflcko commented at 1:55 pm on July 6, 2022:
    Is there even a need to make this atomic? In a single-threaded environment this is not needed. And if the same peer is served by several threads (for whatever reason) in the future, an atomic may prevent UB, but doesn’t seem sufficient to prevent races logically.

    sdaftuar commented at 2:13 pm on July 6, 2022:

    Hmm that is an interesting point. My sense was that we don’t want to rely on implicit single-threadedness to reason about data races, so it was safer to just throw this into an atomic and not worry about it.

    I also think that having multiple threads servicing the same peer would not really work for a bunch of reasons (including the one you give, about logical errors in code like this). I would imagine that multithreading network handling in the future would involve different threads servicing different peers.

    I think the most likely form of UB from data races would be if we were to expose this value via RPC in the future. So to make this future-proof, I’d prefer that we leave this as-is; it doesn’t seem like the performance hit of having an atomic here (which is rarely accessed, as getheaders messages are not a frequent occurrence, and there are far more effective ways to CPU DoS a node than to trigger a getheaders) is significant enough to warrant making this harder to reason about in the future.


    maflcko commented at 3:07 pm on July 6, 2022:
    I think if this was ever exposed on RPC, we’d also want to think which other fields to return atomically in the response, in which case a per-field atomic isn’t enough to prevent logic races. So to me it seems slightly preferable to leave the design of thread safety to when it is needed. Otherwise it may be assumed in the future that this is already perfectly thread safe and can be used as is. Leaving this plain (without atomic) would at least have a thread sanitizer failure hint that the design needs to be re-thought if this was accessed by more than one thread in the future.

    maflcko commented at 3:07 pm on July 6, 2022:
    (Other than that, agree that atomic doesn’t affect performance)

    sdaftuar commented at 5:42 pm on July 6, 2022:
    You know, I think you’re right that this atomic is incorrect. We need to hold a lock throughout MaybeSendGetHeaders() because this variable can be accessed in the scheduler thread or in the net processing thread, and if we don’t hold a lock throughout that function we can get a data race. Thanks for asking about this, I’ll open a PR to fix.
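    The fix described here, holding a lock throughout the function so that the elapsed-time check and the timestamp update form one critical section, could be sketched as follows. Names are illustrative, not the actual Bitcoin Core code.

```cpp
#include <chrono>
#include <mutex>

using namespace std::chrono_literals;

class PeerState
{
public:
    // The whole read-modify-write happens under one lock, so two threads
    // cannot both observe an expired timer and both decide to send.
    bool MaybeSendGetHeaders(std::chrono::seconds now)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        if (now - m_last_getheaders_timestamp >= 2min) {
            m_last_getheaders_timestamp = now;
            return true; // would issue the getheaders here
        }
        return false;
    }

private:
    std::mutex m_mutex;
    std::chrono::seconds m_last_getheaders_timestamp{0s}; // guarded by m_mutex
};
```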

    sdaftuar commented at 6:49 pm on July 6, 2022:
    Addressed in #25557

    maflcko commented at 3:03 pm on July 11, 2022:
    Can you elaborate a bit on this? Unless I am missing something, MaybeSendGetHeaders/ConsiderEviction is only called in-thread, not to be confused with CheckForStaleTipAndEvictPeers, which is called out-of-thread. What am I missing?

    vasild commented at 4:00 pm on July 11, 2022:
    What is “in-thread” and “out-of-thread”?

    maflcko commented at 5:54 am on July 12, 2022:
    “in-thread” means running in the thread “ThreadMessageHandler (b-msghand): Application level message handling (sending and receiving). Almost all net_processing and validation logic runs on this thread.”

    vasild commented at 8:13 am on July 12, 2022:

    Right, m_last_getheaders_timestamp is accessed from PeerManagerImpl::MaybeSendGetHeaders() and PeerManagerImpl::ProcessMessage() which are only called from a single thread ThreadMessageHandler().

    because this variable can be accessed in the scheduler thread or in the net processing thread

    @sdaftuar, I think it is not accessed from the scheduler thread?

    I am in favor of simplicity, in this case meaning we should not protect data that is accessed by a single thread. If in the future it is accessed by another thread, then an appropriate protection should be added; that may be an atomic, a mutex just for that variable, or a mutex for multiple variables, depending on the need. Otherwise we may end up protecting it needlessly now, only to have to rework the protection later anyway (or worse, use the available but inappropriate protection, as @MarcoFalke mentioned above: “…it may be assumed in the future that this is already perfectly thread safe…”).


    sdaftuar commented at 3:03 pm on July 12, 2022:

    Can you elaborate a bit on this? Unless I am missing something MaybeSendGetHeaders/ConsiderEviction is only called in-thread, not to be confused with CheckForStaleTipAndEvictPeers, which is called out-of-thread. What am I missing?

    Ah, I guess I misremembered how ConsiderEviction is called (and thought it was run in the scheduler thread).

    I’ll update my other PR and drop the mutex/atomic altogether.


    ryanofsky commented at 3:14 pm on July 12, 2022:

    I am in favor of simplicity, in this case meaning to not protect data that is accessed by a single thread.

    Possible idea for the future. In theory we could have a SingleThread<T> class, similar to the Synced<T> class from #25390, that wraps an object and asserts in debug mode that it is always accessed from the same thread. I think a solution like that would be overkill here, and I’m struggling to think of cases where it wouldn’t be overkill, but it wouldn’t be hard to implement and could be useful to help code document the assumptions it is making.
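    A minimal sketch of that idea (the name SingleThread<T> and its interface are assumptions here, not taken from #25390): lazily record the id of the first accessing thread, and assert on every later access in debug builds.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Debug-only single-thread assertion wrapper: the wrapped value may be
// read or written only from the thread that first touched it.
template <typename T>
class SingleThread
{
public:
    explicit SingleThread(T value) : m_value{std::move(value)} {}

    // Access the wrapped value, checking the calling thread in debug mode.
    T& operator*()
    {
        CheckThread();
        return m_value;
    }

private:
    void CheckThread()
    {
#ifndef NDEBUG
        // Lazily bind ownership to the first thread that accesses us.
        if (m_owner == std::thread::id{}) {
            m_owner = std::this_thread::get_id();
        }
        assert(m_owner == std::this_thread::get_id());
#endif
    }

    T m_value;
    std::thread::id m_owner{}; // set on first access (debug builds only)
};
```

    A fuller version would presumably also guard `operator->` and const access; note too that the lazy ownership binding is itself only safe if the first access is unambiguous.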


    maflcko commented at 7:59 am on July 14, 2022:
    Should be resolved by 8efa73e7ce4ea0c1b7ad5c3947a5ecf9fb6361d3
  61. maflcko referenced this in commit 8efa73e7ce on Jul 14, 2022
  62. in src/net_processing.cpp:2225 in 99f4785cad
    2438-        //   don't connect before giving DoS points
    2439-        // - Once a headers message is received that is valid and does connect,
    2440-        //   nUnconnectingHeaders gets reset back to 0.
    2441-        if (!m_chainman.m_blockman.LookupBlockIndex(headers[0].hashPrevBlock) && nCount <= MAX_BLOCKS_TO_ANNOUNCE) {
    2442-            nodestate->nUnconnectingHeaders++;
    2443-            m_connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::GETHEADERS, m_chainman.ActiveChain().GetLocator(m_chainman.m_best_header), uint256()));
    


    Sjors commented at 1:24 pm on August 6, 2022:
    msgMaker is now unused in ProcessHeadersMessage. This is fixed in #25717
  63. fanquake added this to the milestone 24.0 on Sep 15, 2022
  64. hebasto commented at 11:04 am on March 30, 2023: member

    msgMaker is now unused in ProcessHeadersMessage.

    Same for PeerManagerImpl::HandleFewUnconnectingHeaders() and PeerManagerImpl::ConsiderEviction(). See #27368.

  65. fanquake referenced this in commit 5241b8bdff on Mar 30, 2023
  66. sidhujag referenced this in commit d5d878e4d8 on Apr 1, 2023
  67. Fabcien referenced this in commit e70a075264 on Dec 5, 2023
  68. Fabcien referenced this in commit 5ccc844d00 on Dec 5, 2023
  69. Fabcien referenced this in commit bba802a752 on Dec 5, 2023
  70. Fabcien referenced this in commit a0058fe859 on Dec 5, 2023
  71. Fabcien referenced this in commit 2d39859311 on Dec 5, 2023
  72. Fabcien referenced this in commit 61921db596 on Dec 5, 2023
  73. Fabcien referenced this in commit 71e67ffdd3 on Dec 6, 2023
  74. sipa referenced this in commit a903bba4d6 on Mar 7, 2024
  75. sipa referenced this in commit ce238dd79b on Mar 12, 2024
  76. bitcoin locked this on Mar 29, 2024
