p2p: adaptive connections services flags #28170

pull furszy wants to merge 6 commits into bitcoin:master from furszy:2023_net_adaptable_desirable_flags changing 11 files +163 −64
  1. furszy commented at 3:34 pm on July 27, 2023: member

    Derived from #28120 discussion.

    By relocating the peer desirable services flags into the peer manager, we allow the connections acceptance process to handle post-IBD potential stalling scenarios.

    The peer manager will be able to dynamically adjust the services flags based on the node’s proximity to the tip (back and forth). Allowing the node to recover from the following post-IBD scenario: Suppose the node has successfully synced the chain, but later experienced dropped connections and remained inactive for a duration longer than the limited peers threshold (the timeframe within which limited peers can provide blocks). In such cases, upon reconnecting to the network, the node might only establish connections with limited peers, filling up all available outbound slots. Resulting in an inability to synchronize the chain (because limited peers will not provide blocks older than the NODE_NETWORK_LIMITED_MIN_BLOCKS threshold).

  2. DrahtBot commented at 3:34 pm on July 27, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild, naumenkogs, mzumsande, andrewtoth, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29231 (logging: Update to new logging API by ajtowns)
    • #28016 (p2p: gives seednode priority over dnsseed if both are provided by sr-gi)
    • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)
    • #15218 (validation: Flush state after initial sync by andrewtoth)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Jul 27, 2023
  4. furszy renamed this:
    net: adaptive connections services flags
    p2p: adaptive connections services flags
    on Jul 27, 2023
  5. DrahtBot added the label CI failed on Jul 27, 2023
  6. DrahtBot removed the label CI failed on Jul 28, 2023
  7. in src/net.h:1019 in 2325ff03f4 outdated
    665@@ -666,6 +666,12 @@ class NetEventsInterface
    666     /** Handle removal of a peer (clear state) */
    667     virtual void FinalizeNode(const CNode& node) = 0;
    668 
    669+    /**
    670+     * Callback to determine whether the given set of service flags are sufficient
    671+     * for a peer to be "relevant".
    672+     */
    673+    virtual bool HasDesirableServiceFlags(ServiceFlags services) = 0;
    


    dergoegge commented at 2:50 pm on August 7, 2023:

    This interface is not the right place for a callback like this because it is not a “net event”.

    Would be better to have a method on connman, e.g. CConnman::SetDesirableServiceFlags that is then called by peerman.


    furszy commented at 10:20 pm on August 7, 2023:

    This interface is not the right place for a callback like this because it is not a “net event”.

    Hmm ok, that would avoid locking cs_main in the open connections thread. Which is nice.


    furszy commented at 3:23 pm on August 8, 2023:

    Have thought more about this and I’m not so convinced about it anymore.

    The connection manager class lives on a lower level, and manages only a fraction of the node’s information. And this field is strictly linked to the node’s overall status. Right now, HasDesirableServiceFlags() requires to know the chain sync status. So, by placing it inside CConMan, we add another level of indirection. Because, most of the time, the peer manager class will call to the CConMan setter and subsequently call the getter to utilize it. Using the CConMan class mostly as a field container.

    Also, it would complicate any expansion of the peer services selectivity logic because different processes would require to update the flag stored in CConMan. For instance, the node could be seeking to exclusively connect to peers that provide block filters (up to a certain point, then pick other preferences on a case basis), or could want to minimize the number of limited connections based on the chain download state, or if a chain stale state was detected, or could be needing to disallow certain peer types for a certain time etc.

    Considering this, the responsibility for managing this field seems to fit better inside the peer manager class.

    Happy to hear your thoughts about this. Will keep thinking about it, we can still improve this to not require cs_main lock even when it’s placed inside the peers manager class.


    dergoegge commented at 2:37 pm on August 14, 2023:

    Because, most of the time, the peer manager class will call to the CConMan setter and subsequently call the getter to utilize it.

    I wasn’t saying to make connman the container for this data, I only meant to add a setter to inform connman of the things it needs to perform its job (making connections to desirable nodes). The main container for the flags would still be peerman.


    furszy commented at 3:47 pm on August 17, 2023:

    Because, most of the time, the peer manager class will call to the CConMan setter and subsequently call the getter to utilize it.

    I wasn’t saying to make connman the container for this data, I only meant to add a setter to inform connman of the things it needs to perform its job (making connections to desirable nodes). The main container for the flags would still be peerman.

    Ok. What I dislike about that approach is that we lose a lot of flexibility.

    The PeerManager should be able to decide granularly whether to connect to a specific address before ConnMann opens the connection and initiates handshaking etc. The node’s contextual information and logic required to take the decision of opening a socket to a particular address resides mostly in PeerManager (right now, we only have the stale chain condition, but there could be more). And, by adding another field inside ConnMan, we would be duplicating data (same desirable_services field inside PeerManager and ConnMan), which would need to be synchronized and could potentially lead to race conditions.

    I mean, this misalignment could result in ConnMan performing actions that are no longer in line with the latest PeerManager state. For instance, it might create connections to multiple peers with certain service flags, when PeerManager only intended to connect to one peer supporting such service flags. This situation would eventually lead to PeerManager closing the newly created connections when it sends the version message, as these connections are no longer desirable. Which is a waste of resources.

    Thus why I think that the callback mechanism suits well here. So PeerManager is able to evaluate each connection request, and avoid executing the connection + handshake when we are going to disconnect the peer anyway.


    dergoegge commented at 12:25 pm on January 18, 2024:

    Circling back to this, I still don’t think the interface choices in this PR are right.

    Mostly it’s just that the NetEventsInterface is not the place for a getter like this. It is meant for networking events (hence the name) such as opening/closing connections and processing/sending messages. So Approach NACK on misusing this interface.

    The node’s contextual information and logic required to take the decision of opening a socket to a particular address resides mostly in PeerManager (right now, we only have the stale chain condition, but there could be more)

    This is not true? ThreadOpenConnections is a method on CConnman and handles the opening of network connections.

    I would suggest that you take the same approach as we have already taken with CConnman::{SetTryNewOutboundPeer, StartExtraBlockRelayPeers}, i.e. peerman telling connman what to do. That approach avoids increasing the coupling between the two components.


    furszy commented at 10:29 pm on January 19, 2024:

    The node’s contextual information and logic required to take the decision of opening a socket to a particular address resides mostly in PeerManager (right now, we only have the stale chain condition, but there could be more)

    This is not true? ThreadOpenConnections is a method on CConnman and handles the opening of network connections.

    Yes, that’s the layering violation that causes the coupling between PeerManager and ConMann that we both dislike and shouldn’t be there (at least not entirely). The connections manager alone does not have enough information to select the best candidate based on the node’s context. CConMan cannot access the chain state, the sync progress, in-flight block requests, nor any services predilection, nor any other user customization to prefer certain peers over others—such as ones supporting a number of compact block filters or v2 encrypted connections, etc. This is the reason why we are forced to set the bool flags you mentioned (which can easily get out of sync, causing a waste of resources and races). Under this structure, the only thing we do is continue expanding the number of fields inside CConMan that are susceptible to the same issues (see below).

    I would suggest that you take the same approach as we have already taken with CConnman::{SetTryNewOutboundPeer, StartExtraBlockRelayPeers}, i.e. peerman telling connman what to do. That approach avoids increasing the coupling between the two components.

    I still stand by what I wrote previously at #28170 (review). Essentially, in my view, the approach is resource-wasteful and could lead to race conditions. And, following what I wrote above, it has a poor structure that increases the number of unrelated fields inside CConMan over time.

    Overall, I’m in favor of decoupling these two components, but I’m not in favor of continuing to expand the current structure. The connections’ desirable services flag can (and will) change continually over time, depending on the node’s state, connection heuristics, and possible user custom configs. It is a much more dynamic field than the ones you mention.

    Mostly it’s just that the NetEventsInterface is not the place for a getter like this. It is meant for networking events (hence the name) such as opening/closing connections and processing/sending messages.

    Two points about this:

    Firstly, it is not just a getter. The node needs to make the decision to connect to a certain address based on several factors that depend on the node’s context. Caching a flag doesn’t help here. I mentioned some possible use cases above. I don’t think continuing to expand CConMan fields with data from upper layers is a good structure.

    Secondly, I believe that with this statement, we are somewhat in agreement that the logic of “to which address, supporting which services, should we connect?” shouldn’t be part of CConMan. However, as long as it remains there, the process of deciding to connect (or not) to a certain address is a net event. Like you, I would prefer if it weren’t there, and we could work on it. But I don’t think this PR is introducing something incorrect right now.


    dergoegge commented at 11:51 am on January 22, 2024:

    CConMan cannot access the chain state, the sync progress, in-flight block requests

    I don’t see how the connection opening logic needs access to any of those things.

    number of compact block filters

    Bitcoin Core doesn’t download compact block filters.

    v2 encrypted connections, etc.

    Connman is doing this right now using our service flags and the service flag of the potential new peer. It has all the data it needs.

    Essentially, in my view, the approach is resource-wasteful and could lead to race conditions.

    Your approach has the exact same “flaw” w.r.t. connecting to desired service flags, e.g.:

    • connman calls HasDesireableServiceFlags for peer A which returns true
    • connman proceeds with opening the connection to peer A
    • some event occurs (e.g. messages from other peers are processed) that would make HasDesireableServiceFlags return false for peer A
    • connection to peer A is established

    Peer A is now connected even-though HasDesireableServiceFlags would return false. These races can’t really be avoided, so it’s best to stick with the interface design we’ve already got imo.


    furszy commented at 1:21 pm on January 23, 2024:

    CConMan cannot access the chain state, the sync progress, in-flight block requests

    I don’t see how the connection opening logic needs access to any of those things.

    I believe that we are not in sync regarding the distinction between (1) the logic for deciding whether to connect to a specific address and (2) the thread/process responsible for opening such a connection. The former requires access to that contextual information; its usage is behind the extra_block_relay and try_another_outbound_peer bool flags, and also behind the existing HasAllDesirableServiceFlags() call.

    What I intended to say on that sentence was that a node could have different connections strategies in the future, even though it doesn’t have them now. These strategies would need access to the overall node state. Thus why I’m saying that the current approach of adding fields inside CConnMan with each new connection decision logic change make the code harder to work with, rather than simplifying it. Still, adding a callback as I do here isn’t the saint grail neither. It is slightly better for the reasons I state below, but in the future, with the introduction of more connection strategies, I think we should consider moving part of the ThreadOpenConnections logic to PeerManager.

    number of compact block filters

    Bitcoin Core doesn’t download compact block filters.

    It was just an example of a modification we could add to the connections decision-making process. We can replace it with a more general “reserve X outbound slots for peers supporting Y services” which could also depend on certain contextual factors. This is something either the user or we might want in the future. And like this one, there could be many other modifications that could be added. But, in case it helps, I do have an experimental branch implementing compact block filters download and the entire BIP157 client mode.

    v2 encrypted connections, etc.

    Connman is doing this right now using our service flags and the service flag of the potential new peer. It has all the data it needs.

    You are talking about the way we connect to potential v2 peers and fallback to v1 based on their response right?. My point was about preferring certain services over others, for a limited number of outbound connection slots, before opening the socket. This prioritization could change depending on the node’s context; e.g. taking into consideration the chain progress and/or if the user has configured the node differently.

    Essentially, in my view, the approach is resource-wasteful and could lead to race conditions.

    Your approach has the exact same “flaw” w.r.t. connecting to desired service flags, e.g.:

    • connman calls HasDesireableServiceFlags for peer A which returns true
    • connman proceeds with opening the connection to peer A
    • some event occurs (e.g. messages from other peers are processed) that would make HasDesireableServiceFlags return false for peer A
    • connection to peer A is established

    Peer A is now connected even-though HasDesireableServiceFlags would return false. These races can’t really be avoided, so it’s best to stick with the interface design we’ve already got imo.

    The difference between your proposed design and mine is essentially polling to set the value vs calculating the value on-demand.

    In your design, the code needs to check data at regular intervals to update the flag in another component. If this process is executed too rapidly, it can harm performance, and if it is done too slowly, it wastes resources. In the one I’m proposing, because the value is calculated only when is needed, it shortens the race window. Are we in agreement on that?

    Still, following to what I wrote in my previous comment, I would like to re-think ThreadOpenConnections because otherwise, we will keep adding fields to CConMan that are unrelated to the component itself.

  8. dergoegge commented at 2:54 pm on August 7, 2023: member

    I don’t think the scenario you describe is likely to happen since we consider ourselves to be in IBD if our tip is older than 24h (which corresponds to ~144 blocks).

    Suppose the node has successfully synced the chain, but later experienced dropped connections and remained inactive for a duration longer than the limited peers threshold (the timeframe within which limited peers can provide blocks). In such cases, upon reconnecting to the network, the node might only establish connections with limited peers, filling up all available outbound slots.

    So this isn’t true, upon restart the node would be behind by >24h and open connections to non-pruning peers (unless the block rate significantly increased beyond 1block/10min).


    Getting rid of the g_initial_block_download_completed global would still be worthwhile but the other changes seem unnecessary to me.

  9. furszy commented at 4:01 pm on August 7, 2023: member

    I don’t think the scenario you describe is likely to happen since we consider ourselves to be in IBD if our tip is older than 24h (which corresponds to ~144 blocks).

    Thats not entirely true, misses an important part. We consider ourselves in IBD if our tip is older than 24hs (or the custom max tip age) and we haven’t completed IBD yet. Once IBD is completed, the node remains out of IBD for the entire software lifecycle.

    Suppose the node has successfully synced the chain, but later experienced dropped connections and remained inactive for a duration longer than the limited peers threshold (the timeframe within which limited peers can provide blocks). In such cases, upon reconnecting to the network, the node might only establish connections with limited peers, filling up all available outbound slots.

    So this isn’t true, upon restart the node would be behind by >24h and open connections to non-pruning peers (unless the block rate significantly increased beyond 1block/10min).

    This conclusion misses what I wrote above. The IBD flag is set to true but never reverted for the entire software lifecycle.

    The scenario that I described in the PR description is a real scenario. Reproducible by running #28120 test case. Which exercises the proposed behavior.

    Basically, same as the IBD flag, g_initial_block_download_completed is set to true but never reverted. Which ends up with the node always connecting to limited peers after the first IBD completion. Not accounting for prolonged stalling situations, where the node is beyond the limited peers threshold due an internet outage.

    To test the behavior, you could drop the bug fix commit from that branch. Will see the post-IBD super stalled node connecting to a limited peer even when such peer is not helpful, as it will not provide any of the historical blocks required to sync-up the node.

  10. in src/net_processing.cpp:1655 in 1428b7f4e8 outdated
    1578@@ -1577,6 +1579,20 @@ void PeerManagerImpl::FinalizeNode(const CNode& node)
    1579     LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
    1580 }
    1581 
    1582+bool PeerManagerImpl::HasDesirableServiceFlags(ServiceFlags services)
    


    mzumsande commented at 7:23 pm on August 16, 2023:
    The comment (“shortcut for…”) was helpful for me, any reason to drop it?

    furszy commented at 10:17 pm on August 17, 2023:

    The comment (“shortcut for…”) was helpful for me, any reason to drop it?

    That was because the method was moved to the net interface (see). And there, this is no longer a shortcut. It is the only available method.


    mzumsande commented at 5:09 pm on August 18, 2023:
    I interpreted the comment as explaining the not-so-trivial bit-operator logic: !(GetDesirableServiceFlags(services) & (~services)); is still a shortcut for the longer, but more understandable alternative services & GetDesirableServiceFlags(services)) == GetDesirableServiceFlags(services) no matter where the function is situated.

    furszy commented at 5:19 pm on August 18, 2023:

    Ok. For that, we can locate the comment in the implementation (above the no-so-trivial logic). So it’s clear what that single line does. I just don’t think that is good to place it at the API level. Where GetDesirableServiceFlags doesn’t exist and HasDesirableServiceFlags() could be implemented in different manners.

    Side note: If the line is confusing, instead of adding the comment, we could just replace it by the more understandable one. Isn’t something that will really affect anything.


    mzumsande commented at 6:01 pm on August 18, 2023:
    Both options are fine with me. I will admit that I’m not as fluent/quick reading code with bitwise operators as I should be, but then again if even the original author of this code got it wrong at first (https://github.com/bitcoin/bitcoin/pull/11456#discussion_r144391264) it’s probably not just me.
  11. in src/net_processing.cpp:5251 in 2325ff03f4 outdated
    5243@@ -5235,6 +5244,11 @@ void PeerManagerImpl::CheckForStaleTipAndEvictPeers()
    5244     if (!m_initial_sync_finished && CanDirectFetch()) {
    5245         m_connman.StartExtraBlockRelayPeers();
    5246         m_initial_sync_finished = true;
    5247+    } else if (super_stale) {
    5248+        // Disable extra block relay peers and disallow connection to limited peers.
    5249+        // We need to find a full node that can help us recover.
    5250+        m_connman.StopExtraBlockRelayPeers();
    


    mzumsande commented at 7:44 pm on August 16, 2023:
    Why stop doing extra block-relay peers in this situation? These aren’t attempted during original IBD because we are expected to be behind the tip. However, in this super-stale situation which should really not happen normally, it seems that they could only help?

    naumenkogs commented at 12:13 pm on August 17, 2023:
    What do you think they would help with?

    mzumsande commented at 1:14 pm on August 17, 2023:
    If for whatever reason our current peers aren’t able to send us the blocks to get us out of this situation, maybe trying an additional one will.

    furszy commented at 8:57 pm on August 17, 2023:

    Why stop doing extra block-relay peers in this situation?

    Because, in the stalling scenario, we not only need to progress at the chain level, we might also need to receive new addresses from the network. And block-relay-only connections are limited in that sense. The node rejects addresses coming from them.

    If for whatever reason our current peers aren’t able to send us the blocks to get us out of this situation, maybe trying an additional one will.

    Thats basically what we currently do. The stalling behavior triggers the “try new outbound connection” functionality. Which instructs ConnMan to create a new full relay connection even when the outbound slots are full.

    ….

    With this, I’m not disagreeing about keeping the extra-block-relay connections enabled. Just thought that they might not be as useful, given that we’re already performing the same actions with the “try new outbound connection” functionality.


    naumenkogs commented at 11:25 am on August 18, 2023:
    In that case i’d rather not StopExtraBlockRelayPeers. Instead we can leave a comment regarding the “same actions”, although I’m not sure what’s the appropriate place for it.

    mzumsande commented at 6:07 pm on August 18, 2023:
    I think there’s no harm in keeping the ExtraBlockRelayPeers logic: While it’s true that we would also do extra full outbound connections in this case, those have priority in ThreadOpenConnections over the block-relay-only ones, so trying another block-relay only might not have a super high chance of helping, but it still is another try and if there is no downside to it (I can’t see any), I don’t see why we should actively add logic to stop these.

    furszy commented at 6:17 pm on August 18, 2023:
    yeah, agree

    vasild commented at 2:47 pm on December 18, 2023:
    The problem this PR aims to resolve is that if we are too much behind (48h), then limited peers may not be able to give us the blocks we need. To resolve this problem, I think that it is not necessary to change the “extra block relay only connections” logic. IMO that better be done in a separate PR with its own justification.
  12. in src/net_processing.cpp:1280 in 2325ff03f4 outdated
    1277-    return m_last_tip_update.load() < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 3} && mapBlocksInFlight.empty();
    1278+
    1279+    auto last_update_time = m_last_tip_update.load();
    1280+    if (last_update_time < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 3} && mapBlocksInFlight.empty()) {
    1281+        // If tip is further than NODE_NETWORK_LIMITED_MIN_BLOCKS, we are really behind and shouldn't connect to limited peers anymore.
    1282+        super_stale = last_update_time < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 290};
    


    mzumsande commented at 8:07 pm on August 16, 2023:

    m_initial_sync_finished is set using AdjustedTime (see CanDirectFetch()) while it’s unset using local time. In spite of plans to get rid of AdjustedTime, I think it would be better to use it here as well, otherwise there could be constant switching back and forth in some situations where the two times differ.

    Also, why introduce a new magic number of 290, can’t we just use NODE_NETWORK_LIMITED_MIN_BLOCKS (maybe + 2, if there is a reason for it)?


    furszy commented at 9:59 pm on August 17, 2023:
    Yeah. Sure for both. Not sure why I hardcoded the number..
  13. mzumsande commented at 8:28 pm on August 16, 2023: contributor
    Concept ACK This seems to be similar to the suggestion by gmaxwell in #10387 (comment) It also is a good thing to have protocol.h not depend on dynamic net_processing state.
  14. in src/protocol.h:334 in 1dbc6cc6b1 outdated
    326@@ -327,12 +327,16 @@ std::vector<std::string> serviceFlagsToStr(uint64_t flags);
    327  * guaranteed to not change dependent on state - ie they are suitable for
    328  * use when describing peers which we know to be desirable, but for which
    329  * we do not have a confirmed set of service flags.
    330- *
    331- * If the NODE_NONE return value is changed, contrib/seeds/makeseeds.py
    


    naumenkogs commented at 11:57 am on August 17, 2023:
    Was it just outdated already? I’m confused by what NODE_NONE meant here before this change.

    furszy commented at 10:07 pm on August 17, 2023:

    Was it just outdated already? I’m confused by what NODE_NONE meant here before this change.

    It was talking about the input ServiceFlags. Not the GetDesirableServiceFlags() output.

    Basically, we expect to have all the hardcoded seeds supporting NODE_NETWORK and NODE_WITNESS.

  15. in src/net_processing.h:108 in 1428b7f4e8 outdated
    103+     * version of the peer may determine which flags are required (eg in the
    104+     * case of NODE_NETWORK_LIMITED where we seek out NODE_NETWORK peers
    105+     * unless they set NODE_NETWORK_LIMITED and we are out of IBD, in which
    106+     * case NODE_NETWORK_LIMITED suffices).
    107+     *
    108+     * Thus, generally, avoid calling with peerServices == NODE_NONE, unless
    


    naumenkogs commented at 11:59 am on August 17, 2023:
    does peerServices refer to the outdated code? Pehraps we should change it here?

    furszy commented at 10:09 pm on August 17, 2023:
    yeah sure.
  16. furszy force-pushed on Aug 17, 2023
  17. DrahtBot added the label CI failed on Aug 18, 2023
  18. furszy force-pushed on Aug 18, 2023
  19. furszy force-pushed on Aug 18, 2023
  20. DrahtBot removed the label CI failed on Aug 18, 2023
  21. furszy commented at 1:59 pm on September 20, 2023: member
    per convo, cc @vasild
  22. DrahtBot added the label Needs rebase on Oct 19, 2023
  23. furszy force-pushed on Oct 19, 2023
  24. DrahtBot removed the label Needs rebase on Oct 20, 2023
  25. in src/protocol.h:339 in 7c20e65b28 outdated
    337 ServiceFlags GetDesirableServiceFlags(ServiceFlags services);
    338 
    339+/**
    340+ * State independent service flags.
    341+ * If the return value is changed, contrib/seeds/makeseeds.py
    342+ * should be updated appropriately to filter for the same nodes.
    


    naumenkogs commented at 11:18 am on October 27, 2023:

    7c20e65b2880a23d753b3f05fd28994353b3049d

    the same nodes always takes non-negligible time for me to understand… I would suggest making it less ambiguous, e.g. “nodes with desired service flags (compatible with our new flags)”


    furszy commented at 10:00 pm on October 30, 2023:
    Done as suggested
  26. in src/protocol.h:318 in 893c541dce outdated
    348- */
    349-static inline bool HasAllDesirableServiceFlags(ServiceFlags services)
    350-{
    351-    return !(GetDesirableServiceFlags(services) & (~services));
    352-}
    353+ServiceFlags StatelessServicesFlags();
    


    andrewtoth commented at 4:57 pm on October 29, 2023:

    The implementation could be moved here and this made constexpr?

    0ServiceFlags StatelessServiceFlags();
    

    furszy commented at 10:00 pm on October 30, 2023:
    Done as suggested
  27. in src/net_processing.cpp:1647 in 3c895206cb outdated
    1643@@ -1642,6 +1644,21 @@ void PeerManagerImpl::FinalizeNode(const CNode& node)
    1644     LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
    1645 }
    1646 
    1647+bool PeerManagerImpl::HasDesirableServiceFlags(ServiceFlags services)
    


    andrewtoth commented at 5:12 pm on October 29, 2023:
    Why rename from HasAllDesirableServiceFlags? The HasAll is more descriptive. It is now ambiguous whether this returns true if the ServiceFlags has some number of desirable service flags or strictly all of them.

    furszy commented at 7:52 pm on October 30, 2023:

    Why rename from HasAllDesirableServiceFlags? The HasAll is more descriptive. It is now ambiguous whether this returns true if the ServiceFlags has some number of desirable service flags or strictly all of them.

    I don’t think the name is good for the PeerManager-Connman interface. What “HasAll” means for NetEventsInterface? There, it should be something like bool CanConnectTo(services) or something similar.

    Still, I’m not really happy with the interface at this point. No problem on renaming it back.

  28. in src/net_processing.h:135 in 3c895206cb outdated
    130+     * 'services' == NODE_NONE, the returned desirable service flags are
    131+     * guaranteed to not change dependent on state - ie they are suitable for
    132+     * use when describing peers which we know to be desirable, but for which
    133+     * we do not have a confirmed set of service flags.
    134+    */
    135+    virtual ServiceFlags GetDesirableServiceFlags(ServiceFlags services) = 0;
    


    andrewtoth commented at 5:23 pm on October 29, 2023:
    Could be a const function?

    furszy commented at 10:00 pm on October 30, 2023:
    Done as suggested
  29. in src/net.h:1019 in 3c895206cb outdated
    1011@@ -1012,6 +1012,12 @@ class NetEventsInterface
    1012     /** Handle removal of a peer (clear state) */
    1013     virtual void FinalizeNode(const CNode& node) = 0;
    1014 
    1015+    /**
    1016+     * Callback to determine whether the given set of service flags are sufficient
    1017+     * for a peer to be "relevant".
    1018+     */
    1019+    virtual bool HasDesirableServiceFlags(ServiceFlags services) = 0;
    


    andrewtoth commented at 5:25 pm on October 29, 2023:
    Could be a const function?

    furszy commented at 10:00 pm on October 30, 2023:
    Done as suggested
  30. in src/net_processing.cpp:896 in cc5f276fa2 outdated
    892@@ -892,7 +893,7 @@ class PeerManagerImpl final : public PeerManager
    893      */
    894     bool BlockRequested(NodeId nodeid, const CBlockIndex& block, std::list<QueuedBlock>::iterator** pit = nullptr) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    895 
    896-    bool TipMayBeStale() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    897+    bool TipMayBeStale(bool& super_stale) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    


    andrewtoth commented at 5:51 pm on October 29, 2023:

    I think super_stale is not very descriptive. I also think this pattern of returning extra data from this function in this way is a little confusing. This could be made more descriptive by declaring a new enum and using it as the return value. eg

    0enum Staleness {
    1    NotStale,
    2    Stale,
    3    StalePastNetworkLimited,
    4};
    

    furszy commented at 10:00 pm on October 30, 2023:
    yeah, better. Done as suggested
  31. in src/net_processing.cpp:1677 in 893c541dce outdated
    1660@@ -1661,7 +1661,6 @@ bool PeerManagerImpl::HasDesirableServiceFlags(ServiceFlags services)
    1661 ServiceFlags PeerManagerImpl::GetDesirableServiceFlags(ServiceFlags services)
    1662 {
    1663     if (services & NODE_NETWORK_LIMITED) {
    1664-        LOCK(::cs_main);
    1665         if (m_initial_sync_finished) return ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS);
    


    andrewtoth commented at 5:51 pm on October 29, 2023:
    Is relaxed ordering ok for this? We don’t need to call m_initial_sync_finished.load()?

    furszy commented at 8:59 pm on October 30, 2023:
    Why relaxed? look at the operator. Is equivalent to load, which uses std::memory_order_seq_cst.
  32. andrewtoth commented at 5:54 pm on October 29, 2023: contributor

    Concept ACK.

    Should we have a functional test for this scenario?

  33. in src/net_processing.cpp:1309 in cc5f276fa2 outdated
    1306-    return m_last_tip_update.load() < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 3} && mapBlocksInFlight.empty();
    1307+
    1308+    auto last_update_time = m_last_tip_update.load();
    1309+    if (last_update_time < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 3} && mapBlocksInFlight.empty()) {
    1310+        // If tip is further than NODE_NETWORK_LIMITED_MIN_BLOCKS, we are really behind and shouldn't connect to limited peers anymore.
    1311+        super_stale = NodeSeconds{last_update_time} < GetAdjustedTime() - std::chrono::seconds{consensusParams.nPowTargetSpacing * NODE_NETWORK_LIMITED_MIN_BLOCKS};
    


    naumenkogs commented at 8:31 am on October 30, 2023:
    nit: add Assume(NODE_NETWORK_LIMITED_MIN_BLOCKS > 3) so that stale/super-stale makes sense here.

    furszy commented at 10:00 pm on October 30, 2023:
    Done as suggested
  34. in src/protocol.cpp:139 in 7c20e65b28 outdated
    133@@ -134,6 +134,11 @@ ServiceFlags GetDesirableServiceFlags(ServiceFlags services) {
    134     return ServiceFlags(NODE_NETWORK | NODE_WITNESS);
    135 }
    136 
    137+ServiceFlags StatelessServicesFlags()
    138+{
    139+    return ServiceFlags(NODE_NETWORK | NODE_WITNESS);
    


    naumenkogs commented at 12:53 pm on October 30, 2023:

    7c20e65b2880a23d753b3f05fd28994353b3049d

    While you decouple the two, do you mean to make them independent from each other? It seems to me they will always be roughly the same. I agree with the goal of the PR, just not sure about this approach.


    furszy commented at 9:48 pm on October 30, 2023:

    While you decouple the two, do you mean to make them independent from each other? It seems to me they will always be roughly the same. I agree with the goal of the PR, just not sure about this approach.

    Yeah. The goal is to make them independent from each other so the peer manager runtime logic doesn’t affect the hardcoded seeds. They are built from another source and depend on the way we externally collect them. These peers are stored in db with the stateless service flags, and would be an error to store them with a different one (they might not support the service).

  35. furszy force-pushed on Oct 30, 2023
  36. furszy commented at 10:44 pm on October 30, 2023: member

    Feedback tackled. Thanks both!

    Should we have a functional test for this scenario?

    We can’t have a functional test because the automatic peers connection process skips localhost addresses. Will work on a unit test for it. Could do one validating that the peer manager service flags changes when the ibd state change.

  37. in src/net_processing.cpp:1689 in bf3ce4a9ae outdated
    1668+ServiceFlags PeerManagerImpl::GetDesirableServiceFlags(ServiceFlags services) const
    1669+{
    1670+    if (services & NODE_NETWORK_LIMITED) {
    1671+        if (m_initial_sync_finished) return ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS);
    1672+    }
    1673+    return ServiceFlags(NODE_NETWORK | NODE_WITNESS);
    


    andrewtoth commented at 11:08 pm on October 30, 2023:
    Can we return StatelessServiceFlags() here instead?

    naumenkogs commented at 7:36 am on October 31, 2023:
    Yeah this what causes my confusion here as well. I think the author means to use StatelessServiceFlags only for seed peers. In that case, i’d probably say that explicitly in the function’s name. Like SeedsServiceFlags. Then, a comment saying something like these flags should probably be in sync with GetDesirableServiceFlags, but not strictly.

    furszy commented at 2:36 pm on October 31, 2023:

    In that case, i’d probably say that explicitly in the function’s name. Like SeedsServiceFlags.

    Sounds good. Will rename it to SeedsServiceFlags. I was thinking we would use this function for other cases in the future, but agree that this can be simplified for now.

    Then, a comment saying something like these flags should probably be in sync with GetDesirableServiceFlags, but not strictly.

    I’m not sure about this comment. We could have a much more complex logic at the PeerManager level, adaptable based on the chain context and/or the user’s configurations. For instance, the user could be running a node whose preferred connections are ones supporting NODE_P2P_V2 or NODE_COMPACT_FILTERS (because the sync mechanism is BIP157) or some other service flag in the future.

  38. furszy force-pushed on Oct 31, 2023
  39. furszy force-pushed on Oct 31, 2023
  40. furszy commented at 2:48 pm on October 31, 2023: member

    Updated per feedback. Thanks again.

    • Added coverage for the introduced PeerManager’s adaptive connections service flags.
    • Renamed StatelessServiceFlags function to SeedsServiceFlags.
    • Re-introduced the “initial sync completed” flag update on UpdatedBlockTip. This is to not depend solely on the stale check inside CheckForStaleTipAndEvictPeers which is executed every 10 minutes (STALE_CHECK_INTERVAL). See code comments for more information.
  41. DrahtBot added the label CI failed on Oct 31, 2023
  42. in src/test/peerman_tests.cpp:30 in 3a56fe6dbb outdated
    25+    // Check we accept network limited peers when we are out of ibd
    26+    SetMockTime(WITH_LOCK(::cs_main, return m_node.chainman->ActiveChain().Tip()->GetBlockTime())); // time to tip time so we are out of ibd
    27+    peerman->CheckForStaleTipAndEvictPeers();  // Update ibd state
    28+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS));
    29+
    30+    // Now check we can go back to ibd if we are further than the limited peers threshold (NODE_NETWORK_LIMITED_MIN_BLOCKS)
    


    andrewtoth commented at 3:31 pm on October 31, 2023:
    Can we first check that we don’t go back to ibd if we are stale but at the threshold, then we do go back to ibd once we step over the threshold?

    furszy commented at 7:37 pm on October 31, 2023:
    yeah sure, good idea.
  43. in src/test/peerman_tests.cpp:43 in 3a56fe6dbb outdated
    38+    SetMockTime(WITH_LOCK(::cs_main, return m_node.chainman->ActiveChain().Tip()->GetBlockTime())); // time to tip time so we are out of ibd
    39+    peerman->CheckForStaleTipAndEvictPeers();  // Update ibd state
    40+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS));
    41+
    42+    // Push the chain forward to surpass the 'STALE_CHECK_INTERVAL' (time between 'CheckForStaleTipAndEvictPeers' calls)
    43+    // to move the "initial sync finished" state flag back to false (back to IBD state once more).
    


    andrewtoth commented at 3:37 pm on October 31, 2023:
    The 3 lines below are the same as lines 33-35, except for the +11min. It’s unclear to me from this comment what the 11min is doing and why this is different from what we are doing before. Can this maybe be explained better?

    furszy commented at 7:39 pm on October 31, 2023:
    Sure. This is because we execute the stale check every 10 min, so needed to push the last stale check time forward. Will re-order the test checks so this becomes clearer.
  44. furszy force-pushed on Oct 31, 2023
  45. furszy commented at 7:50 pm on October 31, 2023: member

    Updated per feedback, thanks @andrewtoth.

    • Re-ordered test checks for improved clarity.
    • Included coverage for when the node is stale but still recoverable (below the limited peers threshold).
  46. DrahtBot removed the label CI failed on Oct 31, 2023
  47. in src/net_processing.cpp:1317 in 636d927c4a outdated
    1314+
    1315+    auto last_update_time = m_last_tip_update.load();
    1316+    if (last_update_time < GetTime<std::chrono::seconds>() - std::chrono::seconds{consensusParams.nPowTargetSpacing * 3} && mapBlocksInFlight.empty()) {
    1317+        // If tip is further than NODE_NETWORK_LIMITED_MIN_BLOCKS, we are really behind and shouldn't connect to limited peers anymore.
    1318+        Assume(NODE_NETWORK_LIMITED_MIN_BLOCKS > 3);
    1319+        bool in_ibd = NodeSeconds{last_update_time} < GetAdjustedTime() - std::chrono::seconds{consensusParams.nPowTargetSpacing * NODE_NETWORK_LIMITED_MIN_BLOCKS};
    


    naumenkogs commented at 8:08 am on November 1, 2023:

    636d927c4a9cd5cbf4d0c22e1c25dc44ecaf7a69

    Is it still fair to call this state initial block download?


    furszy commented at 1:23 pm on November 1, 2023:
    That’s just me trying to save few characters. Will rename it to past_net_limited.

    maflcko commented at 1:30 pm on November 1, 2023:
    Any reason to use adjusted time, when the tip update time is recorded as NodeClock?

    furszy commented at 2:01 pm on November 1, 2023:

    Any reason to use adjusted time, when the tip update time is recorded as NodeClock?

    Good point. The idea was to considerate the peers’ time too, so the node has less chances of requesting a block at the limited peers threshold that does no longer exist in the other side. In other words, if the peer’s time deviates slightly from the node time, the peer could have pruned the block at the verge of the threshold. But we could achieve the same outcome by reducing the window size by one or two blocks. Which does not require to mix GetTime and GetAdjustedTime usages.


    maflcko commented at 2:52 pm on November 1, 2023:

    In other words, if the peer’s time deviates slightly from the node time

    I don’t think GetAdjustedTime offers this precision even. Also, it is not a per-peer offset, but a global one.


    furszy commented at 1:29 pm on November 2, 2023:

    In other words, if the peer’s time deviates slightly from the node time

    I don’t think GetAdjustedTime offers this precision even. Also, it is not a per-peer offset, but a global one.

    Yeah I know. I just thought it was better to do that than nothing. Still, I’m quite sure we cannot trigger the edge case I mentioned above only using core vanilla alone. The pruning process isn’t that aggressive for the time being. It might arise across different implementations but we can revisit it in a follow-up, the limited peers connection window can be easily shortened without any problem.

  48. furszy force-pushed on Nov 2, 2023
  49. furszy commented at 1:36 pm on November 2, 2023: member

    Updated per feedback. Thanks everyone.

    • Improved test coverage.
    • Moved GetAdjustedTime usage to GetTime.
    • Renamed in_ibd variable to past_net_limited.
    • Updated comments and code to clearly state that the responsibility of setting the ‘initial sync finished’ flag backwards is inside the ‘CheckForStaleTip’ process.
  50. andrewtoth commented at 3:47 pm on November 5, 2023: contributor
    ACK 5628ac55be42c5450c6817ba8dcfe463ceda32a9
  51. DrahtBot requested review from mzumsande on Nov 5, 2023
  52. andrewtoth approved
  53. in src/net_processing.cpp:896 in ef4a1a44dc outdated
    892@@ -892,7 +893,13 @@ class PeerManagerImpl final : public PeerManager
    893      */
    894     bool BlockRequested(NodeId nodeid, const CBlockIndex& block, std::list<QueuedBlock>::iterator** pit = nullptr) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    895 
    896-    bool TipMayBeStale() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    897+    enum class Staleness {
    


    naumenkogs commented at 8:31 am on November 7, 2023:

    ef4a1a44dca1af20adcc8fb0c14e8a31683a9d99

    nit: this enum could be better… First, StalePast also qualifies as Stale (just english-wise). Second, it’s a negation (non-synced).


    furszy commented at 2:24 pm on November 7, 2023:
    Hmm, naming is hard. As the function is called TipMayBeStale(), maybe calling it SyncState could also work. Thoughts? Or do you have anything else in mind?

    naumenkogs commented at 9:55 am on November 8, 2023:
    SyncState yeah that would work if you come up with a good enum values.

    furszy commented at 3:41 pm on November 8, 2023:

    SyncState yeah that would work if you come up with a good enum values.

    Maybe Synced, RecentStale, Stale? Named the second and third “stale” because these states have no in-flight block requests.

    Still, this could also be easily changed in the future if we come up with better names.

  54. in src/net_processing.cpp:2082 in ef4a1a44dc outdated
    2077@@ -2057,6 +2078,16 @@ void PeerManagerImpl::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlock
    2078 {
    2079     SetBestHeight(pindexNew->nHeight);
    2080 
    2081+    // Update internal sync state.
    2082+    // Avoid using 'fInitialDownload' alone since the flag currently cannot go backwards after passing IBD once.
    


    naumenkogs commented at 8:33 am on November 7, 2023:
    I’m wondering, does it even make sense to keep such flag at this point? What it would take to adjust all code to going back-and-forth w.r.t. superstale?

    furszy commented at 2:07 pm on November 7, 2023:

    I’m wondering, does it even make sense to keep such flag at this point? What it would take to adjust all code to going back-and-forth w.r.t. superstale?

    I think that would be an oxymoron. We would be adding code to put the node back into IBD state, because of a stale tip check, when the node is updating the tip.

    The fInitialDownload flag usage is just a speedup. We know for sure that when the flag is true, it is because the node is in IBD. So we don’t need to check anything else in such case.

    Aside from that, about the adjustments: the stale tip check uses the last tip updated time (m_last_tip_update) which is updated on every new block BlockConnected (which is executed prior to UpdatedBlockTip). So that would need to be moved to UpdatedBlockTip. Also, and more importantly for me, because the stale tip check checks the in-flight block requests map, it requires cs_main lock. Which I would try to avoid as much as possible as it usage slows down the net messages processing and validation thread. A little note about this last point; with something like #27837, we could guard mapBlocksInFlight under a different mutex and not cs_main.

  55. in src/net_processing.cpp:5337 in ef4a1a44dc outdated
    5335         m_connman.StartExtraBlockRelayPeers();
    5336         m_initial_sync_finished = true;
    5337+    } else if (staleness == Staleness::StalePastNetworkLimited) {
    5338+        // Disallow connection to limited peers.
    5339+        // We need to find a full node that can help us recover.
    5340+        m_initial_sync_finished = false;
    


    naumenkogs commented at 8:36 am on November 7, 2023:
    The word initial here is indeed confusing… Thinking about something happening only at node bootstrap, which is a more obvious version i think. Can we find a different word now that you’re changing it to certainly not mean that?

    furszy commented at 2:18 pm on November 7, 2023:

    The word initial here is indeed confusing… Thinking about something happening only at node bootstrap, which is a more obvious version i think. Can we find a different word now that you’re changing it to certainly not mean that?

    Hmm, first names coming to my mind are m_close_to_synced or m_close_to_tip? or, could use the existing term m_can_direct_fetch?

    Still, maybe better to leave it for a follow-up because, with the back-and-forth functionality, we could also use this flag instead of calling CanDirectFetch() which locks cs_main.


    andrewtoth commented at 3:42 pm on December 3, 2023:
    This flag is only used to connect to limited peers or not. In that case, why not remove the indirection about syncing and just call it allow_limited_peers?

    furszy commented at 3:55 pm on December 3, 2023:

    This flag is only used to connect to limited peers or not. In that case, why not remove the indirection about syncing and just call it allow_limited_peers?

    Because of what I wrote above, this flag can (and most likely will) be used in other places as well, fulfilling two goals: (1) minimizing cs_main locks, which improves the overall node response, and (2) minimizing the usage of the chainstate IBD flag, which lacks back-and-forth functionality.

    As this PR did not introduce the flag and it is fixing an issue, I think is fine to leave the flag name as is now. It could be revisited on a follow-up.


    andrewtoth commented at 4:24 pm on December 3, 2023:
    Right, then I think m_is_close_to_tip would make the most sense, since it’s set initially after calling IsBlockCloseToTip.

    furszy commented at 1:03 pm on December 4, 2023:
    @naumenkogs, do you agree on the naming as well? I’m happy to change it if we all agree. Better to be sync to not circle much around the naming here.

    naumenkogs commented at 9:00 am on December 5, 2023:
    yeah that would be better i think

    furszy commented at 7:42 pm on December 5, 2023:
    ok cool. Updated per feedback.
  56. furszy force-pushed on Nov 7, 2023
  57. furszy commented at 2:52 pm on November 7, 2023: member
    Updated per @andrewtoth feedback via dm. Improved a test comment for better clarity. Tiny diff.
  58. furszy requested review from naumenkogs on Nov 28, 2023
  59. naumenkogs commented at 9:05 am on December 5, 2023: member

    ACK 60a487dd2430145115fcd0215472a5a2ef9bb090.

    This would be an improvement.

  60. DrahtBot removed review request from naumenkogs on Dec 5, 2023
  61. DrahtBot requested review from andrewtoth on Dec 5, 2023
  62. furszy commented at 7:44 pm on December 5, 2023: member
    Updated per feedback (https://github.com/bitcoin/bitcoin/pull/28170#discussion_r1384553675). Thanks @andrewtoth and @naumenkogs. Tiny diff, only renamed m_initial_sync_finished to m_is_close_to_tip in a scripted-diff commit afc29b15e262b4ff402d479ec77ab8507552bcbc.
  63. naumenkogs commented at 10:10 am on December 6, 2023: member
    ACK afc29b15e262b4ff402d479ec77ab8507552bcbc
  64. DrahtBot removed review request from andrewtoth on Dec 6, 2023
  65. DrahtBot requested review from andrewtoth on Dec 6, 2023
  66. in src/net_processing.h:119 in afc29b15e2 outdated
    114+    /**
    115+     * Gets the set of service flags which are "desirable" for a given peer.
    116+     *
    117+     * These are the flags which are required for a peer to support for them
    118+     * to be "interesting" to us, ie for us to wish to use one of our few
    119+     * outbound connection slots for or for us to wish to prioritize keeping
    


    vasild commented at 10:37 am on December 7, 2023:

    nit:

    0     * outbound connection slots or for us to wish to prioritize keeping
    

    furszy commented at 1:26 pm on December 15, 2023:
    sure
  67. in src/net_processing.h:126 in afc29b15e2 outdated
    121+     *
    122+     * Relevant service flags may be peer- and state-specific in that the
    123+     * version of the peer may determine which flags are required (eg in the
    124+     * case of NODE_NETWORK_LIMITED where we seek out NODE_NETWORK peers
    125+     * unless they set NODE_NETWORK_LIMITED and we are out of IBD, in which
    126+     * case NODE_NETWORK_LIMITED suffices).
    


    vasild commented at 10:39 am on December 7, 2023:
    What is meant by “the version of the peer” here?

    furszy commented at 1:25 pm on December 15, 2023:

    What is meant by “the version of the peer” here?

    It is talking about the version message. NetMsgType::VERSION. Which informs the supported services to the other end.

  68. andrewtoth approved
  69. andrewtoth commented at 3:01 am on December 8, 2023: contributor
    ACK afc29b15e262b4ff402d479ec77ab8507552bcbc
  70. in src/net_processing.cpp:1328 in afc29b15e2 outdated
    1323+}
    1324+
    1325+static bool IsBlockCloseToTip(const CBlockIndex* index, const Consensus::Params& consensus)
    1326+{
    1327+    return index->Time() > GetAdjustedTime() - consensus.PowTargetSpacing() * 20;
    1328 }
    


    vasild commented at 9:53 am on December 8, 2023:
    naming: “close to tip” is misleading because there is no “tip” involved here. Just the block time is compared to the current time. IsBlockRecent() would be better. Or maybe even size_t BlockAge(const CBlockIndex& index) { return (GetAdjustedTime() - index.Time()) / consensus.PowTargetSpacing(); }

    furszy commented at 2:09 pm on December 15, 2023:
    sure, IsBlockRecent() sounds good.
  71. in src/net_processing.cpp:728 in afc29b15e2 outdated
    722@@ -721,8 +723,9 @@ class PeerManagerImpl final : public PeerManager
    723     bool RejectIncomingTxs(const CNode& peer) const;
    724 
    725     /** Whether we've completed initial sync yet, for determining when to turn
    726-      * on extra block-relay-only peers. */
    727-    bool m_initial_sync_finished GUARDED_BY(cs_main){false};
    728+      * on extra block-relay-only peers and the peer connections desirable
    729+      * services flags. */
    730+    std::atomic<bool> m_is_close_to_tip{false};
    


    vasild commented at 10:05 am on December 8, 2023:

    Tying together the “extra block-relay-only peers” and the “desirable service flags” seems unnecessary and maybe even wrong, because the conditions should be different. Flipping back to false is needed for “desirable service flags”, but not for “extra block-relay-only peers” because then this code will be executed more than once:

    https://github.com/bitcoin/bitcoin/blob/1d9da8da309d1dbf9aef15eb8dc43b4a2dc3d309/src/net_processing.cpp#L5268-L5269

    Which seems innocent, but is confusing and looks like a change that is not needed to fix the problem this PR aims to fix.


    furszy commented at 2:06 pm on December 15, 2023:

    There isn’t a behavior change here. StartExtraBlockRelayPeers() can only be executed once, then it runs for the entire software lifetime.

    I don’t think the base condition is different. Both action paths depend on the proximity to the tip. The extra block-relay-only peers functionality is activated when the node is close to the tip (20 blocks away). And the “desirable service flags” is modified on the opposite direction, when the node detect it is further than 288 blocks from the tip.

    I think we should try to consolidate concepts that can be unified instead of continuing adding independent checks for each feature/validation we have. Additionally, something that I had floating around my head when I did this was to transform the m_initial_sync_finished (now named m_is_close_to_tip) into a SyncState field with the different states the sync process could have (sync, close to tip, recent_lagging_behind, not sync) and a bool flag field to denote whether the chain is moving forward or not (if there is any in-flight blocks and the last received block is close to the node’s clock time etc).

  72. in src/net_processing.cpp:2084 in afc29b15e2 outdated
    2080+
    2081+    // Update internal sync state.
    2082+    // Avoid using 'fInitialDownload' alone since the flag currently cannot go backwards after passing IBD once.
    2083+    // This means that, after completing sync for the first time, 'fInitialDownload' will remain true even
    2084+    // when the node has been stalled for days and the arriving block is within the IBD window.
    2085+    if (!fInitialDownload && IsBlockCloseToTip(pindexNew, m_chainparams.GetConsensus())) {
    


    vasild commented at 10:26 am on December 8, 2023:
    Seems like !fInitialDownload && can be removed? If IsBlockCloseToTip() is true then that block is 3h20min or younger. In this case fInitialDownload will always be false because it uses 24h, right?

    furszy commented at 2:12 pm on December 15, 2023:

    Seems like !fInitialDownload && can be removed? If IsBlockCloseToTip() is true then that block is 3h20min or younger. In this case fInitialDownload will always be false because it uses 24h, right?

    It is a simple and quick check to not execute IsBlockCloseToTip() during IBD at all.

  73. in src/net_processing.cpp:2086 in afc29b15e2 outdated
    2082+    // Avoid using 'fInitialDownload' alone since the flag currently cannot go backwards after passing IBD once.
    2083+    // This means that, after completing sync for the first time, 'fInitialDownload' will remain true even
    2084+    // when the node has been stalled for days and the arriving block is within the IBD window.
    2085+    if (!fInitialDownload && IsBlockCloseToTip(pindexNew, m_chainparams.GetConsensus())) {
    2086+        // At this point, only move the flag forward. The responsibility of setting the flag backwards is on
    2087+        // the 'CheckForStaleTip' process, which performs a more exhaustive stale tip verification.
    


    vasild commented at 10:36 am on December 8, 2023:

    I find it confusing to have one condition to set the flag to true and a different condition to set it to false. I mean this:

    0if (A) flag = true;
    1if (B) flag = false;
    

    because, for example, it may happen that B is true and the flag is true (if the first if was not executed yet).

    In this case it can be simpler:

    0if (C) flag = true;
    1if (!C) flag = false;
    2// or just:
    3flag = C;
    

    and C should be just “our highest block is older than 48h” (regardless of whether mapBlocksInFlight is empty) => then don’t connect to limited peers.


    furszy commented at 8:49 pm on December 15, 2023:

    and C should be just “our highest block is older than 48h” (regardless of whether mapBlocksInFlight is empty) => then don’t connect to limited peers.

    Hmm, this observation makes me think.. what if m_is_close_to_tip=true and then the node lags behind; there could be an edge case scenario where the node starts slowly syncing-up the historical blocks before CheckForStaleTipAndEvictPeers() is executed, so mapBlocksInFlight wouldn’t be empty, so.. it would still connect to limited peers even when it is behind what they can provide. This is what you have in mind?


    vasild commented at 2:15 pm on December 18, 2023:
    I was just thinking on ways to simplify this. CheckForStaleTipAndEvictPeers() is executed every 45 seconds. I think it is ok if for a while (~ 10 mins) we think we are close to the tip but we are not in practice. Given that we realize this in a few minutes and don’t get stuck with the wrong assumption forever.
  74. vasild commented at 11:07 am on December 8, 2023: contributor

    Concept ACK. I acknowledge this PR resolves the problem. But the situation is somewhat convoluted in master and I am not sure this PR makes it any better. There are multiple ways to determine if we are stale:

    1. ChainstateManager::IsInitialBlockDownload(): uses -maxtipage (default 24h). Once it considers out of IBD, it stays like that regardless of the tip age (why? surely it is possible to lag behind again).

    2. g_initial_block_download_completed 2.1. Set by Chainstate::ActivateBestChain() -> PeerManagerImpl::UpdatedBlockTip() -> SetServiceFlagsIBDCache() with a value determined by ChainstateManager::IsInitialBlockDownload(), can be changed to/from true/false but in practice is never because of IsInitialBlockDownload(). 2.2. Used to determine desired peers’ service flags.

    3. PeerManagerImpl::m_initial_sync_finished 3.1. Set by PeerManagerImpl::CheckForStaleTipAndEvictPeers() depending on PeerManagerImpl::CanDirectFetch() which checks if the tip is older than 3h20min. Can only go from false to true (why?). 3.2. Used to determine whether to start extra block-relay-only connections.

    4. PeerManagerImpl::TipMayBeStale(): considers 30 minutes without tip change as stale (nb: not the tip age (!) for example if we updated the tip 5 minutes ago with a block that is 1 month old, then this method considers “not stale”). Also takes into account mapBlocksInFlight.empty(). 4.1. Used to try a new outbound peer if “stale”.

    Limited peers are supposed to keep at least 288 blocks (48h).

    In this PR:

    • g_initial_block_download_completed is removed
    • PeerManagerImpl::m_initial_sync_finished is used to determine desired peers’ service flags
    • PeerManagerImpl::m_initial_sync_finished is set based on a bunch of factors: m_stale_tip_check_time, m_chainman.m_blockman.LoadingBlocks(), m_connman.GetNetworkActive(), m_connman.GetUseAddrmanOutgoing(), m_last_tip_update, mapBlocksInFlight.empty(), CanDirectFetch() and ChainstateManager::IsInitialBlockDownload() (I find this a bit obscure).

    Approach NACK. I could be wrong, but I think that it should be possible to resolve the problem in a simpler way and without touching unrelated stuff (like when to start “extra block-relay-only connections”). What about removing g_initial_block_download_completed and replacing it with an expression “our highest block is older than 48h” inside GetDesirableServiceFlags()?

  75. furszy commented at 7:32 pm on December 12, 2023: member
    Thanks for the review vasild!. Will tackle all points in the coming days. I’m currently finishing few bug fixes priorities and will be fully here again.
  76. vasild commented at 2:04 pm on December 13, 2023: contributor

    Would this resolve the problem?

     0diff --git i/src/net_processing.cpp w/src/net_processing.cpp
     1index df54a62f28..ec9bbff8f2 100644
     2--- i/src/net_processing.cpp
     3+++ w/src/net_processing.cpp
     4@@ -2045,13 +2045,12 @@ void PeerManagerImpl::NewPoWValidBlock(const CBlockIndex *pindex, const std::sha
     5  * Update our best height and announce any block hashes which weren't previously
     6  * in m_chainman.ActiveChain() to our peers.
     7  */
     8 void PeerManagerImpl::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockIndex *pindexFork, bool fInitialDownload)
     9 {
    10     SetBestHeight(pindexNew->nHeight);
    11-    SetServiceFlagsIBDCache(!fInitialDownload);
    12 
    13     // Don't relay inventory during initial block download.
    14     if (fInitialDownload) return;
    15 
    16     // Find the hashes of all blocks that weren't previously in the best chain.
    17     std::vector<uint256> vHashes;
    18@@ -5265,12 +5264,19 @@ void PeerManagerImpl::CheckForStaleTipAndEvictPeers()
    19     }
    20 
    21     if (!m_initial_sync_finished && CanDirectFetch()) {
    22         m_connman.StartExtraBlockRelayPeers();
    23         m_initial_sync_finished = true;
    24     }
    25+
    26+    const std::chrono::seconds time_between_blocks{m_chainparams.GetConsensus().nPowTargetSpacing};
    27+    const NodeSeconds limited_peers_oldest_block_time{Now<NodeSeconds>() -
    28+                                                      time_between_blocks * NODE_NETWORK_LIMITED_MIN_BLOCKS};
    29+    const bool tip_within_limited_peers_reach{m_chainman.ActiveChain().Tip()->Time() >
    30+                                              limited_peers_oldest_block_time};
    31+    SetServiceFlagsIBDCache(tip_within_limited_peers_reach);
    32 }
    33 
    34 void PeerManagerImpl::MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::microseconds now)
    35 {
    36     if (m_connman.ShouldRunInactivityChecks(node_to, std::chrono::duration_cast<std::chrono::seconds>(now)) &&
    37         peer.m_ping_nonce_sent &&
    

    against master @ f0e829022a415c7c9513e715c532079ec7756306 plus maybe some renaming is warranted

    My addrman has 71837 addresses and from those 11795 are limited peers. The chance of getting all 10 peers limited is (11795 / 71837)10 which is really small. Was this observed in practice or is it more of a theoretical problem? (I think it should be fixed even if “theoretical”).

  77. furszy commented at 12:51 pm on December 14, 2023: member

    But the situation is somewhat convoluted in master and I am not sure this PR makes it any better. There are multiple ways to determine if we are stale.

    Each of those flags have a different meaning depending on where they are located. They are not equivalent and shouldn’t be used interchangeably. For instance: The flag at the chain state manager level tells us whether the node passed IBD or not based only on the chain information. The flag at the peers manager level tells us whether the node passed IBD or not based on the chain information, last received tip time and the in-flight block requests.

    The higher we go in terms of layering levels, the more complete the sync status information is.

    That being said, it is hard to disagree over the convoluted situation in master. One of the reasons behind the way I implemented this PR is to unify the different ways we check for a stale scenario at the PeerManager level (where we sometimes call CanDirectFetch() directly, others we call IsInitialBlockDownload() and others we use the m_initial_sync_finished field..).

    ChainstateManager::IsInitialBlockDownload(): uses -maxtipage (default 24h). Once it considers out of IBD, it stays like that regardless of the tip age (why? surely it is possible to lag behind again).

    I’m quite sure we don’t want to modify IsInitialBlockDownload() without a really good reason. It has been there for over a decade, and many processes depend on it. Instead, I advocate for a gradual migration of callers to not depend on it. This PR essentially moves in that direction by improving the PeerManager’s existing view of the synchronization state, which differs from the lower-level ChainStateManager view. This is because the concept of the ‘initial block download’ state is incomplete at the ChainStateManager level; it does not consider the in-flight blocks, any recent chain movement nor any network-level heuristic. Furthermore, this concept fails to describe what happens after initial-block-download.. what if the node lags behind for a really long time, etc. The staleness situations can be a bit more more complex than a simple ‘stale’-’not stale’ state obtained only by checking the diff between the tip time and the current clock time.

    On top of that, I’m pretty strong on stopping using functions that lock cs_main indiscriminately everywhere in the sources. The more we lock this mutex, the worse the overall node responsiveness becomes. With this PR changes, we are improving the situation and merging the base building blocks for other cs_main locks removals in the future as well.

    PeerManagerImpl::m_initial_sync_finished
> 3.1. Set by PeerManagerImpl::CheckForStaleTipAndEvictPeers() depending on PeerManagerImpl::CanDirectFetch() which checks if the tip is older than 3h20min. Can only go from false to true (why?).

    Because the field is called ‘initial sync finished’. Same terminology used at the chain state manager level. This field pertains to the initial block download state and does not reflect the node’s proximity to the tip during the entire node lifetime. It does not provide information about post-IBD staling situations. Hence, there are instances in the sources where CanDirectFetch() is directly called, resulting in repeated locking of cs_main to detect stale situations. Instead, we should aim to have this flag accurately represent the true synchronization state. Consequently, we can replace all calls to CanDirectFetch() and, possibly, other lower-level IsInitialBlockDownload() with this PeerManager level flag. (My idea is to submit follow-ups moving into this direction).

  78. furszy commented at 9:40 pm on December 14, 2023: member

    I think that it should be possible to resolve the problem in a simpler way and without touching unrelated stuff (like when to start “extra block-relay-only connections”)

    Please look #28170 (review). The convo started from a different angle but I think the outcome addresses your concern. We ended up agreeing that keeping the extra block relay connection logic as is in the current change set actually improves certain scenarios. For instance; if we are in a post-IBD stalling state, we do need more block relay connections. Could be full node outbound connections or extra block relay only connections. It doesn’t matter. We just need some peer who can help us sync.

    Would this resolve the problem?

    [patch] update g_initial_block_download_completed periodically based on our tip age

    Hmm, I wouldn’t be so happy with it. It is re-implementing existing pieces of code in a CPU-wasteful manner.

    Keep in mind that CheckForStaleTipAndEvictPeers() is executed every 45 seconds indefinitely. We don’t want to perform the checks and updates you added when the node is fully synced. These should only be executed when the node detects the tip lagging behind. To achieve this, we already have the TipMayBeStale() function, which runs only at a 10-minute interval to prevent CPU waste. Additionally, would say that the reason behind PeerManager caching the last time the tip was updated, and not the last tip time, is to differentiate between a node completely isolated, lagging behind, with no relevant connections vs a node that is lagging behind but it is connected to the network and syncing up the chain. These are two different scenarios: the first one requires new connections, meanwhile the second one does not (it could be syncing slow due to a poor internet connection, but the node is still syncing, which is positive).

    Also, the way we cache the last time the tip was updated inside PeerManager is great for avoiding unnecessary cs_main locks, which impact the entire node’s responsiveness (a key motivation behind the current implementation of this PR). A small note about this last point: we can and should do better. I think we will be able to remove the high-level cs_main lock at CheckForStaleTipAndEvictPeers() in a future future follow-up PR.

  79. furszy commented at 9:26 pm on December 15, 2023: member
    News, found something after all the walls of texts. Having the ´TipMayBeStale()´ call surrounded by GetUseAddrmanOutgoing() check makes the node accept limited peers incoming connections indiscriminately when it was initialized with outgoing connections disabled. Update: ^^ this is ok. We skip the desirable services flag check on inbound, feeler and manual connections. Not an issue.
  80. vasild commented at 3:32 pm on December 18, 2023: contributor

    I agree with what you say, except the below:

    It is re-implementing existing pieces of code

    Which?

    in a CPU-wasteful manner.

    Isn’t that just a few CPU instructions? (comparing the tip age to the current time). I agree it can be done less often than 45 seconds for the purpose of avoiding limited peers.

    a node completely isolated, lagging behind, with no relevant connections vs a node that is lagging behind but it is connected to the network and syncing up the chain

    Yes, these are two different scenarios in general. My understanding is that in both we better avoid limited peers.

  81. furszy force-pushed on Dec 19, 2023
  82. furszy commented at 9:03 pm on December 19, 2023: member

    PR reworked based on feedback. Thanks @vasild for the review rounds.

    The functionality is no longer tied to the last tip update time; is now linked to the actual tip time, which is now cached by PeerManager to prevent undesirable cs_main locks. This implies simpler code since we are no longer modifying TipMayBeStale() nor the m_initial_sync_finished field.

    I still believe that we should implement some of the changes made earlier, such as the PeerManager IBD flag modifications, which will allow us to clean up a good number of cs_main locks. However, I agree that these changes should be done in a separate PR.

    It is re-implementing existing pieces of code Which?

    Was talking about TipMayBeStale() which checks the time already. But.. I wasn’t contemplating the fact that it can return NotStale only because the node has in-flight blocks (which could be ancient..).

    Isn’t that just a few CPU instructions? (comparing the tip age to the current time). I agree it can be done less often than 45 seconds for the purpose of avoiding limited peers.

    Covered. Check it now.

  83. in src/init.cpp:1757 in fe8e907e91 outdated
    1753@@ -1754,13 +1754,15 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1754     // ********************************************************* Step 12: start node
    1755 
    1756     //// debug print
    1757+    uint64_t best_block_time;
    


    vasild commented at 9:56 am on December 22, 2023:
    GetBlockTime() returns int64_t. This could possibly cause some compiler warning about assigning signed integer to an unsigned variable. Better use the same type as the return type of the function.

    furszy commented at 2:15 pm on December 22, 2023:
    Done
  84. in src/net_processing.cpp:731 in fe8e907e91 outdated
    726@@ -721,6 +727,8 @@ class PeerManagerImpl final : public PeerManager
    727 
    728     /** The height of the best chain */
    729     std::atomic<int> m_best_height{-1};
    730+    /** The time of the best chain tip block */
    731+    std::atomic<std::chrono::seconds > m_best_block_time{0s};
    


    vasild commented at 9:58 am on December 22, 2023:

    nit:

    0    std::atomic<std::chrono::seconds> m_best_block_time{0s};
    
  85. in src/net_processing.h:96 in fe8e907e91 outdated
    92@@ -93,7 +93,7 @@ class PeerManager : public CValidationInterface, public NetEventsInterface
    93     virtual void SendPings() = 0;
    94 
    95     /** Set the best height */
    96-    virtual void SetBestHeight(int height) = 0;
    97+    virtual void SetBestBlock(int height, uint64_t time) = 0;
    


    vasild commented at 10:03 am on December 22, 2023:

    Maybe change the argument from uint64_t to std::chrono::seconds, so that the units of it are more clear. The comment warrants an update:

    0/** Set the height of the best block and its time (seconds since epoch). */
    1virtual void SetBestBlock(int height, std::chrono::seconds time) = 0;
    

    furszy commented at 2:15 pm on December 22, 2023:
    Done as suggested
  86. in src/net_processing.cpp:1684 in fe8e907e91 outdated
    1679+}
    1680+
    1681+ServiceFlags PeerManagerImpl::GetDesirableServiceFlags(ServiceFlags services) const
    1682+{
    1683+    if (services & NODE_NETWORK_LIMITED) {
    1684+        if (m_allow_limited_peers_conn) return ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS);
    


    vasild commented at 10:34 am on December 22, 2023:

    I like how the best block’s time is cached in PeerManagerImpl::m_best_block_time but further caching the expression “is the best block time older than 288 blocks” seems excessive and unnecessary to me. It makes the code more complex and obviously gets stale one second after it is set.

    Here you can use something like:

    0if (ApproximateBestBlockAgeInNumberOfBlocks() < NODE_NETWORK_LIMITED_MIN_BLOCKS) {
    1    return ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS);
    2}
    3...
    4int64_t ApproximateBestBlockAgeInNumberOfBlocks() {
    5    return (GetTime<std::chrono::seconds>() - m_best_block_time.load()) / m_chainparams.GetConsensus().nPowTargetSpacing;
    6}
    

    Then the following hunks can be dropped from commit 5b4e031b437922735ff2676fa90fbf3d404ab44f net: peer manager, dynamically adjust desirable services flag:

     0@@ -787,12 +787,15 @@ private:
     1     /** Number of preferable block download peers. */
     2     int m_num_preferred_download_peers GUARDED_BY(cs_main){0};
     3     
     4     /** Stalling timeout for blocks in IBD */
     5     std::atomic<std::chrono::seconds> m_block_stalling_timeout{BLOCK_STALLING_TIMEOUT_DEFAULT};
     6 
     7+    /** Whether limited peers connections are accepted or not */
     8+    std::atomic<bool> m_allow_limited_peers_conn{false};
     9+
    10     bool AlreadyHaveTx(const GenTxid& gtxid)
    11         EXCLUSIVE_LOCKS_REQUIRED(cs_main, !m_recent_confirmed_transactions_mutex);
    12     
    13     /**
    14      * Filter for transactions that were recently rejected by the mempool.
    15      * These are not rerequested until the chain tip changes, at which point
    16@@ -1072,12 +1075,22 @@ private:
    17      *            False if address relay is disallowed
    18      */
    19     bool SetupAddressRelay(const CNode& node, Peer& peer) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
    20     
    21     void AddAddressKnown(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
    22     void PushAddress(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
    23+
    24+    /**
    25+     * Updates the contextual information will be used within the connections
    26+     * desirable services flags decision-making process.
    27+     */
    28+    void MaybeUpdateDesirableConnOptions() {
    29+        // If tip is further than NODE_NETWORK_LIMITED_MIN_BLOCKS, we are really behind and shouldn't connect to limit
    30+        std::chrono::seconds time_between_blocks{m_chainparams.GetConsensus().nPowTargetSpacing};
    31+        m_allow_limited_peers_conn = m_best_block_time.load() > GetTime<std::chrono::seconds>() - time_between_blocks 
    32+    }
    33 };
    34 
    35 const CNodeState* PeerManagerImpl::State(NodeId pnode) const EXCLUSIVE_LOCKS_REQUIRED(cs_main)
    36 {
    37     std::map<NodeId, CNodeState>::const_iterator it = m_node_states.find(pnode);
    38     if (it == m_node_states.end())
    39@@ -2068,12 +2080,13 @@ void PeerManagerImpl::NewPoWValidBlock(const CBlockIndex *pindex, const std::sha
    40  * Update our best height and announce any block hashes which weren't previously
    41  * in m_chainman.ActiveChain() to our peers.
    42  */
    43 void PeerManagerImpl::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockIndex *pindexFork, bool fInitialDownlo
    44 {
    45     SetBestBlock(pindexNew->nHeight, pindexNew->GetBlockTime());
    46+    MaybeUpdateDesirableConnOptions();
    47     SetServiceFlagsIBDCache(!fInitialDownload);
    48     
    49     // Don't relay inventory during initial block download.
    50     if (fInitialDownload) return;
    51     
    52     // Find the hashes of all blocks that weren't previously in the best chain.
    53@@ -5272,21 +5285,28 @@ void PeerManagerImpl::CheckForStaleTipAndEvictPeers()
    54     
    55     auto now{GetTime<std::chrono::seconds>()};
    56     
    57     EvictExtraOutboundPeers(now);
    58     
    59     if (now > m_stale_tip_check_time) {
    60-        // Check whether our tip is stale, and if so, allow using an extra
    61-        // outbound peer
    62-        if (!m_chainman.m_blockman.LoadingBlocks() && m_connman.GetNetworkActive() && m_connman.GetUseAddrmanOutgoing(
    63-            LogPrintf("Potential stale tip detected, will try using extra outbound peer (last tip update: %d seconds a
    64-                      count_seconds(now - m_last_tip_update.load()));
    65-            m_connman.SetTryNewOutboundPeer(true);
    66-        } else if (m_connman.GetTryNewOutboundPeer()) {
    67-            m_connman.SetTryNewOutboundPeer(false);
    68+        // Skip checks when blocks are being loaded from disk or the network is disabled
    69+        if (!m_chainman.m_blockman.LoadingBlocks() && m_connman.GetNetworkActive()) {
    70+            // Check whether our tip is stale, and if so, allow using an extra outbound peer
    71+            if (m_connman.GetUseAddrmanOutgoing() && TipMayBeStale()) {
    72+                LogPrintf("Potential stale tip detected, will try using extra outbound peer (last tip update: %d secon
    73+                          count_seconds(now - m_last_tip_update.load()));
    74+                m_connman.SetTryNewOutboundPeer(true);
    75+            } else if (m_connman.GetTryNewOutboundPeer()) {
    76+                m_connman.SetTryNewOutboundPeer(false);
    77+            }
    78+
    79+            // Update connections desirable services flags in case we need to. For instance,
    80+            // the tip could be lagging behind, further than the limited peers threshold.
    81+            MaybeUpdateDesirableConnOptions();
    82         }
    83+
    84         m_stale_tip_check_time = now + STALE_CHECK_INTERVAL;
    85     }
    

    furszy commented at 2:17 pm on December 22, 2023:
    Sure, thanks!. Done as suggested.
  87. in src/test/peerman_tests.cpp:16 in fe8e907e91 outdated
    11+
    12+#include <boost/test/unit_test.hpp>
    13+
    14+BOOST_FIXTURE_TEST_SUITE(peerman_tests, RegTestingSetup)
    15+
    16+static constexpr int64_t NODE_NETWORK_LIMITED_MIN_BLOCKS = 288;
    


    vasild commented at 10:36 am on December 22, 2023:
    Better expose this constant in net_processing.h instead of duplicating it here. This file already includes net_processing.h.

    furszy commented at 2:47 pm on December 22, 2023:
    This is the only change I did not implement. I think we should discuss further where to place this constants before moving them to the header. It seems that with every new unit test, we will be forced to move some/most constants from net_processing.cpp to the header.

    vasild commented at 12:16 pm on January 3, 2024:

    No strong opinion. To me net_processing.h looks like the natural place to move/expose the constant(s) that were defined as static in net_processing.cpp if they need to be used outside of that cpp file. Do you somehow dislike that idea or have another one?

    I dislike having them duplicated because then there is a chance that they will go out of sync. Also, in semantic completion tools one can point to a symbol and ask “show me all places where this is used”. If what is used in the test is defined in the test file (duplicated) then usages of it from other files will not be shown.


    furszy commented at 3:19 pm on January 3, 2024:

    To me net_processing.h looks like the natural place to move/expose the constant(s) that were defined as static in net_processing.cpp if they need to be used outside of that cpp file. Do you somehow dislike that idea or have another one?

    I don’t totally dislike it, but was pondering the option of moving them to a separate file for two reasons; I find the spread of constants across the sources hard to maintain, and I was also thinking about another testing scenario: what if we want to exercise a net flow on a higher level, and use only the node interface without accessing net_processing.h classes?. For example, testing the eviction of net-limited peers when the node gets stalled. In this case, we might want to use the constant to setup the test without introducing a net_processing.h dependency (e.g. use node.connect(), then node.listPeers(), then, after certain context changes, call node.listPeers() again to verify the state).

    That being said, I’m probably going too far and we should just move this constant to the header for now.

  88. in src/protocol.cpp:130 in fe8e907e91 outdated
    122@@ -125,18 +123,6 @@ bool CMessageHeader::IsCommandValid() const
    123     return true;
    124 }
    125 
    126-
    127-ServiceFlags GetDesirableServiceFlags(ServiceFlags services) {
    128-    if ((services & NODE_NETWORK_LIMITED) && g_initial_block_download_completed) {
    


    vasild commented at 10:50 am on December 22, 2023:

    Commit a710e08398 net: move state dependent peer services flags moves this code to src/net_processing.cpp and changes from g_initial_block_download_completed to m_initial_sync_finished. The variable m_initial_sync_finished is used for something else and has a different semantic, thus using m_initial_sync_finished here is wrong. Later another commit changes from m_initial_sync_finished to the proper check.

    I think it would be better to reorder the commits, so that the proper check (either m_allow_limited_peers_conn or ApproximateBestBlockAgeInNumberOfBlocks() as I suggested in another comment) is already available in peerman and so when GetDesirableServiceFlags() is moved, it switches the proper check right away.


    furszy commented at 2:18 pm on December 22, 2023:
    Sure. Have reordered the commits and decoupled the behavior change into an standalone commit.
  89. vasild commented at 10:53 am on December 22, 2023: contributor
    Looks better!
  90. in src/net_processing.cpp:5298 in fe8e907e91 outdated
    5301+            if (m_connman.GetUseAddrmanOutgoing() && TipMayBeStale()) {
    5302+                LogPrintf("Potential stale tip detected, will try using extra outbound peer (last tip update: %d seconds ago)\n",
    5303+                          count_seconds(now - m_last_tip_update.load()));
    5304+                m_connman.SetTryNewOutboundPeer(true);
    5305+            } else if (m_connman.GetTryNewOutboundPeer()) {
    5306+                m_connman.SetTryNewOutboundPeer(false);
    


    vasild commented at 11:04 am on December 22, 2023:

    The conditions under which m_connman.SetTryNewOutboundPeer(false); is executed are not the same before and after this change. Maybe it is ok, but is an unrelated change to the aim of this PR:

    • before m_connman.SetTryNewOutboundPeer(false) would have been executed if this was true:
    0(m_chainman.m_blockman.LoadingBlocks() || !m_connman.GetNetworkActive() || !m_connman.GetUseAddrmanOutgoing() || !TipMayBeStale()) && m_connman.GetTryNewOutboundPeer()
    
    • after m_connman.SetTryNewOutboundPeer(false) will be executed if this is true:
    0!m_chainman.m_blockman.LoadingBlocks() && m_connman.GetNetworkActive() && (!m_connman.GetUseAddrmanOutgoing() || !TipMayBeStale()) && m_connman.GetTryNewOutboundPeer()
    

    Hopefully this entire hunk can be dropped.


    furszy commented at 2:17 pm on December 22, 2023:
    Dropped per comment #28170 (review). Thanks
  91. furszy force-pushed on Dec 22, 2023
  92. vasild approved
  93. vasild commented at 12:09 pm on January 3, 2024: contributor

    ACK 6c603490023317a84c7c96ef4b64f4f96ea03d1f

    It would be nice to avoid duplicating the constant NODE_NETWORK_LIMITED_MIN_BLOCKS (discussion) but I do not see that as a blocker of this PR.

    Thanks!

  94. DrahtBot requested review from naumenkogs on Jan 3, 2024
  95. DrahtBot requested review from andrewtoth on Jan 3, 2024
  96. DrahtBot removed review request from andrewtoth on Jan 3, 2024
  97. DrahtBot requested review from andrewtoth on Jan 3, 2024
  98. DrahtBot removed review request from andrewtoth on Jan 3, 2024
  99. DrahtBot requested review from andrewtoth on Jan 3, 2024
  100. DrahtBot removed review request from andrewtoth on Jan 3, 2024
  101. DrahtBot requested review from andrewtoth on Jan 3, 2024
  102. DrahtBot added the label CI failed on Jan 12, 2024
  103. in src/init.cpp:1757 in 8fff455dd6 outdated
    1753@@ -1754,13 +1754,15 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1754     // ********************************************************* Step 12: start node
    1755 
    1756     //// debug print
    1757+    int64_t best_block_time;
    


    andrewtoth commented at 10:15 pm on January 14, 2024:
    nit: int64_t best_block_time{};

    furszy commented at 1:31 pm on January 15, 2024:
    done
  104. in src/net.cpp:2276 in 9683d11268 outdated
    2272@@ -2273,7 +2273,7 @@ void CConnman::ThreadDNSAddressSeed()
    2273             AddAddrFetch(seed);
    2274         } else {
    2275             std::vector<CAddress> vAdd;
    2276-            ServiceFlags requiredServiceBits = GetDesirableServiceFlags(NODE_NONE);
    2277+            ServiceFlags requiredServiceBits = SeedsServiceFlags();
    


    andrewtoth commented at 10:17 pm on January 14, 2024:
    nit: prefer list initialization, could also make this constexpr since we’re modifying it. constexpr ServiceFlags requiredServiceBits{SeedsServiceFlags()};

    furszy commented at 1:31 pm on January 15, 2024:
    done
  105. in src/net_processing.cpp:1683 in 5b50a6d2c6 outdated
    1677@@ -1678,8 +1678,11 @@ bool PeerManagerImpl::HasAllDesirableServiceFlags(ServiceFlags services) const
    1678 
    1679 ServiceFlags PeerManagerImpl::GetDesirableServiceFlags(ServiceFlags services) const
    1680 {
    1681-    if (services & NODE_NETWORK_LIMITED && GetServicesFlagsIBDCache()) {
    1682-        return ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS);
    1683+    if (services & NODE_NETWORK_LIMITED) {
    1684+        // Limited peers are desirable when we are close to the tip.
    1685+        if (ApproximateBestBlockDepth() < NODE_NETWORK_LIMITED_MIN_BLOCKS) {
    


    andrewtoth commented at 10:29 pm on January 14, 2024:

    nit: Could be one line

    0    // Limited peers are desirable when we are close to the tip.
    1    if ((services & NODE_NETWORK_LIMITED) && ApproximateBestBlockDepth() < NODE_NETWORK_LIMITED_MIN_BLOCKS) {
    
  106. in src/test/peerman_tests.cpp:1 in 3f342bab41 outdated
    0@@ -0,0 +1,79 @@
    1+// Copyright (c) 2023-present The Bitcoin Core developers
    


    andrewtoth commented at 10:32 pm on January 14, 2024:
    nit: 2024

    furszy commented at 1:31 pm on January 15, 2024:
    done
  107. in src/test/peerman_tests.cpp:60 in 3f342bab41 outdated
    55+    // Check we disallow limited peers connections when we are further than the limited peers threshold (NODE_NETWORK_LIMITED_MIN_BLOCKS)
    56+    SetMockTime(GetTime<std::chrono::seconds>() + std::chrono::seconds{consensus.nPowTargetSpacing * 2});
    57+    peerman->CheckForStaleTipAndEvictPeers();
    58+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK | NODE_WITNESS));
    59+
    60+    // By now, we tested that 'CheckForStaleTipAndEvictPeers' can set update the connections desirable services flags.
    


    andrewtoth commented at 10:47 pm on January 14, 2024:

    nit

    0// By now, we tested that 'CheckForStaleTipAndEvictPeers' can update the connections desirable services flags.
    

    furszy commented at 1:31 pm on January 15, 2024:
    done
  108. in src/test/peerman_tests.cpp:69 in 3f342bab41 outdated
    64+    // First, verify a block in the past doesn't enable limited peers connections
    65+    // At this point, our time is (NODE_NETWORK_LIMITED_MIN_BLOCKS + 1) * 10 minutes ahead the tip's time.
    66+    mineBlock(m_node, /*block_time=*/std::chrono::seconds{tip_block_time + 1});
    67+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK | NODE_WITNESS));
    68+
    69+    // Verify a block close to the tip enable limited peers connections
    


    andrewtoth commented at 10:48 pm on January 14, 2024:

    nit

    0// Verify a block close to the tip enables limited peers connections
    

    furszy commented at 1:31 pm on January 15, 2024:
    done
  109. andrewtoth approved
  110. andrewtoth commented at 11:06 pm on January 14, 2024: contributor

    ACK 6c603490023317a84c7c96ef4b64f4f96ea03d1f modulo nits

    For peerman_tests.cpp, we should prefer list initialization and constness for variables when possible.

    For commit message of net: move state dependent peer services flags (b934bfd327f68d39aebb2d9ab52b39b7cd29525d), the last sentence could be:

    0Additionally, this encapsulation enables us
    1to customize the connections decision-making process based on
    2new users' configurations in the future.
    
  111. net: store best block tip time inside PeerManager
    And implement 'ApproximateBestBlockDepth()' to estimate
    the distance, in blocks, between the best-known block
    and the network chain tip. Utilizing the best-block time
    and the chainparams blocks spacing to approximate it.
    97df4e3887
  112. net: decouple state independent service flags from desirable ones
    This former one will be moved to the peer manager class in the
    following-up commit.
    f9ac96b8d6
  113. net: move state dependent peer services flags
    No behavior change. Just an intermediate refactoring.
    
    By relocating the peer desirable services flags into the peer
    manager, we allow the connections acceptance process to handle
    post-IBD potential stalling scenarios.
    
    In the follow-up commit(s), the desirable service flags will be
    dynamically adjusted to detect post-IBD stalling scenarios (such
    as a +48-hour inactive node that must prefer full node connections
    instead of limited peer connections because they cannot provide
    historical blocks). Additionally, this encapsulation enable us
    to customize the connections decision-making process based on
    new user's configurations in the future.
    9f36e591c5
  114. furszy force-pushed on Jan 15, 2024
  115. furszy commented at 1:33 pm on January 15, 2024: member
    Thanks for the review @andrewtoth! All nits tackled, small diff. Ready to go.
  116. DrahtBot removed the label CI failed on Jan 15, 2024
  117. furszy requested review from vasild on Jan 16, 2024
  118. furszy requested review from andrewtoth on Jan 16, 2024
  119. vasild approved
  120. vasild commented at 2:17 pm on January 16, 2024: contributor
    ACK 79e10351b1ce1f8db98ece67aacc24f323008b37
  121. DrahtBot removed review request from andrewtoth on Jan 16, 2024
  122. DrahtBot requested review from andrewtoth on Jan 16, 2024
  123. andrewtoth commented at 5:10 pm on January 16, 2024: contributor
    ACK 79e10351b1ce1f8db98ece67aacc24f323008b37
  124. DrahtBot removed review request from andrewtoth on Jan 16, 2024
  125. in src/test/peerman_tests.cpp:47 in 783f0dd9bf outdated
    42+    uint64_t tip_block_time = tip->GetBlockTime();
    43+    int tip_block_height = tip->nHeight;
    44+    peerman->SetBestBlock(tip_block_height, std::chrono::seconds{tip_block_time});
    45+
    46+    SetMockTime(tip_block_time + 1); // Set node time to tip time
    47+    peerman->CheckForStaleTipAndEvictPeers();  // Trigger status update
    


    mzumsande commented at 10:05 pm on January 16, 2024:
    I don’t think that CheckForStaleTipAndEvictPeers triggers any status updates with the current approach, only UpdatedBlockTip does. Maybe this is left from an earlier version? In any case, if I remove all calls to it, the test still succeeds.

    furszy commented at 1:38 pm on January 17, 2024:
    yeah, leftover from the previous implementation. Cleaning them. Thanks!
  126. mzumsande commented at 10:49 pm on January 16, 2024: contributor

    If I understand it correctly, the logic deciding whether to try and accept outbound connections, using HasAllDesirableServiceFlags(), relies on IsInitialBlockDownload() / DEFAULT_MAX_TIP_AGE (24h) on master, but is changed to rely on NODE_NETWORK_LIMITED_MIN_BLOCKS (288 blocks, ~48h) in this PR.

    I think this means that the safety buffer of 144 blocks proposed in BIP159 would have been removed. I’m not sure about that, the distinction of serving up to 288 blocks from the tip when pruning ourselves, but using a smaller window of ~144 blocks for connecting to limited peers made sense to me.

  127. DrahtBot requested review from mzumsande on Jan 16, 2024
  128. furszy commented at 1:09 pm on January 17, 2024: member

    If I understand it correctly, the logic deciding whether to try and accept outbound connections, using HasAllDesirableServiceFlags(), relies on IsInitialBlockDownload() / DEFAULT_MAX_TIP_AGE (24h) on master, but is changed to rely on NODE_NETWORK_LIMITED_MIN_BLOCKS (288 blocks, ~48h) in this PR.

    I think this means that the safety buffer of 144 blocks proposed in BIP159 would have been removed. I’m not sure about that, the distinction of serving up to 288 blocks from the tip when pruning ourselves, but using a smaller window of ~144 blocks for connecting to limited peers made sense to me.

    Hmm, good point. I would slightly rephrase, and extend, your first paragraph to state that the net limited buffer is connected to a user-customizable parameter, -maxtipage, with its default value is set to 24h. Because, this implies that when the user modifies -maxtipage beyond the net limited threshold, it violates BIP159 rules and could cause the node to connect to limited peers during IBD, potentially blocking progress until the remote peer disconnects (per #28120). This could lead to delays and, in the worst-case scenario, trigger the issue we are fixing here for the post-IBD stalling scenario but during IBD.

    So, I agree on setting the connections window to the BIP’s number (144 blocks) and am glad we are addressing another possible, edge case, bug with this. Thanks for the review!

  129. furszy force-pushed on Jan 17, 2024
  130. furszy commented at 1:53 pm on January 17, 2024: member

    Updated per review, thanks @mzumsande!

    Have reduced the limited peers connections window to 144 blocks (previously set to the limited peers threshold value) as it is encouraged by the BIP159. And cleaned previous implementation leftovers in the test. Code diff.

  131. in src/test/peerman_tests.cpp:58 in 2d050cca9c outdated
    53+
    54+    // Check we disallow limited peers connections when we are further than the limited peers safety window
    55+    SetMockTime(GetTime<std::chrono::seconds>() + std::chrono::seconds{consensus.nPowTargetSpacing * 2});
    56+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK | NODE_WITNESS));
    57+
    58+    // By now, we tested that 'CheckForStaleTipAndEvictPeers' can update the connections desirable services flags.
    


    mzumsande commented at 3:31 pm on January 17, 2024:
    one more reference to CheckForStaleTipAndEvictPeers left

    furszy commented at 3:43 pm on January 17, 2024:
    fixed. Thanks
  132. mzumsande commented at 3:36 pm on January 17, 2024: contributor

    Hmm, good point. I would slightly rephrase, and extend, your first paragraph to state that the net limited buffer is connected to a user-customizable parameter, -maxtipage, with its default value is set to 24h.

    Yes, but -maxtipage is just an undocumented debug-only option meant for testing, nothing to be changed by actual users on mainnet.

  133. furszy force-pushed on Jan 17, 2024
  134. furszy commented at 3:54 pm on January 17, 2024: member

    Hmm, good point. I would slightly rephrase, and extend, your first paragraph to state that the net limited buffer is connected to a user-customizable parameter, -maxtipage, with its default value is set to 24h.

    Yes, but -maxtipage is just an undocumented debug-only option meant for testing, nothing to be changed by actual users on mainnet.

    For sure. Nothing to be changed. It was just good to mention the possible issue. Still, on a slightly connected note; you would be surprised by the amount of config files I saw running this option in mainnet in my previous job. Including exchanges.

  135. vasild commented at 1:48 pm on January 19, 2024: contributor

    HasAllDesirableServiceFlags(), relies on IsInitialBlockDownload() / DEFAULT_MAX_TIP_AGE (24h) on master … I think this means that the safety buffer of 144 blocks proposed in BIP159 would have been removed.

    I do not think -maxtipage / DEFAULT_MAX_TIP_AGE implements “the safety buffer” from BIP159. The former was introduced about 2 years before BIP159, in 2015: #5987.

    BIP159 reads:

    A safety buffer of 144 blocks to handle chain reorganizations SHOULD be taken into account when connecting to a peer signaling the NODE_NETWORK_LIMITED service bit

    I do not fully get it. Can somebody elaborate what is that? With an example of what could go wrong if this “safety buffer” is ignored.

    I’m not sure about that, the distinction of serving up to 288 blocks from the tip when pruning ourselves, but using a smaller window of ~144 blocks for connecting to limited peers made sense to me

    Elsewhere we serve/request 288 +/- 2 blocks from limited peers, to avoid races. Is this what you mean? This makes sense to me, but 144 blocks (24h) to avoid races seems excessive.

  136. mzumsande commented at 6:17 pm on January 19, 2024: contributor

    I do not fully get it. Can somebody elaborate what is that? With an example of what could go wrong if this “safety buffer” is ignored.

    I think one reason is that the estimation in IsInitialBlockDownload() (or now ApproximateBestBlockDepth()) is not precise because block header timestamps could be wrong (within limits). So if we used something very close to 2 days, we could think we are just out of IBD and connect to lots of NODE_NETWORK_LIMITED peers, which would not serve us the next block we needed, because the limit of serving up to 288 blocks from the tip is absolute (not influenced by timestamps). 144 blocks may still be very conservative / excessive, but it doesn’t seem super important to optimize here, when in practice it usually just means that during IBD we wait a bit longer before we accept NODE_NETWORK_LIMITED peers?

    I’m not sure about the reorg scenario mentioned in the BIP.

  137. furszy commented at 7:44 pm on January 19, 2024: member

    I implemented @mzumsande’s proposed changes because they maintain the existing workflow and document the previously obscure behavior. The objective of this PR is to fix the post-IBD stalling scenario. It would be good to leave the discussion about the “safety buffer” number for a follow-up.

    That being said, I agree with @vasild that the number is quite over-conservative for no apparent reason and could be increased.

    144 blocks may still be very conservative / excessive, but it doesn’t seem super important to optimize here, when in practice it usually just means that during IBD we wait a bit longer before we accept NODE_NETWORK_LIMITED peers?

    It isn’t matter of waiting a bit longer, I think that being able to connect to more peers always help. More when they can actually help you sync.

  138. vasild commented at 4:59 pm on January 22, 2024: contributor
    Ok, fine. Then, just a comments nit, the recently added comments “Per BIP159, safety buffer” and “BIP159 connections safety window” better be removed? Because that is not “A safety buffer of 144 blocks to handle chain reorganizations”?
  139. net: peer manager, dynamically adjust desirable services flag
    Introduces functionality to detect when limited peers connections
    are desirable or not. Ensuring that the new connections desirable
    services flags stay relevant throughout the software's lifecycle.
    (Unlike the previous approach, where once the validation IBD flag
    was set, the desirable services flags remained constant forever).
    
    This will let us recover from stalling scenarios where the node had
    successfully synced, but subsequently dropped connections and remained
    inactive for a duration longer than the limited peers threshold (the
    timeframe within which limited peers can provide blocks). Then, upon
    reconnection to the network, the node may end up only establishing
    connections with limited peers, leading to an inability to synchronize
    the chain.
    
    This also fixes a possible limited peers threshold violation during IBD,
    when the user configures `-maxtipage` further than the BIP159's limits.
    This rule violation could lead to sync delays and, in the worst-case
    scenario, trigger the same post-IBD stalling scenario (mentioned above)
    but during IBD.
    6ed53602ac
  140. test: add coverage for peerman adaptive connections service flags aff7d92b15
  141. net: remove now unused global 'g_initial_block_download_completed' 27f260aa6e
  142. furszy force-pushed on Jan 23, 2024
  143. furszy commented at 1:26 pm on January 23, 2024: member

    Ok, fine. Then, just a comments nit, the recently added comments “Per BIP159, safety buffer” and “BIP159 connections safety window” better be removed? Because that is not “A safety buffer of 144 blocks to handle chain reorganizations”?

    Sure @vasild. Updated. Thanks! Small diff.

  144. vasild approved
  145. vasild commented at 9:41 am on January 24, 2024: contributor

    ACK 27f260aa6e04f82dad78e9a06d58927546143a27

    Wrt the interface addition discussed in #28170 (review), I think it is not perfect but I do not see it as a blocker for this PR. Would be nice to see the discussion resolved in one way or another.

  146. DrahtBot requested review from mzumsande on Jan 24, 2024
  147. DrahtBot requested review from andrewtoth on Jan 24, 2024
  148. in src/net_processing.h:128 in 9f36e591c5 outdated
    123+     * version of the peer may determine which flags are required (eg in the
    124+     * case of NODE_NETWORK_LIMITED where we seek out NODE_NETWORK peers
    125+     * unless they set NODE_NETWORK_LIMITED and we are out of IBD, in which
    126+     * case NODE_NETWORK_LIMITED suffices).
    127+     *
    128+     * Thus, generally, avoid calling with 'services' == NODE_NONE, unless
    


    naumenkogs commented at 8:56 am on January 29, 2024:

    9f36e591c551ec2e58a6496334541bfdae8fdfe5

    i see that this is a copy-paste from another place, but since it’s technically green (and i want to sure it still makes sense after your change), can you explain what is this about? especially the returned desirable service flags are guaranteed to not change dependent on state


    furszy commented at 1:51 pm on January 29, 2024:

    9f36e59

    i see that this is a copy-paste from another place, but since it’s technically green (and i want to sure it still makes sense after your change), can you explain what is this about? especially the returned desirable service flags are guaranteed to not change dependent on state

    Yeah sure. It means that when NODE_NONE is provided, the function ensures to consistently return the default service flags (NODE_NETWORK | NODE_WITNESS), regardless of the node’s context.

    I think that, for now, leaving the comment is fine given that the behavior wasn’t changed. However, I totally agree that this phrase will need to be revisited in the near future. PeerManager will always return context specific set of service flags. I’m planning to expand this function for the second part of #29183; where I basically need to make the node connect to peers signaling NODE_COMPACT_FILTERS up to a certain outbound slots limit.

  149. in src/net_processing.cpp:1681 in 6ed53602ac outdated
    1679@@ -1678,8 +1680,11 @@ bool PeerManagerImpl::HasAllDesirableServiceFlags(ServiceFlags services) const
    1680 
    1681 ServiceFlags PeerManagerImpl::GetDesirableServiceFlags(ServiceFlags services) const
    1682 {
    1683-    if (services & NODE_NETWORK_LIMITED && GetServicesFlagsIBDCache()) {
    


    naumenkogs commented at 9:01 am on January 29, 2024:

    6ed53602ac7c565273b5722de167cb2569a0e381

    if you touch it again, i suggest removing GetServicesFlagsIBDCache definition here rather than in the last commit. At this point it’s already unused. Saves some review time.


    furszy commented at 1:54 pm on January 29, 2024:
    sure. Will do if I have to re-touch.
  150. in src/test/peerman_tests.cpp:74 in aff7d92b15 outdated
    69+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS));
    70+
    71+    // Lastly, verify the stale tip checks can disallow limited peers connections after not receiving blocks for a prolonged period.
    72+    SetMockTime(GetTime<std::chrono::seconds>() + std::chrono::seconds{consensus.nPowTargetSpacing * NODE_NETWORK_LIMITED_ALLOW_CONN_BLOCKS + 1});
    73+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK | NODE_WITNESS));
    74+}
    


    naumenkogs commented at 9:07 am on January 29, 2024:

    aff7d92b1500e2478ce36a7e86ae47df47dda178

    would be nice to test reacting to reorgs.


    furszy commented at 1:55 pm on January 29, 2024:
    Sure. As it would be really nice to progress on this PR, will leave it follow-up. Thanks.
  151. naumenkogs approved
  152. naumenkogs commented at 9:08 am on January 29, 2024: member
    ACK 27f260aa6e04f82dad78e9a06d58927546143a27
  153. DrahtBot removed review request from andrewtoth on Jan 29, 2024
  154. DrahtBot requested review from andrewtoth on Jan 29, 2024
  155. DrahtBot removed review request from andrewtoth on Jan 29, 2024
  156. DrahtBot requested review from andrewtoth on Jan 29, 2024
  157. DrahtBot removed review request from andrewtoth on Jan 29, 2024
  158. DrahtBot requested review from andrewtoth on Jan 29, 2024
  159. DrahtBot removed review request from andrewtoth on Jan 29, 2024
  160. DrahtBot requested review from andrewtoth on Jan 29, 2024
  161. in src/test/peerman_tests.cpp:39 in aff7d92b15 outdated
    34+    std::unique_ptr<PeerManager> peerman = PeerManager::make(*m_node.connman, *m_node.addrman, nullptr, *m_node.chainman, *m_node.mempool, {});
    35+    auto consensus = m_node.chainman->GetParams().GetConsensus();
    36+
    37+    // Check we start connecting to full nodes
    38+    ServiceFlags peer_flags{NODE_WITNESS | NODE_NETWORK_LIMITED};
    39+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK | NODE_WITNESS));
    


    mzumsande commented at 8:13 pm on January 30, 2024:
    could also have a check for HasAllDesirableServiceFlags each time GetDesirableServiceFlags is checked.

    furszy commented at 10:34 pm on January 30, 2024:
    Sure, noted. As it is a nit and would love to move forward, will leave it for a small follow-up. Thanks!
  162. mzumsande commented at 9:58 pm on January 30, 2024: contributor

    Light Code Review ACK 27f260aa6e04f82dad78e9a06d58927546143a27

    I don’t have a strong opinion on the interface issue (I read the discussion and can see both points of view), but I do think that having a global variable g_initial_block_download_completed that lives in protocol.h and is updated dynamically from net_processing is an ugly layer violation too, so removing that is an important improvement from the architectural point of view.

  163. DrahtBot removed review request from andrewtoth on Jan 30, 2024
  164. DrahtBot requested review from andrewtoth on Jan 30, 2024
  165. DrahtBot removed review request from andrewtoth on Jan 30, 2024
  166. DrahtBot requested review from andrewtoth on Jan 30, 2024
  167. DrahtBot removed review request from andrewtoth on Jan 30, 2024
  168. DrahtBot requested review from andrewtoth on Jan 30, 2024
  169. in src/test/peerman_tests.cpp:71 in 27f260aa6e
    66+
    67+    // Verify a block close to the tip enables limited peers connections
    68+    mineBlock(m_node, /*block_time=*/GetTime<std::chrono::seconds>());
    69+    BOOST_CHECK(peerman->GetDesirableServiceFlags(peer_flags) == ServiceFlags(NODE_NETWORK_LIMITED | NODE_WITNESS));
    70+
    71+    // Lastly, verify the stale tip checks can disallow limited peers connections after not receiving blocks for a prolonged period.
    


    andrewtoth commented at 1:13 am on January 31, 2024:
    As noted in #28170 (review), stale tip checks don’t factor into the current approach. This check seems redundant now with the one on lines 54-56.

    furszy commented at 1:40 am on January 31, 2024:

    As noted in #28170 (comment), stale tip checks don’t factor into the current approach.

    Sure. As it is a nit, will update the comment in the follow-up PR. Thanks!.

    This check seems redundant now with the one on lines 54-56.

    I think it’s worth checking whether the services flags can still change after receiving blocks from the network. The lines 54-56 are checking updates prior receiving any block.

  170. andrewtoth approved
  171. andrewtoth commented at 1:14 am on January 31, 2024: contributor
    ACK 27f260aa6e04f82dad78e9a06d58927546143a27
  172. achow101 commented at 4:10 pm on January 31, 2024: member
    ACK 27f260aa6e04f82dad78e9a06d58927546143a27
  173. achow101 merged this on Jan 31, 2024
  174. achow101 closed this on Jan 31, 2024

  175. furszy deleted the branch on Jan 31, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me