Without this fix, feeler connections will only feel Witness nodes.
Fixes #9072 and reduces the impact of #9051
I suspect not everyone will agree on the selection criteria, but this can be changed.
NAK. Non-witness peers are still connected to, though the node makes sure that it doesn't get partitioned. This undoes an intentional design decision. It is not acceptable to have the network topology suddenly change all at once when segwit activates, due to the risk of creating partitioning (we can't fetch blocks from non-witness peers once segwit is active, so we need to be sure that we're connecting to witness peers before it activates). The approach used in the codebase was intended to change a node's behavior only when it is upgraded, so that the change would phase in gradually across the network (and if it caused problems users could revert to escape it).
If it were just limited to feelers, that might be good though -- since we'd like to learn that they've been upgraded, but a feeler doesn't need to know whether witness is active or not. It could just ignore the witness flag for feeler probes -- any interest in changing your patch to do just that?
It would have been good if the decision to not connect out to witness peers were better documented.
@gmaxwell Thank you for reviewing this. I'm not sure I understand the partitioning risk you mention. With the current implementation, I'd already call this partitioned, in that 0.13.1 nodes that can make only outbound connections rarely, and only briefly, connect to non-witness nodes. With this change, only half the outbound connections would be forced to witness nodes, and the other half would connect "un-forced", roughly following the general distribution of the network (currently about 20% running with node_witness?). For a configuration of 8 outbound connections, that would mean roughly 5 of those 8 are connected to witness nodes, compared to 100% of outbound connections without this PR.
The current code behaves in a way that makes me wonder whether it is doing what it was designed to do. It looks like it was designed to do something similar to what this PR makes it do, i.e. force half the connections to witness nodes and allow the other half to do as they wish. However, it works by enforcing witness nodes once the connection count reaches half, and once a node is up and running it is very unlikely to suddenly lose half of its outbound connections. So as outbound connections are lost from time to time, the initial 4 connections that may have gone to non-witness nodes get replaced by connections that can only go to witness nodes. After a short while, 100% of the outbound connections are to witness nodes only, with no ability (unless internet is lost long enough to disconnect all outbound connections) to connect to non-witness nodes.
Is this current functionality what was intended?
The only change that happens when SegWit activates, with this PR, is that the minimum of 4 Witness nodes increases to 6 - I can remove this code though so that it stays at 4 - thereby achieving what I suspect the initial design intended to achieve.
Also, in the event that SegWit doesn't activate and times out, the code as it currently stands in 0.13.1 will only ever use half of the outgoing connection slots: if maxoutbound is 8, then after 4 outbound connections are established, it will not connect to any peer unless it advertises node_witness, which after the timeout will be none. (Not that I am wanting this to happen, I am in favor of SegWit.)
Concept ACK if you make the behaviour not depend on segwit being active or not, and just whether or not we have 4 connections to node_witness nodes.
The reason is that it's critical not to be partitioned off segwit nodes at the time it activates (and afterwards), as there are numerous hard-to-reason about edge cases right at activation time. As connections tend to persist for a long time, this means we need to start preferring a while before (potential) activation.
@sipa I've just re-read your comment and I think you may have misunderstood the functionality of this PR as it was. It forces a minimum of 4 connections to witness nodes prior to SegWit activation, and after activation it makes the minimum 6. Therefore it is already doing the "start preferring a while before" that you mention. The current functionality (partly due to #8949), which I think may have been unintended, is to connect exclusively to witness nodes.
Do you still want me to change it so that it stays at a minimum of 4 connections before and after activation (rather than 4 before and 6 after)? I thought 4 was a little low given the functionality is currently 100% (8 for default MaxOutbound), so I settled on 6 as a middle-road.
I'm also making a fix so that rather than allowing fWitnessActive to be initialised as false, it's initialised in init.cpp to true or false based on the active tip at startup.
```diff
@@ -1681,8 +1684,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
+            (nRelevant < 5 || (fWitnessActive && nRelevant < 7))) && !fFeeler)
```
Several problems with this:
fWitnessActive should be a std::atomic<bool> instead.

@sipa preferential peering already happens before the activation with this PR. The only reason for checking IsWitnessEnabled is to make it even more preferential afterwards. In order to make it "more preferential" it seems to me that the network code does need to know about such a concept - how else could it be done please?
@rebroad Yes, I believe the current logic of only allowing non-relevant peers before 4 outbounds results in an unintentionally strong preference for witness peers, so I like using the nRelevant you have instead, and not applying it to feelers. I don't think there is a need for making the behaviour conditional on segwit being active though - that just results in more edge cases.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
```
This comment seems outdated.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
```
I believe the && between nTries < 40 and nRelevant < 5 should be an ||.
I intentionally changed it to && instead of || as I felt || was too strong, but can change it back if this is preferred.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
+            nRelevant < 5) && !fFeeler)
```
Using (nMaxOutbound >> 1) avoided a hard-coded constant here.
I felt a minimum of 4 was a better criterion than half of the default 8, because if someone reduces maxOutbound, then I think fewer than 4 witness connections is too few.
Seems I was wrong:
14:39:41 < gmaxwell> sipa: "an unintentionally strong preference for witness peers" -- well it was as strong as I intend it to be, after SW activates I was planning on merging the NODE_WITNESS preference into the node-network one and making it absolute and not a preference. In releases post sw I think we shouldn't connect out at all to non sw peers.
So this should probably just change the nFeelers clause.
@rebroad The goal is to never connect to anything but segwit nodes. The connection count rule is just there because initially there may just not be enough segwit peers. Old nodes are not useful for you to connect out to, as they can't give blocks anymore after activation. We of course still allow them to connect in.
@sipa Given that SegWit has not activated, and from the current charts never will, it seems entirely premature to be avoiding non-SegWit nodes at all, doesn't it? At least until there is a reasonable chance of SegWit activating.