Without this fix, feeler connections will only feel Witness nodes.
Fixes #9072 and reduces the impact of #9051
I suspect not everyone will agree on the selection criteria, but this can be changed.
NAK. Non-witness peers are still connected to, though the node makes sure that it doesn't get partitioned. This undoes an intentional design decision. It is not acceptable to have the network topology suddenly change all at once when segwit activates, due to the risk of creating partitioning (we can't fetch blocks from non-witness peers once segwit is active, so we need to be sure that we're connecting to witness peers before it activates). The approach used in the codebase was intended to change a node's behavior only when it is upgraded, so that the change would phase in gradually across the network (and if it caused problems users could revert to escape it).
If it were just limited to feelers, that might be good though -- since we'd like to learn that they've been upgraded, but a feeler doesn't need to know whether witness is active or not. It could just ignore the witness flag for feeler probes -- any interest in changing your patch to do just that?
It would have been good if the decision to not connect out to witness peers were better documented.
@gmaxwell Thank you for reviewing this. I'm not sure I understand the partitioning risk you mention. With the current implementation, I'd already call this partitioned, in that 0.13.1 nodes that can make only outbound connections rarely, and only briefly, connect to non-witness nodes. With this change, only half the outbound connections would be forced to witness nodes, and the other half would connect "un-forced", roughly following the general distribution of the network (currently about 20% running with node_witness?). For a configuration of 8 outbound connections, that would mean roughly 5 of those 8 are connected to witness nodes, compared to 100% of outbound connections without this PR.
The current code behaves in a way that makes me wonder whether it is doing what it was designed to do. It looks like it was designed to do something similar to what this PR makes it do, i.e. force half the connections to witness nodes and allow the other half to do as they wish. However, it works by enforcing witness nodes once the connection count reaches half, and once a node is up and running it is very unlikely to suddenly lose half of its outbound connections. So as outbound connections are lost from time to time, the initial 4 connections that may have gone to non-witness nodes get replaced by connections that can only go to witness nodes. After a short while, 100% of the outbound connections are to witness nodes only, with no ability (unless internet is lost long enough to disconnect all outbound connections) to connect to non-witness nodes.
Is this current functionality what was intended?
The only change that happens when SegWit activates, with this PR, is that the minimum of 4 Witness nodes increases to 6 - I can remove this code though so that it stays at 4 - thereby achieving what I suspect the initial design intended to achieve.
Also, in the event that SegWit doesn't activate and times out, the code as it currently stands in 0.13.1 will only ever use half of the outgoing connection slots: if maxoutbound is 8, then after 4 outbound connections are established, it will not connect to any peer unless it advertises node_witness, which after the timeout will be none. (Not that I am wanting this to happen, I am in favor of SegWit.)
Concept ACK if you make the behaviour not depend on segwit being active or not, and just whether or not we have 4 connections to node_witness nodes.
The reason is that it's critical not to be partitioned off segwit nodes at the time it activates (and afterwards), as there are numerous hard-to-reason about edge cases right at activation time. As connections tend to persist for a long time, this means we need to start preferring a while before (potential) activation.
@sipa I've just re-read your comment and I think you may have misunderstood the functionality of this PR as it was. It forces a minimum of 4 connections to witness nodes prior to SegWit activation, and after activation it makes the minimum 6. Therefore it is already doing the "start preferring a while before" that you mention. The current functionality (partly due to #8949), which I think may have been unintended, is to connect exclusively to witness nodes.
Do you still want me to change it so that it stays at a minimum of 4 connections before and after activation (rather than 4 before and 6 after)? I thought 4 was a little low given the functionality is currently 100% (8 for default MaxOutbound), so I settled on 6 as a middle-road.
I'm also making a fix so that rather than allowing fWitnessActive to be initialised as false, it's initialised in init.cpp to true or false based on the active tip at startup.
```diff
@@ -1681,8 +1684,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
+            (nRelevant < 5 || (fWitnessActive && nRelevant < 7))) && !fFeeler)
```
Several problems with this:
fWitnessActive should be a std::atomic<bool> instead.

@sipa preferential peering already happens before the activation with this PR. The only reason for checking IsWitnessEnabled is to make it even more preferential afterwards. In order to make it "more preferential" it seems to me that the network code does need to know about such a concept - how else could it be done please?
@rebroad Yes, I believe the current logic of only allowing non-relevant peers before 4 outbounds results in an unintentionally strong preference for witness peers, so I like using the nRelevant you have instead, and not applying it to feelers. I don't think there is a need for making the behaviour conditional on segwit being active though - that just results in more edge cases.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
```
This comment seems outdated.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
```
I believe the && between nTries < 40 and nRelevant < 5 should be an ||.
I intentionally changed it to && instead of || as I felt || was too strong, but can change it back if this is preferred.
```diff
@@ -1681,8 +1683,10 @@ void CConnman::ThreadOpenConnections()
         if (nANow - addr.nLastTry < 600 && nTries < 30)
             continue;

-        // only consider nodes missing relevant services after 40 failed attempts and only if less than half the outbound are up.
-        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 || nOutbound >= (nMaxOutbound >> 1)))
+        // only consider nodes missing relevant services we have at least 4 already,
+        // and at least 6 once SegWit has activated
+        if ((addr.nServices & nRelevantServices) != nRelevantServices && (nTries < 40 &&
+            nRelevant < 5) && !fFeeler)
```
Using (nMaxOutbound >> 1) avoided a hard-coded constant here.
I felt a minimum of 4 was a better criterion than half of the default 8, because if someone reduces maxOutbound, then I think fewer than 4 witness connections is too few.
Seems I was wrong:
14:39:41 < gmaxwell> sipa: "an unintentionally strong preference for witness peers" -- well it was as strong as I intend it to be, after SW activates I was planning on merging the NODE_WITNESS preference into the node-network one and making it absolute and not a preference. In releases post sw I think we shouldn't connect out at all to non sw peers.
So this should probably just change the nFeelers clause.
@rebroad The goal is to never connect to anything but segwit nodes. The connection count rule is just there because initially there may just not be enough segwit peers. Old nodes are not useful for you to connect out to, as they can't give blocks anymore after activation. We of course still allow them to connect in.
@sipa Given that SegWit has not activated, and from the current charts never will, it seems entirely premature to be avoiding non-SegWit nodes at all, doesn't it? At least until there is a reasonable chance of SegWit activating.