[tests] Allow outbound & block-relay-only connections in functional tests.

amitiuttarwar commented at 9:25 pm on June 17, 2020: contributor

The existing functional test framework uses the addnode RPC to spin up manual connections between bitcoind nodes. This limits our ability to add integration tests for our networking code, which often executes different code paths for different connection types.

This PR enables creating outbound & block-relay-only P2P connections in the functional tests. This allows us to increase our p2p test coverage, since we can now verify expectations around these connection types.

This builds out the prototype proposed by ajtowns in #14210. 🙌🏽

An overview of this branch:

introduces a new test-only RPC function addconnection which initiates opening an outbound or block-relay-only connection. (conceptually similar to addnode but for different connection types & restricted to regtest)
adds test_framework support so a mininode can open an outbound/block-relay-only connection to a P2PInterface/P2PConnection.
updates p2p_blocksonly tests to create a block-relay-only connection & verify expectations around transaction relay.
introduces p2p_add_connections test that checks the behaviors of the newly introduced add_outbound_p2p_connection test framework function.

With these changes, there are many more behaviors that we can add integration tests for. The blocksonly updates is just one example.

Huge props to ajtowns for conceiving the approach & providing me feedback as I’ve built out this branch. Also thank you to jnewbery for lots of thoughtful input along the way.

DrahtBot added the label P2P on Jun 17, 2020

DrahtBot added the label RPC/REST/ZMQ on Jun 17, 2020

DrahtBot added the label Tests on Jun 17, 2020

DrahtBot added the label Validation on Jun 17, 2020

DrahtBot commented at 11:09 pm on June 17, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#20726 (p2p: Add DISABLETX message for negotiating block-relay-only connections by sdaftuar)
#20277 (p2p: Stop processing unrequested transactions during IBD and extend p2p_ibd_txrelay.py coverage by ariard)
#20012 (rpc: Remove duplicate name and argNames from CRPCCommand by MarcoFalke)
#18261 (Erlay: bandwidth-efficient transaction relay protocol by naumenkogs)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jnewbery commented at 7:49 pm on June 18, 2020: member

Concept ACK. Being able to explicitly test different connection types including outbounds will be really useful.

ariard commented at 4:05 am on July 3, 2020: member

Concept ACK, that’s a substantial move to increase coverage of p2p! @t-bast, see https://github.com/ariard/bitcoin/commits/2020-07-tda-mitigation-block-relay, based on this PR, it should allow you to open manual block-relay-only connection to a side-node without leaking its presence due to addr-relay. That’s one of the mitigation we talked offline against time-dilation and it should prevent you against unknown in-protocol eclipses. This is experimental software ofc, beware of not killing any kittens :)

in src/net.cpp:391 in b43b7dba9f outdated

366@@ -367,8 +367,10 @@ static CAddress GetBindAddress(SOCKET sock)
367     return addr_bind;
368 }
369 
370-CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, bool manual_connection, bool block_relay_only)
371+CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type)

t-bast commented at 6:59 am on July 3, 2020:

very nit: conn_type or connType?

jnewbery commented at 3:28 pm on July 3, 2020:

conn_type is correct. See the developer guide: https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md#coding-style-c

t-bast commented at 4:04 pm on July 3, 2020:

Thanks for the pointer, I was wondering about the inconsistency between variables (addrConnect instead of addr_connect), but this probably falls under

These are preferred in new code, but are not required when doing so would need changes to significant pieces of existing code.

in src/net.h:119 in b43b7dba9f outdated

112@@ -113,6 +113,14 @@ struct CSerializedNetMsg
113     std::string command;
114 };
115 
116+enum class ConnectionType {
117+    INBOUND, // peer initiated connections
118+    OUTBOUND, // full relay connections (blocks, addrs, txns)
119+    MANUAL, // connections to addresses added via addnode or the connect command line argument

t-bast commented at 7:04 am on July 3, 2020:

My 2 cents (feel free to ignore as I’m clearly very new to the bitcoin codebase): it feels to me that the distinction manual vs non-manual (automatic?) is orthogonal to the connection type and could be a separate enum.

IIUC MANUAL can be composed with either OUTBOUND, FEELER, BLOCK_RELAY or ADDR_FETCH.

jnewbery commented at 3:29 pm on July 3, 2020:

See rationale here: #19316 (comment) and here: #19316 (comment)

t-bast commented at 4:08 pm on July 3, 2020:

Thanks, I should have done my homework and read that PR first (as was clearly said in the PR description!).

t-bast commented at 7:05 am on July 3, 2020: member

Neat, thanks for the notification! Concept ACK.

jnewbery commented at 3:35 pm on July 3, 2020: member

@t-bast I’d strongly encourage you to not use this branch or anything based on it for anything other than testing:

it’s not merged yet, so still unreviewed
it’s intended for testing only, and there are no guarantees about how it would work long-term in production (for example, the way that CConnman tracks the number of open connections is not updated to take account of these test connections. The commit here: https://github.com/ariard/bitcoin/commit/ea9943251b051a59d1b4cc7ae0ff34aeaea441f4 doesn’t fix that)
since it’s for testing only and not a public method, it may be removed or changed at any time.

t-bast commented at 4:10 pm on July 3, 2020: member

Thanks for the concerns @jnewbery, no worries I’m not planning on using anything else than official bitcoind releases for eclair, I’m simply adding my concept ACK on the overall direction that will be quite helpful for hardening lightning!

ariard commented at 5:02 pm on July 3, 2020: member

@jnewbery thanks to press on the warning. The provided commit is for experimentation-only and MUST NOT be used in production for reasons aforementioned.

For anyone reading this and providing more context, Lightning nodes interested to prevent against time-dilation attacks (a no-hashrate eclipse variant against offchain protocols) should have multiple access to the chain view. As of today, the classic infra setup is to run the LN stack on top of one full-node, if this one gets sybilled and partitioned from the rest of the network, the channel funds are at risk. Running a self-hosted or controlled secondary node connected to your main node would mitigate at least eclipses due to weaknesses in addr management or peer selection. As of today, we do have automatic block-relay-only peers, for which in theory their topology should stay masked to an external observer. But a bug in peer selection may be leveraged to lure block-only-relay connections to attacker controlled-peers. Adding redundant manual block-relay-only connections will prevent this. Manual block-only secondary nodes won’t fit there as they do leak addr-relay, which can be used to learn about their presence.

Provided commit (https://github.com/ariard/bitcoin/commit/ea9943251b051a59d1b4cc7ae0ff34aeaea441f4) let you experiment such primary/secondary block-relay-only topology setup. There is no guarantee that anything will stay stable. If by playing with it, you do see value in this configuration, please concept ack and help to review downstream branches. If you don’t understand what you’re doing don’t use it or ask questions.

DrahtBot added the label Needs rebase on Jul 7, 2020

fanquake referenced this in commit ce3bdd0ed1 on Aug 12, 2020

sidhujag referenced this in commit a338d21a64 on Aug 14, 2020

amitiuttarwar force-pushed on Aug 18, 2020

jnewbery commented at 6:59 pm on August 18, 2020: member

Yay rebase! You can remove the “This PR builds on #19316. Please review that first.” from the PR description. Is this PR now ready for review?

amitiuttarwar commented at 7:01 pm on August 18, 2020: contributor

hahha I was going to let the tests pass first, but yes it should be ready for review :)

DrahtBot removed the label Needs rebase on Aug 18, 2020

jnewbery removed the label RPC/REST/ZMQ on Aug 18, 2020

jnewbery removed the label Validation on Aug 18, 2020

in src/rpc/net.cpp:900 in e00848600f outdated

896@@ -841,6 +897,9 @@ static const CRPCCommand commands[] =
897     { "network",            "setnetworkactive",       &setnetworkactive,       {"state"} },
898     { "network",            "getnodeaddresses",       &getnodeaddresses,       {"count"} },
899     { "hidden",             "addpeeraddress",         &addpeeraddress,         {"address", "port"} },
900+
901+    /* Not shown in help */

jnewbery commented at 9:59 am on August 20, 2020:

Either remove this comment or place it above addpeeraddress, since they’re both not shown in help.

amitiuttarwar commented at 8:46 pm on August 24, 2020:

done

in test/functional/test_framework/mininode.py:68 in e00848600f outdated

64@@ -65,7 +65,7 @@
65     NODE_WITNESS,
66     sha256,
67 )
68-from test_framework.util import wait_until
69+from test_framework.util import wait_until, MAX_NODES, p2p_port

jnewbery commented at 10:04 am on August 20, 2020:

style: sort and place on separate lines

amitiuttarwar commented at 8:46 pm on August 24, 2020:

done

in test/functional/test_framework/mininode.py:151 in e00848600f outdated

150-        return conn_gen
151+        logger.debug('Connecting to Bitcoin Node: %s:%d' % (self.dstaddr, self.dstport))
152+        coroutine = loop.create_connection(lambda: self, host=self.dstaddr, port=self.dstport)
153+        return lambda: loop.call_soon_threadsafe(loop.create_task, coroutine)
154+
155+    def peer_outbound_connect(self, dstaddr, dstport, connect_cb, connect_id=0, *, net, timeout_factor):

jnewbery commented at 10:22 am on August 20, 2020:

Consider renaming this function (and the equivalents in P2PInterface and TestNode) to peer_accept_connection() or similar. From the perspective of the P2PConnection, this is an inbound connection since it’s initiated by the bitcoind node.

amitiuttarwar commented at 9:06 pm on August 21, 2020:

great point. renaming this & the equivalent on P2PInterface to peer_accept_connection() is much clearer.

but for TestNode, I think add_outbound_p2p_connection() makes sense? its a connection from node -> p2p conn, so outbound for node. inbound / accept for the P2P connections.

jnewbery commented at 9:17 am on August 24, 2020:

Good point. It makes sense for the TestNode function to stay as add_outbound_p2p_connection().

amitiuttarwar commented at 8:47 pm on August 24, 2020:

renamed both instances of peer_outbound_connect to peer_accept_connection. thanks!

in test/functional/p2p_blocksonly.py:82 in e00848600f outdated

80+        conn = self.nodes[0].add_outbound_p2p_connection(P2PTxInvStore(), connection_type="blockrelay")
81+
82+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)
83+        self.nodes[0].sendrawtransaction(tx_hex)
84+        conn.sync_with_ping()
85+        assert(txid not in conn.get_invs())

jnewbery commented at 10:28 am on August 20, 2020:

This doesn’t actually test that the transaction won’t get announced by the node, since transaction announcements are batched and only sent when the nNextInvSend timer pops. The sync_with_ping() above doesn’t guarantee that there’s nothing in the setInventoryTxToSend waiting to be sent (ie I think this test would still pass if this were an outbound-full-relay peer). I don’t have any good suggestions except maybe adding a comment. Testing for the absence of something is hard!

amitiuttarwar commented at 8:53 pm on August 24, 2020:

ah, yeah I missed that. luckily nNextInvSend is mockable so I’m able to bump time forward to ensure the transaction would send. I also fixed a bug with the matching condition (was comparing different types). I think the test now functions correctly- if you remove the connection_type="blockrelay" param when adding the p2p conn, the tx sends and the test fails. what do you think?

amitiuttarwar commented at 9:23 pm on November 17, 2020:

resolving this conversation so we can continue over here: #19315 (review)

in test/functional/p2p_blocksonly.py:99 in e00848600f outdated

85+        assert(txid not in conn.get_invs())
86+
87+    def check_p2p_tx_violation(self, index=1):
88+        self.log.info('Check that txs from P2P are rejected and result in disconnect')
89+        input_txid = self.nodes[0].getblock(self.nodes[0].getblockhash(index), 2)['tx'][0]['txid']
90+        tx = create_transaction(self.nodes[0], input_txid, self.nodes[0].getnewaddress(), amount=50 - 0.001)

jnewbery commented at 10:35 am on August 20, 2020:

getnewaddress requires a wallet, so this test fails without the wallet compiled. Can you replace it with a hard-coded address?

amitiuttarwar commented at 8:54 pm on August 24, 2020:

so simple! done :)

amitiuttarwar commented at 5:53 am on August 25, 2020:

ah, looks like the blocktools helper uses wallet, so I ended up just adding skip_if_no_wallet and reverting this.

jnewbery commented at 9:27 am on August 25, 2020:

ok, it’s a bit of a shame to require the wallet to be compiled to run this test. Ideally, I think we’d want just the wallet_ tests to require the wallet, but we’re a long way from that.

No need to change this here, but a useful future project would be to eliminate the wallet dependency from the non-wallet tests.

amitiuttarwar commented at 11:35 pm on September 21, 2020:

I haven’t looked too closely at #19800 yet, but I think I should now be able to replace this create_transaction call with miniwallet.send_self_transfer and remove the wallet dependency? does this sound right? cc @MarcoFalke

MarcoFalke commented at 3:16 pm on September 22, 2020:

does this sound right? cc @MarcoFalke

no idea ;)

Just try it and let me know if you run into any problems. If there are bugs in miniwallet, feel free to fix them along the way.

jonatack commented at 4:35 pm on November 20, 2020:

OCD nit, maybe enclose 50 - 0.001 in parens, or better yet, assign this magical entity to a well-named variable (or with a comment) that documents what/why

amitiuttarwar commented at 7:50 pm on November 28, 2020:

resolving this conversation, tracking as a future improvement at #19315 (comment)

jnewbery commented at 10:36 am on August 20, 2020: member

This is wonderful. It’s really useful functionality for testing, and the implementation is simple, clean and easy to follow.

I’ve left a few style comments inline. The test needs tweaking to work without wallet compiled in, but otherwise I think this could be merged as-is.

fanquake deleted a comment on Aug 22, 2020

amitiuttarwar force-pushed on Aug 24, 2020

amitiuttarwar force-pushed on Aug 25, 2020

in src/rpc/net.cpp:306 in 3bb87e7fb2 outdated

301+            HelpExampleCli("addconnection", "\"192.168.0.6:8333\" \"outbound\"")
302+            + HelpExampleRpc("addconnection", "\"192.168.0.6:8333\" \"outbound\"")
303+        },
304+    };
305+
306+    help.Check(request);

MarcoFalke commented at 9:09 am on August 25, 2020:

nit: For new code it might be good to use the non-deprecated constructor (see e.g. #19528)

amitiuttarwar commented at 11:41 pm on September 21, 2020:

done! how’s it look now?

q: is there anywhere that captures the current conventions around RPC functions? during original implementation I was looking around, but clearly wasn’t able to successfully identify which methods are preferred

in src/rpc/net.cpp:900 in 3bb87e7fb2 outdated

895@@ -840,7 +896,10 @@ static const CRPCCommand commands[] =
896     { "network",            "clearbanned",            &clearbanned,            {} },
897     { "network",            "setnetworkactive",       &setnetworkactive,       {"state"} },
898     { "network",            "getnodeaddresses",       &getnodeaddresses,       {"count"} },
899+
900+    /* Not shown in help */

MarcoFalke commented at 9:10 am on August 25, 2020:

nit: Would be good to either consistently mention this in every file where “hidden” is used or nowhere. Maybe just keep the newline?

amitiuttarwar commented at 11:41 pm on September 21, 2020:

removed

in test/functional/p2p_blocksonly.py:12 in 3bb87e7fb2 outdated

13+from test_framework.messages import msg_tx
14+from test_framework.mininode import P2PInterface, P2PTxInvStore
15 from test_framework.test_framework import BitcoinTestFramework
16 from test_framework.util import assert_equal
17 
18-

MarcoFalke commented at 9:12 am on August 25, 2020:

nit: Any reason for this unrelated change? This also violates pep8

amitiuttarwar commented at 11:42 pm on September 21, 2020:

oops, fixed now

in test/functional/test_framework/mininode.py:137 in 3bb87e7fb2 outdated

133@@ -130,7 +134,7 @@ def __init__(self):
134     def is_connected(self):
135         return self._transport is not None
136 
137-    def peer_connect(self, dstaddr, dstport, *, net, timeout_factor):
138+    def peer_connect_helper(self, dstaddr, dstport, net, timeout_factor):

MarcoFalke commented at 9:17 am on August 25, 2020:

nit: Any reason to remove the star? I think keyword args that are simply passed on in python are denoted by **kwargs in intermediate function calls in the stack, not by explicitly listing them each. Though, I am not a python expert, so feel free to ignore.

robot-dreams commented at 3:32 pm on August 27, 2020:

For context, it looks like the star forces subsequent arguments to be present and named:

 0def foo(a, b, *, c):
 1    print(a, b, c)
 2
 3foo(1, 2)
 4# TypeError: foo() missing 1 required keyword-only argument: 'c'
 5
 6foo(1, 2, 3)
 7# TypeError: foo() takes 2 positional arguments but 3 were given
 8
 9foo(1, 2, c=3)
10# 1 2 3

ajtowns commented at 0:38 am on August 28, 2020:

Yeah, the * forces subsequent params to be specified by keyword not position

amitiuttarwar commented at 11:44 pm on September 21, 2020:

yup, thats why I removed the * from the helper.

also note that the * is still present in the peer_connect function, just git is showing the diff since I added the helper. I don’t fully understand what’s going on here with the * on the actual functions though (peer_connect, peer_accept_connection), so please let me know if I’m missing something / should dig in more.

in test/functional/p2p_blocksonly.py:81 in 3bb87e7fb2 outdated

74+        self.log.info('Tests with node in normal mode with block-relay-only connections')
75+        self.restart_node(0, ["-noblocksonly"])  # disables blocks only mode
76+        assert_equal(self.nodes[0].getnetworkinfo()['localrelay'], True)
77+
78+        # Ensure we disconnect if a block-relay-only connection sends us a transaction
79+        self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")

MarcoFalke commented at 9:24 am on August 25, 2020:

The line times out intermittently: https://cirrus-ci.com/task/5292757864415232?command=ci#L5215

 0 test  2020-08-25T06:50:18.197000Z TestFramework.node0 (DEBUG): RPC successfully started 
 1 node0 2020-08-25T06:50:18.198698Z [http] Received a POST request for / from 127.0.0.1:43672 
 2 node0 2020-08-25T06:50:18.199052Z [httpworker.2] ThreadRPCServer method=getnetworkinfo user=__cookie__ 
 3 test  2020-08-25T06:50:18.202000Z TestFramework.mininode (DEBUG): Listening for Bitcoin Node: 0: 
 4 node0 2020-08-25T06:50:19.011416Z [scheduler] Leaving InitialBlockDownload (latching to false) 
 5 node0 2020-08-25T06:51:15.798367Z [scheduler] Feeding 32910 bytes of dynamic environment data into RNG 
 6 test  2020-08-25T06:51:18.213000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' 
 7                                           test_function = lambda: self.is_connected
 8                                   '''
 9 test  2020-08-25T06:51:18.213000Z TestFramework (ERROR): Assertion failed 
10                                   Traceback (most recent call last):
11                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 118, in main
12                                       self.run_test()
13                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_blocksonly.py", line 26, in run_test
14                                       self.blocks_relay_conn_tests()
15                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_blocksonly.py", line 76, in blocks_relay_conn_tests
16                                       self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
17                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 561, in add_outbound_p2p_connection
18                                       p2p_conn.wait_for_connect()
19                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/mininode.py", line 422, in wait_for_connect
20                                       wait_until(test_function, timeout=timeout, lock=mininode_lock)
21                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 252, in wait_until
22                                       raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
23                                   AssertionError: Predicate ''''
24                                           test_function = lambda: self.is_connected
25                                   ''' not true after 60.0 seconds

MarcoFalke commented at 9:24 am on August 25, 2020: member

the test might be flaky (feel free to ignore the style nits)

Concept ACK

jnewbery commented at 10:18 am on August 25, 2020: member

Thanks for addressing all my comments @amitiuttarwar . Changes look good to me.

I’ll wait until you’ve resolved the test timeout issue before a full rereview.

DrahtBot added the label Needs rebase on Aug 26, 2020

in test/functional/test_framework/test_node.py:552 in 3bb87e7fb2 outdated

547+        full-relay("outbound") or block-relay-only("blockrelay") connection.
548+
549+        This method adds the p2p connection to the self.p2ps list and also
550+        returns the connection to the caller."""
551+
552+        kwargs['dstport'] = p2p_port(self.index)

robot-dreams commented at 4:02 pm on August 27, 2020:

It looks like line 520 above checks if 'dstport' not in kwargs:; naive question, would this be needed here as well?

amitiuttarwar commented at 11:51 pm on September 21, 2020:

good point. I actually ended up removing dstport and dstaddr from the peer_accept_connection function signature, because they aren’t being set here (we don’t know the exact info until the listening server is spun up, which is why we pass a callback through)- so it seems more like a red herring.

in test/functional/test_framework/mininode.py:558 in 3bb87e7fb2 outdated

550@@ -524,6 +551,33 @@ def close(self, timeout=10):
551         # Safe to remove event loop.
552         NetworkThread.network_event_loop = None
553 
554+    @classmethod
555+    def listen(cls, p2p, callback, port=None, addr=None, idx=0):
556+        if port is None:
557+            assert idx >= 0 and idx < MAX_NODES
558+            port = p2p_port(MAX_NODES-idx)

robot-dreams commented at 6:06 pm on August 27, 2020:

I’m not sure if I’m understanding this correctly, but if you have more than MAX_NODES / 2 nodes, could you get a port conflict?

e.g. with MAX_NODES = 12:

Some code for node 7 calls p2p_port(12 - 7) here to get a p2p port on the Python side
Some other code for node 5 calls p2p_port(5) to get a p2p port for a bitcoind process

amitiuttarwar commented at 0:15 am on September 22, 2020:

great question! however, I don’t think this is a problem. here’s why: the p2p_port function in util.py calculates in a few different factors, one of which is the PortSeed.n, which is set by the BitcoinTestFramework. It either explicitly sets this value or defaults to the process id, which would be different for each test. so I think this would only cause a conflict if there are conflicting uses within the same test. (but I very much welcome a double check- this stuff is CONFUSING!)

in test/functional/test_framework/mininode.py:576 in 3bb87e7fb2 outdated

571+
572+    @classmethod
573+    async def create_server(cls, addr, port, callback, proto):
574+        if (addr, port) not in cls.listeners:
575+            listener = await cls.network_event_loop.create_server(cls.peer_listener(addr, port), addr, port)
576+            logger.debug("Listening server on %s:%d should be started" % (addr, port))

robot-dreams commented at 6:22 pm on August 27, 2020:

I looked into the flaky test a bit.

Locally, I see this:

2020-08-27T06:35:37.249000Z TestFramework.mininode (DEBUG): Listening for Bitcoin Node: 0:
2020-08-27T06:35:37.250000Z TestFramework.mininode (DEBUG): Listening server on 127.0.0.1:15398 should be started

However, in the CI logs, the “Listening server on …” never appears:

2020-08-25T06:50:18.202000Z TestFramework.mininode (DEBUG): Listening for Bitcoin Node: 0: 
2020-08-25T06:51:18.213000Z TestFramework.utils (ERROR): wait_until() failed.

It looks like on the CI servers, the cls.network_event_loop.create_server call didn’t work:

Traceback (most recent call last):
  File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/mininode.py", line 575, in create_server
    listener = await cls.network_event_loop.create_server(cls.peer_listener(addr, port), addr, port)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1463, in create_server
    raise OSError(err.errno, 'error while attempting '
OSError: [Errno 98] error while attempting to bind on address ('127.0.0.1', 15020): address already in use

Unfortunately, I have no idea why this port is already in use on the CI server—the CI logs make no other mention of port 15020 before this error.

Can we guarantee that port 15020 isn’t used by any other processes on the CI server?
If not, would it make sense to retry the create_server with different ports on failure?

amitiuttarwar commented at 0:35 am on September 22, 2020:

thank you very much for looking into the flaky test :) you are a brave soul! hahaha

with ~~some~~ a lot of help from @ajtowns, my best understanding of what was probably happening here is an off-by-one error. the port allocation is intended to have 12 unique slots for each test, allowing test runner to run multiple tests in parallel. previously, the idx was starting at 0, so p2p_port was being passed (MAX_NODES), which could lead to a port conflict with another test. to fix it, I’m starting the idx at 1 & have added the validations to check the proper range. let’s see how CI does 🤞🏽

to be more specific, the port conflict I’m referring to would occur if a test had pid x and invoked p2p_port(12), and the next test had pid x+1 and invoked p2p_port(0).

robot-dreams commented at 6:26 pm on August 27, 2020: contributor

Concept ACK, I’m a big fan of this idea!

in src/net.h:356 in 3bb87e7fb2 outdated

293+     *
294+     * @param[in]   address     Address of node to try connecting to
295+     * @param[in]   conn_type   ConnectionType::OUTBOUND or ConnectionType::BLOCK_RELAY
296+     * @return      bool        Returns false if at max connection capacity
297+     */
298+    bool AddConnection(const std::string& address, const ConnectionType conn_type);

glozow commented at 1:23 pm on August 28, 2020:

Maybe add a comment here mentioning this is only used for tests? I think it could be confusing without the context (e.g. a newcomer thinking this is how connections are added in general). I initially thought it should be named AddTestOutbound or something but I imagine this could be used for something else in the future, so a comment makes more sense.

amitiuttarwar commented at 0:35 am on September 22, 2020:

done

in test/functional/test_framework/mininode.py:321 in 3bb87e7fb2 outdated

316+        vt.nServices = services
317+        vt.addrTo.ip = self.dstaddr
318+        vt.addrTo.port = self.dstport
319+        vt.addrFrom.ip = "0.0.0.0"
320+        vt.addrFrom.port = 0
321+        self.on_connection_send_msg = vt  # Will be sent soon after connection_made

glozow commented at 1:33 pm on August 28, 2020:

nit: “soon” is pretty vague… Since you’re refactoring, maybe edit a little bit like this?

0        self.on_connection_send_msg = vt  # Will be sent in connection_made callback

amitiuttarwar commented at 0:35 am on September 22, 2020:

done

in test/functional/test_framework/test_node.py:550 in 3bb87e7fb2 outdated

541@@ -542,6 +542,30 @@ def add_p2p_connection(self, p2p_conn, *, wait_for_verack=True, **kwargs):
542 
543         return p2p_conn
544 
545+    def add_outbound_p2p_connection(self, p2p_conn, *, connection_type="outbound", connect_id=0, **kwargs):
546+        """ Add an outbound p2p connection from node. Either
547+        full-relay("outbound") or block-relay-only("blockrelay") connection.
548+
549+        This method adds the p2p connection to the self.p2ps list and also
550+        returns the connection to the caller."""

glozow commented at 1:43 pm on August 28, 2020:

tiny pep nit: remove space before Add And put the closing """ on a newline

amitiuttarwar commented at 0:37 am on September 22, 2020:

done

TIL about the closing """ on a newline. looks like quite a lot of our docstrings don’t follow this convention 😛

in src/rpc/net.cpp:329 in 3bb87e7fb2 outdated

324+    NodeContext& context = EnsureNodeContext(request.context);
325+    if (!context.connman) {
326+        throw JSONRPCError(RPC_CLIENT_P2P_DISABLED, "Error: Peer-to-peer functionality missing or disabled");
327+    }
328+
329+    bool success = context.connman->AddConnection(address, conn_type);

glozow commented at 1:52 pm on August 28, 2020:

nit

0    const bool success = context.connman->AddConnection(address, conn_type);

glozow commented at 1:53 pm on August 28, 2020:

or

0    if(!context.connman->AddConnection(address, conn_type))

amitiuttarwar commented at 0:43 am on September 22, 2020:

yeah, I know I could inline but I separated it into its own variable in an attempt to emphasize that this is where the actual work happens since it gets compiled out anyways. so I don’t think the const does anything, but I threw it in there anyways

in test/functional/p2p_blocksonly.py:91 in 3bb87e7fb2 outdated

84+        conn = self.nodes[0].add_outbound_p2p_connection(P2PTxInvStore(), connection_type="blockrelay")
85+
86+        # bump time forward to ensure nNextInvSend timer pops
87+        self.nodes[0].setmocktime(int(time.time()) + 5)
88+
89+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)

glozow commented at 2:46 pm on August 28, 2020:

Why is this needed? If the test fails, is it not ok to fail for sendrawtransaction validation/mempool reasons?

amitiuttarwar commented at 0:50 am on September 22, 2020:

I don’t really understand this question- the test is essentially checking that there is a well formed transaction, submits it to the node, then checks if the p2p connection received the transaction. the goal is to isolate (as much as possible) the reason the conn didn’t receive the transaction is due to being a block-relay-only connection.

glozow commented at 2:27 am on September 27, 2020:

Hm, I guess I found it odd because sendrawtransaction will throw a JSONRPCError if validation fails just like testmempoolaccept would return False. I don’t think they’d have different results unless mempool changes in between calls or something. So it seems like something unnecessary that the test needs to pass. It doesn’t make the test incorrect or anything ofc, just seemed peculiar

jnewbery commented at 11:48 am on November 11, 2020:

I agree. This doesn’t seem necessary. You could just as easily call sendrawtransaction and then getmempoolentry to confirm that the transaction reached the mempool.

amitiuttarwar commented at 9:44 pm on November 17, 2020:

ok. I didn’t look closely at exactly what errors testmempoolaccept returns vs sendrawtransaction. I think I probably just copied the existing pattern in the tests. @glozow - are you suggesting I should just delete this line? (fine by me, just trying to understand) @jnewbery your suggestion just seems like an alternative that’s about the same?

MarcoFalke commented at 6:45 am on November 18, 2020:

sendrawtransaction error handling is undocumented, but it will indeed return an jsonrpc error when the tx couldn’t be added to the mempool

glozow commented at 2:01 am on November 23, 2020:

Yep, testmempoolaccept is unnecessary here. And if you need to make sure it’s in mempool then submit + getmempoolentry is the move.

amitiuttarwar commented at 7:22 pm on November 28, 2020:

removed testmempoolaccept call

in test/functional/p2p_blocksonly.py:73 in 3bb87e7fb2 outdated

66             second_peer.wait_for_tx(txid)
67             assert_equal(self.nodes[0].getmempoolinfo()['size'], 1)
68         self.log.info("Forcerelay peer's transaction is accepted and relayed")
69 
70+        self.nodes[0].disconnect_p2ps()
71+        self.nodes[0].generate(1)

glozow commented at 2:48 pm on August 28, 2020:

Is this to clear the mempool?

amitiuttarwar commented at 0:44 am on September 22, 2020:

yup, resetting to clean state for next test

in test/functional/p2p_blocksonly.py:100 in 3bb87e7fb2 outdated

89+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)
90+        self.nodes[0].sendrawtransaction(tx_hex)
91+        conn.sync_with_ping()
92+        assert(int(txid, 16) not in conn.get_invs())
93+
94+    def check_p2p_tx_violation(self, index=1):

glozow commented at 2:54 pm on August 28, 2020:

nit: I think height would be more clear than index

amitiuttarwar commented at 0:45 am on September 22, 2020:

thanks but I’m a pass. its true that this number is used for block height, but I think index is more accurate to what’s passed in, and height is how the index is used

jnewbery commented at 10:56 am on January 8, 2021:

(re: #19315 (review))

I don’t think you need to pass the height at all. Just get the block at the tip:

 0diff --git a/test/functional/p2p_blocksonly.py b/test/functional/p2p_blocksonly.py
 1index c592ab52b1..e6c3f75955 100755
 2--- a/test/functional/p2p_blocksonly.py
 3+++ b/test/functional/p2p_blocksonly.py
 4@@ -80,7 +80,7 @@ class P2PBlocksOnly(BitcoinTestFramework):
 5         # Ensure we disconnect if a block-relay-only connection sends us a transaction
 6         self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="block-relay-only")
 7         assert_equal(self.nodes[0].getpeerinfo()[0]['relaytxes'], False)
 8-        _, txid, tx_hex = self.check_p2p_tx_violation(index=2)
 9+        _, txid, tx_hex = self.check_p2p_tx_violation()
10 
11         self.log.info("Check that txs from RPC are not sent to blockrelay connection")
12         conn = self.nodes[0].add_outbound_p2p_connection(P2PTxInvStore(), p2p_idx=1, connection_type="block-relay-only")
13@@ -97,9 +97,10 @@ class P2PBlocksOnly(BitcoinTestFramework):
14         conn.sync_with_ping()
15         assert(int(txid, 16) not in conn.get_invs())
16 
17-    def check_p2p_tx_violation(self, index=1):
18+    def check_p2p_tx_violation(self):
19         self.log.info('Check that txs from P2P are rejected and result in disconnect')
20-        input_txid = self.nodes[0].getblock(self.nodes[0].getblockhash(index), 2)['tx'][0]['txid']
21+
22+        input_txid = self.nodes[0].getblock(self.nodes[0].getbestblockhash(), 2)['tx'][0]['txid']
23         tx = create_transaction(self.nodes[0], input_txid, self.nodes[0].getnewaddress(), amount=(50 - 0.001))
24         txid = tx.rehash()
25         tx_hex = tx.serialize().hex()

glozow commented at 2:57 pm on August 28, 2020: member

Concept ACK! Super excited for testing outbounds in functional tests!!! This is way less complicated than I imagined, but I still feel like there’s some documentation lacking. I also nitpicked a little (sorry heh).

I’m having trouble tracing where connect_id (and later idx) would be used / who is responsible for setting it. Are the functional tests supposed to pass in a connect_id? It looks like, in listen(), connect_id needs to be specified if port isn’t. This leads me to believe… in the case we don’t know what port to use, a connect_id helps specify one? It doesn’t seem like NetworkThread keeps track of what indexes have been used. So then… if I just do node.add_outbound_p2p_connection multiple times without specifying connect_id, it won’t work because it tries to use the same port (I think).

In general I’d just want some documentation and unit tests for the new NetworkThread class methods to help check that the networking code works.

practicalswift commented at 1:13 am on August 29, 2020: contributor

Concept ACK

Very nice to be able to test this extremely important logic. Thanks for doing this!

glozow commented at 5:00 pm on August 30, 2020: member

I’ve been looking into the port management stuff to see the limits of how many connections can be opened. Quick example_test.py: add 8 outbounds and 2 blockrelays to nodes[0] and nodes[1] each. This branch fails especially if you don’t specify connect_id explicitly in add_outbound_p2p_connection() - I don’t think you can add more than 1 outbound p2p connection because it’ll try to reuse the same port. I tinkered with it to make it pass.

I’m not 100% sure I understand everything correctly but I think the explanation is: -For inbounds, we care about the dstaddr/dstport which would be the same for all p2ps connected to one node. i.e. they all have the same destination -> they initiate the connection. So we use 1 port per test node, which is why arg to p2p_port needs to within the MAX_NODES range. -For outbounds, there’s no “destination” since the p2ps are not initiating the connection. You need each p2p to listen on a different port. I guess the TestNode that’s connecting would be called the “origin” or something.

in src/net.cpp:1079 in 07dd3ba792 outdated

1074+    int nOutboundFullRelay = 0;
1075+
1076+    {
1077+        LOCK(cs_vNodes);
1078+        for (const CNode* pnode : vNodes) {
1079+            if (pnode->m_conn_type == ConnectionType::OUTBOUND) {

ariard commented at 5:00 pm on September 15, 2020:

Note for the future, I think we could refactor out this code snippet in some accounting type helper. We’re computing nOutboundFullRelay/nOutboundBlockRelay in diverse parts of the codebase, at least after #19858

Like returning a struct with an aliased int for each type we have.

0struct SetConnectionType {
1       int  full_relay;
2       int   blocksonly;
3       ...
4};
5
6GetConnectionTypeCount()

amitiuttarwar commented at 0:55 am on September 22, 2020:

sounds good for a future PR 👍🏽

going to resolve this conversation to keep the review comments focused on these changes

in test/functional/p2p_blocksonly.py:5 in 3bb87e7fb2 outdated

1@@ -2,15 +2,16 @@
2 # Copyright (c) 2019 The Bitcoin Core developers
3 # Distributed under the MIT software license, see the accompanying
4 # file COPYING or http://www.opensource.org/licenses/mit-license.php.
5-"""Test p2p blocksonly"""
6+"""Test p2p blocksonly mode & block-relay-only connections."""

ariard commented at 5:03 pm on September 15, 2020:

If understand new test well, could you document the 2 behaviors tested, namely that sender doesn’t relay transaction to a block-relay-only peer and that receiver disconnect a block-relay-only peer relaying transaction ?

amitiuttarwar commented at 1:03 am on September 22, 2020:

hm, yeah those are the main behavioral checks for the block-relay-only connection, but there are also tests for blocks-only mode. the checks seem pretty well documented in the test logs (self.log.info), is there a particular reason you’d find them helpful here as well?

ariard commented at 4:58 pm on November 11, 2020:

IMO, it’s quite nice to have a succinct high-level description of test coverage without necessary getting into the details.

ariard commented at 5:06 pm on September 15, 2020: member

Concept ACK.

Thanks, it’s unlocking a bunch of test coverage, e.g some behaviors of the inbound eviction logic.

hebasto commented at 10:59 am on September 19, 2020: member

Concept ACK.

amitiuttarwar force-pushed on Sep 21, 2020

amitiuttarwar force-pushed on Sep 22, 2020

DrahtBot removed the label Needs rebase on Sep 22, 2020

amitiuttarwar commented at 1:57 am on September 22, 2020: contributor

thank you for all of this awesome feedback! 🙌🏽

and CI is green! 💃🏼
this PR is ready for the next round of review :)

some updates since last time:

rebased onto master
reworked the port logic (as described here)
removed the dstport and dstaddr from add_outbound_p2p_connection & peer_accept_connection flow. It wasn’t actually being set there because we don’t know the port until its assigned in NetworkThread.listen, so this felt misleading.
added logic so add_outbound_p2p_connection is maintaining the connection index rather than requiring tests to manually manage.
added a test p2p_add_connections that sanity checks the behaviors of the test framework functionality

open questions / next steps:

RE dstport & dstaddr: right now, there is some awkward logging in P2PConnection#connection_made. Since dstaddr and dstport aren’t able to be properly populated, if you run the tests you see Connected & Listening: 0:0. One option is to just have the logging behind an if statement that checks if there is a real value there, but I’d prefer a cleaner solution. I’ll continue looking at it, but welcome any review suggestions. @MarcoFalke seems like you introduced these methods, do you have any ideas?
RE managing the connection index: I currently have added a p2p_conn_index field to TestNode to allow it to manage the index of various connections. I find it nicer for the test to not have to worry about keeping track & passing through the correct value. However, in an offline conversation with @ajtowns he expressed the preference towards having it be explicit, in favor of transparency and easier-to-debug failure scenarios. I’m curious what others think. I’m happy to switch back to tests managing index if that’s preferable.
currently unsupported use case: with the node managing the index via a simple increment, there isn’t the ability to reconnect to the same, existing P2P connection. two options for fixing this are 1. have test manage index as mentioned in previous bullet point, or 2. allow optionally passing through the index to add_outbound_p2p_connection for that circumstance. I don’t currently have any examples of why we’d need to do this, so I’ve been more focused on getting the base cases working, but wanted to mention incase I should reprioritize / incorporate.
documentation: this stuff is complex. I’ve spent a long while trying to wrap my head around the mechanisms, and have to revisit often to really understand. I’m working on adding more documentation to hopefully make these code paths simpler to understand.

DrahtBot added the label Needs rebase on Sep 22, 2020

amitiuttarwar force-pushed on Sep 22, 2020

DrahtBot removed the label Needs rebase on Sep 22, 2020

DrahtBot added the label Needs rebase on Sep 23, 2020

amitiuttarwar force-pushed on Sep 23, 2020

DrahtBot removed the label Needs rebase on Sep 23, 2020

DrahtBot added the label Needs rebase on Sep 25, 2020

amitiuttarwar force-pushed on Sep 25, 2020

DrahtBot removed the label Needs rebase on Sep 25, 2020

in test/functional/p2p_add_connections.py:34 in b489886b53 outdated

24+        for i in range(8):
25+            self.log.info("outbound: {}".format(i))
26+            self.nodes[0].add_outbound_p2p_connection(P2PInterface())
27+
28+        self.log.info("Add 2 block-relay-only connections to node 0")
29+        for i in range(2):

mzumsande commented at 6:27 pm on September 27, 2020:

When I change this from 2 to 3 I won’t get an RPC error (“at capacity for connection type”) as I would have expected, but an “address in use” OSError from the python asyncio. Would it be possible to have some tolerance here, so that the node behavior at maximum outbound connections can be tested?

amitiuttarwar commented at 1:54 am on September 30, 2020:

🤔 good question. I expected there to be some room because we allow 12 MAX_NODES and here we are using 10 of those ports, but indeed, I get the same issue. somehow that port is already being used, but I don’t understand by what. its none of the other connections. I’ll continue digging in.

but will note that we can check the bounds of each type if they came first. eg. by adding 9 full-relay outbounds in previous loop, or adding 3 block-relay outbounds in the loop on line 33 (which come before the full relay for that node)

ajtowns commented at 3:38 am on October 11, 2020:

2 bitcoin nodes, 8 full relay outbounds, and 2 blocks-only outbounds fills up the MAX_NODES==12 ports

amitiuttarwar commented at 4:07 am on October 28, 2020:

@mzumsande do you think it would be useful to bump the port allocation, or does the workaround I proposed suffice for the use cases you’re thinking about? I’m hesitant because of the potential of creating sporadic test framework issues, but I can look deeper into it if we have a good use case.

mzumsande commented at 10:33 pm on November 10, 2020:

Not sure, but I tend to think it is ok to keep it as it is. For none of the use cases I can think of, two bitcoin nodes are necessary.

mzumsande commented at 6:41 pm on September 27, 2020: member

Concept ACK. I already used this some days ago for testing some outbound getaddr behavior, and it worked great!

amitiuttarwar commented at 1:55 am on September 30, 2020: contributor

@mzumsande, thanks for taking a look :)

I already used this some days ago for testing some outbound getaddr behavior, and it worked great!

thats so awesome!! would it be a test we can permanently introduce with this patch?

mzumsande commented at 6:39 pm on September 30, 2020: member

thats so awesome!! would it be a test we can permanently introduce with this patch?

I think it would best be one of (hopefully several) follow-ups, using the new possibilities introduced by this PR!

in src/rpc/net.cpp:303 in 0958cbc721 outdated

296@@ -296,6 +297,49 @@ static RPCHelpMan addnode()
297     };
298 }
299 
300+static RPCHelpMan addconnection()
301+{
302+    return RPCHelpMan{"addconnection",
303+        "\nOpen an outbound connection to a specified node (test only)",

hebasto commented at 10:04 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900

Could add EOL and follow the current wording:

0        "\nOpen an outbound connection to a specified node. This RPC is for testing only.\n",

?

amitiuttarwar commented at 3:27 am on October 28, 2020:

done

in src/rpc/net.cpp:317 in 0958cbc721 outdated

312+        RPCExamples{
313+            HelpExampleCli("addconnection", "\"192.168.0.6:8333\"")
314+            + HelpExampleRpc("addconnection", "\"192.168.0.6:8333\"")
315+        },
316+        [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
317+

hebasto commented at 10:23 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900 nit: It seems readability could be better w/o this empty line, no?

amitiuttarwar commented at 3:39 am on October 28, 2020:

done

in src/rpc/net.cpp:323 in 0958cbc721 outdated

318+{
319+    if (!Params().IsMockableChain()) {
320+        throw std::runtime_error("addconnection is for regression testing (-regtest mode) only");
321+    }
322+
323+    RPCTypeCheckArgument(request.params, UniValue::VSTR);

hebasto commented at 10:31 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900

0    RPCTypeCheckArgument(request.params[0], UniValue::VSTR);

I know, that this LOC is changed in the next commit, but this commit is broken in its current state.

amitiuttarwar commented at 3:39 am on October 28, 2020:

fixed (by squashing the two commits)

in src/rpc/net.cpp:332 in 0958cbc721 outdated

327+        throw JSONRPCError(RPC_CLIENT_P2P_DISABLED, "Error: Peer-to-peer functionality missing or disabled");
328+    }
329+
330+    const bool success = context.connman->AddConnection(address);
331+    if (!success) {
332+        throw JSONRPCError(RPC_CLIENT_NODE_CAPACITY_REACHED, "Error: Max number of outbound full relay connections already open.");

hebasto commented at 10:38 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900 Mind being consistent about full stops at the end of the error messages :)

amitiuttarwar commented at 3:52 am on October 28, 2020:

okay I think I got them all =P

in src/net.cpp:1085 in 0958cbc721 outdated

1079@@ -1080,6 +1080,29 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
1080     RandAddEvent((uint32_t)id);
1081 }
1082 
1083+bool CConnman::AddConnection(const std::string& address)
1084+{
1085+    int nOutboundFullRelay = 0;

hebasto commented at 10:54 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900 Mind following naming convention?

0    int outbound_full_relay = 0;

amitiuttarwar commented at 3:52 am on October 28, 2020:

done

in src/net.cpp:1102 in 0958cbc721 outdated

1097+
1098+    CSemaphoreGrant grant(*semOutbound, true);
1099+    if (!grant) return false;
1100+
1101+    CAddress addr{};
1102+    OpenNetworkConnection(addr, false, &grant, address.c_str(), ConnectionType::OUTBOUND_FULL_RELAY);

hebasto commented at 10:54 am on October 5, 2020:

0958cbc7211b04e537b3742abd34213023196900

0    OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), ConnectionType::OUTBOUND_FULL_RELAY);

amitiuttarwar commented at 3:53 am on October 28, 2020:

done

in src/net.cpp:1088 in 0fb4635bba outdated

1084+bool CConnman::AddConnection(const std::string& address, const ConnectionType conn_type)
1085 {
1086+    assert(conn_type == ConnectionType::OUTBOUND_FULL_RELAY || conn_type == ConnectionType::BLOCK_RELAY);
1087+
1088     int nOutboundFullRelay = 0;
1089+    int nOutboundBlockRelay = 0;

hebasto commented at 10:56 am on October 5, 2020:

0fb4635bbabb1bea10a215bde4c803ceb5bc5f26 Mind following naming convention?

0    int outbound_block_relay = 0;

amitiuttarwar commented at 3:53 am on October 28, 2020:

done

in src/rpc/net.cpp:306 in 0fb4635bba outdated

302@@ -303,15 +303,17 @@ static RPCHelpMan addconnection()
303         "\nOpen an outbound connection to a specified node (test only)",
304         {
305             {"address", RPCArg::Type::STR, RPCArg::Optional::NO, "The IP address and port to attempt connecting to."},
306+            {"connectiontype", RPCArg::Type::STR, RPCArg::Optional::NO, "'outbound' or 'blockrelay'"},

hebasto commented at 11:03 am on October 5, 2020:

0fb4635bbabb1bea10a215bde4c803ceb5bc5f26 Why not keep consistency about full stops at the end of argument descriptions?

amitiuttarwar commented at 3:53 am on October 28, 2020:

fixed

in src/rpc/net.cpp:361 in 0fb4635bba outdated

342 
343-    const bool success = context.connman->AddConnection(address);
344+    const bool success = context.connman->AddConnection(address, conn_type);
345     if (!success) {
346-        throw JSONRPCError(RPC_CLIENT_NODE_CAPACITY_REACHED, "Error: Max number of outbound full relay connections already open.");
347+        throw JSONRPCError(RPC_CLIENT_NODE_CAPACITY_REACHED, "Error: Already at capacity for specified connection type.");

hebasto commented at 11:10 am on October 5, 2020:

0fb4635bbabb1bea10a215bde4c803ceb5bc5f26 Does this error message describe not having grant in the AddConnection function?

amitiuttarwar commented at 4:00 am on October 28, 2020:

depends for what type. AddConnection goes through the nodes and counts how many full relays & block relays there currently are, then compares this to their respective max. it also checks the grant for the total outbound grants. if any of these fail, this error message is hit

ariard commented at 4:35 pm on December 1, 2020:

I guess that hebasto point is this message could be “Error: Already at capacity for specified connection type or max total outbound connection already exist” to cover both failure cases.

troygiorshev commented at 7:19 am on December 2, 2020:

Same as https://github.com/bitcoin/bitcoin/pull/19315/files#r533943505, consider making this more detailed? I could see this being misleading if the RPC fails because of max total outbound capacity, as opposed to max capacity for the given type.

amitiuttarwar commented at 2:21 am on December 3, 2020:

it’s hard to hit the total max without hitting the max for the type :)

the max values are outbound: 8 block-relay-only: 2 total: 10

jnewbery commented at 10:11 am on January 8, 2021:

This seems like an appropriate amount of detail for an error message (especially one for an RPC which is test-only).

hebasto changes_requested

hebasto commented at 11:13 am on October 5, 2020: member

Approach ACK b489886b533809c9a4ee2643ceba7d34c278e03e.

I think that squashing of the first two commits could make reviewing much easier.

DrahtBot added the label Needs rebase on Oct 11, 2020

amitiuttarwar force-pushed on Oct 28, 2020

fanquake removed the label Needs rebase on Oct 28, 2020

amitiuttarwar force-pushed on Oct 28, 2020

amitiuttarwar commented at 4:08 am on October 28, 2020: contributor

rebased & incorporated @hebasto’s suggestions

amitiuttarwar force-pushed on Oct 30, 2020

amitiuttarwar commented at 5:21 am on October 30, 2020: contributor

fixed a silent merge conflict that was causing travis failure.

ready for review!

in src/net.cpp:1152 in f51006b818 outdated

1177+
1178+    // max total outbound connections already exist
1179+    CSemaphoreGrant grant(*semOutbound, true);
1180+    if (!grant) return false;
1181+
1182+    OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), conn_type);

jnewbery commented at 11:21 am on November 11, 2020:

There are several reasons that OpenNetworkConnections() can fail (eg a connection to that peer already exists, we fail to open a socket, etc). It’d be nice to change OpenNetworkConnection() to return a boolean and return that to the RPC caller.

If we don’t return this value, then the RPC looks like it succeeded even if a connection has not been opened:

0→ .bitcoin-cli addconnection i-am-not-an-address outbound
1{
2  "address": "i-am-not-an-address",
3  "connectiontype": "outbound"
4}
5→ ./bitcoin-cli getpeerinfo
6[
7]

ajtowns commented at 1:53 am on November 13, 2020:

./bitcoin-cli addnode i-am-not-an-address onetry gives the same result, as does trying with a valid but unreachable address. Even with NET logging enabled, I only see an error in debug.log for the unreachable address (“connection to XXX timeout”). Having a more reliable indication somewhere of what went wrong would probably be a good idea both here and for addnode might be good. Not sure that needs to be in this PR though, behaving no worse than the existing addnode doesn’t seem too bad.

jnewbery commented at 7:35 pm on November 17, 2020:

Agree that this can be done separately if the behaviour here is the same as for addnode.

amitiuttarwar commented at 9:59 pm on November 17, 2020:

agree with the idea. I’ve started a branch to incorporate: https://github.com/amitiuttarwar/bitcoin/commits/2020-11-open-conn-improvements. I want to add some testing and review the failures / think through logging. but will do this as a follow up.

jonatack commented at 4:09 pm on November 20, 2020:

0c4d673716f2eb37efdb81ad2d7d2

0    OpenNetworkConnection(CAddress(), /* fCountFailure */ false, &grant, address.c_str(), conn_type);

jonatack commented at 4:10 pm on November 20, 2020:

I agree with returning a boolean; it may also be convenient for it to return the peer id of the connection via an out param.

amitiuttarwar commented at 7:53 pm on November 28, 2020:

resolving this conversation, tracking better return value from OpenNetworkConnection as a future improvement in #19315 (comment)

ajtowns commented at 3:07 am on December 3, 2020:

Filed an issue for this at #20552

in src/net.cpp:1176 in f51006b818 outdated

1171+        }
1172+    }
1173+
1174+    // max connections of specified type already exist
1175+    if (conn_type == ConnectionType::OUTBOUND_FULL_RELAY && outbound_full_relay >= m_max_outbound_full_relay) return false;
1176+    if (conn_type == ConnectionType::BLOCK_RELAY && outbound_block_relay >= m_max_outbound_block_relay) return false;

jnewbery commented at 11:28 am on November 11, 2020:

You don’t need to count the number both connection types if you’re only going to test against one. It might also be clearer to use the standard library count_if rather than hand writing the loop.

See what you think of this:

 0    const int max_connections = conn_type == ConnectionType::OUTBOUND_FULL_RELAY ? m_max_outbound_full_relay :
 1                                                                                   m_max_outbound_block_relay;
 2
 3    // Count existing connections
 4    int existing_connections{0};
 5    {
 6        LOCK(cs_vNodes);
 7        existing_connections = std::count_if(vNodes.begin(),
 8                                             vNodes.end(),
 9                                             [conn_type](CNode* node){return node->m_conn_type == conn_type;});
10    }
11
12    // Max connections of specified type already exist
13    if (existing_connections >= max_connections) return false;

amitiuttarwar commented at 9:59 pm on November 17, 2020:

this is great! thank you! I’ve incorporated the suggestion

in src/rpc/protocol.h:65 in f51006b818 outdated

61@@ -62,6 +62,7 @@ enum RPCErrorCode
62     RPC_CLIENT_NODE_NOT_CONNECTED   = -29, //!< Node to disconnect not found in connected nodes
63     RPC_CLIENT_INVALID_IP_OR_SUBNET = -30, //!< Invalid IP/Subnet
64     RPC_CLIENT_P2P_DISABLED         = -31, //!< No valid connection manager instance found
65+    RPC_CLIENT_NODE_CAPACITY_REACHED= -34, //!< Max number of outbound or block-relay connections already open

jnewbery commented at 11:34 am on November 11, 2020:

Perhaps make this less specific (e.g. FAILED_TO_CONNECT).

amitiuttarwar commented at 10:00 pm on November 17, 2020:

leaving it for this PR since the specificity is accurate, but updating in the branch (https://github.com/amitiuttarwar/bitcoin/commits/2020-11-open-conn-improvements) that generalizes and returns more general failures

in test/functional/p2p_blocksonly.py:98 in f51006b818 outdated

90+        self.nodes[0].setmocktime(int(time.time()) + 5)
91+
92+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)
93+        self.nodes[0].sendrawtransaction(tx_hex)
94+        conn.sync_with_ping()
95+        assert(int(txid, 16) not in conn.get_invs())

jnewbery commented at 11:51 am on November 11, 2020:

Do we need some kind of sleep/setmocktime here? The node will only send out inventory to outbound connections every 2.5s on average, so this won’t catch if we accidentally send out invs to this peer.

amitiuttarwar commented at 7:04 pm on November 17, 2020:

seems related to your previous feedback #19315 (review). thanks for helping me ensure this test is meaningful :)

the way I’ve sanity tested this assertion is by removing the connection_type="blockrelay" arg to see that it fails. I think the mocktime bump a few lines up should be covering the INVENTORY_BROADCAST_INTERVAL delay. is that what you’re referring to?

however, even though it appropriately fails (when removing blockrelay arg) on my computer, it might still be worth adding a small delay (maybe 0.1 second sleep?) to ensure the SendMessages loop is actually hit on every machine?

what do you think?

amitiuttarwar commented at 7:38 pm on November 17, 2020:

cc @MarcoFalke, test sync expert 😛

MarcoFalke commented at 6:47 am on November 18, 2020:

wouldn’t the setmocktime be called after the tx has been added to the mempool?

amitiuttarwar commented at 6:47 pm on November 18, 2020:

oh great point. I think its working fine on my machine bc the window between bumping mocktime + making txn is small enough, but I’ve just pushed an update to have the right order.

amitiuttarwar commented at 2:12 am on November 22, 2020:

@MarcoFalke, @jnewbery if you get the chance, lmk if this looks good and I’ll resolve. but leaving the thread open for now bc I want a second opinion on whether I’ve solved the issue

jnewbery commented at 12:58 pm on December 1, 2020:

The average delay between sending INVs to outbound peers is 2.5s, so bumping mocktime by 5 seconds is only double the mean. Is there any downside in bumping it by a minute so there’s overwhelming probability that the node’s timer for sending an INV to this peer has popped?

You could also update this to call sync_with_ping() twice to ensure that the node has called SendMessages() for this peer. Without that, there’s a window where the node calls ProcessMessages() for the peer, sends the pong response and the test moves on to the assert(int(txid, 16) not in conn.get_invs()) without the node hitting SendMessages(). Syncing twice with pings ensures that ProcessMessages() is called twice, which means that SendMessages() must have been called once.

amitiuttarwar commented at 3:36 am on December 3, 2020:

good point & clever trick! I’ve bumped the mocktime by a minute, added two calls to sync_with_ping() and a comment explaining this reasoning.

thanks for this thoughtful response! going to resolve the comment now, let me know incase there’s anything else I should be thinking about.

in src/net.cpp:1149 in f51006b818 outdated

1174+    // max connections of specified type already exist
1175+    if (conn_type == ConnectionType::OUTBOUND_FULL_RELAY && outbound_full_relay >= m_max_outbound_full_relay) return false;
1176+    if (conn_type == ConnectionType::BLOCK_RELAY && outbound_block_relay >= m_max_outbound_block_relay) return false;
1177+
1178+    // max total outbound connections already exist
1179+    CSemaphoreGrant grant(*semOutbound, true);

ariard commented at 7:30 pm on November 11, 2020:

I don’t understand exactly what CSemaphoreGrant is enforcing given that:

semOutbound is initialized with a value of std::min(m_max_outbound, nMaxConnections) (L2510 in src/net.cpp)
thread “addconnection” is accessing this semaphore L1179 in src/net.cpp with fTry=true thus calling CSemaphoreGrant::TryAcquire, decrementing the semaphore by one (L281 in src/sync.h) and returning a successful grant taking
thread “addconnection is calling OpenNetworkConnection which itself calls CSemaphoreGrant::MoveTo() (L2205 in src/net.cpp) if connection opening is successful. MoveTo() calls CSemaphoreGrant::Release() then CSemaphore::post(). This last function increments back the semaphore.
a thread opencon is accessing the semaphore in ThreadOpenConnections (L1918) and indirectly calls CSemaphore::wait, but the conditional predicate doesn’t matter as addconnection thread incremented back the semaphore. Thus sooner or latter wait() will return and a new outbound should be established.

CSemaphoreGrant/CSemaphore/CConnman::semOutbound are poorly documented so it’s hard to understand what exactly it aims to achieve. So we might theoritcally bypass m_max_outbound but I wouldn’t surprised you see nothing in practice as block-relay-only/full-relay are checked on their own and other outbound are short-lived ?

ajtowns commented at 2:01 am on November 13, 2020:

a.MoveTo(b) calls b.Release() – but b in this case is a brand new CSemaphoreGrant that was default initialized, so b.fHaveGrant is false and b.Release() is a no-op, and b.sem->post() is not called.

amitiuttarwar commented at 7:37 pm on November 17, 2020:

@ariard I agree that CSemaphoreGrant is far-from-ideal, and that its strange / redundant that we also have the individual counts of our different types. However, I’m not understanding exactly what you’re trying to suggest here. Are you saying that the overall design of this mechanism is confusing, and the guarantees minimal? Or are you saying that this proposed implementation is problematic?

ariard commented at 2:54 pm on December 1, 2020:

Thanks @ajtowns for the code reading correction! Okay so the semaphore should accomplish its design of preventing to open too much outbound connections. @amitiuttarwar yes I was saying the overall design of this mechanism is confusing thus making it hard to understand the proposed implementation, but not an issue introduced by the present PR. CSemaphoreGrant would gain to be documented better but not proposing myself for this :see_no_evil:

amitiuttarwar commented at 4:00 am on December 3, 2020:

haha, I fully agree 😛

going to resolve this conversation to help me keep track of outstanding issues on this PR

in src/net.h:354 in f51006b818 outdated

345@@ -346,6 +346,15 @@ class CConnman
346     bool RemoveAddedNode(const std::string& node);
347     std::vector<AddedNodeInfo> GetAddedNodeInfo();
348 
349+    /**
350+     * Attempts to open a connection. Currently only used from tests.
351+     *
352+     * @param[in]   address     Address of node to try connecting to
353+     * @param[in]   conn_type   ConnectionType::OUTBOUND or ConnectionType::BLOCK_RELAY
354+     * @return      bool        Returns false if at max connection capacity

ariard commented at 7:31 pm on November 11, 2020:

nit: “max outbound connection”

amitiuttarwar commented at 7:05 pm on November 17, 2020:

?

returns false if its at max connection of the specified type (full relay or block relay), as well as checking total outbound connections

ariard commented at 2:38 pm on December 1, 2020:

That was a nit, to specify outbound at argument comment :p

troygiorshev commented at 7:17 am on December 2, 2020:

I agree with ariard here, I would prefer this comment was longer and more verbose. “Returns false if at max connection capacity for given connection type or at max total outbound connection capacity”

MarcoFalke added the label Review club on Nov 12, 2020

amitiuttarwar force-pushed on Nov 17, 2020

amitiuttarwar force-pushed on Nov 18, 2020

amitiuttarwar commented at 11:18 pm on November 18, 2020: contributor

looks like the CI failure is just a generic timeout? if yes -> PR is ready for review. all review comments have been addressed

jnewbery commented at 10:19 am on November 19, 2020: member

looks like the CI failure is just a generic timeout?

I believe this is improved in master. @MarcoFalke would know. Perhaps try rebasing on latest master?

jonatack commented at 10:23 am on November 19, 2020: member

Yes, the valgrind fuzzer timeout issue was affecting everyone and was resolved in master a few days ago; rebasing fixed it for me. Will review.

amitiuttarwar force-pushed on Nov 19, 2020

amitiuttarwar commented at 2:19 am on November 20, 2020: contributor

thanks, rebased! and CI is green (appveyor pending)

in src/rpc/net.cpp:332 in 50720f411e outdated

327+{
328+    return RPCHelpMan{"addconnection",
329+        "\nOpen an outbound connection to a specified node. This RPC is for testing only.\n",
330+        {
331+            {"address", RPCArg::Type::STR, RPCArg::Optional::NO, "The IP address and port to attempt connecting to."},
332+            {"connectiontype", RPCArg::Type::STR, RPCArg::Optional::NO, "'outbound' or 'blockrelay'."},

jonatack commented at 4:00 pm on November 20, 2020:

0c4d67371 argument name in snakecase and improved description

0            {"connection_type", RPCArg::Type::STR, RPCArg::Optional::NO, "The type of connection to open, either (case insensitive) \"outbound\" or \"blockrelay\"."},

amitiuttarwar commented at 1:56 am on November 22, 2020:

done

in src/rpc/net.cpp:338 in 50720f411e outdated

333+        },
334+        RPCResult{
335+            RPCResult::Type::OBJ, "", "",
336+            {
337+                { RPCResult::Type::STR, "address", "Address of newly added connection." },
338+                { RPCResult::Type::STR, "connectiontype", "Type of connection opened." },

jonatack commented at 4:01 pm on November 20, 2020:

0c4d67371

0                { RPCResult::Type::STR, "connection_type", "Type of connection opened." },

amitiuttarwar commented at 1:56 am on November 22, 2020:

done

in src/rpc/net.cpp:352 in 50720f411e outdated

347+        throw std::runtime_error("addconnection is for regression testing (-regtest mode) only.");
348+    }
349+
350+    RPCTypeCheck(request.params, {UniValue::VSTR, UniValue::VSTR});
351+    const std::string address = request.params[0].get_str();
352+    const std::string conn_type_in = request.params[1].get_str();

jonatack commented at 4:01 pm on November 20, 2020:

0c4d673716f2eb37 trim and make case-insensitive

0    const std::string conn_type_in{ToLower(TrimString(request.params[1].get_str()))};

amitiuttarwar commented at 1:57 am on November 22, 2020:

cool, thanks. done

in src/rpc/net.cpp:374 in 50720f411e outdated

369+        throw JSONRPCError(RPC_CLIENT_NODE_CAPACITY_REACHED, "Error: Already at capacity for specified connection type.");
370+    }
371+
372+    UniValue info(UniValue::VOBJ);
373+    info.pushKV("address", address);
374+    info.pushKV("connectiontype", conn_type_in);

jonatack commented at 4:02 pm on November 20, 2020:

0c4d673716f2eb37

0    info.pushKV("connection_type", conn_type_in);

in src/rpc/net.cpp:367 in 50720f411e outdated

370+    }
371+
372+    UniValue info(UniValue::VOBJ);
373+    info.pushKV("address", address);
374+    info.pushKV("connectiontype", conn_type_in);
375+

jonatack commented at 4:03 pm on November 20, 2020:

0c4d673716f2eb37efdb81ad2d7d20f69de960b3 wishlist :)

0+    info.pushKV("peer id", number);
1+    info.pushKV("result", "...string description...");

amitiuttarwar commented at 0:45 am on November 22, 2020:

what would be described in the result field?

amitiuttarwar commented at 1:57 am on November 22, 2020:

also, what are some use cases you have in mind for returning the peer’s id?

jonatack commented at 8:44 pm on November 23, 2020:

Off-hand, I imagined the peer id might be useful for testing, but that can wait until there is an actual use case or need for it. The result might be a success ("result": "connection successful") or failure ("error": "unable to connect"), but I’m not sure how much addconnection would be used manually in the CLI or like #19315 (review) that returns the object (I didn’t look at how to introspect test_framework.p2p.P2PInterface objects yet). At any rate, I don’t want to hold this up. We need more p2p testing tools.

amitiuttarwar commented at 6:39 pm on November 27, 2020:

okay, I’m going to resolve this comment. we can revisit once there are specific use cases

in src/rpc/net.cpp:967 in 50720f411e outdated

961@@ -906,7 +962,9 @@ static const CRPCCommand commands[] =
962     { "network",            "clearbanned",            &clearbanned,            {} },
963     { "network",            "setnetworkactive",       &setnetworkactive,       {"state"} },
964     { "network",            "getnodeaddresses",       &getnodeaddresses,       {"count"} },
965+
966     { "hidden",             "addpeeraddress",         &addpeeraddress,         {"address", "port"} },
967+    { "hidden",             "addconnection",          &addconnection,          {"address", "connectiontype"} },

jonatack commented at 4:04 pm on November 20, 2020:

0c4d673716f2eb37efdb81ad2d7d20f69de960b3 s/connectiontype/connection_type/ (nit, sort before addpeeraddress)

in src/net.cpp:1159 in 50720f411e outdated

1155@@ -1156,6 +1156,30 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
1156     RandAddEvent((uint32_t)id);
1157 }
1158 
1159+bool CConnman::AddConnection(const std::string& address, const ConnectionType conn_type)

jonatack commented at 4:06 pm on November 20, 2020:

0c4d673716f2eb37efdb81ad2d7d20f69de960b3 nit, pass by value or by reference to const, but not const value, idem for the declaration in net.h

0bool CConnman::AddConnection(const std::string& address, ConnectionType conn_type)

amitiuttarwar commented at 1:57 am on November 22, 2020:

done

in test/functional/p2p_add_connections.py:26 in 50720f411e outdated

21+        self.disconnect_nodes(0, 1)
22+
23+        self.log.info("Add 8 outbounds to node 0")
24+        for i in range(8):
25+            self.log.info("outbound: {}".format(i))
26+            self.nodes[0].add_outbound_p2p_connection(P2PInterface())

jonatack commented at 4:13 pm on November 20, 2020:

50720f411e05 Suggest also testing passing “outbound” in one of these calls (and one with e.g. " OUTbound " if you make it case-insensitive and trimmed)

amitiuttarwar commented at 1:58 am on November 22, 2020:

added outbound to one and BlockRelay to another

in test/functional/p2p_add_connections.py:41 in 50720f411e outdated

31+            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
32+
33+        self.log.info("Add 2 block-relay-only connections to node 1")
34+        for i in range(2):
35+            self.log.info("block-relay-only: {}".format(i))
36+            self.nodes[1].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")

jonatack commented at 4:16 pm on November 20, 2020:

50720f411e05cf1bba fun nit, can use product here (thanks to @jnewbery for the tip a few months ago) to replace these two blocks:

 0+from itertools import product
 1...
 2-        self.log.info("Add 2 block-relay-only connections to node 0")
 3-        for i in range(2):
 4-            self.log.info("block-relay-only: {}".format(i))
 5-            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
 6-
 7-        self.log.info("Add 2 block-relay-only connections to node 1")
 8-        for i in range(2):
 9-            self.log.info("block-relay-only: {}".format(i))
10-            self.nodes[1].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
11+        self.log.info("Add 2 block-relay-only connections to nodes 0 and 1")
12+        for node, conn in product(range(2), range(2)):
13+            self.log.info("node {}, block-relay-only conn {}".format(node, conn))
14+            self.nodes[node].add_outbound_p2p_connection(P2PInterface(), connection_type="BlockRelay")

jonatack commented at 4:42 pm on November 20, 2020:

(“BlockRelay” in the suggestion because I was testing the case-insensitive code above)

amitiuttarwar commented at 2:08 am on November 22, 2020:

thanks but I am a pass, within the context of this test, I find this a bit less clear in terms of test structure (putting together node 0 & 1) and logging.

in test/functional/p2p_add_connections.py:56 in 50720f411e outdated

51+
52+        # check node0 has all outbound connections
53+        node0_peers = []
54+        for i in range(10):
55+            node0_peers.append(self.nodes[0].getpeerinfo()[i]['inbound'])
56+        assert_equal(node0_peers.count(False), 10)

jonatack commented at 4:18 pm on November 20, 2020:

50720f411e05c maybe log and simplify

0-        # check node0 has all outbound connections
1-        node0_peers = []
2-        for i in range(10):
3-            node0_peers.append(self.nodes[0].getpeerinfo()[i]['inbound'])
4-        assert_equal(node0_peers.count(False), 10)
5+        self.log.info("Check node 0 has outbound connections only")
6+        info = self.nodes[0].getnetworkinfo()
7+        assert_equal([0, 10], [info["connections_in"], info["connections_out"]])

jonatack commented at 4:47 pm on November 20, 2020:

(getnetworkinfo didn’t have these two fields yet when you opened this PR)

amitiuttarwar commented at 2:09 am on November 22, 2020:

cool thanks, incorporated the getnetworkinfo call. definitely cleaner.

in test/functional/p2p_add_connections.py:64 in 50720f411e outdated

59+        node1_peers = []
60+        for i in range(15):
61+            node1_peers.append(self.nodes[1].getpeerinfo()[i]['inbound'])
62+
63+        assert_equal(node1_peers.count(False), 10)
64+        assert_equal(node1_peers.count(True), 5)

jonatack commented at 4:20 pm on November 20, 2020:

50720f411e05cf1bbaa91174b389c54f3c5287cf perhaps log and simplify

0-        # check node1 has 10 outbounds & 5 inbounds
1-        node1_peers = []
2-        for i in range(15):
3-            node1_peers.append(self.nodes[1].getpeerinfo()[i]['inbound'])
4 
5-        assert_equal(node1_peers.count(False), 10)
6-        assert_equal(node1_peers.count(True), 5)
7+        self.log.info("Check node 1 has 5 inbound and 10 outbound connections")
8+        info = self.nodes[1].getnetworkinfo()
9+        assert_equal([5, 10], [info["connections_in"], info["connections_out"]])

amitiuttarwar commented at 2:10 am on November 22, 2020:

same as #19315 (review), it’s been updated

in test/functional/p2p_add_connections.py:85 in 50720f411e outdated

85+            self.nodes[0].add_outbound_p2p_connection(P2PInterface())
86+
87+        self.log.info("Add 2 block-relay-only connections to node 0")
88+        for i in range(2):
89+            self.log.info("block-relay-only: {}".format(i))
90+            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")

jonatack commented at 4:21 pm on November 20, 2020:

50720f411e05cf1bba added these as sanity checks. Could add a helper test that checks the passed number of in/out connections for each node, e.g. self.test_conns([(0, 10), (5, 10)]) to assert these convienently

 0         self.log.info("Disconnect p2p connections & try to re-open")
 1         self.nodes[0].disconnect_p2ps()
 2+        assert_equal(self.nodes[0].getnetworkinfo()["connections_out"], 0)
 3+        assert_equal(self.nodes[1].getnetworkinfo()["connections_in"], 5)
 4+        assert_equal(self.nodes[1].getnetworkinfo()["connections_out"], 10)
 5 
 6         self.log.info("Add 8 outbounds to node 0")
 7         for i in range(8):
 8             self.log.info("outbound: {}".format(i))
 9             self.nodes[0].add_outbound_p2p_connection(P2PInterface())
10+        assert_equal(self.nodes[0].getnetworkinfo()["connections_out"], 8)
11+        assert_equal(self.nodes[1].getnetworkinfo()["connections"], 15)
12 
13         self.log.info("Add 2 block-relay-only connections to node 0")
14         for i in range(2):
15             self.log.info("block-relay-only: {}".format(i))
16             self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
17+        assert_equal(self.nodes[0].getnetworkinfo()["connections_out"], 10)
18+        assert_equal(self.nodes[1].getnetworkinfo()["connections"], 15)
19 
20         self.log.info("Restart node 0 and try to reconnect to p2ps")
21         self.restart_node(0)
22@@ -83,11 +82,15 @@ class P2PAddConnections(BitcoinTestFramework):
23         for i in range(8):
24             self.log.info("outbound: {}".format(i))
25             self.nodes[0].add_outbound_p2p_connection(P2PInterface())
26+        assert_equal(self.nodes[0].getnetworkinfo()["connections_out"], 8)
27+        assert_equal(self.nodes[1].getnetworkinfo()["connections"], 15)
28 
29         self.log.info("Add 2 block-relay-only connections to node 0")
30         for i in range(2):
31             self.log.info("block-relay-only: {}".format(i))
32             self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
33+        assert_equal(self.nodes[0].getnetworkinfo()["connections_out"], 10)
34+        assert_equal(self.nodes[1].getnetworkinfo()["connections"], 15)

amitiuttarwar commented at 2:11 am on November 22, 2020:

👍 added a helper to compare connections using getnetworkinfo and added more checks.

jonatack commented at 4:39 pm on November 20, 2020: member

Approach ACK 50720f411e05cf1bbaa91174b389c54f3c5287cf

A few comments, feel free to pick and choose.

Now or later on, it might be useful for CConnman::OpenNetworkConnection and CConnman::AddConnection to return the peer id (via an out param) in case of success, and the addconnection RPC could return a "peer id": number field in the result.

amitiuttarwar force-pushed on Nov 22, 2020

amitiuttarwar commented at 1:21 am on November 23, 2020: contributor

thanks for the review @jonatack, I’ve taken most of your feedback (and left comments / questions on the rest)

looks like the one CI failure is indeed based on this code. In p2p_add_connections, in the section “Disconnect p2p connections & try to re-open”, the call to check_node_connections is failing because it seems that getnetworkinfo()['connections_out'] is returning 4 instead of 0. it seems strange because TestNode.disconnect_p2ps() waits until self.getpeerinfo() is 0 for all peers with ['subver'] set (aka p2p connections). I’m not currently seeing how this inconsistency can arise, so I’ll have to dig deeper. Sharing here incase anybody has ideas or pointers :)

in test/functional/p2p_add_connections.py:55 in e400c7eb7d outdated

50+            self.log.info("outbound: {}".format(i))
51+            self.nodes[1].add_outbound_p2p_connection(P2PInterface())
52+
53+        self.log.info("Check the connections opened as expected")
54+        self.check_node_connections(0, 0, 10)
55+        self.check_node_connections(1, 5, 10)

jonatack commented at 8:16 pm on November 23, 2020:

New helper :+1:, just add named args in the callers (maybe drop the num_ prefixes).

0        self.check_node_connections(num_node=1, num_in=5, num_out=10)

amitiuttarwar commented at 7:53 pm on November 28, 2020:

added named args

in test/functional/p2p_add_connections.py:31 in e400c7eb7d outdated

26+        self.disconnect_nodes(0, 1)
27+
28+        self.log.info("Add 8 outbounds to node 0")
29+        for i in range(8):
30+            self.log.info("outbound: {}".format(i))
31+            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="outbound")

jonatack commented at 8:33 pm on November 23, 2020:

Is there any point in also testing calls like

0self.nodes[0].addconnection("127.0.0.1:36848", "outbound")

amitiuttarwar commented at 7:56 pm on November 28, 2020:

addconnection currently doesn’t have sanity tests to the RPC endpoint, but I’m leaving that as a future improvement since I’m trying to maximize reviewability and the added benefit seems small (esp since its a test-only RPC endpoint). this future improvement is tracked in #19315 (comment)

jonatack commented at 8:49 pm on November 23, 2020: member

looks like the one CI failure is indeed based on this code. In p2p_add_connections, in the section “Disconnect p2p connections & try to re-open”, the call to check_node_connections is failing because it seems that getnetworkinfo()['connections_out'] is returning 4 instead of 0. it seems strange because TestNode.disconnect_p2ps() waits until self.getpeerinfo() is 0 for all peers with ['subver'] set (aka p2p connections). I’m not currently seeing how this inconsistency can arise, so I’ll have to dig deeper. Sharing here incase anybody has ideas or pointers :)

The test passes for me locally. The issue is only in the multiprocess CI task. Maybe a data race? Edit: maybe compare with the getpeerinfo counts? Ignore me if that doesn’t make sense; it’s late.

MarcoFalke referenced this in commit 7ae86b3c68 on Nov 28, 2020

amitiuttarwar force-pushed on Nov 28, 2020

amitiuttarwar commented at 7:48 pm on November 28, 2020: contributor

All review comments are addressed. CI is green.

The CI failure was caused by a test framework issue addressed in #20522 (thanks for the find @jnewbery!) Rebased, so hopefully now CI will be green.

I’m going to resolve comments that propose ideas for follow-ups, and am consolidating them here for convenient future reference

remove p2p_blocksonly.py wallet dependency by using miniwallet #19315 (review)
update OpenNetworkConnection() to return a boolean upon failure & handle appropriately from addconnection and addnode RPC endpoints #19315 (review), including generalizing the error message #19315 (review)
add sanity tests for addconnection RPC endpoint

Also, a reminder for reviewers about #19315 (comment), where I left some open questions around design choices, incase anyone wants to weigh in.

sidhujag referenced this in commit 795d0905a4 on Nov 29, 2020

amitiuttarwar commented at 11:51 pm on November 30, 2020: contributor

forgot to update- CI is green. this PR is ready for review!

in test/functional/p2p_add_connections.py:20 in 22ba154178 outdated

15+        self.num_nodes = 2
16+
17+    def skip_test_if_missing_module(self):
18+        self.skip_if_no_wallet()
19+
20+    def check_node_connections(self, node=None, num_in=None, num_out=None):

jnewbery commented at 1:02 pm on December 1, 2020:

You always call this function with all arguments, so no need to have defaults (and in fact the function would fail if you called it without arguments because self.nodes[None] is a TypeError.

jnewbery commented at 1:05 pm on December 1, 2020:

You could also change this to be a standalone function rather than a method in the P2PAddConnections class and just take a node as an argument rather than an index:

0def check_node_connections(node, num_in, num_out):
1    info = node.getnetworkinfo()
2    assert_equal(info["connections_in"], num_in)
3    assert_equal(info["connections_out"], num_out)

jonatack commented at 10:25 am on December 2, 2020:

Agree with @jnewbery’s suggestions. @MarcoFalke also reminded me recently in #20403 (review) that you can enforce use of named args with the splat. Example.

0    def check_node_connections(*, node, num_in, num_out):

amitiuttarwar commented at 3:39 am on December 3, 2020:

ah, yes I put in the defaults to try to enforce the named args. cool splat trick @jonatack.

also converted to standalone function

in test/functional/p2p_add_connections.py:26 in 22ba154178 outdated

21+        info = self.nodes[node].getnetworkinfo()
22+        assert_equal(info["connections_in"], num_in)
23+        assert_equal(info["connections_out"], num_out)
24+
25+    def run_test(self):
26+        self.disconnect_nodes(0, 1)

jnewbery commented at 1:08 pm on December 1, 2020:

Rather than connecting two nodes and then disconnecting them, you can just not have them connect to each other by overriding the setup_network() method:

0def setup_network(self):
1    self.setup_nodes()
2    # Don't connect the nodes

amitiuttarwar commented at 3:39 am on December 3, 2020:

👍🏽 done

in test/functional/p2p_add_connections.py:88 in 22ba154178 outdated

83+        for i in range(2):
84+            self.log.info("block-relay-only: {}".format(i))
85+            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
86+        self.check_node_connections(node=0, num_in=0, num_out=10)
87+
88+        self.check_node_connections(node=1, num_in=5, num_out=10)

jnewbery commented at 1:08 pm on December 1, 2020:

Is there any reason for this line?

troygiorshev commented at 6:49 am on December 2, 2020:

Possibly it’s nice to check that nothing has happened to node 1 since disconnecting and reconnecting the peers of node 0.

Note: This is a reply to a thread.

amitiuttarwar commented at 1:00 am on December 3, 2020:

yup exactly, sanity check that there aren’t unexpected interactions

amitiuttarwar commented at 3:39 am on December 3, 2020:

since this test is 50% sanity checks & I don’t think it detracts to leave this one in, I’m resolving this conversation.

in test/functional/p2p_add_connections.py:7 in 22ba154178 outdated

0@@ -0,0 +1,92 @@
1+#!/usr/bin/env python3
2+# Copyright (c) 2020 The Bitcoin Core developers
3+# Distributed under the MIT software license, see the accompanying
4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
5+"""Test add_outbound_p2p_connection test framework functionality"""
6+
7+from test_framework.test_framework import BitcoinTestFramework
8+from test_framework.p2p import P2PInterface

jnewbery commented at 1:09 pm on December 1, 2020:

sort

amitiuttarwar commented at 3:41 am on December 3, 2020:

done

in test/functional/p2p_add_connections.py:19 in 22ba154178 outdated

14+        self.setup_clean_chain = False
15+        self.num_nodes = 2
16+
17+    def skip_test_if_missing_module(self):
18+        self.skip_if_no_wallet()
19+

jnewbery commented at 1:09 pm on December 1, 2020:

Remove this. The test can run without a wallet.

amitiuttarwar commented at 3:42 am on December 3, 2020:

💡

thanks :)

jnewbery commented at 1:10 pm on December 1, 2020: member

Looks good. I just have a few small comments in the python tests.

in src/rpc/net.cpp:318 in 22ba154178 outdated

322@@ -322,6 +323,61 @@ static RPCHelpMan addnode()
323     };
324 }
325 
326+static RPCHelpMan addconnection()

ariard commented at 4:36 pm on December 1, 2020:

Should this RPC returns an error if the connection is already existent, like RPC_CLIENT_NODE_ALREADY_ADDED ? It’s for test-only but at least it would catch this kind of logic bug in test framework ?

amitiuttarwar commented at 1:55 am on December 11, 2020:

hmmm, I see that addnode RPC returns this error, but seems like that’s what is indicated by the bool that the AddNode function returns since it’s comparing to vAddedNodes. In the current implementation, we don’t currently know that information until we call through to OpenNetworkConnection, which simply returns if we are AlreadyConnectedToAddress(addrConnect)). We could add a call from AddConnection to check that and return early. And then it would have to populate a boolean as an in/out param (since the existing return value indicates whether there is sufficient capacity). This is all doable, but I’m not really seeing the additional value. It would be able to catch a bug in the test framework logic I’m adding, but that bug would also be caught by the tests in p2p_add_connections.py. And I don’t think this would be helpful for additional tests that use the test framework utilities? What do you think?

jnewbery commented at 10:14 am on January 8, 2021:

Similar to #19315 (review), I think more detailed error reporting can be done in a follow-up PR.

in src/rpc/net.cpp:330 in 22ba154178 outdated

333+        },
334+        RPCResult{
335+            RPCResult::Type::OBJ, "", "",
336+            {
337+                { RPCResult::Type::STR, "address", "Address of newly added connection." },
338+                { RPCResult::Type::STR, "connection_type", "Type of connection opened." },

ariard commented at 4:39 pm on December 1, 2020:

Isn’t this redundant as this information is already passed by caller and RPC won’t swap connection type for any reason ? Maybe remove this result field ?

jonatack commented at 9:52 am on December 2, 2020:

This field confirms what the RPC parsed from the user input per Postel’s Law. If you pass " BLOCKRELAY “, it’s good to know, without needing to check the source code, that it was correctly understood as “blockrelay”.

MarcoFalke commented at 10:20 am on December 2, 2020:

If the args can’t be parsed, a parse failure will be thrown already https://github.com/bitcoin/bitcoin/pull/19315/files#r533936622

jonatack commented at 10:29 am on December 2, 2020:

Yes, but in the absence of failure, feedback is better than uncertainty or requiring the user to check the source code.

amitiuttarwar commented at 3:47 am on December 3, 2020:

I feel completely indifferent as to whether there’s a return result or not. I believe the initial implementation left it out but then I got review comments requesting it be added.

Happy to do whatever reviewers prefer. To me it doesn’t feel super important one way or the other since its a test-only RPC so we don’t have to adhere to normal RPC deprecation rules.

ariard commented at 4:22 pm on December 10, 2020:

@jonatack I don’t have a strong opinion on this, I was more surprised to notify the discrepancy with other usual RPCs.

Maybe this RPC design principle should be documented in https://github.com/bitcoin/bitcoin/blob/master/doc/JSON-RPC-interface.md ? The Postel’s Law sounds a rational good enough to me but I’m not heavily developing against Core RPC so don’t feel competent here to evaluate here :)

jnewbery commented at 10:14 am on January 8, 2021:

It seems no-one has a strong opinion on this. I think it’s fine as is. Resolve?

MarcoFalke commented at 3:21 pm on January 11, 2021:

I still think this should be removed. If this can’t be parsed, a parse-failure will already be thrown.

Though, can be done in a follow-up to not invalidate reviews.

Unrelated, but posels’ law is probably not applicable to security sensitive software, see http://www.langsec.org/papers/postel-patch.pdf

jonatack commented at 8:33 pm on January 11, 2021:

That paper on Postel’s Law appears to discuss issues of computational complexity involving difficult parsing or high ambiguity, which IIRC isn’t the case for this RPC, and states among other things that a balance should be struck, which I agree with. In security-sensitive missions for airports, power grids, industry manufacturers, etc., my experience was that it depended on the context, stakeholders and product managers. Usually, anyone whose code didn’t handle simple, unambiguous input in a robust and non-annoying way got a bug report and a phone call by a PM or scrum master, so it became an instinct to do so and not provide reasons for them to beat us over the head–which I’m unlearning here to adapt to the consensus of the stakeholders on this project. It’s a process!

in test/functional/p2p_add_connections.py:43 in 22ba154178 outdated

38+        self.log.info("Add 2 block-relay-only connections to node 1")
39+        for i in range(2):
40+            self.log.info("block-relay-only: {}".format(i))
41+            self.nodes[1].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
42+
43+        self.log.info("Add some inbound connections to node 1")

ariard commented at 4:47 pm on December 1, 2020:

nit: “Add 5 inbound connections…” ?

amitiuttarwar commented at 3:47 am on December 3, 2020:

done

ariard commented at 5:00 pm on December 1, 2020: member

Sounds good, light review on the test framework changes.

in src/rpc/net.cpp:346 in 22ba154178 outdated

341+            HelpExampleCli("addconnection", "\"192.168.0.6:8333\" \"outbound\"")
342+            + HelpExampleRpc("addconnection", "\"192.168.0.6:8333\" \"outbound\"")
343+        },
344+        [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
345+{
346+    if (!Params().IsMockableChain()) {

troygiorshev commented at 6:55 am on December 2, 2020:

Is checking IsMockableChain the best way to ensure we’re in a test? Have you considered checking NetworkIDString != CBaseChainParams::REGTEST instead?

amitiuttarwar commented at 3:48 am on December 3, 2020:

oh nice, that is more direct / explicit. done.

in src/rpc/net.cpp:359 in 22ba154178 outdated

354+    if (conn_type_in == "outbound") {
355+        conn_type = ConnectionType::OUTBOUND_FULL_RELAY;
356+    } else if (conn_type_in == "blockrelay") {
357+        conn_type = ConnectionType::BLOCK_RELAY;
358+    } else {
359+        throw std::runtime_error(self.ToString());

troygiorshev commented at 7:00 am on December 2, 2020:

Probably throw JSONRPCError(RPC_INVALID_PARAMS, "Something..."); is better here. RPC_INVALID_PARAMS is a standard JSON-RPC 2.0 error. However, we seem to use our custom RPC_INVALID_PARAMETER more often, so maybe that’s a better choice. (2 instances to 136 instances)

amitiuttarwar commented at 3:50 am on December 3, 2020:

done.

yeah interesting that we have both RPC_INVALID_PARAMS and RPC_INVALID_PARAMETER. opted for the second since its more used.

in test/functional/test_framework/p2p.py:588 in 22ba154178 outdated

583+            cls.protos[(addr,port)] = None
584+            return x
585+        return pl
586+
587+    @classmethod
588+    async def create_server(cls, addr, port, callback, proto):

troygiorshev commented at 7:06 am on December 2, 2020:

nit: I’m really not fond of this name, being that it appears to be an intentional collision with asyncio’s create_server. It’s hard enough to tell who’s calling what and when in concurrent python, and this makes it harder. Maybe something like listen_helper?

amitiuttarwar commented at 3:54 am on December 3, 2020:

yeah that’s reasonable, I updated to create_listen_server

in test/functional/p2p_add_connections.py:86 in 22ba154178 outdated

81+
82+        self.log.info("Add 2 block-relay-only connections to node 0")
83+        for i in range(2):
84+            self.log.info("block-relay-only: {}".format(i))
85+            self.nodes[0].add_outbound_p2p_connection(P2PInterface(), connection_type="blockrelay")
86+        self.check_node_connections(node=0, num_in=0, num_out=10)

troygiorshev commented at 7:12 am on December 2, 2020:

Since this is a sanity check, it would be nice if the values weren’t the same as when we called this before restarting the node. Maybe add only 7 outbounds and a few inbounds?

amitiuttarwar commented at 3:54 am on December 3, 2020:

kewl, done.

troygiorshev commented at 7:13 am on December 2, 2020: contributor

ACK 22ba154178594c21e1342657f3d9a14c877623da

Code review of addconnection RPC and associated CConnman method, and of the test framework changes, and of the new p2p_add_connections.py test. Only a light review of the changes to the p2p_blocksonly.py test (just ensured that it was calling the new method correctly).

In addition to the comments below, I agree that some more documentation would be nice. A good start might be method docstrings for the new methods in P2PConnection, NetworkThread, and TestNode. I would suggest using the method docstrings to the other methods of P2PConnection (connection_made, _on_data, send_message, etc.) as a guide.

in test/functional/p2p_add_connections.py:63 in 22ba154178 outdated

58+        self.nodes[0].disconnect_p2ps()
59+        self.check_node_connections(node=0, num_in=0, num_out=0)
60+
61+        self.log.info("Add 8 outbounds to node 0")
62+        for i in range(8):
63+            self.log.info("outbound: {}".format(i))

jonatack commented at 2:04 pm on December 2, 2020:

Oh, now that the minimum Python version has been bumped to 3.6 (in #19504), you can use Python f-strings instead of format for new code.

amitiuttarwar commented at 3:54 am on December 3, 2020:

done

in src/net.cpp:1157 in ae7f4fdfe7 outdated

1152+    // Count existing connections
1153+    int existing_connections{0};
1154+    {
1155+        LOCK(cs_vNodes);
1156+        existing_connections = std::count_if(vNodes.begin(), vNodes.end(), [conn_type](CNode* node) { return node->m_conn_type == conn_type; });
1157+    }

ajtowns commented at 3:27 am on December 3, 2020:

Could shorten it to:

0    int existing_connections = WITH_LOCK(cs_vNodes,
1        return std::count_if(vNodes.begin(), vNodes.end(), [conn_type](CNode* node) { return node->m_conn_type == conn_type; });-
2    );

(EDIT: remove redundant {})

I think there’s a race condition here in that you could decide there’s currently one block relay outbound so it’s fine to create a second while at the same time another thread draws the same conclusion and also attempts to make a new outbound block relay, and you end up with three instead of two. Probably reasonably hard to get to that state though, and this is a test-only API, so shouldn’t matter.

amitiuttarwar commented at 1:25 am on December 11, 2020:

updated to WITH_LOCK syntax.

we discussed this offline, but to document here for other reviewers-

Since we hold the cs_vNodes lock for counting open connections but then release it before invoking OpenNetworkConnection, AddConnection and ThreadOpenConnections could both count, determine there’s space for an extra, and attempt to open a block-relay-only. We could fix this by having a counter / boolean that indicates opening a connection is “in progress”, but that seems unnecessary here because its a low likelihood and this is a test-only API.

amitiuttarwar force-pushed on Dec 3, 2020

in src/rpc/net.cpp:352 in d010c5ef06 outdated

347+        throw std::runtime_error("addconnection is for regression testing (-regtest mode) only.");
348+    }
349+
350+    RPCTypeCheck(request.params, {UniValue::VSTR, UniValue::VSTR});
351+    const std::string address = request.params[0].get_str();
352+    const std::string conn_type_in{ToLower(TrimString(request.params[1].get_str()))};

ajtowns commented at 3:47 am on December 3, 2020:

I don’t think the ToLower here is a good idea. The RPC interface is, in general, case-sensitive – eg, addnode .. OneTry will give an error, changing the capitalisation of a wallet name refers to a different wallet, and calling AddConnection instead of addconnection will give Method not found. Special-casing this particular argument just seems confusing.

jonatack commented at 8:08 am on December 3, 2020:

Many RPCs (sendtoaddress, sendmany, send, fundrawtransaction, walletcreatefundedpsbt, and bumpfee) have a case-insensitive string param, estimate_mode, that is also indicated as case insensitive in the various helps.

jnewbery commented at 10:23 am on December 3, 2020:

I agree with @ajtowns. We should be strict about capitalization in rpc arguments. Adding a special case here doesn’t make sense.

ajtowns commented at 10:30 am on December 3, 2020:

I thnk estimate_mode only allows case-insensitivity because it initially required all-uppercase, see #11413#pullrequestreview-271355849 . Doesn’t seem like a good example to replicate.

jonatack commented at 3:28 pm on December 3, 2020:

Ok, noted. Grepping a bit, it looks like estimate_mode is the only one.

amitiuttarwar commented at 1:25 am on December 11, 2020:

reverted to being case sensitive

in src/net.cpp:1148 in d010c5ef06 outdated

1142@@ -1143,6 +1143,30 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
1143     RandAddEvent((uint32_t)id);
1144 }
1145 
1146+bool CConnman::AddConnection(const std::string& address, ConnectionType conn_type)
1147+{
1148+    assert(conn_type == ConnectionType::OUTBOUND_FULL_RELAY || conn_type == ConnectionType::BLOCK_RELAY);

ajtowns commented at 4:56 am on December 3, 2020:

Should this just return false for different connection types rather than assert?

amitiuttarwar commented at 1:29 am on December 11, 2020:

sure. the addconnection RPC currently interprets false to be Error: Already at capacity for specified connection type., which I think is reasonable enough that I’ve kept it (capacity for other connection types is 0.) I’m also planning to make this net wider in the follow-up, as discussed here: #19315 (review). For now, I’ve updated to return false and added documentation in the header comment for AddConnection.

in src/rpc/net.cpp:356 in d010c5ef06 outdated

351+    const std::string address = request.params[0].get_str();
352+    const std::string conn_type_in{ToLower(TrimString(request.params[1].get_str()))};
353+    ConnectionType conn_type{};
354+    if (conn_type_in == "outbound") {
355+        conn_type = ConnectionType::OUTBOUND_FULL_RELAY;
356+    } else if (conn_type_in == "blockrelay") {

ajtowns commented at 5:00 am on December 3, 2020:

Shouldn’t these match ConnectionTypeAsString() – ie outbound-full-relay and block-relay-only ?

amitiuttarwar commented at 1:29 am on December 11, 2020:

yes definitely. thanks, fixed.

in src/rpc/net.cpp:367 in d010c5ef06 outdated

357+        conn_type = ConnectionType::BLOCK_RELAY;
358+    } else {
359+        throw JSONRPCError(RPC_INVALID_PARAMETER, self.ToString());
360+    }
361+
362+    NodeContext& context = EnsureNodeContext(request.context);

ajtowns commented at 5:01 am on December 3, 2020:

This is NodeContext& node everywhere else?

amitiuttarwar commented at 1:30 am on December 11, 2020:

yeah but I find that pretty confusing so I made it context 😅

ajtowns commented at 9:34 pm on December 16, 2020:

“node” seems much less confusing to me; node gives you access to all the bits of the node, request.context gives you the overall context available for the rpc. In general, picking a convention and sticking with it makes the codebase much easier to navigate.

amitiuttarwar commented at 11:48 pm on December 16, 2020:

alright, I definitely agree with the rationale of staying consistent with conventions, so I pushed a fix

in test/functional/test_framework/p2p.py:611 in 6e53cfba58 outdated

589+        if (addr, port) not in cls.listeners:
590+            listener = await cls.network_event_loop.create_server(cls.peer_listener(addr, port), addr, port)
591+            logger.debug("Listening server on %s:%d should be started" % (addr, port))
592+            cls.listeners[(addr, port)] = listener
593+        cls.protos[(addr, port)] = proto
594+        callback(addr, port)

ajtowns commented at 6:47 am on December 3, 2020:

I think this could use some refactoring and comments. Maybe something like:

 0    def listen(cls, p2p, callback, port=None, addr=None, idx=1):
 1        """Ensure a listening server is running on the given port, and run the protocol
 2           specified by `p2p` on the next connection to it. Once ready for connections,
 3           call `callback`."""
 4
 5        if port is None:
 6            assert 0 < idx <= MAX_NODES
 7            port = p2p_port(MAX_NODES - idx)
 8        if addr is None:
 9            addr = '127.0.0.1'
10
11        coroutine = cls.create_server(addr, port, callback, p2p)
12        cls.network_event_loop.call_soon_threadsafe(cls.network_event_loop.create_task, coroutine) [@classmethod](/bitcoin-bitcoin/contributor/classmethod/)
13    async def create_server(cls, addr, port, callback, proto):
14        def peer_protocol():
15            """Returns a function that does the protocol handling for a new connection.
16               To allow different connections to have different behaviour, the protocol
17               function is put in the cls.protos dict first, then the connection is made,
18               at which point this function removes the protocol function from that
19               dict, and returns it so that the event loop can start executing it."""
20            response = cls.protos.get((addr, port))
21            cls.protos[(addr, port)] = None
22            return response
23
24        if (addr, port) not in cls.listeners:
25            # When creating a listener on a given addr,port we only need to do it once;
26            # even if we want different behaviours for different connections we can do
27            # that by providing different `proto` function.
28
29            listener = await cls.network_event_loop.create_server(peer_protocol, addr, port)
30            logger.debug("Listening server on %s:%d should be started" % (addr, port))
31            cls.listeners[(addr, port)] = listener
32
33        cls.protos[(addr, port)] = proto
34        callback(addr,port)

amitiuttarwar commented at 1:57 am on December 11, 2020:

yes this looks great, thank you 🙌🏽

I like the refactor into making peer_protocol a part of create_server. Helps makes the usage pattern more apparent.

Also appreciate the comments. I was struggling to write helpful words.

Took the patch pretty much as is. Convenient that you are already a co-author on the commit :)

in test/functional/test_framework/test_node.py:549 in 6e53cfba58 outdated

545@@ -542,6 +546,29 @@ def add_p2p_connection(self, p2p_conn, *, wait_for_verack=True, **kwargs):
546 
547         return p2p_conn
548 
549+    def add_outbound_p2p_connection(self, p2p_conn, *, connection_type="outbound", **kwargs):

ajtowns commented at 6:51 am on December 3, 2020:

I think specifying the p2p_conn_index explicitly, eg with

0def add_outbound_p2p_connection(self, p2p_conn, *, p2pidx, connection_type="outbound", **kwargs)

and calling it as:

0conn = self.nodes[0].add_outbound_p2p_connection(P2PTxInvStore(), p2pidx=1, connection_type="blockrelay")

would make the tests clearer. At present it’s very hard to track what port add_outbound_p2p_connection will be trying to connect to, which is probably mostly okay while it’s not used very intensively, but if it does lead to bugs they’ll be hard to find and hard to fix with an automagic api…

Might be worth thinking about how to extend the api to allow testing #17428 – I think that would be much more comprehensible with an explicit p2pidx.

amitiuttarwar commented at 1:31 am on December 11, 2020:

ok AJ, you have convinced me, I made the counter explicit :)

ajtowns commented at 7:15 am on December 3, 2020: member

I’m biased, but looks pretty good.

amitiuttarwar force-pushed on Dec 11, 2020

amitiuttarwar commented at 8:24 pm on December 11, 2020: contributor

thank you for these thoughtful reviews!

review comments addressed and CI is green

DrahtBot added the label Needs rebase on Dec 16, 2020

amitiuttarwar force-pushed on Dec 16, 2020

amitiuttarwar commented at 7:31 pm on December 16, 2020: contributor

Rebased

DrahtBot removed the label Needs rebase on Dec 16, 2020

amitiuttarwar force-pushed on Dec 16, 2020

amitiuttarwar commented at 11:49 pm on December 16, 2020: contributor

small update to address aj’s suggestion about variable name

DrahtBot added the label Needs rebase on Dec 31, 2020

amitiuttarwar force-pushed on Dec 31, 2020

DrahtBot removed the label Needs rebase on Dec 31, 2020

amitiuttarwar commented at 7:15 pm on January 1, 2021: contributor

rebased

troygiorshev commented at 3:11 am on January 6, 2021: contributor

reACK 092da16e89b42de7e44a83b184d95a967875f135

With some help from git range-diff master 22ba154 HEAD.

The changes to the python asyncio pieces (renames, refactors, comments) are great!

DrahtBot added the label Needs rebase on Jan 7, 2021

[rpc/net] Introduce addconnection to test outbounds & blockrelay

Add a new RPC endpoint to enable opening outbound connections from
the tests. The functional test framework currently uses the addnode RPC, which
has different behavior than general outbound peers. These changes enable
creating both full-relay and block-relay-only connections. The new RPC
endpoint calls through to a newly introduced AddConnection method on
CConnman that ensures we stay within the allocated max.

5bc04e8837

[test] Add test framework support to create outbound connections.

In the interest of increasing our P2P test coverage, add support to create
full-relay or block-relay-only connections. To support this, a P2P connection
spins up a listening thread & uses a callback to trigger the node initiating
the connection.

Co-authored-by: Anthony Towns <aj@erisian.com.au>

3997ab9154

[test/refactor] P2PBlocksOnly - simplify transaction creation using blocktool helper. 99791e7560

[test/refactor] P2PBlocksOnly - Extract transaction violation test into helper.

This is in preparation for use in the next commit.

8bb6beacb1

[test] P2PBlocksOnly - Test block-relay-only connections.

Ensure we will disconnect if the peer sends us a transaction & we don't
announce transactions to the peer.

602e69e427

[test] Test the add_outbound_p2p_connection functionality

Open max number of full-relay and block-relay-only connections from a
functional test with different sorts of behaviors to ensure it behaves as
expected.

b4dd2ef800

amitiuttarwar force-pushed on Jan 7, 2021

amitiuttarwar commented at 6:31 pm on January 7, 2021: contributor

rebased

amitiuttarwar commented at 6:32 pm on January 7, 2021: contributor

thanks for the re-review @troygiorshev ! should be trivial to re-ack, the rebase was just because of adjacent lines in net.h

DrahtBot removed the label Needs rebase on Jan 7, 2021

in test/functional/test_framework/p2p.py:160 in b4dd2ef800

159 
160         loop = NetworkThread.network_event_loop
161-        conn_gen_unsafe = loop.create_connection(lambda: self, host=self.dstaddr, port=self.dstport)
162-        conn_gen = lambda: loop.call_soon_threadsafe(loop.create_task, conn_gen_unsafe)
163-        return conn_gen
164+        logger.debug('Connecting to Bitcoin Node: %s:%d' % (self.dstaddr, self.dstport))

jnewbery commented at 10:17 am on January 8, 2021:

(Only if you have to retouch this branch for other reasons): use new style string formatting ("{}".format(var)) here, or even better, f-strings (f"{var}").

in test/functional/test_framework/p2p.py:607 in b4dd2ef800

602+            # do it once. If we want different behaviors for different
603+            # connections, we can accomplish this by providing different
604+            # `proto` functions
605+
606+            listener = await cls.network_event_loop.create_server(peer_protocol, addr, port)
607+            logger.debug("Listening server on %s:%d should be started" % (addr, port))

jnewbery commented at 10:20 am on January 8, 2021:

I’m not sure what “should be” means here. I think the log could just say “Listening server on x started”. Also, could use new style string formatting or f-strings.

This isn’t a big deal. Only change it if you need to retouch the branch.

jnewbery commented at 11:00 am on January 8, 2021: member

utACK b4dd2ef8009703b81235e2d9a2a736a3a5e8152f

Just some small style suggestions, which you can take or leave, or handle in a follow-up.

This is really useful test functionality. I’d love to see it merged soon.

jnewbery commented at 11:07 am on January 8, 2021: member

I think this is very close to being ready for merge.

troygiorshev commented at 10:13 pm on January 8, 2021: contributor

reACK b4dd2ef8009703b81235e2d9a2a736a3a5e8152f

Trivial from my last ACK :)

For anyone interested resources on the python asyncio parts, the python asyncio docs, especially the transports and protocols page and the examples on the bottom of that page are useful!

in src/net.h:969 in 5bc04e8837 outdated

964+     *                          slots for this connection:
965+     *                          - conn_type not a supported ConnectionType
966+     *                          - Max total outbound connection capacity filled
967+     *                          - Max connection capacity for type is filled
968+     */
969+    bool AddConnection(const std::string& address, ConnectionType conn_type);

MarcoFalke commented at 12:53 pm on January 11, 2021:

5bc04e8837c0452923cebd1b823a85e5c4dcdfa6:

If this is only used in test, why not call it AddTestConnection? Otherwise someone might use it in “production”.

in src/rpc/net.cpp:344 in 5bc04e8837 outdated

339+        throw std::runtime_error("addconnection is for regression testing (-regtest mode) only.");
340+    }
341+
342+    RPCTypeCheck(request.params, {UniValue::VSTR, UniValue::VSTR});
343+    const std::string address = request.params[0].get_str();
344+    const std::string conn_type_in{TrimString(request.params[1].get_str())};

MarcoFalke commented at 3:29 pm on January 11, 2021:

Same commit: Any reason to trim the string? We don’t do this anywhere else.

jonatack commented at 8:56 pm on January 11, 2021:

This was my fault–I suggested it earlier on.

in test/functional/test_framework/test_node.py:74 in 3997ab9154 outdated

70@@ -71,6 +71,7 @@ def __init__(self, i, datadir, *, chain, rpchost, timewait, timeout_factor, bitc
71         """
72 
73         self.index = i
74+        self.p2p_conn_index = 1

MarcoFalke commented at 5:48 pm on January 11, 2021:

3997ab915451a702eed2153a0727b0a78c0450ac:

What is this?

in test/functional/test_framework/test_node.py:563 in 3997ab9154 outdated

558+
559+        def addconnection_callback(address, port):
560+            self.log.debug("Connecting to %s:%d %s" % (address, port, connection_type))
561+            self.addconnection('%s:%d' % (address, port), connection_type)
562+
563+        p2p_conn.peer_accept_connection(connect_cb=addconnection_callback, connect_id=p2p_idx + 1, net=self.chain, timeout_factor=self.timeout_factor, **kwargs)()

MarcoFalke commented at 5:48 pm on January 11, 2021:

3997ab915451a702eed2153a0727b0a78c0450ac: The +1 seems like a layer violation that should be done in the p2p module instead.

in test/functional/test_framework/p2p.py:164 in 3997ab9154 outdated

163-        return conn_gen
164+        logger.debug('Connecting to Bitcoin Node: %s:%d' % (self.dstaddr, self.dstport))
165+        coroutine = loop.create_connection(lambda: self, host=self.dstaddr, port=self.dstport)
166+        return lambda: loop.call_soon_threadsafe(loop.create_task, coroutine)
167+
168+    def peer_accept_connection(self, connect_id, connect_cb=lambda: None, *, net, timeout_factor):

MarcoFalke commented at 6:16 pm on January 11, 2021:

3997ab915451a702eed2153a0727b0a78c0450ac:

Any reason to add default arguments for an arg that is set in all call sites? I don’t think it makes sense to skip the connect_cb here?

Also could enforce named args with *

in test/functional/test_framework/p2p.py:440 in 3997ab9154 outdated

434@@ -414,6 +435,10 @@ def test_function():
435 
436         wait_until_helper(test_function, timeout=timeout, lock=p2p_lock, timeout_factor=self.timeout_factor)
437 
438+    def wait_for_connect(self, timeout=60):
439+        test_function = lambda: self.is_connected
440+        wait_until_helper(test_function, timeout=timeout, lock=p2p_lock)

MarcoFalke commented at 6:21 pm on January 11, 2021:

3997ab9:

Would be nice to not use the helper unless necessary. It doesn’t matter here, but it won’t propagate the scaling factor

in test/functional/test_framework/p2p.py:573 in 3997ab9154 outdated

568@@ -542,6 +569,48 @@ def close(self, timeout=10):
569         # Safe to remove event loop.
570         NetworkThread.network_event_loop = None
571 
572+    @classmethod
573+    def listen(cls, p2p, callback, port=None, addr=None, idx=1):

MarcoFalke commented at 6:27 pm on January 11, 2021:

same commit:

What is the point of allowing those args when they are never set? I’d say if there is ever a test that needs them set, that pull can make the needed changes here. Because other changes are required for this anyway to work.

in test/functional/p2p_blocksonly.py:44 in 99791e7560 outdated

39@@ -51,13 +40,13 @@ def run_test(self):
40 
41         self.log.info('Check that txs from rpc are not rejected and relayed to other peers')
42         assert_equal(self.nodes[0].getpeerinfo()[0]['relaytxes'], True)
43-        txid = self.nodes[0].testmempoolaccept([sigtx])[0]['txid']
44+
45+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)

MarcoFalke commented at 7:05 pm on January 11, 2021:

99791e7560d40ad094eaa73e0be3987581338e2d: If you touch this, I’d prefer to remove it. This doesn’t check anything

in test/functional/p2p_blocksonly.py:59 in 99791e7560 outdated

55@@ -67,8 +56,7 @@ def run_test(self):
56         assert_equal(peer_1_info['permissions'], ['relay'])
57         peer_2_info = self.nodes[0].getpeerinfo()[1]
58         assert_equal(peer_2_info['permissions'], ['relay'])
59-        assert_equal(self.nodes[0].testmempoolaccept([sigtx])[0]['allowed'], True)
60-        txid = self.nodes[0].testmempoolaccept([sigtx])[0]['txid']
61+        assert_equal(self.nodes[0].testmempoolaccept([tx_hex])[0]['allowed'], True)

MarcoFalke commented at 7:06 pm on January 11, 2021:

99791e7560d40ad094eaa73e0be3987581338e2d: Same

MarcoFalke approved

MarcoFalke commented at 7:33 pm on January 11, 2021: member

Approach ACK b4dd2ef8009703b81235e2d9a2a736a3a5e8152f 🍢

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3Approach ACK b4dd2ef8009703b81235e2d9a2a736a3a5e8152f 🍢
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUi8SQwAtkpDpuzJotuEB7qpK2DCtuR6u/mOzV6KXL8G6W4NrPkWffPEc2DJ0+4b
 8Z2APDT0XKLjHSluduIQYkkJYewQNSn+kDbXqLG8fnGfHs0VPauSPr1YEWJ6ckM0C
 9G9YKu02lX0P7iIao+PlLCUc/NRrG1eo3FmS8RCQBzn1CfnBXviwJAK3Btau6r2vW
10CC/4tyBANDD9cDqpS81bpWdnA/qggLjP/0p/cP3FIJ5kUVTdsfFhvnXW8rvk48bB
11hREsZxEP/HjF6/Tw91C4FR+AtOL3uwll94kgbf99U/ld++98TeEEZmOHu9+vVYF7
12p1mfaVevvWAcY8r+88nSy99KKecEt5DxCW56MFnB+Xlhn2DCLZWNNjiWQDB9sxDo
13jEQlDiRbypIqGlVwL0qCoJB4bCOaz+qhkErqJgEIHXL3wV7HVk1vkim08r5sIUPm
14rz7BY2lVy0lQpJAmQ2NUeZ3Ioms/fnvmUZJSFYFP5zXkX6ZXKdtdGTtCP2VC9iTT
15/68NLVfe
16=gZ6a
17-----END PGP SIGNATURE-----

Timestamp of file with hash 3232b0dd96dd7ba9660c084f1b93bc4661873271630eb1ba4b50fb2f2cb5803b -

MarcoFalke merged this on Jan 11, 2021

MarcoFalke closed this on Jan 11, 2021

sidhujag referenced this in commit b6cdb6cfae on Jan 12, 2021

luke-jr referenced this in commit 87d57b45ac on Jun 27, 2021

luke-jr referenced this in commit 78a6ae14e0 on Jun 27, 2021

Fabcien referenced this in commit e5254cd2e7 on Jan 19, 2022

DrahtBot locked this on Aug 16, 2022

[tests] Allow outbound & block-relay-only connections in functional tests. #19315

Conflicts