Test coverage of our networking code

sdaftuar commented at 6:52 pm on September 12, 2018: member

Our python functional testing framework is pretty limited for what kinds of p2p behaviors we can test. Basically, we can currently only make manual connections between bitcoind nodes (using the addnode rpc), which are treated differently in our code than outbound peers selected using addrman.

While we do have some unit-testing coverage of some of the components (like addrman, and parts of net_processing), I don’t believe we currently are able to test the overall logic of how bitcoind uses those components (I recall this coming up when working on #11560, as a specific example).

Anyway I am just mentioning this here as a potential project idea, as this is a material gap in our testing that I think would be valuable to work towards improving, and I wasn’t sure how well known this is.

maflcko added the label Tests on Sep 12, 2018

maflcko added the label good first issue on Sep 12, 2018

maflcko commented at 8:15 pm on September 12, 2018: member

Wouldn’t it be possible to feed to node some addresses of python TestNodes (either via p2p or by putting them into the address database file when setting up the node).

I haven’t given it a try yet, but I’d be surprised if testing outbound connections was conceptually a hard problem to solve.

sdaftuar commented at 8:34 pm on September 12, 2018: member

I don’t remember the entirety of the rabbit hole I went down trying to make this work, but I believe at least one of the issues I encountered had to do with special handling for IsLocal() addresses in our net code.

One thing that occurs to me is that we have a couple options, we could try to rewrite our code to make it easier to test, or we could come up with a new testing environment that is able to exercise the logic. For instance maybe launching a test that fires up nodes on different hosts (whether a dedicated cluster or some containerized thing), with some appropriate IP address assignment, would do the trick.

sdaftuar commented at 9:47 pm on March 1, 2019: member

In case anyone is thinking about this – I believe #15486 is another untestable code change in our current framework.

Sjors commented at 3:59 pm on March 2, 2019: member

For background on what we should test, the paper by @EthanHeilman e.a. explains how the P2P networking works and the kinds of eclipse attack defences that were recently added: https://eprint.iacr.org/2015/263.pdf

In addition to the problem @sdaftuar points to about the difference between manually adding nodes with addnode and spontaneous connections, there’s two other problems.

The regtest nodes are able to connect to each other via P2P on the same IP but with different ports. That’s insufficient for thoroughly testing the behavior because net.cpp checks IsLocal, and because the connection buckets are IP based. There’s also no DNS seed for regtest.
The networking code relies on real time

One approach could be a combination of regtest specific tweaks to the networking code and a few regtest specific RPC calls. The former is always scary, the latter probably fine.

Regtest specific networking tweaks

The test framework allows 5000 (PORT_RANGE) distinct port numbers. We could divide those in groups of 1000 to represent /16 IPv4 IPv4 ranges. Modulo 100 represents the IP address. That leaves 10 nodes max that a regtest node would still consider being local.

The cryptographic hash for bucket selection would be hard coded for regtest.

The DNS resolver code could have a regtest branch that fetches a fake result from a file (which tests can populate).

Add SetTime() to set the time which GetTime() can read.

Switch the DNS 11 second delay from sleep_for(std::chrono::seconds(11) to checking SetAdjusttedTime every 0.01 seconds until 11 “seconds” have passed.

Ditto for CConnman::ThreadOpenConnections; instead of the thread sleeping std::chrono::milliseconds(500) it could sleep much shorter and check the “time”.

Regtest RPC calls

settime calls SetAdjusttedTime()
methods to get the various peer lists

EthanHeilman commented at 4:10 pm on March 4, 2019: contributor

I want to second this. Better test coverage and a network level mocking framework for testing Bitcoin’s P2P network is something that would be immensely helpful. I started to work on it in 2015 as part of the eclipse attack paper but unfortunately I ran out of time to build it myself and no longer have the bandwidth.

maflcko commented at 4:19 pm on March 4, 2019: member

Obviously I am lacking the background, but how hard would it be to copy a python-mininode and a python-regtest-dns-seeder to a couple of 1 EUR/month vps boxes?

EthanHeilman commented at 4:29 pm on March 4, 2019: contributor

Obviously whoever writes the code and does the work gets to decide, but my personal preference would be a way of replacing the network interface with a faked out network interface so that:

A test could spin up a bunch of nodes,
assign them fake IP addresses,
and let them talk to each other without depending on any external systems or touching any network.

It would allow mostly deterministic tests of all sorts of complex behavior and network failures.

ariard commented at 8:16 pm on March 14, 2019: member

Is there already someone working on this ? @EthanHeilman by faking out a network interface you mean something like swapping the system-level socket interface by one of our own, which would talk to MockNetworkRouter with a set of methods on this one to simulate network failures, configure topologies, broadcast random messages (or something lighter)? Obviously, would gladly take inputs from anyone !

dongcarl commented at 9:16 pm on March 14, 2019: contributor

I’ve chatted with @theuni about this a bit, and it would seem that getting CConnman to be self contained (as in, no reference to globals from within CConnman and removing the globals inside net.cpp) would allow us to achieve what @EthanHeilman is proposing.

I would like to know what people’s thoughts are on:

@EthanHeilman’s proposal above
What’s missing aside from getting CConnman to be self-contained

If people think this is worthwhile to move ahead on, I believe it would be a great target to motivate and more importantly focus incremental net improvements.

EthanHeilman commented at 9:19 pm on March 14, 2019: contributor

@EthanHeilman Yes, I mean replacing the socket function with an interface with the same functions but that actually just routes between the local nodes. You could override the MockNetworkRouter to simulate botnet attacks, package failures, or really old nodes.

dongcarl commented at 10:04 pm on March 14, 2019: contributor

Upon further inspection, we might be able to use something like network namespaces thru pyroute2

EthanHeilman commented at 2:53 pm on March 15, 2019: contributor

I love to be able to do this in unittests for net.cpp. In those environments the MockNetworkRouter would just be a shared object injected into the components that would be talking to each other. Not saying this is the best way, but just the way I’ve always thought about it.

ariard commented at 4:50 pm on March 15, 2019: member

@dongcarl I’ve more digged into MockNetworkRouter way, if we use pyroute2 to attach fake-addresses to node instances each embbeded in its network namespace, are we gonna be able to simulate easily as much events (packets failures, …) than with our crafted router ?

theuni commented at 10:40 pm on March 19, 2019: member

@dongcarl and I discussed this today, and I wanted to chime in here because there’s a good bit of history.

Obviously whoever writes the code and does the work gets to decide, but my personal preference would be a way of replacing the network interface with a faked out network interface so that:
1. A test could spin up a bunch of nodes,

2. assign them fake IP addresses,

3. and let them talk to each other without depending on any external systems or touching any network.
It would allow mostly deterministic tests of all sorts of complex behavior and network failures.

This type of testing was almost the entire intention of the CConnman refactor, which got pretty close, and now @dongcarl has picked up lately. It is frustratingly close to being useful.

The idea, again, is to be able to run multiple networking instances inside of one bitcoind process.

It would be trivial to implement (I have it done in an ancient branch somewhere) the logic to spoof connection data inside of CConnman in order to simulate different test conditions. Phony IPs, fake latency, dropped packets, peer selection algorithm changes, etc. That’s how these tests SHOULD work. Spinning up entire instances of bitcoind just to test net behavior is overkill.

I really think that time and energy would be better spent on finishing up the isolated CConnman instantiation.

dongcarl commented at 11:14 pm on March 19, 2019: contributor

From my discussion with @theuni today, if we’re going with the route that @EthanHeilman seems to be advocating for, we thought it best to center this work around a minimal, motivating test case that should make the rest very easy. Here’s what we came up with:

Instantiate a CConnman (let’s call it c1) with a huge nSendBufferMaxSize
Instantiate a CConnman (let’s call it c2) with a tiny nReceiveFloodSize
Start sending messages from c1 to c2 and ensure that the buffer size restrictions are honored.

I believe being able to do this would have caught bugs like #12392. CConnman being instantiable like this would allow us to inspect why a CConnman instance might have stopped receiving data rather than just guessing based on pings and timing checks.

I love to be able to do this in unittests for net.cpp. In those environments the MockNetworkRouter would just be a shared object injected into the components that would be talking to each other. Not saying this is the best way, but just the way I’ve always thought about it.

I believe that if we achieve instantiable CConnman, a MockNetworkRouter would be quite easy.

mmachicao referenced this in commit f46c5a7eed on Aug 5, 2019

practicalswift commented at 4:11 pm on August 21, 2019: contributor

FWIW, a related observation from #16660:

From what I can tell the connection eviction logic is not covered by our test suite (functional and unit tests).

More specifically the following is never executed with nInbound >= nMaxInbound:

https://github.com/bitcoin/bitcoin/blob/ed9a2a37c1878ba6b75c39b7cb263dc16c52a295/src/net.cpp#L962-L970

Context:

https://github.com/bitcoin/bitcoin/blob/ed9a2a37c1878ba6b75c39b7cb263dc16c52a295/src/net.cpp#L800-L808

Due to AttemptToEvictConnection not being tested the following net.cpp functions – which are only called directly or indirectly via AttemptToEvictConnection – are currently untested:

0static bool ReverseCompareNodeMinPingTime(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
1static bool ReverseCompareNodeTimeConnected(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
2static bool CompareNetGroupKeyed(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
3static bool CompareNodeBlockTime(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
4static bool CompareNodeTXTime(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
5static void EraseLastKElements(std::vector<T>& elements, Comparator comparator, size_t k)

mmachicao referenced this in commit aa8f0ada10 on Aug 27, 2019

ajtowns commented at 8:30 am on August 29, 2019: contributor

I’m not sure I understand this; it seems to me like there’s three levels here:

basic p2p stuff that doesn’t really care about the details and works fine with -connect and addnode
checking the expected differences in behaviour between different types of nodes (connect, addnode, onetry, outbound, feeler, blocksonly)
checking peer discovery, and netgroup behaviour and ipv4/ipv6/tor/nat behaviour

We can do (1) now, I think we could do (2) with a variant of “addnode” that lets us vary the feeler, manual_connection and blocks_only parameters; but for (3) I think we need a real network that’s not different ports on localhost (due to IsLocal()) and also not different addresses on 192.168.0.0 or 10.0.0.0 (due to IsRoutable()), which seems like it means a real infrastructure setup rather than something that will fit into our existing test suites?

I guess I’m mostly thinking of functional tests and more emergent sorts of behaviour; mocking CConnman seems great for unit tests.

EthanHeilman commented at 3:27 pm on August 30, 2019: contributor

checking peer discovery, and netgroup behaviour and ipv4/ipv6/tor/nat behaviour

We can do (1) now, I think we could do (2) with a variant of “addnode” that lets us vary the feeler, manual_connection and blocks_only parameters; but for (3) I think we need a real network that’s not different ports on localhost (due to IsLocal()) and also not different addresses on 192.168.0.0 or 10.0.0.0 (due to IsRoutable()), which seems like it means a real infrastructure setup rather than something that will fit into our existing test suites?

You could do (3) but sticking a shim in front of the networking calls in Bitcoin allowing you to receive connections from mock peers. This should work with the existing test setup. For instance InterruptibleRecv would be overridden in tests with a MockInterruptibleRecv.

https://github.com/bitcoin/bitcoin/blob/b799ebcc17ea914b6e50f97f008f498ca31e0f36/src/netbase.cpp#L324

mzumsande commented at 7:11 pm on September 2, 2019: contributor

I played around with the ideas of @Sjors and @MarcoFalke in order to understand the challenges for functional tests better and got a POC to work with very few code changes:

A python mininode sends our node an addr message with a couple of local addresses having different ports, and more mininodes are set up to wait for connections on these ports (some tweaks to mininode necessary in order for it to wait for incoming connection and handle the other side of the VERSION logic correctly)
Regtest-only: In ProcessMessages for addr, a (so far very crude) mapping from the local ports to external IPs is introduced, so that addrman is populated with mocked IPs that are non-local and belong to different /16 IPv4 ranges. Possibly, a more sophisticated mapping as suggested by @Sjors could be benefitial.
Regtest-only: In CConnman::ThreadOpenConnections (after the bucket-specific logic is finished based on the mocked IPs), OpenNetworkConnection is called using the original local IP (using *pszDest so that isLocal() checks are skipped). On the level of CNode nothing is mocked so that we don’t need to mock networking calls.

With this, my test node will connect to up to 8 of the mininodes I set up and treat them as regular outbound peers. It will also attempt feeler connections and after 15759 is merged, blocks-only connections should be initiated, so that all kind of logic based on inbound/outbound behaviour can be tested.

Of course it is a disadvantage we need to introduce some kind of flag for functional test mock mode and change the logic based on this in at least two places.

Do you think that is an approach worth following further?

ariard commented at 2:50 am on September 3, 2019: member

I would say modifying P2P code with some regtest-only paths is a annoying, mostly if we need further unplanned changes after the 2 ones you mention to cover more cases. That’s said I’m curious about your POC, do you have a branch somewhere ?

I stayed on the idea of having multiple CConnman plugged to a MockNetworkRouter with this last one having its network-changing methods being called based on some fuzzer input.

mzumsande commented at 10:43 am on September 3, 2019: contributor

Sure, https://github.com/mzumsande/bitcoin/commits/201909_p2p_test_outbound (and I agree that it is annoying).

ajtowns commented at 11:39 am on September 3, 2019: contributor

That’s said I’m curious about your POC, do you have a branch somewhere ?

I had a go at making the p2p mininode support outbound connections from bitcoind too; it also supports non-manual outbounds (ie “addnode” but doesn’t act like “addnode”) and block-relay outbounds (per #15759) as well. See https://github.com/ajtowns/bitcoin/commits/201909-p2poutbound

maflcko removed the label good first issue on Dec 25, 2019

maflcko referenced this in commit 5af16a4db7 on Jun 12, 2020

sidhujag referenced this in commit 0fa1d20796 on Jun 13, 2020

jnewbery commented at 7:50 pm on June 18, 2020: contributor

#19315 adds the ability to test outbounds and other connection types in the test framework.

practicalswift commented at 9:49 pm on June 21, 2020: contributor

#19203 introduces a thin SOCKET wrapper which can be used for fuzzing (FuzzedSocket) or mocking. Review welcome :)

maflcko referenced this in commit 6af013792f on Jan 11, 2021

PastaPastaPasta referenced this in commit c70bb67c0e on Mar 5, 2022

vijaydasmp referenced this in commit 0ad2a86d8e on Dec 28, 2022

vijaydasmp referenced this in commit 0e11ffcd01 on Dec 29, 2022

vijaydasmp referenced this in commit 7220a8bd77 on Dec 30, 2022

vijaydasmp referenced this in commit eb52fc9c09 on Dec 30, 2022

vijaydasmp referenced this in commit 42a9c5cf8b on Dec 30, 2022

vijaydasmp referenced this in commit 9276609261 on Dec 30, 2022

vijaydasmp referenced this in commit 42df3431b7 on Jan 3, 2023

vijaydasmp referenced this in commit eb0d1d8ca0 on Jan 4, 2023

vijaydasmp referenced this in commit 7f1ba8a5b1 on Jan 4, 2023

vijaydasmp referenced this in commit 5460d64684 on Jan 4, 2023

vijaydasmp referenced this in commit d97bbba3ec on Jan 4, 2023

vijaydasmp referenced this in commit e316987eba on Jan 6, 2023

vijaydasmp referenced this in commit 5ad9c1a06b on Jan 17, 2023

vijaydasmp referenced this in commit d95e9080df on Jan 17, 2023

vijaydasmp referenced this in commit 9df469a180 on Jan 24, 2023

vijaydasmp referenced this in commit b0ceda77b0 on Jan 24, 2023

vijaydasmp referenced this in commit 10af13670b on Feb 14, 2023

vijaydasmp referenced this in commit 65d761dbdb on Feb 15, 2023

vijaydasmp referenced this in commit 71a33951ec on Feb 15, 2023

vijaydasmp referenced this in commit 4bd7335f7d on Feb 18, 2023

UdjinM6 referenced this in commit 0de5f7018a on Feb 20, 2023

vijaydasmp referenced this in commit ca0475d0d9 on Feb 20, 2023

vijaydasmp referenced this in commit ae0f0a0f7c on Feb 21, 2023

PastaPastaPasta referenced this in commit 434420f120 on Feb 28, 2023

Test coverage of our networking code #14210