net: Send txs without delay on regtest #15881

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:1904-txRelayRegtest changing 5 files +20 −1
  1. MarcoFalke commented at 9:01 pm on April 23, 2019: member

    Send txs immediately to all peers, just like blocks are solved immediately on regtest.

    Overall, the tests run three times as fast for me now, because they no longer wait for the Poisson tx send timer to time out.

    This is only a regtest change and it is possible to disable it with -tx_relay_force_flush=0 should specific testing needs require it.

    Alternative to:

    • Faster tests with topological mempool rpc sync 🚀 #15858
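    For context, a minimal sketch of the trickle delay this PR zeroes out (Python pseudocode for the C++ in net_processing; constant and function names here are simplified assumptions, not the exact source identifiers): each peer's next INV broadcast is scheduled an exponentially distributed interval into the future.

```python
import random

# Illustrative sketch of PoissonNextSend()-style scheduling; the interval
# value below is an assumption, purely for illustration.
AVERAGE_INVENTORY_BROADCAST_INTERVAL = 5.0  # seconds

def poisson_next_send(now: float, average_interval: float) -> float:
    """Next send time: now plus an exponentially distributed random delay."""
    return now + random.expovariate(1.0 / average_interval)

# A test that merely waits for a tx to propagate pays one such delay on
# every hop, which is why zeroing it out speeds up the suite so much.
```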
  2. sipa commented at 9:02 pm on April 23, 2019: member
    I don’t think we want to do this generically, as it will prevent testing the normal relay logic?
  3. MarcoFalke commented at 9:07 pm on April 23, 2019: member
    This preserves the normal relay logic, except that the delay is always zero (for inbound and outbound peers). Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we cannot create outbound connections.
  4. naumenkogs commented at 9:20 pm on April 23, 2019: member

    Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we can not create outbound connections.

    It’s not fundamentally impossible though, is it? I think I remember using part of some stuck PR (Luke’s?) to test both directions.

    Anyway, could you motivate this change a bit? I see some description in #15858, but it is closed now :)

  5. fanquake added the label P2P on Apr 23, 2019
  6. MarcoFalke commented at 9:38 pm on April 23, 2019: member
    Added a commit to make it a regtest command line flag
  7. DrahtBot commented at 10:52 pm on April 23, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #15759 ([p2p] Add 2 outbound blocks-only connections by sdaftuar)
    • #14856 (net: remove more CConnman globals (theuni) by dongcarl)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. gmaxwell commented at 5:29 am on April 24, 2019: contributor
    I really don’t like special-casing the tests like this; the more we do this, the more the tests become useless, just for show. And, of course, the special-casing code itself can contribute to bugs too. Wouldn’t it be better to run some of these tests in parallel so that their latency is less relevant?
  9. AM5800 commented at 8:13 am on April 24, 2019: none

    @gmaxwell but if there are no tests for delay, what is the point of keeping delay in tests at all? If it breaks, no one will notice it anyway.

    Also on:

    Testing that there is a difference in the delay between inbound and outbound peers was already impossible, I believe, since we can not create outbound connections.

    How about starting 3 nodes like this: node0 -> node1 -> node2, and then generating txs on node1. One can then measure how long it took node0 and node2 to receive those transactions (by polling for mempool contents, or even by logs).
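    As a rough model of that measurement (a standalone sketch, not a real functional test; the interval values are assumptions, purely for illustration): each direct hop costs one exponentially distributed trickle delay, so averaging arrival times over many runs would expose a difference between the two directions.

```python
import random
import statistics

# Hypothetical per-direction average trickle intervals (assumed values).
AVG_OUTBOUND_INTERVAL = 2.0  # seconds
AVG_INBOUND_INTERVAL = 5.0   # seconds

def hop_delay(rng: random.Random, average_interval: float) -> float:
    """Delay for a tx to cross one direct p2p hop."""
    return rng.expovariate(1.0 / average_interval)

rng = random.Random(1)
arrivals_node0 = [hop_delay(rng, AVG_INBOUND_INTERVAL) for _ in range(20000)]
arrivals_node2 = [hop_delay(rng, AVG_OUTBOUND_INTERVAL) for _ in range(20000)]

# With enough samples, the two directions separate clearly, which is the
# distinguishable signal the measurement above is after.
mean_in = statistics.fmean(arrivals_node0)
mean_out = statistics.fmean(arrivals_node2)
```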

  10. AM5800 commented at 8:20 am on April 24, 2019: none
    By the way, @MarcoFalke, if your intent is only to speed up several tests in which p2p is not the test subject, you can just whitelist peers in such tests.
  11. MarcoFalke commented at 12:03 pm on April 24, 2019: member
    I’d rather not whitelist peers in tests. Not only is it tedious to pass in the command line flag, but also it affects DoS scores and some other things, I believe.
  12. jnewbery commented at 2:00 pm on April 24, 2019: member

    Concept ACK adding this as default in the functional tests.

    I don’t think we want to do this generically, as it will prevent testing the normal relay logic

    I can’t think of any cases in the functional tests where we do anything except submit a transaction and wait for it to propagate. Running with m_tx_relay_force_flush will achieve exactly the same, except that the wait will be much shorter.

    Notably, I don’t think there are any tests in the functional test suite that explicitly test the poisson distribution for tx relay. Really, that should be tested at the unit test level. We could add a single functional test with m_tx_relay_force_flush disabled if we want a higher-level test that simply tests that tx relay is not completely broken.

    I really don’t like special casing the tests like this

    In general I agree, but the change here is so minor that I think it’s ok. I’m much more concerned about other shortcomings in our functional test suite, such as not having outbound peers or peers on non-local network addresses.

    Wouldn’t it be better to run some of these tests in parallel

    That’s already the default

  13. sdaftuar commented at 2:39 pm on April 24, 2019: member

    My instinct here is that our design goal should be to make the p2p connections in regtest more like mainnet, and not less. We already have a problem that in regtest we don’t exercise our networking code enough (as mentioned in #14210, an issue I opened after I got the impression that many people were unaware of our code-coverage deficiencies with respect to p2p behavior). So adding more special cases seems like it’s going in the wrong direction.

    On the other hand it would be great for the tests to run faster. My suggestion would be to adopt an approach more like #15858 was trying to do, where we leave bitcoind unchanged but choose to make certain tests opt-out of testing the p2p behavior. I think that also makes it easier to reason about what a test is testing, if you can see clearly in the test what parts of the code are being exercised (rather than implicitly relying on some non-standard behaviors we’ve opted bitcoind into in the test framework).

    I only briefly looked at #15858 so far but I think it seemed to be going in the right direction. Having individual tests be modified one-by-one to opt into a faster behavior seems fine to me; reviewers can determine whether the code coverage change is logical or not based on what the test is doing and what the rest of the test suite already covers. The only caveat I’d add is that I think we should make it more explicit whether a call to sync_mempools() is exercising the p2p sync logic or not, to make the tests more readable and understandable.

    As an aside, I think someone should just add a test that tests Poisson delays; that is currently possible in our test framework (inbounds can be tested via mininode, outbounds can be tested with multiple bitcoinds, because there’s no difference in our p2p logic between addnode peers and regular outbound peers, I believe).

    So I’m a weak concept NACK on adding functionality to bitcoind that allows changing the relay delays; I’m a strong concept NACK on defaulting all our functional tests into special-cased behavior (if we were to go with this approach, I think we should only opt-in tests that we choose to exercise non-standard behavior on, rather than take a blanket approach).

    Regarding parallelization, I think the point is that perhaps developers can immediately start saving their own time (and achieve the same benefits of this PR) by just increasing the number of simultaneous jobs they run with? I often run test_runner.py with the defaults myself, and this PR has reminded me that I could cut the time down significantly by adding more parallelization, so I will try to remember that going forward.

  14. MarcoFalke commented at 3:35 pm on April 24, 2019: member

    The response to “the tests run slow” shouldn’t be “just run more of them in parallel”. We might have users or developers that would like to run the tests on weak hardware (e.g. dual core cpu or hdd). Those constraints set a limit on how many tests can be run in parallel. Some of the tests might have random sleeps in them, but you can’t rely on those because they are – after all – random.

    It already takes about an hour to compile from scratch and run all tests on a dual core cpu with hdd. I don’t see how compilation could be sped up any time soon (it would probably require getting rid of the boost hpp includes and the serialize.h includes, neither trivially possible). So having the tests run faster with a low risk setting like this seems like a nice way to kick the can down the road.

    Because the flag can be set on the command line, it shouldn’t hinder anyone from increasing the test coverage of our tx relay code.

    I do see the argument that regtest should be as close to mainnet as possible, but there is also a reason to have some regtest values differ from mainnet, when they aid testing.

  15. jnewbery commented at 3:53 pm on April 24, 2019: member

    :slightly_frowning_face: I much prefer this approach to #15858 and don’t see why adding 30 lines of gnarly topo-sort code in the test framework is in any way preferable to adding the one-line change:

    -            bool fSendTrickle = pto->fWhitelisted;
    +            bool fSendTrickle = connman->m_tx_relay_force_flush || pto->fWhitelisted;


    in SendMessages(). If the concern is whether the faster sync behaviour should be the default or not, then it’s equally possible to make either change default to on or off.

    The only change in this PR is one of timing. The other PR completely changes how transactions are relayed between nodes on the test network, so in fact this PR results in less difference in behaviour from a ‘real’ network. I can easily come up with scenarios where a regression in tx relay would be caught with this change, but hidden by the change in #15858, which basically removes p2p tx relay between nodes under test.

    So my preference would be to use this PR, but default to off to satisfy the concerns of @sipa, @gmaxwell and @sdaftuar. It can be explicitly enabled in the tests that @MarcoFalke picked out in #15881.

  16. gmaxwell commented at 6:08 pm on April 24, 2019: contributor

    The response to “the tests run slow” shouldn’t be “just run more of them in parallel”

    Why not? At least to the extent that slowness is due to sleeps in the protocol design then running them in parallel is a perfectly reasonable response to it.

    but if there are no tests for delay, what is the point of keeping delay in tests at all? If it breaks, no one will notice it anyway.

    Because they are part of the system. The tests (for the most part) are system tests. Even if there is no specific test for the delay behaviour, in most of the ways it could break, breaking it would also break the tests. Consider: if a rare crasher bug, say a divide by zero, were introduced in PoissonNextSend, then currently the tests could eventually catch it even though there is no direct test; but if that code were never run, it could never be caught by the tests.

    Also, having additional conditional behaviour in the codebase complicates the code: opportunities for confusion, time wasted fixing bugs (or testing code) specific to the test environment. What happens when some bool’s sense gets flipped and the delay skipping gets turned on in mainnet? Or when there is a race in getdata handling that doesn’t work without delays? Additional conditional test-only code also frustrates efforts to improve the code by hitting high multiple-condition/decision branch coverage (because of dead production code in tests, and dead test code in coverage).

    Three times faster is no joke. But the further we make the tested system differ from the actual system, the less useful the tests become. We could make them infinitely faster by removing them completely… @jnewbery’s one-line version is a lot more attractive. And while I wouldn’t see any problem if some particularly tx-relay-serialization-delay-sensitive tests used it, we really shouldn’t be doing the bulk of our system testing that way, simply because it’s different from production. We should be trying to minimize the difference to production… What we could do is identify tests where the p2p behaviour is irrelevant to the test itself and the source of a large delay, and bypass it for just those tests. However, we should take care that doing so doesn’t leave some part of the production system unused in the tests (e.g. we shouldn’t do this to all tests that relay transactions…).

    I feel like maybe we’ve lost the distinction between system and unit testing, and we’re asking tests which are structurally system tests to work and perform like unit tests. In general I’m not the biggest fan of unit testing and prefer to achieve as much as possible via system tests, but when performance considerations are causing us to test increasingly modified versions of the system, it might make sense to split things up and stop trying to use the whole system when what we’re trying to do is a unit test.

  17. kostyantyn commented at 8:55 am on April 25, 2019: contributor

    It looks like the main concern is that this PR introduces a new conditional, and if we misapply it, some part of the code (concretely PoissonNextSend) might not be executed when it’s expected to be called, or the other way around.

    What if, instead of enabling/disabling the delay, we pass a parameter to the function? (PoissonNextSend already accepts average_interval_seconds, but it seems better to introduce another one.) Then we are sure that this code is executed and every operand is taken into account; we write unit tests for the values we use on testnet/mainnet, and we can set the desired parameter for regtest that cancels out the delay.
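    A sketch of that idea (Python pseudocode for the C++ change; the function name and the multiplier parameter are hypothetical): scale the delay by a chain-specific factor rather than branching around the call, so the same code path runs on every network.

```python
import random

def poisson_next_send(now: float, average_interval: float,
                      delay_multiplier: float = 1.0) -> float:
    # With delay_multiplier = 0.0 (e.g. on regtest), the delay cancels out,
    # but the function, its RNG draw, and its arithmetic are still
    # exercised, unlike a boolean bypass that skips the call entirely.
    return now + random.expovariate(1.0 / average_interval) * delay_multiplier
```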

  18. instagibbs commented at 7:26 pm on April 25, 2019: member
    @kostyantyn yes I think if this kind of approach is to be taken at all, it should be parameterized and shortened by default.
  19. gmaxwell commented at 0:07 am on April 26, 2019: contributor
    @kostyantyn The “unit test” aspect is only part of it. In complex systems many interesting bugs emerge out of interactions of multiple parts. If all the tests significantly change the delays then conditions that are only reachable in the presence of delays won’t get hit. Testing both with and without delays is probably better, but if only one is done it should be kept closer to actual use.
  20. MarcoFalke force-pushed on Apr 26, 2019
  21. jonasschnelli commented at 7:16 am on April 29, 2019: contributor

    I tend to agree with @gmaxwell, @sdaftuar, @sipa. Optimizing tests for performance by reducing test-authenticity seems suboptimal to me.

    We might have users or developers that would like to run the tests on weak hardware (e.g. dual core cpu or hdd). Those constraints set a limit on how many tests can be run in parallel.

    IMO the least important property a test should have is being fast (not saying it’s completely irrelevant).

  22. MarcoFalke commented at 7:49 pm on April 30, 2019: member

    Devs are less likely to run the tests (or a specific test) if it takes too long to run. We might as well remove slow tests if only one or two devs run them on high-end hardware. Long tests are excluded on Travis, and I doubt that a bunch of developers on different architectures pass in --extended.

    Moreover, our unit and functional tests are horribly unsuitable for finding actual issues with the p2p code in a scenario such as mainnet. The transaction workload is trivial and highly synthetic, not at all comparable to mainnet.

    I do see the argument of dead code paths when there is no delay by default, so I’ll rework this pull to default to the mainnet setting unless a test opts in to the 0-delay. Testing both settings a bit might actually lead to more test coverage, as race conditions might be easier to trigger with a 0-delay.

  23. MarcoFalke force-pushed on Apr 30, 2019
  24. net: Send txs without delay on regtest fa614dec3f
  25. MarcoFalke force-pushed on Apr 30, 2019
  26. Add regtest command line flag tx_relay_force_flush fadb718bf2
  27. MarcoFalke force-pushed on Apr 30, 2019
  28. jnewbery commented at 8:11 pm on April 30, 2019: member

    Devs are less likely to run the tests (or a specific test) if it takes too long to run.

    Totally agree with this. I doubt anyone runs the dbcache test for example.

    When rebasing a PR with many commits, I would like to run the entire test suite over every commit to make sure that no intermediate commits break anything. Having very long running tests precludes this.

    our unit or functional tests are horribly unsuitable to find actual issues with the p2p code in a scenario such as mainnet. The transaction workload is trivial and highly synthetic, not at all comparable to mainnet.

    Again, very much agree with this. My view is that the functional tests test functionality. If we want to catch p2p issues we should build a stress test and a soak test environment, and also run a bunch of nodes on mainnet with heavy diags/logging/analytics/etc.

  29. MarcoFalke closed this on Apr 30, 2019

  30. MarcoFalke deleted the branch on Apr 30, 2019
  31. laanwj commented at 7:02 am on May 1, 2019: member

    Devs are less likely to run the tests (or a specific test) if it takes too long to run. We might as well remove slow tests if only one or two devs run them on high-end hardware. Long tests are excluded on travis and I doubt that a bunch of developers on different architectures pass in --extended.

    IMO fast tests are very good. It encourages running the tests more often, makes Travis faster, and usually indicates the tests don’t spend a lot of time on waiting and polling loops, or doing unnecessary stuff that doesn’t increase test coverage and causes potential random errors (and ever higher parallelism increases this latter risk by affecting delays randomly).

    So in this case: yes, normal transaction propagation should be tested, but this is only necessary in the part of the tests that deals with the P2P network. In the rest, it’s allowable to take some shortcuts, like disabling or speeding up randomized behavior.

    What I don’t like (in this particular approach) is that bitcoind accumulates more test-specific behavior and functions, which make it harder to maintain and which (as @gmaxwell says) blur the line between unit and functional tests, so I personally prefer #15858’s approach. That’s not a hard NACK on this though. I was just missing the context when seeing that PR closed.

  32. MarcoFalke restored the branch on May 1, 2019
  33. MarcoFalke commented at 5:38 pm on May 1, 2019: member

    I personally prefer #15858’s approach

    That other approach is going to bypass p2p tx relay completely, so it is guaranteed to decrease test coverage.

    This pull request is a one-line patch to net_processing (off by default) so that both scenarios are tested. This is increasing test coverage, as I explained above.

  34. fanquake referenced this in commit 270616228b on Nov 7, 2019
  35. MarcoFalke deleted the branch on Nov 26, 2021
  36. PastaPastaPasta referenced this in commit 7792d63d25 on Jun 19, 2022
  37. PastaPastaPasta referenced this in commit 262671182d on Jun 19, 2022
  38. DrahtBot locked this on Nov 26, 2022


This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC
