Use mockable time everywhere in net

sipa commented at 9:53 pm on September 27, 2020: member

The fact that net_processing uses a mix of mockable tand non-mockable time functions made it hard to write functional tests for #19988.

I’m opening this as a separate PR as I believe it’s independently useful. In some ways this doesn’t go quite as far as it could, as there are now several data structures that could be converted to std::chrono types as well now. I haven’t done that here, but I’m happy to reconsider that.

sipa force-pushed on Sep 27, 2020

DrahtBot added the label P2P on Sep 27, 2020

sipa force-pushed on Sep 28, 2020

fanquake deleted a comment on Sep 28, 2020

DrahtBot commented at 3:53 am on September 28, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#20044 (Make all of net_processing (and some of net) use std::chrono types by sipa)
#19893 (test: Remove or explain syncwithvalidationinterfacequeue by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/net_processing.cpp:4157 in 1281d7d31d outdated

4153@@ -4155,7 +4154,7 @@ bool PeerManager::SendMessages(CNode* pto)
4154             // Only actively request headers from a single peer, unless we're close to today.
4155             if ((nSyncStarted == 0 && fFetch) || pindexBestHeader->GetBlockTime() > GetAdjustedTime() - 24 * 60 * 60) {
4156                 state.fSyncStarted = true;
4157-                state.nHeadersSyncTimeout = GetTimeMicros() + HEADERS_DOWNLOAD_TIMEOUT_BASE + HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER * (GetAdjustedTime() - pindexBestHeader->GetBlockTime())/(consensusParams.nPowTargetSpacing);
4158+                state.nHeadersSyncTimeout = current_time.count() + HEADERS_DOWNLOAD_TIMEOUT_BASE + HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER * (GetAdjustedTime() - pindexBestHeader->GetBlockTime())/(consensusParams.nPowTargetSpacing);

MarcoFalke commented at 6:27 am on September 28, 2020:

This is not type safe, because someone could changed the type of current_time to have more or less precision than it has now. Please see the count_ helpers.

sipa commented at 6:56 am on September 28, 2020:

That’s indeed better, done.

A bit more abstractly, I don’t think that’s very different from the risk of nHeadersSyncTimeout being changed to represent a time in nanoseconds instead of microseconds, for example. We’re mixing std::chrono types with integers, and the conversions will always involve these risks. The correct approach is to get rid of all of those, and replace them all with std::chrono types. I’m happy to look into that as follow-up PR.

in src/net_processing.cpp:4632 in 1281d7d31d outdated

4631@@ -4636,7 +4632,6 @@ bool PeerManager::SendMessages(CNode* pto)
4632             !pto->HasPermission(PF_FORCERELAY) // peers with the forcerelay permission should not filter txs to us
4633         ) {
4634             CAmount currentFilter = m_mempool.GetMinFee(gArgs.GetArg("-maxmempool", DEFAULT_MAX_MEMPOOL_SIZE) * 1000000).GetFeePerK();
4635-            int64_t timeNow = GetTimeMicros();

MarcoFalke commented at 6:31 am on September 28, 2020:

Is there any reason to subtract the time it took to send previous messages from the next send? It probably doesn’t matter too much, because we assume that sending all messages is instantaneous, but getting the current time is also trivial complexity, so I have a slight preference to keep the getter here.

sipa commented at 7:04 am on September 28, 2020:

I generally think we’re making way too many time calls already, and I doubt microsecond precisions matters at all for the fee filter sending.

ajtowns commented at 0:36 am on October 7, 2020:

a) We’re not actually sending the messages there, just moving them in memory to a different queue, so it should take microseconds b) This is often only called in intervals of 100ms, and randomly delayed depending on how long it takes to deal with other peers, so accuracy won’t be that fine-grained anyway c) The only times affected by not using an updated timestamp here are ones chosen in comparison to 5min or 10min intervals

So I don’t see how not using a bumped time here can have any meaningful impact?

in test/functional/wallet_resendwallettransactions.py:69 in 1281d7d31d outdated

63@@ -64,6 +64,10 @@ def run_test(self):
64         # Transaction should be rebroadcast approximately 24 hours in the future,
65         # but can range from 12-36. So bump 36 hours to be sure.
66         node.setmocktime(now + 36 * 60 * 60)
67+        # Give some time for trickle to occur
68+        time.sleep(1)
69+        node.syncwithvalidationinterfacequeue()

MarcoFalke commented at 6:32 am on September 28, 2020:

Can you explain why this is needed, what this is trying to achieve? #19893

sipa commented at 7:05 am on September 28, 2020:

I copied it from elsewhere; it works fine without the syncwithvalidationinterfacequeue call, so I’ve removed it.

in src/net_processing.cpp:4662 in 1281d7d31d outdated

4662             // until scheduled broadcast, then move the broadcast to within MAX_FEEFILTER_CHANGE_DELAY.
4663-            else if (timeNow + MAX_FEEFILTER_CHANGE_DELAY * 1000000 < pto->m_tx_relay->nextSendTimeFeeFilter &&
4664+            else if (current_time.count() + MAX_FEEFILTER_CHANGE_DELAY * 1000000 < pto->m_tx_relay->nextSendTimeFeeFilter &&
4665                      (currentFilter < 3 * pto->m_tx_relay->lastSentFeeFilter / 4 || currentFilter > 4 * pto->m_tx_relay->lastSentFeeFilter / 3)) {
4666-                pto->m_tx_relay->nextSendTimeFeeFilter = timeNow + GetRandInt(MAX_FEEFILTER_CHANGE_DELAY) * 1000000;
4667+                pto->m_tx_relay->nextSendTimeFeeFilter = current_time.count() + GetRandInt(MAX_FEEFILTER_CHANGE_DELAY) * 1000000;

MarcoFalke commented at 6:33 am on September 28, 2020:

none of the .count in this scope are type safe (see comment above)

sipa commented at 7:05 am on September 28, 2020:

Fixed.

MarcoFalke commented at 6:34 am on September 28, 2020: member

Concept ACK, but I don’t like that this patch makes the code more fragile.

naumenkogs commented at 6:42 am on September 28, 2020: member

Concept ACK

sipa force-pushed on Sep 28, 2020

MarcoFalke added the label Refactoring on Sep 28, 2020

practicalswift commented at 1:13 pm on September 28, 2020: contributor

Concept ACK

in test/functional/p2p_tx_download.py:162 in e5c0223614 outdated

174-
175-        self.test_inv_block()
176-
177-        self.test_tx_requests()
178+        # Run each test against new bitcoind instances, as setting mocktimes has long-term effects on when
179+        # the next trickle relay event happens.

ajtowns commented at 0:14 am on October 7, 2020:

I’m not sure it’s a good idea to link mockable time to network delays in general – if you bump the mocktime by a hours/days that shouldn’t risk triggering “hey this peer hasn’t said anything for ages, let’s disconnect” conditions, and equally if you backdate the mocktime for some reason, that shouldn’t cause network tasks to say “oh, we decided we don’t need to do anything until time t, which is apparently years away” and effectively hang. Having to tweak tests feels a bit like a canary…

My intuition for this was we have two sorts of delays – short ones (<30s) that just prevent us busy looping when there’s nothing to actually do and don’t generally need to be precisely controlled, and long ones (minutes/hours/days) that are human visible and that we’ll obviously need mocktime to test because minutes/hours/days is effectively forever.

But maybe we’ve really got two different sorts of mocking that we want to do – “set the time to X” and it stays there until we’re ready to set it to some different time “Y” (what we’ve got now), and “everything that was going to happen in the next 5 minutes, get it done now” (something more like what the mockscheduler rpc does)? You could implement the latter by bumping the system clock (rather than replacing it), and only allowing the bump amount to increase.

I suspect at a minimum we’d want to switch to the std::chrono::time_point abstraction and use different clocks so that we could tell these cases apart by type, but that doesn’t seem like a good thing to dive into immediately before feature freeze. (I had a sketch I was working on a while back at https://github.com/ajtowns/bitcoin/commits/201908-systime fwiw)

And mocktime is only relevant for regtest, so the downsides of this approach are pretty limited (presumably to just these test case changes)

sipa commented at 10:40 pm on October 7, 2020:

I think we’ll indeed a number of timers for different purposes, eventually:

System time (possibly adjusted by peer data), for use in block and wallet transaction timestamps (as they have an absolute UNIX epoch as reference ).
Steady time, mockable, for timing application level events (tx relay, pinging, addr rebroadcast, …).
Steady time, for avoiding busy looping (like select() timeout, message handler 100ms delay, …).

The last one indeed I think shouldn’t be mockable, but things like ping timeouts I think should be. It just becomes too complicated when multiple clocks are mixed within one layer of the code - even if it means using a mockable clock for some delays that are in practice very short.

As for dealing with the fact that making small delays mockable complicates testing, I think the best solution is a “add X to mock time offset, but keep the clock running”, if that’s what you meant.

MarcoFalke commented at 9:40 am on October 8, 2020:

I disagree that system time (used for blocks and wallet txs) should not be mockable. For example, sometimes we want to create a deterministic blockchain, so the block times need to be fixed. I know it is possible to use the python block generator, but I don’t see a strong reason to disallow system time to be mocked.

Edit: So it is also important to be able to say “set time to a constant” instead of just “add offset, but keep clock running”.

ajtowns commented at 7:54 pm on October 8, 2020:

I think the idea was to have two ways of mocking – fix the system time to X (and prevent it from advancing, so time goes t, t+1, t+2, t+3, X, X, X, X), or bump the system time by D (so time goes t, t+1, t+2, t+3, t+D+3, t+D+4, t+D+5).

in src/net_processing.cpp:562 in e5c0223614 outdated

558@@ -559,7 +559,7 @@ static bool MarkBlockAsReceived(const uint256& hash) EXCLUSIVE_LOCKS_REQUIRED(cs
559         }
560         if (state->vBlocksInFlight.begin() == itInFlight->second.second) {
561             // First block on the queue was received, update the start download time for the next one
562-            state->nDownloadingSince = std::max(state->nDownloadingSince, GetTimeMicros());
563+            state->nDownloadingSince = std::max<int64_t>(state->nDownloadingSince, GetTime<std::chrono::microseconds>().count());

ajtowns commented at 0:39 am on October 7, 2020:

Swich to count_microseconds(GetTime<std::chrono::microseconds>()) and lose the <int64_t> ?

sipa commented at 11:24 pm on October 7, 2020:

Done.

in src/net_processing.cpp:3640 in e5c0223614 outdated

3609@@ -3610,7 +3610,7 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
3610                     // Matching pong received, this ping is no longer outstanding
3611                     bPingFinished = true;
3612                     const auto ping_time = ping_end - pfrom.m_ping_start.load();
3613-                    if (ping_time.count() > 0) {
3614+                    if (ping_time.count() >= 0) {

ajtowns commented at 1:29 am on October 7, 2020:

Not really directly related? I think this makes sense for instances where time isn’t increasing, which can happen with mocktime or (I think) on systems that return cached values of the system time.

sipa commented at 11:23 pm on October 7, 2020:

WIthout this you see “timing mishap” errors in the function test logs.

ajtowns commented at 0:14 am on October 8, 2020:

Right, but I mean you already see those errors without the other changes in this patch.

sipa commented at 2:12 am on October 8, 2020:

Oh, ok. Gone.

ajtowns commented at 2:37 am on October 8, 2020:

Err? A separate commit with Allow for ping response time of 0us (avoids "timing mishap" errors when mocktime is set) or similar would be fine and clear IMO, seems a bit trivial to separate into a PR of its own. (Or just leaving it in and letting people who’re curious trace it back to this PR and read the explanation above would also be fine) But either way.

sipa commented at 2:41 am on October 8, 2020:

Ok!

ajtowns approved

ajtowns commented at 2:34 am on October 7, 2020: member

ACK e5c0223614d61d3d885b1c7eaf5ee095cd7d15ed

Checked conversion from GetTime() to count_microsecounds(GetTime<>()) and that variables that are now mockable are only compared with mockable times.

jnewbery commented at 8:41 am on October 7, 2020: member

ACK e5c0223614d61d3d885b1c7eaf5ee095cd7d15ed

sipa force-pushed on Oct 7, 2020

sipa force-pushed on Oct 8, 2020

in test/functional/wallet_resendwallettransactions.py:68 in 44b17b2a4d outdated

63@@ -64,6 +64,9 @@ def run_test(self):
64         # Transaction should be rebroadcast approximately 24 hours in the future,
65         # but can range from 12-36. So bump 36 hours to be sure.
66         node.setmocktime(now + 36 * 60 * 60)
67+        # Give some time for trickle to occur
68+        time.sleep(1)

MarcoFalke commented at 7:26 am on October 8, 2020:

It seems a bit confusing to “use mockable time everywhere”, but then start adding wallclock sleeps in tests. Also, the comment refers to “trickle”, but should probably say “wallet resend”. (The trickle happens when you call setmocktime + 600 in the next line). I suggest the following diff:

 0diff --git a/test/functional/wallet_resendwallettransactions.py b/test/functional/wallet_resendwallettransactions.py
 1index 9336f94ee3..f2f48b8bcd 100755
 2--- a/test/functional/wallet_resendwallettransactions.py
 3+++ b/test/functional/wallet_resendwallettransactions.py
 4@@ -64,8 +64,8 @@ class ResendWalletTransactionsTest(BitcoinTestFramework):
 5         # Transaction should be rebroadcast approximately 24 hours in the future,
 6         # but can range from 12-36. So bump 36 hours to be sure.
 7         node.setmocktime(now + 36 * 60 * 60)
 8-        # Give some time for trickle to occur
 9-        time.sleep(1)
10+        # Tell scheduler to call MaybeResendWalletTxs now
11+        node.mockscheduler(1)
12         node.setmocktime(now + 36 * 60 * 60 + 600)
13         peer_second.wait_for_broadcast([txid])
14

sipa commented at 8:05 am on October 8, 2020:

Done.

MarcoFalke approved

MarcoFalke commented at 7:39 am on October 8, 2020: member

ACK 93fca5fbd84ff793625f1c2bd469cfee68aa07f1 🐎

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK 93fca5fbd84ff793625f1c2bd469cfee68aa07f1 🐎
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUg9lwv9Hr28B7UGDp2zRH285YsHrhEiqoQTWbSpuL9dQPRWTAD4ygHN9qW4TbZt
 8DOsuHRZOAjFBinWH/gYIUUW3F8WvIchHMgNY+p4+IkNbYIAMruDmMvbDTldUq0Hu
 9n9QzyB6Oz2OW9p8hMFJTbPB+8rVwMwA4H8ZwooghiKNszd/sFVFi5j5Pxgd6XlFi
10MDkYpUmS3wQt1Kb2ya4SyOVPdlGQpje9ASjqCNSDz+CJOoJ9fFRZ+txEgn5lgTKI
11o/oPoNVd2ZIQamx79+1y4kzLmkBBxbxP96ulC6l1UCMA9kKzX1m6yafSgN4Lc3Wo
1201G5qBGawYXtWb48LPQGEYhVVIM0AcukRvpMA1TBpXkVPKDtvO/8/QEqkMZRnmaI
13C10s9QP3TNuGhkIY4AUwVQkFmQEbWaePNVVdMd/ee3WO1N2VKuiH/+9/zxpMq9cf
14tuoffg+0Y2MhRxVFjey23gjDoea+OQjo2KfRApyhELaqejX48k9A3oUbwW7NrUYb
15IznFTsZW
16=b0jx
17-----END PGP SIGNATURE-----

Timestamp of file with hash 03cb2a91902242d76811cd547cea6874233f1f46c555bd142069e3cbfa6b48fc -

Use mockable time everywhere in net_processing ec3916f40a

Avoid 'timing mishap' warnings when mocking b6834e312a

sipa force-pushed on Oct 8, 2020

MarcoFalke commented at 8:15 am on October 8, 2020: member

ACK b6834e312a 🌶

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK b6834e312a 🌶
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUiqVAwAkgjqSWeoLAE5Eej6vdWXcdAqJlztAhCbdlyCZJJRd/Wu34rtBvHC0fwW
 8UsmVfevPkIL4ejJqUsII+h2QidQv8Ie//EiX+9HtdeQIbpX0AgWJd6mb5jdS8NJw
 9sXwpFfL+x+3H8weUk7mxlIvPtYLZm1VsrlhWR6QW1HG0wL3o9/MVNho3c78lOjxL
10nQz//fbX6hCLD067gE2ujx7wJsBzbux8jn57oifZq2KPK+6ENmVy8s63u7qDocRw
11Gc7xpBp2KmnRlfX0Juik8QjNOGOIuNBytRhuZwaLNWKupQT3ItxGWvI8jk7G8mov
12cKL6csUlRx7iM0z3bHjzZRabbOp8lT29chErWaB6mprVpmC0QJMEw/+JkW6NARnd
13LNpniw8ttC6Y4Nyci0DCQsDLzyQ942I6tGTWV+gR7XqHMC3DRd5/zCJsP/zhkrXT
14Aac9tNB45brc8Vl534j+og+pZ7oxSKug6IUwmImavdqRjl0rq/+RsTdFr7Vk9IeG
152MJK2Tjc
16=pOLL
17-----END PGP SIGNATURE-----

Timestamp of file with hash df57ab8fe3ed162f8bb109f94ab3daf5cb2dd502b92b384bc065c8b7c69084eb -

jnewbery commented at 9:17 am on October 8, 2020: member

utACK b6834e312a6a7bb395ec7266bc9469384639df96

fanquake requested review from ajtowns on Oct 8, 2020

naumenkogs commented at 9:31 am on October 8, 2020: member

utACK b6834e3

I agree with the comments by @ajtowns, but I guess it can only introduce problems on regtest? And since this PR is necessary for #19988, I think it’s fine, that can be fixed later.

MarcoFalke merged this on Oct 8, 2020

MarcoFalke closed this on Oct 8, 2020

in src/net_processing.cpp:585 in b6834e312a

581@@ -582,7 +582,7 @@ static bool MarkBlockAsReceived(const uint256& hash) EXCLUSIVE_LOCKS_REQUIRED(cs
582         }
583         if (state->vBlocksInFlight.begin() == itInFlight->second.second) {
584             // First block on the queue was received, update the start download time for the next one
585-            state->nDownloadingSince = std::max(state->nDownloadingSince, GetTimeMicros());
586+            state->nDownloadingSince = std::max(state->nDownloadingSince, count_microseconds(GetTime<std::chrono::microseconds>()));

rebroad commented at 3:35 pm on February 23, 2021:

seems like a backwards step turning neat code into lengthy code - why not use a function for this or change GetTimeMicros() itself?

Fabcien referenced this in commit 6f09bb224d on May 18, 2021

DrahtBot locked this on Aug 16, 2022

Use mockable time everywhere in net_processing #20027

Conflicts