Switch CNode::m_ping_start and CNetMessage::m_time to mockable time, so that tests can be added.
Mockable time is also type-safe, since it uses std::chrono
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
| Coverage | Change (pull 18638, 0f96c4b9c8eed32c3fdf1203a2f9a2be59a5f4a2) | Reference (master, 75fcfdaf8b4b68ef1d1fcd606fb36c9265e15918) |
|---|---|---|
| Lines | +0.0020 % | 90.3582 % |
| Functions | +0.1431 % | 85.0787 % |
| Branches | +0.0089 % | 51.8202 % |
<sup>Updated at: 2020-04-15T00:09:25.322904.</sup>
Concept ACK.
Won't this potentially cause issues (e.g. ping timeouts) in P2P tests that already use mockable time for other things, but are not expecting it to have an effect on pings?
(seems not, given that all the tests still pass)
The -peertimeout is still measured in wall clock time, and because most tests complete faster than the -peertimeout, this should not lead to issues in general. Two tests, test/functional/feature_maxuploadtarget.py and test/functional/feature_bip68_sequence.py, mine a lot of blocks (more than 100) under mocktime. This can take a while when the tests are run under valgrind, so I had to bump the -peertimeout for those tests.
Other tests should not be affected. If they are, it should be trivial to fix them up.
Concept ACK: mockable is better than non-mockable :)
Rebased
```diff
+        ping_delay = 29
+        self.mock_forward(ping_delay)
+        wait_until(lambda: 'ping' in no_pong_node.last_message)
+        no_pong_node.send_and_ping(msg_pong(no_pong_node.last_message.pop('ping').nonce))
+        self.check_peer_info(pingtime=ping_delay, minping=ping_delay, pingwait=None)
+
```
From what I understand, minping is the best/shortest round trip time measured so far. Thus, would it be a good idea to repeat this chunk with a shorter ping_delay in order to check that minping gets updated correctly? I tried with ping_delay = 25 locally and it was updated from 29 to 25 as expected.
Also I'm just wondering, why is ping_delay 29? I'm not sure if it's arbitrary or a chosen magic number.
Good point! Will extend the test as suggested by you.
```diff
+        self.mock_forward(3)
+        assert no_pong_node.last_message.pop('ping').nonce != 0
+        self.check_peer_info(pingtime=None, minping=None, pingwait=3)
+
+        self.log.info('Reply without nonce cancels ping')
+        with self.nodes[0].assert_debug_log(['pong peer=0: Short payload']):
```
Be aware that a msg_pong_corrupt is sent (not a msg_pong)
Ah, that's it 👍
```diff
@@ -74,7 +74,7 @@ void test_one_input(const std::vector<uint8_t>& buffer)
     p2p_node.SetSendVersion(PROTOCOL_VERSION);
     g_setup->m_node.peer_logic->InitializeNode(&p2p_node);
     try {
-        (void)ProcessMessage(&p2p_node, random_message_type, random_bytes_data_stream, GetTimeMillis(), Params(), *g_setup->m_node.mempool, g_setup->m_node.connman.get(), g_setup->m_node.banman.get(), std::atomic<bool>{false});
```
I think microseconds is correct (I think millis would be a bug if this wasn't in a unit test), but since it looks like you're switching from millis to micros, could this be an issue at all? The test passes for me.
Yes, this looks like a bug in the test to me as well. Luckily by making the time type safe, this is less likely to happen in the future.
Regardless, instead of depending on the current time, which is not deterministic and might cause issues while fuzzing, the fuzz test should ask the fuzz engine for a time value.
However, this seems out of scope for the changes in this pull request.
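The suggestion above — taking the time value from the fuzz input rather than the wall clock — could look roughly like this. This is a hypothetical sketch: `ConsumeTime` is an illustrative name, not the actual harness API.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstring>

// Hypothetical helper: derive the timestamp from the fuzz input itself, so a
// given input always replays with the same "current time" instead of
// depending on the non-deterministic wall clock.
std::chrono::microseconds ConsumeTime(const uint8_t* data, size_t size)
{
    int64_t raw = 0;
    std::memcpy(&raw, data, std::min(size, sizeof(raw)));
    return std::chrono::microseconds{raw};
}
```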
Was able to build and all the tests passed locally. I don't know enough to solidly review the cpp code, but I tried to be more thorough with the functional tests. Everything makes sense to me, I just left a few questions.
Addressed feedback by @gzhao408
Concept ACK. Reviewing/testing.
```diff
@@ -686,7 +686,8 @@ const uint256& V1TransportDeserializer::GetMessageHash() const
     return data_hash;
 }
 
-CNetMessage V1TransportDeserializer::GetMessage(const CMessageHeader::MessageStartChars& message_start, int64_t time) {
+CNetMessage V1TransportDeserializer::GetMessage(const CMessageHeader::MessageStartChars& message_start, const std::chrono::microseconds& time)
```
std::chrono times are cheaply copied ints; would it be simpler/safer/faster to pass by value rather than reference to const?
If yes, idem in net.h::L645+698, net_processing.cpp::L2085, and fuzz/process_message.cpp::L32
Reference: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-in
I always follow rule 1 (const&), because it is always cheap and avoids the overhead of having to think about how many machine words a type occupies.
```cpp
void f1(const string& s);  // OK: pass by reference to const; always cheap
```
Changed, though.
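For what it's worth, the guideline's reasoning can be checked directly: a `std::chrono` duration is a trivially copyable integer wrapper, so both signatures behave identically for the caller (the function names below are illustrative, not from the codebase).

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>

// A duration is just a trivially copyable integer underneath, which is why
// CppCoreGuidelines F.16 suggests passing it by value.
static_assert(std::is_trivially_copyable_v<std::chrono::microseconds>,
              "durations are cheap to copy");

int64_t f_by_ref(const std::chrono::microseconds& t) { return t.count(); }
int64_t f_by_value(std::chrono::microseconds t) { return t.count(); }
```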
```diff
@@ -1141,9 +1142,9 @@ void CConnman::InactivityCheck(CNode *pnode)
             LogPrintf("socket receive timeout: %is\n", nTime - pnode->nLastRecv);
             pnode->fDisconnect = true;
         }
-        else if (pnode->nPingNonceSent && pnode->nPingUsecStart + TIMEOUT_INTERVAL * 1000000 < GetTimeMicros())
+        else if (pnode->nPingNonceSent && pnode->m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL} < GetTime<std::chrono::microseconds>())
```
Not changed by this PR, but noting that this seems to be the only place in the codebase where nPingNonceSent is tested for truthiness with implicit conversion instead of against an integer value.
Indeed unrelated.
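A reduced sketch of the quoted timeout check shows why the mixed-unit arithmetic is safe: the seconds constant is converted to microseconds losslessly at compile time, replacing the hand-written `* 1000000`. Constants and names here are illustrative, not the actual ones.

```cpp
#include <chrono>

// Illustrative constant: 20-minute ping timeout, declared in seconds.
static constexpr std::chrono::seconds TIMEOUT_INTERVAL{20 * 60};

// Reduced form of the inactivity check: the compiler converts seconds to
// microseconds implicitly and losslessly, so no manual unit factor is needed.
bool PingTimedOut(std::chrono::microseconds ping_start, std::chrono::microseconds now)
{
    return ping_start + TIMEOUT_INTERVAL < now;
}
```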
```diff
 
     // Raw ping time is in microseconds, but show it to user as whole seconds (Bitcoin users should be well used to small numbers with many decimal places by now :)
     stats.m_ping_usec = nPingUsecTime;
     stats.m_min_ping_usec = nMinPingUsecTime;
-    stats.m_ping_wait_usec = nPingUsecWait;
+    stats.m_ping_wait_usec = count_microseconds(ping_wait);
```
Do you think it would be better to eventually convert these primitives to duration to no longer need helpers like count_microseconds? (or am I confused)
I admit that the documentation isn't the best https://doxygen.bitcoincore.org/time_8h.html#a50aae909d5edf426afb66cd4d30c8d00
However, those helpers will be needed to convert durations to raw integers for:
Extended documentation
Thanks!
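For readers unfamiliar with the helper: a `count_microseconds`-style function boils down to the sketch below (the real one lives in Bitcoin Core's util/time.h). Its value is that the duration-to-raw-integer conversion happens in one named, greppable place rather than via scattered `.count()` calls.

```cpp
#include <chrono>
#include <cstdint>

// Sketch of a count_microseconds-style helper: it names the one place where
// the type-safe duration is dropped down to a raw integer (e.g. for RPC
// output or serialization).
constexpr int64_t count_microseconds(std::chrono::microseconds t)
{
    return t.count();
}
```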
ACK fa704719529. Good to reduce the use of primitives for counting time in favor of the standard library and mockability, though it's fiddly and easy to introduce regressions while doing so (thanks for adding the tests). Builds with gcc and clang are clean, and the node seems to run normally. Built and ran the fuzzers without issue.
<details><summary><code>fuzz/process_message</code> (ran for more than a day)</summary><p>
```
$ src/test/fuzz/process_message ../qa-assets/fuzz_seed_corpus/
INFO: Seed: 2449945750
INFO: Loaded 1 modules (498754 inline 8-bit counters): 498754 [0x561ad95753c8, 0x561ad95ef00a),
INFO: Loaded 1 PC tables (498754 PCs): 498754 [0x561ad95ef010,0x561ad9d8b430),
INFO: 52251 files found in ../qa-assets/fuzz_seed_corpus/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
INFO: seed corpus: files: 52251 min: 1b max: 3984182b total: 462001668b rss: 227Mb
#512    pulse  cov: 6973 ft: 7092 corp: 10/13b exec/s: 256 rss: 516Mb
#1024   pulse  cov: 7037 ft: 7168 corp: 12/19b exec/s: 256 rss: 516Mb
#2048   pulse  cov: 7317 ft: 7541 corp: 18/45b exec/s: 292 rss: 516Mb
.../...
#3094033        REDUCE cov: 34621 ft: 110638 corp: 2221/41Mb exec/s: 23 rss: 1025Mb L: 92/1048576 MS: 1 EraseBytes-
#3102635        NEW    cov: 34621 ft: 110890 corp: 2222/41Mb exec/s: 22 rss: 1025Mb L: 111/1048576 MS: 2 CopyPart-ChangeByte-
#3102867        NEW    cov: 34621 ft: 110899 corp: 2223/41Mb exec/s: 22 rss: 1025Mb L: 107/1048576 MS: 2 CopyPart-CMP- DE: "-bans"-
```
</p></details>
<details><summary><code>fuzz/p2p_transport_deserializer</code></summary><p>
```
$ src/test/fuzz/p2p_transport_deserializer ../qa-assets/fuzz_seed_corpus/
INFO: Seed: 159618322
INFO: Loaded 1 modules (498301 inline 8-bit counters): 498301 [0x55c1d40d0d48, 0x55c1d414a7c5),
INFO: Loaded 1 PC tables (498301 PCs): 498301 [0x55c1d414a7c8,0x55c1d48e4f98),
INFO: 52840 files found in ../qa-assets/fuzz_seed_corpus/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
INFO: seed corpus: files: 52840 min: 1b max: 3984182b total: 491369132b rss: 167Mb
#8192   pulse  cov: 452 ft: 453 corp: 1/1b exec/s: 4096 rss: 254Mb
#16384  pulse  cov: 1131 ft: 2680 corp: 34/1054b exec/s: 4096 rss: 389Mb
#32768  pulse  cov: 1137 ft: 5091 corp: 77/7383b exec/s: 2978 rss: 536Mb
.../...
#1036084        REDUCE cov: 1137 ft: 7819 corp: 149/844Kb exec/s: 293 rss: 824Mb L: 6113/666283 MS: 4 ChangeByte-EraseBytes-CMP-CopyPart- DE: "\x0d\x00\x00\x00\x00\x00\x00\x00"-
#1048576        pulse  cov: 1137 ft: 7819 corp: 149/844Kb exec/s: 294 rss: 824Mb
#1050168        REDUCE cov: 1137 ft: 7819 corp: 149/843Kb exec/s: 294 rss: 824Mb L: 25076/666283 MS: 4 ShuffleBytes-InsertByte-ShuffleBytes-EraseBytes-
```
</p></details>
```diff
+        pass
+
+
+class PingPongTest(BitcoinTestFramework):
+    def set_test_params(self):
+        self.setup_clean_chain = True
```
Is a clean chain needed for p2p testing here? The tests seem to be fine with this set to the default of false.
Sending a ping/pong shouldn't require any blocks, so I set this to false for now. This can be removed when the test is extended and needs blocks.
Did you still want to set this to false? :angel:
Seems overly expensive to copy the blockchain when it is not needed, no?
This might be a misconception I've held for a long time. I thought it was cheaper to use the cached data directories rather than set up a clean chain, and that the test framework defaults to self.setup_clean_chain = False for this reason. I reckoned the default was the cheaper option and the more expensive one was opt-in, so I always used False where possible.
It might be a good idea to add a sentence on performance in the "General test-writing advice" section in functional/README.md where it states "Set the self.setup_clean_chain variable in set_test_params() to control whether or not to use the cached data directories."
The reason this defaults to a chain is that most tests require a chain, so the least verbose option is the default.
Also, defaulting to a chain prevents test writers from accidentally mining a chain when it is not needed. Mining is orders of magnitude slower than copying.
Added documentation to address @jonatack's feedback
Code review re-ACK fabb382. Changes since last review per git diff fa70471 fabb382: passing time by value to GetMessage and ProcessMessage, and improved documentation.
```diff
+    wait_until,
+)
+from test_framework.test_framework import BitcoinTestFramework
+from test_framework.util import assert_equal
+
+PING_INTERVAL = 2 * 60
```
nit: if you retouch, could add a comment expressing the time units or along the lines of:
```python
PING_INTERVAL = 2 * 60  # 2 minutes (corresponds to net_processing::PING_INTERVAL)
```
re-ACK fa94d6fef799a900c15dce9638eace5cc10087ae changes per git range-diff b3ec1fe fabb382 fa94d6f are updates for a recent merge converting pointers to references in net processing code. Re-did clang and fuzz debug builds, re-ran the new test p2p_ping and the process_message and p2p_transport_deserializer fuzz tests locally.
Rebased. Should be easy to re-ACK with git range-diff bitcoin-core/master A B
Code review re-ACK fa33654 re-read code, verified rebase per git range-diff 4b5c919 fa94d6f fa33654, previous tested ACKs still valid
```diff
+        self.mock_time = int(time.time())
+        self.mock_forward(0)
+
+        self.log.info('Check that ping is sent after connection is established')
+        no_pong_node = self.nodes[0].add_p2p_connection(NodeNoPong())
+        self.mock_forward(3)
```
This is peertimeout, right? If yes, we use peertimeout in 3 places, and maybe it's clearer to define it at the outermost scope as well, like PING_INTERVAL? Or maybe there's no need :)
peertimeout is wall clock time, not mock time, I think. So this can be any value, as long as it matches pingwait=3 below, I'd say
ACK faab4aaf2fa1153c6d76efc8113fa01b06943ece, ran functional tests and read the python scripts.
```diff
+        self.mock_forward(0)
+
+        self.log.info('Check that ping is sent after connection is established')
+        no_pong_node = self.nodes[0].add_p2p_connection(NodeNoPong())
+        self.mock_forward(3)
+        assert no_pong_node.last_message.pop('ping').nonce != 0
```
Why pop here, as opposed to just accessing the value? Seems like having (message_count['ping'] == 1 and 'ping' not in last_message) == True could be confusing.
Maybe decrementing message_count['ping'] every pop is an easier solution, being that this idea is used throughout the test.
```python
assert no_pong_node.last_message['ping'].nonce != 0
```
Or maybe
```python
assert no_pong_node.last_message.get('ping').nonce != 0
```
or
```python
assert no_pong_node.last_message.pop('ping').nonce != 0
no_pong_node.message_count['ping'] -= 1
```
I wasn't aware that message_count exists, nor am I using it in this test. Given that it is almost unused right now and causes (potential) confusion, I'd say to remove it
The reason I use pop is to ensure the value is discarded after a read, so it has no way to poison follow-up test cases.
```diff
+
+        self.log.info('Reply without ping')
+        with self.nodes[0].assert_debug_log([
+            'pong peer=0: Unsolicited pong without ping, 0 expected, 0 received, 8 bytes',
+        ]):
+            no_pong_node.send_and_ping(msg_pong())
```
Why send_and_ping and not send_message, here and throughout? Doesn't seem like it would make a difference, and maybe it's best that we don't have extraneous pings and pongs.
I don't like to use assert_debug_log as a way to synchronize/poll for a change
send_and_ping is a way to flush all messages in the buffer (on both sides, send and receive) and ensure they are processed (and any errors have been logged).
Also, sending a ping and receiving a ping should never interfere, unless I am missing something obvious
```diff
+
+import time
+
+from test_framework.messages import (
+    msg_pong,
+)
```
nit: single line for a single import?
Good point. Can be fixed in a follow-up
ACK fa3365430c5fb57d7c0b5f2bce9fbbe290be93c3
Reviewed, verified everything runs.
I enjoyed how each sub-test flows into the next one. Hard to see at first but they really work together.
A few questions.
Thanks for the review everyone. If there are ways to improve the test, I am happy to review them in a follow-up