test: Fix and clean p2p_invalid_messages functional tests

troygiorshev commented at 1:20 pm on June 5, 2020: contributor

This PR touches only the p2p_invalid_messages.py functional test module. There are two main goals accomplished here. First, it fixes the “invalid message size” test, which previously made a message that was invalid for multiple reasons. Second, it refactors the file into a single consistent style. This file appears to have originally had two authors, with different styles and some test duplication.

It should now be easier and quicker to understand this module, anticipating the upcoming BIP324 and AltNet changes.

This should probably go in ahead of #19107, but the two are not strictly related.

fanquake added the label Tests on Jun 5, 2020

MarcoFalke renamed this:
~~p2p: Fix and clean p2p_invalid_messages functional tests~~
test: Fix and clean p2p_invalid_messages functional tests
on Jun 5, 2020

in test/functional/p2p_invalid_messages.py:127 in 793aa7a2e7 outdated

129-                4 +  # magic
130-                12  # msgtype
131-            )
132-            # modify len to MAX_SIZE + 1
133-            msg = msg[:cut_len] + struct.pack("<I", 0x02000000 + 1) + msg[cut_len + 4:]
134+            # Create a message with PAYLOAD SIZE = MAX_SIZE + 1

MarcoFalke commented at 2:55 pm on June 5, 2020:

in commit 793aa7a2e74460bdb504e8f9e2c642b6e607015c:

The comment is incorrect. MAX_SIZE is not equal to VALID_DATA_LIMIT

troygiorshev commented at 4:42 am on June 8, 2020:

Ok, agreed. No need to introduce an undefined term.

in test/functional/p2p_invalid_messages.py:67 in 271949980d outdated

89         self.log.info("Sending a bunch of large, junk messages to test memory exhaustion. May take a bit...")
90 
91         # Run a bunch of times to test for memory exhaustion.
92         for _ in range(80):
93-            node.p2p.send_message(msg_at_size)
94+            self.nodes[0].p2p.send_message(msg_at_size)

MarcoFalke commented at 2:56 pm on June 5, 2020:

in commit 271949980d:

0            conn.send_message(msg_at_size)

troygiorshev commented at 4:42 am on June 8, 2020:

Done

in test/functional/p2p_invalid_messages.py:77 in 271949980d outdated

163-            node.add_p2p_connection(P2PDataStore())
164-
165-        # Node is still up.
166-        conn = node.add_p2p_connection(P2PDataStore())
167+        conn.sync_with_ping(timeout=400)
168+        assert self.nodes[0].p2p.is_connected

MarcoFalke commented at 2:57 pm on June 5, 2020:

0        assert conn.p2p.is_connected

Same

troygiorshev commented at 4:42 am on June 8, 2020:

Done

MarcoFalke commented at 2:58 pm on June 5, 2020: member

Thanks for splitting out the test changes. Though I am not sure about removing the _tweak_msg_data_size without replacement. We should test the following cases:

header indicates size a, but payload is of size a-1
header indicates size a, but payload is of size a+1

Is this tested anywhere else?

Concept ACK 271949980d 📇

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3Concept ACK 271949980d 📇
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUg9/wv/TOZ2pZ6cdN9wpOnvPdWaLFpI7dfr6MT5LQnkmi499eC/DUrCy788Xc9D
 8rSQd0M4Ex8qe8sRDxHwVZtk6NaXMGCSRThMhWq0loc/cz2U266JT8s4DXXW1tkRP
 9Dm+iIgjArC/8kkFPB7Ci9CuNWprEPbhjOCtwYMcrfk2U7WyBY+/+MMSJNVhK+Jqc
10n/4wNKZ+zR5cDTJjttcTe6LyU7DPRkLSFBxCbE9BzPxuFwq0jqqNzW3PUgO8Y3Ly
11SxfVyBFOqniKDWIuf7Q4+98/wnI9BpYAIb+9JChEuc8SShix1knYkhiXq4yehPui
12ruTC7WlV0xmpQnpu3WTnhInizljR49w0UYTFT2EXc0ZPZNrSb3DbAxgzkQJZdA+9
13GfwY+rvkAtdHg5SnFPJP1moFFQdleNjZYLS7cGkpsKmovsyL74p6qB/yKLTNxJzW
14XcYgl3ju5ykOIPtzIptZYrJJn3Llkn128W6C+21X2uaF0NtlsXDn3mP/Qwm3B0yF
15YXE9F8ud
16=nUN3
17-----END PGP SIGNATURE-----

Timestamp of file with hash d78ad05b778745908e2ab5b35cdd924d160c6acceba435f073770d6a7bdc6aac -

jnewbery commented at 10:00 pm on June 5, 2020: member

@MarcoFalke

We should test the following cases:

header indicates size a, but payload is of size a-1

header indicates size a, but payload is of size a+1

These aren’t strictly header errors. In (1), the message won’t finish being read, because the deserializer is waiting for another byte. When the following message is received:

the first byte of the header will be read as the final byte of the payload of the previous message
that payload may or may not succeed deserialization and processing in net_processing
the following message header will fail deserialization because it’s misaligned. It’ll fail the message start bytes check and be disconnected.

In (2), the following message header will be misaligned for similar reasons and will result in disconnect.

In both cases, it’s the subsequent message that causes the disconnect. I don’t think we necessarily need to test this, because invalid headers are tested elsewhere.

troygiorshev force-pushed on Jun 8, 2020

Remove two unneeded tests

Test 1 is a duplicate of test_size() later in the file.  Inexplicably,
this test does not work on macOS, whereas test_size() does.

Test 2 is problematic for two reasons.  First, it always fails with an
invalid checksum, which is probably not what was intended.  Second, it's
not defined at this layer what the behavior should be.  Hypothetically,
if this test was fixed so that it gave messages with valid checksums,
then the message would pass successfully thought the network layer and
fail only in the processing layer.  A priori the network layer has no
idea what the size of a message "actually" is.

The "Why does behavior change at 78 bytes" is because of the following:

print(len(node.p2p.build_message(msg))) # 125
=> Payload size = 125 - 24 = 101
If we take 77 bytes, then there are 101 - 77 = 24 left
That's exactly the size of a header
So, bitcoind deserializes the header and rejects it for some other reason
(Almost always an invalid size (too large))
But, if we take 78 bytes, then there are 101 - 78 = 23 left
That's not enough to fill a header, so the socket stays open waiting for
more data.  That's why we sometimes have to push additional data in
order for the peer to disconnect.

Additionally, both of these tests use the "conn" variable.  For fun, go
look at where it's declared.  (Hint: test_large_inv().  Don't we all
love python's idea of scope?)

57890abf2c

Move size limits to module-global

As well, this renames those variables to match PEP8 and this clears up
the comment relating to VALID_DATA_LIMIT.

Admittedly, this commit is mainly to make the following ones cleaner.

ff1e7b8844

Fix "invalid message size" test

This test originally made a message with an invalid stated length, and
an invalid checksum.  This was because only the header was changed, but
the checksum stayed the same.  This was fine for now because we check
the header first to see if it has a valid stated size, and we disconnect
if it does not, so we never end up checking for the checksum.  If this
behavior was to change, this test would become a problem.  (Indeed I
discovered this when playing around with this behavior).  By instead
creating a message with an oversized payload from the start, we create a
message with an invalid stated length but a valid checksum, as intended.

Additionally, this takes advantage to the newly module-global
VALID_DATA_LIMIT as opposed to the magic 0x02000000.  Yes, 4MB < 32MiB,
but at the moment when receiving a message we check both, so this makes
the test tighter.

5c4648d17b

troygiorshev force-pushed on Jun 8, 2020

troygiorshev commented at 5:00 am on June 8, 2020: contributor

Rebased and made suggested changes. Will continue thinking about what the correct test should be for a malicious indicated size.

in test/functional/p2p_invalid_messages.py:136 in f7a2ef0076 outdated

150+            conn.send_message(msg_at_size)
151+
152+        # Check that, even though the node is being hammered by nonsense from one
153+        # connection, it can still service other peers in a timely way.
154+        for _ in range(20):
155+            conn.sync_with_ping(timeout=2)

MarcoFalke commented at 10:52 am on June 8, 2020:

conn is now the same peer, previously it was a different one. Why? Also the test fails because of this.

troygiorshev commented at 12:53 pm on June 8, 2020:

fixed, thanks

Refactor resource exhaustion test

This is a simple refactor of the specified test.  It is now brought in
line with the rest of the tests in the module.  This should make things
easier to debug, as all of the tests are now grouped together at the
top.

af2a145e57

troygiorshev force-pushed on Jun 8, 2020

MarcoFalke commented at 12:59 pm on June 8, 2020: member

I don’t see the downside of having a test that does:

Send a corrupt msg with wrongly indicates size (off by one in either way)
Send a valid msg and observe a disconnect

It should be possible to write such a test in 5-10 lines of python. Is there any reason I am missing that such a test would be inappropriate?

troygiorshev commented at 5:03 pm on June 8, 2020: contributor

Regarding the situation where the payload is too large, consider the following test. A node can not tell the difference between these two “messages”, they’re equivalent. As opposed to testing for an oversized payload, we should test separately for:

Checking that a node responds well to a message with an invalid checksum
Checking that a node responds well to <24 bytes of complete junk

Sending >=24 bytes of junk (enough to fill a header or more) is covered by the rest of the tests in this module.

The situation where the payload is too small is effectively the same. The beginning of the next message will be lost, and we’ll be back to receiving junk (the rest of the next message).

An incorrect payload size is IMO too high up the chain of events to be included. It’s what results from an incorrect payload size that we should be checking for.

 0def test_equivalent_large(self):
 1    '''Make a message with an oversized payload
 2    Or, make a smaller message with the wrong checksum, then send a junk byte
 3    '''
 4    conn = self.nodes[0].add_p2p_connection(P2PDataStore())
 5    msg_base = msg_generic(b'badmsg', data=b'd'*5)
 6    msg_1 = conn.build_message(msg_base)
 7    cut_len = (
 8        4 +  # magic
 9        12  # msgtype
10    )
11    msg_1 = msg_1[:cut_len] + struct.pack("<I", len(msg_base.data) - 1) + msg_1[cut_len + 4:]
12    self.nodes[0].p2p.send_raw_message(msg_1)
13    with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
14        # The usual connected check no longer works because the messages are misaligned
15        #conn.sync_with_ping(timeout=1)
16        assert self.nodes[0].p2p.is_connected
17        self.nodes[0].disconnect_p2ps()
18    #
19    # Equivalent
20    #
21    conn = self.nodes[0].add_p2p_connection(P2PDataStore())
22    msg_base = msg_generic(b'badmsg', data=b'd'*(5-1))
23    msg_1 = conn.build_message(msg_base)
24    cut_len = (
25        4 +  # magic
26        12 +  # msgtype
27        4 # size
28    )
29    msg_1 = msg_1[:cut_len] + b'\xcd\xbc\xd2\x84' + msg_1[cut_len + 4:]
30    self.nodes[0].p2p.send_raw_message(msg_1)
31    self.nodes[0].p2p.send_raw_message(b'd')
32    with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
33        # The usual connected check no longer works because the messages are misaligned
34        #conn.sync_with_ping(timeout=1)
35        assert self.nodes[0].p2p.is_connected
36        self.nodes[0].disconnect_p2ps()

troygiorshev commented at 9:00 pm on June 8, 2020: contributor

Just for fun, I’ll point out that we wont always disconnect on a corrupted payload size. I’ve made a proof of concept where the corruption neither results in a disconnect nor a misalignment. All it needs is two normal bitcoin messages and two single bit flips (one in the first message’s payload size, and the other in the second message’s command). It can be found here.

gabbieian38 approved

DrahtBot commented at 2:12 am on June 10, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19107 (p2p: Refactor, move all header verification into the network layer, without changing behavior by troygiorshev)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Coverage

Coverage	Change (pull 19177, d8ef2511f04976c81084bba851b11bf74a0f1db7)	Reference (master, 19e919217e6d62e3640525e4149de1a4ae04e74f)
Lines	+0.0373 %	90.6714 %
Functions	+0.0000 %	85.6033 %
Branches	+0.0194 %	51.9388 %

Updated at: 2020-06-12T18:13:45.902164.

jnewbery commented at 3:13 pm on June 12, 2020: member

ACK af2a145e575f23c64909e6cf1fb323c603bda7ca

This is a good change and removes a lot of confused and confusing test code that isn’t testing what it purports to.

Future PRs could clean this test further:

the “resource_exhaustion” test should be removed/renamed from resource exhaustion. This was originally added to verify that receiving large messages wouldn’t result in memory blowup, but that was a bad idea (https://github.com/bitcoin/bitcoin/pull/14522#issuecomment-466044015) and had to be removed.
the “even though the node is being hammered by nonsense from one connection, it can still service other peers in a timely way.” comment is wrong. All outgoing messages from the Python message thread are serial, so by the time we get to sending the pings, all of the previous large messages have been processed already.
I can’t see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message? I see this was merged without review (https://github.com/bitcoin/bitcoin/pull/15697) and there’s no indication of why it’s needed.
the msg_unrecognized class can be removed and replaced with a msg_generic. The serialized length of a msg_unrecognized message seems to have caused confusion in the past (see VALID_DATA_LIMIT ) because people haven’t realised the length of the length field for a serialized string

EDIT: struck out point 2. It was untrue as pointed out by Marco below.

MarcoFalke commented at 5:32 pm on June 12, 2020: member

all of the previous large messages have been processed already.

I don’t think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (review)

jnewbery commented at 6:22 pm on June 12, 2020: member

I don’t think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (comment)

You’re right. I was wrong. When I run this locally, the message sending appears to be serial because bitcoind is able to keep up with the messages as quickly as they’re delivered. When bitcoind is running more slowly, such as under sanitizers, it can’t keep up and Python’s asyncio is able to write to both the sockets for conn and conn2 while bitcoind is processing the large invalid messages. See https://travis-ci.org/github/bitcoin/bitcoin/jobs/695869826#L3577 for example.

MarcoFalke commented at 6:33 pm on June 12, 2020: member

I don’t think we necessarily need to test this, because invalid headers are tested elsewhere.

Where?

In general, I still don’t understand why it is ok to remove test coverage without any replacement. We have never done this in the past and I don’t see why this pull should be an exception. Don’t get me wrong, I like the cleanups here and I’d like to see it merged, but I think this should and can be done without removing coverage.

If the functional tests are the wrong place to put such a test, great then move it to the unit tests, but simply removing should be the option of last resort.

For reference (master):

Screenshot_2020-06-12 LCOV - total_coverage info - src net cpp(1)

this pull:

Screenshot_2020-06-12 LCOV - total_coverage info - src net cpp

MarcoFalke commented at 7:21 pm on June 12, 2020: member

Sorry, false alarm. The “red” line will only become “blue” when the socket buffer is filled in two chunks or more. This might happen non-deterministically based on network-congestion on the test machine and has nothing to do with the changes in this pull request.

MarcoFalke commented at 7:29 pm on June 12, 2020: member

re-ACK af2a145e57 🍦

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3re-ACK af2a145e57 🍦
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUiFBwv+KRk8EKRDiTcHW/1+SbEEwKl26fup1QVoLiueaW5IWuwuG6k/vgx0ZIIA
 8SIUVmXEXgJ/wJ+8oWtT6rR/ulh5i1S6vR38d0o1HYg6vR+sUOeuAuC/LYoKxJMHD
 9/AVdf9tXHk0pxIJiPOfP4DJITc/DYqcIpEDVcj22I5HJ8nrkHJcd9prMmDmYs60e
10qniyKvw6c6GufEr4ziSNgs6r9Gzng7hDV7wFX/0bzy24259tedShoLOZQPdEldqa
11vVyWiKufdpXb5UUhxBt68fmYDTFW5ivWhwujZGiWjNNMmiDmqwMIIglxfKjm9o3a
12lEqe/19jVDf6VGcG8uo2IZap/5G319q76298GhTaLyd8qA7kvLe/f418QKeKUoVk
13NZa6LW1jx6A49xQ1iwcQKWYkklNOK8OTCJi1AlmlFVwPy5+PKztvEl8+myYAfuyo
14qUJ+eY0sqdCjjJJ4KsHYPcWD1uWTP3XxDgRGGA3NWbdYaQyAMLWfryc7MGj/UJU6
15qm7Tuj6b
16=hxKG
17-----END PGP SIGNATURE-----

Timestamp of file with hash c5c47520e34b1feab503ae488c8195cfa5311aa8dfb1311b7826eb27cebc4497 -

MarcoFalke merged this on Jun 12, 2020

MarcoFalke closed this on Jun 12, 2020

jnewbery commented at 2:57 am on June 13, 2020: member

I can’t see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message?

Done in #19264

sidhujag referenced this in commit d5007c520a on Jun 13, 2020

Fabcien referenced this in commit c91add3229 on May 13, 2021

random-zebra referenced this in commit bffe509aed on Jun 28, 2021

DrahtBot locked this on Feb 15, 2022

test: Fix and clean p2p_invalid_messages functional tests #19177

Conflicts

Coverage