test: Fix and clean p2p_invalid_messages functional tests #19177

pull troygiorshev wants to merge 4 commits into bitcoin:master from troygiorshev:p2p-refactor-fix-tests changing 1 file +28 −131
  1. troygiorshev commented at 1:20 pm on June 5, 2020: contributor

    This PR touches only the p2p_invalid_messages.py functional test module. There are two main goals accomplished here. First, it fixes the “invalid message size” test, which previously made a message that was invalid for multiple reasons. Second, it refactors the file into a single consistent style. This file appears to have originally had two authors, with different styles and some test duplication.

    It should now be easier and quicker to understand this module, anticipating the upcoming BIP324 and AltNet changes.

    This should probably go in ahead of #19107, but the two are not strictly related.

  2. fanquake added the label Tests on Jun 5, 2020
  3. MarcoFalke renamed this:
    p2p: Fix and clean p2p_invalid_messages functional tests
    test: Fix and clean p2p_invalid_messages functional tests
    on Jun 5, 2020
  4. in test/functional/p2p_invalid_messages.py:127 in 793aa7a2e7 outdated
    129-                4 +  # magic
    130-                12  # msgtype
    131-            )
    132-            # modify len to MAX_SIZE + 1
    133-            msg = msg[:cut_len] + struct.pack("<I", 0x02000000 + 1) + msg[cut_len + 4:]
    134+            # Create a message with PAYLOAD SIZE = MAX_SIZE + 1
    


    MarcoFalke commented at 2:55 pm on June 5, 2020:

    in commit 793aa7a2e74460bdb504e8f9e2c642b6e607015c:

    The comment is incorrect. MAX_SIZE is not equal to VALID_DATA_LIMIT


    troygiorshev commented at 4:42 am on June 8, 2020:
    Ok, agreed. No need to introduce an undefined term.
  5. in test/functional/p2p_invalid_messages.py:67 in 271949980d outdated
    89         self.log.info("Sending a bunch of large, junk messages to test memory exhaustion. May take a bit...")
    90 
    91         # Run a bunch of times to test for memory exhaustion.
    92         for _ in range(80):
    93-            node.p2p.send_message(msg_at_size)
    94+            self.nodes[0].p2p.send_message(msg_at_size)
    


    MarcoFalke commented at 2:56 pm on June 5, 2020:

    in commit 271949980d:

                conn.send_message(msg_at_size)
    

    troygiorshev commented at 4:42 am on June 8, 2020:
    Done
  6. in test/functional/p2p_invalid_messages.py:77 in 271949980d outdated
    163-            node.add_p2p_connection(P2PDataStore())
    164-
    165-        # Node is still up.
    166-        conn = node.add_p2p_connection(P2PDataStore())
    167+        conn.sync_with_ping(timeout=400)
    168+        assert self.nodes[0].p2p.is_connected
    


    MarcoFalke commented at 2:57 pm on June 5, 2020:
            assert conn.p2p.is_connected
    

    Same


    troygiorshev commented at 4:42 am on June 8, 2020:
    Done
  7. MarcoFalke commented at 2:58 pm on June 5, 2020: member

    Thanks for splitting out the test changes. Though I am not sure about removing the _tweak_msg_data_size without replacement. We should test the following cases:

    header indicates size a, but payload is of size a-1
    header indicates size a, but payload is of size a+1
    

    Is this tested anywhere else?

    Concept ACK 271949980d 📇

    Signature:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    Concept ACK 271949980d 📇
    -----BEGIN PGP SIGNATURE-----

    iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    pUg9/wv/TOZ2pZ6cdN9wpOnvPdWaLFpI7dfr6MT5LQnkmi499eC/DUrCy788Xc9D
    rSQd0M4Ex8qe8sRDxHwVZtk6NaXMGCSRThMhWq0loc/cz2U266JT8s4DXXW1tkRP
    Dm+iIgjArC/8kkFPB7Ci9CuNWprEPbhjOCtwYMcrfk2U7WyBY+/+MMSJNVhK+Jqc
    n/4wNKZ+zR5cDTJjttcTe6LyU7DPRkLSFBxCbE9BzPxuFwq0jqqNzW3PUgO8Y3Ly
    SxfVyBFOqniKDWIuf7Q4+98/wnI9BpYAIb+9JChEuc8SShix1knYkhiXq4yehPui
    ruTC7WlV0xmpQnpu3WTnhInizljR49w0UYTFT2EXc0ZPZNrSb3DbAxgzkQJZdA+9
    GfwY+rvkAtdHg5SnFPJP1moFFQdleNjZYLS7cGkpsKmovsyL74p6qB/yKLTNxJzW
    XcYgl3ju5ykOIPtzIptZYrJJn3Llkn128W6C+21X2uaF0NtlsXDn3mP/Qwm3B0yF
    YXE9F8ud
    =nUN3
    -----END PGP SIGNATURE-----
    

    Timestamp of file with hash d78ad05b778745908e2ab5b35cdd924d160c6acceba435f073770d6a7bdc6aac -

  8. jnewbery commented at 10:00 pm on June 5, 2020: member

    @MarcoFalke

    We should test the following cases:

    1. header indicates size a, but payload is of size a-1
    2. header indicates size a, but payload is of size a+1

    These aren’t strictly header errors. In (1), the message won’t finish being read, because the deserializer is waiting for another byte. When the following message is received:

    • the first byte of the header will be read as the final byte of the payload of the previous message
    • that payload may or may not succeed deserialization and processing in net_processing
    • the following message header will fail deserialization because it’s misaligned: it’ll fail the message start-bytes check and the peer will be disconnected

    In (2), the following message header will be misaligned for similar reasons and will result in disconnect.

    In both cases, it’s the subsequent message that causes the disconnect. I don’t think we necessarily need to test this, because invalid headers are tested elsewhere.
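
    The misalignment described above can be sketched with plain stdlib framing. This is a standalone illustration, not the test framework's helpers: `MAGIC` and `build_message` here are local stand-ins for the mainnet start bytes and the usual header construction (magic, 12-byte command, little-endian length, 4-byte double-SHA256 checksum).

    ```python
    import hashlib
    import struct

    MAGIC = b"\xf9\xbe\xb4\xd9"  # mainnet start bytes

    def build_message(command: bytes, payload: bytes) -> bytes:
        # header = magic + command + payload length + first 4 bytes of double-SHA256(payload)
        checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
        return MAGIC + command.ljust(12, b"\x00") + struct.pack("<I", len(payload)) + checksum + payload

    # Two well-formed messages on the wire.
    msg1 = build_message(b"badmsg", b"d" * 5)
    msg2 = build_message(b"ping", b"\x00" * 8)

    # Case (1): header indicates size a, payload is a-1.  Patch msg1's header
    # so it claims 6 payload bytes while only 5 follow.
    cut = 4 + 12  # magic + command
    bad1 = msg1[:cut] + struct.pack("<I", 5 + 1) + msg1[cut + 4:]

    stream = bad1 + msg2
    # The deserializer consumes 24 header bytes plus the 6 claimed payload bytes,
    # swallowing the first byte of msg2 as the "final" payload byte.  The next
    # header therefore starts one byte late and its start bytes no longer match.
    second_header = stream[24 + 6: 24 + 6 + 24]
    assert second_header[:4] != MAGIC  # start-bytes check fails -> disconnect
    ```

    The case where the payload is one byte too long misaligns the following header in the same way, just shifted in the other direction.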

  9. troygiorshev force-pushed on Jun 8, 2020
  10. troygiorshev force-pushed on Jun 8, 2020
  11. Remove two unneeded tests
    Test 1 is a duplicate of test_size() later in the file.  Inexplicably,
    this test does not work on macOS, whereas test_size() does.
    
    Test 2 is problematic for two reasons.  First, it always fails with an
    invalid checksum, which is probably not what was intended.  Second, it's
    not defined at this layer what the behavior should be.  Hypothetically,
    if this test was fixed so that it gave messages with valid checksums,
    then the message would pass successfully through the network layer and
    fail only in the processing layer.  A priori the network layer has no
    idea what the size of a message "actually" is.
    
    The answer to "Why does behavior change at 78 bytes?" is the following:
    
    print(len(node.p2p.build_message(msg))) # 125
    => Payload size = 125 - 24 = 101
    If we take 77 bytes, then there are 101 - 77 = 24 left
    That's exactly the size of a header
    So, bitcoind deserializes the header and rejects it for some other reason
    (Almost always an invalid size (too large))
    But, if we take 78 bytes, then there are 101 - 78 = 23 left
    That's not enough to fill a header, so the socket stays open waiting for
    more data.  That's why we sometimes have to push additional data in
    order for the peer to disconnect.
    
    Additionally, both of these tests use the "conn" variable.  For fun, go
    look at where it's declared.  (Hint: test_large_inv().  Don't we all
    love python's idea of scope?)
    57890abf2c
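    The 78-byte arithmetic in the commit message above can be checked directly. A minimal sketch; the 24-byte header size is the protocol's (magic + msgtype + length + checksum), and the 125-byte total comes from the commit message:

    ```python
    HEADER_SIZE = 4 + 12 + 4 + 4  # magic + msgtype + length + checksum = 24 bytes
    total_len = 125               # len(node.p2p.build_message(msg)) per the commit message
    payload_len = total_len - HEADER_SIZE  # 101

    # Taking 77 of the 101 payload bytes leaves exactly 24 -- a full header's
    # worth -- so bitcoind parses those bytes as a (garbage) header and rejects
    # it immediately, almost always for an oversized stated length.
    assert payload_len - 77 == HEADER_SIZE

    # Taking 78 leaves only 23 bytes, too short for a header, so the socket
    # stays open waiting for more data; extra bytes must be pushed before the
    # peer disconnects.
    assert payload_len - 78 == HEADER_SIZE - 1
    ```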
  12. Move size limits to module-global
    As well, this renames those variables to match PEP8 and this clears up
    the comment relating to VALID_DATA_LIMIT.
    
    Admittedly, this commit is mainly to make the following ones cleaner.
    ff1e7b8844
  13. Fix "invalid message size" test
    This test originally made a message with an invalid stated length, and
    an invalid checksum.  This was because only the header was changed, but
    the checksum stayed the same.  This was fine for now because we check
    the header first to see if it has a valid stated size, and we disconnect
    if it does not, so we never end up checking for the checksum.  If this
    behavior was to change, this test would become a problem.  (Indeed I
    discovered this when playing around with this behavior).  By instead
    creating a message with an oversized payload from the start, we create a
    message with an invalid stated length but a valid checksum, as intended.
    
    Additionally, this takes advantage of the new module-global
    VALID_DATA_LIMIT as opposed to the magic 0x02000000.  Yes, 4MB < 32MiB,
    but at the moment when receiving a message we check both, so this makes
    the test tighter.
    5c4648d17b
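    The difference the commit describes can be sketched outside the test framework. Assumptions: `MAGIC` and `build_message` are local stand-ins for the usual header construction, and `VALID_DATA_LIMIT` is set here to 4 MB as a hypothetical value for the test's module-global:

    ```python
    import hashlib
    import struct

    MAGIC = b"\xf9\xbe\xb4\xd9"          # mainnet start bytes
    VALID_DATA_LIMIT = 4 * 1000 * 1000   # assumed stand-in for the test's module-global

    def build_message(command: bytes, payload: bytes) -> bytes:
        # header = magic + command + payload length + first 4 bytes of double-SHA256(payload)
        checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
        return MAGIC + command.ljust(12, b"\x00") + struct.pack("<I", len(payload)) + checksum + payload

    cut_len = 4 + 12  # offset of the length field: magic + command

    # Old approach: patch the length field of a small message.  The header now
    # claims an oversized payload, but the checksum still covers the small
    # payload, so the message is invalid for two reasons at once.
    patched = build_message(b"badmsg", b"\x00" * 4)
    patched = patched[:cut_len] + struct.pack("<I", VALID_DATA_LIMIT + 1) + patched[cut_len + 4:]

    # Fixed approach: build the oversized message honestly.  Only the stated
    # size is invalid; the checksum matches the payload.
    oversized = build_message(b"badmsg", b"\x00" * (VALID_DATA_LIMIT + 1))

    stated_size = struct.unpack("<I", oversized[cut_len:cut_len + 4])[0]
    assert stated_size == VALID_DATA_LIMIT + 1
    ```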
  14. troygiorshev force-pushed on Jun 8, 2020
  15. troygiorshev commented at 5:00 am on June 8, 2020: contributor
    Rebased and made suggested changes. Will continue thinking about what the correct test should be for a malicious indicated size.
  16. in test/functional/p2p_invalid_messages.py:136 in f7a2ef0076 outdated
    150+            conn.send_message(msg_at_size)
    151+
    152+        # Check that, even though the node is being hammered by nonsense from one
    153+        # connection, it can still service other peers in a timely way.
    154+        for _ in range(20):
    155+            conn.sync_with_ping(timeout=2)
    


    MarcoFalke commented at 10:52 am on June 8, 2020:
    conn is now the same peer, previously it was a different one. Why? Also the test fails because of this.

    troygiorshev commented at 12:53 pm on June 8, 2020:
    fixed, thanks
  17. Refactor resource exhaustion test
    This is a simple refactor of the specified test.  It is now brought in
    line with the rest of the tests in the module.  This should make things
    easier to debug, as all of the tests are now grouped together at the
    top.
    af2a145e57
  18. troygiorshev force-pushed on Jun 8, 2020
  19. MarcoFalke commented at 12:59 pm on June 8, 2020: member

    I don’t see the downside of having a test that does:

    • Send a corrupt msg with a wrongly indicated size (off by one in either direction)
    • Send a valid msg and observe a disconnect

    It should be possible to write such a test in 5-10 lines of python. Is there any reason I am missing that such a test would be inappropriate?

  20. troygiorshev commented at 5:03 pm on June 8, 2020: contributor

    Regarding the situation where the payload is too large, consider the following test. A node cannot tell the difference between these two “messages”; they’re equivalent. As opposed to testing for an oversized payload, we should test separately for:

    • Checking that a node responds well to a message with an invalid checksum
    • Checking that a node responds well to <24 bytes of complete junk

    Sending >=24 bytes of junk (enough to fill a header or more) is covered by the rest of the tests in this module.

    The situation where the payload is too small is effectively the same. The beginning of the next message will be lost, and we’ll be back to receiving junk (the rest of the next message).

    An incorrect payload size is IMO too high up the chain of events to be included. It’s what results from an incorrect payload size that we should be checking for.

    def test_equivalent_large(self):
        '''Make a message with an oversized payload
        Or, make a smaller message with the wrong checksum, then send a junk byte
        '''
        conn = self.nodes[0].add_p2p_connection(P2PDataStore())
        msg_base = msg_generic(b'badmsg', data=b'd'*5)
        msg_1 = conn.build_message(msg_base)
        cut_len = (
            4 +  # magic
            12  # msgtype
        )
        msg_1 = msg_1[:cut_len] + struct.pack("<I", len(msg_base.data) - 1) + msg_1[cut_len + 4:]
        self.nodes[0].p2p.send_raw_message(msg_1)
        with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
            # The usual connected check no longer works because the messages are misaligned
            #conn.sync_with_ping(timeout=1)
            assert self.nodes[0].p2p.is_connected
            self.nodes[0].disconnect_p2ps()
        #
        # Equivalent
        #
        conn = self.nodes[0].add_p2p_connection(P2PDataStore())
        msg_base = msg_generic(b'badmsg', data=b'd'*(5-1))
        msg_1 = conn.build_message(msg_base)
        cut_len = (
            4 +  # magic
            12 +  # msgtype
            4  # size
        )
        msg_1 = msg_1[:cut_len] + b'\xcd\xbc\xd2\x84' + msg_1[cut_len + 4:]
        self.nodes[0].p2p.send_raw_message(msg_1)
        self.nodes[0].p2p.send_raw_message(b'd')
        with self.nodes[0].assert_debug_log(['CHECKSUM ERROR (badmsg, 4 bytes), expected a359d53a was cdbcd284']):
            # The usual connected check no longer works because the messages are misaligned
            #conn.sync_with_ping(timeout=1)
            assert self.nodes[0].p2p.is_connected
            self.nodes[0].disconnect_p2ps()
  21. troygiorshev commented at 9:00 pm on June 8, 2020: contributor
    Just for fun, I’ll point out that we won’t always disconnect on a corrupted payload size. I’ve made a proof of concept where the corruption neither results in a disconnect nor a misalignment. All it needs is two normal bitcoin messages and two single bit flips (one in the first message’s payload size, and the other in the second message’s command). It can be found here.
  22. gabbieian38 approved
  23. DrahtBot commented at 2:12 am on June 10, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19107 (p2p: Refactor, move all header verification into the network layer, without changing behavior by troygiorshev)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    Coverage

    Coverage Change (pull 19177, d8ef2511f04976c81084bba851b11bf74a0f1db7) Reference (master, 19e919217e6d62e3640525e4149de1a4ae04e74f)
    Lines +0.0373 % 90.6714 %
    Functions +0.0000 % 85.6033 %
    Branches +0.0194 % 51.9388 %

    Updated at: 2020-06-12T18:13:45.902164.

  24. jnewbery commented at 3:13 pm on June 12, 2020: member

    ACK af2a145e575f23c64909e6cf1fb323c603bda7ca

    This is a good change and removes a lot of confused and confusing test code that isn’t testing what it purports to.

    Future PRs could clean this test further:

    • the “resource_exhaustion” test should be removed, or renamed away from resource exhaustion. This was originally added to verify that receiving large messages wouldn’t result in memory blowup, but that was a bad idea (https://github.com/bitcoin/bitcoin/pull/14522#issuecomment-466044015) and had to be removed.
    • ~the “even though the node is being hammered by nonsense from one connection, it can still service other peers in a timely way.” comment is wrong. All outgoing messages from the Python message thread are serial, so by the time we get to sending the pings, all of the previous large messages have been processed already.~
    • I can’t see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message? I see this was merged without review (https://github.com/bitcoin/bitcoin/pull/15697) and there’s no indication of why it’s needed.
    • the msg_unrecognized class can be removed and replaced with a msg_generic. The serialized length of a msg_unrecognized message seems to have caused confusion in the past (see VALID_DATA_LIMIT ) because people haven’t realised the length of the length field for a serialized string

    EDIT: struck out point 2. It was untrue as pointed out by Marco below.
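
    The last point, about the length of the length field, can be illustrated with Bitcoin's CompactSize encoding, which prefixes serialized strings and vectors. A standalone sketch; `ser_compact_size`/`ser_string` here are local re-implementations, not imports from the test framework:

    ```python
    import struct

    def ser_compact_size(n: int) -> bytes:
        # Bitcoin's CompactSize encoding for string/vector lengths.
        if n < 253:
            return struct.pack("B", n)
        elif n <= 0xFFFF:
            return b"\xfd" + struct.pack("<H", n)
        elif n <= 0xFFFFFFFF:
            return b"\xfe" + struct.pack("<I", n)
        return b"\xff" + struct.pack("<Q", n)

    def ser_string(s: bytes) -> bytes:
        return ser_compact_size(len(s)) + s

    # A serialized "string" payload is longer than the raw data: the length
    # prefix itself takes 1, 3, 5, or 9 bytes, which is easy to forget when
    # sizing a payload against a limit.
    assert len(ser_string(b"d" * 252)) == 252 + 1
    assert len(ser_string(b"d" * 253)) == 253 + 3
    ```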

  25. MarcoFalke commented at 5:32 pm on June 12, 2020: member

    all of the previous large messages have been processed already.

    I don’t think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (review)

  26. jnewbery commented at 6:22 pm on June 12, 2020: member

    I don’t think this is true. At least on travis the test failed when the ping was sent on a different connection. See #19177 (comment)

    You’re right. I was wrong. When I run this locally, the message sending appears to be serial because bitcoind is able to keep up with the messages as quickly as they’re delivered. When bitcoind is running more slowly, such as under sanitizers, it can’t keep up and Python’s asyncio is able to write to both the sockets for conn and conn2 while bitcoind is processing the large invalid messages. See https://travis-ci.org/github/bitcoin/bitcoin/jobs/695869826#L3577 for example.

  27. MarcoFalke commented at 6:33 pm on June 12, 2020: member

    I don’t think we necessarily need to test this, because invalid headers are tested elsewhere.

    Where?

    In general, I still don’t understand why it is ok to remove test coverage without any replacement. We have never done this in the past and I don’t see why this pull should be an exception. Don’t get me wrong, I like the cleanups here and I’d like to see it merged, but I think this should and can be done without removing coverage.

    If the functional tests are the wrong place to put such a test, great then move it to the unit tests, but simply removing should be the option of last resort.

    For reference (master):

    Screenshot_2020-06-12 LCOV - total_coverage info - src net cpp(1)

    this pull:

    Screenshot_2020-06-12 LCOV - total_coverage info - src net cpp

  28. MarcoFalke commented at 7:21 pm on June 12, 2020: member
    Sorry, false alarm. The “red” line will only become “blue” when the socket buffer is filled in two chunks or more. This might happen non-deterministically based on network-congestion on the test machine and has nothing to do with the changes in this pull request.
  29. MarcoFalke commented at 7:29 pm on June 12, 2020: member

    re-ACK af2a145e57 🍦

    Signature:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    re-ACK af2a145e57 🍦
    -----BEGIN PGP SIGNATURE-----

    iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
    pUiFBwv+KRk8EKRDiTcHW/1+SbEEwKl26fup1QVoLiueaW5IWuwuG6k/vgx0ZIIA
    SIUVmXEXgJ/wJ+8oWtT6rR/ulh5i1S6vR38d0o1HYg6vR+sUOeuAuC/LYoKxJMHD
    /AVdf9tXHk0pxIJiPOfP4DJITc/DYqcIpEDVcj22I5HJ8nrkHJcd9prMmDmYs60e
    qniyKvw6c6GufEr4ziSNgs6r9Gzng7hDV7wFX/0bzy24259tedShoLOZQPdEldqa
    vVyWiKufdpXb5UUhxBt68fmYDTFW5ivWhwujZGiWjNNMmiDmqwMIIglxfKjm9o3a
    lEqe/19jVDf6VGcG8uo2IZap/5G319q76298GhTaLyd8qA7kvLe/f418QKeKUoVk
    NZa6LW1jx6A49xQ1iwcQKWYkklNOK8OTCJi1AlmlFVwPy5+PKztvEl8+myYAfuyo
    qUJ+eY0sqdCjjJJ4KsHYPcWD1uWTP3XxDgRGGA3NWbdYaQyAMLWfryc7MGj/UJU6
    qm7Tuj6b
    =hxKG
    -----END PGP SIGNATURE-----
    

    Timestamp of file with hash c5c47520e34b1feab503ae488c8195cfa5311aa8dfb1311b7826eb27cebc4497 -

  30. MarcoFalke merged this on Jun 12, 2020
  31. MarcoFalke closed this on Jun 12, 2020

  32. jnewbery commented at 2:57 am on June 13, 2020: member

    I can’t see any reason for launching a new async task to test the magic bytes. Why not just send a message with invalid magic bytes using send_raw_message?

    Done in #19264

  33. sidhujag referenced this in commit d5007c520a on Jun 13, 2020
  34. Fabcien referenced this in commit c91add3229 on May 13, 2021
  35. random-zebra referenced this in commit bffe509aed on Jun 28, 2021
  36. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 12:12 UTC
