qa: Intermittent failure in p2p_invalid_messages.py #34688

issue hebasto openend this issue on February 27, 2026
  1. hebasto commented at 9:51 am on February 27, 2026: member

    https://github.com/hebasto/bitcoin-core-nightly/actions/runs/22472651613/job/65092911380

     0226/457 - p2p_invalid_messages.py failed, Duration: 49 s
     1
     2stdout:
     32026-02-27T05:08:56.634731Z TestFramework (INFO): PRNG seed is: 5797869944991691337
     42026-02-27T05:08:56.636098Z TestFramework (INFO): Initializing test directory /home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test_runner__🏃_20260227_050213/p2p_invalid_messages_233
     52026-02-27T05:08:57.195131Z TestFramework (INFO): Test message with header split across two buffers is received
     62026-02-27T05:08:57.466935Z TestFramework (INFO): Test duplicate version message is ignored
     72026-02-27T05:08:57.646521Z TestFramework (INFO): Test message with invalid magic bytes disconnects peer
     82026-02-27T05:08:57.767976Z TestFramework (INFO): Test message with invalid checksum logs an error
     92026-02-27T05:08:57.947981Z TestFramework (INFO): Test message with oversized payload disconnects peer
    102026-02-27T05:08:58.102953Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:13796 due to [Errno 32] Broken pipe
    112026-02-27T05:08:58.109567Z TestFramework (INFO): Test message with invalid message type logs an error
    122026-02-27T05:08:58.405594Z TestFramework (INFO): Test addrv2: empty
    132026-02-27T05:08:58.585560Z TestFramework (INFO): Test addrv2: no addresses
    142026-02-27T05:08:58.767099Z TestFramework (INFO): Test addrv2: too long address
    152026-02-27T05:08:58.946655Z TestFramework (INFO): Test addrv2: unrecognized network
    162026-02-27T05:08:59.008191Z TestFramework (INFO): Test inv message of size 50001 is logged as misbehaving
    172026-02-27T05:09:01.148695Z TestFramework (INFO): Test getdata message of size 50001 is logged as misbehaving
    182026-02-27T05:09:06.174765Z TestFramework (INFO): Test headers message of size 2001 is logged as misbehaving
    192026-02-27T05:09:06.404428Z TestFramework (INFO): Test headers message with invalid proof-of-work is logged as misbehaving and disconnects peer
    202026-02-27T05:09:06.798120Z TestFramework (INFO): Test headers message with non-continuous headers sequence is logged as misbehaving
    212026-02-27T05:09:07.158954Z TestFramework (INFO): Test node stays up despite many large junk messages
    222026-02-27T05:09:07.400454Z TestFramework (INFO): (a) Send 80 messages, each of maximum valid data size (4MB)
    232026-02-27T05:09:16.495755Z TestFramework (INFO): (b) Check node still services peers in a timely way
    242026-02-27T05:09:45.985524Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
    25        def test_function():
    26            if check_connected:
    27                assert self.is_connected
    28            return test_function_in()
    29'''
    302026-02-27T05:09:45.985753Z TestFramework (ERROR): Unexpected exception:
    31Traceback (most recent call last):
    32  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test/functional/test_framework/test_framework.py", line 142, in main
    33    self.run_test()
    34    ~~~~~~~~~~~~~^^
    35  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/build/test/functional/p2p_invalid_messages.py", line 79, in run_test
    36    self.test_resource_exhaustion()
    37    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
    38  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/build/test/functional/p2p_invalid_messages.py", line 344, in test_resource_exhaustion
    39    conn2.sync_with_ping(timeout=2)
    40    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
    41  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test/functional/test_framework/p2p.py", line 721, in sync_with_ping
    42    self.wait_until(test_function, timeout=timeout)
    43    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    44  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test/functional/test_framework/p2p.py", line 606, in wait_until
    45    wait_until_helper_internal(test_function, timeout=timeout, lock=p2p_lock, timeout_factor=self.timeout_factor, check_interval=check_interval)
    46    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    47  File "/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test/functional/test_framework/util.py", line 436, in wait_until_helper_internal
    48    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
    49AssertionError: Predicate ''''
    50        def test_function():
    51            if check_connected:
    52                assert self.is_connected
    53            return test_function_in()
    54''' not true after 16.0 seconds
    552026-02-27T05:09:46.044897Z TestFramework (INFO): Not stopping nodes as test failed. The dangling processes will be cleaned up later.
    562026-02-27T05:09:46.045151Z TestFramework (WARNING): Not cleaning up dir /home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test_runner__🏃_20260227_050213/p2p_invalid_messages_233
    572026-02-27T05:09:46.045272Z TestFramework (ERROR): Test failed. Test logging available at /home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test_runner__🏃_20260227_050213/p2p_invalid_messages_233/test_framework.log
    582026-02-27T05:09:46.045578Z TestFramework (ERROR): 
    592026-02-27T05:09:46.045796Z TestFramework (ERROR): Hint: Call /home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test/functional/combine_logs.py '/home/runner/work/bitcoin-core-nightly/bitcoin-core-nightly/test_runner_₿_🏃_20260227_050213/p2p_invalid_messages_233' to consolidate all logs
    602026-02-27T05:09:46.045918Z TestFramework (ERROR): 
    612026-02-27T05:09:46.046033Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    622026-02-27T05:09:46.046180Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    632026-02-27T05:09:46.046254Z TestFramework (ERROR): 
    64
    65
    66stderr:
    67[node 0] Cleaning up leftover process
    
  2. hebasto added the label Tests on Feb 27, 2026
  3. maflcko added the label CI failed on Feb 27, 2026
  4. maflcko commented at 10:55 am on February 27, 2026: member

    No idea how this could happen. All prior 80 badmsg were handled, so this just never finished the ping-pong loop. I guess your VM is OOM or Python is in garbage collection.

    See your log:

    0    test  2026-02-27T05:09:17.476716Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:13796: msg_pong(nonce=00000013) 
    1    test  2026-02-27T05:09:45.985524Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' 
    

    The message correctly arrives, but Python fails to see the pong nonce within 16 seconds, and also fails to see the failure after that within another 12.5 second delay.

  5. maflcko commented at 10:57 am on February 27, 2026: member

    My reply in LLM version:

    ✦ Yes, you have hit on the exact technical anomaly that explains the failure: The total time elapsed (28.5 seconds) is nearly double the reported 16-second timeout.

    This discrepancy proves that the test didn’t just “fail to get a pong”; the entire testing environment (the Python process and the Node) collapsed under the pressure of the 320MB junk flood.

    Here is the breakdown of that 28.5-second gap:

    1. The 12.5-Second “Ghost” Hang (Python Process)
    • Iteration 19 ends: 05:09:17.476Z
    • Iteration 20 timeout starts: 05:09:29.985Z (Calculated as 45.985s - 16.0s)
    • The Gap: There are 12.5 seconds where the Python TestFramework main thread was completely frozen. It had finished iteration 19 and was supposed to immediately send the pings for iteration 20.

    Why? Before part (b) started, you sent 80 messages of 4MB each.

    • AsyncIO Saturation: The Python asyncio loop was likely still struggling to pump those 320MB of data through the socket or handling the internal “send complete” callbacks.
    • Memory/GC Pressure: As mentioned before, the repr() of those 4MB messages created massive string allocations. Python’s Garbage Collector likely triggered a “Stop the World” collection during this gap to clean up the gigabytes of temporary string data, freezing the test logic for 12 seconds.
    1. The 16-Second “Queue” Hang (Bitcoin Node)
    • Iteration 20 starts its wait_until: 05:09:29.985Z
    • Iteration 20 fails: 05:09:45.985Z
    • The Gap: This is the actual wait_until timeout period.

    Why? Even though you confirmed the node eventually handled all 160 “badmsg” logs, it was doing so while iteration 20 was running.

    • The msghand Clog: Bitcoin Core handles messages in a single thread (msghand). Even if it only takes 100ms to “reject” a 4MB message and write two log lines, doing that 80 times takes 8 seconds.
    • Queue Latency: The ping from conn2 was stuck in the node’s receive queue behind the massive 4MB junk messages. In a high-load CI environment (with slow Disk I/O for logging), the node simply couldn’t reach the ping at the back of the line before the 16-second clock ran out.

    Conclusion The test failed because of a cumulative performance collapse.

    1. The Test Process hung for 12.5s (likely GC/AsyncIO overhead).
    2. The Node was then too slow for the remaining 16s (Message/Log backlog).

    The reason you don’t see the “Send message” logs for iteration 20 is that by the time the Python process finally woke up from its 12.5s freeze to send them, the system was so starved for resources that the logs were either dropped, buffered, or the wait_until timer started before the logger could even flush to the console.

  6. hebasto commented at 11:25 am on February 27, 2026: member

    No idea how this could happen. All prior 80 badmsg were handled, so this just never finished the ping-pong loop. I guess your VM is OOM…

    I’ll increase RAM then.

  7. hebasto closed this on Feb 27, 2026

  8. fanquake commented at 11:28 am on February 27, 2026: member
    A lot of these intermittent issues coming from the nightly repo just seem to be that the underlying hardware is inconsistent/underpowered? Maybe we should have some filter for them, so that an issue isn’t opened here, until it is seen at least N times, or at least has a reproducer?
  9. hebasto commented at 11:35 am on February 27, 2026: member

    A lot of these intermittent issues coming from the nightly repo just seem to be that the underlying hardware is inconsistent/underpowered? Maybe we should have some filter for them, so that an issue isn’t opened here, until it is seen at least N times, or at least has a reproducer?

    Fair point. I’ll open these kinds of issues more sparingly in the future.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-09 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me