Intermittent failures in interface_usdt_mempool.py #27380

issue dergoegge openend this issue on March 31, 2023
  1. dergoegge commented at 12:18 pm on March 31, 2023: member

    https://cirrus-ci.com/task/5779522121891840

     0 node0 2023-03-31T11:45:58.812384Z (mocktime: 2023-04-14T11:46:01Z) [http] [httpserver.cpp:257] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:38238 
     1 node0 2023-03-31T11:45:58.812543Z (mocktime: 2023-04-14T11:46:01Z) [httpworker.2] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=getpeerinfo user=__cookie__ 
     2 test  2023-03-31T11:45:58.814000Z TestFramework (INFO): Hooking into mempool:rejected tracepoint... 
     3 test  2023-03-31T11:45:59.798000Z TestFramework (INFO): Sending invalid transaction... 
     4 test  2023-03-31T11:45:59.799000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:15944: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=76b2ca9a8c3ffa3c454db9591dfaf5c4979926154e33a5bef67412ea0eaa16c5 n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=49.99968800 scriptPubKey=51202913b252fe537830f843bfdc5fa7d20ba48639a87c86ff837b92d083c55ad7c1)] wit=CTxWitness(CScriptWitness(51,c00000000000000000000000000000000000000000000000000000000000000001)) nLockTime=0)) 
     5 test  2023-03-31T11:45:59.799000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:15944: msg_ping(nonce=00000002) 
     6 node0 2023-03-31T11:45:59.799697Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:3169] [ProcessMessage] [net] received: tx (133 bytes) peer=0 
     7 node0 2023-03-31T11:45:59.800472Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [txmempool.cpp:658] [check] [mempool] Checking mempool with 2 transactions and 2 inputs 
     8 node0 2023-03-31T11:45:59.800746Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:4174] [ProcessMessage] [mempoolrej] c917a71e43b825756c1d45985fcc87f8c245db0654f5eeb000a92dc2cccb1d6c from peer=0 was not accepted: min relay fee not met, 0 < 104 
     9 node0 2023-03-31T11:45:59.800913Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:3169] [ProcessMessage] [net] received: ping (8 bytes) peer=0 
    10 node0 2023-03-31T11:45:59.800953Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net.cpp:2841] [PushMessage] [net] sending pong (8 bytes) peer=0 
    11 test  2023-03-31T11:45:59.801000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:15944: msg_pong(nonce=00000002) 
    12 node0 2023-03-31T11:45:59.849969Z (mocktime: 2023-04-14T11:46:01Z) [http] [httpserver.cpp:257] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:38238 
    13 test  2023-03-31T11:45:59.850000Z TestFramework (INFO): Polling buffer... 
    14 node0 2023-03-31T11:45:59.850174Z (mocktime: 2023-04-14T11:46:01Z) [httpworker.3] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
    15 test  2023-03-31T11:45:59.898000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
    16 test  2023-03-31T11:45:59.898000Z TestFramework (ERROR): Assertion failed 
    17                                   Traceback (most recent call last):
    18                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
    19                                       self.run_test()
    20                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 339, in run_test
    21                                       self.rejected_test()
    22                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 323, in rejected_test
    23                                       assert_equal(EXPECTED_REJECTED_EVENTS, handled_rejected_events)
    24                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 56, in assert_equal
    25                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    26                                   AssertionError: not(1 == 0)
    27 test  2023-03-31T11:45:59.899000Z TestFramework (DEBUG): Closing down network thread 
    28 test  2023-03-31T11:45:59.949000Z TestFramework (INFO): Stopping nodes 
    29 test  2023-03-31T11:45:59.950000Z TestFramework.node0 (DEBUG): Stopping node 
    
  2. dergoegge commented at 12:19 pm on March 31, 2023: member

    Have seen this a couple times by now.

    cc @0xB10C @virtu

  3. 0xB10C commented at 11:18 am on April 1, 2023: contributor

    Had a brief look. The mempoolrej log message shows that we indeed rejected the transaction but the test didn’t see the reject.

    Can’t reproduce this locally with about 20 test runs. All pass. Might be a CI timing problem then? Will try playing around with the bpf.perf_buffer_poll() timeout.

    Will investigate further.

  4. 0xB10C commented at 12:24 pm on April 1, 2023: contributor

    Other CI runs with similar behavior:

  5. 0xB10C commented at 7:48 am on April 11, 2023: contributor

    I’ve been trying to reproduce this by re-running the CI on master https://github.com/bitcoin/bitcoin/commit/49b87bfe7e2799d25ce709123ecafa872b36e87a: https://cirrus-ci.com/task/4749392670883840

    So far all runs passed..

  6. maflcko commented at 8:26 am on April 11, 2023: member
    Maybe it reproduces faster with rr?
  7. 0xB10C commented at 9:31 am on April 13, 2023: contributor
    Thanks, didn’t know rr. Especially the chaos mode (https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mode.html) seems to be helpful here. However, rr and bpf don’t work well together yet. rr asserts when running the python USDT interface tests. I’ve tried to patch rr to make this work but haven’t been successful yet.
  8. jonatack commented at 1:51 am on April 20, 2023: member
  9. 0xB10C commented at 10:05 am on April 20, 2023: contributor
    Gave this another look. No luck with rr and haven’t been able to reproduce this locally or in the CI.
  10. maflcko commented at 10:10 am on April 20, 2023: member
    Maybe it will “fix itself” after #27360 ?
  11. mzumsande commented at 3:31 pm on May 5, 2023: contributor

    Maybe it will “fix itself” after #27360 ?

    Unfortunately, it didn’t (https://github.com/bitcoin/bitcoin/runs/13206824577 from master after #27360 was merged)

  12. fanquake referenced this in commit bf07e3a47e on May 5, 2023
  13. fanquake referenced this in commit 4a72af9ca4 on May 6, 2023
  14. sidhujag referenced this in commit 8ba35b080a on May 7, 2023
  15. fanquake commented at 5:34 pm on May 10, 2023: member

    master @ e0a70c5b4f2c691e8d6b507c8ce879f0b0424254. https://cirrus-ci.com/task/5247512164958208:

     0 node0 2023-05-10T17:29:06.449483Z (mocktime: 2023-05-24T17:29:09Z) [httpworker.1] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
     1 test  2023-05-10T17:29:06.450000Z TestFramework (INFO): Polling buffer... 
     2 test  2023-05-10T17:29:06.486000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
     3 test  2023-05-10T17:29:06.486000Z TestFramework (ERROR): Assertion failed 
     4                                   Traceback (most recent call last):
     5                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
     6                                       self.run_test()
     7                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 339, in run_test
     8                                       self.rejected_test()
     9                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 323, in rejected_test
    10                                       assert_equal(EXPECTED_REJECTED_EVENTS, handled_rejected_events)
    11                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 55, in assert_equal
    12                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    13                                   AssertionError: not(1 == 0)
    14 test  2023-05-10T17:29:06.487000Z TestFramework (DEBUG): Closing down network thread
    15...
    16interface_usdt_mempool.py                              | ✖ Failed  | 5 s
    
  16. 0xB10C referenced this in commit 681dbac720 on May 15, 2023
  17. maflcko commented at 7:21 am on May 24, 2023: member
  18. 0xB10C commented at 8:15 pm on May 24, 2023: contributor
    I’ll give this another look next week.
  19. RandyMcMillan referenced this in commit fba198ab3e on May 27, 2023
  20. virtu commented at 3:46 pm on June 2, 2023: contributor

    @0xB10C and I had a look at this today. Here’s what we found so far:

    1. We believe the issue is caused by one or more of the assertions failing in the handle_rejected_event callback (although we are not sure why they fail).
    2. The test framework’s assert_equal() communicates failed assertions via exceptions. Unfortunately exceptions raised in bcc callback functions are not properly propagated (Exception ignored on calling ctypes callback function: <function PerfEventArray._open_perf_buffer.<locals>.raw_cb_ at 0xffff83490e50>), so they only appear on stderr, not in the logs. As a consequence, the logs only show that the test case failed, not why it failed.
    3. Because (2) is a general problem with functional tests for tracepoints, I’ll create a PR that fixes the issue for all of them. The best solution we came up with so far is wrapping all assert_equals() in bcc callback functions in try blocks and logging any AssertionError exceptions using self.log(). Once this (or a better solution, preferable one that involves not ignoring exceptions in ctype callback functions, if that’s possible) is merged, the logs should tell us why the mempool:rejected test fails sporadically.
  21. virtu commented at 8:38 am on June 6, 2023: contributor

    Since the failure hasn’t been observed since #27177 was merged, there’s a good chance it fixed the issue.

    However, it still seems like a good idea to address the issue of missing check_equal() exception information in the logs, so I created #27831 to make sure all tracepoint tests properly log failed check_equal() assertions.

  22. maflcko commented at 8:45 am on June 6, 2023: member
    Ok, closing for now. Let’s reopen if this is an issue again.
  23. maflcko closed this on Jun 6, 2023

  24. 0xB10C commented at 9:12 am on June 6, 2023: contributor

    Since the failure hasn’t been observed since #27177 was merged, there’s a good chance it fixed the issue.

    For future reference, adding a bit more details on this what we think was happening here. We couldn’t actually verify it as we haven’t been able to reproduce this locally and the assertions in the callback aren’t logged in the CI logs (fixed in #27831).

    The mempool:rejected interface test constructs a zero-fee transaction and sends it to the test-node via P2P. This transaction is policy-rejected. The interface test tests that the txid and the rejection reason are what we expect them to be. Due to get_utxo() sometimes using an immature UTXO (see #27177), the transaction was consensus-rejected with bad-txns-premature-spend-of-coinbase. The assert_equal("bad-txns-premature-spend-of-coinbase", "min relay fee not met") would fail.

  25. maflcko commented at 9:14 am on June 6, 2023: member
    Can it be forced to “reproduce” on master by adding a typo to the error string in assert_equal?
  26. 0xB10C referenced this in commit 4091d621f1 on Jun 6, 2023
  27. 0xB10C commented at 10:01 am on June 6, 2023: contributor

    Can it be forced to “reproduce” on master by adding a typo to the error string in assert_equal?

    see https://github.com/0xB10C/bitcoin/commit/4091d621f16b6316eef5551a56c00bf0e2c0e541 and this CI task https://cirrus-ci.com/task/4909169740873728

  28. glozow commented at 1:49 pm on June 13, 2023: member
  29. maflcko commented at 2:56 pm on June 13, 2023: member
    #27831 improves the logging, probably.
  30. maflcko reopened this on Jun 13, 2023

  31. fanquake commented at 9:11 am on June 21, 2023: member
  32. fanquake referenced this in commit a7261da479 on Jun 22, 2023
  33. sidhujag referenced this in commit 197df44985 on Jun 22, 2023
  34. achow101 referenced this in commit 8e0cf4f90c on Jun 23, 2023
  35. sidhujag referenced this in commit 7b1ed7d1fa on Jun 25, 2023
  36. maflcko commented at 7:50 pm on June 29, 2023: member

    https://cirrus-ci.com/task/4626519738286080?logs=ci#L3262

     0 node0 2023-06-29T19:45:37.525879Z (mocktime: 2023-07-13T19:45:38Z) [httpworker.2] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
     1 test  2023-06-29T19:45:37.526000Z TestFramework (INFO): Polling buffer... 
     2 test  2023-06-29T19:45:37.527000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
     3 test  2023-06-29T19:45:37.527000Z TestFramework (ERROR): Assertion failed 
     4                                   Traceback (most recent call last):
     5                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
     6                                       self.run_test()
     7                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 329, in run_test
     8                                       self.rejected_test()
     9                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 310, in rejected_test
    10                                       assert_equal(event.reason.decode("UTF-8"), "min relay fee not met")
    11                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 57, in assert_equal
    12                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    13                                   AssertionError: not( == min relay fee not met)
    
  37. fanquake commented at 2:24 pm on July 3, 2023: member

    https://cirrus-ci.com/task/6504094303518720?logs=ci#L3264

     0 test  2023-07-03T14:16:38.121000Z TestFramework (INFO): Polling buffer... 
     1 node0 2023-07-03T14:16:38.121067Z (mocktime: 2023-07-17T14:16:35Z) [http] [httpserver.cpp:255] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:40312 
     2 node0 2023-07-03T14:16:38.121207Z (mocktime: 2023-07-17T14:16:35Z) [httpworker.2] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
     3 test  2023-07-03T14:16:38.122000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
     4 test  2023-07-03T14:16:38.122000Z TestFramework (ERROR): Assertion failed 
     5                                   Traceback (most recent call last):
     6                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
     7                                       self.run_test()
     8                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 329, in run_test
     9                                       self.rejected_test()
    10                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 310, in rejected_test
    11                                       assert_equal(event.reason.decode("UTF-8"), "min relay fee not met")
    12                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 57, in assert_equal
    13                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    14                                   AssertionError: not( == min relay fee not met)
    15 test  2023-07-03T14:16:38.123000Z TestFramework (DEBUG): Closing down network thread 
    16 test  2023-07-03T14:16:38.173000Z TestFramework (INFO): Stopping nodes 
    17 test  2023-07-03T14:16:38.173000Z TestFramework.node0 (DEBUG): Stopping node 
    
  38. maflcko commented at 3:20 pm on July 3, 2023: member
    Maybe remove the sub-test case?
  39. virtu commented at 7:51 am on July 5, 2023: contributor

    Even with the extra information, I’m somewhat at a loss because I cannot reproduce the problem locally. From what I understand:

    • the tracepoint got called, the handler was executed
    • the tx hash is correct (so bpf_usdt_readarg_p(1, ctx, &rejected.hash, HASH_LENGTH) in trace_rejected works as expected)
    • reject reason is usually correct; but sometime’s it’s an empty string, suggesting something went wrong with bpf_usdt_readarg_p(2, ctx, &rejected.reason, MAX_REJECT_REASON_LENGTH) in trace_rejected. Why this only would only happen somethings, I do not know. If the environment is 100% reproducible, this should not happen.

    My first reaction to your suggestion of removing the sub-test case was to first try and change the code so it triggers another rejection reason. But without knowing what’s going wrong right now, this feels like poking around in the dark.

  40. maflcko commented at 8:28 am on July 5, 2023: member
    Looking at the code, I fail to see how this could even pass once, because you can’t assume a pointer created by std::string{}.c_str() can be used after the statement at all, no?
  41. maflcko commented at 8:35 am on July 5, 2023: member
    Or do the TRACEx macros create a copy of the data in-line? If yes, it could make sense to document that in doc/tracing.md#c_str
  42. willcl-ark commented at 9:53 am on July 5, 2023: member

    AFAIU the macros themselves don’t copy any data, rather just define a static probe point. Any arguments passed to them just get incorporated into the assembly code. The tracing program (BPF) if loaded into the kernel can run code at the trace points. Here it can choose to, and AFAIU does in our case with BPF, copy the data into a different kernel space map when it hits a tracepoint, which can then be read by the USDT probe in our python test.

    So the data does get copied but not by the macro, rather by the BPF program.

  43. maflcko commented at 10:00 am on July 5, 2023: member

    Is the tracing program executed before the “full-expression” that encompasses the std::string{}.c_str() call has been fully evaluated?

    Anything that executes after the expression has been evaluated, will operate on destroyed memory. See https://en.cppreference.com/w/cpp/language/lifetime

  44. willcl-ark commented at 11:11 am on July 5, 2023: member

    I’m probably outside of my wheelhouse now, but as I understand the tracepoint defines a place in the binary where a tracing program gets to insert and run (its own) code, if it’s loaded in the kernel. The arguments to the macro are passed directly to the (assembly) code provided by the tracing program in that expression and at that execution point BPF will copy values into it’s own map in the kernel if a probe is set up. Other tracing programs or runtimes can choose not to copy data if they don’t need/want to.

    By using helper function bpf_usdt_readarg in our test program we should be able to read (pointers) between kernel and user space safely too. In the recent failure here, in theory, the macro should handle the pointers ok as neither tx nor result go out of scope before it’s called, at which point the copy should be done.

    All that said, it’s obviously still going wrong somehow…

  45. maflcko commented at 11:19 am on July 5, 2023: member

    neither tx nor result go out of scope before it’s called

    I don’t think the lifetime of result is relevant here. GetRejectReason() gives birth to a fresh temporary copy of the string, having its own lifetime.

    All that said, it’s obviously still going wrong somehow…

    Yeah, I am just guessing that this is a lifetime issue. It would be good to rule this out.

  46. maflcko added the label Tests on Jul 17, 2023
  47. maflcko commented at 11:51 am on July 17, 2023: member
    Ok, I’ll disable the broken test due to lack of progress here.
  48. fanquake referenced this in commit d09c8bc730 on Jul 17, 2023
  49. sidhujag referenced this in commit 12ac3f1f77 on Jul 17, 2023
  50. janus referenced this in commit 260576bd51 on Sep 11, 2023
  51. maflcko added the label CI failed on Jan 24, 2024
  52. maflcko removed the label CI failed on Jul 31, 2024
  53. fanquake referenced this in commit 9eb57d1ab6 on Jul 31, 2024
  54. fanquake commented at 9:19 am on August 1, 2024: member
    I guess we can close this for now, and can reopen if #30559 is reverted?
  55. fanquake closed this on Aug 1, 2024

  56. maflcko commented at 12:00 pm on August 2, 2024: member

    https://github.com/bitcoin/bitcoin/actions/runs/10214494976/job/28262011108?pr=29641#step:7:9088

     0test  2024-08-02T11:52:52.096000Z TestFramework (INFO): Hooking into mempool:rejected tracepoint... 
     1 test  2024-08-02T11:52:53.880000Z TestFramework (INFO): Sending invalid transaction... 
     2 node0 2024-08-02T11:52:53.880644Z (mocktime: 2024-08-16T11:52:53Z) [http] [httpserver.cpp:305] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:46704 
     3 node0 2024-08-02T11:52:53.880818Z (mocktime: 2024-08-16T11:52:53Z) [httpworker.2] [rpc/request.cpp:232] [parse] [rpc] ThreadRPCServer method=getblockchaininfo user=__cookie__ 
     4 test  2024-08-02T11:52:53.883000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:13268: msg_tx(tx=CTransaction(version=2 vin=[CTxIn(prevout=COutPoint(hash=6e95b1706527bac148b682dad0df924b578a2b92fa24de08d5eb6e846b06c07e n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=50.00000000 scriptPubKey=51202913b252fe537830f843bfdc5fa7d20ba48639a87c86ff837b92d083c55ad7c1)] wit=CTxWitness(CScriptWitness(51,c00000000000000000000000000000000000000000000000000000000000000001)) nLockTime=0)) 
     5 test  2024-08-02T11:52:53.884000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:13268: msg_ping(nonce=00000000) 
     6 test  2024-08-02T11:52:53.884000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:13268: msg_ping(nonce=00000002) 
     7 node0 2024-08-02T11:52:53.886782Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net_processing.cpp:3710] [ProcessMessage] [net] received: tx (133 bytes) peer=0 
     8 node0 2024-08-02T11:52:53.887424Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [txmempool.cpp:664] [check] [mempool] Checking mempool with 0 transactions and 0 inputs 
     9 node0 2024-08-02T11:52:53.887636Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net_processing.cpp:3209] [ProcessInvalidTx] [mempoolrej] 1ca65439fa59f1a719bc6cfd2dd34fb9c2dc6a8c699ece5b4d5397b084f37f95 (wtxid=15bb5f2afba6c18e6ba825160ebcf8fb735e2755045400439d1242cb6499a4e9) from peer=0 was not accepted: min relay fee not met, 0 < 104 
    10 node0 2024-08-02T11:52:53.887802Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net_processing.cpp:3710] [ProcessMessage] [net] received: ping (8 bytes) peer=0 
    11 node0 2024-08-02T11:52:53.887841Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net.cpp:3808] [PushMessage] [net] sending pong (8 bytes) peer=0 
    12 node0 2024-08-02T11:52:53.887991Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net_processing.cpp:3710] [ProcessMessage] [net] received: ping (8 bytes) peer=0 
    13 test  2024-08-02T11:52:53.888000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:13268: msg_pong(nonce=00000000) 
    14 test  2024-08-02T11:52:53.888000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:13268: msg_pong(nonce=00000002) 
    15 node0 2024-08-02T11:52:53.888038Z (mocktime: 2024-08-16T11:52:53Z) [msghand] [net.cpp:3808] [PushMessage] [net] sending pong (8 bytes) peer=0 
    16 node0 2024-08-02T11:52:53.935933Z (mocktime: 2024-08-16T11:52:53Z) [http] [httpserver.cpp:305] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:46704 
    17 test  2024-08-02T11:52:53.936000Z TestFramework (INFO): Polling buffer... 
    18 node0 2024-08-02T11:52:53.936089Z (mocktime: 2024-08-16T11:52:53Z) [httpworker.0] [rpc/request.cpp:232] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
    19 test  2024-08-02T11:52:53.937000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
    20 test  2024-08-02T11:52:53.937000Z TestFramework (ERROR): Assertion failed 
    21                                   Traceback (most recent call last):
    22                                     File "/home/runner/work/_temp/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
    23                                       self.run_test()
    24                                     File "/home/runner/work/_temp/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 318, in run_test
    25                                       self.rejected_test()
    26                                     File "/home/runner/work/_temp/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 299, in rejected_test
    27                                       assert_equal(event.reason.decode("UTF-8"), "min relay fee not met")
    28                                     File "/home/runner/work/_temp/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 75, in assert_equal
    29                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
    30                                   AssertionError: not( == min relay fee not met)
    
  57. maflcko reopened this on Aug 2, 2024

  58. fanquake referenced this in commit 9c6c667bc2 on Aug 2, 2024
  59. 0xB10C commented at 9:43 pm on August 5, 2024: contributor

    I’m once again trying to (reliably) reproduce this to be able to have a chance at debugging it. My current approach is running the mempool:rejected test 100 times along with the other functional tests (https://github.com/0xB10C/bitcoin/tree/2024-08-27380-intermitting-usdt-mempool-failures).

    However, no luck yet getting it to a point where I can reproduce it reliably. Sometimes all 100 runs succeed, other times it fails at the second run.

    Edit:

  60. maflcko commented at 10:35 am on August 9, 2024: member

    Given that you can reproduce on 50% of your new action runs, does the following diff work around the bug?

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index cf4dce3762..d2b6969370 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -1873,10 +1873,13 @@ MempoolAcceptResult AcceptToMemoryPool(Chainstate& active_chainstate, const CTra
     5 
     6         for (const COutPoint& hashTx : coins_to_uncache)
     7             active_chainstate.CoinsTip().Uncache(hashTx);
     8+        std::string copy{result.m_state.GetRejectReason()};
     9+        auto c_str{copy.c_str()};
    10         TRACE2(mempool, rejected,
    11                 tx->GetHash().data(),
    12-                result.m_state.GetRejectReason().c_str()
    13+                c_str
    14         );
    15+        UninterruptibleSleep(1ms);
    16     }
    17     // After we've (potentially) uncached entries, ensure our coins cache is still within its size limits
    18     BlockValidationState state_dummy;
    
  61. 0xB10C referenced this in commit e9a4f41300 on Aug 12, 2024
  62. 0xB10C referenced this in commit 3d806eafa0 on Aug 12, 2024
  63. mzumsande commented at 8:58 pm on August 12, 2024: contributor

    Maybe copying the string into a static or global variable would help rule out a lifetime issue?

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index 9126d8e928..683d13e0e5 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -1863,6 +1863,8 @@ MempoolAcceptResult AcceptToMemoryPool(Chainstate& active_chainstate, const CTra
     5     CTxMemPool& pool{*active_chainstate.GetMempool()};
     6 
     7     std::vector<COutPoint> coins_to_uncache;
     8+    static char string_buffer[120];
     9+
    10     auto args = MemPoolAccept::ATMPArgs::SingleAccept(chainparams, accept_time, bypass_limits, coins_to_uncache, test_accept);
    11     MempoolAcceptResult result = MemPoolAccept(pool, active_chainstate).AcceptSingleTransaction(tx, args);
    12     if (result.m_result_type != MempoolAcceptResult::ResultType::VALID) {
    13@@ -1873,9 +1875,10 @@ MempoolAcceptResult AcceptToMemoryPool(Chainstate& active_chainstate, const CTra
    14 
    15         for (const COutPoint& hashTx : coins_to_uncache)
    16             active_chainstate.CoinsTip().Uncache(hashTx);
    17+        strncpy(string_buffer, result.m_state.GetRejectReason().c_str(), 120 - 1);
    18         TRACE2(mempool, rejected,
    19                 tx->GetHash().data(),
    20-                result.m_state.GetRejectReason().c_str()
    21+                string_buffer
    22         );
    23     }
    24     // After we've (potentially) uncached entries, ensure our coins cache is still within its size limits
    
  64. 0xB10C referenced this in commit 503342104c on Aug 12, 2024
  65. 0xB10C commented at 3:15 pm on August 13, 2024: contributor

    Thanks @mzumsande - all 11 CI runs with your patch succeed.

    This should confirm it’s indeed a lifetime issue.

  66. mzumsande commented at 7:22 pm on August 13, 2024: contributor

    This is still really confusing to me.

    I don’t understand tracepoints in depth, but it looks like there is a race between bitcoind continuing to run (and overwriting data after it goes out of scope), and the data-collecting program (bpftrace) reading data from the memory location the c_str() result points to. But if that was the case, it would be a very general problem with tracepoints, and I don’t see how it would specifically lead to failures in this exact situation with the mempool-rejected tracepoint.

    I also don’t understand why this specifically happens in CI runs with sanitizers enabled. Naively I would expect that sanitizers make the C++ program run slower (while not affecting the speed of bpftrace), so if there really was a race, I would expect failures to happen less often with sanitizers, not more.

  67. 0xB10C commented at 2:34 pm on November 26, 2024: contributor

    To fix a potential lifetime issue, it is probably helpful to replace bpf_usdt_readarg_p() (which isn’t even documented) in:

    https://github.com/bitcoin/bitcoin/blob/f7144b24be09e7db2173a0b15d73f99a08b98118/test/functional/interface_usdt_mempool.py#L103-L104

    with bpf_usdt_readarg() followed by bpf_probe_read_user() or a bpf_probe_read_user_str(). We use bpf_usdt_readarg_p() in multiple places.

    From the documentation:

    For safety, all user address space memory reads must pass through bpf_probe_read_user().

    This is also how it’s documented in “6. USDT probes - bcc reference guide” and how it’s done in e.g. this example mysqld_query.py.

    Happy to give this another spin.

  68. 0xB10C commented at 4:46 pm on December 9, 2024: contributor

    Happy to give this another spin.

    To check that this is still an issue, I ran https://github.com/bitcoin/bitcoin/compare/master...0xB10C:bitcoin:refs/heads/2024-11-27380-intermitting-usdt-mempool-failures in CI 20 times. It failed 6 times. I then ran it with https://github.com/0xB10C/bitcoin/commit/8a3eeb48482955e9d0cd26632a03457e35a10568 and it didn’t fail a single time. It seems that getting rid of bpf_usdt_readarg_p indeed fixes this.

    I’ll work on a PR to get rid of our bpf_usdt_readarg_p usage in all tests.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-15 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me