test, tracing: don’t use problematic bpf_usdt_readarg_p() #31848

pull 0xB10C wants to merge 3 commits into bitcoin:master from 0xB10C:2025-01-dont-use-bpf_usdt_readarg_p changing 8 files +115 −59
  1. 0xB10C commented at 4:07 pm on February 12, 2025: contributor

    Instead of using the undocumented bcc helper bpf_usdt_readarg_p(), use bpf_usdt_readarg() and bpf_probe_read_user()/bpf_probe_read_user_str() as documented in the bcc USDT reference guide.

    Note that the bpf_probe_read_user() documentation says the following:

    For safety, all user address space memory reads must pass through bpf_probe_read_user().

    It’s assumed that using bpf_usdt_readarg_p() caused a lifetime issue. With bpf_usdt_readarg() and bpf_probe_read_user(), this doesn’t seem to be a problem anymore.

    This allows to revert https://github.com/bitcoin/bitcoin/commit/faed5337435f025811caeb5f782ecbf9683a24b3 and closes #27380.

  2. test: don't use bpf_usdt_readarg_p
    Instead of using the undocumented bcc helper bpf_usdt_readarg_p(),
    use bpf_usdt_readarg() [1] and bpf_probe_read_user{_str}() [2, 3] as
    documented in the bcc USDT reference guide [1].
    
    Note that the bpf_probe_read_user() documentation says the following:
    
    > For safety, all user address space memory reads must pass through bpf_probe_read_user().
    
    It's assumed that using bpf_usdt_readarg_p() caused a lifetime issue.
    See https://github.com/bitcoin/bitcoin/issues/27380#issuecomment-2286505348
    With bpf_usdt_readarg() and bpf_probe_read_user(), this doesn't seem
    to be a problem anymore. See https://github.com/bitcoin/bitcoin/issues/27380#issuecomment-2528671656
    
    [1]: https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#6-usdt-probes
    [2]: https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#10-bpf_probe_read_user
    [3]: https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#11-bpf_probe_read_user_str
    35ae6ff60f
  3. contrib: don't use bpf_usdt_readarg_p ec47ba349d
  4. Revert "test: Disable known broken USDT test for now"
    This reverts commit faed5337435f025811caeb5f782ecbf9683a24b3.
    
    This commit worked around a lifetime issue likely caused by using
    bpf_usdt_readarg_p(). Since we don't use bpf_usdt_readarg_p() anymore
    this commit can be reverted.
    
    See the discussion in https://github.com/bitcoin/bitcoin/issues/27380#issuecomment-2500962838
    a0b66b4bff
  5. DrahtBot commented at 4:07 pm on February 12, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31848.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK i-am-yuvi, willcl-ark

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  6. 0xB10C commented at 4:14 pm on February 12, 2025: contributor
    I did >15 re-runs of https://github.com/bitcoin/bitcoin/compare/master...0xB10C:bitcoin:2024-12-dont-use-bpf_usdt_readarg_p-testing similar to the 20 runs in #27380 (comment) to make sure this won’t intermittently fail again.
  7. glozow added the label Tests on Feb 12, 2025
  8. laanwj requested review from laanwj on Feb 12, 2025
  9. willcl-ark commented at 11:29 am on February 13, 2025: member

    Approach ACK

    So if I understand the change correctly, the new (documented/recommended) behaviour is to always first read_arg() which gets the memory location of the data, then read_user[_str]() which then actually performs the copy from user memory to the BPF memory space.

    If I have this correct, then using the documented, seemingly more-reliable, and more explicit approach seems fine to me.

    Not sure how I could test the reliability of this myself, but #31848 (comment) looks promising.

  10. 0xB10C commented at 1:14 pm on February 13, 2025: contributor

    So if I understand the change correctly, the new (documented/recommended) behaviour is to always first read_arg() which gets the memory location of the data, then read_user{_str}() which then actually performs the copy from user memory to the BPF memory space.

    Correct. Here’s an example from the mempool:added tracepoint test.

    • (1) initialize a null-pointer phash
    • (2) read the first tracepoint argument (a user-space address) into the phash pointer
    • (3) copy the value behind the phash pointer into added.hash
    0  void *phash = NULL;                                              // (1)
    1  bpf_usdt_readarg(1, ctx, &phash);                                // (2)
    2  bpf_probe_read_user(&added.hash, sizeof(added.hash), phash);     // (3)
    

    bpf_usdt_readarg_p() is a helper that does this, but apparently not safely enough

  11. laanwj commented at 2:52 pm on February 13, 2025: member

    bpf_usdt_readarg_p() is a helper that does this, but apparently not safely enough

    i looked into it a bit; the implementation of bpf_usdt_readarg_p is here: https://github.com/iovisor/bcc/blob/master/src/cc/frontends/clang/b_frontend_action.cc#L1270

    It’s a macro implemented in the clang frontend:

     0        } else if (Decl->getName() == "bpf_usdt_readarg_p") {
     1          text = "({ u64 __addr = 0x0; ";
     2          text += "_bpf_readarg_" + current_fn_ + "_" + args[0] + "(" +
     3                  args[1] + ", &__addr, sizeof(__addr));";
     4
     5          bool overlap_addr = false;
     6          text += check_bpf_probe_read_user(StringRef("bpf_probe_read_user"),
     7                  overlap_addr);
     8          if (overlap_addr) {
     9            error(GET_BEGINLOC(Call), "bpf_probe_read_user not found. Use latest kernel");
    10            return false;
    11          }
    12
    13          text += "(" + args[2] + ", " + args[3] + ", (void *)__addr);";
    14          text += "})";
    15          rewriter_.ReplaceText(expansionRange(Call->getSourceRange()), text);
    16        }
    

    At first glance it generates the exact same code: a bpf_readarg folllowed by bpf_probe_read_user of that address. So it’s unclear why the explicit formulation would change anything.

    However, check_bpf_probe_read_user does not always return bpf_probe_read_user! If the symbol cannot be resolved for the specific kernel it returns bpf_probe_read instead. Maybe that’s it, and calling check_bpf_probe_read_user works around it? Not sure.

    On the other hand there’s also some #defining going on which does the same (but not sure it’s under the same conditions).

  12. in contrib/tracing/log_raw_p2p_msgs.py:90 in a0b66b4bff
    88     bpf_usdt_readarg(1, ctx, &msg->peer_id);
    89-    bpf_usdt_readarg_p(2, ctx, &msg->peer_addr, MAX_PEER_ADDR_LENGTH);
    90-    bpf_usdt_readarg_p(3, ctx, &msg->peer_conn_type, MAX_PEER_CONN_TYPE_LENGTH);
    91-    bpf_usdt_readarg_p(4, ctx, &msg->msg_type, MAX_MSG_TYPE_LENGTH);
    92+    bpf_usdt_readarg(2, ctx, &paddr);
    93+    bpf_probe_read_user_str(&msg->peer_addr, sizeof(msg->peer_addr), paddr);
    


    laanwj commented at 2:57 pm on February 13, 2025:
    Using bpf_probe_read_user_str here at least makes sure that only until the zero-terminator is read and not the entire buffer. That’s a clear improvement.
  13. fanquake added this to the milestone 29.0 on Feb 25, 2025
  14. DrahtBot requested review from willcl-ark on Feb 27, 2025
  15. i-am-yuvi commented at 8:39 am on February 27, 2025: contributor

    Tested ACK a0b66b4bffaa6bc354c293785b785c2da2cef4de

    • I’ve tested all the functional tests and it passes (5/5)

      • interface_usdt_coinselection.py ✅
      • interface_usdt_mempool.py ✅
      • interface_usdt_net.py ✅
      • interface_usdt_utxo.py ✅
      • interface_usdt_validation.py ✅
    • All tracing tools works fine (3/3)

    • Did 5-6 runs of CI similar to this one, which can be seen here all of them succeeded.

    • The issue comment looks promising to me!

    Note: All of the tests has been executed in a fully synced node

  16. willcl-ark approved
  17. willcl-ark commented at 2:53 pm on March 4, 2025: member

    tACK a0b66b4bffaa6bc354c293785b785c2da2cef4de

    I also re-ran CI on this PR 10 times, of which all passed.

    The code changes all look correct to me too, following the principle of the pointers being null initialized, having addressed assigned to them, and then the value being copied.

    Hopefully this can finally end the intermittent failures 🤞🏼

  18. DrahtBot requested review from willcl-ark on Mar 4, 2025
  19. glozow commented at 7:39 pm on March 4, 2025: member
    @willcl-ark you acked your commit on top of the PR. It is effectively an ack, but do you mind editing your comment?
  20. willcl-ark commented at 10:29 am on March 5, 2025: member

    Thanks Gloria, nice catch :)

    Comment amended.

  21. glozow merged this on Mar 5, 2025
  22. glozow closed this on Mar 5, 2025

  23. laanwj commented at 3:28 pm on March 5, 2025: member
    Post-merge code review ACK a0b66b4bffaa6bc354c293785b785c2da2cef4de
  24. 0xB10C deleted the branch on Mar 5, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me