tracing: tests for USDT tracepoints #23296

issue 0xB10C opened this issue on October 17, 2021
  1. 0xB10C commented at 2:23 pm on October 17, 2021: member

    #22006 added the first three USDT based tracepoints to Bitcoin Core. To provide a semi-stable tracepoint API the tracepoints need test coverage. The tracepoints can be tested in the functional tests using the Python wrapper of BCC.

    Before adding more tracepoints, the existing three tracepoints from #22006 should be tested.

    Notes:

    1. We currently only support the tracepoints on Linux. The tests should be skipped on other operating systems.
    2. Hooking into the tracepoints via the Linux kernel requires special privileges. Since kernel version 5.8 (Aug. 2020), the CAP_BPF capability can be used. On older kernel versions, the overloaded catch-all capability CAP_SYS_ADMIN is required. Functional tests shouldn’t require CAP_SYS_ADMIN as that essentially means running the test suite with root privileges.
    3. The tests require the BCC Python library. This should be an optional dependency. Tests should be skipped if the dependency isn’t present.
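
    The three notes above could translate into a skip check along the following lines. This is only a sketch: the function names are hypothetical and not part of the functional test framework, and the check only decides which capability would be needed, not whether the current process actually holds it.

```python
import importlib.util
import platform
import re


def required_capability(kernel_release):
    """Return the capability named in note 2 for a given kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # CAP_BPF exists since Linux 5.8; older kernels need CAP_SYS_ADMIN.
    return "CAP_BPF" if version >= (5, 8) else "CAP_SYS_ADMIN"


def usdt_skip_reason(system=None, bcc_available=None):
    """Return a reason to skip the USDT tests, or None if they may run.

    Both arguments default to probing the current machine; they can be
    passed explicitly to exercise the logic elsewhere.
    """
    if system is None:
        system = platform.system()
    if system != "Linux":
        return "tracepoints are only supported on Linux"
    if bcc_available is None:
        # BCC is an optional dependency, so only check whether it is importable.
        bcc_available = importlib.util.find_spec("bcc") is not None
    if not bcc_available:
        return "the BCC Python library is not installed"
    return None
```

    A test could call usdt_skip_reason() during setup and skip itself whenever the result is not None.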

    The connect_block tracepoint can be tested by mining blocks with transactions and checking that the tracepoint passes the correct data. The net inbound_message and outbound_message tracepoints can be tested by checking the traffic between two nodes.

  2. 0xB10C added the label Feature on Oct 17, 2021
  3. MarcoFalke added the label Tests on Oct 18, 2021
  4. 0xB10C commented at 3:24 pm on December 18, 2021: member

    Note so I don’t forget about this:

    By chance I came across: https://github.com/bitcoin/bitcoin/blob/c006ab29ceec9274dc85a0de7f7d0502021a4b87/ci/test/04_install.sh#L28-L30

    This adds the CAP_SYS_PTRACE capability to the Docker container being started. If we ran the functional tests in Docker, we could use --cap-add BPF (assuming a 5.8+ kernel).

    docker docs: Runtime privilege and Linux capabilities

  5. fanquake referenced this in commit 542e405a85 on Jan 10, 2022
  6. sidhujag referenced this in commit 5ca60a8fdc on Jan 10, 2022
  7. 0xB10C commented at 4:05 pm on January 24, 2022: member

    Next to CAP_BPF, CAP_PERFMON is likely also required.


    I’ve started to work on how to run tracepoint tests in the functional test suite:

    1. on developer machines without requiring to run the functional test framework as root.
    2. in the CI (particularly our current cirrus CI setup)
  8. 0xB10C commented at 2:20 pm on January 29, 2022: member

    I’ve focused on getting a proof-of-concept (PoC) running on Cirrus CI for now. I solved a few of the bigger issues, but I’m at a point where I think the CI containers don’t have enough permissions to attach to the USDT tracepoints. I’ve learned a bit about the CI and want to document my process here for future re-attempts.

    My PoC branch is https://github.com/0xB10C/bitcoin/commits/2022-01-tracepoint-ci-poc

    tl;dr: I think we are not able to set the required Linux capabilities to test USDT tracepoints on Cirrus CI: Unable to set capabilities [--caps=CAP_SYS_ADMIN+ep].

    first commit: My approach for the PoC was to drop all tasks besides the [lint] task for less CI load. No need to re-test the same Bitcoin Core code over and over. I’ve then added a CI task specifically for the USDT tracing tests, disabling e.g. the unit tests and existing functional tests for faster CI feedback. I checked that the newly added CI task builds (or uses the cached) depends, including the systemtap depends added in #23724, and that use usdt is true in the configure stage.

    second commit: The next step was to set up the prerequisites for the USDT functional tests. These tests should only be run when a) the tracepoints are compiled in and b) the Python bcc package (iovisor/bcc) is installed. I’ve added the relevant checks to test_framework.py and an ENABLE_USDT_TRACEPOINTS flag in the config.ini.in. Additionally, I’ve added bpfcc-tools to the installed packages for the bcc Python package.

    third commit: Then, I’ve added an interface_usdt.py PoC functional test based on the log_raw_p2p_msgs.py example. It checks that at least one inbound and one outbound P2P message is received over the USDT interface.
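
    As a rough illustration of what such a test does, the sketch below counts inbound and outbound messages with BCC. It is not the code from the PoC branch: the BPF program, the map name and the function names are made up for this example, and the probe names follow the log_raw_p2p_msgs.py example mentioned above. Running count_p2p_messages() needs a bitcoind built with tracepoints, the bcc package and sufficient privileges, so it has not been exercised here.

```python
import ctypes
import time

# Minimal BPF program: one array slot per direction (0 = inbound, 1 = outbound).
PROGRAM = """
#include <uapi/linux/ptrace.h>

BPF_ARRAY(counts, u64, 2);

int trace_inbound_message(struct pt_regs *ctx) {
    int key = 0;
    counts.increment(key);
    return 0;
}

int trace_outbound_message(struct pt_regs *ctx) {
    int key = 1;
    counts.increment(key);
    return 0;
}
"""


def count_p2p_messages(pid, seconds=5):
    """Attach to a running bitcoind (by PID) and count traced P2P messages."""
    # Imported lazily because BCC is meant to be an optional dependency.
    from bcc import BPF, USDT

    usdt = USDT(pid=pid)
    usdt.enable_probe(probe="inbound_message", fn_name="trace_inbound_message")
    usdt.enable_probe(probe="outbound_message", fn_name="trace_outbound_message")
    bpf = BPF(text=PROGRAM, usdt_contexts=[usdt])
    try:
        # Let the node exchange some traffic while the probes are attached.
        time.sleep(seconds)
        counts = bpf["counts"]
        inbound = counts[ctypes.c_int(0)].value
        outbound = counts[ctypes.c_int(1)].value
    finally:
        bpf.cleanup()
    return inbound, outbound


def check_counts(inbound, outbound):
    """The PoC's condition: at least one message seen in each direction."""
    assert inbound >= 1, "no inbound P2P message seen via USDT"
    assert outbound >= 1, "no outbound P2P message seen via USDT"
```

    In a functional test, count_p2p_messages() would be given the PID of a test node that is connected to a second node, and check_counts() would be applied to the result.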

    fourth commit: The BPF bytecode compilation from the provided C code failed because the kernel headers are missing. It took me a while to remember that containers use the host’s kernel. This means the relevant kernel headers can’t be found/installed in the Ubuntu jammy container. According to the Cirrus CI documentation, the container instances run on a GKE cluster of compute-optimized instances running in Google Cloud. These run Google’s Container-Optimized OS (COS) with their own kernel. Specifically, (I think) the cos-89-16108-534-8 release with the COS-5.4.144 Linux kernel. It’s possible to retrieve the kernel source and run the make oldconfig and make prepare steps using the kernel configuration in /proc/config.gz. This is, for example, done in iovisor/kubectl-trace (Kubernetes tracing) for COS too. See fetch-linux-headers.sh. Fetching the kernel headers and running the make steps takes a few minutes. However, I think this could be cached and only needs to be redone when Cirrus CI upgrades their host OS to a newer version. We need to set the BCC_KERNEL_SOURCE env var to the path of the generated kernel headers. With that, the BPF bytecode compilation succeeds.
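
    The last step, pointing BCC at the prepared headers, could look roughly like this in a test setup. The helper name and the directory are hypothetical; only the BCC_KERNEL_SOURCE variable comes from the description above. The variable is set before bcc compiles any program, so it is safest to set it before the test imports bcc.

```python
import os


def use_prepared_kernel_headers(headers_dir, env=None):
    """Set BCC_KERNEL_SOURCE to headers_dir if that directory exists.

    Returns True when the variable was set. `env` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    if env is None:
        env = os.environ
    if not os.path.isdir(headers_dir):
        # Nothing was fetched or cached; leave the environment untouched.
        return False
    env["BCC_KERNEL_SOURCE"] = headers_dir
    return True
```

    Returning False instead of raising lets the caller decide whether missing headers mean "skip the test" or "fail the CI task".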

    fifth commit: We lack the required permissions to open the BPF maps. I assumed this could be solved by adding the required capabilities (to docker). It wasn’t clear to me that DOCKER_EXEC does, in fact, not run docker exec <args>. The CI is run in the Cirrus CI container itself at the moment. See DANGER_RUN_CI_ON_HOST. (Maybe just EXEC would be a better name.) So adding --cap-add=CAP_X to DOCKER_ADMIN (see #23296 (comment)) did not work. I’ve opted to use capsh to set the required capabilities. With the host’s 5.4 Linux kernel, we can’t use CAP_BPF and CAP_PERFMON as they were introduced in 5.8. We have to fall back to CAP_SYS_ADMIN. However, this fails with Unable to set capabilities [--caps=CAP_SYS_ADMIN+ep]. Later, I discovered https://github.com/cirruslabs/cirrus-ci-docs/issues/654, where MarcoFalke already asked about this for wine.

    Here’s the failed CI task: https://cirrus-ci.com/task/6305351615119360


    Maybe @MarcoFalke has further ideas? Or @laanwj? I think having automatically run tests for the tracepoints is important. I don’t think running these on a different CI is a good solution though.

    I’ll be focusing on a PoC for local development machines for now. It should be possible to run these with sudo capsh ..., dropping privileges while keeping only the required capabilities. While these functional tests would be skipped on most development machines, they could at least be run to e.g. test release candidates and on PRs adding new tracepoints (with tests).

  9. MarcoFalke commented at 2:32 pm on January 29, 2022: member

    Linux containers in Cirrus CI already run in docker (I don’t know which permissions they have turned on, though), so it is not possible to start docker again.

    Maybe just EXEC would be a better name

    Yeah, or CI_EXEC

    While it may be possible to specify the flags for the one task running outside the Cirrus infrastructure (https://cirrus-ci.com/task/5786179323822080?logs=ci#L21), I think it may be preferable to not rely on that to make the tasks easier to run locally inside docker. I use this often with DANGER_RUN_CI_ON_HOST.

    It is possible that arm, freebsd or macos run on actual vms. If any of them support usdt, you could try writing or adjusting the task for them.

  10. 0xB10C commented at 2:51 pm on January 29, 2022: member

    Linux containers in Cirrus CI already run in docker (I don’t know which permissions they have turned on, though), so it is not possible to start docker again.

    Makes sense. These capabilities are enabled (via capsh --print):

    cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=eip
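
    To make the gap explicit, the set above can be compared against the capabilities discussed earlier in this thread. A small sketch (the helper names are invented for this example, and the parsing only handles a single "names=flags" clause like the one printed above):

```python
CAPSH_SET = (
    "cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,"
    "cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,"
    "cap_mknod,cap_audit_write,cap_setfcap=eip"
)

# Either both newer capabilities (5.8+ kernels) or the catch-all one.
NEWER_CAPS = {"cap_bpf", "cap_perfmon"}
FALLBACK_CAPS = {"cap_sys_admin"}


def parse_capability_set(text):
    """Split 'cap_a,cap_b=eip' into ({'cap_a', 'cap_b'}, 'eip')."""
    names, _, flags = text.strip().partition("=")
    return set(names.split(",")), flags


def can_attach_to_tracepoints(caps):
    """True if the set holds enough capabilities for the USDT tests."""
    return NEWER_CAPS <= caps or FALLBACK_CAPS <= caps


caps, flags = parse_capability_set(CAPSH_SET)
print(can_attach_to_tracepoints(caps))  # False
print(sorted((NEWER_CAPS | FALLBACK_CAPS) - caps))
```

    None of cap_bpf, cap_perfmon or cap_sys_admin is in the container's set, which matches the failure seen in the PoC.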
    

    Maybe just EXEC would be a better name

    Yeah, or CI_EXEC

    Agree, that’s even better.

    It is possible that arm, freebsd or macos run on actual vms. If any of them support usdt, you could try writing or adjusting the task for them.

    Good point. We tested the tracepoints in #23724 on arm64. Cirrus CI runs their ARM tasks on AWS. I’ll try to adjust the arm task, though I think they’ll run the Kubernetes containers with the same permissions/capabilities.

  11. 0xB10C commented at 4:04 pm on February 16, 2022: member
    Adding tests for all existing tracepoints in #24358. These are currently skipped in the CI (and probably most other test suite runs too).
  12. laanwj referenced this in commit 6c9460edae on Apr 6, 2022
  13. laanwj closed this on Apr 6, 2022

  14. 0xB10C commented at 11:16 am on April 6, 2022: member
    GitHub seems to have decided that “partly fixes #23296” means it should close this issue. I still want to see if we can get the tests running in the CI. I’ll extract the “running the tests in the CI” part into a new issue.
  15. 0xB10C commented at 11:30 am on April 6, 2022: member
    For reference, @laanwj noted in #24358 (comment) how he was able to run the tracepoint tests with a non-root user on his system. This is important developer documentation. I’ll try to reproduce this on my system and will either add it to the docs, write a blog post about it, or both.
  16. 0xB10C referenced this in commit b0e8a6a9d1 on Jul 2, 2022
  17. 0xB10C referenced this in commit 4c88ee2dcc on Jul 2, 2022
  18. 0xB10C referenced this in commit a221a5b957 on Jul 2, 2022
  19. MarcoFalke referenced this in commit 1e68ce3f8b on Jul 7, 2022
  20. MarcoFalke referenced this in commit 023e12abf5 on Jul 7, 2022
  21. MarcoFalke referenced this in commit 3bc412f752 on Jul 7, 2022
  22. MarcoFalke referenced this in commit b7cb2d2666 on Jul 7, 2022
  23. MarcoFalke referenced this in commit 4d8adf3208 on Jul 8, 2022
  24. MarcoFalke referenced this in commit 18ef1e41ed on Jul 8, 2022
  25. 0xB10C referenced this in commit 24a5b1f304 on Jul 8, 2022
  26. 0xB10C referenced this in commit d97474f04c on Jul 8, 2022
  27. 0xB10C referenced this in commit cc0b8bb6f4 on Jul 8, 2022
  28. 0xB10C referenced this in commit cc7335edc8 on Jul 8, 2022
  29. 0xB10C referenced this in commit 034bad9977 on Jul 29, 2022
  30. MarcoFalke referenced this in commit eeb5a94e27 on Aug 1, 2022
  31. stickies-v referenced this in commit 625964018a on Aug 2, 2022
  32. Rspigler referenced this in commit ed517d0a4b on Aug 21, 2022
  33. janus referenced this in commit 21a26ceed4 on Jan 20, 2023
  34. DrahtBot locked this on Apr 6, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC
