test: USDT tracepoint interface tests

0xB10C commented at 4:02 pm on February 16, 2022: member

This adds functional tests for the USDT tracepoints added in #22006 and #22902. This partially fixes #23296. The tests are probably skipped on most systems as these tests require:

- a Linux system with a kernel that supports BPF (and available kernel headers)
- Bitcoin Core compiled with USDT tracepoint support (the default when compiled with depends)
- bcc installed
- running the tests as a privileged user that is able to, e.g., make BPF syscalls and load BPF maps

The tests are not yet run in our CI as the CirrusCI containers lack the required permissions (see #23296 (comment)). Running the tests in a VM in the CI could work, but I haven’t experimented with this yet. The priority was to get the actual tests done first to ensure the tracepoints work as intended for the v23.0 release. Running the tracepoint tests in the CI is planned as the next step to finish #23296.

The tests can, however, be run by hand against, e.g., release candidates. Additionally, they provide a starting point for tests of future tracepoints. PRs adding new tracepoints should include tests, which makes reviewing these PRs easier.

The tests require privileges to execute BPF syscalls (CAP_SYS_ADMIN before Linux kernel 5.8, and CAP_BPF and CAP_PERFMON on 5.8+) and permissions to /sys/kernel/debug/tracing/. It’s currently recommended to run the tests in a virtual machine (or on a VPS) where it’s sensible to use the root user to gain these privileges. Never run Python scripts with root permissions that you haven’t carefully reviewed! It’s unclear if a non-root user can even gain the required privileges. This needs more experimenting.

The goal here is to test the tracepoint interface to make sure the documented interface does not break by accident. The tracepoints expose implementation details, so these functional tests also need to rely on implementation details of Bitcoin Core to trigger the tracepoints. An example is the test of the utxocache:flush tracepoint: on Bitcoin Core shutdown, the UTXO cache is flushed twice. The corresponding tracepoint test expects two flushes, too; if not, the test fails. Changing implementation details could therefore cause these tests to fail and the tracepoint API to break. However, we purposefully treat the tracepoints only as semi-stable. The tracepoints should not block refactors or changes to other internals.
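To make that dependency on implementation details concrete, here is a minimal sketch of how a test could collect utxocache:flush events and fail when their number differs from the two flushes expected on shutdown. It does not use bcc: the handler only mirrors the `(cpu, data, size)` callback signature of bcc perf buffers, and `make_event_handler`, `check_flush_count` and `EXPECTED_SHUTDOWN_FLUSHES` are hypothetical names, not the code in this PR.

```python
from typing import Any, Callable, List

# Implementation detail described above: bitcoind flushes the UTXO cache
# twice on shutdown. If that changes, this expectation must change too.
EXPECTED_SHUTDOWN_FLUSHES = 2


def make_event_handler(
    decode: Callable[[Any], Any], sink: List[Any]
) -> Callable[[int, Any, int], None]:
    """Return a callback with a bcc-style perf-buffer signature (cpu, data, size).

    `decode` turns the raw event data into an event object; every decoded
    event is appended to `sink`.
    """
    def handle(_cpu: int, data: Any, _size: int) -> None:
        sink.append(decode(data))
    return handle


def check_flush_count(events: List[Any], expected: int = EXPECTED_SHUTDOWN_FLUSHES) -> None:
    """Fail loudly if the number of observed flush events is not as expected."""
    if len(events) != expected:
        raise AssertionError(
            f"expected {expected} utxocache:flush events, got {len(events)}"
        )
```

With bcc, `decode` would typically be the `event` method of the BPF program's perf output table, with the handler registered via `open_perf_buffer()` and driven by `perf_buffer_poll()`; the table name depends on the BPF program and is left out here.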

DrahtBot added the label Build system on Feb 16, 2022

laanwj commented at 7:57 am on February 18, 2022: member

Nice! Concept ACK.

arnabsen1729 commented at 3:31 pm on February 18, 2022: contributor

I tested it on my machine:

- First I ran the tests on master. Only interface_usdt_utxocache.py failed. After I pulled the commits from follow-up #23907, that test passed as well.
- The tests for net and validation passed in both scenarios.
- Running the tests without root privileges showed the message "no permissions to use BPF" and the tests were skipped.

Tested with bpftrace v0.12.0

```
$ bpftrace --version
bpftrace v0.12.0
```

test: checks for tracepoint tests

For testing the USDT tracepoint API in the functional tests we
require:
 - that we are on a Linux system*
 - that Bitcoin Core is compiled with tracepoints
 - that bcc and the Python bcc module [0] are installed
 - that we run the tests with the required permissions**
otherwise we skip the tests.

*:  We currently only support tracepoints on Linux. Tracepoints are
    not compiled on other platforms.
**: Currently, we check for root permissions via getuid == 0. It's
    unclear if it's even possible to run the tests as a non-root user
    with e.g. CAP_BPF, CAP_PERFMON, and access to /sys/kernel/debug/
    tracing/. Anyone running these tests as root should carefully
    review them first and then run them in a disposable VM.

[0]: https://github.com/iovisor/bcc/blob/master/INSTALL.md

c934087b62
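The skip logic described in this commit message could look roughly like the following sketch. It is an assumption, not the code from the commit: `usdt_skip_reasons` is a hypothetical helper, the `usdt_compiled` flag stands in for however the build reports tracepoint support, and the bcc check uses `importlib.util.find_spec()` instead of attempting an import. Like the commit, it treats UID 0 as a coarse permission check (here the effective UID, as in the `skip_if_no_bpf_permissions` hunk further down).

```python
import importlib.util
import os
import platform
from typing import List


def usdt_skip_reasons(usdt_compiled: bool) -> List[str]:
    """Return why the USDT tests would be skipped; an empty list means run them.

    `usdt_compiled` is a hypothetical stand-in for however the build
    reports that bitcoind was compiled with tracepoints.
    """
    reasons = []
    # Tracepoints are only compiled on Linux.
    if platform.system() != "Linux":
        reasons.append("not on a Linux system")
    if not usdt_compiled:
        reasons.append("bitcoind not compiled with USDT tracepoints")
    # find_spec() returns None if the Python bcc module cannot be found.
    if importlib.util.find_spec("bcc") is None:
        reasons.append("bcc python module not available")
    # Coarse permission check: effective UID 0 (os.geteuid only exists on Unix).
    if not hasattr(os, "geteuid") or os.geteuid() != 0:
        reasons.append("no permissions to use BPF")
    return reasons
```

Collecting all reasons instead of stopping at the first one makes the skip message more useful when several prerequisites are missing at once.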

test: net:in/out_message tracepoint tests

This adds tests for the net:inbound_message and net:outbound_message
tracepoint interface.

34b27bac68

test: utxocache:* tracepoint tests

This adds tests for the
- utxocache:flush
- utxocache:uncache
- utxocache:add
- utxocache:spent
tracepoint interfaces.

260e28ece8

test: validation:block_connected tracepoint test

This adds a test for the validation:block_connected tracepoint.

76c60d7b31

0xB10C force-pushed on Feb 20, 2022

0xB10C commented at 2:01 pm on February 20, 2022: member

#23907 is merged. Rebased and ready for review.

0xB10C marked this as ready for review on Feb 20, 2022

fanquake requested review from jb55 on Feb 20, 2022

fanquake requested review from laanwj on Feb 20, 2022

theStack commented at 1:59 pm on February 25, 2022: member

Concept ACK

jb55 commented at 6:32 pm on February 25, 2022: member

Tests pass on my end, but oddly enough, when I run bpftrace contrib/tracing/log_utxos.bt I get a segfault. Is anyone else seeing this?

```
$ bpftrace --version
bpftrace v0.14.1
```

```
Thread 1 "bpftrace" received signal SIGSEGV, Segmentation fault.
0x00007fffefc816e4 in llvm::PointerType::get(llvm::Type*, unsigned int) () from /nix/store/sfydnaab0wxn2qm3pkaab5x1fcagzxpf-llvm-11.1.0-lib/lib/libLLVM-11.so
(gdb) bt
#0  0x00007fffefc816e4 in llvm::PointerType::get(llvm::Type*, unsigned int) ()
   from /nix/store/sfydnaab0wxn2qm3pkaab5x1fcagzxpf-llvm-11.1.0-lib/lib/libLLVM-11.so
#1  0x000000000055fc37 in bpftrace::ast::CodegenLLVM::createFormatStringCall(bpftrace::ast::Call&, int&, std::vector<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<bpftrace::Field, std::allocator<bpftrace::Field> > >, std::allocator<std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<bpftrace::Field, std::allocator<bpftrace::Field> > > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bpftrace::AsyncAction) ()
#2  0x0000000000569c87 in bpftrace::ast::CodegenLLVM::visit(bpftrace::ast::Call&) ()
#3  0x0000000000558dd6 in bpftrace::ast::CodegenLLVM::accept(bpftrace::ast::Node*) ()
#4  0x0000000000558ee7 in bpftrace::ast::CodegenLLVM::visit(bpftrace::ast::ExprStatement&) ()
#5  0x0000000000558dd6 in bpftrace::ast::CodegenLLVM::accept(bpftrace::ast::Node*) ()
#6  0x000000000056edce in bpftrace::ast::CodegenLLVM::generateProbe(bpftrace::ast::Probe&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::FunctionType*, bool, std::optional<int>, bool) ()
#7  0x000000000056fbaf in bpftrace::ast::CodegenLLVM::visit(bpftrace::ast::Probe&) ()
#8  0x0000000000558dd6 in bpftrace::ast::CodegenLLVM::accept(bpftrace::ast::Node*) ()
#9  0x0000000000558f6e in bpftrace::ast::CodegenLLVM::visit(bpftrace::ast::Program&) ()
#10 0x0000000000558dd6 in bpftrace::ast::CodegenLLVM::accept(bpftrace::ast::Node*) ()
#11 0x0000000000558feb in bpftrace::ast::CodegenLLVM::generate_ir() ()
#12 0x000000000043d622 in main ()
```

0xB10C commented at 6:39 pm on February 25, 2022: member

@jb55 which version of bpftrace are you running? I’ve got the segfault on 0.13 and 0.14 too, it works fine with 0.12. The issue seems to be related to the printf in the unroll followed by another printf.

This is kinda off topic here, but there have been quite a few problems with bpftrace on different versions. People reported on IRC that the example scripts don’t work with their bpftrace version. Have been thinking about dropping the examples until bpftrace is more stable.

jb55 commented at 6:48 pm on February 25, 2022: member

tACK 76c60d7b31ccc50b226cdbc5e38be0bd67603408

laanwj commented at 1:35 pm on March 30, 2022: member

> It’s unclear if a non-root user can even gain the required privileges. This needs more experimenting.

I managed to get the required privileges as a user, and get the tests to run (don’t forget to patch test/functional/test_framework/test_framework.py to remove the euid check):

```
# chmod 755 /sys/kernel/debug # might want to assign a special group or such
# chmod 755 /sys/kernel/debug/tracing
# chmod 666 /sys/kernel/debug/tracing/uprobe_events
# echo "3" > /proc/sys/kernel/perf_event_paranoid # 3 or lower will do, 4 won't
# setpriv --ambient-caps +cap_38,+cap_39 --inh-caps +cap_38,+cap_39 --init-groups --reuid=1000 --regid=1000 bash
$ getpcaps $$
310068: cap_perfmon,cap_bpf=eip
$ cd …/bitcoin
$ test/functional/interface_usdt_net.py
(passes)
$ test/functional/interface_usdt_validation.py
(passes)
$ test/functional/interface_usdt_utxocache.py
(fails on some probably unrelated assertion)
```
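Parts of the manual setup above can be verified before running the tests. This sketch checks two of its preconditions: that `perf_event_paranoid` is 3 or lower and that the current user can write to `uprobe_events`. It does not check capabilities, the helper names are hypothetical, and the threshold of 3 is taken from the comment in the session above rather than from kernel documentation.

```python
import os
from typing import List

# Paths taken from the shell session above.
PERF_EVENT_PARANOID = "/proc/sys/kernel/perf_event_paranoid"
UPROBE_EVENTS = "/sys/kernel/debug/tracing/uprobe_events"

# Per the comment above: "3 or lower will do, 4 won't".
MAX_PARANOID_LEVEL = 3


def paranoid_level_ok(level: int) -> bool:
    return level <= MAX_PARANOID_LEVEL


def read_paranoid_level(path: str = PERF_EVENT_PARANOID) -> int:
    with open(path) as f:
        return int(f.read().strip())


def non_root_setup_problems() -> List[str]:
    """Return a list of unmet preconditions; an empty list means both checks passed."""
    problems = []
    try:
        level = read_paranoid_level()
    except (OSError, ValueError):
        problems.append(f"cannot read {PERF_EVENT_PARANOID}")
    else:
        if not paranoid_level_ok(level):
            problems.append(
                f"perf_event_paranoid is {level}, needs <= {MAX_PARANOID_LEVEL}"
            )
    # os.access() returns False for missing files as well as for missing permissions.
    if not os.access(UPROBE_EVENTS, os.W_OK):
        problems.append(f"{UPROBE_EVENTS} is not writable")
    return problems
```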

It’s apparently also possible to assign fixed capabilities to users through /etc/security/capability.conf in some environments. I haven’t tried this yet.

BTW: test_runner.py fails on all the new tests for me. Although each test itself passes, warnings are printed to stderr, which I think causes test_runner.py to mark the test as failed:

```
stderr:
In file included from <built-in>:2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:2:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:41:9: warning: '__HAVE_BUILTIN_BSWAP32__' macro redefined [-Wmacro-redefined]
#define __HAVE_BUILTIN_BSWAP32__
        ^
<command line>:4:9: note: previous definition is here
#define __HAVE_BUILTIN_BSWAP32__ 1
        ^
…
```

laanwj commented at 2:59 pm on March 30, 2022: member

Tested ACK 76c60d7b31ccc50b226cdbc5e38be0bd67603408

Will open a PR to fix the RPC assertion in interface_usdt_utxocache.py. It’s unrelated.

laanwj referenced this in commit 71038a151e on Mar 30, 2022

in test/functional/test_framework/test_framework.py:841 in 76c60d7b31

836+
837+    def skip_if_no_bpf_permissions(self):
838+        """Skip the running test if we don't have permissions to do BPF syscalls and load BPF maps."""
839+        # check for 'root' permissions
840+        if os.geteuid() != 0:
841+            raise SkipTest("no permissions to use BPF (please review the tests carefully before running them with higher privileges)")

laanwj commented at 3:19 pm on March 30, 2022:

As an alternative to, or in addition to, the os.geteuid() != 0 check, for non-root use we’ll probably want to check CapEff in /proc/<pid>/status for (1 << CAP_PERFMON) | (1 << CAP_BPF).

```python
CAP_PERFMON = 38
CAP_BPF = 39
```

Not necessarily in this PR though. I think it’s fine to merge as-is and do that later.
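A sketch of the suggested capability check, assuming the `CapEff:` line in `/proc/<pid>/status` holds the effective capability set as a hexadecimal bitmask, as it does on Linux. The function names are hypothetical. It only covers the 5.8+ capabilities named above; kernels before 5.8 need CAP_SYS_ADMIN instead, which this sketch does not check.

```python
import os

# Capability numbers from the comment above.
CAP_PERFMON = 38
CAP_BPF = 39
REQUIRED_CAPS = (1 << CAP_PERFMON) | (1 << CAP_BPF)


def parse_cap_eff(status_text: str) -> int:
    """Extract the CapEff bitmask (hex) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line in status text")


def has_bpf_caps(cap_eff: int) -> bool:
    # Both CAP_PERFMON and CAP_BPF must be set in the effective set.
    return (cap_eff & REQUIRED_CAPS) == REQUIRED_CAPS


def can_use_bpf(status_path: str = "/proc/self/status") -> bool:
    # Keep the existing root check, then fall back to the capability bits.
    if os.geteuid() == 0:
        return True
    with open(status_path) as f:
        return has_bpf_caps(parse_cap_eff(f.read()))
```

For the setpriv session above, where getpcaps reports cap_perfmon,cap_bpf=eip, CapEff would be 0x000000c000000000, i.e. exactly bits 38 and 39.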

jamesob commented at 3:42 pm on March 30, 2022: member

Concept ACK. Nice functional demonstration of USDT usage, too. I’ll test this soon.

MarcoFalke referenced this in commit 1a54c060b3 on Mar 31, 2022

laanwj merged this on Apr 6, 2022

laanwj closed this on Apr 6, 2022

laanwj commented at 11:08 am on April 6, 2022: member

> Concept ACK. Nice functional demonstration of USDT usage, too. I’ll test this soon.

@jamesob I’m going ahead and merging this as I think it’s ready. I hope you’ll still get around to testing it, though.

0xB10C deleted the branch on Apr 6, 2022

sidhujag referenced this in commit 5fe3dc462f on Apr 6, 2022

luke-jr referenced this in commit ff364b8295 on May 21, 2022

MarcoFalke referenced this in commit eeb5a94e27 on Aug 1, 2022

Fabcien referenced this in commit 7415dd1f91 on Dec 5, 2022

DrahtBot locked this on Apr 6, 2023

test: USDT tracepoint interface tests #24358