Add experimental syscall sandboxing using seccomp-bpf (Linux secure computing mode).
Enable filtering of system calls using seccomp-bpf: allow only explicitly allowlisted (expected) syscalls to be called.
The syscall sandboxing implemented in this PR is an experimental feature currently available only under Linux x86-64.
To enable the experimental syscall sandbox, the -sandbox=<mode> option must be passed to bitcoind:
  -sandbox=<mode>
       Use the experimental syscall sandbox in the specified mode
       (-sandbox=log-and-abort or -sandbox=abort). Allow only expected
       syscalls to be used by bitcoind. Note that this is an
       experimental new feature that may cause bitcoind to exit or crash
       unexpectedly: use with caution. In the "log-and-abort" mode the
       invocation of an unexpected syscall results in a debug handler
       being invoked which will log the incident and terminate the
       program (without executing the unexpected syscall). In the
       "abort" mode the invocation of an unexpected syscall results in
       the entire process being killed immediately by the kernel without
       executing the unexpected syscall.
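The two modes described in the help text map naturally onto two seccomp return actions. The sketch below shows one plausible mapping and is not taken from the PR's code: the helper name DefaultActionForMode is hypothetical, while SECCOMP_RET_TRAP and SECCOMP_RET_KILL_PROCESS are the kernel constants from <linux/seccomp.h>.

```cpp
#include <linux/seccomp.h> // SECCOMP_RET_TRAP, SECCOMP_RET_KILL_PROCESS
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not from the PR): translate the -sandbox=<mode> value
// into the seccomp action applied to syscalls that are not allowlisted.
std::optional<uint32_t> DefaultActionForMode(std::string_view mode)
{
    if (mode == "log-and-abort") {
        // The kernel delivers SIGSYS instead of executing the syscall, so a
        // signal handler in the program can log the incident and terminate.
        return SECCOMP_RET_TRAP;
    }
    if (mode == "abort") {
        // The kernel kills the entire process without executing the syscall.
        return SECCOMP_RET_KILL_PROCESS;
    }
    return std::nullopt; // Unknown mode: the caller should reject the option.
}
```

With SECCOMP_RET_TRAP the program stays in control long enough to report which syscall was attempted, which is why it suits a debugging mode. SECCOMP_RET_KILL_PROCESS leaves no user-space code path between the violation and termination.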
The allowed syscalls are defined on a per-thread basis.
I’ve used this feature since summer 2020 and I find it to be a helpful testing/debugging addition which makes it much easier to reason about the actual capabilities required of each type of thread in Bitcoin Core.
Quick start guide:
    $ ./configure
    $ src/bitcoind -regtest -debug=util -sandbox=log-and-abort
    …
    2021-06-09T12:34:56Z Experimental syscall sandbox enabled (-sandbox=log-and-abort): bitcoind will terminate if an unexpected (not allowlisted) syscall is invoked.
    …
    2021-06-09T12:34:56Z Syscall filter installed for thread "addcon"
    2021-06-09T12:34:56Z Syscall filter installed for thread "dnsseed"
    2021-06-09T12:34:56Z Syscall filter installed for thread "net"
    2021-06-09T12:34:56Z Syscall filter installed for thread "msghand"
    2021-06-09T12:34:56Z Syscall filter installed for thread "opencon"
    2021-06-09T12:34:56Z Syscall filter installed for thread "init"
    …
    # A simulated execve call to show the sandbox in action:
    2021-06-09T12:34:56Z ERROR: The syscall "execve" (syscall number 59) is not allowed by the syscall sandbox in thread "msghand". Please report.
    …
    Aborted (core dumped)
    $
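The log above shows one filter being installed per thread. As a rough mental model only, the per-thread policies can be pictured as a table from thread name to allowlist. In this sketch the thread names come from the log, but the syscall lists, the type alias, and both function names are invented for illustration and do not reflect the PR's actual policies.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model of per-thread allowlists. The syscall sets below are
// illustrative assumptions, not the real policies used by bitcoind.
using SyscallAllowlist = std::set<std::string>;

const std::map<std::string, SyscallAllowlist>& ThreadPolicies()
{
    static const std::map<std::string, SyscallAllowlist> policies{
        {"net", {"read", "write", "poll", "sendto", "recvfrom", "accept"}},
        {"msghand", {"read", "write", "futex"}},
    };
    return policies;
}

// Returns true only if the named thread has a policy that lists the syscall.
bool IsSyscallAllowed(const std::string& thread_name, const std::string& syscall_name)
{
    const auto it = ThreadPolicies().find(thread_name);
    if (it == ThreadPolicies().end()) return false; // No policy: deny by default.
    return it->second.count(syscall_name) > 0;
}
```

In this model, IsSyscallAllowed("msghand", "execve") is false, matching the simulated violation in the log. In the kernel the check is done by a BPF program attached to the thread rather than by a lookup like this.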
About seccomp and seccomp-bpf:
In computer security, seccomp (short for secure computing mode) is a facility in the Linux kernel. seccomp allows a process to make a one-way transition into a “secure” state where it cannot make any system calls except exit(), sigreturn(), and read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system’s resources but isolates the process from them entirely.
[…]
seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and Linux. (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.)
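To make the "configurable policy implemented using Berkeley Packet Filter rules" concrete, here is a minimal sketch of how an allowlist can be compiled into a classic BPF program. It only builds the instruction list and does not install it. The function name BuildAllowlistFilter is hypothetical; the macros, structs, and constants are the standard ones from the Linux UAPI headers.

```cpp
#include <linux/audit.h>   // AUDIT_ARCH_X86_64
#include <linux/filter.h>  // sock_filter, BPF_STMT, BPF_JUMP
#include <linux/seccomp.h> // seccomp_data, SECCOMP_RET_*
#include <sys/syscall.h>   // __NR_* syscall numbers
#include <cstddef>         // offsetof
#include <cstdint>
#include <vector>

// Hypothetical helper: build a seccomp-bpf program that allows the listed
// syscall numbers and applies default_action to everything else.
std::vector<sock_filter> BuildAllowlistFilter(const std::vector<uint32_t>& allowed,
                                              uint32_t default_action)
{
    std::vector<sock_filter> prog;
    // Reject callers that do not use the x86-64 syscall ABI, because syscall
    // numbers differ between architectures.
    prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, arch)));
    prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0));
    prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS));
    // Load the syscall number into the accumulator.
    prog.push_back(BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(seccomp_data, nr)));
    for (const uint32_t nr : allowed) {
        // On a match fall through to ALLOW; otherwise skip over it.
        prog.push_back(BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1));
        prog.push_back(BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW));
    }
    // No allowlist entry matched: apply the default action.
    prog.push_back(BPF_STMT(BPF_RET | BPF_K, default_action));
    return prog;
}
```

To activate such a program, a thread would wrap it in a sock_fprog, call prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0), and then install it with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog) or the seccomp(2) syscall. That step is omitted here because it irreversibly restricts the calling thread.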