Extended Berkeley Packet Filter (eBPF) is an in-kernel virtual machine that
runs user-supplied eBPF programs to extend kernel functionality. These programs
can be hooked to probes or events in the kernel and used to collect kernel
statistics, monitor the system, and debug issues. A program is loaded into the
kernel using the bpf(2) syscall and is provided by the user as a binary blob of
eBPF machine instructions. The Android build system supports compiling C
programs to eBPF using the simple build file syntax described in this document.
More information about eBPF internals and architecture can be found at Brendan Gregg's eBPF page.
Android includes an eBPF loader and library that loads eBPF programs at boot time.
Android BPF loader
During Android boot, all eBPF programs located at /system/etc/bpf/ are
loaded. These programs are binary objects built by the Android build system
from C programs and are accompanied by Android.bp files in the Android source
tree. The build system stores the generated objects at /system/etc/bpf, and
those objects become part of the system image.
Format of an Android eBPF C program
An eBPF C program must have the following format:
#include <bpf_helpers.h>
/* Define one or more maps in the maps section, for example
* define a map of type array int -> uint32_t, with 10 entries
*/
DEFINE_BPF_MAP(name_of_my_map, ARRAY, int, uint32_t, 10);
/* this also defines type-safe accessors:
* value * bpf_name_of_my_map_lookup_elem(&key);
* int bpf_name_of_my_map_update_elem(&key, &value, flags);
* int bpf_name_of_my_map_delete_elem(&key);
* as such it is heavily suggested to use lowercase *_map names.
* Also note that due to compiler deficiencies you cannot use a type
* of 'struct foo' but must instead use just 'foo'. As such structs
* must not be defined as 'struct foo {}' and must instead be
* 'typedef struct {} foo'.
*/
DEFINE_BPF_PROG("PROGTYPE/PROGNAME", AID_*, AID_*, PROGFUNC)(..args..) {
<body-of-code
... read or write to name_of_my_map
... do other things
>
}
LICENSE("GPL"); // or other license
Where:
- name_of_my_map is the name of your map variable. This name informs the BPF loader of the type of map to create and with what parameters. This struct definition is provided by the included bpf_helpers.h header.
- PROGTYPE/PROGNAME represents the type of the program and program name. The type of the program can be any of those listed in the following table. When a type of program isn't listed, there is no strict naming convention for the program; the name just needs to be known to the process that attaches the program.
- PROGFUNC is a function that, when compiled, is placed in a section of the resulting file.
Program type | Description |
---|---|
kprobe | Hooks PROGFUNC onto a kernel instruction using the kprobe infrastructure. PROGNAME must be the name of the kernel function being kprobed. Refer to the kprobe kernel documentation for more information about kprobes. |
tracepoint | Hooks PROGFUNC onto a tracepoint. PROGNAME must be of the format SUBSYSTEM/EVENT. For example, a tracepoint section for attaching functions to scheduler context switch events would be SEC("tracepoint/sched/sched_switch"), where sched is the name of the trace subsystem, and sched_switch is the name of the trace event. Check the trace events kernel documentation for more information about tracepoints. |
skfilter | Program functions as a networking socket filter. |
schedcls | Program functions as a networking traffic classifier. |
cgroupskb, cgroupsock | Program runs whenever processes in a CGroup create an AF_INET or AF_INET6 socket. |
Additional types can be found in the Loader source code.
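As a minimal sketch of the PROGTYPE/PROGNAME convention described above, the following C helper splits a section name at its first slash. The function split_section is hypothetical and is not taken from the Android loader; it only illustrates how a name such as tracepoint/sched/sched_switch decomposes into a type and a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Android loader): splits a section
 * name such as "tracepoint/sched/sched_switch" at the first '/'.
 * On success, copies the program type into `type`, points `*name` at the
 * remainder inside `section`, and returns 0. Returns -1 if there is no '/',
 * if either part is empty, or if the type does not fit in `type`. */
static int split_section(const char *section, char *type, size_t type_len,
                         const char **name)
{
    assert(section != NULL && type != NULL && name != NULL);

    const char *slash = strchr(section, '/');
    if (slash == NULL || slash == section || slash[1] == '\0')
        return -1;

    size_t n = (size_t)(slash - section);
    if (n >= type_len)
        return -1;

    memcpy(type, section, n);
    type[n] = '\0';
    *name = slash + 1;
    return 0;
}
```

With the tracepoint example from the table, the type comes out as tracepoint and the name as sched/sched_switch, which matches the SUBSYSTEM/EVENT format that tracepoint programs require.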
For example, the following myschedtp.c program records the PID of the latest
task that has run on a particular CPU. This program achieves its goal by
creating a map and defining a tp_sched_switch function that can be attached to
the sched:sched_switch trace event. For more information, see
Attach programs to tracepoints.
#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>
#include <bpf_helpers.h>

DEFINE_BPF_MAP(cpu_pid_map, ARRAY, int, uint32_t, 1024);

struct switch_args {
    unsigned long long ignore;
    char prev_comm[16];
    int prev_pid;
    int prev_prio;
    long long prev_state;
    char next_comm[16];
    int next_pid;
    int next_prio;
};

DEFINE_BPF_PROG("tracepoint/sched/sched_switch", AID_ROOT, AID_SYSTEM, tp_sched_switch)
(struct switch_args *args) {
    int key;
    uint32_t val;

    key = bpf_get_smp_processor_id();
    val = args->next_pid;

    bpf_cpu_pid_map_update_elem(&key, &val, BPF_ANY);
    return 1; // return 1 to avoid blocking simpleperf from receiving events
}

LICENSE("GPL");
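To make the map access in this program easier to reason about, here is a small user-space model of an ARRAY map keyed by int with uint32_t values. It is only a sketch: the model_ names are hypothetical, nothing here runs in the kernel, and the return codes reflect my understanding of how the kernel's array map treats out-of-range keys and the BPF_NOEXIST flag, which should be checked against the kernel sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES 1024u

/* Flag values mirror BPF_ANY, BPF_NOEXIST and BPF_EXIST from <linux/bpf.h>. */
enum { MODEL_BPF_ANY = 0, MODEL_BPF_NOEXIST = 1, MODEL_BPF_EXIST = 2 };

/* Every slot of an array map exists from creation and starts zeroed. */
static uint32_t model_cpu_pid_map[MODEL_MAX_ENTRIES];

/* Returns a pointer to the slot for *key, or NULL if the key is out of range. */
static uint32_t *model_lookup_elem(const int *key)
{
    assert(key != NULL);
    if ((uint32_t)*key >= MODEL_MAX_ENTRIES)
        return NULL;
    return &model_cpu_pid_map[*key];
}

/* Stores *value at *key. Assumed behavior: unknown flags give -EINVAL,
 * an out-of-range key gives -E2BIG, and BPF_NOEXIST gives -EEXIST
 * because every array slot already exists. */
static int model_update_elem(const int *key, const uint32_t *value,
                             uint64_t flags)
{
    assert(key != NULL && value != NULL);
    if (flags > MODEL_BPF_EXIST)
        return -EINVAL;
    if ((uint32_t)*key >= MODEL_MAX_ENTRIES)
        return -E2BIG;
    if (flags == MODEL_BPF_NOEXIST)
        return -EEXIST;
    model_cpu_pid_map[*key] = *value;
    return 0;
}
```

In the tracepoint program the key is the CPU number, so with 1024 entries the update succeeds on any CPU numbered below 1024 and each slot always holds the most recent next_pid seen on that CPU.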
The LICENSE macro is used to verify if the program is compatible with the
kernel's license when the program makes use of BPF helper functions provided by
the kernel. Specify the name of your program's license in string form, such as
LICENSE("GPL") or LICENSE("Apache 2.0").
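The compatibility check can be pictured as a string comparison against a fixed list of GPL-compatible license names. The sketch below follows what I recall of the kernel's license_is_gpl_compatible() in include/linux/license.h; treat the exact list as an assumption and confirm it against the kernel version you target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a GPL-compatibility check. The list of strings is an
 * assumption based on the kernel's include/linux/license.h and may
 * differ between kernel versions. */
static bool license_is_gpl_compatible(const char *license)
{
    static const char *const compatible[] = {
        "GPL",
        "GPL v2",
        "GPL and additional rights",
        "Dual BSD/GPL",
        "Dual MIT/GPL",
        "Dual MPL/GPL",
    };

    assert(license != NULL);
    for (size_t i = 0; i < sizeof compatible / sizeof compatible[0]; i++) {
        if (strcmp(license, compatible[i]) == 0)
            return true;
    }
    return false;
}
```

Under this sketch, LICENSE("GPL") passes the check while LICENSE("Apache 2.0") does not, so a program with the latter license would be limited to helper functions that the kernel does not restrict to GPL-compatible programs.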
Format of the Android.bp file
For the Android build system to build an eBPF .c program, you must
create an entry in the Android.bp file of the project. For example, to
build an eBPF C program named bpf_test.c, make the following
entry in your project's Android.bp file:
bpf {
    name: "bpf_test.o",
    srcs: ["bpf_test.c"],
    cflags: [
        "-Wall",
        "-Werror",
    ],
}
This entry compiles the C program into the object
/system/etc/bpf/bpf_test.o. On boot, the Android system automatically loads
the bpf_test.o program into the kernel.
Files available in sysfs
During boot, the Android system automatically loads all the eBPF objects from
/system/etc/bpf/, creates the maps that the program needs, and pins the loaded
program with its maps to the BPF file system. These files can then be used for
further interaction with the eBPF program or reading maps. This section
describes the conventions used for naming these files and their locations in
sysfs.
The following files are created and pinned:
- For any programs loaded, assuming PROGNAME is the name of the program and FILENAME is the name of the eBPF C file, the Android loader creates and pins each program at /sys/fs/bpf/prog_FILENAME_PROGTYPE_PROGNAME.
  For example, for the previous sched_switch tracepoint example in myschedtp.c, a program file is created and pinned to /sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch.
- For any maps created, assuming MAPNAME is the name of the map and FILENAME is the name of the eBPF C file, the Android loader creates and pins each map to /sys/fs/bpf/map_FILENAME_MAPNAME.
  For example, for the previous sched_switch tracepoint example in myschedtp.c, a map file is created and pinned to /sys/fs/bpf/map_myschedtp_cpu_pid_map.
- bpf_obj_get() in the Android BPF library returns a file descriptor from the pinned /sys/fs/bpf file. This file descriptor can be used for further operations, such as reading maps or attaching a program to a tracepoint.
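The naming rules above can be sketched as two small C helpers that build the pinned paths. The functions prog_pin_path and map_pin_path are hypothetical and are not part of the Android loader or BPF library. The sketch assumes that FILENAME is the source file name without its .c extension and that slashes in PROGTYPE/PROGNAME become underscores, as the sched_switch example suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BPF_PIN_DIR "/sys/fs/bpf/"

/* Hypothetical helper: writes the pinned program path for `filename`
 * (no ".c" extension) and `section` ("PROGTYPE/PROGNAME") into `out`.
 * Returns 0 on success, -1 if the result does not fit in `len` bytes. */
static int prog_pin_path(char *out, size_t len, const char *filename,
                         const char *section)
{
    assert(out != NULL && filename != NULL && section != NULL);

    int n = snprintf(out, len, BPF_PIN_DIR "prog_%s_%s", filename, section);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* Replace '/' only after the directory prefix, so the section name
     * "tracepoint/sched/sched_switch" becomes "tracepoint_sched_sched_switch". */
    for (char *p = out + strlen(BPF_PIN_DIR); *p != '\0'; p++) {
        if (*p == '/')
            *p = '_';
    }
    return 0;
}

/* Hypothetical helper: writes the pinned map path for `filename` and
 * `mapname` into `out`. Returns 0 on success, -1 on truncation. */
static int map_pin_path(char *out, size_t len, const char *filename,
                        const char *mapname)
{
    assert(out != NULL && filename != NULL && mapname != NULL);

    int n = snprintf(out, len, BPF_PIN_DIR "map_%s_%s", filename, mapname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For myschedtp.c, prog_pin_path with the section tracepoint/sched/sched_switch yields /sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch, and map_pin_path with cpu_pid_map yields /sys/fs/bpf/map_myschedtp_cpu_pid_map, matching the two examples in the list.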
Android BPF library
The Android BPF library is named libbpf_android.so and is part of the system
image. This library provides the user with the low-level eBPF capabilities
needed for creating and reading maps and for creating probes, tracepoints, and
perf buffers.
Attach programs to tracepoints
Tracepoint programs are loaded automatically at boot. After loading, the tracepoint program must be activated using these steps:
- Call bpf_obj_get() to obtain the program fd from the pinned file's location. For more information, refer to Files available in sysfs.
- Call bpf_attach_tracepoint() in the BPF library, passing it the program fd and the tracepoint name.
The following code sample shows how to attach the sched_switch tracepoint
defined in the previous myschedtp.c source file (error checking isn't shown):
const char *tp_prog_path = "/sys/fs/bpf/prog_myschedtp_tracepoint_sched_sched_switch";
const char *tp_map_path = "/sys/fs/bpf/map_myschedtp_cpu_pid_map";

// Attach tracepoint and wait for 4 seconds
int mProgFd = bpf_obj_get(tp_prog_path);
int mMapFd = bpf_obj_get(tp_map_path);
int ret = bpf_attach_tracepoint(mProgFd, "sched", "sched_switch");
sleep(4);

// Read the map to find the last PID that ran on CPU 0
android::bpf::BpfMap<int, int> myMap(mMapFd);
printf("last PID running on CPU %d is %d\n", 0, myMap.readValue(0));
Read from the maps
BPF maps support arbitrarily complex key and value structures or types. The
Android BPF library includes an android::bpf::BpfMap class that uses C++
templates to instantiate BpfMap based on the key and value type for the
map in question. The previous code sample demonstrates using a BpfMap with
integer keys and values. The key and value can also be arbitrary structures.
Thus the templatized BpfMap class lets you define a custom BpfMap
object suitable for the particular map. The map can then be accessed using the
custom-generated functions, which are type aware, resulting in cleaner code.
For more information about BpfMap, refer to the Android sources.
Debug issues
During boot time, several messages related to BPF loading are logged. If the
loading process fails for any reason, a detailed log message is provided
in logcat. Filtering the logcat logs by bpf prints all the messages and
any detailed errors during load time, such as eBPF verifier errors.
Examples of eBPF in Android
The following programs in AOSP provide additional examples of using eBPF:
- The netd eBPF C program is used by the networking daemon (netd) in Android for various purposes such as socket filtering and statistics gathering. To see how this program is used, check the eBPF traffic monitor sources.
- The time_in_state eBPF C program calculates the amount of time an Android app spends at different CPU frequencies, which is used to calculate power.
- In Android 12, the gpu_mem eBPF C program tracks total GPU memory usage for each process and for the entire system. This program is used for GPU memory profiling.