eBPF Program Using the libbpf Framework

Introduction

To make eBPF programs portable across different versions of the Kernel, we write them using the CO-RE approach (compile once, run everywhere). With CO-RE, an eBPF program carries information about the layout of the data structures it was compiled against, together with a mechanism for adjusting how fields (members) are accessed when that layout differs on the target system. The Linux Kernel headers can change between versions, and an eBPF program could include several individual header files. Alternatively, we can use bpftool to generate a single vmlinux.h header file from a running system; it contains all the Kernel data structure definitions that an eBPF program might need.

Several libraries implement CO-RE relocation. libbpf, the original C library, is the one we use here.

We compile eBPF programs with Clang, passing the "-g" flag so that Clang emits BTF information and the CO-RE relocations derived from it. BTF (BPF Type Format) is a format for expressing the layout of data structures and function signatures. CO-RE uses it to determine the differences between the structures used at compile time and the structures present in the Kernel at runtime. This matters because a data structure with the same name can have one layout on the Kernel version where the eBPF program was built and a different layout on the Kernel version where we intend to run it.

When a user space program loads an eBPF program into the Kernel, the CO-RE mechanism adjusts the bytecode using the relocation information compiled into the object. This compensates for differences between the data structure layouts that were present when the eBPF program was compiled and the layouts on the machine where we are running it.
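
The idea behind a field relocation can be illustrated in plain user-space C. This is only a conceptual sketch, not how libbpf is implemented: the two `task_v1`/`task_v2` layouts, the `field_relo` record, and the `field_info` table (standing in for the target Kernel's BTF) are all hypothetical names invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout seen when the program was compiled. */
struct task_v1 {
    uint32_t pid;
    uint32_t flags;
};

/* Hypothetical layout on the target: a member was added in front,
 * so 'pid' moved from offset 0 to offset 8. */
struct task_v2 {
    uint64_t start_time;
    uint32_t pid;
    uint32_t flags;
};

/* One relocation record: the field an instruction reads and the
 * offset that was valid at build time. */
struct field_relo {
    const char *field;
    size_t      build_offset;
};

/* Stand-in for the target's type information: field name -> offset. */
struct field_info {
    const char *field;
    size_t      offset;
};

/* Look the field up by name on the target and return the offset to use
 * there, or (size_t)-1 if the target has no such field. */
size_t relocate_offset(const struct field_relo *relo,
                       const struct field_info *target, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(target[i].field, relo->field) == 0)
            return target[i].offset;
    return (size_t)-1;
}

/* Read a 32-bit value at a (relocated) byte offset. */
uint32_t read_u32_at(const void *base, size_t offset)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)base + offset, sizeof(v));
    return v;
}
```

A read of `pid` that was compiled against `task_v1` (offset 0) would be patched to offset 8 before touching a `task_v2` object. The real loader does the equivalent by rewriting the offset embedded in the affected instruction.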

Cloning the Lunar Kernel

Clone the Lunar Kernel:

bash
git clone https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/lunar

Next, search the source for the NVMe Kernel driver function definitions and data structures that we need.

Analyzing the nvme_submit_user_cmd Function

Now let's see in the Kernel code what the "nvme_submit_user_cmd" function definition looks like, which arguments it takes, and what we can extract from those arguments.

We can see in the Kernel source that it takes a number of arguments and that the second argument is a pointer to "struct nvme_command":

c
grep -rnI " nvme_submit_user_cmd(" lunar/

lunar/drivers/nvme/host/ioctl.c:141

static int nvme_submit_user_cmd(struct request_queue *q,
                struct nvme_command *cmd, u64 ubuffer,
                unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
                u32 meta_seed, u64 *result, unsigned timeout, bool vec)
{

Let's check which members "struct nvme_command" has:

c
grep -rnI "struct nvme_command {" lunar/

lunar/include/linux/nvme.h:1740

struct nvme_command {
        union {
                struct nvme_common_command common;
                struct nvme_rw_command rw;
                struct nvme_identify identify;
                struct nvme_features features;
                struct nvme_create_cq create_cq;
                struct nvme_create_sq create_sq;
                struct nvme_delete_queue delete_queue;
                struct nvme_download_firmware dlfw;
                struct nvme_format_cmd format;
                struct nvme_dsm_cmd dsm;
                struct nvme_write_zeroes_cmd write_zeroes;
                struct nvme_zone_mgmt_send_cmd zms;
                struct nvme_zone_mgmt_recv_cmd zmr;
                struct nvme_abort_cmd abort;
                struct nvme_get_log_page_command get_log_page;
                struct nvmf_common_command fabrics;
                struct nvmf_connect_command connect;
                struct nvmf_property_set_command prop_set;
                struct nvmf_property_get_command prop_get;
                struct nvmf_auth_common_command auth_common;
                struct nvmf_auth_send_command auth_send;
                struct nvmf_auth_receive_command auth_receive;
                struct nvme_dbbuf dbbuf;
                struct nvme_directive_cmd directive;
        };
};

Within "struct nvme_command" we would like to access the "struct nvme_common_command common" member. Now let's check the layout of the "struct nvme_common_command" data structure in the Kernel source code:

c
grep -rnI "struct nvme_common_command {" lunar/

lunar/include/linux/nvme.h:907

struct nvme_common_command {
        __u8                    opcode;
        __u8                    flags;
        __u16                   command_id;
        __le32                  nsid;
        __le32                  cdw2[2];
        __le64                  metadata;
        union nvme_data_ptr     dptr;
        struct_group(cdws,
        __le32                  cdw10;
        __le32                  cdw11;
        __le32                  cdw12;
        __le32                  cdw13;
        __le32                  cdw14;
        __le32                  cdw15;
        );
};

"struct nvme_common_command" has the "opcode" member plus a number of other members, such as command_id, nsid (Namespace ID), and cdw10 (Command Dword 10, an NVMe command-specific field), whose values we want to trace with our eBPF program.

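To make the layout concrete, here is a small user-space sketch that mirrors the structure and checks where the traced members sit. The `common_cmd_mirror` type and `le32_decode` helper are names made up for this illustration; the data pointer union is flattened to 16 bytes, which is an assumption based on the `union nvme_data_ptr` definition shown later in nvme_trace.h. The resulting offsets (0, 2, 4, and 40) agree with the ones libbpf prints in its relocation log further down.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct nvme_common_command, for inspecting
 * offsets only. Not a replacement for the Kernel definition. */
struct common_cmd_mirror {
    uint8_t  opcode;
    uint8_t  flags;
    uint16_t command_id;
    uint32_t nsid;       /* __le32 in the Kernel */
    uint32_t cdw2[2];
    uint64_t metadata;
    uint8_t  dptr[16];   /* stands in for union nvme_data_ptr */
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

/* The members traced by the eBPF program and their byte offsets. */
_Static_assert(offsetof(struct common_cmd_mirror, opcode) == 0, "opcode");
_Static_assert(offsetof(struct common_cmd_mirror, command_id) == 2, "command_id");
_Static_assert(offsetof(struct common_cmd_mirror, nsid) == 4, "nsid");
_Static_assert(offsetof(struct common_cmd_mirror, cdw10) == 40, "cdw10");
_Static_assert(sizeof(struct common_cmd_mirror) == 64, "64-byte command");

/* Decode a little-endian 32-bit value (such as a __le32 member)
 * independently of the host byte order. */
uint32_t le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

On a little-endian host such as x86, the `__le32` members can be read directly, which is what the eBPF program in this walkthrough does. The explicit decode only matters on big-endian machines.
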
Setting Up the Development Environment

Install the dependencies, then clone the libbpf-bootstrap repository together with its submodules:

bash
sudo apt install clang libelf1 libelf-dev zlib1g-dev
bash
git clone --recurse-submodules https://github.com/libbpf/libbpf-bootstrap

This will also clone the submodule repositories:

text
https://github.com/libbpf/blazesym
https://github.com/libbpf/bpftool
https://github.com/libbpf/libbpf

Create a development directory for the eBPF program and copy the utility directories and the Makefile from libbpf-bootstrap into it:

bash
mkdir nvme_ebpf
cd nvme_ebpf
bash
cp -r ../libbpf-bootstrap/{blazesym,bpftool,libbpf} .
cp ../libbpf-bootstrap/examples/c/Makefile .

The eBPF program needs the definitions of any kernel data structures and types that it refers to. BTF-enabled tools like "bpftool" can generate an appropriate header file from the BTF information included in the kernel. This file is conventionally called vmlinux.h, and it defines all the kernel's data types:

bash
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

When you compile the source into an eBPF object file, that object will include BTF information that matches the definitions used in this header file. Later, when the program is run on a target machine, the user space program that loads it into the kernel will make adjustments to account for differences between this build-time BTF information and the BTF information for the kernel that's running on that target machine.

Modifying the Makefile

Modify the Makefile you copied from "libbpf-bootstrap/examples/c/Makefile" according to the patch below:

diff
--- ~/libbpf-bootstrap/examples/c/Makefile	2023-09-08 10:52:53.242558117 +0200
+++ Makefile	2023-09-08 11:42:08.759224020 +0200
@@ -1,12 +1,12 @@
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 OUTPUT := .output
 CLANG ?= clang
-LIBBPF_SRC := $(abspath ../../libbpf/src)
-BPFTOOL_SRC := $(abspath ../../bpftool/src)
+LIBBPF_SRC := $(abspath libbpf/src)
+BPFTOOL_SRC := $(abspath bpftool/src)
 LIBBPF_OBJ := $(abspath $(OUTPUT)/libbpf.a)
 BPFTOOL_OUTPUT ?= $(abspath $(OUTPUT)/bpftool)
 BPFTOOL ?= $(BPFTOOL_OUTPUT)/bootstrap/bpftool
-LIBBLAZESYM_SRC := $(abspath ../../blazesym/)
+LIBBLAZESYM_SRC := $(abspath blazesym/)
 LIBBLAZESYM_INC := $(abspath $(LIBBLAZESYM_SRC)/include)
 LIBBLAZESYM_OBJ := $(abspath $(OUTPUT)/libblazesym.a)
 ARCH ?= $(shell uname -m | sed 's/x86_64/x86/' \
@@ -16,15 +16,16 @@
              | sed 's/mips.*/mips/' \
              | sed 's/riscv64/riscv/' \
              | sed 's/loongarch64/loongarch/')
-VMLINUX := ../../vmlinux/$(ARCH)/vmlinux.h
+#VMLINUX := ../../vmlinux/$(ARCH)/vmlinux.h
+VMLINUX := vmlinux.h
 # Use our own libbpf API headers and Linux UAPI headers distributed with
 # libbpf to avoid dependency on system-wide headers, which could be missing or
 # outdated
-INCLUDES := -I$(OUTPUT) -I../../libbpf/include/uapi -I$(dir $(VMLINUX)) -I$(LIBBLAZESYM_INC)
+INCLUDES := -I$(OUTPUT) -Ilibbpf/include/uapi -I$(dir $(VMLINUX)) -I$(LIBBLAZESYM_INC)
 CFLAGS := -g -Wall
 ALL_LDFLAGS := $(LDFLAGS) $(EXTRA_LDFLAGS)
 
-APPS = minimal minimal_legacy bootstrap uprobe kprobe fentry usdt sockfilter tc ksyscall
+APPS = nvme_trace
 
 CARGO ?= $(shell which cargo)
 ifeq ($(strip $(CARGO)),)

The final version of the Makefile will look like this:

makefile
vi Makefile
-----------------------------------------------------------------------------------------------
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
OUTPUT := .output
CLANG ?= clang
LIBBPF_SRC := $(abspath libbpf/src)
BPFTOOL_SRC := $(abspath bpftool/src)
LIBBPF_OBJ := $(abspath $(OUTPUT)/libbpf.a)
BPFTOOL_OUTPUT ?= $(abspath $(OUTPUT)/bpftool)
BPFTOOL ?= $(BPFTOOL_OUTPUT)/bootstrap/bpftool
LIBBLAZESYM_SRC := $(abspath blazesym/)
LIBBLAZESYM_INC := $(abspath $(LIBBLAZESYM_SRC)/include)
LIBBLAZESYM_OBJ := $(abspath $(OUTPUT)/libblazesym.a)
ARCH ?= $(shell uname -m | sed 's/x86_64/x86/' \
             | sed 's/arm.*/arm/' \
             | sed 's/aarch64/arm64/' \
             | sed 's/ppc64le/powerpc/' \
             | sed 's/mips.*/mips/' \
             | sed 's/riscv64/riscv/' \
             | sed 's/loongarch64/loongarch/')
#VMLINUX := ../../vmlinux/$(ARCH)/vmlinux.h
VMLINUX := vmlinux.h
# Use our own libbpf API headers and Linux UAPI headers distributed with
# libbpf to avoid dependency on system-wide headers, which could be missing or
# outdated
INCLUDES := -I$(OUTPUT) -Ilibbpf/include/uapi -I$(dir $(VMLINUX)) -I$(LIBBLAZESYM_INC)
CFLAGS := -g -Wall
ALL_LDFLAGS := $(LDFLAGS) $(EXTRA_LDFLAGS)

APPS = nvme_trace

CARGO ?= $(shell which cargo)
ifeq ($(strip $(CARGO)),)
BZS_APPS :=
else
BZS_APPS := profile
APPS += $(BZS_APPS)
# Required by libblazesym
ALL_LDFLAGS += -lrt -ldl -lpthread -lm
endif

# Get Clang's default includes on this system. We'll explicitly add these dirs
# to the includes list when compiling with `-target bpf` because otherwise some
# architecture-specific dirs will be "missing" on some architectures/distros -
# headers such as asm/types.h, asm/byteorder.h, asm/socket.h, asm/sockios.h,
# sys/cdefs.h etc. might be missing.
#
# Use '-idirafter': Don't interfere with include mechanics except where the
# build would have failed anyways.
CLANG_BPF_SYS_INCLUDES ?= $(shell $(CLANG) -v -E - </dev/null 2>&1 \
    | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')

ifeq ($(V),1)
    Q =
    msg =
else
    Q = @
    msg = @printf '  %-8s %s%s\n'					\
              "$(1)"						\
              "$(patsubst $(abspath $(OUTPUT))/%,%,$(2))"	\
              "$(if $(3), $(3))";
    MAKEFLAGS += --no-print-directory
endif

define allow-override
  $(if $(or $(findstring environment,$(origin $(1))),\
            $(findstring command line,$(origin $(1)))),,\
    $(eval $(1) = $(2)))
endef

$(call allow-override,CC,$(CROSS_COMPILE)cc)
$(call allow-override,LD,$(CROSS_COMPILE)ld)

.PHONY: all
all: $(APPS)

.PHONY: clean
clean:
    $(call msg,CLEAN)
    $(Q)rm -rf $(OUTPUT) $(APPS)

$(OUTPUT) $(OUTPUT)/libbpf $(BPFTOOL_OUTPUT):
    $(call msg,MKDIR,$@)
    $(Q)mkdir -p $@

# Build libbpf
$(LIBBPF_OBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)/libbpf
    $(call msg,LIB,$@)
    $(Q)$(MAKE) -C $(LIBBPF_SRC) BUILD_STATIC_ONLY=1		      \
            OBJDIR=$(dir $@)/libbpf DESTDIR=$(dir $@)		      \
            INCLUDEDIR= LIBDIR= UAPIDIR=			      \
            install

# Build bpftool
$(BPFTOOL): | $(BPFTOOL_OUTPUT)
    $(call msg,BPFTOOL,$@)
    $(Q)$(MAKE) ARCH= CROSS_COMPILE= OUTPUT=$(BPFTOOL_OUTPUT)/ -C $(BPFTOOL_SRC) bootstrap


$(LIBBLAZESYM_SRC)/target/release/libblazesym.a::
    $(Q)cd $(LIBBLAZESYM_SRC) && $(CARGO) build --release

$(LIBBLAZESYM_OBJ): $(LIBBLAZESYM_SRC)/target/release/libblazesym.a | $(OUTPUT)
    $(call msg,LIB, $@)
    $(Q)cp $(LIBBLAZESYM_SRC)/target/release/libblazesym.a $@

# Build BPF code
$(OUTPUT)/%.bpf.o: %.bpf.c $(LIBBPF_OBJ) $(wildcard %.h) $(VMLINUX) | $(OUTPUT) $(BPFTOOL)
    $(call msg,BPF,$@)
    $(Q)$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH)		      \
             $(INCLUDES) $(CLANG_BPF_SYS_INCLUDES)		      \
             -c $(filter %.c,$^) -o $(patsubst %.bpf.o,%.tmp.bpf.o,$@)
    $(Q)$(BPFTOOL) gen object $@ $(patsubst %.bpf.o,%.tmp.bpf.o,$@)

# Generate BPF skeletons
$(OUTPUT)/%.skel.h: $(OUTPUT)/%.bpf.o | $(OUTPUT) $(BPFTOOL)
    $(call msg,GEN-SKEL,$@)
    $(Q)$(BPFTOOL) gen skeleton $< > $@

# Build user-space code
$(patsubst %,$(OUTPUT)/%.o,$(APPS)): %.o: %.skel.h

$(OUTPUT)/%.o: %.c $(wildcard %.h) | $(OUTPUT)
    $(call msg,CC,$@)
    $(Q)$(CC) $(CFLAGS) $(INCLUDES) -c $(filter %.c,$^) -o $@

$(patsubst %,$(OUTPUT)/%.o,$(BZS_APPS)): $(LIBBLAZESYM_OBJ)

$(BZS_APPS): $(LIBBLAZESYM_OBJ)

# Build application binary
$(APPS): %: $(OUTPUT)/%.o $(LIBBPF_OBJ) | $(OUTPUT)
    $(call msg,BINARY,$@)
    $(Q)$(CC) $(CFLAGS) $^ $(ALL_LDFLAGS) -lelf -lz -o $@

# delete failed targets
.DELETE_ON_ERROR:

# keep intermediate (.skel.h, .bpf.o, etc) targets
.SECONDARY:
-----------------------------------------------------------------------------------------------

Writing the User Space Application

Now write the C application which will load the eBPF program into the Kernel:

c
vi nvme_trace.c
-----------------------------------------------------------------------------------------------
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include "nvme_trace.skel.h"

static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
{
    return vfprintf(stderr, format, args);
}

static volatile sig_atomic_t stop;

static void sig_int(int signo)
{
    stop = 1;
}

int main(int argc, char **argv)
{
    struct nvme_trace_bpf *skel;
    int err;

    /* Set up libbpf errors and debug info callback */
    libbpf_set_print(libbpf_print_fn);

    /* Open load and verify BPF application */
    skel = nvme_trace_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    /* Attach tracepoint handler */
    err = nvme_trace_bpf__attach(skel);
    if (err) {
        fprintf(stderr, "Failed to attach BPF skeleton\n");
        goto cleanup;
    }

    if (signal(SIGINT, sig_int) == SIG_ERR) {
        fprintf(stderr, "can't set signal handler: %s\n", strerror(errno));
        goto cleanup;
    }

    printf("Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` "
           "to see output of the BPF programs.\n");

    while (!stop) {
        fprintf(stderr, ".");
        sleep(1);
    }

cleanup:
    nvme_trace_bpf__destroy(skel);
    return -err;
}
-----------------------------------------------------------------------------------------------

Writing the eBPF Program

Now write the eBPF program:

c
vi nvme_trace.bpf.c
-----------------------------------------------------------------------------------------------
#include "vmlinux.h"
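#include <linux/stddef.h>      /* __struct_group(), used by nvme_trace.h */
#include <bpf/bpf_helpers.h>   /* SEC(), bpf_printk() */
#include <bpf/bpf_tracing.h>   /* BPF_KPROBE() */
#include <bpf/bpf_core_read.h> /* BPF_CORE_READ(), bpf_core_read() */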
#include "nvme_trace.h"

char LICENSE[] SEC("license") = "Dual BSD/GPL";

SEC("kprobe/nvme_submit_user_cmd")
int BPF_KPROBE(do_nvme_submit_user_cmd, void *q, struct nvme_command *cmd)
{
    pid_t pid;
    char comm[16];
    __u8  opcode;
    __u16 command_id;
    __le32 nsid;
    __le32 cdw10;

    pid = bpf_get_current_pid_tgid() >> 32;

    bpf_get_current_comm(&comm, sizeof(comm));
 
    opcode = BPF_CORE_READ(cmd, common.opcode);
    command_id = BPF_CORE_READ(cmd, common.command_id);
    nsid = BPF_CORE_READ(cmd, common.nsid);
    cdw10 = BPF_CORE_READ(cmd, common.cdws.cdw10);

    /*
    // __________ALTERNATIVE____________
    struct nvme_common_command common = {};
    bpf_core_read(&common, sizeof(common), &cmd->common);
    bpf_core_read(&opcode, sizeof(opcode), &common.opcode);
    bpf_core_read(&command_id, sizeof(command_id), &common.command_id);
    bpf_core_read(&nsid, sizeof(nsid), &common.nsid);
    bpf_core_read(&cdw10, sizeof(cdw10), &common.cdws.cdw10);
    */

    bpf_printk("KPROBE ENTRY pid = %d, comm = %s, opcode = %x, command_id = %x, nsid = %x, cdw10 = %x", 
               pid, comm, opcode, command_id, nsid, cdw10);

    return 0;
}
-----------------------------------------------------------------------------------------------
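
The `>> 32` in the program above deserves a short illustration. `bpf_get_current_pid_tgid()` returns one 64-bit value with the thread group ID (what user space calls the process ID) in the upper 32 bits and the thread ID in the lower 32 bits. The helper names below are hypothetical and exist only for this user-space sketch; `pack_pid_tgid` merely imitates the packing so that the split can be exercised without a Kernel.

```c
#include <stdint.h>

/* Imitate the packing used by bpf_get_current_pid_tgid():
 * upper half = thread group ID, lower half = thread ID. */
uint64_t pack_pid_tgid(uint32_t tgid, uint32_t tid)
{
    return ((uint64_t)tgid << 32) | tid;
}

/* Process ID as seen from user space (what the eBPF program logs). */
uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Thread ID of the calling thread. */
uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid & 0xffffffffu);
}
```

For a single-threaded process such as the nvme command-line tool, both halves carry the same number, so either would print the same "pid".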

Creating Additional Header File

Also create this extra header file, which contains the NVMe Kernel driver data structure declarations that we extracted from the Kernel source:

c
vi nvme_trace.h
-----------------------------------------------------------------------------------------------
#define struct_group(NAME, MEMBERS...)  \
        __struct_group(/* no tag */, NAME, /* no attrs */, MEMBERS)


struct nvme_sgl_desc {
        __le64  addr;
        __le32  length;
        __u8    rsvd[3];
        __u8    type;
} __attribute__((preserve_access_index));

struct nvme_keyed_sgl_desc {
        __le64  addr;
        __u8    length[3];
        __u8    key[4];
        __u8    type;
} __attribute__((preserve_access_index));

union nvme_data_ptr {
        struct {
                __le64  prp1;
                __le64  prp2;
        };
        struct nvme_sgl_desc    sgl;
        struct nvme_keyed_sgl_desc ksgl;
} __attribute__((preserve_access_index));

struct nvme_common_command {
        __u8                    opcode;
        __u8                    flags;
        __u16                   command_id;
        __le32                  nsid;
        __le32                  cdw2[2];
        __le64                  metadata;
        union nvme_data_ptr     dptr;
        struct_group(cdws,
        __le32                  cdw10;
        __le32                  cdw11;
        __le32                  cdw12;
        __le32                  cdw13;
        __le32                  cdw14;
        __le32                  cdw15;
        ) __attribute__((preserve_access_index));
} __attribute__((preserve_access_index));


struct nvme_command {
    union {
        struct nvme_common_command common;
    };
} __attribute__((preserve_access_index));
-----------------------------------------------------------------------------------------------
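
The `struct_group(cdws, ...)` wrapper explains why the eBPF program reads `common.cdws.cdw10`. As a rough sketch, the macro wraps the listed members in an anonymous union that contains them twice: once anonymously and once inside a struct named `cdws`. The hand-written expansion below is a simplified assumption (shortened to two members, with no tag or attributes), and `cmd_tail`, `read_direct`, and `read_grouped` are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hand-expanded equivalent of:
 *     struct_group(cdws, uint32_t cdw10; uint32_t cdw11;);
 */
struct cmd_tail {
    uint32_t nsid;
    union {
        struct {            /* anonymous copy: c->cdw10 */
            uint32_t cdw10;
            uint32_t cdw11;
        };
        struct {            /* named copy: c->cdws.cdw10 */
            uint32_t cdw10;
            uint32_t cdw11;
        } cdws;
    };
};

/* Access through the anonymous members. */
uint32_t read_direct(const struct cmd_tail *c)
{
    return c->cdw10;
}

/* Access through the named group, as the eBPF program does. */
uint32_t read_grouped(const struct cmd_tail *c)
{
    return c->cdws.cdw10;
}
```

Both accessors read the same bytes, so the grouped path is just another name for the same storage.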

Directory Structure

At this point, the content of our eBPF development directory looks as shown below:

bash
~/nvme_ebpf$ ls
blazesym  bpftool  libbpf  Makefile  nvme_trace.bpf.c  nvme_trace.c  vmlinux.h

Compiling the Program

Now let's compile the eBPF program and the user space loader with "make all":

bash
make all

  MKDIR    .output
  MKDIR    .output/libbpf
  LIB      libbpf.a
  MKDIR    /home/zilard/nvme_ebpf/.output//libbpf/staticobjs
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/bpf.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/btf.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/libbpf.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/libbpf_errno.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/netlink.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/nlattr.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/str_error.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/libbpf_probes.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/bpf_prog_linfo.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/btf_dump.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/hashmap.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/ringbuf.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/strset.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/linker.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/gen_loader.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/relo_core.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/usdt.o
  CC       /home/zilard/nvme_ebpf/.output//libbpf/staticobjs/zip.o
  AR       /home/zilard/nvme_ebpf/.output//libbpf/libbpf.a
  INSTALL  bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h bpf_helpers.h bpf_helper_defs.h bpf_tracing.h bpf_endian.h bpf_core_read.h skel_internal.h libbpf_version.h usdt.bpf.h
  INSTALL  /home/zilard/nvme_ebpf/.output//libbpf/libbpf.pc
  INSTALL  /home/zilard/nvme_ebpf/.output//libbpf/libbpf.a 
  MKDIR    bpftool
  BPFTOOL  bpftool/bootstrap/bpftool
...                        libbfd: [ on  ]
...               clang-bpf-co-re: [ on  ]
...                          llvm: [ on  ]
...                        libcap: [ OFF ]
  MKDIR    /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/include/bpf
  INSTALL  /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/include/bpf/hashmap.h
  INSTALL  /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/include/bpf/relo_core.h
  INSTALL  /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/include/bpf/libbpf_internal.h
  MKDIR    /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/
  MKDIR    /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/
  MKDIR    /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/bpf.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/btf.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/libbpf.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/libbpf_errno.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/netlink.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/nlattr.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/str_error.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/libbpf_probes.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/bpf_prog_linfo.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/btf_dump.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/hashmap.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/ringbuf.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/strset.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/linker.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/gen_loader.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/relo_core.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/staticobjs/usdt.o
  AR       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/libbpf/libbpf.a
  INSTALL  bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h bpf_helpers.h bpf_helper_defs.h bpf_tracing.h bpf_endian.h bpf_core_read.h skel_internal.h libbpf_version.h usdt.bpf.h
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/main.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/common.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/json_writer.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/gen.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/btf.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/xlated_dumper.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/btf_dumper.o
  CC       /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/disasm.o
  LINK     /home/zilard/nvme_ebpf/.output/bpftool/bootstrap/bpftool
  BPF      .output/nvme_trace.bpf.o
  GEN-SKEL .output/nvme_trace.skel.h
  CC       .output/nvme_trace.o
  BINARY   nvme_trace

If the compilation was successful, we will see the "nvme_trace" binary in our directory, along with a hidden directory called ".output":

bash
~/nvme_ebpf$ ls
blazesym  bpftool  libbpf  Makefile  nvme_trace  nvme_trace.bpf.c  nvme_trace.c  vmlinux.h   .output

The eBPF object file "nvme_trace.bpf.o" is in the hidden .output directory, along with a number of eBPF header files and libbpf object files:

bash
~/nvme_ebpf$ ls -a .output/
.   bpf      libbpf    nvme_trace.bpf.o  nvme_trace.skel.h     pkgconfig
..  bpftool  libbpf.a  nvme_trace.o      nvme_trace.tmp.bpf.o

~/nvme_ebpf$ ls -a .output/pkgconfig/
.  ..  libbpf.pc

~/nvme_ebpf$ ls -a .output/bpf
.                bpf.h              btf.h            libbpf_version.h
..               bpf_helper_defs.h  libbpf_common.h  skel_internal.h
bpf_core_read.h  bpf_helpers.h      libbpf.h         usdt.bpf.h
bpf_endian.h     bpf_tracing.h      libbpf_legacy.h

~/nvme_ebpf$ ls -a .output/libbpf
.  ..  libbpf.a  libbpf.pc  staticobjs

~/nvme_ebpf$ ls -a .output/libbpf/staticobjs/
.                 btf_dump.o    libbpf_errno.o   netlink.o    str_error.o
..                btf.o         libbpf.o         nlattr.o     strset.o
bpf.o             gen_loader.o  libbpf_probes.o  relo_core.o  usdt.o
bpf_prog_linfo.o  hashmap.o     linker.o         ringbuf.o    zip.o

~/nvme_ebpf$ ls -a .output/bpftool/
.  ..  bootstrap

~/nvme_ebpf$ ls -a .output/bpftool/bootstrap/
.        btf_dumper.d  common.o  gen.o          main.d
..       btf_dumper.o  disasm.d  json_writer.d  main.o
bpftool  btf.o         disasm.o  json_writer.o  xlated_dumper.d
btf.d    common.d      gen.d     libbpf         xlated_dumper.o

Running the Program

Now let's run the loader application, which loads the eBPF program into the Kernel:

bash
sudo ./nvme_trace 
libbpf: loading object 'nvme_trace_bpf' from buffer
libbpf: elf: section(2) .symtab, size 168, link 1, flags 0, type=2
libbpf: elf: section(3) kprobe/nvme_submit_user_cmd, size 472, link 0, flags 6, type=1
libbpf: sec 'kprobe/nvme_submit_user_cmd': found program 'do_nvme_submit_user_cmd' at insn offset 0 (0 bytes), code size 59 insns (472 bytes)
libbpf: elf: section(4) license, size 13, link 0, flags 3, type=1
libbpf: license of nvme_trace_bpf is Dual BSD/GPL
libbpf: elf: section(5) .rodata, size 86, link 0, flags 2, type=1
libbpf: elf: section(6) .relkprobe/nvme_submit_user_cmd, size 16, link 2, flags 40, type=9
libbpf: elf: section(7) .BTF, size 2334, link 0, flags 0, type=1
libbpf: elf: section(8) .BTF.ext, size 492, link 0, flags 0, type=1
libbpf: looking for externs among 7 symbols...
libbpf: collected 0 externs total
libbpf: map 'nvme_tra.rodata' (global data): at sec_idx 5, offset 0, flags 80.
libbpf: map 0 is "nvme_tra.rodata"
libbpf: sec '.relkprobe/nvme_submit_user_cmd': collecting relocation for section(3) 'kprobe/nvme_submit_user_cmd'
libbpf: sec '.relkprobe/nvme_submit_user_cmd': relo #0: insn #52 against '.rodata'
libbpf: prog 'do_nvme_submit_user_cmd': found data map 0 (nvme_tra.rodata, sec 5, off 0) for insn 52
libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
libbpf: map 'nvme_tra.rodata': created successfully, fd=4
libbpf: sec 'kprobe/nvme_submit_user_cmd': found 5 CO-RE relocations
libbpf: CO-RE relocating [2] struct pt_regs: found target candidate [184] struct pt_regs in [vmlinux]
libbpf: prog 'do_nvme_submit_user_cmd': relo #0:  [2] struct pt_regs.si (0:13 @ offset 104)
libbpf: prog 'do_nvme_submit_user_cmd': relo #0: matching candidate #0  [184] struct pt_regs.si (0:13 @ offset 104)
libbpf: prog 'do_nvme_submit_user_cmd': relo #0: patched insn #0 (LDX/ST/STX) off 104 -> 104
libbpf: CO-RE relocating [7] struct nvme_command: found target candidate [127897] struct nvme_command in [nvme_core]
libbpf: CO-RE relocating [7] struct nvme_command: found target candidate [127890] struct nvme_command in [nvme]
libbpf: prog 'do_nvme_submit_user_cmd': relo #1:  [7] struct nvme_command.common.opcode (0:0:0:0 @ offset 0)
libbpf: prog 'do_nvme_submit_user_cmd': relo #1: matching candidate #0  [127897] struct nvme_command.common.opcode (0:0:0:0 @ offset 0)
libbpf: prog 'do_nvme_submit_user_cmd': relo #1: matching candidate #1  [127890] struct nvme_command.common.opcode (0:0:0:0 @ offset 0)
libbpf: prog 'do_nvme_submit_user_cmd': relo #1: patched insn #8 (ALU/ALU64) imm 0 -> 0
libbpf: prog 'do_nvme_submit_user_cmd': relo #2:  [7] struct nvme_command.common.command_id (0:0:0:2 @ offset 2)
libbpf: prog 'do_nvme_submit_user_cmd': relo #2: matching candidate #0  [127897] struct nvme_command.common.command_id (0:0:0:2 @ offset 2)
libbpf: prog 'do_nvme_submit_user_cmd': relo #2: matching candidate #1  [127890] struct nvme_command.common.command_id (0:0:0:2 @ offset 2)
libbpf: prog 'do_nvme_submit_user_cmd': relo #2: patched insn #15 (ALU/ALU64) imm 2 -> 2
libbpf: prog 'do_nvme_submit_user_cmd': relo #3:  [7] struct nvme_command.common.nsid (0:0:0:3 @ offset 4)
libbpf: prog 'do_nvme_submit_user_cmd': relo #3: matching candidate #0  [127897] struct nvme_command.common.nsid (0:0:0:3 @ offset 4)
libbpf: prog 'do_nvme_submit_user_cmd': relo #3: matching candidate #1  [127890] struct nvme_command.common.nsid (0:0:0:3 @ offset 4)
libbpf: prog 'do_nvme_submit_user_cmd': relo #3: patched insn #24 (ALU/ALU64) imm 4 -> 4
libbpf: prog 'do_nvme_submit_user_cmd': relo #4:  [7] struct nvme_command.common.cdws.cdw10 (0:0:0:7:1:0 @ offset 40)
libbpf: prog 'do_nvme_submit_user_cmd': relo #4: matching candidate #0  [127897] struct nvme_command.common.cdws.cdw10 (0:0:0:7:1:0 @ offset 40)
libbpf: prog 'do_nvme_submit_user_cmd': relo #4: matching candidate #1  [127890] struct nvme_command.common.cdws.cdw10 (0:0:0:7:1:0 @ offset 40)
libbpf: prog 'do_nvme_submit_user_cmd': relo #4: patched insn #32 (ALU/ALU64) imm 40 -> 40
Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF programs.
...........................................

We use bpf_printk() to print messages to the kernel tracing log. This helper always sends its output to the same predefined pseudofile, /sys/kernel/debug/tracing/trace_pipe, and you need root privileges to read it.

To trigger the "nvme_submit_user_cmd" kernel function, we will send admin commands to one of the NVMe SSD devices with nvme-cli, so let's install it:

bash
sudo apt install nvme-cli

Testing the Program

Now we run the nvme-cli tool to list the NVMe devices on the system:

bash
sudo nvme list

Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S4DYNX0R756769       SAMSUNG MZVLB512HBJQ-000L2               1          93.20  GB / 512.11  GB    512   B +  0 B   3L1QEXF7

This invokes the "nvme_submit_user_cmd" function in the NVMe Kernel driver, and the kprobe attached by our eBPF program traces the structure members we hooked: opcode, command_id, nsid, and cdw10.

The trace output shows that the nvme-cli tool triggers the "nvme_submit_user_cmd" driver function twice, both times with opcode 0x6, which is the NVMe admin command "Identify":

text
sudo cat /sys/kernel/debug/tracing/trace_pipe

            nvme-4943    [010] d..31  1967.763968: bpf_trace_printk: KPROBE ENTRY pid = 4943, comm = nvme, opcode = 6, command_id = 0, nsid = 1, cdw10 = 0
            nvme-4943    [010] d..31  1967.764521: bpf_trace_printk: KPROBE ENTRY pid = 4943, comm = nvme, opcode = 6, command_id = 0, nsid = 1, cdw10 = 3
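
If the trace lines need further processing, they can be parsed back into fields. The sketch below assumes the exact message format used by the bpf_printk() call in nvme_trace.bpf.c, where every value after `comm` is printed in hexadecimal. The `nvme_trace_event` struct and the `parse_trace_line` function are hypothetical names, not part of libbpf or nvme-cli.

```c
#include <stdio.h>
#include <string.h>

/* Fields carried by one "KPROBE ENTRY" trace line. */
struct nvme_trace_event {
    int          pid;
    char         comm[16];
    unsigned int opcode;
    unsigned int command_id;
    unsigned int nsid;
    unsigned int cdw10;
};

/* Parse one line read from trace_pipe.
 * Returns 0 on success, -1 if the line is not a complete record. */
int parse_trace_line(const char *line, struct nvme_trace_event *ev)
{
    /* Skip the task/CPU/timestamp prefix added by the tracing subsystem. */
    const char *p = strstr(line, "KPROBE ENTRY ");
    if (!p)
        return -1;

    int n = sscanf(p,
                   "KPROBE ENTRY pid = %d, comm = %15[^,], opcode = %x, "
                   "command_id = %x, nsid = %x, cdw10 = %x",
                   &ev->pid, ev->comm, &ev->opcode,
                   &ev->command_id, &ev->nsid, &ev->cdw10);
    return n == 6 ? 0 : -1;
}
```

Because the program prints with `%x`, a logged `opcode = 14` has to be read as 0x14 rather than decimal 14, which is why the parser uses `%x` as well.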

Now let's run an NVMe admin passthru command to trigger a short device self-test on the NVMe SSD:

bash
sudo nvme admin-passthru /dev/nvme0 -n 0x1 --opcode=0x14 --cdw10=0x1 -r
Admin Command Device Self-test is Success and result: 0x00000000

Our eBPF program's kprobe captures the struct members again, and trace_pipe shows the following:

text
sudo cat /sys/kernel/debug/tracing/trace_pipe
...
            nvme-4946    [004] d..31  2026.971492: bpf_trace_printk: KPROBE ENTRY pid = 4946, comm = nvme, opcode = 14, command_id = 0, nsid = 1, cdw10 = 1

Opcode 0x14 means "Device Self-test" according to "Figure 138: Opcodes for Admin Commands" in the "NVM Express Base Specification Revision 2.0a". cdw10 (Command Dword 10) is a command-specific field; here it holds the value 0x1 that we passed with --cdw10. The Namespace ID is 0x1.
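
To turn the raw numbers into readable names, a small lookup can help. The sketch below covers only a handful of admin opcodes and the Self-test Code in the low four bits of cdw10. The values reflect my reading of the NVMe Base Specification and should be checked against it; the two function names are hypothetical.

```c
#include <stdint.h>

/* Name a few admin command opcodes. Deliberately not exhaustive. */
const char *nvme_admin_opcode_name(uint8_t opcode)
{
    switch (opcode) {
    case 0x02: return "Get Log Page";
    case 0x06: return "Identify";
    case 0x09: return "Set Features";
    case 0x0a: return "Get Features";
    case 0x14: return "Device Self-test";
    default:   return "unknown";
    }
}

/* For Device Self-test, bits 3:0 of cdw10 hold the Self-test Code. */
const char *self_test_code_name(uint32_t cdw10)
{
    switch (cdw10 & 0xf) {
    case 0x1: return "short";
    case 0x2: return "extended";
    case 0xf: return "abort";
    default:  return "other";
    }
}
```

With the values traced above, opcode 0x14 and cdw10 0x1 map to a short device self-test, which matches the admin-passthru command that was issued.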

Related Documents