SUSE Labs Conference 2019 - Live debugging with ftrace

Published: 19 September 2019
on channel: SUSE Labs

445

None

ftrace is useful for data gathering and debugging on runnning system.

There's two methods of enabling function tracing in Linux on x86. One is
with just "gcc -pg" and the other is "gcc -pg -mfentry". The former will use
calls to a special function "mcount" after the frame is set up in all C
functions. The latter will add calls to a special function called "fentry"
as the very first instruction of all C functions.

At compile time, there is a check to see if gcc supports, -mfentry, and if
it does, it will use that, because it is more versatile and less error prone
for function tracing.

Starting with v4.19, the minimum gcc supported to build the Linux kernel,
was raised to version 4.6. That also happens to be the first gcc version to
support -mfentry. Since on x86, using gcc versions from 4.6 and beyond will
unconditionally enable the -mfentry, it will no longer use mcount as the
method for inserting calls into the C functions of the kernel. This means
that there is no point in continuing to maintain mcount in x86.

main tracers are:
function is enabled by default. traces function entry.
function_graph - prints functions graph. traces function on entry and exist.
nop - traces nothing

the function tracer format output:

[tracing]# echo function > current_tracer
[tracing]# cat trace
-0 [000] 1726568.996435: hrtimer_get_next_event <-get_next_timer_interrupt
-0 [000] 1726568.996436: _spin_lock_irqsave <-hrtimer_get_next_event
-0 [000] 1726568.996436: _spin_unlock_irqrestore <-hrtimer_get_next_event
-0 [000] 1726568.996437: rcu_needs_cpu <-tick_nohz_stop_sched_tick
-0 [000] 1726568.996438: enter_idle <-cpu_idle
...

The above shows you the process name (), PID (0) the CPU that the trace executed on ([000]), a time-stamp in seconds with the decimal places down to microseconds (1726568.996435) the function being traced (hrtimer_get_next_event) and the parent that called that function (get_next_timer_interrupt).

Self-tracing:

To trace self and see the functions sequence the following ftrace settings have to be applied:

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing

sysctl kernel.ftrace_enabled=1
echo 10000 > ${dir}/buffer_size_kb
echo function_graph > ${dir}/current_tracer
echo 1 > ${dir}/tracing_on
echo $$ > ${dir}/set_ftrace_pid
$1
echo 0 > ${dir}/tracing_on
cat ${dir}/trace > trace

Debugging events with ftrace:

Below in an example of how to use ftrace events. We setup the kswapd event and setup a filter for memory allocations with the order > 2

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo nop > current_tracer
echo 10000 > buffer_size_kb
echo 1 > options/stacktrace
echo 1 > events/kmem/mm_vmscan_wakeup_kswapd/enable
echo "order>2» > events/kmem/mm_vmscan_wakeup_kswapd/filter
echo > trace
echo 1 > tracing_on

When any function is listed in the set_ftrace_filter, only those functions will be traced. This will help the performance of the system when the trace is active. Tracing every function incurs a large overhead, but when using the set_ftrace_filter, only those functions listed in that file will have the NOPs changed to call the tracer. Depending on which functions are being traced, just having a couple of hundred functions enabled is hardly noticeable.

#!/bin/bash

mount -t debugfs nodev /sys/kernel/debug
pushd /sys/kernel/debug/tracing

echo 100000 > buffer_size_kb

echo nop > current_tracer
#echo function > current_tracer
#echo cgroup_mount > set_ftrace_filter
#echo sget >> set_ftrace_filter
#echo deactivate_locked_super >> set_ftrace_filter
#echo deactivate_super >> set_ftrace_filter
#echo grab_super >> set_ftrace_filter
#echo get_active_super >> set_ftrace_filter

#echo ub_fini_cgroup >> set_ftrace_filter
#echo cgroup_kill_sb >> set_ftrace_filter
#echo freeze_super >> set_ftrace_filter
#echo clone_mnt >> set_ftrace_filter

#echo cgroup_diput >> set_ftrace_filter
#echo cgroup_rmdir >> set_ftrace_filter
#echo cgroup_mkdir >> set_ftrace_filter
#echo ub_init_cgroup >> set_ftrace_filter
#echo cgroup_kernel_mount >> set_ftrace_filter
#echo get_beancounter_by_name >> set_ftrace_filter

Debugging with kprobes:

echo 1 > options/stacktrace

Here we can see the function arguments wsing the offset. We can take it for example from the crash utility:

crash> struct super_block.s_active
struct super_block {
[140] atomic_t s_active;
}

echo 'p:super_probe1 deactivate_super sb=%di s_active=+0x8c(%di)' > kprobe_events
echo 'p:super_probe2 deactivate_locked_super sb=%di s_active=+0x8c(%di)' >> kprobe_events

echo 'p:super_probe3 cgroup_cfts_commit sb=+0x0(+0x68(%di)) s_active=+0x8c(+0x0(+0x68(%di)))' >> kprobe_events
#echo 'p:super_probe4 cgroup_dput.isra.25 sb=+0x0(+0xc0(%di)) s_active=+0x8c(+0x0(+0xc0(%di)))' >> kprobe_events
ec