A snapshotting kernel module for fuzzing
10 Dec 2021
Tagged: security
Right as the pandemic was starting in March/April 2020, I spent a couple of weekends writing a Loadable Kernel Module (LKM) for Linux, designed to add a syscall which could be used by a fuzzer to quickly restore program state instead of using a conventional fork/exec loop. This was originally suggested on the AFL++ Ideas page, and it nicely intersected a bunch of stuff I’m familiar with so I wanted to take a crack at it.
My implementation can be found in the now archived GitHub repo: https://github.com/kallsyms/snapshot-lkm. It’s deprecated in favor of the AFL++ version. However, as of Dec. 2020 that has also been frozen, as it’s a significant amount of work to update it for each kernel version: the module requires hooking some internal kernel functions which change frequently.
My initial work was heavily based on the original kernel patchset from the SSLab at Georgia Tech, which can be found here.
I’d strongly recommend reading the original paper to understand more about how the innards of the snapshotting work.
To summarize though, the basic idea is to add a new syscall (snapshot()) which can either snapshot or restore the “important” bits of the current process so a new fuzz case can be run.
This avoids the excessive overhead of a normal fork(), giving a very nice speed up versus conventional fuzzers.
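For contrast, here is a minimal userland sketch of the conventional approach that the snapshot syscall is meant to replace: every test case pays for a fork() and a waitpid(). The target() function and its "crash on a leading X" behavior are made up purely for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical target: reports failure (exit status 1) when the input starts with 'X'. */
static int target(const char *input)
{
    return input[0] == 'X' ? 1 : 0;
}

/*
 * Run a single test case in a freshly forked child.
 * Returns the child's exit status, or -1 on error or abnormal termination.
 */
int run_case_forked(const char *input)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(target(input));   /* child: run the case and exit immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_case_forked() once per input gives each case a clean process, at the cost of duplicating and tearing down the process every time, which is the overhead a snapshot/restore primitive tries to avoid.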
Understanding the Original Implementation
The first thing I needed to do was actually extract a diff/patch of what the paper implemented.
The main repo is (unfortunately) a full fork of linux, but squashed so we can’t easily git diff to see what was implemented.
A quick non-git diff against a freshly-cloned linux v4.8.10 repo quickly fixed that, giving us the patch.
I was surprised at how small it was.
There were only a total of 4 files that had meaningful changes which would affect normal program flow. The rest are either header files, syscall definitions, or the snapshot/restore implementation itself.
Breaking down each major function change:
- file.c:dup_fd: when a files_struct (basically the set of file descriptors opened by a task) is duplicated (e.g. in fork()), the newly created files_struct needs to have its snapshot metadata initialized.
- exit.c:do_group_exit: when a task exits as part of the entire group going down, snapshot metadata needs to be cleaned up.
- exit.c:exit_group: when a task calls exit(), the snapshot is implicitly restored.
- fork.c:dup_mm: when a task’s memory mappings are duplicated, the new mm_struct’s snapshot metadata needs to be initialized.
- memory.c:do_wp_page: when a page fault occurs (when writing to a copy-on-write page), the snapshotting code may have some work to do.
- memory.c:do_anonymous_page: when an anonymous (non-file-backed) page is accessed for the first time, the page that was mapped (added to a PTE) needs to be recorded, as the PTE may need to be restored.
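To give some intuition for why the page fault paths matter, here is a toy userland model of "remember a page the first time it is written, then put back only those pages on restore". This is my own simplification and not the patchset's actual logic: the tiny page size, the toy_* names, and the explicit toy_write() call (standing in for a write fault) are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 64   /* toy page size, far smaller than a real page */
#define NUM_PAGES 8

struct toy_snapshot {
    unsigned char mem[NUM_PAGES][PAGE_SIZE];    /* live "process memory" */
    unsigned char saved[NUM_PAGES][PAGE_SIZE];  /* pre-write copies of dirtied pages */
    bool dirty[NUM_PAGES];                      /* written since the snapshot? */
    bool active;                                /* has a snapshot been taken? */
};

/* Start tracking writes from this point on. */
void toy_take_snapshot(struct toy_snapshot *s)
{
    memset(s->dirty, 0, sizeof(s->dirty));
    s->active = true;
}

/* Every write goes through here, standing in for the fault on a protected page. */
void toy_write(struct toy_snapshot *s, size_t addr, unsigned char value)
{
    size_t page = addr / PAGE_SIZE;
    assert(page < NUM_PAGES);
    if (s->active && !s->dirty[page]) {
        /* First write to this page since the snapshot: keep the old contents. */
        memcpy(s->saved[page], s->mem[page], PAGE_SIZE);
        s->dirty[page] = true;
    }
    s->mem[page][addr % PAGE_SIZE] = value;
}

/* Copy back only the pages that were written; returns how many were restored. */
size_t toy_restore(struct toy_snapshot *s)
{
    size_t restored = 0;
    for (size_t p = 0; p < NUM_PAGES; p++) {
        if (!s->dirty[p])
            continue;
        memcpy(s->mem[p], s->saved[p], PAGE_SIZE);
        s->dirty[p] = false;
        restored++;
    }
    return restored;
}
```

The point of the model is that restore cost scales with the number of pages a test case actually touched, not with the size of the whole address space.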
With this understanding of what’s needed to “inject” into the kernel, let’s talk a bit about how I went about doing that.
Linux has some crazy built-in tech that very few people know about. One of these is kernel probes, or kprobes. Kprobes are a way for things (be it a superuser in userland using the sysfs interface or another kernel module using the kernel interface) to, well, probe the kernel. You can set probe points on nearly any function in the kernel (even ones not EXPORTed for normal module use), and fetch values from the state at the time the probe is hit. And if you’re using the kernel-land interface (i.e. from a module), you can even overwrite registers (including the instruction pointer!) when your callback fires.
Almost everything in the snapshot process could be written “out-of-band” of the normal kernel functions (meaning it’s just observing what the kernel is doing and tracking state outside of any normal kernel structures), however in one place, the modifications cause a function to return early.
There’s a neat trick you can do with kprobes to emulate this behavior: set the instruction pointer to a stub function which immediately returns.
Since that stub was never actually called (specifically, since no return instruction pointer was pushed to the stack), when that stub returns, it will pop off the return IP that the probed function should have returned to, effectively giving us a way to return early.
This will only work if the probe is on the very first instruction of a function (otherwise the stack may have been expanded by the probed function), but this will be the case for us so we’re set.
The docs have a bit more detail about what you actually need to do to achieve this with the kprobe subsystem.
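The stack reasoning behind the early-return trick can be checked with a toy model. The sketch below is not kprobes code: it simulates a CPU with just an instruction pointer and a stack, and the addresses and toy_* names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_WORDS = 16 };

/* Hypothetical code addresses, chosen only for illustration. */
enum {
    ADDR_CALLER_NEXT = 0x1005,  /* instruction after the call in the caller */
    ADDR_PROBED_FN   = 0x2000,  /* first instruction of the probed function */
    ADDR_RET_STUB    = 0x3000   /* stub consisting of a single ret */
};

struct toy_cpu {
    unsigned long ip;
    unsigned long stack[STACK_WORDS];
    size_t sp;                  /* number of words currently on the stack */
};

/* call: push the return address, then jump to the target. */
void toy_call(struct toy_cpu *cpu, unsigned long target, unsigned long return_to)
{
    assert(cpu->sp < STACK_WORDS);
    cpu->stack[cpu->sp++] = return_to;
    cpu->ip = target;
}

/* ret: pop whatever is on top of the stack into the instruction pointer. */
void toy_ret(struct toy_cpu *cpu)
{
    assert(cpu->sp > 0);
    cpu->ip = cpu->stack[--cpu->sp];
}

/* Probe firing on the first instruction: redirect ip to the stub, pushing nothing. */
void toy_probe_redirect(struct toy_cpu *cpu)
{
    cpu->ip = ADDR_RET_STUB;
}
```

Running toy_call(), then toy_probe_redirect(), then toy_ret() leaves the instruction pointer at ADDR_CALLER_NEXT with an empty stack, which is exactly the "return early" effect. If the probed function had pushed anything before the probe fired, the stub's ret would pop that value instead, which is why the probe has to sit on the very first instruction.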
Hooking: syscall table
In addition to the purely-additive things we need to run when certain kernel functions are called, we also need to completely hijack the exit syscall and add a new syscall entirely to do our snapshotting.
Side note: as the AFL++ devs did in their version, the snapshot operation should probably have been implemented as an
However, since I was treating this as a proof-of-concept and I already needed to do syscall table rewriting for exit() I figured I might as well do the same for snapshot(), and chose to overwrite the tuxcall() syscall since it’s completely unused.
Anyways, to get control over the syscalls we need to overwrite the syscall table which Linux uses to dispatch syscalls to their respective handlers.
If the kernel is “nice” and has the sys_call_table as a named symbol, we can use that.
In the case it doesn’t though, the quickest way I found to do this is find where in kernel memory the address of the read() syscall handler is immediately followed by the address of the write() syscall handler, since those are the first two syscalls. This is implemented in
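The core of that search is just a scan for two adjacent pointer-sized values. Here is a minimal userland sketch of the idea; find_adjacent_pair() is a name I made up for this example, and in a real module the scanned range and the two handler addresses would have to come from the kernel rather than from a test array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan `count` pointer-sized words for `first` immediately followed by
 * `second`. Returns a pointer to the slot holding `first` (the candidate
 * table base), or NULL if the pair never appears.
 */
const uintptr_t *find_adjacent_pair(const uintptr_t *words, size_t count,
                                    uintptr_t first, uintptr_t second)
{
    for (size_t i = 0; i + 1 < count; i++) {
        if (words[i] == first && words[i + 1] == second)
            return &words[i];
    }
    return NULL;
}
```

A lone occurrence of the first value does not match; only the adjacent pair does, which is what makes the heuristic reasonably specific.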
The only other thing we need to do to hook the syscall table is make sure we make that memory writable before trying to overwrite it. And to do that, I decided to temporarily disable the write protect bit (bit 16) in cr0 instead of messing around with properly making the memory R/W. Again, proof-of-concept code :)
Now, with all of that out of the way, let’s do a quick overview of the module implementation.
Starting at the (logical) top, in mod_init we grab the address of the syscall table, flip the WP bit in cr0, save the existing handlers, and overwrite the handler pointers with our own.
    void **syscall_table = get_syscall_table();
    ...
    _write_cr0(read_cr0() & (~(1 << 16)));
    orig_sct_snapshot_entry = syscall_table[__NR_snapshot];
    orig_sct_exit_group = syscall_table[__NR_exit_group];
    syscall_table[__NR_snapshot] = &sys_snapshot;
    syscall_table[__NR_exit_group] = &sys_exit_group;
    _write_cr0(read_cr0() | (1 << 16));
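Stripped of the kernel specifics, this is the classic "save the old pointer, swap in your own, put it back later" pattern on a dispatch table. The following userland sketch shows only that pattern; the table, the slot number, and both handlers are hypothetical, and nothing here touches cr0 or real syscalls.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*handler_fn)(long arg);

enum { NR_EXAMPLE = 2, TABLE_SIZE = 4 };    /* hypothetical slot number and size */

static long original_handler(long arg)    { return arg + 1; }
static long replacement_handler(long arg) { return arg * 100; }

/* Stand-in for the syscall table: an array of handler pointers. */
handler_fn dispatch_table[TABLE_SIZE] = { NULL, NULL, original_handler, NULL };

/* The entry we overwrote, kept so it can be restored on teardown. */
static handler_fn saved_entry;

void install_hook(void)
{
    saved_entry = dispatch_table[NR_EXAMPLE];
    dispatch_table[NR_EXAMPLE] = replacement_handler;
}

void remove_hook(void)
{
    dispatch_table[NR_EXAMPLE] = saved_entry;
}
```

Keeping the saved entry around matters for more than cleanup: a replacement handler can also call through it when it only wants to wrap the original behavior.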
Next, we hook the two functions we need (do_wp_page and page_add_new_anon_rmap) with their respective handlers.
This uses a small wrapper I wrote which keeps track of all registered hooks so that we can cleanly tear them all down when the module unloads.
    if (!try_hook("do_wp_page", &wp_page_hook)) ...
    if (!try_hook("page_add_new_anon_rmap", &do_anonymous_hook)) ...
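A wrapper like this boils down to a small registry: record each hook that registers successfully, then walk the list on unload. Below is a userland sketch of that bookkeeping under my own assumptions. It is not the module's actual wrapper: the *_sketch names and the fixed-size array are invented, and backend_register() is a stand-in for real probe registration (in a module that would be something like register_kprobe(), with unregister_kprobe() on teardown).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOOKS 8

struct hook {
    const char *symbol;     /* name of the hooked function */
    bool registered;
};

static struct hook hooks[MAX_HOOKS];
static size_t num_hooks;

/* Stand-in for real probe registration; here it only rejects empty names. */
static bool backend_register(const char *symbol)
{
    return symbol != NULL && symbol[0] != '\0';
}

/* Register a hook and remember it; returns false if registration failed. */
bool try_hook_sketch(const char *symbol)
{
    if (num_hooks == MAX_HOOKS || !backend_register(symbol))
        return false;
    hooks[num_hooks].symbol = symbol;
    hooks[num_hooks].registered = true;
    num_hooks++;
    return true;
}

/* Tear down every recorded hook in reverse order; returns how many were removed. */
size_t unhook_all_sketch(void)
{
    size_t removed = 0;
    while (num_hooks > 0) {
        hooks[--num_hooks].registered = false;
        removed++;
    }
    return removed;
}
```

Because only successful registrations are recorded, a failure partway through init can be handled by calling the teardown function and bailing out, without tracking which hooks made it in.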
Lastly, we call into the main snapshotting code so it can do some initialization (just grabbing some addresses out of kallsyms).
At this point, we’re initialized, our hooks are installed, and we’re ready for a “snapshot syscall aware” program to run.
From this point down, there’s really very little that was changed from the original patchset.
The only exceptions are:
- When that program calls our snapshot syscall, it hits the handler which in turn dispatches either
recover_snapshot. Those functions are (IIRC) completely unmodified from the original patchset.
- The hooks need to read out of the pt_regs passed in to grab the arguments that were actually passed to the hooked function (example).
- The one place which requires us to return early overwrites the instruction pointer to a stub function as described above.
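To make the pt_regs point concrete: on x86-64, the first six integer or pointer arguments to a function are passed in rdi, rsi, rdx, rcx, r8 and r9, so a probe firing at function entry can recover them from the saved registers. The sketch below uses a mock struct rather than the kernel's real struct pt_regs, and mock_get_arg() is a hypothetical helper name.

```c
#include <assert.h>

/* Mock of the argument-carrying registers; not the kernel's struct pt_regs. */
struct mock_regs {
    unsigned long di, si, dx, cx, r8, r9;
};

/*
 * Return the n-th (0-based) integer argument as laid out by the x86-64
 * System V calling convention, or 0 if n is not a register argument.
 */
unsigned long mock_get_arg(const struct mock_regs *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;
    case 1: return regs->si;
    case 2: return regs->dx;
    case 3: return regs->cx;
    case 4: return regs->r8;
    case 5: return regs->r9;
    default: return 0;      /* further arguments live on the stack */
    }
}
```

This mapping only holds at function entry, before the probed function has had a chance to reuse those registers for something else.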
Wrapping Things Up
When I originally wrote back to the AFL++ maintainers about this, my implementation did “work”, but only for a few seconds before the kernel would oops. I suspected there was some locking that needed to occur that I wasn’t doing (because it’s always locking bugs), but I went ahead and passed this on to them, laying the groundwork for their (much improved) implementation. With that version working well, they were able to achieve a >3x speedup in certain target programs, which (if this were a more maintainable strategy) would be a great improvement. As they note in the README however, “due to syscall hooking and the never ending changes in the kernel we are unable to maintain it as we are busy working on libafl.”
Despite not being adopted, this was a very fun project to work on, and the strategy is one that I feel could be useful to other applications that need to make light modifications to the kernel.