You might need more than `-Dlog4j2.formatMsgNoLookups`

13 December 2021

Recently, everyone's been very sad to learn that the ubiquitous (in the Java world) log4j2 library will happily process certain “lookups” in the messages (after parameter substitution!), causing all sorts of fun.

The official mitigation advice for the CVE says that:

In previous releases (>2.10) this behavior can be mitigated by setting system property “log4j2.formatMsgNoLookups” to “true”

Unfortunately, this is insufficient: it works in many applications, but not all. It is sufficient for applications whose log4j appender configurations look something like this:

<Appenders>
    <Console name="Console" target="SYSTEM_OUT">
        <PatternLayout pattern="%d{ISO8601}\t%p\t%C{1}\t%m%n"/>
    </Console>
</Appenders>

In this configuration, the system property correctly prevents log4j from formatting lookups found in the message (%m), and all is well.

However, it's also common (recommended, even!) to use context lookups in layouts to classify log messages further. The log4j documentation claims that $${ctx:VAR} and %X{VAR} achieve the same result for the pattern layout specifically. This isn't true: lookups embedded in the value of VAR are recursively expanded in the former case, but not in the latter. If your log4j appender pattern uses $${ctx:VAR} with an attacker-controlled variable, the vulnerability remains in any version of log4j <2.16.0, regardless of the system property.
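As a rough way to audit a configuration for this, a naive check of each layout pattern for the ctx lookup prefix might look like the sketch below. The function name is hypothetical, and a substring match is only a heuristic: it says nothing about whether the variable in question is attacker-controlled.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a PatternLayout pattern contains a
 * context-map lookup such as $${ctx:VAR}. Matching "${ctx:" covers both the
 * escaped ($${ctx:...}) and unescaped (${ctx:...}) spellings. The %X{VAR}
 * converter is deliberately not flagged.
 */
bool pattern_has_ctx_lookup(const char *pattern)
{
    return strstr(pattern, "${ctx:") != NULL;
}
```

A pattern such as "%d $${ctx:VAR} %m%n" would be flagged, while "%d %X{VAR} %m%n" would not.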

I have reported this issue to the log4j maintainers.

Mitigations for older log4j

The issue of context lookups means that there is no obvious global setting that old releases of log4j will respect in order to disable message lookups. Because of this, the simplest mitigation for systems which contain old versions of log4j is to ensure that the buggy library is never loaded.

This is quite simple to achieve by wrapping an affected process's syscalls (more precisely, the libc functions that issue them), e.g. via an LD_PRELOADed wrapper (at least if the Java runtime is dynamically linked, which those on my systems are). Unfortunately, it is not always straightforward to keep track of every Java process that might be launched, e.g. if a dynamic orchestration scheme spawns containers.
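To give a flavour of what such a wrapper can look like, here is a minimal sketch, not the shim from the tarball. It assumes a hypothetical policy of refusing to open any file named like log4j-core-<version>.jar, and it only interposes open(); the helper name is_blocked_jar is made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy: block any file whose final path component looks
 * like log4j-core-<version>.jar. */
int is_blocked_jar(const char *path)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    size_t len = strlen(base);
    /* The prefix match guarantees len >= 11, so len - 4 cannot underflow. */
    return strncmp(base, "log4j-core-", 11) == 0 &&
           strcmp(base + len - 4, ".jar") == 0;
}

/* Replacement for libc's open(): refuse blocked jars with EACCES and pass
 * everything else straight to the kernel via the openat syscall. */
int open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* A mode argument is only present when a file may be created. */
    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    if (path && is_blocked_jar(path)) {
        errno = EACCES;
        return -1;
    }
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}
```

Something like `cc -shared -fPIC -o shim.so shim.c` followed by `LD_PRELOAD=./shim.so java ...` would load it. A more conventional shim would forward through dlsym(RTLD_NEXT, "open") rather than the raw syscall, and the JVM may well reach the kernel through open64, openat or fopen instead, so a real wrapper would have to cover those too. Matching on filename also does nothing for jars that bundle log4j inside themselves.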

This could be resolved by inspecting every call to execve(2) and instrumenting any suspicious Java processes. While the kernel sadly doesn't provide this functionality out of the box, the ftrace framework makes it easy to hook kernel functions, so a kernel module can inspect new process creations by adding an ftrace hook to finalize_exec().

Since checking whether a process is Java and adding an appropriate LD_PRELOADed shim is nontrivial, and involves a lot of manipulation of userspace memory, it is preferable to accomplish this in userspace. While communicating with a userspace daemon is relatively easy, just passing a pid to it is not ideal, due both to race conditions and to the complexity of naming processes in a different PID namespace from the daemon (particularly necessary if containerized processes must be inspected). The kernel's pidfd subsystem is ideal for this (and is more generally a long-overdue recognition that Linux fds are better thought of as a general mechanism for offering userspace unforgeable refcounted handles to kernel resources, i.e. kernel capabilities). Unfortunately, the kernel function that creates a new pidfd, pidfd_create, is not EXPORT_SYMBOLed. Consequently, the module that provides this functionality requires CONFIG_KPROBES and CONFIG_KALLSYMS to be enabled in order to acquire a handle to pidfd_create, and may be somewhat brittle with regard to kernel version updates.

The remainder of the kernel module is fairly pedestrian: in essence, every call to finalize_exec() enqueues the current process to a ring buffer and blocks on a completion (signalled when userspace releases an anon_inode fd created for this purpose). When a userspace client reads from a character device set up by the module, it receives a message containing the fds (newly mapped into its fd space) of the newly-created process's pidfd and completion fd, and can then inspect and manipulate the process as it sees fit.

With these pieces out of the way, a simple userspace client completes the mitigation. It checks if the process is a Java process which should be introspected (for my systems, a check of /proc/pid/exe is sufficient), and then, if it is, ptrace(2)s the new process and injects an LD_PRELOAD environment variable. Since the process has not yet started running, the ELF ABI (this particular code only works on amd64) guarantees the placement of the environment on the stack. One slight wrinkle is that, since the target process may be in a different mount namespace, the filename of the shared object to preload may not be accessible to it. A useful trick in this situation is to spawn a child process which has the file open and then re-associates itself with the target's namespaces, allowing the target process to reference the file via /proc/helperpid/fd/filefd. This does require that the target process has a procfs mountpoint (the specific mountpoint is found by parsing /proc/targetpid/mountinfo), but this is generally likely to be the case, e.g. in containers.
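To make the stack-layout point concrete: on amd64 the word at the initial stack pointer is argc, followed by the argv pointers, a NULL, the envp pointers, another NULL, and then the auxiliary vector. The sketch below locates the environment in a local copy of those words (as a client might obtain via ptrace(2) or by reading /proc/pid/mem). The function names are hypothetical, and it only finds the existing entries rather than injecting a new one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Index of the first envp slot in a copy of the initial stack words:
 * skip argc itself, the argc argv pointers, and the NULL that ends argv.
 */
size_t envp_index(const uint64_t *stack)
{
    return 1 + (size_t)stack[0] + 1;
}

/*
 * Number of environment pointers in the copy. Counting stops at the NULL
 * that terminates envp, or at the end of the copy if it is too short.
 */
size_t env_count(const uint64_t *stack, size_t nwords)
{
    if (nwords == 0)
        return 0;

    size_t first = envp_index(stack);
    size_t n = 0;
    while (first + n < nwords && stack[first + n] != 0)
        n++;
    return n;
}
```

Adding LD_PRELOAD would presumably then mean writing a new string into the target and making room for one more pointer before the NULL that ends envp; that bookkeeping is omitted here.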

This, along with a bit of Nix to build it, is packaged up into a tarball here.
