Seamless file descriptor transfer between processes with pidfd and pidfd_getfd
A while ago, I wrote about how file descriptors can be transferred over Unix Domain Sockets between processes when a parent-child relationship doesn’t exist between the two processes. One of the use cases for file descriptor transfer between processes is the deployment of network proxies that handle ingress traffic. However, the APIs offered by the kernel for file descriptor transfer between processes have been awkward to use and riddled with gotchas.
On newer versions of Linux (5.6 and above), a far better API exists to achieve the aforementioned goal.
A Primer on Processes
A running instance of a program is called a process. Processes are referred to using a process ID (pid), which is an arbitrary number chosen by the kernel. The maximum pid is usually 32768, but on some distros the limit is 2²². The init system (launchd on macOS, systemd on some of the more recent Linux distros) is assigned pid 1.
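To make the numbers above concrete, here is a small sketch in Python that prints the current process's pid and the kernel's pid limit. It assumes a Linux system where the limit is exposed at /proc/sys/kernel/pid_max; the helper name read_pid_max is mine, purely for illustration.

```python
import os


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the kernel's current upper bound for pids (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print("my pid:     ", os.getpid())
    print("parent pid: ", os.getppid())
    print("pid_max:    ", read_pid_max())
```

On a stock system the last line will typically show 32768 or 4194304 (2²²), depending on the distro's defaults.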
The Problem with PIDs
One of the problems with process IDs has been the fact that they aren’t unique over the lifetime of a system.
Let’s assume there’s a process X with pid 19448. Let’s also assume that there exists another process in the system, process Y, that is communicating with process X by referring to its pid (such as signalling pid 19448).
If process X now terminates, the same pid 19448 might be reissued by the kernel to another, newer process Z. This is called pid recycling.
At this point, if process Y signals pid 19448, it’s process Z that’ll get the signal, not process X that was initially assigned pid 19448.
This problem isn’t limited to signals. It applies to any API/system call that works with pids. Common examples include pkill and more.
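The scenario above can be sketched with the classic "signal 0" probe. The helper below (pid_in_use is a name I made up) only tells you that some process currently owns a pid. It cannot tell you whether that is still the process you originally meant, which is exactly the gap pid recycling exploits.

```python
import os
import subprocess
import sys


def pid_in_use(pid):
    """Return True if *some* process currently has this pid."""
    try:
        # Signal 0 performs the existence/permission checks only;
        # no signal is actually delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The pid exists but belongs to a process we may not signal.
        return True
    return True


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    pid = child.pid
    child.wait()  # once reaped, the kernel is free to reissue this pid
    # Usually False right away, but True would not mean "our child is back".
    print(pid, "in use:", pid_in_use(pid))
```

Any check-then-signal sequence built on a bare pid has this race between the check and the signal.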
It’s probably worth mentioning here that this is a solved problem in other operating systems, most notably FreeBSD with procdesc.
What is pidfd?
Unlike a process ID, which is just an integer assigned by the kernel, a pidfd is a persistent file descriptor that refers to another process. As with all file descriptors, pidfds are private to the process that requested them.
The pidfd_open system call allows process Y to get a file descriptor referring to process X. Another way to get this file descriptor is by opening process X's directory under /proc (/proc/pid_id). Yet another way is to set the CLONE_PIDFD flag on the clone(2) system call, which hands the parent a pidfd for the newly created child.
Once process Y has a pidfd referring to process X, it can use the pidfd_send_signal system call to send a signal to process X. If process X has already terminated, the pidfd_send_signal call will fail with the error ESRCH.
So, a process can get a file descriptor referring to another process with the pidfd_open system call. But this alone doesn’t solve the file descriptor transfer problem, where one process needs to hand its file descriptors over to another process.
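Although /proc/pid/fd in theory lists all files a process has open, it is not a general way to get hold of them. Entries for pipes (S_IFIFO), sockets (S_IFSOCK), or other objects that do not appear in the filesystem hierarchy show up only as symbolic links of the form type:[inode], and a socket, for instance, cannot be reopened through such a link.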
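To see what /proc actually exposes for such objects, the following sketch reads the symlink targets for a pipe and a socket in the current process. describe_fd is an illustrative helper of mine, and the inode numbers will differ on every run.

```python
import os
import socket


def describe_fd(fd, pid="self"):
    """Return the symlink target of /proc/<pid>/fd/<fd>."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    r, w = os.pipe()
    a, b = socket.socketpair()
    print(describe_fd(r))           # something like pipe:[<inode>]
    print(describe_fd(a.fileno()))  # something like socket:[<inode>]
```

Those targets are not paths anywhere in the filesystem, which is why /proc alone does not get you a usable handle on another process's socket.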
Linux 5.6, released in 2020, added a new system call, pidfd_getfd, that enables a process to obtain a duplicate of a file descriptor belonging to another process referred to by a pidfd. The file descriptor and its duplicate refer to the same open file description, so they share the file status flags and file offset. This applies to all kinds of files, including sockets. Operations on the socket (such as recvmsg()) can be performed via the duplicate file descriptor.
Effectively, this single system call obviates the need for the incredibly unintuitive and error-prone APIs for file descriptor transfer between processes over Unix Domain Sockets, as described in my previous post.
The calling process must have permission to ptrace the target process, that is, the process from which it wants to get duplicate copies of file descriptors. To be more specific, it must pass the PTRACE_MODE_ATTACH_REALCREDS access mode check, which governs the permission to read from or write to another process.
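One thing that commonly influences this check in practice, though it goes slightly beyond what I've described above, is the Yama security module found on many mainstream distros. As a hedged sketch, the helper below (yama_ptrace_scope is my own name) reads its setting if the file exists. A value of 0 means classic ptrace permissions, 1 restricts attaching to descendants or explicitly declared tracers, 2 requires CAP_SYS_PTRACE, and 3 disables attaching altogether.

```python
def yama_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return Yama's ptrace_scope as an int, or None if Yama is not present."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    scope = yama_ptrace_scope()
    if scope is None:
        print("Yama not enabled; only the regular ptrace checks apply")
    else:
        print("kernel.yama.ptrace_scope =", scope)
```

With a scope of 1, a parent duplicating descriptors out of its own child will usually pass, while two unrelated processes running as the same user may get EPERM.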
For another, more security-focused use case of pidfd_getfd, the post Seccomp Notify — New Frontiers in Unprivileged Container Development by Christian Brauner makes for really fun and informative reading.