File Descriptor Transfer over Unix Domain Sockets

Unix Domain Sockets

It’s commonly known that Unix domain sockets allow communication between processes on the same host system. Unix domain sockets are used in many popular systems: HAProxy, Envoy, AWS’s Firecracker virtual machine monitor, Kubernetes, Docker and Istio to name a few.

UDS: A Brief Primer

Like network sockets, Unix domain sockets support both stream and datagram socket types. However, unlike network sockets, which take an IP address and a port as the address, a Unix domain socket address takes the form of a pathname. Also unlike network sockets, I/O across Unix domain sockets does not involve operations on the underlying device, which makes Unix domain sockets a lot faster than network sockets for performing IPC on the same host.

Socket Files != Normal Files

First, the socket file /tmp/uds.sock is marked as a socket. When stat() is applied to this pathname, it returns the value S_IFSOCK in the file-type component of the st_mode field of the stat structure.

root@1fd53621847b:~/uds# ./uds
^C
root@1fd53621847b:~/uds# ls -ls /tmp
total 0
0 srwxr-xr-x 1 root root 0 Aug 5 01:45 uds.sock
root@1fd53621847b:~/uds# stat /tmp/uds.sock
File: /tmp/uds.sock
Size: 0 Blocks: 0 IO Block: 4096 socket
Device: 71h/113d Inode: 1835567 Links: 1
Access: (0755/srwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-08-05 01:45:41.650709000 +0000
Modify: 2020-08-05 01:45:41.650709000 +0000
Change: 2020-08-05 01:45:41.650709000 +0000
Birth: -
root@5247072fc542:~/uds# ls -F /tmp
uds.sock=
root@5247072fc542:~/uds#
  • remove() or more commonly, unlink(2) on Linux
/* Linux */
struct sockaddr_un {
    sa_family_t sun_family; /* Always AF_UNIX */
    char sun_path[108]; /* Pathname */
};

/* BSD-derived systems */
struct sockaddr_un {
    u_char sun_len;
    u_char sun_family;
    char sun_path[104];
};

bind(2) will fail when trying to bind to an existing path

The SO_REUSEPORT option allows multiple network sockets on any given host to bind to the same address and port. Every socket that wants to share the port, including the very first one to bind, needs to set the SO_REUSEPORT option before calling bind().

int sfd = socket(domain, socktype, 0);
int optval = 1;
setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));
bind(sfd, (struct sockaddr *) &addr, addrlen);
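For Unix domain sockets, by contrast, a second bind() to a pathname that already exists fails with EADDRINUSE, even after the first socket has been closed, because the socket file stays on the filesystem until it is unlinked. The sketch below illustrates this under stated assumptions: bind_unix_path and its remove_stale parameter are hypothetical names invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a new stream socket to `path`.
 * When remove_stale is nonzero, any existing file at `path` is unlinked
 * first. Returns the bound fd, or -1 with errno set. */
int bind_unix_path(const char *path, int remove_stale)
{
    struct sockaddr_un addr;
    int sfd, saved;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (remove_stale && unlink(path) == -1 && errno != ENOENT)
        return -1;

    sfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sfd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length was checked above */

    if (bind(sfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        saved = errno;
        close(sfd);
        errno = saved;
        return -1;
    }
    return sfd;
}
```

Unlinking before binding is a common pattern for cleaning up after a crashed server, but it will also take the path away from a server that is still running, so it should only be done when the caller knows the old socket is stale.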

SOCKETPAIR(2)

The socketpair() function creates two sockets that are already connected to each other. In a manner of speaking, this is very similar to pipe(), except that it supports bidirectional transfer of data.
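As a rough illustration of the bidirectional behaviour, the sketch below creates a pair with socketpair() and pushes a short message through it in both directions. The function name socketpair_roundtrip and the 64-byte buffer are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected pair, send `msg` from end 0 to end 1, then send it
 * back from end 1 to end 0. Returns 0 if both directions worked, else -1. */
int socketpair_roundtrip(const char *msg)
{
    int sv[2];
    char buf[64];
    int rc = -1;
    size_t len;

    assert(msg != NULL);
    len = strlen(msg);
    if (len == 0 || len > sizeof(buf))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Direction 1: sv[0] -> sv[1] */
    if (write(sv[0], msg, len) != (ssize_t)len ||
        read(sv[1], buf, sizeof(buf)) != (ssize_t)len ||
        memcmp(buf, msg, len) != 0)
        goto out;

    /* Direction 2: sv[1] -> sv[0], which a single pipe cannot do */
    if (write(sv[1], buf, len) != (ssize_t)len ||
        read(sv[0], buf, sizeof(buf)) != (ssize_t)len ||
        memcmp(buf, msg, len) != 0)
        goto out;

    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

In practice the pair is usually created before fork(), with the parent keeping one end and the child the other.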

Data Transfer over UDS

Now that we’ve established that a Unix domain socket allows communication between two processes on the same host, it’s time to explore what kind of data can be transferred over a Unix domain socket.

File Descriptors vs File Description

Note that I mentioned file descripTION and not file descripTOR. The difference between the two is subtle and isn’t often well understood.
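One observable consequence of the distinction is the file offset, which lives in the open file description rather than in the descriptor. The sketch below is an illustration under assumptions (compare_descriptions and follows are hypothetical names): a descriptor made with dup() refers to the same description and so shares the offset, while a second open() of the same path creates a new description with its own offset. A descriptor received over a Unix domain socket behaves like the dup() case.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Advance `mover` by one byte and report whether the offset seen through
 * `watcher` moved with it. Returns 1, 0, or -1 on error. */
static int follows(int mover, int watcher)
{
    off_t before, after;

    before = lseek(watcher, 0, SEEK_CUR);
    if (before == -1 || lseek(mover, 1, SEEK_CUR) == -1)
        return -1;
    after = lseek(watcher, 0, SEEK_CUR);
    if (after == -1)
        return -1;
    return after == before + 1;
}

/* Hypothetical demo: sets *dup_shares and *reopen_shares to 1 or 0.
 * Returns 0 on success, -1 on error. Creates `path` if it is missing. */
int compare_descriptions(const char *path, int *dup_shares, int *reopen_shares)
{
    int fd1, fd2 = -1, fd3 = -1, rc = -1;

    assert(path != NULL && dup_shares != NULL && reopen_shares != NULL);
    fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 == -1)
        return -1;
    fd2 = dup(fd1);             /* new descriptor, same description */
    fd3 = open(path, O_RDONLY); /* new descriptor, new description */
    if (fd2 == -1 || fd3 == -1)
        goto out;

    *dup_shares = follows(fd1, fd2);
    *reopen_shares = follows(fd1, fd3);
    if (*dup_shares != -1 && *reopen_shares != -1)
        rc = 0;
out:
    if (fd3 != -1)
        close(fd3);
    if (fd2 != -1)
        close(fd2);
    close(fd1);
    return rc;
}
```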

sendmsg and recvmsg

The signatures for the sendmsg and recvmsg calls on Linux, together with the msghdr and cmsghdr structures they operate on, are the following:

ssize_t sendmsg(
    int socket,
    const struct msghdr *message,
    int flags
);

ssize_t recvmsg(
    int sockfd,
    struct msghdr *msg,
    int flags
);

struct msghdr {
    void *msg_name; /* optional address */
    socklen_t msg_namelen; /* size of address */
    struct iovec *msg_iov; /* scatter/gather array */
    int msg_iovlen; /* # elements in msg_iov */
    void *msg_control; /* ancillary data, see below */
    socklen_t msg_controllen; /* ancillary data buffer len */
    int msg_flags; /* flags on received message */
};

struct cmsghdr {
    socklen_t cmsg_len; /* data byte count, including header */
    int cmsg_level; /* originating protocol */
    int cmsg_type; /* protocol-specific type */
    /* followed by */
    unsigned char cmsg_data[];
};
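To make the msg_iov and msg_iovlen fields more concrete, here is a minimal sketch that sends two separate buffers as a single message and receives them into one buffer, with no ancillary data involved. The helper names send_two_parts and recv_into are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Gather two strings into one message with a two-element iovec array.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_two_parts(int sock, const char *a, const char *b)
{
    struct iovec iov[2];
    struct msghdr msg;

    assert(a != NULL && b != NULL);
    iov[0].iov_base = (void *)a; /* sendmsg does not modify the data */
    iov[0].iov_len = strlen(a);
    iov[1].iov_base = (void *)b;
    iov[1].iov_len = strlen(b);

    memset(&msg, 0, sizeof(msg)); /* no address, no ancillary data */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(sock, &msg, 0);
}

/* Receive into a single buffer through recvmsg.
 * Returns the number of bytes received, or -1 on error. */
ssize_t recv_into(int sock, char *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return recvmsg(sock, &msg, 0);
}
```

On a connected socket the msg_name and msg_namelen fields can stay zeroed, which is why both helpers clear the whole structure before filling in the iovec fields.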

Ancillary Data Transfer

While there is a plethora of gotchas with this kind of data transfer, when used correctly, it can be a pretty powerful mechanism to achieve a number of goals.

  • SCM_CREDENTIALS
  • SCM_SECURITY
struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *msgh);
struct cmsghdr *CMSG_NXTHDR(struct msghdr *msgh, struct cmsghdr *cmsg);
size_t CMSG_ALIGN(size_t length);
size_t CMSG_SPACE(size_t length);
size_t CMSG_LEN(size_t length);
unsigned char *CMSG_DATA(struct cmsghdr *cmsg);

SCM_RIGHTS

SCM_RIGHTS allows a process to send a set of open file descriptors to another process using sendmsg, and to receive them from another process using recvmsg.

struct cmsghdr {
    socklen_t cmsg_len; /* data byte count, including header */
    int cmsg_level; /* originating protocol */
    int cmsg_type; /* protocol-specific type */
    /* followed by */
    unsigned char cmsg_data[];
};
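Putting the pieces together, the following is a minimal sketch of passing a single descriptor, not a definitive implementation. The names send_fd and recv_fd are hypothetical; the control buffer is sized with CMSG_SPACE, the header is filled in with cmsg_level set to SOL_SOCKET and cmsg_type set to SCM_RIGHTS, and one byte of ordinary data travels alongside the ancillary message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send `fd` over `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F'; /* one byte of ordinary data to carry the message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { /* the union keeps the buffer aligned for a cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor. Returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The number returned by recv_fd will generally differ from the number the sender used; what the two descriptors have in common is the open file description they refer to.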

SCM_RIGHTS Gotchas

As mentioned, there are a number of gotchas when trying to pass ancillary data over Unix domain sockets.

Need to send some “real” data along with the ancillary message

On Linux, at least one byte of “real data” is required to successfully send ancillary data over a Unix domain stream socket.

File Descriptors can be dropped

If the msg_control buffer used to receive the ancillary data containing the file descriptors is too small (or is absent), then the ancillary data is truncated (or discarded) and the excess file descriptors are automatically closed in the receiving process. recvmsg reports this by setting the MSG_CTRUNC flag in msg_flags.
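A receiver can detect that descriptors were dropped by checking msg_flags for MSG_CTRUNC after recvmsg returns. The sketch below is a minimal illustration; recv_reports_truncation is a hypothetical name, and passing a NULL buffer with a length of zero models the "absent buffer" case.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one byte of data using a control buffer of
 * `ctrl_len` bytes (`ctrl` may be NULL when ctrl_len is 0). Returns 1 if
 * the kernel flagged truncated ancillary data, 0 if not, -1 on error. */
int recv_reports_truncation(int sock, void *ctrl, size_t ctrl_len)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;

    assert(sock >= 0);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = ctrl_len;

    if (recvmsg(sock, &msg, 0) == -1)
        return -1;
    /* MSG_CTRUNC means some ancillary data did not fit and was dropped. */
    return (msg.msg_flags & MSG_CTRUNC) ? 1 : 0;
}
```

When the flag is set there is no way to recover the dropped descriptors on the receiving side, so a protocol built on top of this usually needs the sender to retry or the connection to be torn down.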

recvmsg quirks

sendmsg and recvmsg behave like the send and recv system calls, in that there isn’t a 1:1 mapping between every send call and every recv call.

Limit on the number of File Descriptors

The kernel constant SCM_MAX_FD defines a limit on the number of file descriptors in the array. Its value is 253 (or 255 in kernels before 2.6.38), and attempting to send a larger array causes sendmsg to fail with the error EINVAL.
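The limit can be probed from user space. The sketch below is Linux-specific and rests on assumptions: try_send_many_fds is a hypothetical helper that packs the same descriptor into the SCM_RIGHTS array `count` times, which is enough to exercise the size check without opening hundreds of files. On a kernel where the limit is 253, a count of 254 is expected to be rejected with EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical probe: try to send `count` copies of `fd` in one SCM_RIGHTS
 * message. Returns 0 on success, otherwise the errno value of the failure. */
int try_send_many_fds(int sock, int fd, size_t count)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    size_t space, i;
    char *ctrl;
    int rc;

    assert(count > 0);
    space = CMSG_SPACE(count * sizeof(int));
    ctrl = calloc(1, space); /* heap memory is suitably aligned */
    if (ctrl == NULL)
        return ENOMEM;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = space;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(count * sizeof(int));
    for (i = 0; i < count; i++)
        memcpy(CMSG_DATA(cmsg) + i * sizeof(int), &fd, sizeof(int));

    rc = (sendmsg(sock, &msg, 0) == -1) ? errno : 0;
    free(ctrl);
    return rc;
}
```

A sender that needs to pass more descriptors than the limit allows has to split them across several sendmsg calls.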

When is it useful to transfer file descriptors?

A very concrete real-world use case for this is zero-downtime proxy reloads.

Conclusion

Transferring file descriptors over a Unix domain socket can prove to be very powerful if used correctly. I hope this post gave you a slightly better understanding of Unix domain sockets and the features they enable.

References:

  1. https://www.man7.org/linux/man-pages/man7/unix.7.html
  2. https://blog.cloudflare.com/know-your-scm_rights/
  3. LWN.net has an interesting article on creating cycles when passing file descriptions over a Unix domain socket and implications for the fabulous new io_uring kernel API. https://lwn.net/Articles/779472/
  4. The Linux Programming Interface https://learning.oreilly.com/library/view/the-linux-programming/9781593272203/
  5. UNIX Network Programming: The Sockets Networking API https://learning.oreilly.com/library/view/the-sockets-networking/0131411551/

Cindy Sridharan


@copyconstruct on Twitter. views expressed on this blog are solely mine, not those of present or past employers.