Attach a keyboard to a Docker container

2020-12-17

technology , docker

I wanted to connect a keyboard-mouse-combo to a virtual remote desktop running inside a Docker container.

My use case

I have a wall-mounted Android tablet displaying always-on content (“info screen”) in my home:

For security reasons the tablet can’t reach the internet and actually has a VNC viewer that displays a virtual framebuffer from inside a Docker container on my server. Here’s the conceptual overview:

I wanted to attach a keyboard to the “tablet”. The thing is, the tablet’s only USB port holds the charger cable. I could maybe rig up a USB hub, but that’s not pretty. I had an idea: what if I could connect an extra keyboard to my server computer that actually hosts the virtual screen, and logically connect the keyboard to that graphical environment?

The problem

The thing is, the way things usually work is that there’s only one graphical environment inside an OS and all keyboards and mice (= input devices) are automatically “digitally wired up” to that one. Here’s what I mean:

evdev (“Event device”) is Linux’s subsystem for representing input devices. The main graphical environment (X.org) selfishly reads all evdev input devices for their input. I can’t simply “map evdev” inside the container - even if that worked, both graphical environments would receive all input from both keyboards.

What I wanted to do

This didn’t seem to be easily possible at first: (it’s a rare use case, I know)

In my particular case I’m using screen-server for the container and here’s the main components inside that make up its graphical environment:

Xvfb (“X virtual framebuffer”), which is an X11 server process that represents a virtual (= not connected to a physical monitor) screen.
x11vnc (X11 VNC) is like remote desktop for any X11 environment, whether the screen is virtual or also physically connected.

I kept searching if there’s any config option where either Xvfb or x11vnc could grab only a specified input device. I didn’t seem to find any. There’s probably good reasons:

x11vnc is not designed for local input anyway (the input usually comes from remote computers).
Xvfb usual use cases are total automation (like running headless browsers) or like in this case, just a virtual screen whose input comes from outside (like x11vnc in this case).

The solution

evdev has low-level support for our use case

Turns out evdev has this concept of “grabbing”, which means that you can open a handle to a keyboard/mouse(/combination) and grab it for exclusive access. The technical option name is EVIOCGRAB (I can’t seem to find documentation for it).

This means that only you are receiving the input, and everyone else reading that same keyboard will receive nothing. Once you close the keyboard (or your process crashes), the OS un-grabs it and thus cleans up it for you. Very nice :)

Getting comfortable with evdev

evdev represents each input device as a file that regular programs can read from. I can list input devices simply by listing files in a directory:

$ ls /dev/input/by-id
usb-Logitech_USB_Receiver-if01-event-kbd
usb-Logitech_USB_Receiver-if01-event-mouse

Now let’s test reading from a file (you probably need $ sudo prefix):

$ cat /dev/input/by-id/usb-Logitech_USB_Receiver-if01-event-kbd
(now each time you press a key some garbage shows up)

Reading from the input device in grabbed mode

Let’s take some (Go-flavoured) pseudocode. You can open the device just like a regular file. The magic is adding the EVIOCGRAB after you’ve opened the file. It’s usually issued with a function named IOCTL (“I/O control”).

keyboard, err := os.Open("/dev/input/by-id/usb-Logitech_USB_Receiver-if01-event-kbd")
if err != nil {
	panic(err)
}

// 1 means grab, 0 means un-grab
if err := unix.IoctlSetInt(keyboard.Fd(), EVIOCGRAB, 1); err != nil {
	panic(err)
}

If you’re interested, here’s a real-world example.

I’m not sure if there exists a universal solution

Universal solution would have to be done at X11 server level because that’s the one who’s reading the input that gets delivered to the GUI applications. It’d have to be done in one of these ways:

Tell each server which event devices they’d listen to (so they don’t have overlap) OR
Have one server listen to all input (= default config), but the other server listen to only a subset, and in grabbed mode

Grabbing specific input devices is implemented for X.org - there’s evdev support and a GrabDevice option, but it doesn’t seem to be implemented for Xvfb. They’re both X11 servers, but different implementations (for different use cases). Summarized:

Product	Protocol	Use case	Supports input devices	Supports grabbing
X.org	X11	Main graphical environment	yes	yes
Xvfb	X11	Virtualized display	no	no

Input (and grabbing) doesn’t seem to be implemented in Xvfb:

StackOverflow question
Web search for “xvfb keyboard input” yields nothing of interest
Other people seem to be going the same route

But since I was using a highly specialized environment (screen-server), I was ok with adding specific code to do this input device grabbing (and delivering) for my project.

So the rest of this article discusses a DIY-code solution.

Using with screen-server

The user has to map the keyboard inside the container with $ docker run argument --device /dev/input/by-id/usb-Logitech_USB_Receiver-if01-event-kbd. One could also bind the whole /dev/input device tree (= all input devices), but since those can be used to keylog, for security-wise it’s best to only map the required devices.

The code I linked to earlier is only receiving the input events. I had to also code how to react to those as well. I wanted to just “pipe” them inside the X11 server (in my case, Xvfb). Turns out keyboard/mouse handling in X11 is quite complicated, so I used X11’s XTEST extension to pipe the input, because it made everything simpler (I didn’t have to track which window is active to send input to correct window etc.). I found out about XTEST by reading xdotool source code (a tool for sending “fake input”).

The evdev-to-X11-pipe code is here. It’s not (yet) pretty, but it works.

You’ll have to read documentation for screen-server about how this is configured (I’ll not post it here in case it gets outdated), but there’s config option in screen-server to specify which input device to map inside which virtual screen (the software supports multiple screens). It works quite well.

Closing thoughts, recap

To recap, we just “mapped” a physical keyboard inside a virtual GUI environment - but we had to write DIY code for it. As far as I’m aware there isn’t an off-the-shelf way that works with Xvfb.

Linux has its downsides (you have lots to configure and to understand), but it really goes to show you that there’s a facility for many niche needs. Contrast that to Windows, I couldn’t imagine Windows enabling me to have a container with a virtual display and a minimal GUI environment and a specific keyboard inside of it. Sure, you maybe could cobble up something similar, but you really can’t control how all of it’s composed and Microsoft gets to decide when the individual components are removed / stop working.

Linux is pretty rad.