12-11-2015 | Jonathan Robe | Text only version of this article
This article was originally published in Linux Voice, issue 2, May 2014. This issue is now available under a Creative Commons BY-SA license. In a nutshell: you can modify and share all content from the magazine (apart from adverts), even for commercial purposes, providing you credit Linux Voice as the original source, and retain the same license.
This remix is converted manually to Markdown and HTML for ease of archiving and copy-pasting.
If you like this, please subscribe to Linux Voice. It is an awesome magazine with an awesome team of people. Click here to subscribe from just GBP 38 and get future issues straight to your door or inbox! (DRM-free PDFs and more available.)
Enterprise-grade virtualisation on a real kernel.
While Linux containers have been around for a while, they've recently been gaining more recognition as a lightweight alternative to traditional virtualisation products like KVM or VMware. With the arrival of LXC, Docker, and the next generation of distributions, we're all likely to see a lot more of them over the coming decade.
As with all virtualisation, the idea of containers is to make it easy to run multiple applications on a single host, all the while ensuring each remains separate. This enables the administrator to carefully manage the resources assigned to each application and to ensure that they can't interfere with each other.
What makes containers different to traditional products is that they don't do any hardware emulation. Instead, the applications in question all run directly on top of the host kernel, just like any other process. Separation between the running containers is achieved through the careful use of a number of Linux kernel features.
Control Groups (cgroups) are the first of these features, and are probably the best known. They provide a means for administrators to group processes, and all their future children, into hierarchical groups. Various subsystems can then be used to strictly manage the processes and the resources they interact with.
If you have systemd installed, you can quickly inspect which cgroup your processes are running in with the following command:
ps -aeo pid,cgroup,command
Running this, you should see that all processes are running in cgroups that exist in a hierarchy below the systemd cgroup. You could use systemd unit files to manage the resources assigned to a service (indeed, if you're using systemd, this is probably the best way to use cgroups), but you can also interact with cgroups directly.
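The cgroup column that ps prints comes from the kernel's per-process /proc/<pid>/cgroup file, so the same information can be read without ps. The following is a minimal sketch, assuming a Linux system with /proc mounted; show_cgroups is a helper name made up for this illustration.

```shell
#!/bin/sh
# show_cgroups is a hypothetical helper, not part of any package.
# It prints the cgroup membership of a process, one hierarchy per line,
# and defaults to the current shell when no PID is given.
show_cgroups() {
    cat "/proc/${1:-$$}/cgroup"
}

show_cgroups    # cgroups of this shell
```

Passing a PID, as in show_cgroups 1, prints the entry for that process instead.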
There is a collection of tools available in the libcgroup-tools package, including cgcreate. You can use this tool to create a new cgroup as follows:
cgcreate -g memory,cpu:mysql
This will create a new cgroup called mysql, which is tied to the memory and cpu subsystems. You can then use a command such as cgset, or interact directly with the virtual filesystem exposed by cgroups, to manipulate the resource limits of this newly created group:
cgset -r memory.swappiness=xxx mysql
This command will set the swappiness parameter of all processes running in the mysql cgroup to xxx. To add a process to the cgroup, all you need to do is echo its PID to the tasks file in the cgroup's filesystem or use the
Image 1: The highlighted area shows the cgroup in which the different processes are running. As you can see, all are either in the systemd defaults of
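Interacting with the cgroup virtual filesystem directly comes down to writing values into files inside the group's directory. The sketch below assumes the cgroup v1 layout, with the memory subsystem mounted at /sys/fs/cgroup/memory and the mysql group already created with cgcreate. The helper names, the swappiness value 10 and the PID 1234 are illustrative only.

```shell
#!/bin/sh
# Assumption: cgroup v1, memory subsystem mounted at /sys/fs/cgroup/memory,
# and a group called "mysql" that already exists.
CGROUP_DIR=${CGROUP_DIR:-/sys/fs/cgroup/memory/mysql}

# Hypothetical helper: set the group's swappiness by writing to the
# memory.swappiness control file.
set_swappiness() {
    echo "$1" > "$CGROUP_DIR/memory.swappiness"
}

# Hypothetical helper: move a process into the group by writing its PID
# to the tasks file.
add_task() {
    echo "$1" > "$CGROUP_DIR/tasks"
}

# Example (run as root): set_swappiness 10 && add_task 1234
```

Because the kernel exposes these as ordinary files, reading them back with cat is an easy way to confirm that a change has taken effect.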
Namespace isolation is the other key technology that makes containers possible on Linux. Each namespace wraps a particular system resource, and makes processes running inside that namespace believe they have their own instance of that resource. There are six namespaces in Linux:
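mount, UTS, IPC, PID, network and user. The PID namespace, for example, gives each container its own init (PID 1) and allows for easy migration between systems. (PID = process ID.)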
A quick way to experiment with namespaces yourself is to use the unshare command. This will run a particular program, removing its connection to a particular namespace of its parent:
sudo unshare -u /bin/bash
This will create a new bash process that doesn't share its parent's UTS namespace. If you now set the hostname to foo, you'll be able to look in another shell on the same system and see that the hostname in the root (original) namespace hasn't changed.
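Namespace membership is also visible under /proc/<pid>/ns, where each entry is a symbolic link whose target names the namespace. As a minimal sketch, assuming a Linux system with /proc mounted, the helper below (ns_id is a made-up name) prints that identifier so two shells can be compared.

```shell
#!/bin/sh
# ns_id is a hypothetical helper. It prints the identifier of one
# namespace of a process, in the form uts:[<number>].
# Usage: ns_id [PID] [NAMESPACE]; defaults to this shell's UTS namespace.
ns_id() {
    readlink "/proc/${1:-$$}/ns/${2:-uts}"
}

ns_id    # UTS namespace of the current shell

# To see the hostname change without an interactive shell, run as root:
#   unshare -u sh -c 'hostname foo; hostname'
```

Running ns_id in the original shell and again in the shell started by unshare should print two different identifiers, which confirms that the two no longer share a UTS namespace.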
Image 2: The output of this long listing in the /sys/fs/cgroup directory shows all the different subsystems that are available for managing processes with cgroups on a default Fedora 20 install.
Now that you have an idea of what the underlying technologies do, let's take a look at Linux Containers (LXC), a userspace interface that brings them together. To install the LXC userspace tools, you need to install the lxc package on Ubuntu and Fedora; on Fedora, you should also install lxc-extras for a better experience.
Once that's done, creating a new container can be simple, depending on your requirements. In the /usr/share/lxc/templates directory, you'll find a collection of scripts that will create some default containers, including Ubuntu system containers and Alpine application containers. To put one of these to use, all you need to do is run the following command:
lxc-create -n linux-voice -t /usr/share/lxc/templates/lxc-busybox --dir /home/jon/containers/linux-voice
-n: sets the name of the container.
-t: says which template you want to use.
--dir: says where you want the rootfs for the new container to be created.
This command creates a directory in /var/lib/lxc with the name set by the -n flag. The contents of this directory are populated by the script specified with the
If you look at, say, the BusyBox template, you'll see that this script sets up a filesystem hierarchy, copies appropriate binaries and installs important pieces of configuration with heredoc statements. Inside the created directory, you'll also find that a config file has been created. This defines which system resources are to be isolated and controlled by the container.
The man lxc.conf command goes into detail on which options can be put in this file, but a few key examples will be helpful:
lxc.cgroup.cpu.shares = 1234: Sets the share of CPU that the container has.
lxc.utsname = linux-voice: Sets the hostname of the container.
lxc.mount.entry = /lib /home/jon/containers/busybox/lib: Specifies directories on the host filesystem that should be mounted in the container.
This configuration file means you can apply the existing templates in quite flexible ways, but if you really want to create a custom container, you're going to have to set to work creating your own template script.
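To give a feel for what such a script involves, here is a heavily simplified sketch of the steps a template performs: laying out a filesystem hierarchy and writing configuration with heredocs. It is not the BusyBox template itself. The function name, its positional arguments and the directory list are assumptions for illustration, and lxc.rootfs is assumed to be the key that points LXC at the root filesystem in the version discussed here.

```shell
#!/bin/sh
# Simplified sketch of a template script; NOT the real BusyBox template.
# Hypothetical interface: make_container ROOTFS CONFIG_DIR NAME
make_container() {
    rootfs=$1
    confdir=$2
    name=$3

    # 1. Set up a minimal filesystem hierarchy.
    for dir in bin dev etc lib proc root sbin sys tmp usr var; do
        mkdir -p "$rootfs/$dir" || return 1
    done

    # 2. Install a piece of configuration in the rootfs with a heredoc.
    #    (A real template would also copy binaries into place here.)
    cat > "$rootfs/etc/hostname" <<EOF
$name
EOF

    # 3. Write the container's config file, again with a heredoc.
    mkdir -p "$confdir" || return 1
    cat > "$confdir/config" <<EOF
lxc.utsname = $name
lxc.rootfs = $rootfs
lxc.cgroup.cpu.shares = 1234
EOF
}

# Example: make_container /tmp/demo/rootfs /tmp/demo linux-voice
```

Keeping the paths as arguments makes it possible to try the script against a scratch directory before pointing it at /var/lib/lxc.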
As the LXC man page says, creating a system container is paradoxically easier than creating an application container.
In the latter case, you have to start by figuring out which resources you want to isolate from the rest of the system, and then figure out how to populate the appropriate parts of the file system etc. In the former case, you simply isolate everything, which is much simpler.
Once you've created your container with lxc-create and modified the config file as you see fit, you can start it with the lxc-start command, use lxc-console to get a console in it, and shut it down with
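Put together, a container's life cycle can be scripted along the following lines. This is a sketch rather than a recommended workflow: the container_cycle function and the RUN variable are invented for illustration, lxc-info is used to report state because lxc-console is interactive, and lxc-stop is assumed here as the shutdown command.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: leave it empty to execute the
# commands (as root, with LXC installed), or set RUN=echo to print them.
RUN=${RUN:-}

container_cycle() {
    name=$1
    $RUN lxc-start -n "$name" -d    # start the container in the background
    $RUN lxc-info -n "$name"        # report its state
    # lxc-console -n "$name" would attach an interactive console here.
    $RUN lxc-stop -n "$name"        # shut it down again
}

# Dry run: RUN=echo container_cycle linux-voice
```

The dry run is a cheap way to check the exact command lines before running anything as root.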
While cgroups and namespaces have reached a degree of maturity in Linux, the user experience still has some room for improvement. If you found the lxc- commands tricky to use, you might want to install libvirt-sandbox, which will provide a set of scripts and extensions for using LXC through the familiar