Proxmox VE 7 Corosync QDevice in a Docker container
Published: 17-04-2022 | Author: Remy van Elst
At home I have a two-node Proxmox VE cluster consisting of 2 HP EliteDesk Mini machines, each with 16 GB RAM and both an NVMe and a SATA SSD, with ZFS on root (256 GB). It's physically small and the specs are just enough for my homelab needs. I run a few services at home; the rest runs on 'regular' virtual private servers online. Proxmox VE, a virtualization stack based on Debian, has support for clustering. Clustering can mean different things to different people. In this case, cluster support means managing a group of physical servers together, but Proxmox also supports actual high availability with failover and (live) migration. I also have a small NAS with spinning disks for shared storage via NFS, which is helpful when experimenting with live migration.
My modest two node Proxmox VE cluster, with extra quorum device
For a cluster (in any sense of the word), you need at least 3 nodes, otherwise there is no quorum. That means that if one node goes down, neither it nor the other node can know whether the problem is on its side or on the other side. With an odd number of nodes, one node can always ask another node: hey, is it just me, or do you see the issue as well? If it receives no reply, it knows the problem is on its own side; if the other node does reply, they both know it's the third node that has the problem. Corosync, the cluster software used by Proxmox, supports an external quorum device. This is a small piece of software running on a third machine, which provides an extra vote for the quorum without being a Proxmox VE server. Any cluster with an even number of nodes can end up in such a split-brain situation, and in my experience, those are bad.
A two-node Proxmox cluster shuts down all virtual machines and containers when one node fails; the cluster is then in read-only mode. So if you power down one of your Proxmox servers, the other one is effectively offline as well, even if you've just migrated all machines to that node. With the extra quorum device, you can safely power down one node.
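To make the vote arithmetic concrete, here is a minimal shell sketch. It assumes the usual simple-majority rule (more than half of the expected votes, which matches the "Quorum: 2" out of 3 expected votes in the status output later in this article). The function name is made up for illustration and is not part of corosync or Proxmox.

```shell
# quorum_needed: votes required for a majority of the expected votes.
# Hypothetical helper for illustration only.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Two nodes, one vote each: both votes are needed, so one failure loses quorum.
echo "2 votes -> quorum $(quorum_needed 2)"
# Two nodes plus the QDevice vote: two of the three votes are enough.
echo "3 votes -> quorum $(quorum_needed 3)"
```

With three votes in total, one Proxmox node plus the QDevice still reach the majority, which is why one node can be powered down safely.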
In my case I wanted to run this on my NAS, since (physical) space is at a premium. The NAS supports Docker, so this guide explains how to run the QDevice for Proxmox in a Docker container. The Docker container is a Debian 11 container with systemd and OpenSSH, since the Proxmox QDevice setup requires that. It is more like an LXC container than a typical Docker container. The NAS itself does not run Debian and has no corosync packages, which is why I resorted to this route.
If you can run an extra Debian server, there is no need to go the Docker route; you can just follow the regular guide. There is a qdevice Docker image, but that guide does not work for Proxmox VE 7 and requires a lot of manual setup. My method involves far fewer steps, since you're basically running an extra Debian VPS (a container with systemd and OpenSSH).
I've been working with Corosync since 2012 and wrote a few posts about it in 2013, so I consider myself experienced in Corosync usage. The Proxmox web interface makes clustering very easy.
Proxmox VE cluster QDevice technical explanation
My cluster has 2 nodes, both using ZFS as local storage. The Proxmox docs explain all the details of running a cluster, and the GUI makes it very easy to set up. For this guide I'm assuming you already have the two-node cluster set up.
Quoting the documentation page with more information on the QDevice:
Corosync External Vote Support
This section describes a way to deploy an external voter in a Proxmox VE cluster. When configured, the cluster can sustain more node failures without violating safety properties of the cluster communication.
For this to work, there are two services involved:
- A QDevice daemon which runs on each Proxmox VE node
- An external vote daemon which runs on an independent server
As a result, you can achieve higher availability, even in smaller setups (for example 2+1 nodes).
QDevice Technical Overview
The Corosync Quorum Device (QDevice) is a daemon which runs on each cluster node. It provides a configured number of votes to the cluster's quorum subsystem, based on an externally running third-party arbitrator's decision. Its primary use is to allow a cluster to sustain more node failures than standard quorum rules allow. This can be done safely as the external device can see all nodes and thus choose only one set of nodes to give its vote. This will only be done if said set of nodes can have quorum (again) after receiving the third-party vote.
QDevice Net is supported as a third-party arbitrator. This is a daemon which provides a vote to a cluster partition, if it can reach the partition members over the network. It will only give votes to one partition of a cluster at any time. It's designed to support multiple clusters and is almost configuration and state free. New clusters are handled dynamically and no configuration file is needed on the host running a QDevice.
The only requirements for the external host are that it needs network access to the cluster and to have a corosync-qnetd package available. We provide a package for Debian based hosts, and other Linux distributions should also have a package available through their respective package manager.
Note: In contrast to corosync itself, a QDevice connects to the cluster over TCP/IP. The daemon may even run outside of the cluster's LAN and can have longer latencies than 2 ms.
We support QDevices for clusters with an even number of nodes and recommend it for 2 node clusters, if they should provide higher availability. For clusters with an odd node count, we currently discourage the use of QDevices. The reason for this is the difference in the votes which the QDevice provides for each cluster type. Even numbered clusters get a single additional vote, which only increases availability, because if the QDevice itself fails, you are in the same position as with no QDevice at all.
On the other hand, with an odd numbered cluster size, the QDevice provides (N-1) votes, where N corresponds to the cluster node count. This alternative behavior makes sense; if it had only one additional vote, the cluster could get into a split-brain situation. This algorithm allows for all nodes but one (and naturally the QDevice itself) to fail. However, there are two drawbacks to this:
- If the QNet daemon itself fails, no other node may fail or the cluster immediately loses quorum. For example, in a cluster with 15 nodes, 7 could fail before the cluster becomes inquorate. But, if a QDevice is configured here and it itself fails, no single node of the 15 may fail. The QDevice acts almost as a single point of failure in this case.
- The fact that all but one node plus QDevice may fail sounds promising at first, but this may result in a mass recovery of HA services, which could overload the single remaining node. Furthermore, a Ceph server will stop providing services if only ((N-1)/2) nodes or less remain online.
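The 15-node example from the quoted documentation can be worked through with a bit of shell arithmetic. This is only a sketch: the function names are hypothetical, and it assumes the simple-majority rule plus the vote counts described in the quote (1 vote for even clusters, N-1 for odd ones).

```shell
# qdevice_vote_count: votes a QDevice contributes for a given node count,
# following the quoted documentation (hypothetical helper).
qdevice_vote_count() {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        echo 1
    else
        echo $(( $1 - 1 ))
    fi
}

# majority: votes needed out of a given total (assumed simple majority).
majority() {
    echo $(( $1 / 2 + 1 ))
}

nodes=15
total=$(( nodes + $(qdevice_vote_count "$nodes") ))
echo "nodes=$nodes total_votes=$total quorum=$(majority "$total")"
```

For 15 nodes the QDevice brings 14 votes, so 29 votes in total and a quorum of 15. If the QDevice disappears, exactly 15 node votes remain, which is why not a single node may fail afterwards.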
Proxmox VE host setup, part 1
On all your Proxmox VE 7 machines you must manually install an extra package:
apt install corosync-qdevice
Again, I'm assuming you have your cluster configured already. I'm also using different VLANs (extra USB 3 gigabit NICs) for the different networks, but I'm leaving that outside the scope of this guide to keep it simple.
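If you'd rather not log in to every node by hand, the install can be looped over SSH. A minimal sketch, assuming root SSH access between the hosts; the function name and the node names in the usage line are placeholders.

```shell
# install_qdevice_pkg: run the apt install on each node given as argument.
# Hypothetical helper; assumes you can SSH to each node as root.
install_qdevice_pkg() {
    for node in "$@"; do
        ssh "root@${node}" apt install -y corosync-qdevice
    done
}

# Usage (not executed here): install_qdevice_pkg pve1 pve2
```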
After installing the package we can continue with configuring our Docker container. When the Docker container is running, we finish off the setup by configuring the new qdevice from one of our Proxmox VE hosts.
Docker container setup
The Docker container should not run on one of your Proxmox servers or inside a VM on Proxmox. You must run it on a different, external server. (You technically could run it on Proxmox, but that defeats the whole point.)
I'm using my NAS, which supports Docker. The container image is based on this image, which builds a Debian image with OpenSSH and systemd. Mine is simplified to only run Debian Bullseye (11). As opposed to the example, CAP_SYS_ADMIN is not required; the systemd version in Debian 11 is recent enough. The Docker image also installs the corosync-qnetd package and changes the ownership of the /etc/corosync folder to the correct user.
The container should automatically start at boot of the docker host, but if you want to run it on another host, copy the data volume folder and start it there. It will then have all the config files required.
Create a folder on the Docker host where the Corosync container will store its corosync cluster config:
mkdir -p /volume1/docker/qnetd/corosync-data
This can be any path you like, the container will mount it as a volume.
Navigate to the folder one level above, where I'll store the Dockerfile and some other info:
cd /volume1/docker/qnetd
On my Docker host, all named containers get a folder under /volume1/docker/$container-name, with subfolders for each volume.
Create a file named Dockerfile and place the following contents inside:
FROM debian:bullseye
RUN echo 'debconf debconf/frontend select teletype' | debconf-set-selections
RUN apt-get update
RUN apt-get dist-upgrade -qy
RUN apt-get install -qy --no-install-recommends systemd systemd-sysv corosync-qnetd openssh-server
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/* /var/log/alternatives.log /var/log/apt/history.log /var/log/apt/term.log /var/log/dpkg.log
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN echo 'root:password' | chpasswd
RUN chown -R coroqnetd:coroqnetd /etc/corosync/
RUN systemctl mask -- dev-hugepages.mount sys-fs-fuse-connections.mount
RUN rm -f /etc/machine-id /var/lib/dbus/machine-id

FROM debian:bullseye
COPY --from=0 / /
ENV container docker
STOPSIGNAL SIGRTMIN+3
VOLUME [ "/sys/fs/cgroup", "/run", "/run/lock", "/tmp" ]
CMD [ "/sbin/init" ]
Change the password part in the line 'root:password' to a secure password, which will be used for SSH root login.
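One way to do that without hand-editing is sketched below. Both helper names are made up for this example; it assumes GNU sed (for -i) and sticks to alphanumeric passwords so the value is safe to use inside the sed expression.

```shell
# gen_password: print a random alphanumeric password of the given length.
gen_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$1"
}

# set_root_password: replace the 'root:password' placeholder in a Dockerfile.
# Only safe for alphanumeric passwords, since the value ends up inside sed.
set_root_password() {
    sed -i "s/root:password/root:$2/" "$1"
}

# Usage (not executed here):
#   pw="$(gen_password 24)"; echo "$pw"; set_root_password Dockerfile "$pw"
```

Write the generated password down somewhere safe before building the image; you need it once for the QDevice setup later on.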
Build the docker image:
docker build . -t debian-qdevice
The -t flag gives the image a recognizable name. You can now start a container based on that image:
docker run -d -it \
    --name qnetd \
    --net=macvlan \
    --ip=192.0.2.20 \
    -v /volume1/docker/qnetd/corosync-data:/etc/corosync \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    --restart=always \
    debian-qdevice:latest
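The --net=macvlan option only works if a Docker network with that name already exists. In case you don't have one yet, here is a rough sketch of creating it. The wrapper function is hypothetical, and the parent interface, subnet and gateway in the usage line are example values you must replace with those of your own LAN.

```shell
# create_macvlan_network: create a Docker macvlan network named "macvlan".
# Arguments: parent interface, subnet (CIDR), gateway. All are site specific.
create_macvlan_network() {
    docker network create -d macvlan \
        --subnet="$2" \
        --gateway="$3" \
        -o parent="$1" \
        macvlan
}

# Usage (not executed here): create_macvlan_network eth0 192.0.2.0/24 192.0.2.1
```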
I'm using macvlan to give my containers their own IP address. The Proxmox QDevice setup code does not show support for a different SSH port, which is why I gave the container its own IP. You could put a name/port combo in your ~/.ssh/config file on the Proxmox hosts like so:
Host 192.0.2.19
    HostName 192.0.2.19
    Port 2222
    User root
Then you can start the container without --net=macvlan --ip=192..., but you must publish the ports used by corosync and SSH: -p 5403:5403 -p 2222:22. The setup command later on can then use the IP of the Docker host, but be extra careful that you have set up your SSH config file correctly. 192.0.2.19 is the example IP of my Docker host, not of the Docker container. If the config is wrong, you could unexpectedly run commands as root on your Docker host.
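Put together, the port-publishing variant would look roughly like the sketch below. It is the same start command as before with the macvlan options swapped for the two -p flags. I have not run this variant myself for this guide, and the wrapper function name is just for illustration.

```shell
# run_qnetd_published: start the container on the default Docker network,
# publishing corosync-qnetd (5403) and SSH (container port 22 on host port 2222).
run_qnetd_published() {
    docker run -d -it \
        --name qnetd \
        -p 5403:5403 \
        -p 2222:22 \
        -v /volume1/docker/qnetd/corosync-data:/etc/corosync \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        --restart=always \
        debian-qdevice:latest
}

# Usage (not executed here): run_qnetd_published
```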
The start command will print out a UUID of the new container. You can check if it's running with the docker ps command:
CONTAINER ID   IMAGE                   COMMAND        CREATED              STATUS                PORTS   NAMES
45d861ce6acf   debian-qdevice:latest   "/sbin/init"   About a minute ago   Up About a minute             qnetd
d07f990f245e   pihole/pihole:latest    "/s6-init"     8 days ago           Up 8 days (healthy)           pihole2
Test if you can SSH into the container with the username root and your password. If SSH works, continue on with the guide.
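The SSH test can be combined with a check that the qnetd service is up. A small sketch; the function name is hypothetical, and it assumes the systemd unit is called corosync-qnetd, like the Debian package.

```shell
# check_qnetd: log in as root on the given host and ask systemd whether
# the corosync-qnetd unit is active.
check_qnetd() {
    ssh "root@$1" systemctl is-active corosync-qnetd
}

# Usage (not executed here): check_qnetd 192.0.2.20
```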
Proxmox VE host setup, part 2
This part has to be done on only one of your Proxmox VE servers; it is synced automatically to the other servers. Execute the following command:
pvecm qdevice setup 192.0.2.20
You're asked for the root password:
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.0.2.20's password:
The rest of the output is all setup of corosync:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@192.0.2.20'"
and check to make sure that only the key(s) you wanted were added.
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
node 'pve1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve1': Creating new key and cert db
node 'pve1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve1': Importing CA
node 'pve2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve2': Creating new key and cert db
node 'pve2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve2': Importing CA
INFO: generating cert request
Creating new certificate request
Generating key. This may take a few moments...
Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq
INFO: copying exported cert request to qnetd server
INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-cluster1.crt
INFO: copy exported CRT
INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
INFO: copy and import pk12 cert to all nodes
node 'pve1': Importing cluster certificate and key
node 'pve1': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'pve2': Importing cluster certificate and key
node 'pve2': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration
INFO: start and enable corosync qdevice daemon on node 'pve1'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
INFO: start and enable corosync qdevice daemon on node 'pve2'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done
You can check the status of the cluster and quorum device with the following command:
pvecm status
Example output with the quorum device set up:
Cluster information
-------------------
Name:             cluster1
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Apr 17 21:31:06 2022
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.98
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.0.2.10 (local)
0x00000002          1    A,V,NMW 192.0.2.11
0x00000000          1            Qdevice
If you have not set up the QDevice correctly, the last part of the output will be different, showing no votes for the QDevice:
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW 192.0.2.10 (local)
0x00000002          1    A,V,NMW 192.0.2.11
0x00000000          0            Qdevice (votes 1)
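If you want to check this from a script instead of reading the output by eye, the difference between the two cases is the vote count on the Qdevice row. A minimal sketch, assuming the status output keeps the layout shown in the two examples (node ID 0x00000000 for the QDevice); the function name is made up.

```shell
# qdevice_status_votes: read cluster status output on stdin and print the
# vote count found on the Qdevice membership row.
qdevice_status_votes() {
    awk '$1 == "0x00000000" && $3 == "Qdevice" { print $2 }'
}

# Usage on a Proxmox VE node (not executed here):
#   [ "$(pvecm status | qdevice_status_votes)" = "1" ] && echo "QDevice is voting"
```

A result of 1 means the QDevice contributes its vote; 0 matches the broken setup shown above.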
Check the Docker container (log in via SSH); you should see the corosync-qnetd daemon running via ps auxf:
root@d74eb68b6507:~# ps auxf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.1  17676  9568 ?        Ss   19:30   0:00 /sbin/init
coroqne+      30  0.0  0.2  17692 13572 ?        Ss   19:30   0:00 /usr/bin/corosync-qnetd -f
root          33  0.0  0.0   4332  2140 pts/0    Ss+  19:30   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400
root          34  0.0  0.1  13272  7644 ?        Ss   19:30   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root          97  6.5  0.1  13736  8256 ?        Ss   19:34   0:00  \_ sshd: root@pts/1
root         103  2.0  0.0   3964  3420 pts/1    Ss   19:34   0:00      \_ -bash
root         106  0.0  0.0   6696  3000 pts/1    R+   19:34   0:00          \_ ps auxf
On my first attempt at Dockerizing the QDevice, I forgot to set the correct permissions on the /etc/corosync folder, and the daemon failed to start. There is no logging inside the container, but after running the daemon manually as root (which worked) and inspecting the systemd unit file, the cause of the issue was clear.
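Should the daemon not show up in the process list, the folder ownership is the first thing I would look at. A small sketch using GNU stat (as shipped in Debian); the helper name is hypothetical, and the expected owner comes from the chown line in the Dockerfile.

```shell
# dir_owner: print "user:group" of the owner of a path.
dir_owner() {
    stat -c '%U:%G' "$1"
}

# Inside the container (not executed here):
#   [ "$(dir_owner /etc/corosync)" = "coroqnetd:coroqnetd" ] || echo "fix ownership"
```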
The QDevice is not visible inside the web interface as far as I know.
Tags: cluster, corosync, docker, high-availability, homelab, kvm, linux, lxc, proxmox, proxmox-ve, qemu, sysadmin, tutorials, virtualization