This is a text-only version of the following page on https://raymii.org:
---
Title : Proxmox VE 7 Corosync QDevice in a Docker container
Author : Remy van Elst
Date : 17-04-2022
Last update : 29-01-2024 04:30
URL : https://raymii.org/s/tutorials/Proxmox_VE_7_Corosync_QDevice_in_Docker.html
Format : Markdown/HTML
---
At home I have a 2 node Proxmox VE cluster consisting of 2 HP EliteDesk Mini
machines, each with 16 GB RAM and both an NVMe and a SATA SSD (256 GB) with ZFS
on root. The machines are physically small and the specs are just enough for my
homelab needs. I run a few services at home; the rest runs on 'regular' virtual
private servers online. Proxmox VE, a virtualization stack based on Debian, has
support for clustering. Clustering can mean different things to different
people; in this case it means managing a group of physical servers as one, but
Proxmox also supports actual high availability with failover and (live)
migration. I also have a small NAS with spinning disks for shared storage via
NFS, which is helpful when experimenting with live migration.
* Update 29-01-2024: The Docker image now contains a health check command
and sets the sticky bit on `/var/run` so corosync-qnetd can create its
runtime directory.
![cluster][8]
> My modest two node Proxmox VE cluster, with extra quorum device
For a cluster (in any sense of the word), you need at least 3 nodes, otherwise
there is no quorum. If one of two nodes goes down, the remaining node cannot
know whether the problem is on its own side or on the other side. With an odd
number of nodes, a node can always ask another node: is it just me, or do you
see the issue as well? If it receives no reply, it knows the problem is on its
own side; if the other node does reply, they both know it's the third node that
has the problem. Corosync, the cluster software used by Proxmox, supports an
external quorum device: a small piece of software running on a third machine
that provides the extra vote for quorum without being a Proxmox VE server. Any
cluster with an even number of nodes can get into such a split-brain situation,
and in my experience, those are bad.
A two node Proxmox cluster shuts down all virtual machines and containers when
one node fails, and the cluster goes into read-only mode. So if you power down
one of your Proxmox servers, the other one is effectively offline as well, even
if you've just migrated all machines to that node. With the extra quorum
device, you can safely power down one node.
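As an aside: without a QDevice, a single surviving node can be forced to regain quorum manually, at your own risk. This is a stopgap for emergencies, not something this guide relies on:

# On the surviving node, lower the expected votes so it becomes quorate again.
# Only do this when you are certain the other node really is down.
pvecm expected 1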
In my case I wanted to run this on my NAS, since (physical) space is at a
premium. The NAS supports Docker, so this guide explains how to run the QDevice
for Proxmox in a Docker container. The container is a Debian 11 container with
systemd and OpenSSH, since the Proxmox QDevice setup requires that; it's more
like an LXC container than a typical Docker container. The NAS itself does not
run Debian and has no corosync packages, which is why I resorted to this route.
If you can run an extra Debian server, there is no need to go the Docker route;
you can just follow the regular guide. There is a [qdevice docker image][3],
but that guide does not work for Proxmox VE 7 and requires a lot of manual
setup. My method involves far fewer steps, since you're basically running an
extra Debian VPS (a container with systemd and OpenSSH).
I've been working with Corosync since 2012 and have written [a few posts in
2013][7], so I consider myself experienced in corosync usage. The Proxmox web
interface makes clustering very easy.
### Proxmox VE cluster QDevice technical explanation
My cluster has 2 nodes, both using ZFS as local storage. The [Proxmox Docs][1]
explain all the details on running a cluster and the GUI makes it very easy
to set up. For this guide I'm assuming you already have the 2 node cluster set up.
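In case you're starting from scratch: creating the cluster boils down to two commands, roughly like this (a sketch; the cluster name and IP are examples, run the first on node one and the second on node two):

# On the first node: create the cluster
pvecm create cluster1
# On the second node: join it, pointing at the first node's IP
pvecm add 192.0.2.10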
Quoting the [documentation page][2] with more information on the QDevice:
#### Corosync External Vote Support
This section describes a way to deploy an external voter in a Proxmox VE
cluster. When configured, the cluster can sustain more node failures without
violating safety properties of the cluster communication.
For this to work, there are two services involved:
- A `QDevice daemon` which runs on each Proxmox VE node
- An external vote daemon which runs on an independent server
As a result, you can achieve higher availability, even in smaller setups
(for example 2+1 nodes).
#### QDevice Technical Overview
The Corosync Quorum Device (`QDevice`) is a daemon which runs on each cluster
node. It provides a configured number of votes to the cluster's quorum
subsystem, based on an externally running third-party arbitrator's decision.
Its primary use is to allow a cluster to sustain more node failures than
standard quorum rules allow. This can be done safely as the external device
can see all nodes and thus choose only one set of nodes to give its vote.
This will only be done if said set of nodes can have quorum (again) after
receiving the third-party vote.
Currently, only `QDevice Net` is supported as a third-party arbitrator. This is
a daemon which provides a vote to a cluster partition, if it can reach the
partition members over the network. It will only give votes to one partition
of a cluster at any time. It's designed to support multiple clusters and is
almost configuration and state free. New clusters are handled dynamically and
no configuration file is needed on the host running a `QDevice`.
The only requirements for the external host are that it needs network access
to the cluster and to have a `corosync-qnetd` package available. We provide a
package for Debian based hosts, and other Linux distributions should also
have a package available through their respective package manager.
Note: in contrast to corosync itself, a `QDevice` connects to the cluster over TCP/IP.
The daemon may even run outside of the cluster's LAN and can have longer
latencies than 2 ms.
We support `QDevices` for clusters with an even number of nodes and recommend it
for 2 node clusters, if they should provide higher availability. For clusters
with an odd node count, **we currently discourage the use of QDevices.** The
reason for this is the difference in the votes which the `QDevice` provides for
each cluster type. Even numbered clusters get a single additional vote, which
only increases availability, because if the `QDevice` itself fails, you are in
the same position as with no `QDevice` at all.
On the other hand, with an odd numbered cluster size, the `QDevice` provides
(N-1) votes, where N corresponds to the cluster node count. This alternative
behavior makes sense; if it had only one additional vote, the cluster could
get into a split-brain situation. This algorithm allows for all nodes but
one (and naturally the `QDevice` itself) to fail. However, there are two
drawbacks to this:
- If the QNet daemon itself fails, no other node may fail or the cluster
immediately loses quorum. For example, in a cluster with 15 nodes, 7 could
fail before the cluster becomes inquorate. But, if a QDevice is configured
here and it itself fails, no single node of the 15 may fail. The QDevice acts
almost as a single point of failure in this case.
- The fact that all but one node plus `QDevice` may fail sounds promising at
first, but this may result in a mass recovery of HA services, which could
overload the single remaining node. Furthermore, a Ceph server will stop
providing services if only `((N-1)/2)` nodes or less remain online.
(end quote)
### Proxmox VE host setup, part 1
On all your Proxmox VE 7 machines you must manually install an extra
package:
apt install corosync-qdevice
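If you'd rather do this from a single shell, a quick loop over the nodes works as well (pve1 and pve2 are my node names, as seen later in the setup output; adjust to yours):

# Install the qdevice client package on every cluster node via SSH
for node in pve1 pve2; do
  ssh root@${node} "apt-get install -y corosync-qdevice"
done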
Again, I'm assuming you have your cluster configured already. I'm also
using different VLANs (via extra USB 3 gigabit NICs) for the different networks,
but I'm leaving that outside the scope of this guide to keep it simple.
After installing the package we can continue with configuring our Docker
container. Once the Docker container is running, we finish off the setup
by configuring the new `qdevice` from one of our Proxmox VE hosts.
### Docker container setup
The Docker container cannot run on one of your Proxmox servers or inside a
VM on Proxmox; you must run it on a different, external server. (You technically
could run it on Proxmox, but that defeats the whole point.)
I'm using my NAS, which supports Docker. The container image is based on
[this image][4], which builds a Debian image with OpenSSH and systemd. Mine
is simplified to only run Debian Bullseye (11). As opposed to the example
command there, `CAP_SYS_ADMIN` is not required; the systemd version in Debian 11
is recent enough. The Docker image also installs the `corosync-qnetd` package
and changes the ownership of the `/etc/corosync` folder to the correct user.
The container should automatically start at boot of the docker host,
but if you want to run it on another host, copy the data volume folder
and start it there. It will then have all the config files required.
Create a folder on the Docker host where the Corosync container
will store its corosync cluster config:
mkdir -p /volume1/docker/qnetd/corosync-data
This can be any path you like, the container will mount it as a volume.
Navigate to the folder one level above, where
I'll store the Dockerfile and some other info:
cd /volume1/docker/qnetd
On my docker host, all named containers get a folder under
`/volume1/docker/$container-name`, with subfolders for each volume.
Create a `Dockerfile`:
vim Dockerfile
Place the following contents inside:
# Stage 1: Debian Bullseye with systemd, OpenSSH and corosync-qnetd
FROM debian:bullseye
# Use a non-interactive debconf frontend during the build
RUN echo 'debconf debconf/frontend select teletype' | debconf-set-selections
RUN apt-get update && apt-get dist-upgrade -qy && apt-get install -qy --no-install-recommends systemd systemd-sysv corosync-qnetd openssh-server && apt-get clean && rm -rf /var/lib/apt/lists/* /var/log/alternatives.log /var/log/apt/history.log /var/log/apt/term.log /var/log/dpkg.log
# Allow root to log in over SSH with a password (needed by 'pvecm qdevice setup')
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# Change password to something secure
RUN echo 'root:password' | chpasswd
# Sticky bit on /var/run so corosync-qnetd can create its runtime directory
RUN chmod 1777 /var/run
# corosync-qnetd runs as the coroqnetd user and must own its config and cert folder
RUN chown -R coroqnetd:coroqnetd /etc/corosync/
# Mask mount units that cannot work inside a container
RUN systemctl mask -- dev-hugepages.mount sys-fs-fuse-connections.mount
RUN rm -f /etc/machine-id /var/lib/dbus/machine-id
# Stage 2: copy the result into a clean image, squashing the layers above
FROM debian:bullseye
COPY --from=0 / /
ENV container docker
# systemd shuts down cleanly on SIGRTMIN+3
STOPSIGNAL SIGRTMIN+3
VOLUME [ "/sys/fs/cgroup", "/run", "/run/lock", "/tmp" ]
HEALTHCHECK CMD corosync-qnetd-tool -s
EXPOSE 5403
CMD [ "/sbin/init" ]
Change the `password` part in the line `'root:password'` to a secure password which will be
used for SSH root login.
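If you'd rather not bake the password into the Dockerfile itself, a build argument is one option. This is a sketch of an alternative, not what the Dockerfile above does; you'd swap the `chpasswd` line in the first stage for:

# Take the root password from a build argument instead of hard-coding it
ARG ROOT_PASSWORD=change-me
RUN echo "root:${ROOT_PASSWORD}" | chpasswd

and build with `docker build --build-arg ROOT_PASSWORD='something-secure' -t debian-qdevice .`. Note that build arguments still show up in `docker history`, so this is about convenience, not secrecy.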
Build the docker image:
docker build . -t debian-qdevice
The `-t` flag will give the image a recognizable name. You can now
start a container based on that image:
docker run -d -it \
--name qnetd \
--net=macvlan \
--ip=192.0.2.20 \
-v /path/to/your/docker/folder/qnetd/corosync-data:/etc/corosync \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--restart=always \
debian-qdevice:latest
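The `--net=macvlan` option refers to a user-defined macvlan network on the Docker host, which has to exist before this run command works. A minimal sketch of creating one (the parent interface, subnet and gateway are assumptions for my network; adjust them to yours):

# Create a macvlan network bridged onto the host's physical NIC (here eth0)
docker network create -d macvlan \
  --subnet=192.0.2.0/24 \
  --gateway=192.0.2.1 \
  -o parent=eth0 \
  macvlan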
I'm using [macvlan][5] to give my containers their own IP address. The
[Proxmox QDevice Setup][6] code does not show support for a different SSH
port, which is why I gave the container its own IP. You could put a
name/port combo in your `~/.ssh/config` file on the Proxmox hosts like so:
Host 192.0.2.19
HostName 192.0.2.19
Port 2222
User root
Then you can start the container without `--net=macvlan --ip=192...`, but you must
specify the ports used by corosync and SSH: `-p 5403:5403 -p 2222:22`. The setup
command later on can then use the IP of the Docker host, but be extra careful that
you have set up your SSH config file correctly. `192.0.2.19` is the example IP
of my Docker host, not of the Docker container; get it wrong and you could
unexpectedly run commands as root on your Docker host.
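For completeness, a sketch of that port-mapped variant (the container shares the Docker host's IP; SSH is exposed on port 2222 to avoid clashing with the host's own sshd):

docker run -d -it \
--name qnetd \
-p 5403:5403 \
-p 2222:22 \
-v /path/to/your/docker/folder/qnetd/corosync-data:/etc/corosync \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--restart=always \
debian-qdevice:latest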
The `docker run` command prints the ID of the new container. You can check
whether it's running with the `docker ps` command:
docker ps
Example output:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
45d861ce6acf debian-qdevice:latest "/sbin/init" About a minute ago Up About a minute qnetd
Test if you can SSH into the container with the username `root` and your
password. If SSH works, continue on with the guide.
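For example, using the container's IP from the run command above:

ssh root@192.0.2.20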
### Proxmox VE host setup, part 2
This part has to be done on one of your Proxmox VE servers; the
configuration is synced automatically to the other servers. Execute the
following command:
pvecm qdevice setup 192.0.2.20
You're asked for the root password:
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.0.2.20's password:
The rest of the output is all setup of corosync:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@192.0.2.20'"
and check to make sure that only the key(s) you wanted were added.
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
node 'pve1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve1': Creating new key and cert db
node 'pve1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve1': Importing CA
node 'pve2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve2': Creating new key and cert db
node 'pve2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve2': Importing CA
INFO: generating cert request
Creating new certificate request
Generating key. This may take a few moments...
Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq
INFO: copying exported cert request to qnetd server
INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-cluster1.crt
INFO: copy exported CRT
INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
INFO: copy and import pk12 cert to all nodes
node 'pve1': Importing cluster certificate and key
node 'pve1': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'pve2': Importing cluster certificate and key
node 'pve2': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration
INFO: start and enable corosync qdevice daemon on node 'pve1'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
INFO: start and enable corosync qdevice daemon on node 'pve2'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done
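Behind the scenes, the setup command adds a `device` entry to the `quorum` section of `/etc/pve/corosync.conf` and distributes it to all nodes. A rough sketch of what that section ends up looking like (host and algorithm taken from my even-node setup; `pvecm` manages this, don't edit it by hand):

quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      algorithm: ffsplit
      host: 192.0.2.20
      tls: on
    }
  }
}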
You can check the status of the cluster and quorum device with the following command:
pvecm status
Example output with the Quorum device setup:
Cluster information
-------------------
Name: cluster1
Config Version: 7
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sun Apr 17 21:31:06 2022
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.98
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 192.0.2.10 (local)
0x00000002 1 A,V,NMW 192.0.2.11
0x00000000 1 Qdevice
If you have not set up the QDevice correctly, the last part
of the output will be different, showing no votes for the
QDevice:
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 192.0.2.10 (local)
0x00000002 1 A,V,NMW 192.0.2.11
0x00000000 0 Qdevice (votes 1)
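If you end up in that situation, the corosync-qdevice service on the Proxmox nodes is the first thing to check:

# On a Proxmox node: check that the qdevice client daemon is running and see its logs
systemctl status corosync-qdevice
journalctl -u corosync-qdevice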
Check the Docker container (log in via SSH); you should see
the daemon running via `ps auxf`:
root@d74eb68b6507:~# ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 17676 9568 ? Ss 19:30 0:00 /sbin/init
coroqne+ 30 0.0 0.2 17692 13572 ? Ss 19:30 0:00 /usr/bin/corosync-qnetd -f
root 33 0.0 0.0 4332 2140 pts/0 Ss+ 19:30 0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400
root 34 0.0 0.1 13272 7644 ? Ss 19:30 0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root 97 6.5 0.1 13736 8256 ? Ss 19:34 0:00 \_ sshd: root@pts/1
root 103 2.0 0.0 3964 3420 pts/1 Ss 19:34 0:00 \_ -bash
root 106 0.0 0.0 6696 3000 pts/1 R+ 19:34 0:00 \_ ps auxf
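You can also query the qnetd daemon directly from a shell inside the container; `-s` is the same check the Docker health check runs, and `-l` lists the connected clusters:

corosync-qnetd-tool -s
corosync-qnetd-tool -l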
On my first attempt at Dockerizing the QDevice, I forgot to set
the correct permissions on the `/etc/corosync` folder and the daemon
failed to start. There is no logging inside the container, but after
running the daemon manually as root (which worked) and inspecting the
systemd unit file, the cause of the issue was clear.
The QDevice is not visible inside the web interface as far as I know.
[1]: https://pve.proxmox.com/wiki/Cluster_Manager
[2]: https://web.archive.org/web/20220323202732/https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
[3]: https://github.com/modelrockettier/docker-corosync-qnetd
[4]: https://github.com/alehaa/docker-debian-systemd
[5]: https://docs.docker.com/network/macvlan/
[6]: https://web.archive.org/web/20220417190820/https://github.com/proxmox/pve-cluster/blob/master/data/PVE/CLI/pvecm.pm#L149
[7]: /s/tags/corosync.html
[8]: /s/inc/img/pve-cluster.png