I'm a Linux/Unix sysadmin with experience in High Availability, scaling and clustering, security, (Open)SSL and general Linux system administration. I've worked as a sysadmin (devops) for Certificate Authorities, Hospitals, Managed Service providers, Datacenters, Development shops and large Internet Service providers. I currently work for an OpenStack provider. I like to design, build and manage large, complex and highly available systems. I like to work with configuration management tools and version control systems. Documentation, monitoring and backups are things I do first, not later when time allows.
This is my personal website. Please note that these articles do not reflect the opinions or policies of any of my (previous) employers, only my own.

Latest Items

Adding IPv6 to a keepalived and haproxy cluster

24-09-2017 | Remy van Elst

At work I regularly build highly available clusters for customers, where the setup is distributed over multiple datacenters with failover software. If one component fails, the service doesn't experience issues or downtime. Recently I was tasked with expanding a cluster setup to also be reachable via IPv6. This article goes over the settings and configuration required for haproxy and keepalived for IPv6. The internal cluster will only be IPv4; the loadbalancer terminates HTTP and HTTPS connections.


atop is broken on Ubuntu 16.04 (version 1.26): trap divide error

18-09-2017 | Remy van Elst

Recently a few of my Ubuntu 16.04 machines had issues, and while troubleshooting them I noticed that the `atop` logs were missing. atop is a very handy tool which can be set up to record system state every X minutes, and we set it up to run every 5 minutes. You can then see at a later moment what the server was doing, even sorting by disk, memory, cpu or network usage. This post discusses the error and a quick fix.


Backup OpenStack object store or S3 with rclone

17-08-2017 | Remy van Elst

This is a guide that shows you how to make backups of an object storage service like OpenStack swift or S3. Most object store services save data on multiple servers, but deleting a file also deletes it from all servers. Tools like rsync or scp are not compatible most of the time with these services, unless there is a proxy that translates the object store protocol to something like SFTP. rclone is an rsync-like, command line tool that syncs files and directories from cloud storage services like OpenStack swift, Amazon S3, Google cloud/drive, dropbox and more. By having a local backup of the contents of your cloud object store you can restore from accidental deletion or easily migrate between cloud providers. Syncing between cloud providers is also possible. It can also help to lower the RTO (recovery time objective) and backups are just always a good thing to have and test.


Openstack Horizon, remove the loading modal with uBlock Origin

25-05-2017 | Remy van Elst

The OpenStack dashboard, Horizon, is a great piece of software to manage your OpenStack resources via the web. However, it has, in my opinion, a very big usability issue: the loading dialog that appears after you click a link. It blocks the entire page and all other links. So, whenever I click, I have to wait three to five seconds before I can do anything else. Clicked the wrong menu item? Sucks to be you, here have some loading. Clicked a link and quickly want to open something in a new tab while the page is still loading? Nope, not today. It's not as if browsers already have a function to show that a page is loading. No, of course, the loading indication that has been there forever is not good enough. Let's re-invent the wheel and significantly impact the user experience. With two rules in uBlock Origin this loading modal is removed and you can work normally again in Horizon.


Distributed load testing with Tsung

13-04-2017 | Remy van Elst

At $dayjob I manage a large OpenStack Cloud. Next to that I also build high-performance and redundant clusters for customers. Think multiple datacenters, haproxy, galera or postgres or mysql replication, drbd with nfs or glusterfs and all sorts of software that can (and sometimes cannot) be clustered (redis, rabbitmq etc.). Our customers deploy their application on these clusters, and when one or a few components fail, their application stays up. Hypervisors, disks, switches, routers, all can fail without actual service downtime. Next to building such clusters, we also monitor and manage them. When we build such a cluster (fully automated with Ansible) we do a basic load test. We do this not for benchmarking or application flow testing, but to optimize the cluster components. This ranges from simple things like the mpm workers or threads in Apache to more advanced topics like MySQL or DRBD. Optimization there depends on the specifications of the servers used and the load patterns. Tsung is a high-performance piece of software written in Erlang that is simple to configure and use. Configuration is done in a simple, readable XML file. Tsung can be run distributed as well for large setups. It has good reporting and a live web interface for status and reports during a test.


Run software on the tty1 console instead of getty login on Ubuntu 14.04 and 16.04

10-04-2017 | Remy van Elst

Recently I wanted to change the default login prompt on the tty1 console on an OpenStack instance to automatically run htop. Instead of logging in via the console, I wanted it to start up htop right away and nothing else. Ubuntu 14.04 uses init and Ubuntu 16.04 uses systemd. Both ways are shown in this tutorial.


Check HTTP status code for a page on all DNS records

09-04-2017 | Remy van Elst

This is a small snippet using curl to check the status code of a given URL on all DNS records for a given domain. This site has a few A records in round robin mode, and sometimes the automatic deployment fails. Using this query I can check which server is the culprit and fix it manually.
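The article itself uses curl. As a rough illustration of the same idea, here is a minimal Python sketch using only the standard library. It assumes plain HTTP on port 80 and IPv4 A records, and the function names are made up for this example.

```python
import http.client
import socket


def unique_addresses(infos):
    """Deduplicate getaddrinfo() results while keeping their order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def resolve_all(hostname, port=80):
    """Return every IPv4 address (A record) the hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return unique_addresses(infos)


def status_for_ip(hostname, ip, path="/", timeout=10):
    """Request the path from one specific IP while sending the real Host header."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        conn.request("HEAD", path, headers={"Host": hostname})
        return conn.getresponse().status
    finally:
        conn.close()


def report(hostname, path="/"):
    """Print the status code that every resolved address returns."""
    for ip in resolve_all(hostname):
        try:
            print(ip, status_for_ip(hostname, ip, path))
        except OSError as error:
            print(ip, "failed:", error)
```

Calling `report("example.org")` prints one line per address, so the server that answers differently from the rest stands out.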


Burn in testing for new Hypervisor and Storage server hardware

08-04-2017 | Remy van Elst

This article goes over how and why to do burn-in testing on hypervisor and storage servers. I work at a fairly large cloud provider, where we have a lot of hardware. Think thousands of hardware servers and tens of thousands of hard disks. It's all technology, so stuff breaks, and at our scale, stuff breaks often. One of my pet projects for the last period has been to automate the burn-in testing for our virtualisation servers and the storage machines. We run OpenStack and use KVM for the hypervisors and a combination of different storage technologies for the volume storage servers. Before they go into production, they are tested for a few days with very intensive automated usage. We've noticed that they either fail during that period, or not at all. This saves us from having to migrate customers off new production servers just a few days after they've gone live. The testing is of course all automated.


Openstack Soft Delete - recover deleted instances

18-03-2017 | Remy van Elst

This article contains both an end user and OpenStack administrator guide to set up and use soft_delete with OpenStack Nova. If an instance is deleted with nova delete, it's gone right away. If soft_delete is enabled, it will be queued for deletion for a set amount of time, allowing end users and administrators to restore the instance with the nova restore command. This can save your ass (or an end user's bottom) if the wrong instance is removed. Setup is simple, just one variable in nova.conf. There are some caveats we'll also discuss here.


Running WPS-8 (Word Processing System) on the DEC(mate) PDP-8i and SIMH

09-03-2017 | Remy van Elst

This article covers running WPS-8 on a modern-day emulator. WPS-8 is a word processor by DEC for the PDP-8. The PDP-8 was a very popular 12-bit minicomputer from 1965. WPS-8 was released around 1970 and came bundled with DEC's VT78 terminal. This terminal bundle was also known as the DECmate. This article covers the setup of the emulator, simh, with the correct disk images and terminal settings. It covers basic usage and the features of WPS-8 and it has a section on key remapping. The early keyboards used with WPS-8 have small but incompatible differences with recent keyboards, but nothing that xterm remapping can't fix.


Traceroute IPv6 to Smokeping Target config

05-03-2017 | Remy van Elst

This little one-liner converts the output of traceroute for IPv6 to Smokeping Target output. I had only set up my smokepings for IPv4. Recently we had an issue where the network config was borked and the whole IPv4 network was not announced via BGP anymore. I was at home troubleshooting, but finding nothing since I have native IPv6 and that part still worked. My Smokeping did show loss, since that explicitly uses IPv4. That helped a lot with the debugging. I have an IPv4 version of this article as well.
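The conversion described above can be sketched in Python. This is not the one-liner from the article, just an illustration of the idea. It assumes numeric `traceroute -6 -n` style output (hop number, then address), and the `hop_N` section names and the `FPing6` probe choice are assumptions for the example.

```python
import ipaddress


def hops_from_traceroute(output):
    """Extract (hop number, IPv6 address) pairs from numeric traceroute output."""
    hops = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header line or empty line
        try:
            address = ipaddress.IPv6Address(fields[1])
        except ValueError:
            continue  # hop did not answer ("*") or is not an IPv6 address
        hops.append((int(fields[0]), str(address)))
    return hops


def smokeping_targets(hops, probe="FPing6"):
    """Render one Smokeping target section per hop (names are hypothetical)."""
    blocks = []
    for number, address in hops:
        blocks.append(
            f"+ hop_{number}\n"
            f"menu = Hop {number}\n"
            f"title = Hop {number} ({address})\n"
            f"probe = {probe}\n"
            f"host = {address}\n"
        )
    return "\n".join(blocks)
```

Feeding the saved traceroute output through `smokeping_targets(hops_from_traceroute(text))` gives text that can be pasted under an existing parent section in the Targets file. Hops that did not answer are skipped.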


Ansible: access group vars for groups the current host is not a member of

27-01-2017 | Remy van Elst

This guide shows you how to access group variables for a group the current host is not a member of. In Ansible you can access other host variables using `hostvars['hostname']` but not group variables. The way described here is workable, but I do consider it a dirty hack. So why did I need this? I have a setup where ssl is offloaded by haproxy servers, but the virtual hosts and ssl configuration are defined in Apache servers. The loadbalancers and appservers are two different hostgroups. The ssl settings are in the appserver group_vars, which the hosts in the loadbalancer group need to access. The best way to do this is to change the haproxy playbooks and configuration and define the certificates there, but in this specific case that wasn't a workable solution. Editing two yaml files (one for the appservers and one for the loadbalancers) was not an option in this situation.


Sparkling Network

12-01-2017 | Remy van Elst

This is an overview of all the servers in the Sparkling Network, mostly as an overview for myself, but it might be interesting for others. It also has a status overview of the nodes.


haproxy: restrict specific URLs to specific IP addresses

09-01-2017 | Remy van Elst

This snippet shows you how to use haproxy to restrict certain URLs to certain IP addresses. For example, to make sure your admin interface can only be accessed from your company IP address.
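The decision such a rule makes can be sketched in a few lines of Python. This is not haproxy configuration, only an illustration of the logic: a request to a restricted path is allowed only when the client address falls inside an allowed network. The `/admin` prefix and the documentation-range networks are assumptions for the example.

```python
import ipaddress

# Hypothetical values: replace with your own admin path and company ranges.
RESTRICTED_PREFIXES = ("/admin",)
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(path, client_ip):
    """Allow everything except restricted paths requested from outside the allowed networks."""
    if not path.startswith(RESTRICTED_PREFIXES):
        return True  # unrestricted URL, any client may access it
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

For example, `is_allowed("/admin/login", "203.0.113.5")` is rejected while the same path from `192.0.2.10` is accepted.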


OpenStack: Quick and automatic instance snapshot backup and restore (and before an apt upgrade) with nova backup

20-12-2016 | Remy van Elst

This is a guide that shows you how to create OpenStack instance snapshots automatically, quickly and easily. This allows you to create a full backup of the entire instance. This guide has a script that makes creating snapshots from an OpenStack VM automatic via cron. The script uses the `nova backup` function, therefore it also has retention and rotation of the backups. It also features an option to create a snapshot before every apt action (upgrade/install/remove). This way, you can easily restore from the snapshot when something goes wrong after an upgrade. Snapshots are very useful to restore the entire instance to an earlier state. Do note that this is not the same as a file-based backup: you can't select a few files to restore, it's all or nothing.


Create a PDP-8 OS8 RK05 system disk from RX01 floppies with SIMH (and get text files in and out of the PDP-8)

07-12-2016 | Remy van Elst

This guide shows you how to build an RK05 bootable system disk with OS/8 on it for the PDP-8, in the SIMH emulator. We will use two RX01 floppies as the build source, copy over all the files and set up the LPT printer and the PTR/PIP paper tape punch/readers. As an added bonus the article also shows you how to get text files in and out of the PDP-8 system using the printer and paper tape reader/puncher.


Overflow the Investigatory Powers Bill!

24-11-2016 | Remy van Elst

I read an article on The Register regarding the Investigatory Powers Bill. The part where ISPs are forced to save their customers' browsing history for a year is the most horrifying part of an already horrifying bill. Let's hope the political process and organizations like the Open Rights Group and the EFF have enough lobbying power to change people's minds. If that fails, then we can all try to overflow the logging. Just as some people put keywords in their mail signatures to trigger automatic filters and generate noise, we should all generate as much data and noise as possible. This way the information they do gather will not be useful: it will take too much time, storage and effort to process it, and thus the project will fail. Two years ago I wrote a small Python script which browses the web for you, all the time. Running that on one or two Raspberry Pis or other small low-power computers 24/7 will generate a lot of noise in the logging and filtering.


Build a FreeBSD 11.0-release Openstack Image with bsd-cloudinit

14-11-2016 | Remy van Elst

We are going to prepare a FreeBSD image for OpenStack deployment. We do this by creating a FreeBSD 11.0-RELEASE instance, installing it and converting it using bsd-cloudinit. We'll use the CloudVPS public OpenStack cloud for this. We'll be using the OpenStack command line tools, like nova, cinder and glance. A FreeBSD image with Cloud Init will automatically resize the disk to the size of the flavor and it will add your SSH key right at boot. You can use Cloud Config to execute a script at first boot, for example, to bootstrap your system into Puppet or Ansible. If you use Ansible to manage OpenStack instances you can integrate it without logging in or doing anything by hand.


Nitrokey gnuk firmware update via DFU

11-10-2016 | Remy van Elst

The Nitrokey (start, all of them) can be upgraded to a newer GNUK firmware. However, this can only be done via ST Link or DFU. If you use the Gnuk USB firmware upgrade you will brick the device. This guide shows you how to attach a DFU adapter and how to flash firmware to a Nitrokey, both for upgrading and for unbricking one that was upgraded via USB.


MySQL restore after a crash and disk issues

10-10-2016 | Remy van Elst

Recently I had to restore a MySQL server. The hardware had issues with the storage and required some fscks, disk replacements and a lot of RAID and LVM love to get working again. That was the easy part. MySQL was a bit harder to fix. This post describes the process I used to get MySQL working again with a recent backup. In this case it was a replicated setup so the client had no actual downtime.


Firefox History stats with Bash

25-09-2016 | Remy van Elst

This is a small script to gather some statistics from your Firefox history. First we use sqlite3 to parse the Firefox history database and get the last three months, then we remove all the IP addresses and port numbers and finally we sort and count it.
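The article does this in Bash. The same three steps (query, filter, count) can be sketched in Python as an illustration. It assumes the `moz_places` table with its `url` and `last_visit_date` columns (microseconds since the epoch), and that "three months" means 90 days. The function names are made up for this example.

```python
import re
import sqlite3
import time
from collections import Counter
from urllib.parse import urlsplit

# Matches a bare IPv4 address such as 192.0.2.1
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def recent_urls(connection, days=90, now=None):
    """Return the URLs visited in the last `days` days."""
    now = time.time() if now is None else now
    cutoff = int((now - days * 86400) * 1_000_000)  # Firefox stores microseconds
    rows = connection.execute(
        "SELECT url FROM moz_places WHERE last_visit_date >= ?", (cutoff,)
    )
    return [row[0] for row in rows]


def count_hosts(urls):
    """Count hostnames, dropping port numbers and skipping IP addresses."""
    counts = Counter()
    for url in urls:
        try:
            host = urlsplit(url).hostname  # .hostname never includes the port
        except ValueError:
            continue  # malformed URL
        if not host or ":" in host or IPV4_PATTERN.match(host):
            continue  # no host (about:, place:), IPv6 literal or IPv4 address
        counts[host] += 1
    return counts
```

With a connection opened via `sqlite3.connect()` on a copy of `places.sqlite` (the live file is usually locked while Firefox is running), `count_hosts(recent_urls(connection)).most_common(10)` lists the ten most frequent hosts.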


All Items