Corosync Pacemaker - Execute script on failover

Published: 20-11-2013 | Author: Remy van Elst


❗ This post is over seven years old. It may no longer be up to date. Opinions may have changed.

With Corosync/Pacemaker there is no easy way to simply run a script on failover. There are good reasons for this, but sometimes you want to do something simple. This tutorial describes how to change the Dummy OCF resource to execute a script on failover.

In this example, the script triggers a few SNMP traps, sends an alert to Nagios and sends some data to Graphite. SNMP alone could be handled by the ocf:pacemaker:ClusterMon resource, but the rest could not.

This is a very simple way of doing it; I consider it more of a quick hack. For example, the script path is hard-coded. For me that is not a problem, because both the script and the modified Dummy resource are managed via Ansible, so I can change them at any time.

Start by copying the Dummy resource over to a new resource. On Ubuntu the resource files are located here:

/usr/lib/ocf/resource.d/heartbeat/

In there, copy the Dummy file to a new resource, for example FailOverScript. If you don't have the Dummy resource, you can also find it here.
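
The copy step can be sketched as a small helper. This is only an illustration: copy_agent is a hypothetical function name, and the refusal to overwrite an existing file is my own safety assumption, not something the resource agents require.

```shell
# copy_agent SRC DST: copy a resource agent without overwriting an existing one.
# (Hypothetical helper, not part of Pacemaker or the resource-agents package.)
copy_agent() {
    if [ ! -f "$1" ]; then
        echo "source agent $1 not found" >&2
        return 1
    fi
    if [ -e "$2" ]; then
        echo "$2 already exists, leaving it alone" >&2
        return 1
    fi
    # -p keeps the mode and timestamps of the original;
    # chmod makes sure the copy is executable.
    cp -p "$1" "$2" && chmod 755 "$2"
}
```

On Ubuntu it would be called as copy_agent /usr/lib/ocf/resource.d/heartbeat/Dummy /usr/lib/ocf/resource.d/heartbeat/FailOverScript, run as root.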

Edit the name and description:

Name:

meta_data() {
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="FailOverScript" version="0.9">
<version>1.0</version>

Description:

<longdesc lang="en">
Script run on failover
</longdesc>
<shortdesc lang="en">Script run on failover</shortdesc>

Make sure the script you want to execute is placed on the host and is executable (chmod +x /usr/local/bin/failover.sh).
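
If you do not have a script yet, a placeholder along these lines is enough to try the setup. It is a sketch, not the script used in this article: the FAILOVER_LOG variable, the log path and the notify_failover function are all assumptions made for illustration, and the real SNMP, Nagios or Graphite calls would go where the log line is written.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/failover.sh: a placeholder for the real
# SNMP/Nagios/Graphite calls. It only appends a line to a log file.
# FAILOVER_LOG is an assumed variable name; Pacemaker does not set it.
FAILOVER_LOG="${FAILOVER_LOG:-/tmp/failover.log}"

notify_failover() {
    # uname -n prints the node name; date adds a timestamp.
    printf '%s failover: resource started on %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" >> "$FAILOVER_LOG"
}

# Never let a failed notification turn into a failed start.
notify_failover || true
```

Running it once by hand and checking the log file is a quick way to confirm it works before the cluster ever calls it.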

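A bit lower in the file, edit the dummy_start function. Add the script path after the fi that closes the if [ $? = $OCF_SUCCESS ]; then check and above the touch ${OCF_RESKEY_state} line. Placed there, the script does not overwrite the $? that the if tests, it runs only when the resource is really being started, and the state file is still created afterwards so that later monitor calls succeed. Like so:

dummy_start() {
    dummy_monitor
    if [ $? =  $OCF_SUCCESS ]; then
        # Already running: nothing to start, so do not run the script again.
        return $OCF_SUCCESS
    fi
    # Resource is really starting on this node: run the failover script.
    /usr/local/bin/failover.sh
    touch ${OCF_RESKEY_state}
}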
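
Where the script call sits in the start function matters: the start action should fire the script only when the resource is not already running, and it should still leave the state file behind for the monitor action. A minimal standalone simulation of that logic might look like the following. Everything in it is a stand-in: the demo_* functions, the HOOK_RUNS counter and the state file path are hypothetical, and the real agent gets its OCF_* return codes from the OCF shell function library rather than defining them itself.

```shell
#!/bin/sh
# Standalone simulation; nothing here touches a real cluster.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
STATE_FILE="${TMPDIR:-/tmp}/failoverscript-demo.state"   # hypothetical path
HOOK_RUNS=0

demo_monitor() {
    # Like the Dummy monitor: "running" means the state file exists.
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

demo_hook() {
    # Stand-in for /usr/local/bin/failover.sh; just counts its calls.
    HOOK_RUNS=$((HOOK_RUNS + 1))
}

demo_start() {
    # Testing the monitor directly in the if avoids relying on $?,
    # which any command placed in between would overwrite.
    if demo_monitor; then
        return $OCF_SUCCESS    # already running, do not fire the hook again
    fi
    demo_hook
    touch "$STATE_FILE"
}

rm -f "$STATE_FILE"
demo_start    # first start: hook runs, state file is created
demo_start    # second start: already running, hook is skipped
echo "hook ran $HOOK_RUNS time(s)"
```

The second demo_start call returns early, so the hook fires once per real start rather than once per start request.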

After that has been done, replace all instances of Dummy and dummy with your name of choice:

sed -i 's/Dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
sed -i 's/dummy/failoverscript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
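
To see what the two substitutions do before touching the real agent, they can be tried on a scratch file first. The sketch below assumes GNU sed (for -i without a suffix); rename_agent and the scratch file contents are made up for the demonstration.

```shell
# rename_agent FILE: apply both substitutions in place (GNU sed -i assumed),
# then print how many lines still mention Dummy or dummy.
rename_agent() {
    sed -i -e 's/Dummy/FailOverScript/g' -e 's/dummy/failoverscript/g' "$1"
    grep -c -e 'Dummy' -e 'dummy' "$1" || true   # grep exits 1 when the count is 0
}

# Build a tiny stand-in for the agent file and run the rename on it.
scratch=$(mktemp)
printf '%s\n' '# Dummy resource agent' 'dummy_start() {' '    dummy_monitor' '}' > "$scratch"
leftover=$(rename_agent "$scratch")
echo "lines still containing the old name: $leftover"
cat "$scratch"
```

A leftover count of zero means every occurrence was renamed, including function names such as dummy_start.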

Test the resource agent with the ocf-tester program to catch any mistakes:

ocf-tester -n resourcename /usr/lib/ocf/resource.d/heartbeat/FailOverScript

Output:

Beginning tests for /usr/lib/ocf/resource.d/heartbeat/FailOverScript...
/usr/sbin/ocf-tester: 214: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/FailOverScript failed 1 tests

Oops. Seems we need xmllint. On Ubuntu, install it:

apt-get install libxml2-utils

Test again; this time it passes:

Beginning tests for /usr/lib/ocf/resource.d/heartbeat/FailOverScript...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
/usr/lib/ocf/resource.d/heartbeat/FailOverScript passed all tests

As an extra test, to see if the script you've created is correctly executed, you can do a test start of the resource:

export OCF_ROOT=/usr/lib/ocf
bash -x /usr/lib/ocf/resource.d/heartbeat/FailOverScript start

To use this resource, add it like so:

crm configure primitive script ocf:heartbeat:FailOverScript op monitor interval="30"

To test it end to end, you can for example have the script send you an email. Put the active node in standby and check that the email arrives.

Tags: cluster , corosync , crm , heartbeat , high-availability , network , pacemaker , tutorials