Corosync Notes

02-11-2013 | Remy van Elst




What the hell are all the components?

  • Pacemaker: Resource manager
  • Corosync: Messaging layer
  • Heartbeat: Also a messaging layer
  • Resource Agents: Scripts that know how to control various services

    Pacemaker is the thing that starts and stops services (like your database or mail server) and contains logic for ensuring both that they are running, and that they are only running in one location (to avoid data corruption).

    But it can't do that without the ability to talk to instances of itself on the other node(s), which is where Heartbeat and/or Corosync come in.

    Think of Heartbeat and Corosync as dbus, but between nodes: somewhere that any node can throw messages on and know that they'll be received by all its peers. This bus also ensures that everyone agrees who is (and is not) connected to the bus, and it tells Pacemaker when that list changes.

If you want crm to wait until the cluster has finished carrying out a change (the cluster transition) before it returns, add the -w (wait) option to the crm command, like so: crm -w resource stop virtual-ip.

Get corosync cluster status

crm_mon --one-shot -V

or

crm status

Put node on standby

Execute on the node you want to put in standby.

crm node standby

Put node online again (after standby)

Execute on the node you want to put online again.

crm node online

If you want to put a node online or in standby from another cluster node, append the node name to the commands above, like so:

crm node standby NODENAME

Disable stonith (shoot the other node in the head)

crm configure property stonith-enabled=false

Add a simple shared IP resource

crm configure primitive failover-ip ocf:heartbeat:IPaddr2 params ip=10.0.2.10 cidr_netmask=32 op monitor interval=10s

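The agent specification ocf:heartbeat:IPaddr2 tells Pacemaker three things about the resource you want to add. The first field, ocf, is the standard (resource class) the resource script conforms to, which also tells the cluster where to find it. The second field is specific to OCF resources and tells the cluster which namespace (provider) to find the resource script in, in this case heartbeat. The last field, IPaddr2, is the name of the resource script. The params set the address and netmask, and op monitor interval=10s makes Pacemaker check every 10 seconds that the IP is still up.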
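To make the three fields concrete, here is a small Bash sketch that splits an agent specification the same way. The function name split_agent_spec is hypothetical and not part of crm; it assumes non-OCF classes such as lsb are written with only two fields (class:type), since only OCF agents have a provider.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a resource agent spec into its fields.
# OCF agents use class:provider:type (e.g. ocf:heartbeat:IPaddr2);
# other classes such as lsb use class:type and have no provider.
split_agent_spec() {
    local spec="$1" class provider type
    IFS=':' read -r class provider type <<< "$spec"
    if [ -z "$type" ]; then
        # Only two fields were given, so the second one is the type
        type="$provider"
        provider=""
    fi
    printf 'class=%s provider=%s type=%s\n' "$class" "$provider" "$type"
}

split_agent_spec ocf:heartbeat:IPaddr2   # prints: class=ocf provider=heartbeat type=IPaddr2
```

On a real cluster, crm ra info ocf:heartbeat:IPaddr2 shows the parameters the agent accepts, such as ip and cidr_netmask.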

View all available resource classes

crm ra classes

Output:

heartbeat
lsb
ocf / heartbeat pacemaker
stonith

View all the OCF resource agents provided by Pacemaker and Heartbeat

crm ra list ocf pacemaker

Output:

ClusterMon    Dummy         HealthCPU     HealthSMART   Stateful      SysInfo
SystemHealth  controld      o2cb          ping          pingd

For Heartbeat:

crm ra list ocf heartbeat

Output:

AoEtarget            AudibleAlarm         CTDB                 ClusterMon
Delay                Dummy                EvmsSCC              Evmsd
Filesystem           ICP                  IPaddr               IPaddr2
IPsrcaddr            IPv6addr             LVM                  LinuxSCSI
MailTo               ManageRAID           ManageVE             Pure-FTPd
Raid1                Route                SAPDatabase          SAPInstance
SendArp              ServeRAID            SphinxSearchDaemon   Squid
Stateful             SysInfo              VIPArip              VirtualDomain
WAS                  WAS6                 WinPopup             Xen
Xinetd               anything             apache               conntrackd
db2                  drbd                 eDir88               ethmonitor
exportfs             fio                  iSCSILogicalUnit     iSCSITarget
ids                  iscsi                jboss                ldirectord
lxc                  mysql                mysql-proxy          nfsserver
nginx                oracle               oralsnr              pgsql
pingd                portblock            postfix              proftpd
rsyncd               scsi2reservation     sfex                 symlink
syslog-ng            tomcat               vmware

Add a simple Apache resource

crm configure primitive apache-ha ocf:heartbeat:apache params configfile=/etc/apache2/apache2.conf op monitor interval=1min

Make sure Apache and the Virtual IP are on the same node

crm configure colocation apache-with-ip inf: apache-ha failover-ip

Make sure the virtual IP is started before Apache (Pacemaker stops them in the reverse order):

crm configure order apache-after-ip mandatory: failover-ip apache-ha

Stop a resource

crm resource stop $RESOURCENAME

Delete a resource

crm configure delete $RESOURCENAME

Remove a node from the cluster

crm node delete $NODENAME

Stop all cluster resources

crm configure property stop-all-resources=true

Clean up warnings and errors for a resource

crm resource cleanup $RESOURCENAME

Erase entire config

crm configure erase

Ignore loss of quorum (needed when using only two nodes)

crm configure property no-quorum-policy=ignore

Prevent the shared IP from moving back to the primary node when that node comes back up after a failover

crm configure rsc_defaults resource-stickiness=100

sysctl

In order to bind to an IP address that is not (yet) configured on the system, such as the shared IP on a node that does not currently hold it, we need to enable non-local binding at the kernel level.

Temporary:

echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

Permanent:

Add this to /etc/sysctl.conf:

net.ipv4.ip_nonlocal_bind = 1

Enable with:

sysctl -p
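To make the permanent setting repeatable (for example from a provisioning script) without piling up duplicate lines in /etc/sysctl.conf, the edit can be wrapped in a small Bash function. This is a sketch: the name ensure_sysctl is hypothetical, and it assumes GNU sed for the in-place -i edit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY = VALUE in a sysctl config file,
# replacing an existing (uncommented) line for KEY or appending one.
# Assumes GNU sed (sed -i without a backup suffix).
ensure_sysctl() {
    local file="$1" key="$2" value="$3" escaped pattern
    # Escape the dots in the key so the regex matches them literally
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    pattern="^[[:space:]]*${escaped}[[:space:]]*="
    if grep -q "$pattern" "$file" 2>/dev/null; then
        # Key already present: rewrite that line with the new value
        sed -i "s|${pattern}.*|${key} = ${value}|" "$file"
    else
        # Key absent (or file missing): append a new line
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

Run it as root with ensure_sysctl /etc/sysctl.conf net.ipv4.ip_nonlocal_bind 1 and then sysctl -p. For a one-off change, sysctl -w net.ipv4.ip_nonlocal_bind=1 is equivalent to the echo command above.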



Tags: cluster, corosync, crm, heartbeat, high-availability, network, pacemaker