
Corosync Notes

Published: 02-11-2013 | Author: Remy van Elst


    What are all the components?


    Pacemaker is the thing that starts and stops services (like your database or mail server) and contains logic for ensuring both that they are running, and that they are only running in one location (to avoid data corruption).

    But it can't do that without the ability to talk to instances of itself on the other node(s), which is where Heartbeat and/or Corosync come in.

    Think of Heartbeat and Corosync as D-Bus, but between nodes: a bus that any node can put messages on, knowing that they will be received by all of its peers. This bus also ensures that everyone agrees on who is (and is not) connected to it, and tells Pacemaker when that membership list changes.

    If you want a command below to return only after the cluster has finished carrying out the change (for example, until the resource has actually stopped), add the -w (--wait) option to crm, like so: crm -w resource stop virtual-ip.

    Get corosync cluster status

    crm_mon --one-shot -V
    

    or

    crm status
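If you want to check the status from a script (for example from cron), the human-readable output can be filtered. The sketch below uses hypothetical helper names that are not part of Pacemaker, and it assumes the classic crm_mon node lines such as "Online: [ node1 node2 ]" and "OFFLINE: [ node3 ]". The exact layout differs between Pacemaker versions, so compare it with your own crm_mon output first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting around crm_mon output.
# Assumes node lines like "Online: [ node1 node2 ]" and "OFFLINE: [ node3 ]",
# optionally indented and/or prefixed with "* ".

# Print the names of offline nodes (space separated) read from stdin.
offline_nodes() {
    sed -n 's/^[[:space:]*]*OFFLINE: \[ \(.*\) \]$/\1/p'
}

# Succeed only if no offline nodes are reported on stdin.
all_nodes_online() {
    [ -z "$(offline_nodes)" ]
}
```

Usage: crm_mon --one-shot | all_nodes_online || echo "a cluster node is offline"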
    

    Put node on standby

    Execute on the node you want to put in standby.

    crm node standby
    

    Put node online again (after standby)

    Execute on the node you want to put online again.

    crm node online
    

    If you want to put a node online or in standby from another cluster node, append the node name to the commands above, like so:

    crm node standby NODENAME
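A common reason to use standby is node maintenance. A minimal sketch of a hypothetical wrapper (with_standby is not a crm command) that puts a node in standby, runs a maintenance command and then brings the node back online, even when the command fails. It uses crm -w so each step waits for the cluster to finish moving resources; the CRM variable is an added assumption that lets you do a dry run with CRM=echo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: standby -> maintenance command -> online.
# Override CRM (e.g. CRM=echo) for a dry run that only prints the crm calls.
CRM="${CRM:-crm}"

with_standby() {
    local node="$1"
    shift
    # -w: return only after the cluster has moved resources off the node
    "$CRM" -w node standby "$node" || return 1
    local rc=0
    "$@" || rc=$?
    # Bring the node back even if the maintenance command failed
    "$CRM" -w node online "$node" || return 1
    # Report the maintenance command's own exit status
    return "$rc"
}
```

Usage: with_standby node2 apt-get -y upgrade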
    

    Disable stonith (shoot the other node in the head)

    crm configure property stonith-enabled=false
    

    Add a simple shared IP resource

    crm configure primitive failover-ip ocf:heartbeat:IPaddr2 params ip=10.0.2.10 cidr_netmask=32 op monitor interval=10s
    

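    This tells Pacemaker three things about the resource you want to add, through the three colon-separated fields of ocf:heartbeat:IPaddr2. The first field, ocf, is the standard the resource script conforms to, which also tells Pacemaker where to find it. The second field is specific to OCF resources and tells the cluster which namespace (provider) to find the resource script in, in this case heartbeat. The last field, IPaddr2, is the name of the resource script. The params set the shared IP address and its netmask, and op monitor interval=10s makes Pacemaker check every 10 seconds that the IP is still up.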
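Only the ocf class has the middle (provider) field; an LSB init script, for example, is written as lsb:NAME. A small sketch of a hypothetical helper (parse_agent is not part of crm) that splits such a specification into its parts, purely to illustrate the format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a resource agent spec into class, provider, type.
# "ocf:heartbeat:IPaddr2" -> class=ocf provider=heartbeat type=IPaddr2
# "lsb:apache2"           -> class=lsb provider=- type=apache2
parse_agent() {
    local spec="$1" class type provider
    class="${spec%%:*}"            # text before the first colon
    type="${spec##*:}"             # text after the last colon
    if [ "$class" = "ocf" ]; then
        provider="${spec#ocf:}"    # drop the "ocf:" prefix
        provider="${provider%%:*}" # keep the text up to the next colon
    else
        provider="-"               # non-OCF classes have no provider
    fi
    printf 'class=%s provider=%s type=%s\n' "$class" "$provider" "$type"
}
```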

    View all available resource classes

    crm ra classes
    

    Output:

    heartbeat
    lsb
    ocf / heartbeat pacemaker
    stonith
    
    View all the OCF resource agents provided by Pacemaker and Heartbeat

    crm ra list ocf pacemaker
    

    Output:

    ClusterMon    Dummy         HealthCPU     HealthSMART   Stateful      SysInfo
    SystemHealth  controld      o2cb          ping          pingd
    

    For Heartbeat:

    crm ra list ocf heartbeat
    

    Output:

    AoEtarget            AudibleAlarm         CTDB                 ClusterMon
    Delay                Dummy                EvmsSCC              Evmsd
    Filesystem           ICP                  IPaddr               IPaddr2
    IPsrcaddr            IPv6addr             LVM                  LinuxSCSI
    MailTo               ManageRAID           ManageVE             Pure-FTPd
    Raid1                Route                SAPDatabase          SAPInstance
    SendArp              ServeRAID            SphinxSearchDaemon   Squid
    Stateful             SysInfo              VIPArip              VirtualDomain
    WAS                  WAS6                 WinPopup             Xen
    Xinetd               anything             apache               conntrackd
    db2                  drbd                 eDir88               ethmonitor
    exportfs             fio                  iSCSILogicalUnit     iSCSITarget
    ids                  iscsi                jboss                ldirectord
    lxc                  mysql                mysql-proxy          nfsserver
    nginx                oracle               oralsnr              pgsql
    pingd                portblock            postfix              proftpd
    rsyncd               scsi2reservation     sfex                 symlink
    syslog-ng            tomcat               vmware
    
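Not every agent in these lists is installed on every distribution, so a provisioning script may want to check for an agent before using it. A minimal sketch with a hypothetical function name (has_agent) that reads the output of crm ra list on standard input and succeeds only if the exact agent name appears in it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if agent "$1" appears in a "crm ra list"
# listing read from stdin. Names are matched exactly (whole word), so
# IPaddr does not match IPaddr2.
has_agent() {
    # Put one name per line, then look for an exact, fixed-string match
    tr -s '[:space:]' '\n' | grep -xF -- "$1" > /dev/null
}
```

Usage: crm ra list ocf heartbeat | has_agent IPaddr2 && echo "IPaddr2 is available"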

    Add simple apache resource

    crm configure primitive apache-ha ocf:heartbeat:apache params configfile=/etc/apache2/apache2.conf op monitor interval=1min
    

    Make sure Apache and the Virtual IP are on the same node

    crm configure colocation apache-with-ip inf: apache-ha failover-ip
    

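    Make sure the virtual IP is started before Apache (and Apache is stopped before the IP is removed), so that when either one fails, both are recovered together on another node in the right order: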

    crm configure order apache-after-ip mandatory: failover-ip apache-ha
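The primitives and constraints in this section can also be collected in one file and loaded in a single step, which makes the configuration easy to review and keep in version control. A minimal sketch, assuming the resource names and values used above; the function name and its arguments are hypothetical, and the Apache config path defaults to the Debian/Ubuntu location. crmsh should be able to merge such a file with crm configure load update FILE, but check the crmsh documentation for configure load on your version first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the shared IP + Apache setup in crm shell
# syntax to file "$1". Optional: "$2" = virtual IP, "$3" = Apache config.
write_apache_ha_config() {
    local out="$1"
    local vip="${2:-10.0.2.10}"
    local conf="${3:-/etc/apache2/apache2.conf}"
    cat > "$out" <<EOF
primitive failover-ip ocf:heartbeat:IPaddr2 params ip=$vip cidr_netmask=32 op monitor interval=10s
primitive apache-ha ocf:heartbeat:apache params configfile=$conf op monitor interval=1min
colocation apache-with-ip inf: apache-ha failover-ip
order apache-after-ip mandatory: failover-ip apache-ha
EOF
}
```

Usage: write_apache_ha_config /tmp/apache-ha.crm, review the file, then load it.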
    

    Stop a resource

    crm resource stop $RESOURCENAME
    

    Delete a resource

    crm configure delete $RESOURCENAME
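crmsh may refuse to delete a resource that is still running, so it is common to stop it first. A minimal sketch of a hypothetical helper (remove_resource is not a crm command) that does both, using crm -w so the delete only runs after the stop has completed. The CRM variable is an added assumption that lets you do a dry run with CRM=echo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a resource, wait for the stop to finish,
# then remove it from the configuration.
# Override CRM (e.g. CRM=echo) for a dry run.
CRM="${CRM:-crm}"

remove_resource() {
    local rsc="$1"
    # Only delete the resource if stopping it succeeded
    "$CRM" -w resource stop "$rsc" && "$CRM" configure delete "$rsc"
}
```

Usage: remove_resource apache-ha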
    

    Remove a node from the cluster

    crm node delete $NODENAME
    

    Stop all cluster resources

    crm configure property stop-all-resources=true
    

    Clean up warnings and errors for a resource

    crm resource cleanup $RESOURCENAME
    

    Erase entire config

    crm configure erase
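Erasing throws away the entire configuration, so it can be worth saving it first. A minimal sketch with a hypothetical function name: it writes the output of crm configure show to a backup file and only erases if that backup is non-empty. Restoring with crm configure load replace BACKUPFILE should work with crmsh, but verify that on a test cluster. The CRM variable is an added assumption for dry runs (CRM=echo).

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the configuration, then erase it.
# Override CRM (e.g. CRM=echo) for a dry run.
CRM="${CRM:-crm}"

erase_with_backup() {
    # Default backup name contains a timestamp, e.g. cib-backup-20131102-120000.crm
    local backup="${1:-cib-backup-$(date +%Y%m%d-%H%M%S).crm}"
    "$CRM" configure show > "$backup" || return 1
    # Refuse to erase if the backup ended up empty
    [ -s "$backup" ] || return 1
    "$CRM" configure erase
}
```

Usage: erase_with_backup /root/cib-before-erase.crm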
    

    Ignore loss of quorum (when using only two nodes)

    crm configure property no-quorum-policy=ignore
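Quorum means a strict majority of the nodes: floor(N/2) + 1 votes in a cluster of N equally weighted nodes. With two nodes that majority is both nodes, so the survivor loses quorum as soon as its peer fails, and with the default no-quorum-policy (stop) Pacemaker would stop its resources instead of taking them over. A small arithmetic sketch (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Votes needed for quorum with N equally weighted nodes: floor(N/2) + 1.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if a cluster of N nodes still has quorum after F nodes fail.
has_quorum_after() {
    local nodes="$1" failed="$2"
    [ $(( nodes - failed )) -ge "$(quorum_votes "$nodes")" ]
}
```

quorum_votes 2 prints 2, so has_quorum_after 2 1 fails, while a three-node cluster (quorum_votes 3 prints 2) survives one failure.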
    

    Keep the shared IP where it is when the primary node comes back up after a failover (prevent automatic failback)

    crm configure rsc_defaults resource-stickiness=100
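Roughly speaking, Pacemaker places a resource on the node with the highest score, and stickiness is added to the score of the node the resource is currently running on. A deliberately simplified sketch (hypothetical function, ignoring colocation, INFINITY scores and ties) of when the IP would move back to a preferred node that has a location preference, for example one set with crm configure location prefer-node1 failover-ip 50: node1 (prefer-node1 and node1 are example names):

```shell
#!/usr/bin/env bash
# Simplified model: the resource runs on a failover node (location score 0,
# plus stickiness) and the recovered preferred node has a location score.
# It moves back only if the preferred node ends up with the higher score.
# Ignores colocation, INFINITY scores and ties.
fails_back() {
    local preference="$1" stickiness="$2"
    [ "$preference" -gt "$stickiness" ]
}
```

With resource-stickiness=100 and a preference of 50, fails_back 50 100 fails, so in this model the IP stays on the failover node; a preference above 100 would make it move back.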
    

    sysctl

    In order to be able to bind to an IP address that is not (yet) configured on the system, such as the shared IP on a node that does not currently hold it, we need to enable non-local binding at the kernel level.

    Temporary:

    echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind
    

    Permanent:

    Add this to /etc/sysctl.conf:

    net.ipv4.ip_nonlocal_bind = 1
    

    Enable with:

    sysctl -p
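In provisioning scripts it is easy to end up with duplicate lines when the setting is appended on every run. A minimal sketch of a hypothetical helper that rewrites an existing ip_nonlocal_bind line or appends one if there is none; it assumes GNU sed (for sed -i -E), as found on most Linux distributions. Afterwards, sysctl -n net.ipv4.ip_nonlocal_bind prints the active value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set net.ipv4.ip_nonlocal_bind = 1 in a sysctl config
# file (default /etc/sysctl.conf). Existing ip_nonlocal_bind lines are
# rewritten in place; otherwise the line is appended. Safe to run repeatedly.
# Requires GNU sed for "sed -i -E".
enable_nonlocal_bind() {
    local file="${1:-/etc/sysctl.conf}"
    local setting='net.ipv4.ip_nonlocal_bind = 1'
    local pattern='^[[:space:]]*net\.ipv4\.ip_nonlocal_bind[[:space:]]*='
    touch "$file" || return 1
    if grep -Eq "$pattern" "$file"; then
        # Replace the value on the existing line(s)
        sed -i -E "s/${pattern}.*/${setting}/" "$file"
    else
        printf '%s\n' "$setting" >> "$file"
    fi
}
```

Usage: enable_nonlocal_bind && sysctl -p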
    
