In the Linux world, a popular way to build highly available clusters is a set of software tools that includes pacemaker (as resource manager) and corosync (as the group communication system), plus the libraries on which they depend and some configuration utilities.
On Illumos (and in our particular case, OmniOS), the ihac project is abandoned and I couldn’t find any other mature, open source, platform-specific framework for clustering. Porting pacemaker to OmniOS is an option, and this post is about our experience with this task.
The objective of the post is to describe how to get an active/passive pacemaker cluster running on OmniOS and to test it with a Dummy resource agent. The use case (or test case) is not relevant; what a correctly configured cluster should achieve is that, if the node running the Dummy resource (the active node) fails, that resource fails over and is started on the other node (high availability).
I assume that you start from a fresh installation of OmniOS 151012 with a working network configuration (and ssh, for your comfort!). Check the general administration guide, if needed.
This is what we will cover:
- Configuring the machines
- Patching and compiling the tools
- Running pacemaker and corosync from SMF
- Running an active/passive cluster with two nodes to manage the Dummy resource
Installing packages
Some packages are installed from the default repositories, but others need to be retrieved from opencsw.
Install from default repositories:
# pkg install developer/gnu-binutils text/gnu-grep \
    rsync gnu-tar wget text/gnu-sed compatibility/ucb text/gawk \
    autoconf gnu-m4 system/header header-math \
    ipmitool gnu-make developer/build/libtool library/libtool/libltdl \
    library/ncurses library/security/openssl text/gnu-gettext \
    developer/versioning/mercurial \
    developer/versioning/git \
    SUNWcs driver/network/ofk system/header \
    developer/library/lint developer/object-file \
    system/library/mozilla-nss/header-nss library/nspr/header-nspr \
    xz pkg://omnios/developer/swig \
    package/pkg file/gnu-coreutils \
    system/header/header-picl developer/gcc48 \
    developer/build/automake
Install the OpenCSW utility and update the repos:
# pkgadd -d http://get.opencsw.org/now
# /opt/csw/bin/pkgutil -U
Install the packages from CSW:
# /opt/csw/bin/pkgutil -i ggettext pkgconfig libnet gnutls libgnutls_dev libgnutls13 libev_dev libevent_dev libgcrypt11
Configure the environment
Variables
Some environment variables will be needed by the tools and scripts, but also during the building process. The easiest thing is to create a file (e.g., pacemaker.rc) and source it to get the pacemaker environment ready. You may want to separate the variables needed only for running the tools from the various flags needed during the build.
NOTE: there should be no particular reason to change the installation prefix (PREFIX), but if you do, adapt the rest of the instructions to that change where needed.
Content of pacemaker.rc:
export PCMK_ipc_type=socket
export PREFIX=/opt
export CFLAGS='-D__EXTENSIONS__ -D_POSIX_PTHREAD_SEMANTICS -DNAME_MAX=255 -DHOST_NAME_MAX=255 -I/opt/gcc-4.8.1/include -I/usr/include -I${PREFIX}/include -I/opt/ha/include -I/opt/gcc-4.8.1/lib/gcc/i386-pc-solaris2.11/4.8.1/include/ -lsocket -lnsl'
export LDFLAGS='-R/usr/gnu/lib -L${PREFIX}/lib -L/opt/gcc-4.8.1 -L/usr/gnu/lib -L/lib -L/usr/lib'
export PATH=/usr/gnu/bin:/opt/gcc-4.8.1/bin/:/opt/csw/bin:/usr/gnu/bin:/usr/bin:/usr/sbin:/usr/local/bin:$PREFIX/bin:/sbin/:/opt/csw/gnu/:${PREFIX}/sbin
export PKG_CONFIG_PATH='/opt/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig'
export PKG_CONFIG_LIBDIR='/opt/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig'
export PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=yes
export PKG_CONFIG_ALLOW_SYSTEM_LIBS=yes
export LCRSODIR=/usr/libexec/lcrso
export CLUSTER_USER=hacluster
export CLUSTER_GROUP=haclient
export BUILDPATH=/export/builds
export LD_ALTEXEC=/usr/gnu/i386-pc-solaris2.11/bin/ld
export CONFIG_SHELL=/usr/gnu/bin/sh
export PYTHONPATH=${PREFIX}/lib/python2.6/site-packages
export OCF_ROOT=/opt/usr/lib/ocf
Then source the file to have the configuration on the current shell:
# source pacemaker.rc
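Since every later step depends on this environment, a quick sanity check after sourcing the file can save some debugging time. The following is only a sketch: the helper names check_vars and check_cmds are my own and are not part of any of the tools.

```shell
# Print the name of every listed variable that is unset or empty.
# Returns 0 when all of them are set, 1 otherwise.
check_vars() {
    cv_rc=0
    for cv_name in "$@"; do
        # Indirect lookup: copy the value of the variable named $cv_name
        eval "cv_val=\${$cv_name:-}"
        if [ -z "$cv_val" ]; then
            echo "unset: $cv_name"
            cv_rc=1
        fi
    done
    return $cv_rc
}

# Print every listed command that cannot be found in PATH.
# Returns 0 when all of them are found, 1 otherwise.
check_cmds() {
    cc_rc=0
    for cc_cmd in "$@"; do
        if ! command -v "$cc_cmd" >/dev/null 2>&1; then
            echo "missing: $cc_cmd"
            cc_rc=1
        fi
    done
    return $cc_rc
}
```

For example, check_vars PREFIX BUILDPATH CLUSTER_USER CLUSTER_GROUP OCF_ROOT followed by check_cmds gcc gmake autoconf git wget should print nothing if pacemaker.rc was sourced and the packages above were installed (the list of names to check is just a suggestion).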
Now update the library path on the system to include the CSW objects:
# crle -l /opt/csw/lib/ -u
Folders
# mkdir -p $BUILDPATH
# mkdir -p $PREFIX/var
# mkdir -p $PREFIX/lib/heartbeat/cores/$CLUSTER_USER
Cluster user and group
We create the hacluster user and the haclient group, which will run the cluster, then we set some permissions on the folders that we created before.
Note that the corosync and pacemaker processes will run as the hacluster user (as per the SMF script that comes later), so a common problem when using resource agents is missing permissions on directories or executables.
# getent group ${CLUSTER_GROUP} >/dev/null || groupadd ${CLUSTER_GROUP}
# getent passwd ${CLUSTER_USER} >/dev/null || useradd -g ${CLUSTER_GROUP} -d $PREFIX/lib/heartbeat/cores/$CLUSTER_USER -s /bin/bash -c "cluster user" ${CLUSTER_USER}
# chown $CLUSTER_USER:$CLUSTER_GROUP $PREFIX/var/
# chown $CLUSTER_USER:$CLUSTER_GROUP $PREFIX/lib/heartbeat/cores/$CLUSTER_USER
Similarly, hacluster won’t have enough rights to run commands that change the system, such as ipadm create-addr, so we give it passwordless sudo powers. Resource agents that need elevated privileges for some of their instructions will have to be patched to call sudo there.
Run
# visudo
then append this at the end of the file to have the passwordless sudo:
hacluster ALL=(ALL) NOPASSWD: ALL
An alternative would be to set appropriate Role-Based Access Control (RBAC) authorizations.
UPDATE: running pacemaker and corosync as root should work without issues. So you can edit the SMF script and use “root” instead of “hacluster” in the “CLUSTER_USER” variable.
Hostnames
Add an entry for each of the two machines at the end of /etc/hosts (use the output of uname -n to get the symbolic name), for example:
10.0.100.10 ha-test-1
Check that from each machine you can ping the other with the symbolic name.
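If you script the node setup, the hosts entries can be added idempotently. This is a minimal sketch; add_host_entry is a hypothetical helper of mine, and it assumes the hosts file already exists.

```shell
# Append "IP NAME" to a hosts file unless NAME is already listed there.
# $1 = hosts file, $2 = IP address, $3 = host name
add_host_entry() {
    ahe_file=$1 ahe_ip=$2 ahe_name=$3
    while read -r ahe_addr ahe_names; do
        # Skip comments and empty lines
        case "$ahe_addr" in '#'*|'') continue ;; esac
        # Compare the requested name with every name on this line
        for ahe_n in $ahe_names; do
            if [ "$ahe_n" = "$ahe_name" ]; then
                echo "$ahe_name already present in $ahe_file"
                return 0
            fi
        done
    done < "$ahe_file"
    printf '%s\t%s\n' "$ahe_ip" "$ahe_name" >> "$ahe_file"
}
```

On each node you would call it once per cluster member, e.g. add_host_entry /etc/hosts 10.0.100.10 ha-test-1, and then again with the address and name of the second machine.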
Installation of the tools
This section will provide indications on which tools to build and install and how. Install them in the order shown here below.
This information is re-elaborated from Andreas’ page on libqb (check credits and references).
I will mention the version of the tools that I used on my setup (or explicitly add a checkout command). You are encouraged to try the latest “masters/tips” when available, but that might need patching work not documented here.
Also, the fact that the code compiles really doesn’t mean much. There are differences between Linux and Solaris, such as expected return values, that will break the code at runtime. With the patches described here, I managed to run the Dummy, IPaddr and ZFS resources correctly, but the line of code that will crash everything will be executed sooner or later; I just haven’t traversed that code yet :D!
General note on configure.ac and Makefile.am
I needed this change for many packages, so I document it here as a general note and reference this paragraph whenever it is required to compile a certain package.
So if you see the note “apply the changes described in the general section about configure.ac and Makefile.am” in the installation instructions of a package, come back to this paragraph and make the two changes described below.
Add the following line in configure.ac, after the AC_INIT directive:
AC_CONFIG_MACRO_DIR([/opt/csw/share/aclocal/])
If Makefile.am already has an ACLOCAL_AMFLAGS variable, then append
-I/opt/csw/share/aclocal/
to that line, otherwise add the complete entry
ACLOCAL_AMFLAGS=-I/opt/csw/share/aclocal/
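Since the same two changes are repeated for several packages, they can be scripted. The function below is a sketch under a few assumptions of mine: patch_autotools is a made-up name, the AC_INIT call is taken to end on the first following line that ends with a closing parenthesis, and an existing ACLOCAL_AMFLAGS entry is taken to fit on one line. Check the result by hand if a package deviates from that.

```shell
# Apply the two changes of this section to the source tree in directory $1.
# Running it twice on the same tree does not duplicate the entries.
patch_autotools() {
    pa_dir=$1

    # configure.ac (heartbeat ships configure.in instead)
    pa_conf=$pa_dir/configure.ac
    [ -f "$pa_conf" ] || pa_conf=$pa_dir/configure.in
    if ! grep -F 'AC_CONFIG_MACRO_DIR([/opt/csw/share/aclocal/])' "$pa_conf" >/dev/null; then
        # Print every line; once the AC_INIT(...) call is closed,
        # emit the AC_CONFIG_MACRO_DIR line right after it.
        awk '
            { print }
            /^AC_INIT/ { in_init = 1 }
            in_init && /\)[ \t]*$/ {
                print "AC_CONFIG_MACRO_DIR([/opt/csw/share/aclocal/])"
                in_init = 0
            }' "$pa_conf" > "$pa_conf.new" && mv "$pa_conf.new" "$pa_conf"
    fi

    # Makefile.am: extend ACLOCAL_AMFLAGS, or add it if missing
    pa_mk=$pa_dir/Makefile.am
    if grep -F -e '-I/opt/csw/share/aclocal/' "$pa_mk" >/dev/null; then
        :   # already patched
    elif grep '^ACLOCAL_AMFLAGS' "$pa_mk" >/dev/null; then
        awk '/^ACLOCAL_AMFLAGS/ { $0 = $0 " -I/opt/csw/share/aclocal/" } { print }' \
            "$pa_mk" > "$pa_mk.new" && mv "$pa_mk.new" "$pa_mk"
    else
        echo 'ACLOCAL_AMFLAGS=-I/opt/csw/share/aclocal/' >> "$pa_mk"
    fi
}
```

Inside a package source directory, patch_autotools . would then replace the manual edits.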
help2man
# cd $BUILDPATH
# wget http://ftp.hawo.stw.uni-erlangen.de/gnu/help2man/help2man-1.46.1.tar.xz
# tar xf help2man-1.46.1.tar.xz
# cd help2man-1.46.1
# ./configure
# make
# make install
libtool
# cd $BUILDPATH
# wget http://ftp.gnu.org/gnu/libtool/libtool-2.4.2.tar.gz
# tar zxf libtool-2.4.2.tar.gz
# cd libtool-2.4.2
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./bootstrap
# chmod +x libltdl/config/install-sh
# ./configure
# make install
libesmtp
# cd $BUILDPATH
# export CFLAGS='-std=c89 -D__EXTENSIONS__ -DNAME_MAX=255 -DHOST_NAME_MAX=255'
# wget http://www.stafford.uklinux.net/libesmtp/libesmtp-1.0.6.tar.gz
# gtar zxf libesmtp-1.0.6.tar.gz
# cd libesmtp-1.0.6
# mkdir m4
# autoreconf -i
# ./configure --prefix=$PREFIX
# perl -pi -e 's#// TODO: handle GEN_IPADD##' smtp-tls.c
# gmake
# gmake install
# cp auth-client.h $PREFIX/include
# cp auth-plugin.h $PREFIX/include
# cp libesmtp.h $PREFIX/include
# unset CFLAGS
# export CFLAGS='-D__EXTENSIONS__ -D_POSIX_PTHREAD_SEMANTICS -DNAME_MAX=255 -DHOST_NAME_MAX=255 -I/opt/gcc-4.8.1/include -I/usr/include -I${PREFIX}/include -I/opt/ha/include -I/opt/gcc-4.8.1/lib/gcc/i386-pc-solaris2.11/4.8.1/include/ -lsocket -lnsl'
check
# cd $BUILDPATH
# wget http://sourceforge.net/projects/check/files/check/0.9.8/check-0.9.8.tar.gz
# gtar zxf check-0.9.8.tar.gz
# cd check-0.9.8
Edit the configure.ac file to add two lines (just after AC_CONFIG_MACRO_DIR([m4]) ):
m4_pattern_allow([AM_PROG_AR])
AM_PROG_AR
Then continue with the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 autoreconf --install
# ./configure
# make
# make install
asciidoc
# cd $BUILDPATH
# wget http://sourceforge.net/projects/asciidoc/files/asciidoc/8.6.8/asciidoc-8.6.8.tar.gz
# gtar zxf asciidoc-8.6.8.tar.gz
# cd asciidoc-8.6.8
# ./configure
# gmake install
cluster glue
(Note: I used version 2ce85bfab4c1 for my setup)
# cd $BUILDPATH
# wget -O cluster-glue.tar.bz2 http://hg.linux-ha.org/glue/archive/tip.tar.bz2
# gtar jxf cluster-glue.tar.bz2
# cd Reusable-Cluster-Components-*
# perl -pi -e 's#\$\(XSLTPROC\) \\#\$\(XSLTPROC\) --novalid \\#g' doc/Makefile.am
Search for “solaris” in configure.ac and match that section with the following (you should only add the CFLAGS line):
*solaris*)
    REBOOT_OPTIONS="-n"
    POWEROFF_OPTIONS="-n"
    CFLAGS="$CFLAGS -D__EXTENSIONS__"
Search for “cc_supports_flag()” in configure.ac and check that it matches the following:
cc_supports_flag() {
    local CFLAGS="$@"
    AC_MSG_CHECKING(whether $CC supports "$@")
    AC_COMPILE_IFELSE([AC_LANG_SOURCE(int main(){return 0;})] ,[RC=0; AC_MSG_RESULT(yes)],[RC=1; AC_MSG_RESULT(no)])
    return $RC
}
Run:
# ./autogen.sh
# chmod +x install-sh
# sed -i 's/-fstack-protector-all//g' configure.ac
Then apply the changes described in the general section about configure.ac and Makefile.am.
Complete the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 autoreconf --install
# LDFLAGS='-L/opt/csw/lib' ./configure --prefix=$PREFIX --enable-fatal-warnings=no --enable-doc=no --with-daemon-user=${CLUSTER_USER} --with-daemon-group=${CLUSTER_GROUP}
# make
# make install
Resource agents
(Note: I used version b644395 for my setup)
# cd $BUILDPATH
# wget -O resource-agents.tar.gz https://github.com/ClusterLabs/resource-agents/tarball/master
# gtar zxvf resource-agents.tar.gz
# cd ClusterLabs-resource-agents-*/
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./autogen.sh
# chmod +x install-sh
# ./configure --prefix=$PREFIX
# gmake clean
# gmake
# gmake install
libqb
# cd $BUILDPATH
# git clone https://github.com/ClusterLabs/libqb.git libqb
# cd libqb
# git checkout v0.17.1
Then apply the changes described in the general section about configure.ac and Makefile.am.
Complete the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./autogen.sh
# LDFLAGS='-R/opt/gcc-4.8.1/lib' CFLAGS="-D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -D__EXTENSIONS__ -march=i486 -mtune=native" ./configure --prefix=$PREFIX --enable-debug --with-check=yes --enable-slow-tests
# make clean
# make
# make install
libstatgrab
# cd $BUILDPATH
# wget http://dl.ambiweb.de/mirrors/ftp.i-scream.org/libstatgrab/libstatgrab-0.91.tar.gz
# gtar zxvf libstatgrab-0.91.tar.gz
# cd libstatgrab-0.91
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./configure --prefix=$PREFIX
# make
# make install
corosync
Get the sources:
# cd $BUILDPATH
# git clone https://github.com/corosync/corosync.git corosync
# cd corosync
# git checkout v2.3.4
Set the environment:
# export LDFLAGS='-R/opt/gcc-4.8.1/lib -R/usr/lib/mps -R/opt/lib -L/opt/gcc-4.8.1/lib -L/usr/lib/mps -L/opt/lib -lnss3 -lsmime3 -lssl3 -lnssutil3 -lplds4 -lplc4 -lnspr4 -lpthread -ldl -lposix4'
# export nss_CFLAGS='-I/usr/include/mps'
# export nss_LIBS='-R/usr/lib/mps -L/usr/lib/mps'
# export PKG_CONFIG_PATH='/opt/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig'
# export PKG_CONFIG_LIBDIR='/opt/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig'
# export PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=yes
# export PKG_CONFIG_ALLOW_SYSTEM_LIBS=yes
In case of previous failed attempts, clean the configuration cache:
# rm config.status
# rm -rf autom4te.cache
Apply the changes described in the general section about configure.ac and Makefile.am.
Complete the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./autogen.sh
# ./configure --prefix=$PREFIX --localstatedir=$PREFIX/var --enable-monitoring --enable-snmp --enable-xmlconf --enable-testagents --enable-augeas --enable-debug --enable-coverage
# make
# make install
Now log out and log in again, then source pacemaker.rc to continue from a clean environment.
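A possible alternative to logging out after each build, which I did not use for this setup, is to keep the per-package exports in a separate file and run the build inside a subshell, so the overrides never reach your login shell. In this sketch, run_isolated and the file name corosync.env are hypothetical.

```shell
# Run a command with the variables exported by file $1, inside a subshell.
# The parentheses around the function body make it run in a subshell:
# whatever the file exports is gone when the command returns.
# Usage: run_isolated ENV_FILE command [args...]
run_isolated() (
    ri_env=$1
    shift
    . "$ri_env" || exit 1
    "$@"
)
```

For corosync you would put the export lines of the “Set the environment” step (without the leading #) in $BUILDPATH/corosync.env, then run run_isolated $BUILDPATH/corosync.env ./configure followed by the configure options shown above, and the same for make.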
heartbeat
Get the sources:
# cd $BUILDPATH
# wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/tip.tar.bz2
# gtar jxf tip.tar.bz2
# cd Heartbeat-3-0-*/
Apply the changes described in the general section about configure.ac and Makefile.am (for heartbeat the file is configure.in).
In case of previous failed attempts, clean the configuration cache:
# rm config.status
# rm -rf autom4te.cache
Complete the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 autoreconf -i
# CFLAGS="-I/opt/include -I/opt/csw/include/ " LDFLAGS='-R/opt/lib -L/opt/lib -L/opt/csw/lib/ -lnsl /opt/csw/lib/libgnutls.so.13 -lsocket' ./configure --prefix=$PREFIX --enable-quorumd
# chmod +x install-sh
# make CPPFLAGS="-L/opt/csw/lib/ -lgnutls -I/usr/include/glib-2.0/ -I/usr/lib/glib-2.0/include/"
# make install
pacemaker
Get the sources:
# cd $BUILDPATH
# git clone https://github.com/ClusterLabs/pacemaker.git pacemaker
# cd pacemaker
# git checkout 272814b6423d4cdc21a0a83cd9007a4d57bd542d
Set the environment:
# export CFLAGS='-O3 -D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -march=i486 -mtune=native -I/usr/include/ncurses/'
# export LDFLAGS="-R/usr/lib/mps:/opt/gcc-4.8.1/lib -L'/usr/lib/mps:/opt/gcc-4.8.1/lib' -lssp_nonshared"
# export PKG_CONFIG_PATH='/opt/lib/pkgconfig:/usr/lib/pkgconfig:/usr/local/lib/pkgconfig'
# export CONFIG_SHELL=/usr/gnu/bin/sh
Patch:
# perl -pi -e 's/-Wunsigned-char//g' configure.ac
# perl -pi -e 's#-Wunused-but-set-variable##' configure.ac
# perl -pi -e 's/-fstack-protector-all//g' configure.ac
# sed -i 's/\(ACLOCAL_AMFLAGS\s*=\s*\-I\s*m4\)/\1 \-I\/opt\/csw\/share\/aclocal\//g' Makefile.am
# find . -name "*.c" -o -name "*.h" | xargs sed -i 's/syscall\.h/sys\/syscall\.h/g'
# sed -i 's/reboot(RB_AUTOBOOT)/reboot(RB_AUTOBOOT, \"pacemaker\")/g' lib/common/watchdog.c
# sed -i 's/\(sysrq_init()\)/\/\/\1/g' mcp/pacemaker.c
Apply the hack of shame: this is a terrible workaround for the missing signalfd system call in Illumos (the patch target is a file named services_linux.c!). Instead of listening to signals, we just wait 5 seconds for the forked process to finish writing to stdout…
Get the patch file from the gist, extract it and apply it (this will also do minor changes to the Dummy resource agent):
# wget https://gist.githubusercontent.com/vincepii/b5d8f356a35d535313b5/raw/5a4a1e8df5691c39531eb5ffb7f6f0a5c0769a0b/pacemaker.patch
# git apply pacemaker.patch
Complete the build:
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./autogen.sh
# ./configure --prefix=$PREFIX --enable-fatal-warnings=no --with-corosync --with-cs-quorum --with-acl=no --enable-debug
# make CPPFLAGS="-I/usr/include/ -I/usr/include/glib-2.0/ -I/usr/lib/glib-2.0/include/ -I/usr/include/libxml2/ -I/opt/include $CFLAGS"
# make install
Post install:
# mkdir -p $PREFIX/etc/corosync/uidgid.d
# (
echo "uidgid {"
echo " uid: `id -u ${CLUSTER_USER}`"
echo " gid: `id -g ${CLUSTER_USER}`"
echo "}"
) > $PREFIX/etc/corosync/uidgid.d/uid.conf
Now log out and log in again, then source pacemaker.rc to continue from a clean environment.
crm
# cd $BUILDPATH
# git clone https://github.com/crmsh/crmsh.git crmsh
# cd crmsh
# git checkout 0d631cb36655695a67c940cf02c3fabccff705da
# perl -pi -e 's#ps -e -o pid,command#ps -e -o pid,comm#' ./modules/utils.py
# perl -pi -e 's#a2x -f manpage#a2x -L -f manpage#' doc/Makefile.am
# ACLOCAL=aclocal-1.14 AUTOMAKE=automake-1.14 ./autogen.sh
# ./configure --prefix=$PREFIX
Edit the file doc/Makefile.am:
# sed -i 's/a2x -L -f manpage $</a2x --no-xmllint -f manpage $</g' doc/Makefile.am
Complete the build:
# make
# make install
Post install:
# mkdir -p /root/.config/crm/
# cp /opt/etc/crm/crm.conf /root/.config/crm/
And this should complete the installation part.
NOTE: if crm does not run and complains about a missing readline module, you can use crm with the CSW python (for some reason readline.so does not appear in /usr/lib/python2.6/lib-dynload).
To do this (only if crm is not working), use the CSW python as interpreter and reinstall crm:
# /opt/csw/bin/pkgutil -i libreadline6 libreadline_dev python py_lxml
# cd $BUILDPATH/crmsh
# sed -i 's/\#\!\/usr\/bin\/python/\#\!\/opt\/csw\/bin\/python/' crm
# make
# make install
Corosync configuration
Get a sample corosync configuration file from this gist and put it in place:
# wget https://gist.github.com/vincepii/86f60ff4ff912a782a67/raw/c70603c7af996494ad490aa5ef16613babfe4572/corosync.conf
# mv corosync.conf ${PREFIX}/etc/corosync/corosync.conf
Then edit the file and change the following fields:
- memberaddr: use the addresses of your members
- bindnetaddr: use the address of your network
- ring0_addr: set the hostname of each node
- nodeid: not really necessary to change it, use values that you prefer
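To give an idea of where ring0_addr and nodeid live, the sketch below prints a nodelist section in the corosync 2.x configuration syntax for a list of hostnames. The helper gen_nodelist is mine, the numbering 1, 2, … is arbitrary, and the layout of the sample file in the gist may differ, so treat the output as a reference to compare against rather than something to paste blindly.

```shell
# Print a corosync 2.x "nodelist" section for the given host names,
# numbering the nodes 1, 2, ... in argument order.
gen_nodelist() {
    gn_id=1
    echo "nodelist {"
    for gn_host in "$@"; do
        echo "    node {"
        echo "        ring0_addr: $gn_host"
        echo "        nodeid: $gn_id"
        echo "    }"
        gn_id=$((gn_id + 1))
    done
    echo "}"
}
```

Calling gen_nodelist with the names of your two machines (the ones you put in /etc/hosts) prints one node block per machine.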
Fixing permissions
All these files should already exist (except for corosync.pid, unless you have already run corosync for some reason). If one of these commands gives an error, do not ignore it!
# chown -R hacluster:haclient ${BUILDPATH}/corosync/exec
# chown -R hacluster:haclient ${BUILDPATH}/corosync/common_lib/.libs
# chown -R hacluster:haclient ${PREFIX}/var/log/cluster
# chown hacluster:haclient ${PREFIX}/var/run
# touch ${PREFIX}/var/run/corosync.pid
# chown hacluster:haclient ${PREFIX}/var/run/corosync.pid
# chown -R hacluster:haclient /opt/var/lib
# chown -R hacluster:haclient /opt/var/run/resource-agents/
Setting up Corosync in SMF
To run corosync as a service in SMF, you will need the manifest and the executable script. You can find both of them on gist.
Download the manifest and the script and put them in place:
# wget https://gist.github.com/vincepii/2771b79dddd18adb1e51/raw/f0cc2d8c08419dd44ee9b4e1c2b6290d5c8859f0/corosync.xml
# wget https://gist.github.com/vincepii/2771b79dddd18adb1e51/raw/f744a4ef02fa2e7e4d5a2f9c403cb9e9ff411617/corosyncd
# mkdir ${PREFIX}/etc/smf
# mv corosyncd ${PREFIX}/etc/smf/
# mv corosync.xml ${PREFIX}/etc/smf/
# chmod u+x ${PREFIX}/etc/smf/corosyncd
Validate the SMF manifest; hopefully you will get no errors.
# svccfg validate ${PREFIX}/etc/smf/corosync.xml
Import and enable the service:
# svccfg import ${PREFIX}/etc/smf/corosync.xml
# svcadm enable corosync
Check if the service started:
# svcs | grep corosync
If everything went well, you should see an output like the following:
online 12:35:55 svc:/application/hacluster/corosync:default
If something went wrong, try to get the output of the SMF script with
# cat `svcs -L corosync`
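If you enable the service from a script, you may want to wait until SMF reports it as online before going on. This is a minimal sketch; wait_online is a hypothetical helper, and it relies on svcs -H -o state to print only the state column without a header.

```shell
# Poll the SMF state of service $1 until it is "online".
# $2 = maximum number of attempts (default 30), one second apart.
# Returns 0 once online, 1 on timeout or if the service is in maintenance.
wait_online() {
    wo_tries=${2:-30}
    while [ "$wo_tries" -gt 0 ]; do
        # -H drops the header, -o state prints only the state column
        wo_state=$(svcs -H -o state "$1" 2>/dev/null | tr -d ' \t')
        case "$wo_state" in
            online)      return 0 ;;
            maintenance) echo "$1 is in maintenance"; return 1 ;;
        esac
        wo_tries=$((wo_tries - 1))
        sleep 1
    done
    echo "$1 not online (last state: ${wo_state:-unknown})"
    return 1
}
```

After svcadm enable corosync, a call such as wait_online corosync 60 returns as soon as the service is online; if it fails, look at the log file printed by svcs -L corosync.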
Now corosync and pacemaker should start at boot. You can disable and enable the service with:
# svcadm disable corosync
# svcadm enable corosync
Running the cluster
You can check the cluster status with:
# crm_mon
(remember to source pacemaker.rc if the command is not available!)
The output should look similar to the following (note that for this run I had 3 nodes, 2 of which offline, and I set the expected_votes to 1 to have a partition with quorum):
You need to have curses available at compile time to enable console mode
Last updated: Thu Nov 6 12:44:26 2014
Last change: Thu Nov 6 12:42:01 2014
Stack: corosync
Current DC: ha-test-1 (80) - partition with quorum
Version: 1.1.12-272814b
3 Nodes configured
0 Resources configured
Node omni-pcm (20): UNCLEAN (offline)
Node omni-pcm-2 (40): UNCLEAN (offline)
Online: [ ha-test-1 ]
Administer the Dummy resource!
If you now have configured two nodes to create a pacemaker cluster, the next step is to check that the cluster can administer a resource.
We will use the Dummy resource, which does nothing other than verifying that it is running. When we start the resource, it will run on one of the nodes. If we kill that node in some way, the Dummy resource should fail over and start running on the other node.
This demonstrates that pacemaker operations are working.
First, let’s create some symlinks and set some permissions to be sure that pacemaker will find everything accessible:
# ln -s /usr/lib/ocf/resource.d/pacemaker /opt/usr/lib/ocf/resource.d/pacemaker
# ln -s /opt/usr/lib/ocf/lib/ /usr/lib/ocf/lib
# ln -s /opt/usr/lib/ocf/resource.d/heartbeat /usr/lib/ocf/resource.d/heartbeat
# chown -R hacluster:haclient /usr/lib/ocf
# chown -R hacluster:haclient /opt/usr/lib/ocf
As a basic configuration for our cluster, we disable STONITH and ignore the no-quorum state (with two nodes, we cannot have a quorum if one of them fails!):
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
Verify that our configuration so far is correct with
# crm_verify -L -V
that should print nothing if everything is fine.
Now configure the Dummy resource:
# crm configure primitive dummy ocf:pacemaker:Dummy op monitor interval=120s
and then check its status (it should be running on one of the nodes):
# crm resource status dummy
resource dummy is running on: ha-test-1
Now you can verify that if one node stops working (you can simulate this with svcadm disable corosync), the Dummy resource will be started on the other node.
You can check the status also with crm_mon, that should show something similar to this:
# crm_mon
Defaulting to one-shot mode
You need to have curses available at compile time to enable console mode
Last updated: Thu Nov 6 16:39:58 2014
Last change: Thu Nov 6 16:34:43 2014
Stack: corosync
Current DC: ha-test-1 (80) - partition with quorum
Version: 1.1.12-272814b
3 Nodes configured
1 Resources configured
Online: [ ha-test-1 ]
OFFLINE: [ omni-pcm omni-pcm-2 ]
dummy (ocf::pacemaker:Dummy): Started ha-test-1
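The failover test can also be watched from a script. The sketch below parses the “is running on:” line shown above and polls until the resource shows up on a different node. Both function names are hypothetical, and the parsing assumes that crm resource status keeps printing the line in the format shown in this post.

```shell
# Read "crm resource status" output on stdin and print the node name
# reported after "is running on:"; print nothing if no such line exists.
running_node() {
    sed -n 's/^.*is running on: *\([^ ]*\).*$/\1/p'
}

# Poll until resource $1 runs on a node different from $2.
# $3 = maximum number of attempts (default 30), two seconds apart.
wait_failover() {
    wf_tries=${3:-30}
    while [ "$wf_tries" -gt 0 ]; do
        wf_node=$(crm resource status "$1" 2>/dev/null | running_node)
        if [ -n "$wf_node" ] && [ "$wf_node" != "$2" ]; then
            echo "$1 now runs on $wf_node"
            return 0
        fi
        wf_tries=$((wf_tries - 1))
        sleep 2
    done
    return 1
}
```

With the dummy resource running on ha-test-1, you would run svcadm disable corosync on ha-test-1 and wait_failover dummy ha-test-1 on the surviving node.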
Conclusions
If everything worked, you should now have a pacemaker cluster running on OmniOS.
If you now need different (i.e., useful) resource agents (e.g., IPaddr), some patching may be needed because RA scripts will be run by the hacluster user, which, with the setup described in this post, doesn’t have authorizations to perform system changes. Also expect problems if the hacluster user doesn’t have enough rights to read, write or traverse some folders that the RA script wants to access. Check the logs (tail -f /opt/var/log/cluster/corosync.log), debug and fix :). UPDATE: running corosync and pacemaker as root should also work, and this simplifies using RAs. Check the “Cluster user and group” section.
To get a better understanding of pacemaker, please refer to the official documentation.
Credits
I want to thank Andreas Grüninger for his great support in helping me with this setup and his contributions to the pacemaker tools that made it possible to run them on Illumos. I re-used the SMF manifest and script for corosync that he shared with me (with his permission :)). Also, the general procedure was made available by Andreas at this page.
Many thanks also to Sašo Kiselkov for his in-depth blog post on building a HA ZFS storage appliance.
References
- http://grueni.github.io/libqb/
- http://zfs-create.blogspot.ch/2013/06/building-zfs-storage-appliance-part-1.html
- http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_adding_a_resource.html
- http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.manual_config.html
Hi,
It’s an awesome guide to install the HA cluster on OmniOS. Btw, I followed your guide and successfully added the Dummy resource, but when I tried to add the IPaddr2 resource, “crm status” returned an error: http://pastebin.com/bKYru6iD
Also, I can’t find the ZFS resource. Can you share the modifications you made to the resource script?
Hi Adhi, I’m very happy to hear that this helped :)!
One first big recommendation to have RAs working more easily is to run pacemaker and corosync as root.
Then, I can share my modified IPaddr script, hoping that this would also work for you, otherwise you may need to debug :).
You can find it here: https://gist.github.com/vincepii/6763170efa5050d2d73d
There are still debug lines (commented out) in there :D.
Thanks piiv for your response. One more question: how about using stonith?
Just like in Sašo Kiselkov’s blog post, I want to configure stonith for an ipmi resource:
crm(live)configure# primitive head1-stonith stonith:external/ipmi
with this guide my crm only show this option:
crm(live)configure# primitive test_stonith stonith:fence_
stonith:fence_legacy stonith:fence_pcmk
even I link stonith plugins resource from :
ln -s /opt/lib/stonith /opt/usr/lib/
can you help me ?
Hi Adhi,
I have used resource-level fencing on my setup (http://clusterlabs.org/doc/crm_fencing.html), as stonith on IPMI was not an option for us.
I’m Sorry for the next question, but actually I’m not expert in clustering.
How to enable or add the “node-level fencing” ?
You need to get STONITH to run on your pacemaker setup, you can follow any of the available tutorials.
I haven’t covered this, so I don’t know what kind of issues you may run into and if more patching will be needed.
Hi,
Very nice writeup. I have tried to follow all the steps literally but got stuck building pacemaker.
I get error messages like:
../lib/cluster/.libs/libcrmcluster.so: undefined reference to `cl_get_string’
../lib/cluster/.libs/libcrmcluster.so: undefined reference to `ha_msg_expand’
../lib/cluster/.libs/libcrmcluster.so: undefined reference to `ha_msg_addstruct_compress’
.
. etc
Now, I am not a programmer so troubleshooting this will take me a long time probably. Any tips?
Since this page has been here for a while, might it be that by following your instructions I get a newer version of pacemaker than the one that you are using? I get version 1.1.12. If so, can you please tell me which versions of the tools you have used to accomplish this? I will try to download them then.
Thanks in advance.
Sim
The linker cannot find the libraries that define those symbols.
Most likely it is one of two possible causes:
1. You have the libraries, but they are in a path that the linker is not aware of (missing -L option) or the library is not used for linking (missing -l option)
2. You don’t have the libraries/object files and you need to compile the source code that provides them
I see that all those functions are defined in the heartbeat package, so that’s what you are missing here (http://sourcecodebrowser.com/heartbeat/2.1.3/ha__msg_8h.html#a37d2ff7771f225e758a14e4c390b53f5).
Was heartbeat built and installed correctly?
About the version of the tools, the git checkout command on the pacemaker source code should bring your working copy to the same state as I used.
What version of OmniOS are you using?
Hi,
Thank you very much for your reply!
I have used the commands copy paste from above when building pacemaker. Do you mean a -L or -l option in the make command of pacemaker? Pardon the question, since I haven’t looked at your link regarding the heartbeat package yet. There was one diff while building heartbeat though. The source does not contain a config.in as stated in your red message. I had to apply the changes to the Makefile.am. I am using omnios r12 (same as you stated in this post), though I downloaded one of the first images available. Maybe I should use a more recent r12 if still available? Do you mind if I send you a mail with some questions about your own implementation to your normal mail account? They might clutter your nice page if I did it here 🙂 and are not speciffic to building the source but more to your experience.
High regards!
The -L and/or -l should go in the LDFLAGS, with the right parameters (e.g., -L/path/to/libs or -llibname).
But most likely, something went wrong with the installation of heartbeat.
I have checked the download link and it actually points to the tip of the branch, so your first hypothesis is correct: that is not exactly the same archive that I used (but a more recent one).
You can find the package that I used here: https://owncloud.engineering.zhaw.ch/public.php?service=files&t=bd374d54710dd131bc81ef8baf3d7c59
The OmniOS version that you use should not give problems, I have done this setup there as well.
For the last question, feel free to write to me at “|my four letters username as you can read it on top of this message|” at zhaw dot ch 🙂
Thank you!
I must start with my job now but will keep you posted as soon as I can
I find that my stonith:external/ipmi resource can’t start, and the error log was: stonith-ng: error: get_agent_metadata: Could not retrieve metadata for fencing agent fence_legacy.
Could you help shed some light on this?