Pacemaker, please meet NDB Cluster or using Pacemaker/Heartbeat to start a NDB Cluster
Customers have always asked me to make NDB Cluster starts automatically upon startup of the servers. For the ones who know NDB Cluster, it is tricky to make it starts automatically. I know at least 2 sets of scripts to manage NDB startup, ndb-initializer and from Johan configurator www.severalnines.com. If all the nodes come up at about the same time, it is not too bad but what if one the critical node takes much longer to start because of an fsck on a large ext3 partition. Then, a startup script becomes a nightmare. Finally, if the box on which the script is supposed to run didn't start at all. That's a lot of rules to handle.
Since all aspects of HA interest me, I was recently reading the Pacemaker documentation and I realized that Pacemaker has all the logic required to manage NDB Cluster startup. Okay it might seems weird to control a cluster by cluster but if you think about it, this is, I think, the best solution.
The Linux-HA project has split the old Heartbeat-2 project in 2 parts, the clustering and communication layer, Heartbeat and the resources manager, Pacemaker. A key new features that has been added to Pacemaker recently, a Clone resources set, that allows an optional startup if only one of 2 similar resources starts. I use this feature to start the data nodes. If after a major outage, only one of the physical host where the data nodes are located comes up, the cluster will start. The other features of Pacemaker that I need are resource location rsc_location and resource ordering rsc_order.
Let's start by the beginning. My NDB cluster is made of the following 3 nodes:
- testvirtbox: ndb_mgmd (10.2.2.139)
- test1: ndbd
- test2: ndbd
For the sake of simplicity, I am not considering the SQL nodes but given the framework, extending to SQL nodes is trivial. Installing Pacemaker and Heartbeat is very easy on Lucid Lynx, just do the following:
-
apt-get install heartbeat pacemaker
On other distributions, you might have to build from sources, look here for help.
There 2 minimal configuration files:
-
root@test2:~# cat /etc/ha.d/authkeys
-
auth 1
-
1 sha1 yves
-
root@test2:~# cat /etc/ha.d/ha.cf
-
autojoin none
-
bcast eth0
-
warntime 5
-
deadtime 15
-
initdead 60
-
keepalive 2
-
node test1
-
node test2
-
node testvirtbox
-
crm respawn
And then, Heartbeat can be started on all nodes with /etc/init.d/heartbeat start.
Next, since Pacemaker is used to start resources and not to manage them, we don't need to define Stonith devices so (run on only one node):
-
crm_attribute -t crm_config -n stonith-enabled -v false
A last before defining resources, since the Heartbeat cluster is asymmetrical, meaning resources will not be able to run anywhere, we must create an "Opt-In" cluster with (run on only one node):
-
crm_attribute --attr-name symmetric-cluster --attr-value false
At this point, we have a running cluster controlling nothing. The trick with NDB Cluster is that Heartbeat is required to start the resources but not to stop them. In order to achieve this behavior, I created fake resource scripts that can be fully controlled by Heartbeat but allowing the one way behavior I wanted.
-
root@testvirtbox:~# cat /usr/local/bin/fake_ndb_mgmd
-
#!/bin/bash
-
-
/usr/bin/nohup /usr/local/mysql/libexec/ndb_mgmd> /dev/null &
-
-
while [ 1 ]
-
do
-
/bin/sleep 60
-
done
-
-
root@testvirtbox:~# cat /usr/local/bin/fake_ndb_cluster_start
-
#!/bin/bash
-
-
#Give some time to the nodes to connect
-
/bin/sleep 15
-
-
/usr/local/mysql/bin/ndb_mgm -e 'all start'> /dev/null
-
-
while [ 1 ]
-
do
-
/bin/sleep 60
-
done
-
-
root@test1:~# cat /usr/local/bin/fake_ndbd
-
#!/bin/bash
-
-
#Give some time to ndb_mgmd to start
-
/bin/sleep 10
-
-
nohup /usr/local/mysql/libexec/ndbd -c 10.2.2.139> /dev/null &
-
-
while [ 1 ]
-
do
-
sleep 60
-
done
With Pacemaker it is not longer required to manipulate the cib in xml format but for this post, xml offers a compact way of presenting the configuration. The first things we need to define are the resources. A very handy resource type for us is the anything resource which allow an arbitrary script or binary to be run. The resources section will look like:
-
<resources>
-
<primitive id="mgmd" class="ocf" type="anything" provider="heartbeat">
-
<instance_attributes id="params-mgmd">
-
<nvpair id="param-mgmd-binfile" name="binfile" value="/usr/local/bin/fake_ndb_mgmd"/>
-
<nvpair id="param-mgmd-pidnile" name="pidfile" value="/var/run/heartbeat/fake_ndb_mgmd.pid"/>
-
</instance_attributes>
-
</primitive>
-
<clone id="ndbdclone">
-
<meta_attributes id="ndbdclone-options">
-
<nvpair id="ndbdclone-option-1" name="globally-unique" value="false"/>
-
<nvpair id="ndbdclone-option-2" name="clone-max" value="2"/>
-
<nvpair id="ndbdclone-option-3" name="clone-node-max" value="1"/>
-
</meta_attributes>
-
<primitive id="ndbd" class="ocf" type="anything" provider="heartbeat">
-
<instance_attributes id="params-ndbd">
-
<nvpair id="param-ndbd-binfile" name="binfile" value="/usr/local/bin/fake_ndbd"/>
-
<nvpair id="param-ndbd-pidfile" name="pidfile" value="/var/run/heartbeat/fake_ndbd.pid"/>
-
</instance_attributes>
-
</primitive>
-
</clone>
-
<primitive id="ndbcluster" class="ocf" type="anything" provider="heartbeat">
-
<instance_attributes id="params-ndbcluster">
-
<nvpair id="param-ndbcluster-binfile" name="binfile" value="/usr/local/bin/fake_ndb_cluster_start"/>
-
<nvpair id="param-ndbcluster-pidfile" name="pidfile" value="/var/run/heartbeat/fake_ndb_cluster_start.pid"/>
-
</instance_attributes>
-
</primitive>
-
</resources>
Please note the ndbd resource is defined through the use of a clone set. The clone set will allow the cluster to start even if only one to the ndb node group is available. If you have multiple ndb node groups, you'll need one clone set per node group. The ndb_mgmd nodes or eventual SQL nodes could have been handled the same way although for SQL nodes, ndb_waiter is very handy. Once the resources are defined, we need to setup the constraints which cover mandatory locations and ordering.
-
<constraints>
-
<rsc_location id="loc-1" rsc="mgmd" node="testvirtbox" score="INFINITY"/>
-
<rsc_location id="loc-2" rsc="ndbcluster" node="testvirtbox" score="INFINITY"/>
-
<rsc_location id="loc-3" rsc="ndbdclone" node="test1" score="INFINITY"/>
-
<rsc_location id="loc-4" rsc="ndbdclone" node="test2" score="INFINITY"/>
-
<rsc_order id="order-1">
-
<resource_set id="ordered-set-1" sequential="true">
-
<resource_ref id="mgmd"/>
-
<resource_ref id="ndbdclone"/>
-
<resource_ref id="ndbcluster"/>
-
</resource_set>
-
</rsc_order>
-
</constraints>
And... that's it. For my part, I configured Pacemaker by dumping the cib in xml format, editing and reloading. In term of commands, it means:
-
cibadmin --query> local.xml
-
vi local.xml
-
cibadmin --replace --xml-file local.xml
Once NDB is started, you can even stop heartbeat, it is no longer required.
P.S.:
As suggested by Florian, here is the configuration in CLI format:
-
root@testvirtbox:~# crm configure show
-
INFO: building help index
-
INFO: object order-1 cannot be represented in the CLI notation
-
node $id="27687295-f72c-49bd-b82d-25f32dbfe1e2" test2
-
node $id="3086852d-abb9-4bdb-93a1-9390e14c148c" test1
-
node $id="cad7f678-fc91-4f09-a39e-1dde6d5bcd30" testvirtbox
-
primitive mgmd ocf:heartbeat:anything \
-
params binfile="/usr/local/bin/fake_ndb_mgmd" pidfile="/var/run/heartbeat/fake_ndb_mgmd.pid"
-
primitive ndbcluster ocf:heartbeat:anything \
-
params binfile="/usr/local/bin/fake_ndb_cluster_start" pidfile="/var/run/heartbeat/fake_ndb_cluster_start.pid"
-
primitive ndbd1-IP ocf:heartbeat:anything \
-
params binfile="/usr/local/bin/fake_ndbd" pidfile="/var/run/heartbeat/fake_ndbd.pid"
-
clone ndbdclone ndbd1-IP \
-
meta globally-unique="false" clone-max="2" clone-node-max="1"
-
location loc-1 mgmd inf: testvirtbox
-
location loc-2 ndbcluster inf: testvirtbox
-
location loc-3 ndbdclone inf: test1
-
location loc-4 ndbdclone inf: test2
-
xml <rsc_order id="order-1"> \
-
<resource_set id="ordered-set-1" sequential="true"> \
-
<resource_ref id="mgmd"/> \
-
<resource_ref id="ndbdclone"/> \
-
<resource_ref id="ndbcluster"/> \
-
</resource_set> \
-
</rsc_order>
9 Comments

23-24 Aug








del.icio.us
digg
Goodness, Yves! Please stop scaring people with XML dumps.
“crm configure show” dumps will do just fine, be much more concise, and easier to read.
Comment :: May 19, 2010 @ 11:16 am
Florian, I must not have read far enough in the Pacemaker doc…
Comment :: May 19, 2010 @ 11:45 am
great post!
erkules;)
Comment :: May 19, 2010 @ 3:45 pm
Any specific reasons to pick heartbeat instead of OpenAIS/corosync? While still compatible with heartbeat, pacemaker was rather designed with OpenAIS in mind …
Comment :: May 20, 2010 @ 12:07 pm
@Didier: sorry, that’s plain wrong. Pacemaker is a spin-off of the Linux-HA project (it’s the continuation of the CRM effort in Heartbeat 2, albeit in a separate project for various good reasons). It supports both messaging layers just fine.
Comment :: May 20, 2010 @ 12:50 pm
@Didier: Pacemaker works very well with Heartbeat and I have been using it for years so it was easier for me to use Heartbeat.
Comment :: May 21, 2010 @ 9:46 am
@Florian: that’s right. Apologies.
Comment :: May 21, 2010 @ 10:47 am
You can also use MySQL Cluster Manager – http://www.mysql.com/products/database/cluster/mcm/ – or Solaris Cluster – http://www.sun.com/software/solaris/cluster/ – to manage a MySQL Cluster.
Comment :: May 28, 2010 @ 1:13 am
I find u really like ocf:heartbeat:anything ……
but i think it’s better to use lsb:[xxx] which can use script in the /etc/init.d/[xxx], then you can add ’sleep words’ in the start() :)
pacemaker is really powerfull.
Comment :: July 15, 2010 @ 5:50 am