July 25, 2014

Percona Replication Manager, a solution for MySQL high availability with replication using Pacemaker

The content of this article is outdated; look here for more up-to-date information.

Over the last year, the frustration of many of us at Percona regarding issues with MMM has grown to a level where we started looking at other ways of achieving higher availability using MySQL replication. One of the weaknesses of MMM is its communication layer, so instead of reinventing a flat tire, Baron Schwartz and I decided to develop a solution using Pacemaker, a well known and established cluster manager with a bulletproof communication layer. One of the great things about Pacemaker is its flexibility, but flexibility can result in complexity. With the help of people from the Pacemaker community, namely Florian Haas and Raoul Bhatia, I have been able to modify the existing MySQL Pacemaker resource agent in a way that it survived our replication tests and offers behavior pretty similar to MMM regarding virtual IP address (VIP) management. We decided to call this solution PRM, for Percona Replication Manager. All the parts are open source and available under the GPL license.

Keep in mind this solution is hot off the press; consider it alpha. Like I said above, it survived testing in a very controlled environment, but it is young and many issues/bugs are likely to be found. Also, it is different from Yoshinori Matsunobu's MHA solution and in fact is quite a complement to it. One of my near-term goals is to integrate it with MHA for master promotion.

The solution is basically made of 3 pieces:

  • The Pacemaker cluster manager
  • A Pacemaker configuration
  • A MySQL resource agent

Here I will not cover the Pacemaker installation since this is fairly straightforward and covered elsewhere. I’ll discuss the MySQL resource agent and the supporting configuration while assuming basic knowledge of Pacemaker.

But, before we start, what does this solution offer?

  • Reader and writer VIP behaviors similar to MMM
  • If the master fails, a new master is promoted from the slaves; no master-to-master setup is needed. The new master is selected based on scores published by the slaves: the more up-to-date slaves have higher scores for promotion
  • Some nodes can be dedicated to being slaves only, or made less likely to become master
  • A node can be the preferred master
  • If replication on a slave breaks or lags beyond a defined threshold, the reader VIP(s) is removed from it. MySQL is not restarted.
  • If no slaves are healthy, all VIPs, readers and writer, will be located on the master
  • During a master switch, connections are killed on the demoted master to avoid replication conflicts
  • All slaves are in read_only mode
  • Simple administrative commands can remove the master role from a node
  • Pacemaker stonith devices are supported
  • No logical limits in terms of number of nodes
  • Easy to add nodes

In order to set up the solution you'll need my version of the MySQL resource agent; it is not yet pushed to the main Pacemaker resource agents branch. More testing and cleaning will be needed before that happens. You can get the resource agent from here:

https://github.com/y-trudeau/resource-agents/raw/master/heartbeat/mysql

You can also get the whole branch from here:

https://github.com/y-trudeau/resource-agents/zipball/master

On my Ubuntu Lucid VM, this file goes in the /usr/lib/ocf/resource.d/heartbeat/ directory.
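
For example, installing it could be as simple as the following (a rough sketch; adjust the target path to your distribution's OCF root and make sure the file is executable):

wget https://github.com/y-trudeau/resource-agents/raw/master/heartbeat/mysql \
  -O /usr/lib/ocf/resource.d/heartbeat/mysql
chmod 755 /usr/lib/ocf/resource.d/heartbeat/mysql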

To use this agent, you’ll need a Pacemaker configuration. As a starting point, I’ll discuss the configuration I use during my tests.

Let's review the configuration. It begins with three node entries defining the three nodes I have in my cluster. One attribute is required for each node: the IP address that will be used for replication. This is a real IP address, not a reader or writer VIP. This attribute allows the use of a private network for replication if needed.
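
As a rough sketch, the node entries look like this in the crm shell (the hostnames and addresses are the ones from my test VMs; substitute your own):

node testvirtbox1 \
  attributes IP="10.2.2.160"
node testvirtbox2 \
  attributes IP="10.2.2.161"
node testvirtbox3 \
  attributes IP="10.2.2.162"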

Next is the mysql primitive resource declaration. This primitive defines the mysql resource on each node and has many parameters; here are the ones I had to define (a sketch of the resulting primitive follows the list):

  • config: The path of the my.cnf file. Remember that Pacemaker will start MySQL, not the regular init.d script
  • pid: The pid file. This is used by Pacemaker to know if MySQL is already running. It should match the my.cnf pid_file setting.
  • socket: The MySQL unix socket file
  • replication_user: The user to use when setting up replication. It is also currently used for the ‘CHANGE MASTER TO’ command, something that should/will change in the future
  • replication_passwd: The password for the above user
  • max_slave_lag: The maximum allowed slave lag in seconds, if a slave lags by more than that value, it will lose its reader VIP(s)
  • evict_outdated_slaves: It is mandatory to set this to false, otherwise Pacemaker will stop MySQL on a slave that lags behind, which will absolutely not help its recovery.
  • test_user and test_passwd: The credentials used to test MySQL. The default is to run select count(*) on the mysql.user table, so the given user should at least have SELECT on that table.
  • op monitor: An entry is needed for each role, Master and Slave. Intervals must not be the same.
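
To make the list above concrete, here is roughly what the primitive looks like; the paths, credentials and monitor intervals below are placeholders to adapt to your environment:

primitive p_mysql ocf:heartbeat:mysql \
  params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
    socket="/var/run/mysqld/mysqld.sock" \
    replication_user="repl" replication_passwd="replpass" \
    max_slave_lag="60" evict_outdated_slaves="false" \
    test_user="test" test_passwd="testpass" \
  op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
  op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"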

Following the mysql primitive declaration, the primitives for three reader VIPs and one writer VIP are defined. Those are straightforward, so I'll skip the detailed description. The next interesting element is the master-slave "ms" declaration. This is how Pacemaker defines an asymmetrical resource having a master and slaves. The only thing that may change here is clone-max="3", which should match the number of database nodes you have.
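
For reference, a writer VIP primitive and the master-slave declaration look roughly like this (the IP address and network interface here are placeholders):

primitive writer_vip ocf:heartbeat:IPaddr2 \
  params ip="10.2.2.170" nic="eth0"
ms ms_MySQL p_mysql \
  meta clone-max="3" notify="true"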

The handling of the VIPs is the truly new thing in the resource agent. I am grateful to Florian Haas, who told me to use node attributes to keep Pacemaker from overreacting. The availability of a reader or writer VIP on a node is controlled by the readerOK and writerOK attributes and the location rules. An infinite negative weight is given when a VIP should not be on a host. I also added a few colocation rules to help spread the reader VIPs across all the nodes.
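
The location rules themselves are short: they give an infinite negative score to any node whose attribute is 0, so the VIP can never run there (the constraint and resource names here mirror my test configuration):

location reader_vip_reader_ok reader_vip \
  rule -inf: readerOK eq 0
location writer_vip_writer_ok writer_vip \
  rule -inf: writerOK eq 0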

As a final thought on the Pacemaker configuration, remember that in order for a Pacemaker cluster to run correctly with only two nodes, you should set the quorum policy to ignore. Also, this example configuration has no stonith devices defined, so stonith is disabled. At the end of the configuration, you'll notice the replication_info cluster attribute. You don't have to define this; the mysql RA will add it automatically when the first node is promoted to master.
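
In crm shell terms, those cluster options boil down to a single property line like the following; keep in mind that disabling stonith is only acceptable for testing:

property stonith-enabled="false" \
  no-quorum-policy="ignore"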

There are not many requirements regarding the MySQL configuration; Pacemaker will automatically add "skip-start-slave" for saner behavior. One of the important settings is "log_slave_updates = OFF" (the default value). In some cases, if slaves are logging replication updates, it may cause failover issues. Also, the solution relies on the read_only setting on the slaves, so make sure the application's database user doesn't have the SUPER privilege, which overrides read_only.
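
As a minimal sketch, the relevant fragment of my.cnf on each node could look like this (server_id must be unique per node, and binary logging must be enabled so any node can be promoted):

[mysqld]
server_id         = 1          # unique per node
log_bin           = mysql-bin  # required so the node can serve as a master
log_slave_updates = OFF        # the default, but worth making explicit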

Like I mentioned above, this project is young. In the future, I'd like to integrate MHA to benefit from its ability to bring all the nodes to a consistent level. Also, the security around the solution should be improved, a fairly easy task I believe. Of course, I'll work with the maintainers of the Pacemaker resource agents to include it in the main branch once it has matured a bit.

Finally, if you are interested in this solution but have problems setting it up, just contact us at Percona; we'll be pleased to help.

About Yves Trudeau

Yves is a Principal Consultant at Percona, specializing in technologies such as MySQL Cluster, Pacemaker and DRBD. He was previously a senior consultant for MySQL and Sun Microsystems. He holds a Ph.D. in Experimental Physics.

Comments

  1. Good work! A couple comments:

    The pacemaker config looks quite intimidating, what with all those unclear abbreviations and the amount of options.
    I sure wasn’t satisfied with MMM’s behavior, but its configuration was simple enough to understand. Here, it looks like I’m going to copy+paste your config and hope for the best. If anything goes wrong — I’ll have to have deeper understanding of pacemaker.

    This is not a criticism, but an observation: in order to set up the PRM high-availability solution for MySQL, you'll need a sysadmin in addition to your DBA. Not all DBAs will know how to manage and analyze a Pacemaker configuration.

    Just consider the fact that you, as in Percona, had to go to Florian, who is probably one of the most knowledgeable people on Pacemaker, to make it work (e.g. Florian told you you had better use node attributes). I suspect things will not go smoothly on all installations. How many Florians are there to be contacted?

    Again, this is merely an observation. Perhaps there is no easy way out. I would surely like a solution which focuses on usability and just wraps it all up for you.

  2. vineet says:

    What about data integrity in such cluster environment?

    As per my understanding this is still asynchronous replication and there are chance of data loss in case of master node failure.

  3. Yves & Shlomi, thanks for the kudos.

    Yves, I know I still owe you a review on that RA; sorry this has taken a while — I’ll try to get to it today.

    Shlomi, as to your comment about the config looking intimidating: I concur, but fear not: it can be made a lot less so. Most of what you see under $cib-bootstrap-options is just scaffolding. The mysql_replication property is auto-managed by the RA. And for the reader VIPs, we can make use of a cool feature in Pacemaker: clones allow us to manage an entire IP range as one resource, and we can then have all those constraints just apply to the clone. This makes for a much-condensed configuration:

    node testvirtbox1 \
        attributes IP="10.2.2.160"
    node testvirtbox2 \
        attributes IP="10.2.2.161"
    node testvirtbox3 \
        attributes IP="10.2.2.162"
    primitive p_mysql ocf:heartbeat:mysql \
        params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
        socket="/var/run/mysqld/mysqld.sock" replication_user="root" \
        replication_passwd="rootpass" max_slave_lag="15" evict_outdated_slaves="false" \
        binary="/usr/bin/mysqld_safe" test_user="root" \
        test_passwd="rootpass" \
        op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
        op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"
    # unique_clone_address="true" configures the resource
    # to manage an IP range when cloned
    primitive p_reader_vip ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.171" unique_clone_address="true"
    clone reader_vip p_reader_vip \
        meta globally-unique="true" clone-max=3 clone-node-max=3
    primitive writer_vip ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.170" nic="eth0" \
        meta target-role="Started"
    ms ms_MySQL p_mysql \
        meta clone-max="3" notify="true"
    location reader_vip_reader_ok reader_vip \
        rule -inf: readerOK eq 0
    location writer_vip_writer_ok writer_vip \
        rule -inf: writerOK eq 0
    property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
    rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

    It’s still not super simple, but it’s a lot simpler than hacking a million shell scripts to do this on your own, less reliably. As Yves also mentions, this is an “alpha” solution and his changes to the RA are not yet merged upstream, so we are expecting a few more changes to happen before it’s merged.

  4. Argl. It seems like I missed a closing </code> tag. Yves, if you’re able to edit the comment and fix that, please do.

  5. vineet, that problem is well understood. Yves’ approach is for scale-out. If you need transaction-safe synchronous replication, slap MySQL on DRBD, put everything under Pacemaker management, and you’re good to go. That solution has been around for years.

  6. 15 seconds for slave lag seems short. If you have a query that takes 15 seconds to run on a master, it’s not going to be complete on a slave until 30 seconds have elapsed (assuming servers have the same performance). It seems silly to knock slaves offline just because they’re running a long query – it’s just a known limitation of async replication. If you had several slaves, a slow query like that would knock them all offline at once, which seems to be inviting disaster.

  7. @Shlomi
    I indeed got help from Florian, but that was for the resource agent design; the implementation is not that complex. For sure there is a learning curve coming from MMM, but Pacemaker is not that complicated.

    @Marcus
    Slave lag is adjustable, so adjust it to what makes sense to you. MMM was also removing slaves from the cluster if they were lagging behind, so it is not a new behavior.

    @vineet
    Like you said, data integrity is not guaranteed by replication, but in many deployments it is not a hard requirement and replication just does the job.

  8. Viacheslav Biriukov says:

    What about the set_read_only function and timeouts?
    If I get it right, the resource notifies the master to kill all connections before it sets read-only (I mean during live migrations)?

  9. William says:

    @Yves
    Thanks for all of the work on this. I would like to add that the main thing missing from this blog post (and many others around the web) is the description of the problem you are trying to solve and the drawbacks of the approach. Sadly, I do see a lot of scale-out solutions that just assume data integrity is not an absolute requirement. Doing both is very difficult, no doubt.

  10. William, doing both is actually not much of a problem at all if you combine Yves' approach for managing the slaves with the traditional DRBD approach for managing the master. And you can run all of that under Pacemaker as your unifying cluster infrastructure umbrella.

  11. Florian,
    I tried the clone set with IPaddr2; it sort of works, but the behavior is not entirely good. All the reader VIPs can end up on the same node and valid nodes can be left without any reader VIPs. I tried a negative colocation rule with no luck. That's why I reverted to using individual IPaddr2 resources.

  12. Yves, if you want IP addresses to move away from the node you’ll just have to reset the resource stickiness.

    Let’s see if the configuration snippet works out better this time. :)

    node testvirtbox1
    node testvirtbox2
    node testvirtbox3
    primitive p_mysql ocf:heartbeat:mysql \
    params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
    socket="/var/run/mysqld/mysqld.sock" \
    replication_user="root" replication_passwd="rootpass" max_slave_lag="15" evict_outdated_slaves="false" \
    binary="/usr/sbin/mysqld" test_user="root" test_passwd="rootpass" \
    op monitor interval="20s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="30s" role="Slave" OCF_CHECK_LEVEL="1"
    ms ms_MySQL p_mysql \
    meta clone-max="3" notify="true"
    primitive p_reader_vip ocf:heartbeat:IPaddr2 \
    params ip="10.2.2.171" unique_clone_address="true" \
    meta resource-stickiness=0
    clone reader_vip p_reader_vip \
    meta globally-unique="true" clone-max=3 clone-node-max=3
    primitive writer_vip ocf:heartbeat:IPaddr2 \
    params ip="10.2.2.170" nic="eth0"
    location reader_vip_reader_ok reader_vip rule -inf: readerOK eq 0
    location writer_vip_writer_ok writer_vip rule -inf: writerOK eq 0
    property stonith-enabled="false" no-quorum-policy="ignore"
    rsc_defaults resource-stickiness="100"

  13. Florian,
    I tried your config but still some issues:

    Online: [ testvirtbox1 testvirtbox3 testvirtbox2 ]

    writer_vip (ocf::heartbeat:IPaddr2): Started testvirtbox3
    Master/Slave Set: ms_MySQL
    Masters: [ testvirtbox3 ]
    Slaves: [ testvirtbox1 testvirtbox2 ]
    Clone Set: reader_vip (unique)
    p_reader_vip:0 (ocf::heartbeat:IPaddr2): Started testvirtbox2
    p_reader_vip:1 (ocf::heartbeat:IPaddr2): Started testvirtbox1
    p_reader_vip:2 (ocf::heartbeat:IPaddr2): Started testvirtbox1
    root@testvirtbox1:/tmp/mysql.ocf.ra.debug# crm_attribute -N testvirtbox3 -l reboot --name readerOK --query -q
    1

  14. +1 to what Schlomi says. While choosing Pacemaker as a robust clustering framework may not be the worst idea, the next step should be some kind of wrapper where the user provides some simple ini file and you hide all the Pacemaker complexity from end users. Without that, this is a lost cause.

    Also, if you think pacemaker configuration is a lot to digest, you should see the logs that it outputs!

  15. Thanks for the post. I've got some questions.

    1. Why do you use no-quorum-policy="ignore" at all (there is also expected-quorum-votes="3" defined)? Having 3 nodes, you should be able to go for quorum.

    2. binary="/usr/bin/mysqld_safe": why not just use mysqld? mysqld_safe is going to restart mysqld if it exits in an 'abnormal' way. IMHO only Pacemaker should do that.

    3. I don't see where/how you are starting your master initially, i.e. if you want to upgrade an installation with the HA capabilities of Pacemaker.

    4. Still reading the ocf. Great work!

  16. Hi Henrik,
    I do both MMM and Pacemaker and I don't agree with you that Pacemaker is a much more complex setup. The problem with MMM is that it fails to deliver what it is supposed to do. I do agree, though, that we need step-by-step documentation, and the idea of a configuration wrapper is interesting; I'll work on that in the near future.

  17. Hi Erkan,
    1. I used no-quorum-policy="ignore" to allow the cluster to start with only 1 or 2 nodes (in a 3-node cluster). With more nodes, "ignore" would be less needed.
    2. You have a point; the old OCF agent was using mysqld_safe and I just didn't change it.
    3. Indeed, I need to document more. I have in mind at least these: a step-by-step install, migration from MMM, adding nodes, and common problems and how to solve them. I'll work on that.
    4. thanks

  18. Henrik, please be introduced to Pacemaker crm shell templates. http://www.clusterlabs.org/doc/crm_cli.html#_templates

    No need to hack your own wrapper or come up with an .ini file. You just create and publish a template, and users only need to fill in the blanks.

  19. Florian, so why not create a system that has web.ini, with this part as its only content:
    $ grep -n ^%% ~/.crmconf/web
    23:%% ip 192.168.1.101
    31:%% netmask
    35:%% lvs_support
    61:%% id websvc
    65:%% configfile /etc/apache2/httpd.conf
    71:%% options
    76:%% envfilesthese

    Why do I need to even enter the crm for a basic setup?

    And remember, the crm shell is already a wrapper around the official, internal XML-based configuration format. The configuration is so difficult that we are discussing a wrapper around a wrapper to hide it… This is the reason why Pacemaker is right up there with Symbian C++ and autotools as the most difficult technologies I've ever tried to learn. (For a wrapper around autotools, see the pandora build system. For Symbian there was no cure and it is now dying a slow death…)

    Yves: I never used MMM, but I thought it was supposed to be simple to use. If it is difficult *and* doesn't work, I'm surprised people ever used it. But look, you and Florian are at this moment the only people in the world who know how to use Pacemaker to manage MySQL replication clusters *correctly*. Even you could only do it with Florian's help. So I'm wishing you good luck in bringing this technology to the masses, and based on our talks in London I do see there is a chance you can actually come up with something great, but at the moment it's not there yet.

  20. Henrik, what on earth makes you think the “internal xml based configuration format” is “official”? Yes we all know that due to a misguided release decision back in Heartbeat 2 (that was what? 5 years ago?) a predecessor of this actually required you to edit XML to retrieve or modify the configuration.

    But one of the core changes to Pacemaker when it was spun off from Heartbeat, in 2008, was the introduction of the crm shell as the preferred way to manage the cluster configuration. The crm shell syntax is no less “official” than the underlying XML, which you do not need to touch. Ever.

    See http://theclusterguy.clusterlabs.org/post/178680309/configuring-heartbeat-v1-was-so-simple.

    What’s next? “Linux is crap for text editing” because way-back-when there was only ed? Please.

  21. I can officially say that the shell is no less official than writing raw XML.
    We chose XML for the CIB so that machines could easily read it, users were never intended to see the XML

    There was always supposed to be a GUI or CLI sitting on top, it just took a few more years than we intended for them to get written.

  22. Henrik,
    Why do people use MMM if it doesn't work? Because the only other solution is Flipper, and it too had issues. I mean, I had many customers with a hung MMM blocking their production. I submitted patches to the MMM LP project but they have never even been acknowledged. The MMM code is also terrible to follow and trace. Managing replication in a distributed environment is surprisingly challenging, and that is why using Pacemaker is so great: all the aspects of dealing with inter-node communication are handled. Also, Pacemaker is incredibly powerful and flexible. If you deal with distributed computing you should know how to use it; it is a life saver.

  23. Thank gods we now have vi and emacs, the easy-to-use text editors :-) (It won't let me quit, it won't let me quit… Please, if you tell me how to get out of this I promise I will never use it again…)

    I'm thinking the internal XML configuration format is the official one because that's what is actually used internally; crm converts to that. There are things you can do with the internal XML and half-XML that you cannot do with the crm shell. Yes, I did have to use them when I wrote my own MySQL agent last summer. (It's true the person who's going to use my agent doesn't need to understand that.)

    But what really worries me is that I’m not 100% convinced all the Pacemaker developers, who write the internal code, will understand the xml configuration stuff and I fear they will fumble because it’s difficult and unintuitive. You of course will have no problems, but the average dev will.

    Btw, when I go to clusterlabs.org and click Explore | Reference documentation, the first document in that list uses the XML stuff to do something. Yes, I started reading from the top, silly me. When that didn't satisfy me, I then, for whatever reason, jumped instead to read the 1.0 documentation "Pacemaker Explained". Does it advise me to use the XML notation? Well, yes it does.

    If you want people to use the crm way, you can make it more explicit. (But in my case I’m happy I did read about the xml syntax since I really used it for one line of code eventually.)

    For the record, I did learn Symbian. I get stuff done with autotools even if I’m sure I will never understand it. I did write a Pacemaker mysql agent some months ago. (You would hate me if I told you what I made it do :-) But I’m also saying these technologies are more difficult than they should be.

  24. Florian: Btw, just want to say for the record: Using Pacemaker and DRBD for MySQL HA, while still perhaps difficult to setup, does work correctly and people use it for good reason.

    For using Pacemaker for MySQL replication I did not yet see that there would be an agent that I would actually trust to do all the correct steps in a failover situation, and they certainly would not be capable of handling more than a 2 node system (ie they couldn’t do what MHA does). But the design you explained in London does sound correct, so what Yves publishes here – while I haven’t reviewed any of it – should be an awesome improvement. (Still not easy to setup, but at least worth the trouble perhaps :-)

  25. Henrik, sorry, you’re just not making sense. You’re complaining about the fact that a piece of infrastructure allows you access to its internal configuration syntax. So what? You don’t have to use it. The fact that it uses XML internally doesn’t make that the “official” interface. Is the “official” view on a btrfs filesystem the internal B-tree structure, or is it perhaps file handles and inodes and everything else that makes it a POSIX compliant filesystem? Is the “official” means of access to an RDBMS the internal storage engine implementation, or is it maybe SQL?

    But what really worries me is that I’m not 100% convinced all the Pacemaker developers, who write the internal code, will understand the xml configuration stuff and I fear they will fumble because it’s difficult and unintuitive. You of course will have no problems, but the average dev will.

    Huh? “The internal code” is well below the XML layer.

  26. Look, if you want you can take my previous comment as feedback on clusterlabs.org usability. I went to read your documentation. I read about the xml configuration syntax. I didn’t like it. (And it seems you don’t either.) I didn’t read about crm.

    Improve the website so people read what you want them to read.

  27. I didn't twig that Pacemaker had anything to do with crm until this thread! I'm using heartbeat/crm for a redundant web front-end, managing a traditional floating IP. When I set it up I did find it really cryptic, especially since there didn't seem to be any config file at all, everything depending on some distributed database with no simple on-disk representation. That also meant that it didn't seem possible to pre-configure it and then bring it up in a working state – I had to bring it up broken, then configure it interactively. Now that it's up and running, it's working fine, but getting it there wasn't easy or comprehensible, and that was only to manage a simple IP address on two nodes! I don't think I'd want to attempt anything more complex with it, at least not without a lot of time. Also the crm_mon application doesn't work properly under anything but bash (I prefer zsh) and often shows confusing information about things that should be absolutely clear-cut (this node is up, this node is down, node x is holding resource y, etc.). It might be completely brilliant of course (and I doubt Percona would have chosen it otherwise), but it's not obvious.

    FWIW, I really like mmm – it’s been working really well for me (1.x originally installed for me by Percona, and I’ve done several 2.x deploys since then), coping beautifully with all kinds of weird network and incidents, allowing downtime-free upgrades and more.

  28. Marcus:

    Actually there is an on-disk representation (also in XML).
    We don’t encourage people to modify it directly because most of the time this will not achieve the intended result, but if you’re careful it is possible to do safely.

    With the caveat that you appear to be running a pre-Pacemaker (ie. very old) version of our software, I’d have thought crm_mon already did a reasonable job of showing “node x is holding resource y”, perhaps you’d prefer the output with -n instead?

    Also, please file a bug with details on the crm_mon/zsh issue. This is the first I’ve heard of it.

    Henrik:

    Could you clarify who you mean by “the Pacemaker developers, who write the internal code”?
    Do you refer to the people writing the resource agent scripts that Pacemaker uses or to the authors of Pacemaker itself?

    To your comment that “If you want people to use the crm way, you can make it more explicit”, I would make two points:

    - People should use whatever they are comfortable with.

    There are multiple graphical and command-line options for configuring the cluster.
    The proper thing for the project to do is make the options known, not to dictate a specific tool that admins must use.

    Having said that, the first document listed under the Explore tab is “Clusters from Scratch” which does not use XML.
    That you jumped ahead to the reference material and clicked on the first alphabetically sorted document should not imply much about our relative preference for either configuration method.

    - The purpose of “Pacemaker Explained” is very different from documents like “Clusters from Scratch”.

    One does not negate the other. The first (XML, reference) needs to exist so that shells and GUIs can be written; it is the API that Pacemaker itself commits to, and it details all the available options and configuration constructs. The second (crm, howto) is needed because XML is hard to read.

    I would also point out that “Pacemaker Explained” does /not/ advise you “to use the xml notation”. It only says that the shell syntax is not within the scope of that particular document.

  29. By "Pacemaker developers" I suppose I mean both of those. My point here is that, in my opinion, the XML format wasn't entirely intuitive, i.e. the way some XML attributes map to real-life objects wasn't easy for me to remember. So it makes me wonder if even the average Pacemaker and/or Corosync developer fully understands them, or is at risk of making mistakes due to not remembering some exception or interpretation of some configuration parameter. For instance, I remember there was something where if I set unique="1" it would trigger a restart of the resource (as a side effect, it seems?). Why? If I wanted to restart a resource I would want to say something like restart="1".

    Anyway, I suppose what it really comes down to is that if you did usability testing along the lines of a person unfamiliar with Pacemaker going to clusterlabs.org to learn how to set it up, I'm afraid the results would be poor. Using myself as an example, I didn't end up reading the documents that Florian at least feels are the recommended ones, which resulted in a poor experience.

    Even there I’m being generous, if you really want to make a usability test you should ask people to first search for “Pacemaker documentation” and then learn to set it up. When “clusterlabs.org” shows up in search results, it’s not at all obvious that is what I’m looking for…

    I admit that on the Explore page “Clusters from Scratch” comes before reference documentation. But the title made me think of “Linux from Scratch”, so not the user friendly easy documentation I was looking for. “HowTo Guides” made me hopeful but that goes to a wiki page about contributing code to the project and such. So really “Reference documentation” was the only one that looked like documentation, and there Pacemaker Explained is what seems to define the essence of Pacemaker. The Pacemaker project should think about this: how should people use Pacemaker easily, and does the web page support that experience or not?

  30. > My point here as in my opinion the xml format wasn’t entirely intuitive

    See above, it wasn’t meant to be.

    > So it makes me wonder if even the average Pacemaker and/or Corosync developer fully understands them,
    > or is at risk of making mistakes due to not remembering some exception or interpretation of some configuration parameter.

    We have approximately 500 automated regression tests to avoid relying on people’s memory.

    > For instance I remember there was something that if I set unique=”1″ it will trigger a restart of
    > the resource (as a side effect, it seems?). Why? If I wanted to restart a resource I would want
    > to say something like restart=”1″.

    1. This is something that came from the OCF standard, not Pacemaker.
    2. It means more than just “restart”, it means “never have two things in the cluster with the same value for this”

    For the rest, I disagree but you are of course entitled to your opinion, especially if you back them up with contributions.
    People are always invited to contribute to the project and improve the things, such as the website and documentation, that they consider are lacking.

  31. Andrew,

    I’m just setting up another heartbeat/CRM installation and ran into the zsh issue I mentioned again. A bit of rummaging led me to this post which describes the problem: http://oss.clusterlabs.org/pipermail/pacemaker/2010-September/007645.html

    Short version: crm uses shell options that only work with bash.

    The server I was working on had root’s shell set to zsh; changing it to bash fixed crm (using ‘sudo -i’ from a user account, as that article suggests), but of course left me with bash as the default shell.

    I don’t know if this is fixed in later versions of heartbeat – I’m using the stock package for Ubuntu Lucid.

  32. Marcus:
    That mail had a couple of work-arounds listed, did you try them?
    Perhaps ask Ubuntu to include the patch listed in the bugzilla.

    If you’re still having problems, please consider contacting the mailing list with the output and errors.

  33. Marcus, the issue you’re having is not related to Heartbeat, only Pacemaker. And the stock packages from Lucid are pretty dated at this point; here’s an updated PPA:

    https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa

  34. Thanks for that Florian – I installed packages from that PPA and my zsh problems have gone away.

  35. Marcus, good to know. Everyone else, apologies that we took this thread off on an OT tangent; we can go back to MySQL integration in Pacemaker now. :)

  36. Lars Fronius says:

    Well, there is Galera's synchronous multi-master replication around, which works quite nicely. Is there any approach to building an RA for this? It would be quite nice to see, because then you don't need to keep struggling with asynchronous replication, which can break your cluster when you want to migrate your cluster's master IP.

  37. Lars, feel free to either write one, or integrate Galera replication with the existing MySQL RA. http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html is the OCF resource agent developer’s guide, and linux-ha-dev is the correct mailing list to post to if you’re looking for help.

  38. Lars: Since Galera monitors its own state, and takes all the actions necessary upon node or network failures, there is little need left for an external cluster manager.

    Mainly, if you like to use virtual IPs (which are not at all necessary with Galera, but could be convenient for some operational tasks depending on how you are used to doing things), you could still use Pacemaker to move the VIPs around, even if not Galera itself. You would then have to write some Pacemaker functionality that is aware of Galera, even if not managing Galera itself.

    The other thing Pacemaker could be used for would be to restart a galera node after it has crashed. Given that mysqld_safe already does that, I consider using Pacemaker for that purpose completely overkill.

  39. The other thing Pacemaker could be used for would be to restart a galera node after it has crashed. Given that mysqld_safe already does that, I consider using Pacemaker for that purpose completely overkill.

    Henrik: that suggestion is completely misguided. Yves: can you please change your config snippet to include binary=/usr/sbin/mysqld? In a Pacemaker cluster, it should be Pacemaker that takes care of resource monitoring and recovery. Thanks.

  40. Lars Fronius says:

    I think Pacemaker would be useful for Galera when it comes to provisioning new nodes. One node becomes a donor in the cluster to transfer its state, and I would want Pacemaker to move a VIP away from that (donor) node. You could also choose a load balancer, which then automatically disables this IP.

  41. Lars, Galera is an interesting product for sure, but it is far from one size fits all. There are many cases where normal replication is better.

  42. Florian, mysqld_safe does more than just restart MySQL; it also sets the ulimits and redirects logs to syslog. For my part, I find it very convenient to have binary=/usr/bin/mysqld_safe.

  43. Lars: Exactly. You need some sort of failover or load balancing with Galera too. VIPs are not your only option and imo not even the best option, but they are simple and well understood. If you want to use VIPs then you need something that will move them around.

    If you use Galera and don't use VIPs, then using Pacemaker purely as a mysqld_safe replacement would be pure folly.

    Yves: I don't have a lot of field experience there, but I would consider Florian's advice. If Pacemaker sees that MySQL is not responding, it will try to restart something. Will it then try to restart mysqld or mysqld_safe? And will mysqld_safe simultaneously try to restart something? It sounds like you're in for a mess… Better to move the functionality provided by mysqld_safe into the Pacemaker agent and let Pacemaker control everything.

  44. Hi!

    I’m using MySQL in a “classical” setup with Pacemaker and DRBD for failover and replication. I stumbled over your solution while looking for an alternative, which works with the built-in MySQL replication AND Pacemaker. Nice work!

    Still, the aspect of asynchronous replication is something that worries me a bit. I read that there are already synchronous replication mechanisms out there for MySQL, especially Continuent Tungsten Replicator (see http://www.continuent.com). Couldn't one weld together your approach for high availability and Continuent's replication solution? Just a (probably based on misunderstood information) thought…

    Cheers and good bye,

    Andreas

  45. Andreas: Tungsten is also asynchronous. Possibly what you are referring to is Galera, which is synchronous. Please see this more recent Percona blog for more info on that one: http://www.facebook.com/profile.php?id=665178886

  46. Nice article!

    Though, I'm trying to get this working in my test setup and I'm having problems.
    I have 3 servers; all have the standard mysql DBs set up as per mysql_install_db, I created a replication client on all 3, reset masters, etc. I stopped all mysqld's and used your config to get this running, and I'm stuck at an error that shows up in the logs:

    “Jan 19 13:01:56 tabit mysql[818]: ERROR: /usr/lib/ocf/resource.d//heartbeat/mysql: 1313: -q: not found”

    No resources are assigned to any machine.

    status keeps at:
    Online: [ tabit meissa toucan ]

    Failed actions:
    p_mysql:1_start_0 (node=toucan, call=20, rc=1, status=complete): unknown error
    p_mysql:0_start_0 (node=tabit, call=20, rc=1, status=complete): unknown error
    p_mysql:0_start_0 (node=meissa, call=20, rc=1, status=complete): unknown error

    Any ideas?

  47. Andreas Stallmann says:

    Hi!

    @Yves:

    I currently have a setup (MySQL on DRBD with Pacemaker) where my applications read and write to the same VIP and I can't change that (and can't force our dev team to change it). Does your configuration work with only one VIP, too? If yes, what would the appropriate crm setup look like?

    Thanks for your good work,

    Andreas

  48. Andreas Stallmann says:

    Just another question. I receive the error

    WARNING: p_mysql: action monitor_Slave_0 not advertised in meta-data, it may not be supported by the RA

    when committing the config. Additionally, the following error shows up in crm_mon:

    Failed actions:
    p_mysql:1_monitor_0 (node=int-ipfuie-mgmt01, call=36, rc=5, status=complete): not installed
    p_mysql:0_monitor_0 (node=int-ipfuie-mgmt02, call=49, rc=1, status=complete): unknown error
    p_mysql:0_stop_0 (node=int-ipfuie-mgmt02, call=57, rc=1, status=complete): unknown error

    Any suggestions?

    Cheers,

    Andreas

  49. Hi Andreas,
    It is easy to use only the writer_vip; the reader_vips are not mandatory at all.

  50. @Andreas,
    Have you installed the mysql RA from GitHub? The default one that comes with many distributions will not work.

  51. Hi again,

    I still see

    Master/Slave Set: ms_MySQL
    p_mysql:0 (ocf::heartbeat:mysql): Slave int-ipfuie-mgmt01 (unmanaged) FAILED
    p_mysql:1 (ocf::heartbeat:mysql): Slave int-ipfuie-mgmt02 (unmanaged) FAILED

    and

    p_mysql:0_monitor_0 (node=int-ipfuie-mgmt01, call=35, rc=1, status=complete): unknown error
    p_mysql:0_stop_0 (node=int-ipfuie-mgmt01, call=37, rc=1, status=complete): unknown error
    p_mysql:0_monitor_0 (node=int-ipfuie-mgmt02, call=24, rc=1, status=complete): unknown error
    p_mysql:0_stop_0 (node=int-ipfuie-mgmt02, call=26, rc=1, status=complete): unknown error

    There are no obvious errors in /var/log/messages. Any ideas where to look?

    By the way: If this is not the right place to ask such questions, please redirect me to a more appropriate site/forum/mailing list.

    Thanks,

    Andreas

  52. …and just another thought:

    Because you did not mention the prerequisites for your setup, I did it according to

    http://dev.mysql.com/doc/refman/5.1/en/replication-howto.html and http://dev.mysql.com/doc/refman/5.1/de/replication-howto.html

    Anything wrong with that? Do you perhaps rely on a “clean” setup without the nodes being preconfigured for replication?

    Cheers,

    Andreas

    PS: You were right with your last comment; I forgot to copy the agent to one of my nodes.

  53. Andreas, the Pacemaker mailing list is the best option to discuss configuration issues. http://oss.clusterlabs.org/mailman/listinfo/pacemaker

  54. One possible alteration to your resource script (for Pacemaker 1.1.2.1 under OpenSuSE 11.3):

    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d//heartbeat/}
    . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

    The paths you provided did not work. I don't know if this is relevant to other distributions.

    Cheers,

    Andreas

  55. Lars Fronius says:

    There was a change made when the RHCS resource agents were merged into Pacemaker. The path in versions of resource-agents before that merge was
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d//heartbeat/}
    . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
    , for later versions it is
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
    You just have to adapt it by hand…

  56. I found a possible bug in the script. When I call it with ocf-tester, I get:

    mysql[31420]: ERROR: /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs: line 332: -q: command not found

    Indeed, in line 889 of your script I read:

    ocf_run -q $MYSQL $mysql_options \

    whereas ocf_run is called without -q everywhere else in your script.

    Secondly, the ocf-tester reports:

    ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)

    This happens although I provided a password for "test_user=root". Could it be that the password is not read from the OCF_RESKEY_test_passwd parameter but instead still uses the default (which is empty)?

    Cheers,

    Andreas

  57. Andreas, when you test this agent, please rebuild the resource-agents package from upstream git, or at least get a reasonably recent one that your distro may ship. Don’t expect to be able to drop this agent into an age-old install.

  58. Hi Florian,

    thanks for your suggestion. I included the HA repo, as you advised. Still, the packages don't look too outdated; in particular, the resource-agents package looks like it's just a minor-minor release behind.

    Are there any other packages which you would consider to be too old or problematic concerning the new mysql resource script?

    Package / Installed Version / Available in HA-Repo:
    ——————————————————————-
    > resource-agents; 1.0.3-1.4; 1.0.3-2.11
    > pacemaker; 1.1.2.1-2.1.1; 1.1.5-1.1
    > libpacemaker3; 1.1.2.1-2.1.1; 1.1.5-1.1
    > cluster-glue; 1.0.5-1.4; 1.0.6-1.14
    > corosync; 1.2.1-1.2; 1.2.7-1.10

    @Yves: If you find the time to write a full "howto" about your resource agent, would you mind including the prerequisites, with the minimum versions of the packages or something like a "verified and tested on…" note? That would be really nice! :-)

    Cheers,

    Andreas

  59. Andreas, I have no clue where you’re getting your reference version info from, but the current Corosync release is 1.4.2, Pacemaker is at 1.1.6, and resource-agents is at 3.9.2.

  60. Florian, I just followed the link to the repo on http://www.clusterlabs.org/rpm-next/ for OpenSuSE 11.3. I thought this repo would carry the latest packages (even for a one-year-old distribution like OS 11.3). Are there better (maintained) repos? If not, I'll consider a distribution upgrade or even a change of distribution. Is there any distribution known for its "bleeding edge" maintenance of the HA packages?

    Cheers,

    Andreas

  61. I thought OpenSUSE did have a pretty solid HA stack (it certainly does in 12.1). The distro that currently tracks upstream most closely is, believe it or not, Debian (via squeeze-backports).

  62. I posted before that I got this error; thanks for pointing out the squeeze backports. I updated from backports and the RA worked.

    But I ran into a problem, which was setting the slave's CHANGE MASTER TO command. After some debugging I found that there was data missing in:

    property $id="mysql_replication" replication_info="192.168.5.96||"

    The function "update_data_master_status()" fetches that information and saves it to a temp file, but that file was empty all the time because it wasn't allowed to run "SHOW MASTER STATUS". Why? $mysql_options was empty.

    I copied the code from all the other functions to fill mysql_options and after that it works perfectly!

    The final version of the function is:

    # Stores data for MASTER STATUS from MySQL
    update_data_master_status() {
        local mysql_options
        mysql_options="$MYSQL_OPTIONS_LOCAL"
        if [ -n "$OCF_RESKEY_replication_user" ]; then
            mysql_options="$mysql_options $MYSQL_OPTIONS_REPL"
        fi
        tmpfile=$(mktemp ${HA_RSCTMP}/master_status.${OCF_RESOURCE_INSTANCE}.XXXXXX)
        $MYSQL $mysql_options -e "SHOW MASTER STATUS\G" > $tmpfile
    }

  63. Hi – we're working on the same path. Here is a link to a Google Doc I'm going to publish on MySQLFanBoy.com when I'm done. My knowledge of Pacemaker is weak. My next step is to add MHA failover to the process.

    https://docs.google.com/document/d/1UVaQcxjsZQj19BZy8ngU9hRJRHCC6vt66v4-FFjnHlU/edit

    Have you started work on the MHA failover?

  64. trey85stang says:

    Very nice. I'm running it in a lab setup and for the most part it is working great. The only problem I am seeing (other than the above-mentioned update_data_master_status function) is that the reader VIPs do not migrate over to the slaves in my setup. They remain on the writer node… even though the slaves have started and are working. I'm sure there are some location/colocation constraints that are still needed.

    I look forward to this making it upstream to the pacemaker repos.

  65. I’ve changed a few more things to get this working. It’s a combination of the configurations supplied by Florian and Yves. I fiddled with the settings here and there and did the above patch in the RA itself.

    The whole setup now works like a charm, resources stick correctly, reader/writer (V)IPs migrate correctly, etc.

    The configuration can be found here: http://www.e-rave.nl/wp-content/uploads/2012/02/mysql-ha.txt

    I've got 3 nodes. The reader IP is a VIP placed on lo:# and my keepalived load balancer will distribute the reads over the 3 machines. The writer IP is where ms_MySQL is master and is placed on eth1:#
    ms_MySQL has preferred: meissa > tabit > cepheus

  66. Hi Mark,

    that’s excellent. Now we’ll have to get Yves’ and your patches in proper shape for upstream, which basically amounts to just a little bit of git mangling. Configuration looks good to me except for one detail: the INFINITY resource stickiness. Can you explain why you think that’s necessary?

    Cheers,
    Florian

  67. Well, tbh the INFINITY was more of a "sigh, I've been playing with this for ages, let's set it to INF. and see what happens"; I knew someone was going to take notice of it ;-)

    I'm not a Pacemaker guru and still learning. I've read some stuff about resource stickiness, but I still can't figure out how it actually works. I tried 100, 1000 and still ms_MySQL:Master switched back to my preferred server when it came back online. I want it to use the preferred server only when the current one fails. INFINITY seems to do the trick ;) I know, not good practice…

    Now let's see if my boss accepts my suggestion to attend "Percona Live: MySQL Conference 2012" so you can explain it to me IRL :)

  68. Now let’s see if my boss accepts my suggestion to attend “Percona Live: MySQL Conference 2012″ so you can explain me IRL

    You might want to tell them that early bird registrations are still on, so if they make up their mind quickly it’s costing them less than if they take their time. :)

  69. Hi Mark,
    I am happy to hear it works like a charm. Maybe it is me not knowing enough about keepalived, but why are you defining the reader_vip on lo? I mean, if you are not using these resources, just don't define them. I also had issues with the way you wrote the colocation rule for the writer_vip, but I was using Pacemaker 1.0.11; maybe it is fine with 1.1.6.

    Regards,
    Yves

  70. Hi trey85stang,
    in order for the reader VIP to move to a slave, the slave needs to update the readerOK attribute during a monitor call. Look at the content of your CIB with 'cibadmin -Q > /tmp/cib.xml' and check if the readerOK attributes are defined. If not, you'll need to investigate why. A good way to get debugging info is to create the file /tmp/mysql.ocf.ra.debug/log with "mkdir -p /tmp/mysql.ocf.ra.debug; touch /tmp/mysql.ocf.ra.debug/log". That's very verbose; check for the monitor calls. Once done, delete the "log" file to stop the logging.

    Regards,
    Yves

  71. Well, I use keepalived with Direct Routing for load balancing. We set up a VIP (192.168.5.141) in the config and attach 3 real servers to it (.74, .95 and .96). Now when there is a request on .141 it goes to the load balancer, which picks one of the available real servers, rewrites the MAC address in the packet and sends it to the real server. I put the .141 VIP on the local interface so the kernel accepts this packet as being destined for it. It's a common way to use keepalived with DR. For MySQL you might want to use NAT, but DR is about 50x faster.

    You are right about my setup: I could remove the reader VIP and just set it as a default in my network config.
    But that's not fun! ;) Let's say I want to change my setup to have the reader VIP up only on slaves.

    I can change the clone rule with clone-max=2 and set a colocation "dislike" for the reader VIP on the master server, something like:

    colocation dislike-reader-on-master -inf: reader_vip ms_MySQL:Master

    so it'll skip the master and set the other 2 nodes with a reader VIP (more of a hypothesis, but I am going to test it ;))

    Again, the whole reader VIP could be removed from the config in my particular setup, but I thought it was a nice exercise for me to implement the reader VIP resource in the cluster as well.

    I moved the writer VIP resource assignment from the location constraint on writerOK to a colocation. With your initial config I often had the writer VIP being placed on a server that was a slave, so I didn't really trust the code behind it (sorry ;)). So I moved it, so I could control it myself with the colocation "set p_writer_vip where ms_MySQL has the Master role": simple, no background code needed, just Pacemaker.

    I’m running everything from squeeze-backport: Pacemaker 1.1.6-2

  72. Hi MarkG,
    you still have some ground to cover; you don't manage VIPs and replication right now. Please grab the new mysql agent and use a configuration like ours; we would then be able to work together to bring MHA into the picture to improve master promotion.

    Regards,
    Yves

  73. Hi MarkS,
    so the reader VIPs are independent of Pacemaker… what if the slave is lagging or replication breaks? The whole point of this config is to react to these events.

    Regards,
    Yves

  74. I'm already using your latest version of the RA; I only added the mysql_options code in the update_data_master_status function (see a few replies ago) so the replication data gets filled in correctly in the CIB.

    You're correct: when a slave is behind or broken, it shouldn't carry a reader VIP from that point on until it's no longer running behind. I haven't covered that in my setup. Hmm…

    I'll read the RA code again next week; maybe it'll become clearer to me. To help me understand it a bit more, can you answer these questions:

    Are you checking the health of a slave in the monitor function and, if so, setting readerOK and writerOK depending on the results of the monitoring?

    Does readerOK = 0 mean that a server is not allowed to run a reader_vip, or that it doesn't have a reader_vip? (same for writer, I assume)

    Where is it decided who gets to have the writer VIP?

  75. Hi MarkS,
    sorry, the RA version comment was for MarkG. Yes, in the monitor function, the state of the slave is monitored and the value of the readerOK attribute adjusted if needed. The writerOK attribute is also set in the monitor function, based on the status of the read_only variable in MySQL, and also in the promote and demote functions. This is to avoid having a writer_vip present when the database is not ready for it. There are some required steps needed to enforce data integrity, and at some point in time and for a short duration, no writer_vip can be present.

    Regards,
    Yves

  76. Hi there!

    thanks to Florian, I switched my platform to Ubuntu 11.10, which does have more recent packages of pacemaker and resource_agents.

    I have two issues with the RA:

    1. I used Yves' mysql RA and fixed it with Mark St.'s patch. At first glance, everything looks fine in crm_mon:

    p_writer_vip (ocf::heartbeat:IPaddr2): Started ubuntu-cluster01
    Master/Slave Set: ms_MySQL [p_mysql]
    Masters: [ ubuntu-cluster01 ]
    Slaves: [ ubuntu-cluster02 ]

    Still, the replication never starts! crm shows:

    property $id="mysql_replication" replication_info="||"

    I clearly stated replication_user="repl" replication_passwd="7la4epa99" in my config; still, "show slave status" gives the following output:

    mysql> show slave status\G
    *************************** 1. row ***************************
    Slave_IO_State:
    Master_Host: ubuntu-cluster01
    Master_User: test

    Where does this user “test” come from?

    2. If by any odd chance the replication crashes (for example, because that's what I did, if you create a database on just the master node and then create a new table in that database), the crm takes no note of that. That's bad, because everything looks fine in crm_mon, but there's no actual replication going on any more. Shouldn't that be something the RA reports for slave monitoring?

    Cheers,

    Andreas
    PS: I’m using…
    pacemaker-1.1.5-0ubuntu1
    resource-agents-1:3.9.2-0ubuntu1

  77. Hi Andreas,
    I see the problem in update_data_master_status: the variable $mysql_options is not set correctly. It works by accident right now if root has access to localhost with no password. I'll change the code, run some basic tests and send a patch.

    Regards,

  78. Yves, thanks, good to know I'm on the same path, just a bit behind. I need to study Pacemaker configuration. What should I read?

    Also, what do you think of using the Linux Cluster Management Console (LCMC) to manage Pacemaker?

  79. Mark, for an intro to a baseline Pacemaker configuration, perhaps you’d like to take a look at http://youtu.be/3GoT36cK6os (an LCA2012 tutorial).

  80. Hi,
    did some cleanup following MarkS's patch, pushed to git here:

    git://github.com/y-trudeau/resource-agents.git

    or

    https://y-trudeau@github.com/y-trudeau/resource-agents.git

    Nothing major changed, just $mysql_options cleanup.

  81. Hi!

    Strange… with the “old” RA (patch from MarkS applied manually) I got from ocf-tester:

    Beginning tests for /usr/lib/ocf/resource.d/heartbeat/mysql…
    * Your agent does not support the reload action (optional)
    /usr/lib/ocf/resource.d/heartbeat/mysql passed all tests

    While with the latest version you pushed to GitHub I receive:

    Beginning tests for /usr/lib/ocf/resource.d/heartbeat/mysql…
    : not foundf/resource.d/heartbeat/mysql: 53:
    : not foundf/resource.d/heartbeat/mysql: 56:
    .: 58: Can’t open /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
    * rc=127: Your agent has too restrictive permissions: should be 755
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 53: $’\r’: command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 56: $’\r’: command not found
    : No such file or directory: line 58: /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 59: $’\r’: command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 61: $’\r’: command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 124: syntax error near unexpected token $'{\r''
    'usr/lib/ocf/resource.d/heartbeat/mysql: line 124:
    usage() {
    -:1: parser error : Document is empty

    ^
    -:1: parser error : Start tag expected, ‘<' not found

    ^
    I/O error : Invalid seek
    * rc=1: Your agent produces meta-data which does not conform to ra-api-1.dtd
    * rc=2: The meta-data action cannot fail and must return 0
    * rc=2: Validation failed. Did you supply enough options with -o ?
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 53: $'\r': command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 56: $'\r': command not found
    : No such file or directory: line 58: /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 59: $'\r': command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 61: $'\r': command not found
    /usr/lib/ocf/resource.d/heartbeat/mysql: line 124: syntax error near unexpected token $'{\r''
    'usr/lib/ocf/resource.d/heartbeat/mysql: line 124:
    usage() {
    Aborting tests

    Doesn't look too good, does it?

    Cheers,

    Andreas

  82. Andreas, I just pulled the latest RA from Yves’ git and it works without a problem, so the RA on github is fine.
    I suggest you download it again; try:

    wget https://raw.github.com/y-trudeau/resource-agents/master/heartbeat/mysql
    chmod +x mysql

  83. Hi again *sigh*,

    OK, I downloaded the RA via wget instead of the whole repo via git and now it somehow “works” again (that is: I don’t see any errors in crm_mon, host02 is displayed as slave and host01 as master).

    Still: host02 never really becomes a slave, the command “START SLAVE” is never executed! There were no errors in the logfile, just one warning (on both nodes):

    WARNING: MyRA: Attempted to unset the replication master on an instance that is not configured as a replication slave

    (I put “MyRA” in front of every log message to make them easier to grep from the logs.)

    After adding some further debug messages, I found out that the function set_master() is never executed (I never found the output “Changing MySQL configuration…” anywhere in the logs).

    It seems the RA never gets past this if-construct in line 1027:

    if [ "$master_host" -a "$master_host" != `uname -n` ]; then

    The reason is that the value of $master_host is empty, and that’s (obviously) because $OCF_RESKEY_CRM_meta_notify_master_uname is not set. That’s probably related to this output from crm configure show:

    property $id=”mysql_replication” \
    replication_info=”|mysql-bin.000048|106″

    Any idea why that happens? And why, despite this, everything STILL looks all right in crm_mon? Shouldn’t we get some errors if the function set_master is never executed?

    Cheers,

    Andreas
    PS: MarkS, would you please post (or send me via mail) your working setup, so that I can duplicate it? I need:
    - OS and kernel version
    - package versions, at least of mysql, pacemaker and resource-agents
    - output of crm configure show (if possible, only the user-generated stuff and not what’s put in there by the crm)
    - output of grep -v “#” /etc/[mysql]/my.cnf | grep -v “^$”

    Thanks a lot!

  84. Hi Andreas,
    you are missing the IP of the master. Have you added the IP attributes in the nodes section? Like this:

    node testvirtbox1 \
    attributes IP=”10.2.2.160″
    node testvirtbox2 \
    attributes IP=”10.2.2.161″
    node testvirtbox3 \
    attributes IP=”10.2.2.162″

    This is where the IP used for the master role comes from. As for crm_mon, since we don’t evict slaves, the only impact of replication not running is that the readerOK attribute is set to 0.
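
    If the attributes are missing, they can be added without re-editing the whole configuration, for example with the crm shell (just a sketch, using the node names from the example above):

    # set the IP attribute the RA uses when building the CHANGE MASTER statement
    crm node attribute testvirtbox1 set IP 10.2.2.160
    crm node attribute testvirtbox2 set IP 10.2.2.161
    crm node attribute testvirtbox3 set IP 10.2.2.162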

    Regards,

    Yves

  85. Hi Yves,

    thanks a lot. Sometimes it’s the small details…

    The “attributes IP” weren’t set for the nodes. Now it looks better – somehow. The replication_info is set:

    replication_info=”10.20.0.21|mysql-bin.000056|106″

    Still, on the slave node, I find no slave configuration:

    mysql -u repl -p7la4epa99 -e 'show slave status\G' returns an empty set, and the master status differs:

    *************************** 1. row ***************************
    File: mysql-bin.000050
    Position: 106
    Binlog_Do_DB:
    Binlog_Ignore_DB:

    And again, the function “set_master” is never used and master_host stays empty.

    Any further ideas?

    Cheers,

    Andreas
    PS: Find my whole config below:

    node ubuntu-cluster03 \
    attributes IP=”10.20.0.21″
    node ubuntu-cluster04 \
    attributes IP=”10.20.0.22″
    primitive p_mysql ocf:heartbeat:mysql \
    params config=”/etc/mysql/my.cnf” pid=”/var/run/mysqld/mysqld.pid” socket=”/var/run/mysqld/mysqld.sock” replication_user=”repl” replication_passwd=”7la4epa99″ max_slave_lag=”15″ evict_outdated_slaves=”false” binary=”/usr/sbin/mysqld” test_user=”root” test_passwd=”r00tp877″ \
    op monitor interval=”20s” role=”Master” OCF_CHECK_LEVEL=”1″ \
    op monitor interval=”30s” role=”Slave” OCF_CHECK_LEVEL=”1″
    primitive p_writer_vip ocf:heartbeat:IPaddr2 \
    params ip=”10.20.0.20″ cidr_netmask=”16″ nic=”eth0:0″
    ms ms_MySQL p_mysql \
    meta master-max=”1″ master-node-max=”1″ clone-max=”2″ clone-node-max=”1″ target-role=”Master”
    location pref-master-1 ms_MySQL 100: ubuntu-cluster03
    location pref-master-2 ms_MySQL 50: ubuntu-cluster04
    colocation writer-on-master inf: p_writer_vip ms_MySQL:Master
    property $id=”cib-bootstrap-options” \
    dc-version=”1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f” \
    cluster-infrastructure=”openais” \
    expected-quorum-votes=”2″ \
    stonith-enabled=”false” \
    no-quorum-policy=”ignore” \
    default-resource-stickiness=”1000″ \
    last-lrm-refresh=”1328620089″
    property $id=”mysql_replication” \
    replication_info=”10.20.0.21|mysql-bin.000056|106″

  86. Hi,
    I think this is generating too much traffic for blog post comments. I suggest we move to the Pacemaker mailing list, pacemaker@oss.clusterlabs.org. Andreas, I’ll reply to your email there; I see you are already registered.

    Regards,

    Yves

  87. Andreas,

    You’ve set the wrong nic:
    primitive p_writer_vip ocf:heartbeat:IPaddr2 params ip=”10.20.0.20″ cidr_netmask=”16″ nic=”eth0:0″
    should be:
    primitive p_writer_vip ocf:heartbeat:IPaddr2 params ip=”10.20.0.20″ cidr_netmask=”16″ nic=”eth0″

    Pacemaker will automatically decide if it should be 0:0, 0:1, etc.

    I’m writing an entry on my blog about how I set this up. I’ll post the link here when I’m done.

    Cheers,
    Mark

  88. Hi Andreas,
    the change master is done in the notify events, but you forgot to include notify=”true” in the ms_MySQL declaration, so the notify events are not generated. Please continue the discussion on pacemaker@oss.clusterlabs.org. I know I haven’t replied to your emails on the list; I’ll try to be more diligent now, although I’ll be away for 5 days starting Friday.
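
    For reference, your ms declaration with the missing option added would look something like this (only notify is new, the rest is as you posted it):

    ms ms_MySQL p_mysql \
    meta master-max=”1″ master-node-max=”1″ clone-max=”2″ clone-node-max=”1″ notify=”true” target-role=”Master”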

    Regards,

    Yves

  89. Yves and Mark S.

    Mark S., do you have your resources ordered? I updated the mysql RA from github and things started working. However, when I add a dependency it thinks the slave is not running. Here is what I’m adding:

    colocation col_ms_mysql_failover-ip inf: ms_mysql failover-ip
    order ord_failover-ip_ms_mysql inf: failover-ip ms_mysql

    Here is my crm config before the ordering.

    node db1.grennan.com \
    attributes IP=”192.168.2.201″
    node db2.grennan.com \
    attributes IP=”192.168.2.202″
    primitive failover-ip ocf:heartbeat:IPaddr \
    params ip=”192.168.2.200″ \
    operations $id=”failover-ip-operations” \
    op monitor start-delay=”0″ interval=”2″ \
    meta target-role=”started”
    primitive p_mysql ocf:heartbeat:mysql \
    params binary=”/usr/bin/mysqld_safe” pid=”/var/run/mysqld/mysqld.pid” socket=”/data/mysql/mysql.sock” replication_user=”root” replication_passwd=”P@ssw0rd” \
    meta target-role=”started”
    ms ms_mysql p_mysql \
    meta clone-max=”2″
    property $id=”cib-bootstrap-options” \
    expected-quorum-votes=”2″ \
    stonith-enabled=”false” \
    dc-version=”1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558″ \
    no-quorum-policy=”ignore” \
    cluster-infrastructure=”openais” \
    last-lrm-refresh=”1328725415″
    property $id=”mysql_replication” \
    replication_info=”192.168.2.201|mysql-bin.000741|107″

  90. Hi MarkG,
    you too are missing notify=”true”, so your config cannot work; you are also missing the monitor operations for p_mysql, another show stopper. Please post to pacemaker@oss.clusterlabs.org, the comment stream is becoming huge.
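
    As a sketch, your p_mysql and ms_mysql declarations with both items added would look roughly like this (the monitor intervals are only an example, the other parameters are from your posted config):

    primitive p_mysql ocf:heartbeat:mysql \
    params binary=”/usr/bin/mysqld_safe” pid=”/var/run/mysqld/mysqld.pid” socket=”/data/mysql/mysql.sock” replication_user=”root” replication_passwd=”P@ssw0rd” \
    op monitor interval=”20s” role=”Master” OCF_CHECK_LEVEL=”1″ \
    op monitor interval=”30s” role=”Slave” OCF_CHECK_LEVEL=”1″
    ms ms_mysql p_mysql \
    meta clone-max=”2″ notify=”true”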

    Regards,

    Yves

  91. I have joined the clusterlabs mailing list. I just wanted you to know I have it working with your resource agent out of github. It is your work, right?

    I blogged a video example of it on mysqlfanboy.com, and I’ll be blogging my step-by-step document in the next week after some testing.

  92. It’s been pretty quiet here for a while now :)

    I have to add something I ran into while doing some planned maintenance on the cluster.
    What I wanted to do is add another server to the cluster.

    The “replication_info” config setting is only changed when a new master is elected. So how to add a new slave?

    Create a new slave by dumping/binary copying or any other way you normally prepare a slave. Now change “replication_info” manually in corosync to the binlog file and position from when you took the dump/copy.

    From within corosync itself you’ll get an error when you try to edit it, so use the command that the mysql RA uses, which is:
    crm_attribute --type crm_config --name replication_info -s mysql_replication -v “||”
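
    For example, if the dump was taken with the master at 192.168.2.201, binlog file mysql-bin.000741, position 107 (values purely illustrative), that would be:

    crm_attribute --type crm_config --name replication_info -s mysql_replication -v "192.168.2.201|mysql-bin.000741|107"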

    Now start corosync on your slave and let it join the cluster; it will now correctly start slaving from the position where you made the snapshot.

  93. Johannes Arlt says:

    On my boxes with MySQL 5.5 (Ubuntu Server 12.04 LTS) the pacemaker mysql agent (ocf:heartbeat:mysql) doesn’t work, because since MySQL 5.5 the statement “CHANGE MASTER TO MASTER_HOST=''” produces an error and stops the script. As far as I can see, Yves fixed this in the latest version on github.
    But in my MySQL installation, “SHOW SLAVE STATUS” will ALWAYS produce complete output. Maybe not initially, but once you have used the server instance as a slave, it does. In the function is_slave(), the test is only whether the output written to a tmpfile is non-empty, not whether it contains “Slave_SQL_Running: Yes”, so the server is reported as a slave; later the test for “Slave_SQL_Running: Yes” fails, because the server is not actually running as a slave. So the second start under Pacemaker fails. Yesterday I fixed this up rather crudely and it works.
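
    A rough sketch of the kind of check that avoids this (the function name matches the RA, but the $MYSQL variable and the exact surrounding logic are assumptions):

    # Treat the instance as a slave only when the SQL thread is actually
    # running, not merely when SHOW SLAVE STATUS returns any output.
    is_slave() {
        $MYSQL -e 'SHOW SLAVE STATUS\G' 2>/dev/null \
            | grep -q 'Slave_SQL_Running: Yes'
    }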

  94. SLiX says:

    Hi, thanks for your work! But I was a little confused at first, because I tried to use the “MySQL_replication” agent, without success… so, what is the relation between the two?

    And also I’ve found a little bug in the agent:

    : ${OCF_RESKEY_reader_attribute=${OCF_RESKEY_evict_reader_attribute_default}}
    should be:
    : ${OCF_RESKEY_reader_attribute=${OCF_RESKEY_reader_attribute_default}}

    Thanks again.
    SLiX.

  95. Jackie Khuu says:

    Dear Mr

    When I set up PRM following your instructions, I receive many errors. Please help me!

    My setup environment is:
    All servers use CentOS 6.2 32 bit.
    One master named PRM_M, IP: 192.168.163.205
    One slave named PRM_S1, IP: 192.168.163.206

    I also set up MySQL replication, and it’s working fine.

    This is the resource agent file which I have already edited for my server:
    # Fill in some defaults if no values are specified
    HOSTOS=`uname`
    if [ "X${HOSTOS}" = "XOpenBSD" ];then
    OCF_RESKEY_binary_default=”/usr/local/bin/mysqld_safe”
    OCF_RESKEY_config_default=”/etc/my.cnf”
    OCF_RESKEY_datadir_default=”/var/mysql”
    OCF_RESKEY_user_default=”_mysql”
    OCF_RESKEY_group_default=”_mysql”
    OCF_RESKEY_log_default=”/var/log/mysqld.log”
    OCF_RESKEY_pid_default=”/var/mysql/mysqld.pid”
    OCF_RESKEY_socket_default=”/var/run/mysql/mysql.sock”
    else
    OCF_RESKEY_binary_default=”/usr/local/mysql/bin/mysqld_safe”
    OCF_RESKEY_config_default=”/usr/local/mysql/etc/my.cnf”
    OCF_RESKEY_datadir_default=”/usr/local/mysql/data”
    OCF_RESKEY_user_default=”mysql”
    OCF_RESKEY_group_default=”mysql”
    OCF_RESKEY_log_default=”/usr/local/mysql/data/PRM_M.err”
    OCF_RESKEY_pid_default=”/usr/local/mysql/data/PRM_M.pid”
    OCF_RESKEY_socket_default=”/tmp/mysql.sock”
    fi
    OCF_RESKEY_client_binary_default=”mysql”
    OCF_RESKEY_test_user_default=”root”
    OCF_RESKEY_test_table_default=”mysql.user”
    OCF_RESKEY_test_passwd_default=”mysql”
    OCF_RESKEY_enable_creation_default=0
    OCF_RESKEY_additional_parameters_default=””
    OCF_RESKEY_replication_port_default=”3306″
    OCF_RESKEY_max_slave_lag_default=”3600″
    OCF_RESKEY_evict_outdated_slaves_default=”false”
    OCF_RESKEY_reader_attribute_default=”readable”

    Finally, the Pacemaker configuration is:
    node PRM_M \
    attributes IP=”192.168.163.205″
    node PRM_S1 \
    attributes IP=”192.168.163.206″
    node PRM_S2 \
    attributes IP=”192.168.163.207″
    primitive p_mysql ocf:heartbeat:mysql \
    params config=”/usr/local/mysql/etc/my.cnf” pid=”/usr/local/mysql/mysqld.pid” socket=”/tmp/mysqld.sock” replication_user=”root” replication_passwd=”repl” max_slave_lag=”15″ evict_outdated_slaves=”false” binary=”/usr/local/mysql/bin/mysqld_safe” test_user=”root” test_passwd=”centos6″ \
    op monitor interval=”5s” role=”Master” OCF_CHECK_LEVEL=”1″ \
    op monitor interval=”2s” role=”Slave” OCF_CHECK_LEVEL=”1″
    primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
    params ip=”192.168.163.171″ nic=”eth0″
    primitive reader_vip_2 ocf:heartbeat:IPaddr2 \
    params ip=”192.168.163.172″ nic=”eth0″
    primitive reader_vip_3 ocf:heartbeat:IPaddr2 \
    params ip=”192.168.163.173″ nic=”eth0″
    primitive writer_vip ocf:heartbeat:IPaddr2 \
    params ip=”192.168.163.170″ nic=”eth0″ \
    meta target-role=”Started”
    ms ms_MySQL p_mysql \
    meta master-max=”1″ master-node-max=”1″ clone-max=”3″ clone-node-max=”1″ notify=”true” globally-unique=”false” target-role=”Master” is-managed=”true”
    location No-reader-vip-1-loc reader_vip_1 \
    rule $id=”No-reader-vip-1-rule” -inf: readerOK eq 0
    location No-reader-vip-2-loc reader_vip_2 \
    rule $id=”No-reader-vip-2-rule” -inf: readerOK eq 0
    location No-reader-vip-3-loc reader_vip_3 \
    rule $id=”No-reader-vip-3-rule” -inf: readerOK eq 0
    location No-writer-vip-loc writer_vip \
    rule $id=”No-writer-vip-rule” -inf: writerOK eq 0
    colocation reader_vip_1_dislike_reader_vip_2 -200: reader_vip_1 reader_vip_2
    colocation reader_vip_1_dislike_reader_vip_3 -200: reader_vip_1 reader_vip_3
    colocation reader_vip_2_dislike_reader_vip_3 -200: reader_vip_2 reader_vip_3
    property $id=”cib-bootstrap-options” \
    dc-version=”1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558″ \
    cluster-infrastructure=”openais” \
    expected-quorum-votes=”2″

    When I start corosync and pacemaker, then check with the command crm_mon -1, these are my results:
    [root@PRM_M ~]# crm_mon -1
    ============
    Last updated: Mon May 28 11:19:57 2012
    Last change: Mon May 28 11:16:40 2012 via crmd on PRM_S1
    Stack: openais
    Current DC: PRM_S1 – partition with quorum
    Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
    3 Nodes configured, 2 expected votes
    7 Resources configured.
    ============

    Node PRM_S2: UNCLEAN (offline)
    Online: [ PRM_M PRM_S1 ]

    Failed actions:
    reader_vip_1_start_0 (node=PRM_S1, call=7, rc=6, status=complete): not configured
    reader_vip_2_start_0 (node=PRM_S1, call=8, rc=6, status=complete): not configured
    reader_vip_3_start_0 (node=PRM_S1, call=9, rc=6, status=complete): not configured
    p_mysql:0_monitor_0 (node=PRM_S1, call=6, rc=5, status=complete): not installed
    writer_vip_start_0 (node=PRM_S1, call=10, rc=6, status=complete): not configured
    p_mysql:0_monitor_0 (node=PRM_M, call=6, rc=5, status=complete): not installed
    [root@PRM_M ~]#

    And it seems PRM cannot work.
    Please tell me if I configured anything wrong.
    Thanks very much!

  96. Sabin Iacob says:

    I guess the article is slightly out of sync with the current version of the RA :)

    I got it from https://github.com/ClusterLabs/resource-agents (where your README says it has moved), and I was scratching my head for a while wondering why readerOK and writerOK weren’t set, only to find out by reading the script that the attribute is called “readable” by default now, and there is no mention of a “writable” attribute.

    Also, check_slave has a bug: if replication is not working, $secs_behind is NULL, so [ $secs_behind -gt $OCF_RESKEY_max_slave_lag ] gives an error and leaves the reader IP on (because it sees the exit status from [ as false).

    The correct test is [ $secs_behind = NULL ] || [ $secs_behind -gt $OCF_RESKEY_max_slave_lag ]
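
    In context, inside check_slave that could look something like this (set_reader_attr stands in for however the RA actually clears the reader attribute, so it is an assumption):

    # Drop the reader attribute when replication is broken (NULL) or lagging
    # beyond max_slave_lag, so the reader VIP location rules move the VIP away.
    if [ "$secs_behind" = "NULL" ] || [ "$secs_behind" -gt "$OCF_RESKEY_max_slave_lag" ]; then
        set_reader_attr 0   # assumed helper name
    fi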

    I added code for setting “writable” to the master and things seem to be working fine, but I need to stress it some more (under some circumstances if I reboot the master under load it sets some bonkers value for log file and log position when coming back online as a slave); it is less of a problem after fixing the secs_behind logic, though…

    However, killing the master and watching siege only miss 10-15 requests before the new master takes resuming is bloody impressive :D

    If you are interested in the changes, I can do the github pull request dance and all

  97. Sabin Iacob says:

    ugh… s/takes resuming/takes over/ :(

  98. SLiX says:

    Hi,

    I now have a setup that works OK, but I would like to suggest a little thing.

    When promoting a server as the new master, its “slave” state should be reset (“show slave status” should return an empty set). This would help “MySQL Enterprise Monitor” to correctly identify the replication topology (currently it finds circular replication, with one slave stopped).

    If I had to do it (I will try in my test setup), I would do “stop slave; change master to master_host='', master_user='', master_password=''; reset slave;”.

    Thanks,
    SLiX.

  99. Sabin Iacob says:

    master_host='' won’t work in mysql 5.5

  100. alex hao ye says:

    Does Pacemaker support failover of multiple instances? Say there are 3 instances on the master and 3 instances on the slave; when a failover occurs, can all three instances be flipped at the same time?

  101. Jay N. says:

    I was wondering about the property $id=”mysql_replication” part: what’s mandatory in it?

    I have to export my Pacemaker configuration to a brand new empty and clean DB.

    Can I export the configuration, open the file, change the file and position to something like:
    “replication_info=||4″

    and then start it, and it will work?

    Thanks

  102. Stephen P. says:

    Working through your instructions, we’ve got a four-node master/slave MySQL cluster running like a champ. Thanks!

    One quick question if you have time. I’m thinking of modifying your RA to include ‘On Promotion’ / ‘On Demotion’ parameters which would allow the user to execute an arbitrary script during promotion/demotion events (e.g. params config=”/etc/my.cnf” onpromotion=”/etc/scripts/onpromo.sh” ondemotion=”/etc/scripts/ondemo.sh” …).

    Would you consider this bad practice? And if so, what would you consider to be a better approach?

    Forgive me being a little new at HA/OCF, and thanks for your time.

    Regards,

    Stephen Punak
