|
Trent 'Lathiat' Lloyd (トレント)
|
|
|
| Checking ldirectord is alive |
[Jul. 24th, 2009|01:20 pm] |
This week's blog post is not strictly about MySQL; it has more to do with highly available clusters using heartbeat and ldirectord.
While many people use heartbeat with MySQL, ldirectord is less common in these scenarios and is more often used with clustered web and mail servers. It is, however, sometimes used with a pool of MySQL slave servers.
ldirectord is a tool that manages IPVS in the kernel. Essentially you give it a list of servers that act as a "cluster" for a service, for example, 2 web servers. It will then set up the Linux load balancer IPVS to direct traffic to these 2 machines. It then keeps monitoring the servers, and if one of them goes away, it removes it from the cluster pool.
The problem I have run into a number of times is that ldirectord gets stuck and stops monitoring services. When this happens, the pool stops getting updated, so if a service goes down, it won't be removed from the cluster. On top of that, I have had it get stuck on START UP after a failover. In this case not many of the services had a chance to come up yet, and you are stuck with a cluster that often has 0 nodes available to service some requests - which causes downtime.
So last night, Shane Short and I wrote a patch to ldirectord and a nagios plugin, in order to make sure that ldirectord is doing its job and hasn't got stuck. The patch hooks into ldirectord's '_check_real' function, which is called whenever a server is checked, and updates the timestamp on a 'pulse' file, which we later check with our nagios script.
Here is the patch to ldirectord: http://lathiat.net/files/ldirectord.patch
So now, in the default configuration we will end up with a file at /var/run/ldirector.ldirectord.pulse (this file is actually in the same location as your ldirectord PID file, so if you have moved that, it will be with it) which has its timestamp updated with each service check.
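To illustrate the idea only, here is a minimal Python sketch of the "touch a pulse file on every check" pattern. This is not the patch itself (see the link above for that); the function names `touch_pulse` and `check_real`, and the always-healthy placeholder check, are assumptions made up for the example.

```python
import os


def touch_pulse(path):
    """Create the pulse file if it is missing, then set its mtime to now."""
    # Opening in append mode creates the file without truncating it.
    with open(path, "a"):
        pass
    # Passing None sets both access and modification time to the current time.
    os.utime(path, None)


def check_real(server, pulse_path):
    """Hypothetical stand-in for a per-server health check.

    A real check would probe `server` here; this placeholder always
    reports healthy. The point is that every check, pass or fail,
    refreshes the pulse file so a monitor can tell the loop is alive.
    """
    healthy = True  # assumption: placeholder result, no real probe is made
    touch_pulse(pulse_path)
    return healthy
```

If the checking loop ever hangs, nothing calls `touch_pulse` any more and the file's timestamp simply stops advancing, which is what the monitoring side looks for.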
Now, we make a plugin for nagios, which I have here:
http://lathiat.net/files/check_ldirectord_pulse
You can configure this plugin into Nagios or Nagios NRPE as normal. You can test that it is working by running the plugin manually; you should get a result like this:
OK: ldirectord pulse is regular (4 seconds since last beat)
The default timeout is 120 seconds (2 minutes) for a warning, and 300 seconds (5 minutes) for a critical alert.
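For readers who want to see roughly what such a check involves, here is a small Python sketch of the same logic. It is not the plugin linked above. The function name `check_pulse`, the WARNING/CRITICAL message wording, the use of `>=` at the thresholds, and treating a missing pulse file as critical are all assumptions; only the OK message format and the 120/300 second defaults come from this post.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_pulse(path, warn=120, crit=300, now=None):
    """Return (exit_code, message) based on the age of the pulse file."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Assumption: no pulse file at all is treated as a critical problem.
        return CRITICAL, "CRITICAL: pulse file %s not found" % path

    if now is None:
        now = time.time()
    age = int(now - mtime)

    if age >= crit:
        return CRITICAL, "CRITICAL: ldirectord pulse is stale (%d seconds since last beat)" % age
    if age >= warn:
        return WARNING, "WARNING: ldirectord pulse is late (%d seconds since last beat)" % age
    return OK, "OK: ldirectord pulse is regular (%d seconds since last beat)" % age
```

A wrapper script would print the message and exit with the returned code, since Nagios reads the status from the exit code and the text from standard output.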
Hope this helps some people using ldirectord! Personally, this has caused downtime for us a few times when ldirectord got stuck. I would love to hear any feedback, or whether you are successfully (or unsuccessfully) using this. You can contact me here: http://lathiat.net/contact |
|
|
| Joining sun... |
[Apr. 14th, 2008|09:55 pm] |
| [ | Tags | | | mysql, sun | ] |
| [ | Current Location |
| | Home | ] |
| [ | Current Mood |
| | happy | ] |
| [ | Current Music |
| | Pete Murray - Opportunity | ]
As most of you reading this are likely aware, I work as a Support Engineer for MySQL AB - which was recently acquired by Sun Microsystems.
If you have been living under some variety of rock-like object, you can see the press release here: http://www.mysql.com/news-and-events/sun/
As part of that, as of this month I now work for Sun Microsystems Australia.
Trent Lloyd MySQL Support Engineer Sun Microsystems
However, I am really doing the same job I was before. So far the acquisition seems to have gone fairly positively. I was a little concerned about some of the employment/IP restrictions that our newfangled contracts bring, but in the end I was able to sort out enough to satisfy me.
I am still working from home, and there are no plans (at least for me) to move around within Sun - I plan to continue my role as MySQL Support Engineer.
I really hope this acquisition is beneficial for MySQL in the long run. There are potential ups and downs, but so far for me the experience has been very positive and the Sun team have been very welcoming.
The MySQL Users Conference is upon us this week, which means there are likely to be some interesting announcements from various MySQL-related companies, as tends to be the case around major conferences (there is already some talk of a new product from "Kickfire" that sounds interesting, albeit sketchy at this stage).
In more personal news, they sold the house I am renting so I have to move. For more amusement, Aleesha is overseas until after I have to move, so I have to move before she gets back. Oh well. I applied for a house today and the Property Manager seemed very enthusiastic, so I guess that's a good sign.
It's a very nice 3x2 townhouse: http://gallery.mac.com/lathiat#100525&bgcolor=black&view=grid
It's also only a couple hundred meters from the local telephone exchange, so ADSL2+ speeds should be fairly flat (24/1 M) which will be nice :)
To follow me closer, check out my twitter: twitter.com/lathiat |
|
|
| Cupertino, CA |
[Aug. 18th, 2007|06:05 am] |
| [ | Tags | | | mysql, usa | ] |
| [ | Current Location |
| | Cupertino, CA | ] |
| [ | Current Mood |
| | excited | ] |
| [ | Current Music |
| | Anika-Robert Picardo-Extreme Bob-More Parodies, Travesties & Anomalies | ] |
Thanks to work, I spent this week in Cupertino, CA. Having never been anywhere in the USA before, it was somewhat exciting to be staying literally up the road from the Apple Campus.
 Me @ 1 Infinite Loop
I was also able to go to Google in Mountain View, to attend the Silicon Valley MySQL Users Group at the Visitors Center. Unfortunately I didn't come across any large Google logos to take my photo with, but I did find building number 42.

I also spotted the offices of Symantec, Trend Micro, Packeteer, Solid, Microsoft, Borland and MySQL (Surprise!) along the way. Certainly "exciting" for someone that's never been to the Bay Area.
I had hoped to make it up to San Francisco and do a little sightseeing of the Golden Gate Bridge and some other stuff, but I was too tired to do it this afternoon, and I am attending BarCampBlock today in Palo Alto from early until I fly out tonight - so unfortunately I am going to miss out this trip. Hopefully work will send me back this way again sometime next year. |
|
|
| Trent Lloyd, Support Engineer, MySQL AB |
[May. 26th, 2007|05:10 pm] |
Just over 18 months ago, I started working at HostAway. I initially started out doing some casual 2-week phone support they needed at the time, but my role became permanent and I quickly expanded into both Network & Systems Administration (while still doing a lot of day to day customer support; small organizations tend to demand wider skill ranges :)
Nonetheless I felt, for various reasons, it was time to move on. As such, I have just completed my first week working at MySQL AB as a Support Engineer.
The job is home based, and I am currently working the hours of 5AM-1PM local time. Now you may think "wow, you're crazy" for working those hours, but I have discovered that so far I am actually quite enjoying it. I feel I am getting a lot more out of my day at the moment because I seem to be awake for longer, getting up earlier in the day (usually about 4:30AM).
I do have the option to start working at 7AM on most days, and given I have no travel time this doesn't put me very far behind a "normal" job as far as messing up the daily schedule.
Being a free software 'person', working for a company with heavy involvement in the free software world is pretty exciting for me. I guess time will tell how it goes, but if this week is anything to go by I think I should be happy in the long term.
In other news, I have discovered that listening to Weird Al Yankovic continually is horrible for your sanity, mostly because he picks a lot of very catchy songs to parody, which proceed to persistently circle around inside my head for the next week... |
|
|