JBOD can bite you… (and Ubuntu 7.04)

Okay, so one of the disks in a JBOD (well… one big LVM volume) has been on the way out (hopefully I can recover some stuff off it… there’s nothing completely important… but still).

I’ve now learnt my lesson, and the desktop has three new 320GB drives in a RAID5.

Currently installing Ubuntu 7.04 on it. I do have to say that the alternate install disk (which uses debian-installer) has a REALLY nice RAID and LVM setup now. If only it also let you pass parameters to mkfs it would be ideal.

Update: It got the bootloader horribly wrong though, and I’ve had to piss-fart around trying to get LILO to install and boot. Current result? A blinking cursor in the top left of the screen. Fantastic… fucking fantastic.

NDB Online Add Node Progress (or rather, testing it)

So, the sitch as of today:

Added an ndb_mgm_set_configuration() call to the mgmapi. It’s a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like what you get from ndb_mgm_get_configuration) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this configuration. Notably, if a data node restarts, it gets this new configuration.

By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).
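The shape of such a test might look like the sketch below. Everything in it is hypothetical: the real ndb_mgm_configuration is a packed, opaque object and the real mgmapi calls take a management handle, so the management server and a data node are modelled as plain structs with made-up names (mgm_get_config, mgm_set_config, data_node_restart) just to show the fetch, modify, push, restart and check sequence.

```c
/* Hypothetical, simplified view of a cluster configuration. */
struct cluster_config {
    unsigned generation;            /* bumped by every set */
    unsigned long long data_memory; /* DataMemory, in bytes */
};

/* Hypothetical management server: it just serves `current`. */
struct mgm_server {
    struct cluster_config current;
};

/* Hypothetical data node: runs with whatever it fetched at start. */
struct data_node {
    struct cluster_config cfg;
};

/* Fetch a copy of the configuration being served. */
struct cluster_config mgm_get_config(const struct mgm_server *srv)
{
    return srv->current;
}

/* Replace the configuration being served; returns the new generation. */
unsigned mgm_set_config(struct mgm_server *srv, struct cluster_config cfg)
{
    cfg.generation = srv->current.generation + 1;
    srv->current = cfg;
    return cfg.generation;
}

/* A (re)starting data node picks up whatever is currently served. */
void data_node_restart(struct data_node *node, const struct mgm_server *srv)
{
    node->cfg = mgm_get_config(srv);
}

/* Test shape: get, modify DataMemory, set, restart, check.
 * Returns 1 on pass, 0 on failure. */
int test_change_data_memory(struct mgm_server *srv, struct data_node *node,
                            unsigned long long new_bytes)
{
    unsigned long long old_bytes = node->cfg.data_memory;
    struct cluster_config cfg = mgm_get_config(srv);

    cfg.data_memory = new_bytes;
    mgm_set_config(srv, cfg);

    /* A running node keeps the value it started with... */
    if (node->cfg.data_memory != old_bytes)
        return 0;
    /* ...and only sees the new one after a restart. */
    data_node_restart(node, srv);
    return node->cfg.data_memory == new_bytes;
}
```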

I’ve also added a Disabled property to data nodes. If set, just about everywhere ignores the node.

This allows a test program to test add/drop node functionality – without the need for external clusterware stopping and starting processes.

If you start with a large cluster, a test program can disable some nodes and do an initial cluster restart (essentially starting a new, smaller cluster) and then add in the disabled nodes to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different from what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism to start a node on a machine – so I don’t (for example) need to run ndb_cpcd on my laptop while testing.

But anyway, I now have a test program that we could run as part of autotest, which will happily take a 4 node cluster, start a 2 node cluster from it and online add 2 nodes.
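The bookkeeping side of the Disabled trick can be sketched like this. The structs and functions are made up for illustration; in the real test the nodes are actual data node processes and "adding" them is a proper online cluster operation, not just clearing a flag.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical per-node config entry carrying the Disabled property. */
struct node_entry {
    int node_id;
    bool disabled;
};

struct test_cluster {
    struct node_entry nodes[MAX_NODES];
    int n_nodes;
};

/* Count the nodes the rest of the system would actually use. */
int active_nodes(const struct test_cluster *c)
{
    int n = 0;
    for (int i = 0; i < c->n_nodes; i++)
        if (!c->nodes[i].disabled)
            n++;
    return n;
}

/* Disable the last `count` nodes so an initial restart forms a smaller cluster. */
void disable_tail(struct test_cluster *c, int count)
{
    if (count > c->n_nodes)
        count = c->n_nodes;
    for (int i = c->n_nodes - count; i < c->n_nodes; i++)
        c->nodes[i].disabled = true;
}

/* "Online add": clear Disabled on every node; returns how many were added. */
int add_disabled_nodes(struct test_cluster *c)
{
    int added = 0;
    for (int i = 0; i < c->n_nodes; i++) {
        if (c->nodes[i].disabled) {
            c->nodes[i].disabled = false;
            added++;
        }
    }
    return added;
}
```

Run against a 4 node cluster, disable_tail(&c, 2) leaves 2 active nodes for the initial restart, and add_disabled_nodes then brings it back to 4.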

Adding these new nodes into a nodegroup is currently not working with my patches though… for some reason the DBDICT transaction seems to not be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.

So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.

Online table re-organisation is another thing altogether though… and no doubt there are some good subtle bugs to be written.

mgmapi timeouts going in…

So my timeout patches for the MySQL Cluster Management API have been finished. This should solve a lot of people’s problems writing management API applications that want to do something sane when the management server either dies or somehow gets disconnected from them.
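I won’t reproduce the patches here, but the general technique for not blocking forever on a dead or disconnected peer is to wait on the socket with a timeout before reading. A minimal POSIX sketch (read_with_timeout is a hypothetical helper, not an mgmapi function):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Read up to `len` bytes from `fd`, giving up after `timeout_ms`.
 * Returns bytes read (0 on EOF), -1 on error, or -2 on timeout.
 * Note: after a signal (EINTR) the full timeout starts again. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;  /* poll itself failed */
    if (rc == 0)
        return -2;  /* nothing arrived in time */
    return read(fd, buf, len);
}
```

A caller that gets -2 back can decide to reconnect, report an error or try another management server, rather than hanging.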

More importantly I should say, the autotest run looks good. It passed 199 tests in the daily-basic suite… which is a new record (I added some tests, so that could be classified as cheating)… probably would have been 200 if a sporadically failing test hadn’t failed :(

During my trip back to Melbourne, Jonas will probably apply these to a bunch of trees (at least some of the telco release) – with 5.1 coming at some point.

I heart Gnome SSH Tunnel Manager

Jonas just switched me on to Gnome SSH Tunnel Manager – a simple GNOME app that stores a list of SSH tunnels you want and can automatically start and stop them.

Totally useful for those who travel (hrrm.. fair few MySQLers there) and/or always have SSH tunnels to places (hrrm… MySQLers there too).

There’s a debian package up there (and you can build one easily) but it’s not yet in the Ubuntu archive… maybe for the next release. But it works fine on Edgy for me!

irritation of the day….

There are a lot of things about the MySQL bug tracking system I like… but there are a few things that annoy the heck out of me.

Today it’s the fact that if you put a number in the “with any of the words” field on advanced search (e.g. ‘839’, as you’re looking for bugs that talk about error 839), you get taken to bug number 839. Funnily enough, this has nothing to do with the NDB problem I’m trying to see the status of. grr…

and now, back to your regular programming…

Code size of an engine versus test suite

If you count the lines of code in the MySQL Cluster (NDB) test suite (mysql-5.1/storage/ndb/test, excluding the old ODBC stuff) you come up with about 104,000 lines of code. Compare that with roughly 350,000 lines of code for the NDB engine itself (excluding the handler, which is an additional 12,000 lines – this isn’t tested much by the NDB test suite… mysql-test-run.pl is meant to take care of a lot of that).

If you go and check the MyISAM tree, it’s only 40,545 lines of code – for the entire engine. That’s right, the MySQL Cluster test suite is more than 2.5 times the size of MyISAM.
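The ratios behind those comparisons, using only the counts above (test suite against MyISAM, and test suite against the roughly 350,000-line NDB engine):

```latex
\frac{104\,000}{40\,545} \approx 2.57,
\qquad
\frac{104\,000}{350\,000} \approx 0.30
```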

If you look at the mysql-test-run.pl tests, which are just lists of SQL commands with static data, they come to 250,000 lines (excluding result files). The NDB tests do things programmatically, so they can generate large amounts of data and different loads quite easily.

The architecture of the NDB tests (commonly referred to as autotest, ATRT or the HUGO framework) is very different from mysql-test-run.pl – it easily allows you to write a test that is high on concurrency, high on load and high on amount of data. It is also modular, so that when you get an issue from a customer (or need to do some benchmarking on a specific type of schema) you can use the utility programs to help you (e.g. there’s one that does random PK updates to tables, one that does scans, one that does index operations etc).
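To give a flavour of how a load utility can stay verifiable, here’s a sketch loosely in the spirit of what the HUGO utilities do (this is not their actual code; the row layout, calc_value and the xorshift generator are made up). Each row carries an update counter, and its payload is a pure function of the primary key and that counter, so a verifier can recompute what every row should contain after any number of random PK updates without keeping a copy of the data.

```c
#include <stdint.h>

/* Hypothetical row: primary key, update counter, payload. */
struct row {
    uint32_t pk;
    uint32_t updates;
    uint32_t value;
};

/* Payload derived from (pk, updates): anyone can recompute it. */
uint32_t calc_value(uint32_t pk, uint32_t updates)
{
    uint32_t x = (pk * 2654435761u) ^ (updates * 40503u);
    x ^= x >> 16;
    return x;
}

/* xorshift32 pseudo-random generator; *state must be non-zero. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Initial load: every row starts with zero updates. */
void load_rows(struct row *t, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        t[i].pk = i;
        t[i].updates = 0;
        t[i].value = calc_value(i, 0);
    }
}

/* `count` random PK updates over n > 0 rows: bump counter, rewrite payload. */
void random_pk_updates(struct row *t, uint32_t n, uint32_t count, uint32_t *rng)
{
    for (uint32_t k = 0; k < count; k++) {
        struct row *r = &t[next_rand(rng) % n];
        r->updates++;
        r->value = calc_value(r->pk, r->updates);
    }
}

/* Verify every row against what (pk, updates) implies; returns bad rows. */
uint32_t verify_rows(const struct row *t, uint32_t n)
{
    uint32_t bad = 0;
    for (uint32_t i = 0; i < n; i++)
        if (t[i].value != calc_value(t[i].pk, t[i].updates))
            bad++;
    return bad;
}
```

In a real test the rows would round-trip through the database between the update and verify steps; the point of deriving the payload is that verification needs nothing but what was read back.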

There’s a whole bunch of things you just cannot do with mysql-test-run.pl.

Then we get to fault injection… MySQL Cluster is a distributed system that is designed to withstand failure. Without testing this, we can never say it’s remotely HA. So we test it. A lot. We inject failures into nodes to check our node failure handling, and using the utility programs and some basic shell it’s possible to do custom tests (such as multi-node failure) where our test suite doesn’t have the best coverage yet.
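For multi-node failure tests, one detail matters: which nodes you kill. If every node group keeps at least one live replica, the data is still all there. A sketch of picking victims under that constraint (node_group, pick_victims and survives are hypothetical; the real cluster also has to get through arbitration, which this ignores):

```c
/* node_group[i] is the node group of data node i.
 * Pick at most one victim per node group, so with two or more replicas
 * per group every group keeps a live replica. Writes victim indexes into
 * `victims` and returns how many were chosen. */
int pick_victims(const int *node_group, int n_nodes, int *victims, int max_victims)
{
    int n = 0;
    for (int i = 0; i < n_nodes && n < max_victims; i++) {
        int seen = 0;
        for (int j = 0; j < n; j++) {
            if (node_group[victims[j]] == node_group[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            victims[n++] = i;
    }
    return n;
}

/* Returns 1 if every node group still has a node outside `victims`. */
int survives(const int *node_group, int n_nodes, const int *victims, int n_victims)
{
    for (int i = 0; i < n_nodes; i++) {
        int covered = 0;
        for (int k = 0; k < n_nodes && !covered; k++) {
            int dead = 0;
            for (int j = 0; j < n_victims; j++)
                if (victims[j] == k)
                    dead = 1;
            if (!dead && node_group[k] == node_group[i])
                covered = 1;
        }
        if (!covered)
            return 0;
    }
    return 1;
}
```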

Again, either not possible or extremely hard with mysql-test-run.pl

mysqlslap is the hint of a nice utility to help in testing… but using it in mysql-test-run.pl scripts in a verifiable way (i.e. checking that what came out is what went in, using a variety of access methods – full table scans, PK scans, index scans, stored procs, cursors, views, joins etc) is tricky at best (and really, impossible).
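One piece of that verification problem is that different access methods return rows in different orders (a full table scan needn’t come back in PK order). A simple way around that is an order-independent digest: hash each row and add the hashes, so two result sets compare equal regardless of row order. A sketch, with a made-up struct kv row type:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row as seen by a client. */
struct kv {
    uint32_t pk;
    uint32_t value;
};

/* Hash one row (splitmix64 finaliser, a bijection on 64-bit values). */
static uint64_t row_hash(const struct kv *r)
{
    uint64_t z = ((uint64_t)r->pk << 32) | r->value;
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Order-independent digest: addition is commutative, so row order
 * (PK order, scan order, index order...) doesn't matter. */
uint64_t table_digest(const struct kv *rows, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += row_hash(&rows[i]);
    return sum;
}
```

Using addition rather than XOR means duplicated rows don’t cancel each other out. It’s still only a checksum, so a match is strong evidence rather than proof.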

Yes, I’m really pining for a better test suite infrastructure for the MySQL Server – it can only lead to better quality software… even just somebody rewriting a bunch of the HUGO classes to use the MySQL C API would be useful.

mgmapi timeouts and resurrecting the online add node

The other day I managed to send off what’s nearly the final patches for adding proper timeout support to the MySQL Cluster management API. Jonas has had a bit of a look, found one thing I’ve missed, but it’ll probably get in somewhere soon (probably the carrier grade edition first, then others… 5.1 makes sense IMHO if only for the amount of management server testing that my patches add).

Unfortunately, in what we laughingly call the past, the management server – for whatever hysterical raisins – never really received much direct testing. Sure, if the data nodes couldn’t get their configuration, or autotest couldn’t control the daemons or something, then things were obviously broken. But, say, a subtle (or not so subtle) change in API or behaviour would certainly not be picked up.

The real “feature of the year” (not my words), though, is fault injection for the management server that we can use in testing. The MySQL Cluster kernel (data nodes) already has extensive fault injection that is regularly tested by ATRT (storage/ndb/test in the source tree).
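Fault injection of this sort usually boils down to a numeric error code that the test sets and the code checks at interesting points. A sketch of the idea (the codes, names and handler are all made up for illustration; this is not the actual management server code):

```c
/* Hypothetical error-insert state; 0 means "no fault". */
static int error_insert_value = 0;

void set_error_insert(int code) { error_insert_value = code; }

static int error_inserted(int code) { return error_insert_value == code; }

enum reply { REPLY_OK, REPLY_DROPPED, REPLY_DELAYED };

/* Made-up fault codes. */
#define ERR_DROP_BEFORE_REPLY  1
#define ERR_STALL_BEFORE_REPLY 2

/* A made-up request handler showing where faults get injected. */
enum reply handle_get_status(void)
{
    if (error_inserted(ERR_DROP_BEFORE_REPLY))
        return REPLY_DROPPED;  /* simulate the connection going away */
    if (error_inserted(ERR_STALL_BEFORE_REPLY))
        return REPLY_DELAYED;  /* simulate a server that never answers */
    return REPLY_OK;
}
```

A test can then set a code, make a client call, and check that the client sees a timeout or error rather than hanging.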

I’ve also started to resurrect my online add node patch that I’ve had sitting around in various states for over a year (actually… about 14 months… I just haven’t touched it in 12) and port it to the latest 5.1 tree (as I’m not sure where it’ll end up, I’m starting at the lowest common denominator – it’s possible that it’ll end up in Carrier Grade first too). Now comes the problem of testing the sucker. Previously I’ve had a shockingly bad shell script and hard coded files to make this go.

Obviously, hard coded stuff is not the way to go. The real way is to be able to do everything neatly and programmatically so we can run it as part of the regular autotest.

Stockholm

Currently in the MySQL Cluster team office in Stockholm – and have been since Wednesday. I’ll be here for the next 3 weeks working in the office. This will be the longest amount of time I’ve worked in an actual office (instead of working from home) in more than 2.25 years!

I found Veronica Mars on TV last night… which is great, because I’ve sort of become addicted. Unfortunately, Sweden is a few episodes ahead of Australia… so I’ve skipped a few now (go MythTV, record them for me baby). One really good thing about Swedish TV is that things are subtitled instead of dubbed – excellent if your Swedish isn’t that great (mine isn’t). Whenever I’m here, I also seem to find some TV shows that look really interesting, except for the fact that they’re in a language I don’t understand… certainly an interesting dilemma.

Today I’ve been working on material for MySQL Cluster: The Complete Tutorial at the MySQL Conference and Expo. The conference is April 23-26 in Santa Clara, California (and if you register by March 14 you save $200 on registration). It’s going to be a SIX hour hands-on tutorial (with breaks, don’t worry). The hands-on part (I think) is very important… that way you walk away with real-world knowledge that you can directly apply, not just theory that you could have gotten from reading a bit here and there.

I’m really hoping that as many people with existing knowledge (esp. MySQLers) as possible can be around during the session to help people when needed… I have a feeling there’ll be a few.

Nearly off to London and Stockholm

In about 4.5hrs, I’ll be in a cab to the airport. At about lunchtime (1:30pm or so) London time, I’ll be in London. I’ll be there for a few days – until the 27th. If you’re around London or can make it, it’d be cool to hang. I plan to be a bit of a tourist here and there as I haven’t seen heaps of London and I do hear it’s nice :)

After that, I’ll be in Stockholm for about three weeks (I leave on the 19th… as it’s currently planned). So if you’re around, give me a yell!

My cell (mobile) number is pretty easy to find (hint: google my name along with my employer and look for a post on a mailing list… my work email sig has my phone number).

I’m in Stockholm for work, so I’ll be working in the office (which will be the longest stretch I’ve gone to work in an office in over 2 years).

SVN shows its true colours

I thought “svn”, I typed “cvs”. Hrrm… sounds about right.

In other revision control news, using quilt to manage work-in-progress patches in conjunction with BK is proving really, really great. I feel like an idiot having lived this long and not worked this way.

I have a feeling that if git were being used I’d just do everything there, as it’s so quick anyway. I haven’t used bzr on repos of this size yet, but it should be good too.

timeout units

Following a discussion about MythTV on #xfs (as you do), and a moment of “hrrm… I wonder what unit that timeout is” with some NDB code, I wish to make the following announcement:

All timeout values in NDB related APIs will now be given in centijiffies of the server system. For APIs that can talk to multiple hosts, it will be furlongs per fortnight.

I feel that having a consistent interface such as this will lead to much less confusion and better apps.
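Joking aside, one cheap way to stop people wondering what unit a timeout is in is to put the unit in the type, so a bare integer won’t compile where a timeout is expected. A C sketch with made-up names:

```c
#include <stdint.h>

/* Hypothetical wrapper types: the unit is part of the type name. */
struct timeout_ms { uint32_t ms; };
struct timeout_s  { uint32_t s; };

static inline struct timeout_ms make_timeout_ms(uint32_t ms)
{
    return (struct timeout_ms){ ms };
}

/* Explicit, visible conversion between units. */
static inline struct timeout_ms seconds_to_ms(struct timeout_s t)
{
    return (struct timeout_ms){ t.s * 1000u };
}

/* A made-up API: passing a plain integer here is a compile error. */
uint64_t deadline_ms(uint64_t now_ms, struct timeout_ms t)
{
    return now_ms + t.ms;
}
```

deadline_ms(1000, 250) is rejected by the compiler, while deadline_ms(1000, make_timeout_ms(250)) says exactly what it means. No centijiffies required.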

Testing Cluster Certification exam questions

I (like several of my co-workers) have spent quite a few hours over the past few days (since about Thursday/Friday IIRC) testing exam questions for the upcoming MySQL Cluster Certification exam.

Currently, I hold the record for the most correct answers (and for the number of questions commented on). Yes, this is a direct challenge to other MySQLers to try and beat me.

For MySQL certifications, there is a giant pool of questions, from which a selection is drawn for each candidate. So those brave few who go and test every single question spend a lot of time doing so, to make sure the end exam is the best possible.
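I have no idea how the actual exam system draws questions, but a textbook way to pick k distinct questions from a pool of n is a partial Fisher-Yates shuffle. A sketch (draw_questions and the xorshift32 generator are illustrative choices only):

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 pseudo-random generator; *state must be non-zero. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Partial Fisher-Yates: afterwards pool[0..k) holds k distinct
 * questions drawn from the original pool[0..n).
 * (The modulo introduces a tiny bias; fine for a sketch.) */
void draw_questions(int *pool, size_t n, size_t k, uint32_t *rng)
{
    if (k > n)
        k = n;
    for (size_t i = 0; i < k; i++) {
        size_t j = i + next_rand(rng) % (n - i);
        int tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
    }
}
```

Because it only swaps elements, the pool stays a permutation of the original questions, so the first k can never contain a duplicate.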

I’ve really grown to respect the MySQL certs even more than I did before after being involved in creating one… we really try to make sure that to pass these you have to know and understand what you’re doing… no superficial stuff here.

Oh, and Roland is going to be giving a MySQL Cluster Certification Primer at the MySQL Conference coming up in April.

LCA2007 Photos

I’ve added a LCA2007 section to my Gallery with a bunch of photos I took at and around the conference. Feel free to have a look. I’ve posted a bunch of these to flickr already, so you’ve likely seen some if you follow my flickr feed.

Note that this Gallery install is usually running on a top-of-tree MySQL Cluster install on a box that has a bunch of other load on it… so things may work, may not – whatever :)

Those of you listening in on Planet MySQL – you should be able to spot a few other MySQLers around there, and there’s photos from the MySQL miniconf.

bitkeeper unlock -s magic

If you ever get something like this:

bk clone -lq ndb bug25567

clone: unable to readlock /home/stewart/Documents/MySQL/5.1/ndb

then try this:

bk unlock -s

to remove stale locks.

I have no idea how anybody is meant to come to the command from the error message… blindly guessing ‘bk help unlock’ worked for me though.

NDB! NDB! The storage engine for me!

Today I set up a mysqld connected to my not-quite-HA cluster at home here to replicate from my MythTV database into cluster. The idea behind this is to eat an increasing amount of my own dogfood around the house.

To do this, I also set up the MySQL Instance Manager to manage the now multiple instances of MySQL Servers on a box here. I found it a pain to do; it should be a lot simpler, but isn’t. At least now things are going okay… but the feature wish list I have is rather long (perhaps I should hack some stuff up in this “spare time” I’ve been hearing so much about).

I’m also about 10 minutes (or however long the build takes) off moving one of the data nodes off the machine so it will be a real 2 node system (but I still have to move the management server to a third machine to have any real HA… and I have a PowerPC machine marked for that, I just have to await some patches to make it work :)
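Why a third machine? After a network split in a two-node cluster, each side sees exactly half the nodes, so neither can safely decide on its own that it’s the one that should carry on. Something else has to break the tie, and that arbitrator (typically the management server) must not live on one of the two data node machines, or losing that machine takes the tie-breaker with it. A simplified majority-or-arbitrator model (not NDB’s exact rules, which also look at node groups):

```c
/* Simplified split-brain rule. After a failure or split, a partition
 * may carry on if it holds a strict majority of the data nodes that
 * were alive, or exactly half of them and the arbitrator sides with it.
 * Returns 1 if this partition may continue. */
int partition_may_continue(int alive_before, int in_partition,
                           int arbitrator_grants)
{
    if (2 * in_partition > alive_before)
        return 1;                  /* clear majority */
    if (2 * in_partition == alive_before)
        return arbitrator_grants;  /* tie: the arbitrator decides */
    return 0;                      /* minority: shut down */
}
```

With two data nodes every split is a tie, so partition_may_continue(2, 1, 0) is 0: without a reachable arbitrator, neither half keeps running.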

Currently though, my Gallery is being served off this. There are so many more photos I should add, I just haven’t come up with a decent way to interface f-spot and Gallery together – especially when I go back and retouch, delete or tag photos.

book writing tools

I’m involved in the authoring of two books at the moment – each using a different tool, neither of which would be my choice if it was up to me. One is using DocBook, writing raw, incredibly verbose XML… which honestly isn’t that much fun. The other is in Microsoft Word (well, OpenOffice.org Writer for me). The last time I used Microsoft Word really seriously was probably around 1998/1999 with Office 98 on the Mac. It was a pretty awesome suite of software actually. Especially after the update that fixed a few crashing bugs :)

One thing I do notice, though, is that the collaboration tools in OOo Writer are nowhere near good enough. Notes are small yellow rectangles: to read one you either hold the mouse cursor over it (ick, slow), or double click it and scroll right forever to read the whole thing, or, in conjunction with that, use the object browser.

Also, track changes doesn’t really track changes to things you’ve already changed, i.e. I cannot edit the same thing twice and keep both changes. urgh. I’m pretty sure MS Word let me do that… It certainly allowed me to use versions in a Microsoft Word format document – which, unfortunately, is what a lot of the world still primarily deals with.

If these few things were fixed it would be a much better word processor.

It’s always frustrated me how poorly word processors handle large files too (except perhaps Nisus Writer… that was certainly a neat app). Add a bunch of images and your file now takes ages to open and save. blergh.

As for figures in DocBook, there doesn’t seem to be much input and output processing… i.e. if you put in SVG and output to HTML, it doesn’t come out very nicely (it puts the SVGs into small windows) and probably doesn’t work at all in browsers that don’t do SVG.

I sometimes wonder if we’ve really moved on to something better than LaTeX and xfig…. okay, there are better tools than xfig for a large number of diagram types.

MySQL 5.1.14 has hit the streets, the kids love it.

Over at the DevZone MySQL 5.1.14 downloads page, the cool kids are grabbing the latest 5.1 beta. Lots of Cluster fixes in this release too. We’re getting to a much more polished state for NDB with each release and that’s a good thing to see.

On a totally different topic, I bought a really sweet smelling mango today and cannot wait for the right time sometime this afternoon to eat it. All the summer fruits are really nice at the moment (benefit of being in a warm December I guess) and I’m loving it.

Although 37-41 degrees (Celsius, duh) can be less fun with a rather warm laptop.

online online online! (or restarts are for wusses)

I often see things go past my eyes where customers (and users – i.e. those that don’t send wads of cash our way and hence are not financially supporting my beer, curry and photography habits) have amazing uptime and reliability requirements.

When talking to businesses that use MySQL, it’s not uncommon to have the “if the DB is down, our business doesn’t operate” line bandied around. How people make sure this never happens can differ (hint: it often involves replication and good sysadmin practices).

One thing I like doing is making things easier for people. Sometimes it’s also a much more complicated problem than you’re initially led to believe.

I think configuration files are obsolete. Okay, maybe just for databases. Everything should be changeable as an online operation. This should also be able to be done via a standard interface – in our case, SQL. This means it’s suddenly really easy to write portable UIs around the admin functionality: instead of having to get the parsing and generation (and, trickiest of all, the modification) of text-based config files right, you just issue SQL to the server, which is relatively simple. This even enables web apps to tune the database a bit, opting for various amounts of automation for various applications – in a cross platform way!

One of my visions for NDB (MySQL Cluster) is to get rid of the (user visible) configuration file and manage everything through SQL (or management client, something like that). This way you could ALTER CLUSTER ADD NODE, ALTER CLUSTER SET DataMemory=4GB etc and things should “just work”, take however long it needs – without downtime.

In a clustered environment, we could do these operations transactionally so that in the event of node or system failure we have some hope of being in a nicely consistent state and that during system recovery (or node recovery) we’re not performing a configuration change in addition to restarting (e.g. if you edited a config file and then had a crash).
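The "transactional" part could follow a two-phase shape: stage the new value on every node, and only make it active once every node has accepted it, otherwise throw the staged values away so no node is left half-way. A sketch with made-up structures (a real implementation would also need the staged state to survive a node crash, which this in-memory version doesn’t model):

```c
#include <stdbool.h>

/* Hypothetical per-node state for a DataMemory change (units don't matter here). */
struct node_cfg {
    unsigned long long data_memory; /* active value */
    unsigned long long staged;      /* prepared but not yet active */
    bool has_staged;
    bool can_accept;                /* e.g. enough RAM for the new value */
};

/* Two-phase change. Returns true if committed on every node,
 * false if any node refused (and nothing changed anywhere). */
bool change_data_memory(struct node_cfg *nodes, int n, unsigned long long value)
{
    int i;

    for (i = 0; i < n; i++) {        /* phase 1: prepare */
        if (!nodes[i].can_accept)
            break;
        nodes[i].staged = value;
        nodes[i].has_staged = true;
    }
    if (i < n) {                     /* someone refused: abort */
        for (int j = 0; j < i; j++)
            nodes[j].has_staged = false;
        return false;
    }
    for (i = 0; i < n; i++) {        /* phase 2: commit */
        nodes[i].data_memory = nodes[i].staged;
        nodes[i].has_staged = false;
    }
    return true;
}
```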

Config changes could also have EXPLAIN, a non-modifying operation that would EXPLAIN what would be done – e.g. rolling restart, taking approximately X minutes per node and Y minutes total. This could help in planning and scheduling of configuration changes.
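The arithmetic such an EXPLAIN would report is simple for the conservative case of restarting one node at a time. A sketch with a hypothetical change_plan struct:

```c
/* Hypothetical result of EXPLAINing a configuration change. */
struct change_plan {
    int needs_rolling_restart;
    int nodes;                 /* data nodes to restart */
    unsigned minutes_per_node; /* X */
    unsigned total_minutes;    /* Y */
};

/* Restarting one node at a time, Y = nodes * X (0 if no restart needed). */
struct change_plan explain_change(int needs_restart, int nodes,
                                  unsigned minutes_per_node)
{
    struct change_plan p = { needs_restart, nodes, minutes_per_node, 0 };
    if (needs_restart)
        p.total_minutes = (unsigned)nodes * minutes_per_node;
    return p;
}
```

Restarting one node from each node group in parallel would cut the total, at the cost of less redundancy while the change is in progress.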

(i wonder if that made any sense)