faster net is da bomb!

So, while I was away it seems that Telstra enabled the 8Mbit down/1Mbit up stuff in their ADSL points in the exchanges (which has apparently been possible for quite some time).

I enabled/upgraded my Internode plan to get the faster speed. It got activated or whatever yesterday, but I didn’t really see an improvement. Anyway, I headed over to the Internode support site to check out their setup instructions for my ADSL modem – turns out that simply by changing from PPPoE to PPPoA I’ve gotten a huge speed boost.

Just pulled an LCA video from the Internode mirror at 862K/s. Rock.

NDB Online Add Node Progress (or rather, testing it)

So, the sitch as of today:

Added an ndb_mgm_set_configuration() call to the mgmapi – a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like what you get from ndb_mgm_get_configuration()) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this configuration. Notably, if a data node restarts, it gets this new configuration.

By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).

I’ve also added a Disabled property to data nodes. If set, just about everywhere ignores the node.
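In config.ini terms it would look something like this (a sketch only – the exact spelling and placement may well change while the patch is in progress, and the node IDs and host names here are made up):

```ini
[ndbd]
NodeId=3
HostName=10.0.0.3

# A disabled data node: present in the configuration, but just about
# everywhere ignores it until it's enabled and added to a nodegroup.
[ndbd]
NodeId=4
HostName=10.0.0.4
Disabled=1
```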

This allows a test program to test add/drop node functionality – without the need for external clusterware stopping and starting processes.

If you start with a large cluster, we can get a test program to disable some nodes and do an initial cluster restart (essentially starting a new, smaller cluster) and then add the disabled nodes back in to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different from what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism for starting a node on a machine – so I don’t (for example) need to run ndb_cpcd on my laptop while testing.

But anyway, I now have a test program that we could run as part of autotest which will happily take a 4 node cluster, start a 2 node cluster from it and online add 2 nodes.

Adding these new nodes into a nodegroup is currently not working with my patches though… for some reason the DBDICT transaction seems to not be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.

So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.

Online table re-organisation is another thing altogether though… and no doubt some good subtle bugs are yet to be written.

mgmapi timeouts going in…

So, my timeout patches for the MySQL Cluster Management API are finished. This should solve a lot of problems for people writing management API applications that want to do something sane when the management server either dies or somehow gets disconnected from you.

More importantly, the autotest run looks good. It passed 199 tests in the daily-basic suite… which is a new record (I added some tests, so that could be classified as cheating)… it probably would have been 200 if a sporadically failing test hadn’t failed :(

During my trip back to Melbourne, Jonas will probably apply these to a bunch of trees (at least some of the telco release) – with 5.1 coming at some point.

Bus Drivers

(as of the other day) Stockholm is now the only city where I’ve boarded a bus to see what I can only describe as a stunningly gorgeous bus driver.

The good news is she also passed the test for bus drivers – being able to drive. That seems to be a quality often found here. Back home, when I regularly took the bus (uni days), it was inevitably a bit hit and miss.

I heart Gnome SSH Tunnel Manager

Jonas just switched me on to Gnome SSH Tunnel Manager – a simple GNOME app that stores a list of SSH tunnels you want and can automatically start and stop them.

Totally useful for those who travel (hrrm.. fair few MySQLers there) and/or always have SSH tunnels to places (hrrm… MySQLers there too).

There’s a Debian package up there (and you can build one easily) but it’s not yet in the Ubuntu archive… maybe for the next release. It works fine on Edgy for me, though!

irritation of the day….

There’s a lot of things about the MySQL bug tracking system I like… but there are a few things that annoy the heck out of me.

Today it’s the fact that if you put a term in the “with any of the words” field on advanced search that’s a number (e.g. ‘839’, because you’re looking for bugs that talk about error 839) you get taken straight to bug number 839. Funnily enough, this has nothing to do with the NDB problem I’m trying to see the status of. grr…

and now, back to your regular programming…

Code size of an engine versus test suite

If you count the lines of code in the MySQL Cluster (NDB) test suite (mysql-5.1/storage/ndb/test – excluding the old ODBC stuff) you come up with about 104,000 lines of code. This is in contrast to the roughly 350,000 lines of code for the NDB engine itself (excluding the handler, which is an additional 12,000 lines – the handler isn’t tested much by the NDB test suite… mysql-test-run.pl is meant to take care of a lot of that).
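The counts above came from something along these lines (a rough sketch – the exact file globs are an approximation of what I counted):

```shell
# Rough total line count of C/C++ source under a directory.
# The example paths are from a mysql-5.1 tree; adjust to your checkout.
count_loc() {
  find "$1" \( -name '*.cpp' -o -name '*.hpp' \
            -o -name '*.c' -o -name '*.h' \) -print0 \
    | xargs -0 cat | wc -l
}

# count_loc mysql-5.1/storage/ndb/test    # NDB test suite (~104,000)
# count_loc mysql-5.1/storage/myisam      # all of MyISAM (~40,545)
```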

If you go and check the MyISAM tree, it’s only 40,545 lines of code – for the entire engine. That’s right, the MySQL Cluster test suite is about 2.5 times the size of MyISAM.

If you look at the mysql-test-run.pl tests, which are just lists of SQL commands with static data, they come up at 250,000 lines (excluding result files). The NDB tests do things programmatically – so they can generate large amounts of data and different loads quite easily.

The architecture of the NDB tests (commonly referred to as autotest, ATRT or the HUGO framework) is very different from mysql-test-run.pl – it easily allows you to write a test that is high on concurrency, high on load and high on amount of data. It’s also modular, so that when you get an issue from a customer (or need to do some benchmarking on a specific type of schema) you can use the utility programs to help you (e.g. there’s one that does random PK updates to tables, one that does scans, one that does index operations, etc.).

There’s a whole bunch of things you just cannot do with mysql-test-run.pl.

Then we get to fault injection… MySQL Cluster is a distributed system that is designed to withstand failure. Without testing this, we could never say it’s remotely HA. So we test it. A lot. We inject failures into nodes to check our node failure handling, and using the utility programs and some basic shell it’s possible to do custom tests (such as multi-node failure) where our test suite doesn’t have the best coverage yet.

Again, either not possible or extremely hard with mysql-test-run.pl.

mysql_slap is the hint of a nice utility to help in testing… but using it in mysql-test-run.pl scripts in a verifiable way (i.e. checking that what came out is what went in, using a variety of access methods – full table scans, PK lookups, index scans, stored procs, cursors, views, joins, etc.) is tricky at best (really, impossible).

Yes, I’m really pining for a better test suite infrastructure for the MySQL Server – it can only lead to better quality software… even just somebody rewriting a bunch of the hugo classes to use the MySQL C API would be useful.

Pleading for a better mail suite….

or really just all the Evolution bugs that I consistently hit to be fixed.

Why does it need hundreds of megabytes of memory just to list a single mail folder? What could it possibly be doing – loading the entire mailbox into memory? ick…

Currently, after a crash, it’s been “checking folder consistency” for at least 10 minutes now… and apparently I have 13,000 unread messages in INBOX. Bull. About 250 is more likely. This will probably be some arse-load of crack where I’ll have to remove the cache files, restart evo 10 times and sacrifice a goat to the gods of crackful annoying-apps.

mgmapi timeouts and resurrecting the online add node

The other day I managed to send off what’s nearly the final patches for adding proper timeout support to the MySQL Cluster management API. Jonas has had a bit of a look, found one thing I’ve missed, but it’ll probably get in somewhere soon (probably the carrier grade edition first, then others… 5.1 makes sense IMHO if only for the amount of management server testing that my patches add).

Unfortunately, in what we laughingly call the past, the management server – for whatever hysterical raisins – never really received much direct testing. Sure, if the data nodes couldn’t get their configuration, or autotest couldn’t control the daemons or something, then things were obviously broken. But, say, a subtle (or not so subtle) change in API or behaviour would certainly not be picked up.

The real “feature of the year” (not my words), though, is fault injection for the management server that we can use in testing. The MySQL Cluster kernel (data nodes) already has extensive fault injection that is regularly tested by ATRT (storage/ndb/test in the source tree).

I’ve also started to resurrect my online add node patch, which I’ve had sitting around in various states for over a year (actually… about 14 months… I just haven’t touched it in 12), and port it to the latest 5.1 tree (as I’m not sure where it’ll end up, start at the lowest common denominator – it’s possible it’ll end up in Carrier Grade first too). Now comes the problem of testing the sucker. Previously I’ve had a shockingly bad shell script and hard coded files to make this go.

Obviously, hard coded stuff is not the way to go. The real way is to be able to do everything neatly and programmatically so we can run it as part of the regular autotest.

oh LugRadio how funny you are

How many other open source/free software radio/podcast shows could pull off discussing morals and free software with a discussion of machines running free software being a) used for the efficient slaughter (of various things) and b) used for the violent anal raping of donkeys?

Apparently there are people who think of these things… and it’s hilarious.

(hrrm… what does that say about me finding that hilarious?)

hej hej

Great things:

  1. I get to see snow. I haven’t seen snow anywhere else in the world yet, just in Stockholm (apart from flying over places… but that doesn’t really count)
  2. The language is cool, a lot of people speak English (to varying degrees) and it’s not that hard to pick up enough to get by (especially since TV programs are subtitled… so watching Buffy on Swedish TV will educate you in enough Swedish to save the world from unspeakable demons)
  3. We have MySQLers here (including a good number of Cluster developers)
  4. They have the Internet here. Not like Australia, stuck on the arse end of the internet – oh no, 5Mbit is considered slow here.
  5. Stockholm really is a beautiful city.
  6. Public transport is frequent and close by (at least for the Stockholm area… which is where I am). Further into the center it’s even better, but here it’s good (where “here” is about 15-20 mins via bus and subway to Liljeholmen, where the office is)
  7. R&D is (again, unlike Australia) valued highly here, with a good amount of high tech industry and a seeming respect for academia.
  8. There’s a chemist in Gamla Stan that’s been there for about 400 years. I haven’t bought anything from there, but I feel I should – to go with that beer I had while in London, from the pub that first got its license nearly 400 years ago.

And not so great…

  1. The only way to buy beer stronger than 3.5% is to go to the government-run Systembolaget – which is closed at about any time you’d consider buying alcohol. Apparently the locals get around this by going there and buying heaps at once – completely defeating the attempt to get people to buy less. Oh, and if you like any decent liquor – it’s probably cheaper to drive/fly to another country and bring it back. Apparently that’s what people do… with vans. Lucky for me I picked up some Laphroaig on the way through London.
  2. Some things are expensive… and there are relatively high tax rates… although you seem to actually get something for that, so it’s not all bad (unlike in .au… where you seem to get nothing).
  3. It’s a long way from Melbourne, especially in economy seats… urggh. Not exactly a company policy I agree with for such long trips.

for now, hej då

Stockholm

Currently in the MySQL Cluster team office in Stockholm – and have been since Wednesday. I’ll be here for the next 3 weeks working in the office. This will be the longest amount of time I’ve worked in an actual office (instead of working from home) in more than 2.25 years!

I found Veronica Mars on TV last night… which is great, because I’ve sort of become addicted. Unfortunately, Sweden is a few episodes ahead of Australia… so I’ve skipped a few now (go MythTV, record them for me baby). One really good thing about Swedish TV is that things are subtitled instead of dubbed – excellent if your Swedish isn’t that great (mine isn’t). Whenever I’m here, I also seem to find some TV shows that look really interesting, except for the fact that they’re all in a language I don’t understand… certainly an interesting dilemma.

Today I’ve been working on material for MySQL Cluster: The Complete Tutorial at the MySQL Conference and Expo. The conference is April 23-26 in Santa Clara, California (and if you register by March 14 you save $200 on registration). It’s going to be a SIX hour hands-on tutorial (with breaks, don’t worry). The hands-on part (I think) is very important… that way you walk away with real-world knowledge that you can directly apply – not just theory that you could have gotten from reading a bit here and there.

I’m really hoping that as many people with existing knowledge (esp MySQLers) can be around during the session to help people when needed… I have a feeling there’ll be a few.

Nearly off to London and Stockholm

In about 4.5hrs, I’ll be in a cab to the airport. At about lunchtime (1:30pm or so) London time, I’ll be in London. I’ll be there for a few days – until the 27th. If you’re around London or can make it, it’d be cool to hang. I plan to be a bit of a tourist here and there as I haven’t seen heaps of London and I do hear it’s nice :)

After that, I’ll be in Stockholm for about three weeks (I leave on the 19th… as it’s currently planned). So if you’re around, give me a yell!

My cell (mobile) number is pretty easy to find (hint: google my name along with my employer and look for a post on a mailing list… my work email sig has my phone number).

I’m in Stockholm for work – I’ll be working in the office (which will be the longest amount of time I’ve gone to work in an office in over 2 years).

SVN shows its true colours

I thought “svn”, I typed “cvs”. Hrrm… sounds about right.

In other revision control news, using quilt to manage work-in-progress patches in conjunction with BK is proving really, really great. I feel like an idiot having lived this long and not worked this way.

I have a feeling that if git were being used I’d just do everything there, as it’s so quick anyway. I haven’t used bzr on repos of this size yet, but it should be good too.

Danger-Bomb alarm clock: reconnect the wires to turn it off

Boing Boing: Danger-Bomb alarm clock: reconnect the wires to turn it off

I think I totally need one of these – over the years I’ve become a complete expert in the art of not getting out of bed when an alarm goes off (even if I have several scattered around the room)

RAID, LVM2 on USB disks with Ubuntu makes me a sad panda

Talk about a total pain to operate. After a reboot, the following is needed:

  1. modprobe dm_snapshot (if you don’t do this, you get “device-mapper: reload ioctl failed: Invalid argument” in step 6 – a really helpful error message. Nothing in dmesg or any verbose option gets you any closer)
  2. some magic foo to scan the md array back
  3. pvscan
  4. vgscan
  5. lvscan
  6. lvchange -ay FooVG/barLV
  7. mount
  8. urgh.

Why this doesn’t all just get detected on boot, who knows. I’m surely not the only one doing this….
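The whole manual dance can at least be wrapped up in one script. This is just a sketch: I’m assuming the “magic foo” in step 2 is mdadm --assemble --scan, and FooVG/barLV and the mount point are placeholders for your own names. Setting DRY_RUN=1 prints the commands instead of running them (the real thing needs root).

```shell
# Print or run each step, depending on DRY_RUN.
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

reassemble_usb_lvm() {
  vg=$1; lv=$2; mnt=$3
  run modprobe dm_snapshot     # else: "device-mapper: reload ioctl failed"
  run mdadm --assemble --scan  # the magic foo to find the md array again
  run pvscan
  run vgscan
  run lvscan
  run lvchange -ay "$vg/$lv"
  run mount "/dev/$vg/$lv" "$mnt"
}

# reassemble_usb_lvm FooVG barLV /mnt/usb
```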

timeout units

Following a discussion on mythtv on #xfs (as you do), and a wondering of “hrrm… I wonder what unit that timeout is” with some NDB code, I wish to make the following announcement:

All timeout values in NDB related APIs will now be given in centijiffies of the server system. For APIs that can talk to multiple hosts, it will be furlongs per fortnight.

I feel that having a consistent interface such as this will lead to much less confusion and better apps.
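For the record, a furlong per fortnight in boring SI units (1 furlong = 201.168 m, 1 fortnight = 1,209,600 s):

```shell
# one furlong per fortnight, in metres per second
awk 'BEGIN { printf "%.3g m/s\n", 201.168 / 1209600 }'
# → 0.000166 m/s
```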