NDB$INFO

There’s been talk over the years of better monitoring for NDB (MySQL Cluster). This has been dubiously named NDB$INFO, after some special magical naming convention for tables holding information on the insides of NDB. Otherwise known as Worklog 3363 (viewable on MySQL Forge).

The basic idea is to get a bunch of things that are already known inside NDB available through a rather standard interface (SQL is preferred).

My top examples are “How much DataMemory is used?” and “Do I need to increase MaxNoOf(Tables|Attributes|ConcurrentTransactions)?”. You can get some of this information now either through the management client (ndb_mgm -e “all report MemoryUsage”) or the MGM API using events and some other foo.
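For those who haven’t poked at the MGM API, here’s roughly what pulling node state out of it looks like today. This is just a quick sketch of mine (nothing to do with the NDB$INFO patches themselves), assuming the MGM API header (mgmapi.h) and a management server on the default connectstring:

#include <stdio.h>
#include <stdlib.h>
#include <mgmapi.h>

int main(void)
{
  int i;
  NdbMgmHandle h = ndb_mgm_create_handle();
  ndb_mgm_set_connectstring(h, "localhost:1186");
  if (ndb_mgm_connect(h, 0, 0, 0) < 0)
  {
    fprintf(stderr, "could not connect to management server\n");
    return 1;
  }

  /* e.g. answering "are all my nodes up?" */
  struct ndb_mgm_cluster_state *state = ndb_mgm_get_status(h);
  for (i = 0; i < state->no_of_nodes; i++)
  {
    struct ndb_mgm_node_state *node = &state->node_states[i];
    printf("node %d: %s\n", node->node_id,
           node->node_status == NDB_MGM_NODE_STATUS_STARTED
           ? "started" : "not started / not connected");
  }
  free(state);  /* ndb_mgm_get_status() result is malloc()ed */
  ndb_mgm_disconnect(h);
  ndb_mgm_destroy_handle(&h);
  return 0;
}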

This is a rather limited interface though. It would be great if you could point all your monitoring stuff to a MySQL Server, throwing queries at it and finding out the state of your cluster.

So this year I’ve been working on implementing NDB$INFO. The big requirements (for me at least) are:

  1. Everything can be queried easily from SQL
  2. It’s easy to add a new NDB$INFO table (for an NDB developer)
  3. You can use NDB$INFO tables to diagnose problems (such as nodes not connecting)

Among the 492 things I’m currently doing is fixing up a basic patchset for NDB$INFO and working on getting it into the tree. It’s all going to be basic scan interfaces in the current version, so things may be slow if there are lots of rows, but they’ll get there.

What would you like to see exposed?

Encrypted Online Backup (design, thoughts, ask-the-lazyweb)

So after an ever so temporary but loud moment of insanity[1], a decision being made which I very strongly disagreed with (wanting to release online encrypted backup as closed source), we’re back in the world of freedom and the MySQL Server is (and will be) free and open source software (dual licensed, so you can buy a commercial license of the same thing).

[1] Addition (wanting to remove my use of the word): Marten (rightly) points out that although he appreciates the new blog posts, he doesn’t appreciate having his decisions called insanity. He’s right. It’s the wrong way to put it. So, without wanting to censor or change history (instead preferring to illustrate my own stupidity and amazing ability to completely say the wrong thing every 6 months or so), I offer this clarification (which I have tried to express in about 3 drafts of blog posts, none of which have seen the light of day as I was never really happy with them): the decision was made with all the right intentions (grow the company, end up producing more free software, make sales to enterprises easier, clearer differentiation etc) but it was one that I (and many others) rather strongly disagreed with. In the end, the decision was made to have these parts as free software, and I truly believe this came after more arguments were presented by myself (and others) about why having these parts as closed was a bad idea. It is quite the thing to make the decision to make modules for your free software product closed; it is about 15 steps higher to go back on it. I’ll share a phrase I used a few times when being a right nit-picker about things during employment contract negotiation this year (for MySQL Australia and then Sun): “Do I trust Marten? Absolutely. It’s the next guy. Remember, SCO was once Caldera, producing a Linux distro and generally considered good.” So, that was more than I intended to write on the subject… but hopefully it clarifies that I just thought the decision itself was bad, and am lucky enough to work at a place that encourages discussion when you don’t like things.

So, now I’m involved with writing up the worklog for encryption for the MySQL Server native online backup. I also wrote most of the original worklog for compression of online backup (I implemented compressed backup and LCP for MySQL Cluster) as well as some proof-of-concept code (written in <5 minutes at 3am while jetlagged).

There are two main approaches to encryption: symmetric and asymmetric (public key). I think we should support both (but we’ll see what others think).

For symmetric (password based for those not up with the street lingo of crypto) we’re thinking of the following algorithms: 3DES, AES, Blowfish. Are there any others that people care about?

DES is obviously out as it’s not considered secure, and really, we should be helping users to get things right.

For public key: RSA and DSA are the obvious choices.

As for libraries implementing all of these? Well… I’m thinking about libgcrypt – it looks fairly nice and a bit similar to the kernel crypto API (which seems quite nice). Anybody got any other suggestions? Things you’d like to see? Thoughts?
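To give a flavour of what’s involved (purely a hypothetical sketch of mine, not the worklog design), AES-256 in CBC mode via libgcrypt looks roughly like this (deriving the key from the user’s passphrase is hand-waved away):

#include <stdio.h>
#include <gcrypt.h>

/* Encrypt a buffer in place with AES-256/CBC. Returns 0 on success.
 * len must be a multiple of the 16-byte block size; the real thing
 * would deal with padding and streaming. */
static int encrypt_chunk(const unsigned char key[32],
                         const unsigned char iv[16],
                         unsigned char *buf, size_t len)
{
  gcry_cipher_hd_t hd;
  gcry_error_t err;

  err = gcry_cipher_open(&hd, GCRY_CIPHER_AES256, GCRY_CIPHER_MODE_CBC, 0);
  if (err)
  {
    fprintf(stderr, "gcry_cipher_open: %s\n", gcry_strerror(err));
    return -1;
  }
  gcry_cipher_setkey(hd, key, 32);
  gcry_cipher_setiv(hd, iv, 16);
  err = gcry_cipher_encrypt(hd, buf, len, NULL, 0);  /* in-place encrypt */
  gcry_cipher_close(hd);
  return err ? -1 : 0;
}

int main(void)
{
  unsigned char key[32] = {0}, iv[16] = {0};  /* from a KDF in real life */
  unsigned char buf[32] = "example backup data, padded....";

  if (!gcry_check_version(GCRYPT_VERSION))  /* libgcrypt wants this first */
    return 1;
  gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);

  return encrypt_chunk(key, iv, buf, sizeof(buf)) ? 1 : 0;
}

Swapping GCRY_CIPHER_AES256 for GCRY_CIPHER_3DES or GCRY_CIPHER_BLOWFISH is a one-line change, which is part of the appeal.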

EDIT: Server not Service. We sell services, the server is free and open source. I fail.

eHorizons

I flew back into Sydney on Sunday morning to give a tutorial at Sun’s Expanding Horizons summit. It was a half day tutorial on MySQL Cluster – so a shortened version of the one I’ve given at the MySQL User Conference for the past few years. I had about 15 attendees, all of whom had done their homework (it probably helped that they were pestered via phone :)

The tutorial went really well. It really helps when everybody has done the homework and already has Linux and MySQL Cluster installed. Everybody got up and running (we used mysql-test-run to start a cluster, rather than writing the config file from scratch, which made things happen a lot faster). Also got some good feedback – yay! We may even have some people looking to deploy it after attending, which is always a plus.

I also gave a “Scaling MySQL” talk that was well attended. I didn’t talk at all about query optimisation, mysqld configuration tuning or stuff like that – instead focusing on making the app saner, caching etc. memcached, of course, got a good mention :) It seemed to go down well: some good questions and a rather full room.

So a rather productive two days for spreading the freedom love.

However, the conference dinner was complete FAIL on account of the venue. I don’t know which vegetarians/vegans call beef and fish vegetarian, but I’ve never met one (hint: they don’t exist). This is *after* the explanation on being vegan. Then… there was some discussion about pasta with a tomato/vegetable sauce, which never came. So as others were finishing their meals, I inquired again – eventually, something was brought over. Undercooked rice and undercooked steamed vegetables. I don’t know who eats that for dinner (hint: nobody). Of course, after the pasta discussion, I had then selected a wine that would go with it. After more of the stuff-ups, I pointed out that there was no way I was going to pay for the wine when shit like that was served (yes, in those words… perhaps I’ve been watching too much Gordon Ramsay).

It was the first time ever that I’ve left a restaurant during a function, gone down the street, gotten take away and brought it back. Novotel Brighton Beach (in Sydney) – you suck.

(there’s also a beautiful view across the bay of the runways of Sydney airport… which is fine if you can sleep through planes landing and taking off, like I can, but I know others can’t).

Will never stay at the Novotel Brighton Beach voluntarily, ever. On the plus side, the guy at the desk when checking out was very apologetic…

My Name Is…

Stewart.

With a t at the end. Not a d.

Get it wrong once, possible mistake.

When I correct you, and you do it *again* and *again*, I want to slap you in the face with a keyboard.

It’s especially bad when you’re replying to my email, as my name is spelt correctly at least THREE TIMES right in front of you (twice in “To: Stewart Smith <stewart@…>” and again in “On X, Stewart Smith wrote:”).

(and if you’re wondering why I’ve tagged this post with “sun” and “mysql” it’s because some of you people are the worst offenders)

kthxbye

mysql-5.1.24-stew1 (with Maria and PBXT)

I’ve hacked around a bit to get PBXT to compile in tree, and pulled in the Maria engine. Both are latest source.

So, want to try out Maria?

Want to try out PBXT?

Just want to do ./configure and go with it, just like building a normal MySQL Server?

Grab the -stew tree. Source tarball here:

mysql-5.1.24-stew1.tar.gz

(it’s based on something close to 5.1.24… and I’m going to switch some of my systems over to it rather soon… already done some good benchmarks on one of my apps).

feedback much appreciated.

UPDATE: Got x86-64 Linux? Try my binary tarball (built from the above src tarball). Built on Ubuntu Gutsy (my laptop). So it may (or may not) work. If it kills your squirrel, not my fault.

OpenLDAP, MySQL Cluster, world of awesome

Last night (okay… I’m posting this a bit later… so the other night), a group of us gathered around to hear about some work that had been done in getting a MySQL Cluster backend for OpenLDAP.

I’d heard a few rumors (where rumors is defined as somebody saying something on IRC and me being busy looking at other things) about this previously, but last night was the first time I a) saw it working and b) saw performance numbers.

Disclaimer: I am no LDAP expert.

So, what is it?

Normal LDAP can replicate asynchronously from one machine to another. You can even update on both and it has some conflict resolution. But… this costs in performance.

Normal LDAP can also replicate (asynchronously) to a remote location for read-only (e.g. make authentication go faster in Australia with the main LDAP server in the US).

The MySQL Cluster backend for OpenLDAP connects directly to MySQL Cluster, using rather optimised schema, indexes and coding (directly using everything that we’re good at – which was really awesome to hear).

So, the MySQL Cluster-backed LDAP server is the second fastest in the world. The fastest is OpenLDAP on a single machine. With MySQL Cluster, we’re not that much slower than a single box – but we have redundancy. So that’s totally awesome. The third fastest… much slower than us.

This was one of the most awesome new things I’ve seen here.

default filesystem and disk parameters are for wusses

I can’t remember the last time I used default mkfs or mount options… oh yeah, that’s right – by accident.

Anyway… I did a little experiment today.

The filesystem is my laptop /home – XFS, 100GB, 95% used (so 5-6GB free), rather aged. This is where a lot of my MySQL development is done. Mkfs options: 128MB log, version2 log. Mount options: logbufs=8, logbsize=256k. All of this geared towards increasing metadata performance.

Why metadata performance? well… source code trees are a lot of metadata :)

So, let’s try some things: cloning a repository and then removing the repository.

Two variables are being tested: mounting the file system with nobarrier (versus barrier, the default), and disabling (versus enabling, the default) the disk write cache. Write barriers tell the disk to preserve write ordering to the platter when the write cache is in use.

[Chart: repository clone performance (cloneperf1.png)]

[Chart: repository removal performance (rmperf1.png)]

NOTE: the last option, which has the write cache enabled and write barriers disabled, is NOT SAFE. If your machine crashes, you lose data, and potentially your file system ends up corrupted.

So I’m now disabling my disk write cache and mounting with nobarrier.

If you use real disk arrays (e.g. battery backed write cache RAID boxes), the story is likely very different!

Going to MySQL User Conference? Come to LugRadio Live!

LugRadio is possibly the best Linux/Free Software related podcast out there – and it has been for a very long time. LugRadio Live is a live event where lots of people gather together to love freedom.

This year, there’s a LugRadio Live USA – and it’s in San Francisco. So, are you going to be (or are you able to be) around SF the weekend before the MySQL UC? If so, there is no excuse not to drop in on LugRadio Live USA – it should be awesome! (oh, and it doesn’t cost much… like $10 or something)

Windows finally gets poll() (well…WSAPoll)

Found this today: Windows Core Networking : WSAPoll, A new Winsock API to simplify porting poll() applications to Winsock.

Which means we should be able to squeeze some performance out of Vista. Although, since poll(), we’ve gone and moved on to much better things in the free world.

As part of porting MySQL Cluster to Microsoft Windows platforms, I’m going through and redoing a lot of the portable sockets code in MySQL. My goal is to have proper use of the portable functions enforced by the compiler instead of the programmer – leading to fewer portability bugs.
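To give a rough idea of what I mean (a sketch only, with made-up names, not the actual MySQL/NDB code): wrap the socket in its own type so a raw fd can’t accidentally wander into platform-specific code, and hide the poll()/WSAPoll() difference behind one portable call.

/* A wrapper type means the compiler stops you handing a raw fd around. */
#ifdef _WIN32
#include <winsock2.h>
typedef struct { SOCKET s; } my_socket_t;
#else
#include <poll.h>
typedef struct { int s; } my_socket_t;
#endif

/* Wait for the socket to become readable; timeout in milliseconds.
 * Returns >0 if readable, 0 on timeout, -1 on error. */
static int my_poll_readable(my_socket_t sock, int timeout_ms)
{
#ifdef _WIN32
  WSAPOLLFD pfd;
  pfd.fd = sock.s;
  pfd.events = POLLRDNORM;
  pfd.revents = 0;
  return WSAPoll(&pfd, 1, timeout_ms);  /* Vista and later only */
#else
  struct pollfd pfd;
  pfd.fd = sock.s;
  pfd.events = POLLIN;
  pfd.revents = 0;
  return poll(&pfd, 1, timeout_ms);
#endif
}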

Let’s face it: friends don’t let friends have to run Windows to not break the Windows port.

Good news is that so far I’ve found several bugs that could have presented themselves in real nasty ways at run time but have instead shown up as build errors – brilliant!

Solaris, Linux, it is GNU folks…

Brian “Krow” Aker’s Idle Thoughts – Solaris, Linux, it is GNU folks…

Brian hits the nail on the head… The way you get a usable system is to install all the GNU tools.

This is how I go from fresh Ubuntu install to building MySQL:

apt-get build-dep mysql-server

apt-get install bison

(now go and build).

(and I could do this graphically if I wasn’t so stuck in my ways)

For Solaris? umm… there was a point where I could get Solaris to apply security updates and Brian could get all the stuff needed to build a MySQL Server. Together we had the knowledge needed… but neither was as trivial as with Ubuntu and combining knowledge was too much – I just gave up and went on to more productive things.

Even on an existing Solaris system… getting your PATH right is a trip into some weird fantasy land seemingly designed to annoy you. No doubt this all made some sense back in the day… but now it just causes pain when all you want to do is compile your program, find the bug and fix it.

When I started at SGI several years ago, what was the first thing I did? Went and installed all the GNU packages. IRIX is a lot nicer that way.

Same with MacOS X – the first thing you do is go and install darwinports or fink and get a remotely usable system.

With Windows, it varies – but the shell is so outrageously shit you need cygwin just for bash, you need either emacs or VisualStudio to get an editor you don’t want to kill, Firefox for a web browser that works etc etc etc. The fact that the Windows packaging system just blows chunks makes it the most painful experience of all.

So even if you’ve heard rave things about the debugger in VisualStudio – actually getting a Windows install to the state where you can run the debugger takes hours. Click click click, upgrade, yes, install, swap disks, upgrade, upgrade, wait, reboot, install manually, install manually, install manually. ick.

Project Indiana is possibly the saviour of Solaris. Default userland is GNU, default shell is bash. Starts to make it feel like home, just as Solaris shipping GNOME made it feel more homely.

Solaris comes with a version of vi that is old enough to drink in bars. Project Indiana realises that a drunk editor isn’t a good idea and ships something sensible.

The BSDs get a lot of things right. Sane userland that is familiar to people. Jumping onto a FreeBSD box is remarkably easy.

The typical thing said by people is “backwards compatibility” and all that… basically so that everyone can run their apps from 1985 and not change a thing. Worthy goal. Of course, 1985 does not need to be the default environment in 2008.

There is a standard for the unixy way of things: it’s Linux with GNU tools in userland.

Just as Windows set the big standard for having a kind of usable GUI (the Mac did it better, but Windows got the numbers), to get people to use Linux on the desktop we needed to get it to a stage where those people are comfortable.

If you want your UNIXy system to be used by anybody today, you need to have it be comfortable for Linux people.

On the other hand though, Ubuntu is still the best desktop I’ve ever used and I am rather happy with it (no matter how much I bitch and moan about certain things being obviously broken).

(and no, I’m not switching my desktop to any Solaris variant – but I wholeheartedly look forward to the days when maintaining software that runs on Solaris is a heck of a lot easier because Solaris becomes less annoying).

One more point: OSX and Solaris are the only remotely proprietary UNIXes left. Everybody else is either dead or doesn’t know it yet. Solaris is nearly all free (AFAIK there are still just some binary-only drivers around… which sucks… but these things can take time, so that’s okay) and OSX has parts which are (sometimes seemingly dependent on the phase of the moon) free-ish. So really, OSX is the one last hold-out of the largely proprietary UNIX world. It’s a fascinating thing to think about… freedom wins.

(and this no doubt goes on far too long and incoherently…. but that’s because of long days and late nights because of upcoming really cool stuff which I’ll blog about later)

bzr-loom – a bzr plugin with quilt like functionality

A bzr plugin to assist in developing focused patches – in Launchpad.

I use quilt a lot for development. Currently, if I had to choose between BK and quilt – I’d choose quilt.

I use bzr in other development projects like MemberDB. I use git as a frontend for SVN (it is *so* much faster than the svn client and incredibly more space efficient… a copy of the entire history of a tree stored in git is usually smaller than a single svn checkout). I also use darcs (and quilt) for offlineimap, and just about every other revision control tool at some point.

So this is a bit of a discussion about how I work and how bzr-loom would help it… (I’ve wished for a long time that bk had stuff like this… bk collapse is just not what I want, although others use it lots).

The loom plugin to bzr looks like a fantasy world of goodness where the revision control system has some knowledge of these work in progress patches. The ability to push and pull looms around the place seems awfully nice.

What’s even more awesome is that you can push your set of patches up to a normal bzr branch and they become normal commits! i.e. you get rid of the whole “convert quilt patches into changesets” pain and just push.

Revision tracking of the patches themselves (so you can see what you’ve changed in your patch set) is also nice (I have thought about keeping patches/ in a bzr repo for this purpose). So I can now get a history of my patchset against various mainline versions.

One of the big advantages of quilt is speed – it’s lightning fast (basically being a diff and patch wrapper). Hopefully bzr looms continue in this fine tradition (and I wish other systems would get something like it too).

A world of FAIL

=================================== ERROR ====================================
File holyfoot/hf@mysql.com/deer.(none)|mysql-test/r/bdb_notembedded.result|20061113160642|60022|276fa5181da9a588
is marked as gone in this repository and therefor cannot accept updates.
The fact that you are getting updates indicates that the file is not gone
in the other repository and could be restored in this repository.
if you want to "un-gone" the file(s) using the s.file from a remote
repository, try "bk repair "
takepatch: other patches left in PENDING
==============================================================================
2.85M uncompressed to 16.4M, 5.77X expansion
Pull failed: takepatch exited 1.

stewart@willster:~/MySQL/5.1/ndb$ cp ../../5.0/ndb/mysql-test/r/SCCS/s.bdb_notembedded.result mysql-test/r/SCCS/

A world of fail. Of course, the “bk repair” suggested does not fix the problem. This kind of problem should just not be possible. grr…

Getting a file size (on Windows)

The first point I’d like to make is that you’re using a Microsoft Windows API, so you have already lost. You are just not quite aware of how much you have lost.

A quick look around and you say “Ahh… GetFileSize, that’s what I want to do!” Except, of course, you’re wrong. You don’t want to use GetFileSize at all. It has the following signature:

DWORD WINAPI GetFileSize(
  __in       HANDLE hFile,
  __out_opt  LPDWORD lpFileSizeHigh
);

Yes, it supports files larger than 4GB! How? A pointer to the high-order doubleword is passed in! So how do you know if this errored? Return -1? WRONG! Because the high word could have been set and your file length could legitimately be 0x1ffffffff. So to find out if you actually had an error, you must call GetLastError! Instead of one call, you now have two.
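If I’m reading the documentation right, that leaves you with something like this (hFile here assumed to have come from CreateFile) just to read a size without misreporting an error:

DWORD size_high;
DWORD size_low = GetFileSize(hFile, &size_high);

/* INVALID_FILE_SIZE (0xFFFFFFFF) is also a perfectly valid low doubleword,
   so it only counts as an error if GetLastError() agrees. */
if (size_low == INVALID_FILE_SIZE && GetLastError() != NO_ERROR)
{
  /* an actual error */
}
else
{
  unsigned __int64 size = ((unsigned __int64)size_high << 32) | size_low;
  /* ... */
}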

The Microsoft documentation even acknowledges that this is stupid: “Because of this behavior, it is recommended that you use GetFileSizeEx instead.”

GetFileSizeEx is presumably named “Ex” as in “i broke up with my ex because their API sucked.”

You now have something that looks like this:

BOOL WINAPI GetFileSizeEx(
  __in   HANDLE hFile,
  __out  PLARGE_INTEGER lpFileSize
);

Which starts to look a little bit nicer. For a start, the return code of BOOL seems to indicate success or failure.

You now get to provide a pointer to a LARGE_INTEGER. Which, if you missed it, is:

typedef union _LARGE_INTEGER {
  struct {
    DWORD LowPart;
    LONG  HighPart;
  };
  struct {
    DWORD LowPart;
    LONG  HighPart;
  } u;
  LONGLONG QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;

Why this abomination? Well… ” If your compiler has built-in support for 64-bit integers, use the QuadPart member to store the 64-bit integer. Otherwise, use the LowPart and HighPart members to store the 64-bit integer.”

That’s right kiddies… if you’ve decided to lose from the get-go and have a compiler that doesn’t support 64-bit integers, you can still get the file size! Of course, you’re using a compiler that doesn’t have 64-bit integer support… and the Microsoft documentation indicates that the GetFileSizeEx call requires Windows 2000… so it’s post y2k and you’re using a compiler without 64-bit ints? You have already lost.
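So, assuming a compiler from this millennium, the sane path ends up looking roughly like this (myfile.dat being a made-up example, naturally):

HANDLE h = CreateFileA("myfile.dat", GENERIC_READ, FILE_SHARE_READ,
                       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (h == INVALID_HANDLE_VALUE)
{
  /* deal with CreateFile failing */
}

LARGE_INTEGER size;
if (!GetFileSizeEx(h, &size))
{
  /* GetLastError() for the gory details */
}
else
{
  /* size.QuadPart is the 64-bit file size */
}
CloseHandle(h);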

Oh, but you say something about binary compatibility for apps written in the old days (handwave something like that). Well… let’s see… IRIX will give you 64-bit numbers in stat (stat64) unless you build with -o32, giving you the old ABI. I just can’t see a use for GetFileSize… somebody please enlighten me.

Which header would you include? Any Linux/UNIX person would think of something logical – say sys/stat.h (Linux man page says sys/types.h, sys/stat.h and unistd.h). No, nothing sensible like that. It’s “Declared in WinBase.h; include Windows.h”.

So… you thought that obviously somebody went through the API and gave you all this Ex function goodness to get rid of mucking about with parts of a 64bit int? You were wrong. Let me say it with this:

DWORD WINAPI GetCompressedFileSizeTransacted(
  __in       LPCTSTR lpFileName,
  __out_opt  LPDWORD lpFileSizeHigh,
  __in       HANDLE hTransaction
);

I’ll now tell you that this was introduced in Vista/Server 2008.

Obviously, you want to be able to use Transactional NTFS on Windows Vista with a compiler that doesn’t have 64-bit ints. Oh, and you must then make another function call to see if something went wrong?

But you know what… perhaps we can get away from this complete and utter world of madness and use stat()…. or rather… perhaps _stati64().

Well… you’d be fine except for the fact that these seem to lie to you (at least on Windows Server 2003 and Vista) – it seems that even Explorer can lie to you.

But perhaps you’ve been barking up the wrong tree… you obviously don’t want to find the file size at all – what you want is to FindFirstFile! No, you don’t want FindFirstFileEx (in this case, Ex is for Extremely complicated). It’s meant to be faster too… you know, maybe.
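For completeness, that route looks roughly like this (again, myfile.dat is made up), and yes, you’re back to stitching the two halves of the size together yourself:

WIN32_FIND_DATAA find_data;
HANDLE h = FindFirstFileA("myfile.dat", &find_data);
if (h != INVALID_HANDLE_VALUE)
{
  /* size comes back as two DWORDs, naturally */
  unsigned __int64 size = ((unsigned __int64)find_data.nFileSizeHigh << 32)
                          | find_data.nFileSizeLow;
  FindClose(h);
  /* ... */
}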

So remember kids, smoke your crack pipe – you’re going to need it if using this thing called the Microsoft Windows File Management Functions.