Naturally, it was going to hit Slashdot with something like “MySQL and SCO Join Forces”

(insert disclaimer about this being my own views – not those of MySQL AB)

Slashdot | MySQL and SCO Join Forces

Some people seem to think that porting your application to a newer version of an OS, having a trial version of your subscription-based support shipping with every copy of that OS and access through that OS vendor’s reseller channel is a bad thing.

Granted, a lot of people think that certain actions of said OS vendor are just plain retarded. Myself included – it would be much better if they actually focused on products. That being said, there’s more than one OS vendor that does just plain dumb stuff – or, to use the more emotive “evil” word.

Of course, there’s part of the /. crowd that seem to think we must be evil for porting to a SCO platform – but by their silence (and sometimes “screw you guys, I’m going to X RDBMS”) it must be okay for others to do it (note that X RDBMS already supports SCO platforms)?

Besides, anybody who’s really used MySQL will know how easy it is to move your database from one platform to another – really empowering you to make sure your OS vendor gives you the best deal possible – because you can easily move to where the grass is greener.

MemberDB election-results performance on new laptop

So I picked up my new laptop on Friday. It’s an ASUS V6V – nice and fast, light, good resolution screen and lots of disk and RAM (it came with 1GB, I’ve got 2GB).

Anyway, the transfer of data from my PowerBook went fine. I waited for xfsdump to dump /home from the powerbook to a firewire drive (and for “waiting” I do mean going out and seeing Charlie and the Chocolate Factory – which was very good).

Installing Ubuntu on the ASUS went like a dream. Everything, and I do mean everything, worked out-of-the-box with only one tweak: uncommenting the ACPI sleep configuration option do-dad in /etc/default/acpi-something-foo to get suspend to RAM working.

The WEP didn’t work in the installer, so I initially just used the GigE adapter until the first reboot.

The firewire drivers don’t really behave with this laptop atm… that dreaded “aborted sbp2 command” error too often – so abandoned that and futzed around with a private net and NFS to xfsrestore /home.

Go to bed, awake later to find /home on new laptop (with an extra 23GB of free space!). I had to, of course – rebuild those essential packages for x86 instead of ppc – namely wesnoth.

oh, and cleaning out the ppc binaries from my mysql bk trees and doing a x86 build (I also had to change my CC from ‘ccache distcc powerpc-linux-gcc’ to ‘ccache distcc i386-linux-gcc’). One thing is for certain, it’s quicker at building things – even if the fan ramps up a bit when doing so :)

MySQL builds pretty quickly when you have a 2.8GHz P4 and a 2.13GHz Pentium M building it.

Anyway, set up all the apache foo for hacking on the LA website and MemberDB today. A load of the election-results page on digital (the LA server – dual PIII 1.133GHz) takes about fourteen or fifteen seconds using PostgreSQL as the database.

I previously reported that using MySQL (InnoDB tables) I got about twice the performance on my old laptop (1GHz G4).

Well, on this one (2.13GHz Pentium M) I’m getting the page loading in under three seconds. Sweet. Maybe I won’t go ahead and try to optimise some of the queries :)

(the query cache is probably coming into this – but i did do the query several times – so it’s not as if there’s any unfair advantage anywhere).

I’m using the 5.0.12-max-beta gcc dynamic build as downloaded from mysql.com for these runs. All other packages (apache2, php) are as shipped in Ubuntu. The my.ini file is as-shipped (err.. i think so: no query log, no binlog, slow query log enabled and some paths changed)

lathiat: Avahi

Trent has been blogging quite often about Avahi. It does look like a good project to watch. Promises ease of use (for the coder who really doesn’t care about Rendezvous/Bonjour/ZeroConf/whatever-they-call-it-this-week internals and just wants Cool Functionality(tm) in their app).

Maybe integrating this would be cool for mysqld and the gui tools. (i’ve toyed with the thoughts of using it in cluster… but do we really want another thing that can possibly fail in a HA environment… probably not – considering using DNS is usually a bad idea).

Why “returns -1 on error” is bad

(a general note on what’s good practice)

In C, 0 is false and !0 is true.

In the dim past there was an elsewhere where 0 was true and !0 was false. Why? Because there can be more than one error state and this is usually more interesting than how many ways success could have been achieved.

Well, that sucks too – there’s information on success that could be useful (e.g. we succeeded, but only n bytes worth instead of the m you asked for).

So, the way of <0 on failure and else success came about for packing the maximum amount of information into the int that we commonly return from functions (and usually fits nicely in a register and it all leads to hugs, puppies and a warm feeling inside).
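A minimal sketch of that convention, assuming a made-up copy_some() helper (it is not from any real library): anything below zero is a failure, and anything else is how many bytes were actually handled, which may be fewer than you asked for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to `want` bytes from src into a buffer
 * that only has room for `room` bytes.
 * Returns <0 on failure, otherwise the number of bytes copied,
 * which may be less than `want` (success, but only partly). */
int copy_some(char *dst, size_t room, const char *src, size_t want)
{
    if (dst == NULL || src == NULL)
        return -1;                      /* failure: bad arguments */

    size_t n = want < room ? want : room;
    memcpy(dst, src, n);
    return (int)n;                      /* success: n bytes, n <= want */
}
```

The caller checks for a negative value first, then compares the result with what it asked for to see whether the success was only partial.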

So what do most people do on error? Return -1.

Hrrmm… this casually (if not totally) defeats the point. In any function that does any real work, there’s going to be more than one place where failure could occur (even if it’s an error path that should never really happen… it will, but never to you… always to a guy somewhere in a country that you didn’t know existed and knows less $native_language than you have digits).

So if you get a bug report in with a log message (because you do print log messages when errors occur! – especially non-totally-fatal ones!) about a failure, and you go to look at that function and go “aha! this function must have returned -1!” Well, it just so happens that there are five places that could return -1. Where did your program fail? Without a core dump or something, you will never know.

So, what if these five places returned different error codes (which, of course, you wrote to the log)? Then you’d be able to narrow down the search for buggy code!

It doesn’t have to be a unique number, or even user understandable (especially when these are places that shouldn’t fail – or so you think) but it makes your job a hell of a lot easier if you can quickly jump to the bit of code you should look at.
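As a rough sketch of the idea (the function, the error names and their values are all invented for illustration): every failure site returns its own negative code, and the caller writes that code to the log.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes: one per failure site, all negative. */
enum {
    PARSE_ERR_NULL_INPUT = -1,
    PARSE_ERR_EMPTY      = -2,
    PARSE_ERR_TOO_LONG   = -3,
    PARSE_ERR_NOT_DIGIT  = -4
};

/* Parse a short decimal string into *out.
 * Returns 0 on success, or a distinct negative code per failure site. */
int parse_small_number(const char *s, int *out)
{
    if (s == NULL || out == NULL)
        return PARSE_ERR_NULL_INPUT;

    size_t len = strlen(s);
    if (len == 0)
        return PARSE_ERR_EMPTY;
    if (len > 4)
        return PARSE_ERR_TOO_LONG;

    int value = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return PARSE_ERR_NOT_DIGIT;
        value = value * 10 + (s[i] - '0');
    }
    *out = value;
    return 0;
}

/* Caller side: log the code so a bug report points straight at the
 * branch that failed. */
int parse_and_log(const char *s, int *out)
{
    int rc = parse_small_number(s, out);
    if (rc < 0)
        fprintf(stderr, "parse_small_number failed: error %d\n", rc);
    return rc;
}
```

With a log line saying "error -3" you know to look at the length check, not at the other three places that could have failed.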

In cluster, we have this great system where when really bad stuff happens, we get these nice trace logs of what signals have been cruising around the cluster recently. This greatly helps with debugging. It sort of makes you go “wow” when you first see a crash reported, trace file follows, and then a patch a few hrs later that fixes the problem. This is because it’s an aid in tracking down exactly where to look for the problem.

“It crashed” is never a useful bug report. But software that can only ever say “it crashed”, unless you’re a developer guru dude, isn’t very useful either.

The various backtrace reporting tools do a bit to help. As always, the more information the better. This is certainly the case when you look at the backtrace and go “how on earth did we ever get there?” or the stack is just completely hosed and you have no hope of finding your arse from your elbow (although these days valgrind will help you here).

Here endeth the lesson.

Comments Are Evil

When a comment above a function says “returns -1 on error” and the code does the exact opposite (returns -1 anyway except if there was an out of memory error, which may be #defined to -1 anyway) it’s a bit annoying when you first look at it.

Remember kids, comments in code are evil. They are wrong – or misleading at best. They only ever say what one person at some point in the past thought they believed the code did. The definitive record is the code itself.

(there are possible exceptions to this rule… maybe… internals can be good to document – but arguably it should be *away* from the code so that you don’t start thinking the documentation is accurate and up to date – because it’s not).

ndb_mgmd restart

One of the things I’m working on is adding the ability to use ndb_mgm to issue a restart command to ndb_mgmd (i.e. from a management client, get management servers to restart). At the moment, you have to go and shut it down then start it yourself.

So why would you ever want to restart a management server? Well, bugs aren’t really a reason – I’ve never heard of anyone having to restart the management server “just to get something to work again”.

The reason is online configuration upgrades.

There are a bunch of parameters you can change without having to restart your cluster. We call it a “rolling upgrade” as it’s the same procedure as upgrading one compatible version to another.

The whole procedure would be really easy if we only ever had one management server (which is what a lot of people have anyway – you only ever need a mgm server to have a node join the cluster).

It’s also tricky because you don’t want management servers up and serving different configurations. This would tend to be bad and never lead to hugs and puppies.

For supporting more dramatic configuration changes (e.g. add/drop node) we’ll be needing configuration locks and enforcing that everyone agrees on the config they’re serving out.

There exists some code from a previous effort a few years ago. So I’m having a look through it and trying to work out the state of everything. There seems to be a bit of bitrot and I’m trying to work out if anything is worth using.

The approach that I’ve come up with is to have a “single user mode” for the mgm server – i.e. nobody but one connection can do anything. This is where we’d do updates and changes before unlocking.
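Purely to illustrate the “single user mode” idea, and not how ndb_mgmd does or will implement it, here is a sketch in C. The struct, the function names and the error values are all hypothetical; the one assumption is that connection ids are non-negative integers.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical state for one management server. */
struct mgm_state {
    int lock_owner;      /* connection holding single user mode, or NO_OWNER */
    int config_version;  /* version of the configuration being served */
};

/* Enter single user mode. Returns 0 on success, -1 if the id is
 * invalid or another connection already holds the lock. */
int mgm_enter_single_user(struct mgm_state *s, int conn_id)
{
    if (conn_id < 0)
        return -1;
    if (s->lock_owner != NO_OWNER && s->lock_owner != conn_id)
        return -1;
    s->lock_owner = conn_id;
    return 0;
}

/* While locked, nobody but the owner may do anything. */
bool mgm_may_act(const struct mgm_state *s, int conn_id)
{
    return s->lock_owner == NO_OWNER || s->lock_owner == conn_id;
}

/* Install a new configuration; only allowed for the lock owner.
 * Returns 0 on success, -2 if the caller does not hold the lock. */
int mgm_set_config(struct mgm_state *s, int conn_id, int new_version)
{
    if (conn_id < 0 || s->lock_owner != conn_id)
        return -2;
    s->config_version = new_version;
    return 0;
}

/* Leave single user mode. Returns 0 on success, -3 if not the owner. */
int mgm_leave_single_user(struct mgm_state *s, int conn_id)
{
    if (conn_id < 0 || s->lock_owner != conn_id)
        return -3;
    s->lock_owner = NO_OWNER;
    return 0;
}
```

The intended flow is enter, change the config, leave; any other connection that turns up in between gets refused instead of seeing a half-updated configuration.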

I wasn’t really caring about notifying ndbd about changes as the way you do things atm is to restart each ndbd and they then pick up the changes.

Otherwise, we really want a “this parameter changed” rather than “there’s a new configuration”.

So, the ‘mgm restart’ thing is really going to be implemented as “config reload” – getting the mgmds to stop and restart in the right order so that there is never more than one version of the configuration being served at a time.

hrrm… back to the code to figure out what’s going on with this older stuff.

Fancy shortcuts to MySQL Bugs

So Elliot Murphy is talking about a QuickSearch shortcut for bugs.mysql.com which is quite useful if you use Firefox.

However, I’m using Epiphany (which is based on the same rendering engine, but is a bit more GNOMEy – and besides, i’ve been using it for a while, i have all my bookmarks and saved passwords etc there).

So, a quick PHP script later, and I have a nice little command line version of the same script.


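#!/usr/bin/php
<?php
// Open the bugs.mysql.com search for the given words in the default browser.
// The \& keeps the shell from treating & as "run in the background".
system('sensible-browser '
    . 'http://bugs.mysql.com/search.php?'
    . 'cmd=display\&limit=10\&'
    . 'status=All\&search_for='
    . urlencode(implode(' ', array_slice($argv, 1))));
?>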

Useful! Usage is like:


$ mybug 10950
$ mybug ndb

hope it helps.

maybe i’ll switch to firefox one day… when it’s faster.

MySQL 5.0.10 released!

The next beta of 5.0 is out – 5.0.10. So go ahead and download it. You’re not cool unless you download it (you can find it here http://dev.mysql.com/downloads/mysql/5.0.html)

File swap it, torrent it, burn it to cds and give it to friends. Don’t you just love free software!

This release fixes a bunch of bugs that were holding back MemberDB from being usable on MySQL. These were bugs in new features, and the main one had already been reported.

There have also been a few fixes for cluster, which is always a good thing. Nothing show stopping though. We must be good :)

still get called for tech support…

okay, when it’s family you can’t really say no. But it does seem a bit strange when you have no idea.

Problems getting new printer to work. My advice is reinstall driver, remove device, reboot. Some random stuff. Remove from device manager, plug in again, see if it changes.

That’s the total of my windows troubleshooting knowledge (hey, apart from all that stuff i know about 3.1 from back in the day).

I’ve done dev work here and there on the platform – inside more unixy areas (software that interfaces with unix, or has been ported from). In other words, no, I don’t speak Hungarian (nor have any wish to).

That said, I’m fully supportive of efforts to make sure our software runs well on the platform. If, for whatever reason (lack of enlightenment or lack of enlightenment further up the chain), someone has to use it, then darn well, our stuff should work well and as expected.

Also, being portable is always a good thing – you never know what the next big thing is going to be like (Okay, it’s unlikely to be VMS or Hurd) but if someone wants a product you sell ported to platform X and yours is more portable than the competitor, odds are you’re the one going to get the sale.

Also, other platforms can help you fix bugs. Fixing bugs is good.

Listening to: Rage Against The Machine

GDB bugs

Well, there’s a GDB bug (in the known-problems list) that I am regularly hitting. It makes it go “gdb internal error, would you like a core of GDB”.

I’m running 6.3, what comes with Ubuntu. However, I’m now rebuilding gdb 6.1.1 in the hope that this will be more stable for me.

6.3 is proving to be not very useful when I need it most (i.e. when strange things are happening).

Apparently this only happens because my distro and architecture still use LinuxThreads instead of NPTL.

Oh how I wish for the day of NPTL on ppc.

Helgrind

To try and help in debugging, I’ve been playing with some of the extra tools that come with Valgrind.

Everybody knows that if you aren’t using Valgrind you are living in sin.

I’ve recently tried to have a go at using Helgrind. It’s supposed to be able to help you in finding race conditions in multithreaded applications.

Well, mysql is multithreaded, and so is NDB (cluster), so, this could be rather useful (especially since, with some new work being done, i’ve found race conditions, i just need to find out where).

From the manual:

Basically what Helgrind does is to look for memory locations which are accessed by more than one thread. For each such location, Helgrind records which of the program’s (pthread_mutex_)locks were held by the accessing thread at the time of the access. The hope is to discover that there is indeed at least one lock which is used by all threads to protect that location. If no such lock can be found, then there is (apparently) no consistent locking strategy being applied for that location, and so a possible data race might result.

sounds reasonable enough.
Now I’m actually chasing up one of its possible data races and seeing if it is, in fact, a race.
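To make the manual’s description a bit more concrete, here is a toy version of the lockset idea in C. It is only a sketch of the principle, not Helgrind’s actual implementation: locks are just bits in a mask, it ignores which thread made each access, and every name in it is invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two example locks, one bit each. */
enum { LOCK_A = 1 << 0, LOCK_B = 1 << 1 };

/* Toy lockset tracking for one memory location. */
struct location {
    bool     seen;        /* has any access been recorded yet? */
    uint32_t candidates;  /* locks held on every access so far */
};

/* Record an access made while holding the locks in `held`.
 * Returns true if some lock is still common to all accesses,
 * false if the candidate set is empty (a possible data race). */
bool record_access(struct location *loc, uint32_t held)
{
    if (!loc->seen) {
        loc->seen = true;
        loc->candidates = held;      /* first access: start with what is held */
    } else {
        loc->candidates &= held;     /* keep only locks held every time */
    }
    return loc->candidates != 0;
}
```

If one access holds LOCK_A and LOCK_B, the next holds only LOCK_A and a third holds only LOCK_B, the candidate set goes from both, to LOCK_A, to empty, and the third access is where the tool would complain.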

Has anybody else had any experience with Helgrind? Thoughts?

Don’t you just love being compatible?

/* Force server down. kill all connections and threads and exit */

#if defined(OS2) || defined(__NETWARE__)
extern "C" void kill_server(int sig_ptr)
#define RETURN_FROM_KILL_SERVER DBUG_VOID_RETURN
#elif !defined(__WIN__)
static void *kill_server(void *sig_ptr)
#define RETURN_FROM_KILL_SERVER DBUG_RETURN(0)
#else
static void __cdecl kill_server(int sig_ptr)
#define RETURN_FROM_KILL_SERVER DBUG_VOID_RETURN
#endif
{
DBUG_ENTER("kill_server");

(from sql/mysqld.cc)

There just has to be a better way to do this….

maybe we need a kill_server which is platform defined (e.g. in a mythical win32.cc, netware.cc or generic_sane_unix.cc) and the generic _kill_server in mysqld.cc? possibly some variation of… some platforms seem to do strange things.

i don’t know. it just doesn’t look that clean to me…. maybe i need more coffee.
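One possible shape for the platform-defined kill_server idea, sketched in plain C rather than the C++ of mysqld.cc. The function names and the split itself are hypothetical, the real shutdown work is replaced by a variable that records the signal, and the thread entry point assumes it is handed a pointer to an int.

```c
#include <stddef.h>

/* Stand-in for the real shutdown work: just remember the signal. */
static int last_signal_handled = 0;

/* Generic part: identical on every platform. */
static void generic_kill_server(int sig)
{
    last_signal_handled = sig;
}

/* Entry point for platforms that call the handler directly with an
 * int (the OS/2, NetWare and Windows branches in the original). */
void kill_server_int(int sig)
{
    generic_kill_server(sig);
}

/* Entry point for platforms that run the handler as a thread start
 * routine, which has to take and return void *.
 * Assumption: the caller passes a pointer to an int. */
void *kill_server_thread(void *sig_ptr)
{
    generic_kill_server(*(int *)sig_ptr);
    return NULL;
}

/* Accessor so callers can see which signal was handled last. */
int kill_server_last_signal(void)
{
    return last_signal_handled;
}
```

Each thin wrapper would live in its own platform file, so the #if ladder around the function signature disappears from the generic code.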

Why – o – Why does this happen to me?

/build/buildd/gdb-6.3/gdb/linux-nat.c:1208: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
/build/buildd/gdb-6.3/gdb/linux-nat.c:1208: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n)

Interview with Sybase CEO where MySQL is mentioned

(insert disclaimer about this being my own ramblings and nothing to do with my employer)

Try and Buy, or Buy and Buy? :: AO

So, the Sybase CEO doesn’t get it.

The Sybase ‘free’ offering is in no way free. It is proprietary software that owns you.

Yes, Stallman was right – you do not own proprietary software, it owns you.

Giving the user the first hit for ‘free’ and placing an arbitrary limit on when they have to start to pay (and only pay you, so they have no real freedom of choice) ties the user in chains. It is not ‘express’, it is a demo – a cinema preview and nothing more.

This is nothing like the free copies of free software. It is merely a proprietary product packaged for a free software platform. Letting people live in a partial freedom. It’s an improvement, but is a gift to your customers (of partial freedom), not the free software community.

John Chen even says something that is totally misleading.

“You download it and develop it and use it…”

DEVELOP? What the? How can I download anything that lets me develop sybase? hrrmm.. they have the source up there? I don’t think so. I’m sure he meant ‘develop on top of it’. In other words, for a database, use it.

The MySQL way of “you can download it, develop it and use it” is just that. Download it, the source, actually hack the database, write a new storage engine, change the parser, the optimiser – hey, even go make it something that isn’t SQL (for example Fred’s Query Language or something). Or, if you’re just going to use it for your web site, just do that. You have the freedom.

MySQL AB (the company) makes money by being the most knowledgeable and well qualified group to provide knowledge, support, training and development of the MySQL database. We don’t hold our users hostage – they are free (all the freedoms granted to them under the GPL) to go support themselves, or get somebody else to, or even to go and develop the database themselves.

So, here’s to not holding your users hostage!

(this isn’t an attack on Sybase being a good database or not – i have never used it, i don’t think – and don’t really have an interest to. It’s an attack on proprietary software, and especially proprietary software throwing around the term ‘open source’ to try and look good. Proprietary software is what it is, and people should know what that is.)

revision control

Brian Aker has blogged about BitKeeper versus CVS

no doubt this has stemmed from somebody’s rant on the BK license. Now, this is a valid rant, but, really – it’s getting[1] old.

Personally, I quite like the GNU Arch Revision control system. Unfortunately, the UI is sort of sucky and takes a bit of getting used to. Bazaar is one to watch for improvements on this front (although I haven’t made the switch, mainly due to there not being enough hours in the day).

One thing that Arch does really well is cherry picking changesets. A simple ‘tla replay’ will do the equivalent of ‘patch -p1 < foobar’, but preserving where it came from. BRILLIANT. I wish bk did this. I once looked at branching in CVS and quickly ran away.

A smaller player, Darcs is one to take a close look at too. The UI is really sweet. I’ve only used it to test/submit fixes upstream on a small project (namely xseq – a project that is way cooler than the name suggests[2].)

In the future, bazaar-ng (back online soon) will probably be the way to go. Now is the time to bombard it with ideas though :)

At least we’re not stuck with Visual Source Safe. Full on MS people bag that pile of poo.

[1] Many would, in fact, believe i should be leaving out the word ‘getting’.
[2] I’m sure Andrew would be appreciative of funky names as well.

Update: why, oh why does this edit post thing think it must fight against the will of the correct closing tags?

Update 2: it seems that wordpress doesn’t want to save an update if you’re only fixing your markup. you have to add text. the suck.

what you don’t want to see from gdb

/build/buildd/gdb-6.3/gdb/linux-nat.c:1208: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) 
/build/buildd/gdb-6.3/gdb/linux-nat.c:1208: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) 

Install Day (Semester 1, 2005) – Monash IT Society (Clayton)

So, back at uni (you know, that place you’re at when you’re a student), and back with the old computer science club (but under a different name… well… ah…urrr…because… who the fuck knows why) they’re having a linux install day next week.

i’ll probably go along for a couple of hours and hand out lots of ubuntu cds that i’ve got lying around and probably get asked a lot of mysql questions.

maybe it’d be cool to have a “MySQL CD” with all mysql stuff for all platforms on it (with lots of autorun stuff). esp if we could make them cheaply enough to give away at a bunch of events (like install days).

Maybe i’ll burn a couple of CDs to take along (windows, mac, linux binaries)