Solaris, Linux, it is GNU folks…

Brian “Krow” Aker’s Idle Thoughts – Solaris, Linux, it is GNU folks…

Brian hits the nail on the head… the way you get a usable system is to install all the GNU tools.

This is how I go from a fresh Ubuntu install to building MySQL:

apt-get build-dep mysql-server

apt-get install bison

(now go and build).
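
For the “go and build” bit on a 5.1-era tree, it’s roughly the usual autotools dance (this is from memory – a release tarball configures directly, while a BK/bzr tree needs the BUILD/ scripts to generate configure first):

./configure

make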

(and I could do this graphically if I wasn’t so stuck in my ways)

For Solaris? Umm… there was a point where I could get Solaris to apply security updates, and Brian could get all the stuff needed to build a MySQL Server. Together we had the knowledge needed… but neither task was as trivial as on Ubuntu, and combining that knowledge was too much effort – I just gave up and went on to more productive things.

Even on an existing Solaris system… getting your PATH right is a trip into some weird fantasy land seemingly designed to annoy you. No doubt this all made some sense back in the day… but now it just causes pain when all you want to do is compile your program, find the bug and fix it.

When I started at SGI several years ago, what was the first thing I did? I went and installed all the GNU packages. IRIX was a lot nicer after that.

Same with Mac OS X – the first thing you do is install DarwinPorts or Fink to get a remotely usable system.

With Windows, it varies – but the shell is so outrageously shit that you need Cygwin just for bash, you need either emacs or Visual Studio to get an editor you don’t want to kill, Firefox for a web browser that works, etc etc etc. The fact that the Windows packaging system just blows chunks makes it the most painful experience of all.

So even if you’ve heard rave things about the debugger in Visual Studio – actually getting a Windows install to the state where you can run the debugger takes hours. Click click click, upgrade, yes, install, swap disks, upgrade, upgrade, wait, reboot, install manually, install manually, install manually. Ick.

Project Indiana is possibly the saviour of Solaris. The default userland is GNU, the default shell is bash. It starts to make the system feel like home – just as Solaris shipping GNOME made it feel more homely.

Solaris comes with a version of vi that is old enough to drink in bars. Project Indiana realises that a drunk editor isn’t a good idea and ships something sensible.

The BSDs get a lot of things right. Sane userland that is familiar to people. Jumping onto a FreeBSD box is remarkably easy.

The typical justification is “backwards compatibility” and all that… basically so that everyone can run their apps from 1985 and not change a thing. A worthy goal. Of course, 1985 does not need to be the default environment in 2008.

There is a standard for the unixy way of things: it’s Linux with GNU tools in userland.

Just as Windows set the standard for a kind-of-usable GUI (the Mac did it better, but Windows got the numbers), to get people to use Linux on the desktop we needed to get it to a stage where those people were comfortable.

If you want your UNIXy system to be used by anybody today, you need to have it be comfortable for Linux people.

On the other hand, Ubuntu is still the best desktop I’ve ever used and I’m rather happy with it (no matter how much I bitch and moan about certain things being obviously broken).

(and no, I’m not switching my desktop to any Solaris variant – but I wholeheartedly look forward to the day when maintaining software that runs on Solaris is a heck of a lot easier because Solaris has become less annoying).

One more point: OSX and Solaris are the only remotely proprietary UNIXes left. Everybody else is either dead or doesn’t know it yet. Solaris is nearly all free (AFAIK there are still some binary-only drivers around… which sucks… but these things can take time, so that’s okay) and OSX has parts which are (sometimes seemingly dependent on the phase of the moon) free-ish. So really, OSX is the last holdout of the largely proprietary UNIX world. It’s a fascinating thing to think about… freedom wins.

(and this no doubt goes on far too long and incoherently…. but that’s because of long days and late nights because of upcoming really cool stuff which I’ll blog about later)

bzr-loom – a bzr plugin with quilt-like functionality

A bzr plugin to assist in developing focused patches (in Launchpad).

I use quilt a lot for development. Currently, if I had to choose between BK and quilt – I’d choose quilt.

I use bzr in other development projects like MemberDB. I use git as a frontend for SVN (it is *so* much faster than the svn client and incredibly more space efficient… A copy of the entire history of a tree stored in git is usually less than a single svn checkout). I also use darcs (and quilt) for offlineimap and just about every other revision control tool at some point.
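
For the curious, the git-as-svn-frontend workflow is basically this (command names from memory – see the git-svn docs; the URL is whatever repository you’re tracking):

git svn clone <url>

git svn rebase

git svn dcommit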

So this is a bit of a discussion about how I work and how bzr-loom would help… (I’ve wished for a long time that BK had something like this… bk collapse is just not what I want, although others use it lots).

The loom plugin to bzr looks like a fantasy world of goodness where the revision control system has some knowledge of these work in progress patches. The ability to push and pull looms around the place seems awfully nice.

What’s even more awesome is that you can push your set of patches up to a normal bzr branch and they become normal commits! i.e. you get rid of the whole “convert quilt patches into changesets” pain and just push.
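
From my reading of the plugin docs, the workflow looks roughly like this (command names from memory, so check bzr help loom; “fix-parser” is just a made-up thread name):

bzr loomify (turn an ordinary branch into a loom)
bzr create-thread fix-parser (start a new thread: one focused patch)
bzr commit -m "fix the parser" (hack away, committing within the thread)
bzr up-thread / bzr down-thread (move up and down the patch stack)
bzr record "update loom" (record the loom state so it can be pushed and pulled)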

Revision tracking of the patches themselves (so you can see what you’ve changed in your patch set) is also nice (I have thought about keeping patches/ in a bzr repo for this very purpose). So I can now get a history of my patch set against various mainline versions.

One of the big advantages of quilt is speed – it’s lightning fast (basically being a diff and patch wrapper). Hopefully bzr looms continue in this fine tradition (and I wish other systems would get something like it too).

A world of FAIL

=================================== ERROR ====================================
File holyfoot/hf@mysql.com/deer.(none)|mysql-test/r/bdb_notembedded.result|20061113160642|60022|276fa5181da9a588
is marked as gone in this repository and therefor cannot accept updates.
The fact that you are getting updates indicates that the file is not gone
in the other repository and could be restored in this repository.
if you want to "un-gone" the file(s) using the s.file from a remote
repository, try "bk repair "
takepatch: other patches left in PENDING
==============================================================================
2.85M uncompressed to 16.4M, 5.77X expansion
Pull failed: takepatch exited 1.

stewart@willster:~/MySQL/5.1/ndb$ cp ../../5.0/ndb/mysql-test/r/SCCS/s.bdb_notembedded.result mysql-test/r/SCCS/

A world of fail. Of course, the “bk repair” suggested does not fix the problem. This kind of problem should just not be possible. grr…

Getting a file size (on Windows)

The first point I’d like to make is that you’re using a Microsoft Windows API, so you have already lost. You are just not quite aware of how much you have lost.

A quick look around and you say “Ahh… GetFileSize, that’s what I want to do!” Except, of course, you’re wrong. You don’t want to use GetFileSize at all. It has the following signature:

DWORD WINAPI GetFileSize(
  __in       HANDLE hFile,
  __out_opt  LPDWORD lpFileSizeHigh
);

Yes, it supports larger-than-4GB files! How? A pointer to the high-order doubleword is passed in! So how do you know if this errored? A return of -1? WRONG! The high word could have been set, and your file length could legitimately be 0x1ffffffff. So to find out whether you actually had an error, you must call GetLastError! Instead of one call, you now have two.
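
So the dance ends up looking something like this (a sketch, not gospel):

DWORD sizeHigh = 0;
DWORD sizeLow = GetFileSize(hFile, &sizeHigh);

if (sizeLow == INVALID_FILE_SIZE && GetLastError() != NO_ERROR)
{
    /* an actual error */
}
else
{
    /* a low dword of 0xffffffff was legitimate after all */
    unsigned __int64 size = ((unsigned __int64)sizeHigh << 32) | sizeLow;
}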

The Microsoft documentation even acknowledges that this is stupid: “Because of this behavior, it is recommended that you use GetFileSizeEx instead.”

GetFileSizeEx is presumably named “Ex” as in “I broke up with my ex because their API sucked.”

You now have something that looks like this:

BOOL WINAPI GetFileSizeEx(
  __in   HANDLE hFile,
  __out  PLARGE_INTEGER lpFileSize
);

Which starts to look a little bit nicer. For a start, the return code of BOOL seems to indicate success or failure.

You now get to provide a pointer to a LARGE_INTEGER which, if you missed it, looks like this:

typedef union _LARGE_INTEGER {
  struct {
    DWORD LowPart;
    LONG HighPart;
  };
  struct {
    DWORD LowPart;
    LONG HighPart;
  } u;
  LONGLONG QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;

Why this abomination? Well… “If your compiler has built-in support for 64-bit integers, use the QuadPart member to store the 64-bit integer. Otherwise, use the LowPart and HighPart members to store the 64-bit integer.”
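
Assuming your compiler does have 64-bit integers (it is after y2k, after all), the whole thing collapses to something like:

LARGE_INTEGER size;

if (!GetFileSizeEx(hFile, &size))
{
    /* an error – and the return code actually told you so */
}

/* size.QuadPart is your 64-bit file size */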

That’s right kiddies… if you’ve decided to lose from the get-go and have a compiler that doesn’t support 64-bit integers, you can still get the file size! Of course, you’re using a compiler that doesn’t have 64-bit integer support… and the Microsoft documentation indicates that the GetFileSizeEx call requires Windows 2000… so it’s post-y2k and you’re using a compiler without 64-bit ints? You have already lost.

Oh, but you say something about binary compatibility for apps written in the old days (handwave something like that). Well… let’s see… IRIX will give you 64-bit numbers in stat (stat64) unless you build with -o32 – giving you the old ABI. I just can’t see a use for GetFileSize… somebody please enlighten me.

Which header would you include? Any Linux/UNIX person would think of something logical – say sys/stat.h (Linux man page says sys/types.h, sys/stat.h and unistd.h). No, nothing sensible like that. It’s “Declared in WinBase.h; include Windows.h”.
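
For comparison, the sensible version you’d write on anything UNIXy:

#include <sys/stat.h>

struct stat st;

if (stat(path, &st) == 0)
{
    /* st.st_size is the file size – one call, one check */
}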

So… you thought that obviously somebody went through the API and gave you all this Ex function goodness to get rid of mucking about with parts of a 64-bit int? You were wrong. Let me say it with this:

DWORD WINAPI GetCompressedFileSizeTransacted(
  __in       LPCTSTR lpFileName,
  __out_opt  LPDWORD lpFileSizeHigh,
  __in       HANDLE hTransaction
);

I’ll now tell you that this was introduced in Vista/Server 2008.

Obviously, you want to be able to use Transactional NTFS on Windows Vista with a compiler that doesn’t have 64-bit ints. Oh, and you must then make another function call to see if something went wrong?

But you know what… perhaps we can get away from this complete and utter world of madness and use stat()…. or rather… perhaps _stati64().

Well… you’d be fine except for the fact that these seem to lie to you (at least on Windows Server 2003 and Vista) – it seems that even Explorer can lie to you.

But perhaps you’ve been barking up the wrong tree… you obviously don’t want to find the file size at all – what you want is to FindFirstFile! No, you don’t want FindFirstFileEx (in this case, Ex is for Extremely complicated). It’s meant to be faster too… you know, maybe.
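
And FindFirstFile really does hand you the file size – in two halves, of course (a sketch):

WIN32_FIND_DATA findData;
HANDLE hFind = FindFirstFile(path, &findData);

if (hFind != INVALID_HANDLE_VALUE)
{
    unsigned __int64 size = ((unsigned __int64)findData.nFileSizeHigh << 32)
        | findData.nFileSizeLow;
    FindClose(hFind);
}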

So remember kids, smoke your crack pipe – you’re going to need it if using this thing called the Microsoft Windows File Management Functions.

MemberDB speed improvements

So I finally installed the xdebug PHP extension and started doing some performance analysis of MemberDB using xdebug and KCachegrind. The upshot is a number of commits to the bzr tree that dramatically improve performance in several key areas. The answer? Caching.

I’m not even talking about using memcached or caching things in database tables or anything like that – just about everything is still the same dynamically produced content as before, but I’m now caching some simple things, avoiding many round trips to the database while executing a script.

There were a few things that were taking a fair bit of execution time:

  1. The generation of the menu. In MemberDB, there’s a menu on the left. There’s also a powerful (read: non-trivial) permissions system allowing relatively fine-grained granting of permissions. So, we need to check that the user has permission to go to a page before showing it in the menu.
    Previously, for each item in the menu, we’d do a lookup to the database – checking if the user has the permission or is an admin. This ended up taking a fair bit of time – up to 30% of the time for the front page was spent just generating the menu!
    So, now I cache the set of permissions for the user: one function fetches them from the DB into a structure, another checks the user’s permissions against that structure (see the sketch after this list).
    While testing this, I actually used memcached to cache the menu to see how much of an improvement I could get… the purely PHP implementation caching the permissions info runs at about 69/70ths of the speed of the memcached version.
  2. Getting the information about a member is done in a variety of places. On some pages, you want information on the currently logged-in user (or just need to find their member ID). These are now cached for the duration of the script, saving quite a few DB round trips.
  3. When viewing an election (not the results, just the normal “view election” page that lists candidates), we need to get the membership information on a number of users (okay… so technically I should rewrite some of the queries to use joins in the DB… but this was easier). I now have a (limited) cache of membership info. So now, when a member has nominated multiple people, we only pull the member info out of the database once.
  4. Rewrote the “current_members” view. The old one was not as efficient as it could be. While the new one has slightly different semantics (it can return duplicate rows – it turns out the DISTINCT was adding a fair bit of execution time, and for a bunch of queries it isn’t needed), it’s significantly quicker.
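
The shape of the permissions cache, sketched in C for brevity (MemberDB itself is PHP; load_permissions_from_db() and permission_in_cache() are made-up names standing in for the real functions):

struct perm_cache
{
    int user_id;
    int loaded;
    /* ... the permission set as fetched from the DB ... */
};

static struct perm_cache cache;

int user_has_permission(int user_id, const char *permission)
{
    /* one DB round trip per script run, instead of one per menu item */
    if (!cache.loaded || cache.user_id != user_id)
        load_permissions_from_db(user_id, &cache);

    return permission_in_cache(&cache, permission);
}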

I used the faithful Apache Bench (ab) to benchmark the modified PHP code. I think the biggest improvement was the view election page, which went from about 6 seconds/page to 0.2 seconds/page.

My 2nd book is available! (MySQL 5.1 Cluster DBA Certification Study Guide)

Neither of the books I’ve authored has been a solo effort. For Practical MythTV (Christmas is coming, buy it for all your TV and tech-loving friends!), Michael Still and I worked hard to get a well-rounded and practical (not to mention good) book. I think we succeeded – it has certainly gotten positive reviews (check the Amazon page).

For my second endeavor (just to make it fun, I was working on both at the same time) we have a much longer list of authors. The aim was to write a study guide for those wishing to be certified in MySQL Cluster. Being a developer with a fair bit of knowledge of the product (and somebody who also presents and writes), I was a natural fit to join the team (some may say “roped into the team”… and that someone could possibly be me, but I couldn’t possibly comment).

My fellow authors:

  • Jon Stephens
    Among other things, he’s brought the MySQL Cluster section of the MySQL Manual forward leaps and bounds. He also edited the study guide – no doubt a daunting task.
  • Mike Kruckenberg
    Author of Pro MySQL (along with Jay Pipes) and numerous blog posts on MySQL and related topics.
  • Roland Bouman
    Who took on the brave task of the actual certification. This is a certification to be proud of – it really makes sure people deserve it, without being overly hard or tricky.
  • Solomon Chang
    Founding member of LAMPSIG of Los Angeles and a professional DBA.

So, the book is now shipping from lulu.com (an on-demand printing service) for US$49.99.
MySQL 5.1 Cluster DBA Certification Study Guide by Jon Stephens, Mike Kruckenberg, Roland Bouman, Stewart Smith, Solomon Chang

It feels good to have it out there now. Daniel van Eeden has received his copy (shipping box and all!). This is another book you should buy for all your database friends (everybody you’ve ever met).

libeatmydata

Following my successful linux.conf.au talk “Eat My Data: How Everybody Gets POSIX File I/O Wrong”, I started to feel the need to easily be able to have my data eaten.

Okay, not quite. However, when you’ve written your software properly – using fsync() correctly, opening files with O_SYNC or whatever – tests take longer, as you have to wait for things to actually hit the rust.

So… LD_PRELOAD=libeatmydata.so to the rescue! With a POSIX-compliant fsync() (that does nothing) and filtering on open(2), it can take your test run times down dramatically.
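
The trick is about as simple as LD_PRELOAD tricks get. A minimal sketch of the idea (not the actual libeatmydata source; file names here are made up):

#define _GNU_SOURCE
#include <stdarg.h>
#include <fcntl.h>
#include <dlfcn.h>

/* fsync() "succeeds" without ever touching the disk */
int fsync(int fd)
{
    (void)fd;
    return 0;
}

/* open() quietly has O_SYNC/O_DSYNC stripped before calling the real open */
int open(const char *pathname, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT)
    {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    return real_open(pathname, flags & ~(O_SYNC | O_DSYNC), mode);
}

Build it as a shared object (something like gcc -shared -fPIC -o eatmydata-sketch.so sketch.c -ldl) and LD_PRELOAD away.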

The only time you shouldn’t use it for your tests is when you end up crashing the machine to test durability (i.e. when the OS doesn’t have the opportunity to cleanly write out the data to disk).

See the libeatmydata project page: http://www.flamingspork.com/projects/libeatmydata/

and the bazaar repository: http://www.flamingspork.com/src/libeatmydata

(it seems to have saved somewhere between 20 and 30% of the time for the innodb/ndb tests in mysql-test-run).

mysql-5.1.22-stew2

New:

  • Updated NDB Compressed LCP and BACKUP patches (now with O_DIRECT support)
  • InnoDB patch for Windows that should give ~5x improvement on commits/sec (Bug31876)
  • Everything in current telco-6.3 tree (ndb ~6.3.5)
    • Lots of NDB improvements and new features over regular 5.1:
      • WL3686 Remove read before update
      • WL2680 NDB Batched Update
      • WL2679 NDB Batched Delete
      • WL4108 NDB Handler statistics
      • WL4096 NDB Realtime performance and settings
      • WL3126 and WL3127 Client and Replication bind address
      • NDB Online ALTER TABLE ADD COLUMN
      • NDB Multi-Master replication conflict resolution (limitations apply :)
      • NDB prepare for endian independence
      • NDB micro-gcp (reduces replication lag)
      • NDB SendBuffer throttling
      • NDB MySQL Server TC selection (improve performance)

Old (In previous patchset too):

  • Remove ndb_use_exact_count giving up to 300% performance improvements on Joins in NDB
  • INFORMATION_SCHEMA table for NDB node status
  • NDB Cluster Log as CSV file (suitable for ENGINE=CSV)
  • Skeleton Engine (build from storage/skeleton)
  • MyHTTP Engine (build from storage/myhttp)
  • PBXT Engine (build from storage/pbxt)
  • Make ARCHIVE faster at compressing (at slight expense of space usage)

Availability:

  • Patch (apply with -p1 to mysql 5.1.22) 4.0 MB
    • Applies cleanly on a BK source tree… a few files don’t exist in the tarball on dev.mysql.com (due to the way it’s built)… so when asked for “file to patch” just hit enter and then choose y to skip that patch.
  • README (list of patches, descriptions) 13kb
  • quilt patch series tarball (individual patches) 4.1MB
  • diffstat 228k

Feedback much appreciated.

Speaking at VITTA (Victorian IT Teachers Association Inc) Conference

I’m speaking at the upcoming VITTA conference.

Title: MySQL database administration for non-DBAs

Abstract: MySQL is incredibly ubiquitous. MySQL database administrators are not everywhere; MySQL is. Often MySQL is run to power a small web site or two, an application or two, or run on a machine purely for someone else’s use (and the install made MySQL just work, so you don’t have to care). This session goes over the things you need to know about your MySQL installations to keep them healthy without burdening you with work, including MySQL basics, installation, security, backup, restore, performance and upgrades.

When: 12:15 PM, Wednesday 21 November 2007

Should be fun!

MySQL 5.1.22(ish)-stew1

I’ve decided to publish my patch series. The goal of the -stew patches is to collect things I find interesting and that at some point could (should) make it into the main MySQL tree (even if others don’t think so).

It’s not designed for use in production… I don’t really care if there are failing test cases… if it builds, it’s perfect.

It includes the following, which could be interesting:

  • Removal of ndb_use_exact_count (performance for NDB)
  • NDB node status in an INFORMATION_SCHEMA table
  • Compressed Backup and LCP for NDB
  • Cluster log as CSV
  • Skeleton Engine
  • MyHTTP Engine
  • PBXT Engine
  • Skeleton of MyBS support for NDB
    • (in the hope that somebody finishes it)

Currently the additional engines have to be built separately in their storage/ENGINE directories. I have some preliminary patches to get them to build in-tree via the plug.in file, but it’s not finished (patches welcome).

This is all currently based off the 5.1-ndb tree. In future, it will likely be based off 5.1-telco.

http://www.flamingspork.com/mysql/patch-5.1-ndb-stew1-20071016.patch.gz

and broken out in:
http://www.flamingspork.com/mysql/patch-5.1-ndb-stew1-20071016/
(including the not-quite-working ENGINE_in_tree_build patches)

Known to apply against this tree:

http://www.flamingspork.com/mysql/mysql-5.1-ndb-20071016.tar.bz2

Comments, thoughts, patches to include, all welcome!

Compressed LCP and Compressed Backup (and switching them on/off online)

A quick experiment with enabling/disabling compressed backups and local checkpoints (LCPs) online.

Backup is incredibly trivial and correct (you can even have some nodes do compressed backups and some not).

LCPs are a bit trickier when it comes to restore… as the code currently stands, a block using the compressed file interface in NDBFS must specify whether it wants the compressed read/write interface or not. So when you have LCPs whose compressed/non-compressed state differs from the current config file setting, you’re not going to be able to restore them (although setting CompressedLCP=1 should let you restore either compressed or non-compressed LCPs).
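
For reference, flipping these on is just a config.ini thing (parameter names as they currently stand in the tree):

[ndbd default]
CompressedBackup=1
CompressedLCP=1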

At some point, I’ll probably move AsyncFile (our async file IO class) to just use azio always, and modify azio to be transparent for non-compressed files… I just have to fix up azio for direct IO.

Things that break while travelling…

This year, it seems that whenever I go out for significant travel, the following things will break on my trip:

  • a laptop power supply
  • a disk

At least this time the disk is part of a RAID1 array.

Oh, and for some reason my mythbackend stopped doing anything a few days ago… and I wasn’t checking it. Grr… annoying. At least there’s not much on TV.