NdbRecord

Kristian is currently talking about the new NdbRecord API for the NDB API: how it relates to ha_ndbcluster (the MySQL storage engine, which uses the NDB API to talk to the cluster nodes) and how it can be used by NDB API applications.

It looks like we’re getting a really neat API that avoids so much mess and makes it possible to write incredibly efficient mappings between what comes over the wire from data nodes and whatever internal structures the application wants to fill out.
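To make the "efficient mapping" idea a bit more concrete, here is a rough sketch in Python of the general technique: describe the row layout once (column, offset, type) and then decode raw bytes straight into an application structure. This is not the NdbRecord API. The `LAYOUT` table, the column names and the `decode_row` helper are all made up for illustration.

```python
import struct
from collections import namedtuple

# Hypothetical row layout: (column name, byte offset, struct format).
# None of these names come from the NDB API; they only illustrate the idea
# of describing a record once and reusing that description for every row.
LAYOUT = [
    ("id", 0, "<I"),       # 4-byte unsigned int at offset 0
    ("balance", 4, "<q"),  # 8-byte signed int at offset 4
    ("flags", 12, "<H"),   # 2-byte unsigned int at offset 12
]

Row = namedtuple("Row", [name for name, _, _ in LAYOUT])


def decode_row(buf):
    """Fill a Row straight from a raw buffer using the layout description."""
    return Row(*(struct.unpack_from(fmt, buf, off)[0] for _, off, fmt in LAYOUT))


# Pretend these 14 bytes arrived over the wire from a data node.
wire = struct.pack("<IqH", 42, -1500, 3)
row = decode_row(wire)
print(row)
```

The point of the design is that the per-row work is just reading at known offsets, with no per-column lookups by name.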

Talking about this and Monty Taylor’s ORM mapping stuff could be very interesting.

Off to Stockholm (well, on Monday)

Having just moved apartments, it’s obviously time to get on a plane again.

On Monday I fly off to Stockholm again to attend the MySQL Cluster team meeting. Somehow we’re going to squeeze everybody into the Stockholm office (I’ll post humorous cramped photos, I promise).

Of course the thing to do now is to prepare for the meeting… packing can be done on Sunday night or something.

Of course, if you’re in the area, come for food/beer!

telco/carrier grade MySQL cluster source trees on bkbits

Over at mysql.bkbits.net you can also get the “telco” (commonly known as CGE or “Carrier Grade Edition”) source trees of MySQL Cluster.

I think it’s exciting that we now have the source trees up here. You can pull the sources with either the freely available bk client or the commercial BitKeeper.

Since I just got back from the US for the MySQL UC (which was rather awesome) I don’t quite have the energy to go into the difference between normal MySQL 5.1 and the telco trees… so wait for part 2 :)

adding a pluggable information schema table to a pluggable engine in mysql 5.1

Also now up is the patch series in my “ndb-work” tree, which includes a small patch adding INFORMATION_SCHEMA.NDB_NODE_STATUS. It’s nearly useful… I haven’t brought in the nice “id to string” functions in the management client that make pretty printing nice… so not quite end user friendly :)

But it’s a nice patch to learn how to add an INFORMATION_SCHEMA table in a pluggable engine and put some engine specific information in it.

(kudos to the Falcon code… which I looked at to see how to do it).

Doesn’t take long – this was completed in less than 2hrs while watching and paying attention to sessions…. so should take next to no time if you actually concentrate on it.

Of course, this totally abuses the purity of the information schema.

Experimental NDB Patches

I’ve just put up the current “add node” patch… which is like, totally experimental and kills kittens… but could be interesting for people to have a look at as it progresses. Still lots of work before production ready – but people here at the MySQL Conf have said they’re interested in looking at the code for it.

You can grab a combined patch or the quilt series from:

http://saturn.flamingspork.com/~stewart/ndb-experimental/

Applies to 5.1… at least to a tree from a few weeks ago.

Zeroconf, conferences and privacy

So, probably like lots of people, I run a few web apps locally that I use for various purposes. In my case, this also includes some cool custom developed things.

I also use Zeroconf to easily discover all this foo around a network.

I run my critical MySQL install by hand, so it’s not constantly up. Just as well: somebody noticed (during Eben’s keynote at the MySQL Conference, where he talked a lot about privacy) that one of the apps I run is entitled “tax”.

Since I’m somewhere other than at home, my MySQL instance was stopped (much harder for people to grab the data out of it if the process isn’t running to begin with).

So yeah… good points – check what random people out on the network may have access to on your laptop – and know what you should not run as default (I’m careful there).

MySQL Conference: Day 2

Day 2 Photos

I gave my Intro to Cluster talk and then the Design and Internals of MySQL Cluster talk.

Also some photos from the DRBD BoF in the evening (which was really good). So was the BLOB streaming BoF earlier (but I didn’t take my camera out).

Currently in Eben’s keynote on Wednesday morning. As always, insightful and thought provoking.

World of awesome.

Arrived

Nine dollars (US) of Water (how many hours would somebody on minimum wage have to work to buy this 1.5L of water?):

$9USD water

Apart from that, jetlagged – managed to find food, TV, internet. All good.

I’ll be putting photos up on my gallery (which is running a MySQL Cluster 5.1 backend – with disk data) over at:

http://saturn.flamingspork.com/gallery/v/conf/mysql2007/

Nearly on way to the MySQL Conference

Tomorrow morning (11.5hrs time actually) I’ll be on a plane to SFO (then down to Santa Clara) in preparation for the MySQL Conference.

So, if you’re in the area – give us a buzz. My aussie phone will work, as will traditional email.

Also on IRC… should be easy to find me (freenode).

I heart recordMyDesktop

So, I wanted to get some feedback before I presented my sessions at the upcoming MySQL Conference (be there, it’ll be cool). I thought… hrrm.. distributed company… I can’t just ask a couple of people to listen to me in the conference room as we don’t really have one (apart from IRC).

So… I thought.. hrrm… didn’t I see something about screencasting on the program for linux.conf.au? Well, the answer was yes – Screencasting HOWTO. Started watching – I then proceeded to try the list of screencasting software.

Istanbul didn’t work – I got images and audio, but only when there was a change to what was being displayed… so a static slide with me talking, didn’t work. Same with a similar python script.

I then grabbed recordMyDesktop and it worked. ./configure; make; ./src/recordMyDesktop  …. and ctrl-c when done.. encodes to Ogg Theora and *WORKS*

Brilliant.

I then got to convince some coworkers to spend time listening to me speak about stuff they may already know to test it before the conf.

MySQL Conf: Getting Drunk with Eben Moglen

So Jay Pipes pointed out that Eben Moglen is speaking at the upcoming MySQL Conference in his attention grabbing post: Getting Drunk with Eben Moglen.

I saw Eben speak at linux.conf.au 2005 in Canberra – which was totally totally awesome.

I’m really looking forward to seeing him again – honestly, it’s probably worth the conference admission fee just to see this session.

Q&A with MySQL Cluster content (my 2c thrown in)

Ivan posted the questions and answers from a Q&A session in which MySQL Cluster came up – I thought I’d add my perspective here as well:

Q from Matthew: When are we likely to see disk based indexing for ndb?
Disk based indexing is planned in one of the future releases, but we can’t say when we will implement it. During the webinar, Anders pointed out that he does not see this as an important thing. I tend to agree with Anders, at least considering the current status of the storage engine. At the moment, ndb can perform an unbeatable job (in terms of HA and performance) on small transactions and simple queries and we should not consider it as a full replacement for the whole database, in general. The future versions of ndb will probably be more and more general purpose and at some point a full disk based ndb will be valuable. Please take this as my personal opinion.

Implementing disk based indexes is a fair bit of work… Certainly not this year (or early next). Sure, it’s a crucial step towards world domination… but it does have to sit in a priority queue of other steps.

Q from Malcolm: Is there any difference between MySQL Cluster and the telecoms version?
As Bertrand said, MySQL Cluster Carrier Grade is a specific version for telecom, developed closely with major equipment manufacturers. During the presentation I have highlighted some differences – such as the availability of more data nodes and so on. We will cover MySQL Cluster and MySQL Cluster Carrier Grade Edition in one of the future sessions.

It’ll be good to have a special session on the difference. The basic difference is that we’re a bit more selective about what patches go into the Carrier grade trees – and sometimes some features will go there first (when customers really need it). We will typically try to be less invasive in some areas too. Odds are though, if you’re not a telco, you don’t need it.

Q from Fabio: Any plan for MySQL Cluster for Windows?
We are considering it sometime in the future, but no plans have been made so far.

Yes, this has been “being considered” for years. No, it’s not going to happen any time soon. Patches welcome.

Q from Owen: Is it difficult to define memory requirements for MySQL Cluster?
MySQL Cluster configuration is the most important step when you adopt this technology. We have seen several do-it-yourself configurations, running perfectly. But Cluster configuration is not straightforward and we always recommend to get some help from our Professional Services team.

Each time I patch ndb_size.pl it gets more accurate and is less outrageously wrong in some scenarios now :) It can help… although you also need to know what you’re measuring – and account for future growth.

Q from Alessandro: Is carrier grade available for download?
As Bertrand said, please contact us at http://www.mysql.com/company/contact/ if you are interested in MySQL Cluster Carrier Grade for telecom customers

I believe the plan is to publish the BK trees as well… but that’s certainly not the supported way to run it.

There was also some talk on DRBD and shared disk clusters. Neither of these protects against file system corruption. Also, if you’re using a non-crash-safe engine (e.g. MyISAM), when you fail over you’ll probably have to do a bunch of table checks – not exactly HA.

Record autotest numbers for NDB

So, with a bunch of recent tests I added (and some bugs that have been fixed) we’re now consistently getting 203 or 204 passing tests. We’ve got typically around 8 or 9 that often fail – often because the test may be broken or not quite deterministic. Or there’s a bug… :)

(all numbers for the daily-basic list of tests for various 5.1 branches).

It would be great to hit 300 by this time next year… which means a lot of test cases… hrrm… anybody want to volunteer?

MySQL Conf coming up (and memories of last year)

Andy Dustman just blogged, referencing his previous posts on last year’s MySQL User Conference. This year’s is coming close (April 23-26) and the pressure to have all my presentations perfect is mounting (err.. by the way, they will be).

Last year was a blast. Long days (and into the evenings) with sessions, BoFs, food and beer discussing all sorts of things that in some way related back to databases (and rather often, surprisingly enough, MySQL).

What was also great was being able to talk to lots of people who are doing real things out in the real world about MySQL Cluster and whether it’s remotely suitable for their application. Often the answer can be “I think you’re looking for replication”, which is perfectly okay too.

I’m in a few days early (and around a few days after) – so if you’re around the area do give me a yell – it’d be cool to hang.

FYI, I’m giving the following sessions:

  • MySQL Cluster: The Complete Tutorial (Parts I and II)
    Which is a total of 6hrs of MySQL Cluster goodiness. It’s aimed at people who know MySQL (or are pretty good with other RDBMSs and can fake it) and are wanting to know about MySQL Cluster. It’s a hands-on tutorial, so be prepared!
  • Introduction to MySQL Cluster
    A 45 minute whirlwind introduction to MySQL Cluster. Assumes some MySQL knowledge. Good if you’ve heard about this cluster thing (even from just reading the title of this session) and want to know what it’s all about.
  • Exploring New Features in MySQL 5.1 Cluster
    A 45 minute blast of a session on what’s new for MySQL Cluster in the 5.1 release. This will cover just about everything that was in my presentation on the same topic last year. So if you came to last year’s and come to this one again… I’m going to make fun of you for being a groupie :)
  • Bleeding Edge MySQL Cluster: Upcoming Cool Things
    A whole hour on the stuff you shouldn’t use in production. The topic list is sort-of known… it really is what is the latest and greatest that should be coming to a tree somewhere, sometime… this year. We’ll no doubt talk about online add node, online add/drop attribute, multithreaded NDB kernel, API improvements and a whole lot more!
  • The Design and Internals of MySQL Cluster
    What happens under the hood in MySQL Cluster? Find out here! An hour for those with the real technical mind. If source code and network protocol descriptions scare you, possibly not for you – expect an hour of coolness.

Yes, there seems to be a “Stewart” track at the conf :) Apparently people enjoyed my session last year… so there was a tendency to accept my sessions this year.

Patching your mission-critical email syncing software on your live setup… my OfflineIMAP patch for today

I’ve used OfflineIMAP for quite a while now. On the whole I’m fairly happy with it. Today I sent this to the list:

Forgive the potentially bad python, not my native tongue :)

This patch is motivated by three things:
- offlineimap is extremely slow at syncing lots of locally deleted
  messages
- offlineimap uses lots of memory
- LocalStatus files aren't written safely (a hard crash can cause
  corruption)
  - I've been bitten by this in the past, causing a complete resync of
    the folder... so I get duplicate messages.

I am currently using 4.0.14 (from Debian) with this patch. I used it to
convert the files and everything. Seems quite reliable and quick.

In my tests, execution time for a normal sync is relatively the same.

Execution time for when lots of messages have been deleted in a
reasonably sized folder (e.g. during re-organisation of mail folders) is
as much as 10x faster.

In my tests, running with 1 thread uses as much as 20% less memory with
this patch (i.e. about 160MB instead of 200MB+ for my maildir)

Disk space used by the LocalStatus files isn't much more... for me it
looks like it's 6.5MB now versus 4.5MB then. We get the added benefit of
indexes for all our queries... nice :)

I had to disable the threading for copying messages as this means that
LocalStatus objects are shared between threads, which pysqlite doesn't
like (it asserts).

I think the part of this patch that implements the uidexists does
actually slow things down compared with having the messagelist.... a
more optimal implementation may be possible, but I think the other speed
improvements (and memory savings) are worth it.

A future patch may convert other storage types to sqlite (or similar) to
further reduce memory consumption (and hopefully runtime).

This does add a dependency on pysqlite... which is packaged in debian
(and ubuntu) - and i'm using the stock packages for these.

Comments very much appreciated.

Of course, the patch is here. I’m using it now… although I’ll warn you that it does update your .offlineimap to a new format (and doesn’t provide a way to go back, short of restoring the backed-up LocalStatus files and probably getting message duplicates).

So, those around the MySQL circles I tend to hang around may ask “Why not libmysqld?” (the embedded MySQL server). Well… a few reasons… sqlite is file-per-db (even though I’m essentially using file-per-table here), the python bindings are everywhere (and work), it’s tiny and crash safe.
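For the curious, here is a minimal sketch of the general idea, assuming a one-table-per-folder layout. It is not the actual patch: the `status` table, its columns and the helper names are invented for illustration, and it uses the `sqlite3` module from the Python standard library rather than the pysqlite package the patch depends on. Every write happens inside a transaction (so a hard crash shouldn't leave a half-written status file), and deletes and existence checks go through the primary key index instead of walking an in-memory message list.

```python
import sqlite3


def open_status(path):
    """Open (or create) a status database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        "uid INTEGER PRIMARY KEY, flags TEXT NOT NULL)"
    )
    return conn


def save_messages(conn, messages):
    """Insert or update (uid, flags) pairs in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)", messages
        )


def delete_messages(conn, uids):
    """Delete many messages in one transaction, each via the primary key."""
    with conn:
        conn.executemany("DELETE FROM status WHERE uid = ?", [(u,) for u in uids])


def uid_exists(conn, uid):
    """Indexed lookup instead of keeping the whole message list in memory."""
    row = conn.execute("SELECT 1 FROM status WHERE uid = ?", (uid,)).fetchone()
    return row is not None


conn = open_status(":memory:")  # use a real file path for persistent status
save_messages(conn, [(1, "S"), (2, ""), (3, "ST")])
delete_messages(conn, [2, 3])
remaining = [row[0] for row in conn.execute("SELECT uid FROM status ORDER BY uid")]
print(remaining)
```

Batching all the deletes into one transaction is the kind of thing that helps when a folder has had lots of messages removed locally.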

You may also ask “Why?”… well, I’ve been re-organising a bunch of mail folders, which means deleting a *lot* of messages from some folders (and moving them to others).. offlineimap has been really slow at this. So I fixed it, with code (not whining).

I also wrote a bit-of-a-hack Perl script to remove duplicate messages from a bunch of folders (a bug in offlineimap had caused me to get several copies of each message in a bunch of my folders a while ago). So that script is here. Commented out are bits to do comparison via md5 as well as message-id. Don’t use unless you know what you’re doing… it may also use a few hundred MB of RAM on a large (few hundred thousand messages) folder.
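The approach is roughly the following, sketched here in Python rather than the Perl of the real script (so the function names and sample messages are made up, and this only reports duplicates instead of deleting anything). Messages are keyed on their Message-ID header, with an optional md5 of the raw message as the alternative comparison.

```python
import hashlib
from email import message_from_bytes


def message_key(raw, use_md5=False):
    """Key a message by Message-ID, or by md5 of the raw bytes if asked to
    (or if the message has no Message-ID header at all)."""
    if use_md5:
        return hashlib.md5(raw).hexdigest()
    msgid = message_from_bytes(raw).get("Message-ID")
    return msgid.strip() if msgid else hashlib.md5(raw).hexdigest()


def find_duplicates(messages, use_md5=False):
    """messages: iterable of (name, raw bytes). Returns names of later copies."""
    seen = set()
    dupes = []
    for name, raw in messages:
        key = message_key(raw, use_md5)
        if key in seen:
            dupes.append(name)
        else:
            seen.add(key)
    return dupes


# Hypothetical sample data standing in for files in a maildir folder.
sample = [
    ("a", b"Message-ID: <1@example>\r\nSubject: hi\r\n\r\nbody\r\n"),
    ("b", b"Message-ID: <1@example>\r\nSubject: hi\r\n\r\nbody\r\n"),
    ("c", b"Message-ID: <2@example>\r\nSubject: yo\r\n\r\nbody\r\n"),
]
dupes = find_duplicates(sample)
print(dupes)
```

Memory use grows with the number of distinct keys held in `seen`, so it climbs on very large folders.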

Hopefully these will help improve my productivity.
Now, back to my regular programming….