popcon-historical: a tool for monitoring package popularity in Debian/Ubuntu

I’ve just uploaded (where ‘just’ is defined as “a little while ago”) popcon-historical to GitHub. It’s a rather rudimentary way to look at the popcon data from Debian and Ubuntu over time. It loads all the data into a Drizzle database and then has a small Perl web app to generate graphs (and CSV).

Github: https://github.com/stewartsmith/popcon-historical

I’ve also put up a project page on it: https://flamingspork.com/popcon-historical/

An example graph is this one of Percona Toolkit vs Maatkit installs in Ubuntu over time:

You can get it to graph any package – unlike the graphs on debian.org, the package does not have to be in the Debian archive to be graphed over time; it can be a package from third-party repos.

“We open source it, and then developers show up and do work for free”

Those who have been around the free and open source software world long enough have heard “We open source it, and then developers show up and do work for free” at least once and have called bullshit on it at least once.

It turns out that people don’t go and work on software for free. They are either modifying software to scratch their own itch (in which case they’re getting 99+% of the code for nothing, so contributing a small bit back is the equivalent of paying for it – with their time rather than money) or it’s a good bit of fun.

So why do software projects that are dual licensed with a commercial license get fewer outside contributions? I think it’s quite simple: people don’t tend to spend their spare time making other people money while making none for themselves. Simply, these projects are left with only contributions from those being paid to work on it (usually by the company who sells the commercial license) and people/companies scratching an itch. Projects that aren’t dual licensed are more likely to have contributors from several companies as then it’s not all-but-one company spending time and money to make another company money.

Stewart’s dot twenty rule

I realised I haven’t written on this for a while and I was asked about it again today.

Stewart’s dot twenty rule is that a piece of software is never really mature until a dot twenty release.

This was a variant of “never use a dot zero release” which has been around the industry for a long time (i.e. always wait for X.0.1).

My first written observation on my variant on this rule was back in 2006:

This is a really stupid metric of software maturity. It is, however, disturbingly accurate.

It seems to continue to be both really stupid and disturbingly accurate. The first few point releases are still going to have rough edges. Once you get to about dot five you likely have something that’s intensely usable for a good number of people; by dot ten the more complex use cases should start to be okay; and once you get to dot twenty, you could say it’s mature.

A topic for another time is how releasing often is one thing but maintaining a release is quite another.

An argument for popcon

There is a package called popularity-contest that’s available in both Debian and Ubuntu (and likely other Debian derivatives). It grabs the list of packages installed on the machine and submits it to the Debian or Ubuntu popularity contests.

There you can see which are the most popular packages in Debian and Ubuntu. Unsurprisingly, dpkg, the package manager, is rather popular.

Why should you enable it? Popcon results give solid numbers on how many users you may have. Although the absolute numbers may not be too accurate, it’s a sample set, and if you examine the results over time you can start to get an idea of whether your software is growing in popularity or not.

But there’s something more than that: if you can prove that a lot of people are installing your software on Debian, then you’re likely going to be able to argue for more work time being spent on improving the packaging for Debian.

Quite simply, enabling popcon is a way to help people like me argue for more time being spent on making Debian better.

DevStack woes

DevStack is meant to be a three-step “get me an OpenStack dev environment” thing. You’re meant to be able to grab a fresh installation of something like Ubuntu 12.04 or Fedora, run “git clone $foo && cd devstack && ./stack.sh”, wait a while and then be able to launch instances.

This much does work.

What does not work is being able to ssh to those instances. The networking is completely and utterly broken. I have tried both Ubuntu and Fedora in a fresh VM (KVM, on an Ubuntu host) and have asked a variety of experts for help. No dice.

What I want to hear is how to actually get it going locally, in a VM.

At the moment I’m tempted to submit a pull request to the devstack website adding a 4th step of “muck around for a few days before giving up on ever being able to ssh into a launched instance as these instructions are wrong”.

Switching to Fedora from Ubuntu

I’ve run Ubuntu on my desktop (well… and laptop) since roughly the first release back in 2004. I’ve upgraded along the way, with reinstalls on the laptop limited to changing CPU architecture and switching to full disk encryption.

Yesterday I wiped Ubuntu and installed Fedora.

Prior to Ubuntu, I ran Debian – Debian unstable on my desktop/laptop, and Debian stable on any machines that had to sit there, “just work” and were largely headless. Back then Debian stable just didn’t have a remotely recent enough kernel, X or desktop apps to really be usable with any modern hardware. The downside was that keeping an IRC client open to #debian-devel and reading the topic to find out whether sid (the codename for the unstable distribution) was currently safe to upgrade to was pretty much a requirement if you ever thought about running “apt-get dist-upgrade”. This was exactly the kind of system that you wouldn’t give to non-expert family members as a desktop and expect them to maintain.

Then along came Ubuntu. The basic premise was “a Debian derived distribution for the desktop, done right.” Brilliant. Absolutely amazingly brilliant. This is exactly what I wanted. I had a hope that I’d be able to focus on things other than “will dist-upgrade lead to 4 hours of fixing random things only to discover that X is fundamentally broken” and a possibility that I could actually recommend something to people who weren’t experts in the field.

For many years, Ubuntu served well. Frequent updates and relatively flawless upgrades between releases. A modern desktop environment, support for current hardware – heck, even non-computer-literate family members started applying their own security updates and even upgrading between versions!

Then, something started to go awry…. Maybe it was when Ubuntu shipped a kernel that helpfully erased the RAID superblock of the array in the MythTV machine… Maybe it was when I realized that Unity was failing as a basic window manager and that I swore less at fvwm…. Or maybe it was the bug I had open for about 14,000 years about how, if you left a Unity session logged in for too long, all the icons in the dock would end up on top of each other at the top left of the screen, making it rather obvious that nobody working on Ubuntu actually managed to stay logged in for more than a week. Or could it be that on the MythTV box and on my desktop, having the login manager start (so you can actually log in to a graphical environment) is a complete crapshoot, with the MythTV box never managing it (even though it is enabled in upstart… trust me).

I think the final straw was the 13.04 upgrade. Absolutely nothing improved for me. If I ran Unity I got random graphics artifacts (a pulldown menu would remain on the screen) and with GNOME3 the unlock-from-screensaver screen was half corrupted and often just didn’t show – just type in your password and hope you’re typing it into the unlock screen and it hasn’t just pasted it into an IM or Twitter or something. Oh, and the number of times I was prompted for my WiFi network password, despite it having been saved in the keyring for AT LEAST TWO YEARS, was roughly equivalent to the number of coffee beans in my morning espresso. The giant regressions in graphics further removed any trust I had that Mir may actually work when it becomes the default(!?) in the next Ubuntu release.

GNOME3 is not perfect… I have to flip a few things in the tweak tool to have it not actively irritate me but on the whole there’s a number of things I quite like about it. It wins over Unity in an important respect: it actually functions as a window manager. A simple use case: scanning photos and then using GIMP to edit the result. I have a grand total of two applications open, one being the scanning software (a single window) and the other being the GIMP. At least half the time, when I switch back to the scanning program (i.e. it is the window at the front, maximized) I get GIMP toolbars on top of it. Seriously. It’s 2013 and we still can’t get this right?

So… I went and tried installing Fedora 19 (after ensuring I had an up to date backup).

The install went pretty smoothly. I cheated and found an external DVD drive and used a burnt DVD (this laptop doesn’t have an optical drive and I just CBF finding a suitably sized USB key and working out how to write the image to it correctly).

The installer booted… I then managed to rather easily decrypt my disk and set it to preserve /home and just format / and /boot (as XFS and ext3 respectively) and use the existing swap. Brilliant – I was hoping I wouldn’t have to format and restore from backup (a downside to using Maildir is that you end up with an awful lot of files). Install was flawless, didn’t take any longer than expected and I was able to reboot into a new Fedora environment. It actually worked.

I read somewhere that Fedora produces an initramfs that is rather specific to the hardware you’re currently running on, which just fills me with dread for my next hardware upgrade. I remember switching hard disks from one Windows 98 machine to another and it WAS NOT FUN. I hope we haven’t made 2013 the year of Windows 98 emulation, because I lived through that without ever running the damn thing at home and I don’t want to repeat it.

Some preferences had to be set again; there’s probably some incompatibility between how Ubuntu does things and how Fedora does things. I’m not too fussed about that though.

I did have to go and delete every sign of Google accounts in GNOME Online Accounts as it kept asking for a password (it turns out that two-factor-auth on Google accounts doesn’t play too nice). To be fair, this never worked in Ubuntu anyway.

In getting email going, I had to manually configure postfix (casually annoying to have to do again) and procmail was a real pain. Why? SELinux. It turns out I needed to run “restorecon -r /home”. The failure mode was silent: no error anywhere. If I ran “setenforce 0” it would magically work, but I actually would like to run with SELinux: better security is better. It seems that the restorecon step is absolutely essential if you’re bringing across an existing partition.

Getting tor, polipo and spamassassin going was fairly easy. I recompiled notmuch, tweaked my .emacs and I had email back too. Unfortunately, it appears that Chromium is not packaged for Fedora (well… somebody has an archive, but the packages don’t appear to be GPG signed, so I’m not going to do that). There’s a complaint that Chromium is hard to package, blah blah blah, but if Debian and Ubuntu manage it, surely Fedora can. I use different browsers for different jobs and although I can use multiple instances of Firefox, they don’t show up as different instances in the alt-tab menu, which is just annoying.

It appears that the version of OTR is old, so I filed a bug for that (and haven’t yet had the time to build and package libotr 4.0.0 – but it’s sorely needed). The pytrainer app I use to look at the results from my Garmin watch was missing some dependencies (bug filed) and I haven’t yet tried to get the Garmin watch to sync… but that shouldn’t be too hard…

The speakers on my laptop still don’t work – so it’s somebody screwing up either the kernel driver or PulseAudio that makes the speakers only sometimes work for a few seconds and then stop (while the headphone port works fine).

On the whole, I’m currently pretty happy with it. We’ll see how the upgrade to Fedora 20 goes though…. It’s nice using a desktop environment that’s actually supported by my distribution and that actually remotely works.

An old note on the Storage Engine API

Whenever I stick my head into the MySQL storage engine API, I’m reminded of a MySQL User Conference from several years ago now.

Specifically, I’m reminded of a slide from an early talk by Paul McCullagh describing the development of PBXT. His recipe for “How to write a Storage Engine for MySQL” went something like this:

  1. Develop basic INSERT (write_row) support – INSERT INTO t1 VALUES (42)
  2. Develop full table scan (rnd_init, rnd_next, rnd_end)  – SELECT * from t1
  3. If you’re sane, stop here.

A lot of people stop at step 3. It’s a really good place to stop too. It avoids most of the tricky parts that are unexpected, undocumented and unlogical (yes, I’m inventing words here).
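
To make those first two steps concrete, here’s a tiny, self-contained sketch of the shape of that interface. It is deliberately a toy: the method names mirror the MySQL handler API, but the real thing deals in uchar row buffers, TABLE metadata, error codes and much more.

// A toy analogue of the "INSERT plus full table scan" surface described in
// the steps above. The method names mirror the MySQL handler API, but this
// is a self-contained sketch, not the real interface.
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

class ToyEngine {
public:
  // Step 1: basic INSERT support -- INSERT INTO t1 VALUES (42)
  int write_row(const std::string &row) {
    rows_.push_back(row);
    return 0; // 0 means success, as in the real API
  }

  // Step 2: full table scan -- SELECT * FROM t1
  int rnd_init() { scan_pos_ = 0; return 0; }

  int rnd_next(std::string *row_out) {
    if (scan_pos_ >= rows_.size())
      return -1; // the real API returns HA_ERR_END_OF_FILE here
    *row_out = rows_[scan_pos_++];
    return 0;
  }

  int rnd_end() { return 0; }

private:
  std::vector<std::string> rows_;
  std::size_t scan_pos_ = 0;
};

int main() {
  ToyEngine t1;
  t1.write_row("42");              // INSERT INTO t1 VALUES (42)

  std::string row;
  t1.rnd_init();                   // SELECT * FROM t1
  while (t1.rnd_next(&row) == 0)
    std::printf("%s\n", row.c_str());
  t1.rnd_end();
  return 0;
}

Everything past that toy surface – indexes, position()/rnd_pos(), transactions, locking – is where the fun begins, which is exactly what step 3 is warning you about.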

MySQL vs Drizzle plugin APIs

There’s a big difference in how plugins are treated in MySQL and how they are treated in Drizzle. The MySQL way has been to create a C API in front of the C++-like (I call it C- as it manages to take the worst of both worlds) internal “API”. The Drizzle way is to have plugins be first class citizens and use exactly the same API as if they were inside the server.

This means that MySQL attempts to maintain API stability. This isn’t something worth trying for. Any plugin that isn’t trivial quickly surpasses what is exposed via the C API and has to work around it, or it’s a storage engine, in which case you have this horrible mash of C and C++. The byproduct of this is that no core server features are being re-implemented as plugins, which means the API is being developed in a vacuum devoid of usefulness. At least, this was the case… The authentication plugin API seems to be an exception, and it’s interesting to note that semisync replication is in fact a plugin.

So times may be changing… sort of. Yesterday I noted that some storage engine API features are only available if you’re InnoDB, and I’ve already voiced my general disappointment in the audit API being unsuitable for implementing the various forms of query logging already in the server (general query log, slow query log).

One thing to note: when the API is the same for both inside the server and a plugin, it makes initial refactoring very easy, and you quickly see the bits that could be improved.
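
As a purely hypothetical illustration of what that buys you (this is not Drizzle’s actual plugin interface, just the shape of the idea): when the built-in behaviour and the plugin implement the same C++ interface, moving a core feature out into a plugin is mostly a matter of moving code, not designing and freezing a separate C shim.

// Hypothetical sketch (not Drizzle's real plugin API): the built-in query
// log and an out-of-tree plugin implement the same C++ interface, so there
// is no separate, frozen C API to design or work around.
#include <cstdio>
#include <memory>
#include <string>
#include <vector>

class QueryLogger {                              // one interface, used inside and outside the server
public:
  virtual ~QueryLogger() = default;
  virtual void log(const std::string &query) = 0;
};

class BuiltinGeneralLog : public QueryLogger {   // would live in the server itself
public:
  void log(const std::string &query) override {
    std::printf("general log: %s\n", query.c_str());
  }
};

class MyAuditPlugin : public QueryLogger {       // would live in a plugin .so
public:
  void log(const std::string &query) override {
    std::printf("audit plugin: %s\n", query.c_str());
  }
};

int main() {
  std::vector<std::unique_ptr<QueryLogger>> loggers;
  loggers.emplace_back(new BuiltinGeneralLog()); // both register the same way
  loggers.emplace_back(new MyAuditPlugin());
  for (const auto &l : loggers)
    l->log("SELECT 42");
  return 0;
}

The refactoring payoff comes from BuiltinGeneralLog and MyAuditPlugin being interchangeable from the server’s point of view – there’s only one API to get right.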

Some storage engine features you only get if you’re InnoDB

I had reason to look into the extended secondary index code in MariaDB and MySQL recently, and there was one bit that I really didn’t like.

MariaDB:

share->set_use_ext_keys_flag(legacy_db_type == DB_TYPE_INNODB);

MySQL:

use_extended_sk= (legacy_db_type == DB_TYPE_INNODB);

In case you were wondering what “legacy_db_type” actually does, let me tell you: it’s not legacy at all; it’s key to how the whole “metadata” system in MySQL works. For example, to drop a table, this magic number is used to work out which storage engine to call to drop it.

Now, these code snippets basically kiss goodbye to the idea of a “pluggable storage engine” architecture. If you’re not InnoDB, you don’t get to have certain features. This isn’t exactly MySQL or MariaDB encouraging an open storage engine ecosystem (quite the opposite really).
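
For contrast, here’s a sketch of what an engine-agnostic check could look like. The HA_SUPPORTS_EXTENDED_KEYS flag and everything around it are made up for illustration – the point is simply that the engine advertises the capability instead of the server checking for InnoDB by name.

// Hypothetical sketch only: instead of asking "are you InnoDB?", the server
// asks the engine whether it supports extended secondary keys via a
// capability bit in table_flags(). All names here are made up.
#include <cstdint>
#include <cstdio>

static const std::uint64_t HA_SUPPORTS_EXTENDED_KEYS = 1ULL << 40; // invented flag

struct handler_like {
  virtual ~handler_like() {}
  virtual std::uint64_t table_flags() const = 0;
};

struct some_engine : handler_like {
  // Any engine that implements the feature sets the flag itself.
  std::uint64_t table_flags() const override { return HA_SUPPORTS_EXTENDED_KEYS; }
};

// The server-side check, with no DB_TYPE_INNODB in sight.
static bool use_extended_sk(const handler_like &h) {
  return (h.table_flags() & HA_SUPPORTS_EXTENDED_KEYS) != 0;
}

int main() {
  some_engine e;
  std::printf("extended keys: %s\n", use_extended_sk(e) ? "yes" : "no");
  return 0;
}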

Having the MySQL server have this incredibly basic, busy and incomplete understanding of metadata has always been a bit of a mess. The code for reading a table definition out of the FRM file really does show its age, and has fingers all through the server.

If somebody was serious about refactoring server code, you’d certainly be looking here, as this code is a major source of arbitrary limitations. However, if you have the server and the engine(s) both having separate views of what is the “correct” state of metadata you end up with a mess (anyone who has had InnoDB be out of sync with FRMs knows this one). I worry that the FRM code will be replaced with something even less understandable by humans, again making the mistake that the server knows the state of the engine better than the engine does.

New libeatmydata release

Good news everyone! There’s a new libeatmydata release! I’ve put a source tarball up on the Launchpad page: release-79.

This version packs:

  • RPM and Debian packaging in tree
  • A bug fix so that O_SYNC and O_DSYNC are properly discarded on 32-bit machines, both with and without _FILE_OFFSET_BITS being set.
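
For anyone wondering how libeatmydata does its thing, here is a minimal sketch of the general LD_PRELOAD interposition technique. It is not the actual libeatmydata source (which also deals with open64(), fsync(), fdatasync(), sync(), msync() and friends, plus the 32-bit/_FILE_OFFSET_BITS cases this release fixes); it just shows the flag-stripping idea applied to open():

// Minimal sketch of the LD_PRELOAD technique (not the actual libeatmydata
// source): interpose open(), strip the synchronous-I/O flags, then hand the
// call to the real libc open found via dlsym(RTLD_NEXT, ...).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

extern "C" int open(const char *pathname, int flags, ...)
{
  // This is the whole trick: the kernel never sees O_SYNC or O_DSYNC.
  flags &= ~(O_SYNC | O_DSYNC);

  mode_t mode = 0;
  if (flags & O_CREAT) {
    va_list ap;
    va_start(ap, flags);
    mode = (mode_t) va_arg(ap, int); // mode arrives promoted through ...
    va_end(ap);
  }

  typedef int (*real_open_t)(const char *, int, mode_t);
  real_open_t real_open = (real_open_t) dlsym(RTLD_NEXT, "open");
  return real_open(pathname, flags, mode);
}

Build it as a shared object (something like “g++ -shared -fPIC -o sketch.so sketch.cc -ldl”) and run a program with LD_PRELOAD pointing at the result; every O_SYNC/O_DSYNC open becomes a plain buffered open.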

I’d love to hear any feedback and receive any patches (hopefully things still work well on Mac OS X and Solaris). So far, libeatmydata has had contributions from the following people – many thanks to them:

  • Stewart Smith
  • Alexey Bychko
  • Blair Zajac
  • Phillip Susi
  • Modestas Vainius
  • Monty Taylor
  • Olly Betts
  • Pavel Pushkarev
  • Elliot Murphy
  • Eric Wong
  • Tamas TEVESZ
  • Joachim Berdal Haga
  • Mohsen Hariri

The EXAMPLE storage engine

The Example storage engine is meant to serve mainly as a code example of the stub of a storage engine, for example purposes only (or so the code comment at the start of ha_example.cc reads). In reality, however, it’s not very useful. It likely was back in 2004, when it could be used as a starting point for some simple new engines (my guess would be that more than a few of the simpler engines started from ha_example.cc).

The sad reality is that the complex and non-obvious bits of the storage engine API you actually care about are documented in ha_ndbcluster.cc, ha_myisam.cc and ha_innodb.cc. If you’re doing something that isn’t already done by one of those three engines: good luck.

Whenever I looked at ha_example.cc I always wished there was something more behind it… basically hoping that InnoDB would get a better and cleaner API with the server and would use that, rather than the layering violations it currently uses to do the interesting stuff.

That all being said, as a starting point, it probably helped spawn at least a dozen storage engines.

The ARCHIVE Storage Engine

I wonder how much longer the ARCHIVE storage engine is going to ship with MySQL…. I think I’m the last person to actually fix a bug in it, and that was, well, a good number of years ago now. It was created to solve a simple problem: write once, read hardly ever. Useful for logs and the like. A zlib stream of rows in a file.
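
As a toy illustration of that “zlib stream of rows in a file” idea – plain zlib gz* calls here, nothing to do with azio or ARCHIVE’s real on-disk format:

// Toy sketch of "a zlib stream of rows in a file": append fixed-size rows to
// a gzip stream, then scan them back. Plain zlib, not azio, and nothing like
// ARCHIVE's actual on-disk format.
#include <cstdio>
#include <zlib.h>

struct Row { int id; char msg[60]; };

static void append_row(const char *path, const Row &row) {
  gzFile f = gzopen(path, "ab");                 // append-only, like ARCHIVE
  if (!f) return;
  gzwrite(f, &row, sizeof(row));
  gzclose(f);
}

static void scan_rows(const char *path) {        // the "read hardly ever" part
  gzFile f = gzopen(path, "rb");
  if (!f) return;
  Row row;
  while (gzread(f, &row, sizeof(row)) == (int) sizeof(row))
    std::printf("%d: %s\n", row.id, row.msg);
  gzclose(f);
}

int main() {
  Row r1 = { 1, "write once" };
  Row r2 = { 2, "read hardly ever" };
  append_row("t1.toy.gz", r1);
  append_row("t1.toy.gz", r2);
  scan_rows("t1.toy.gz");
  return 0;
}

Compile with -lz. Appends are cheap and sequential, and a full scan is the only read path in the toy – which is pretty much the ARCHIVE value proposition in miniature.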

You can actually easily beat ARCHIVE for INSERT speed with a non-indexed MyISAM table, and with things like TokuDB around you can probably get pretty close to its compression while at the same time having these things known as “indexes”.

ARCHIVE for a long time held this niche though and was widely and quietly used (and likely still is). It has the great benefit of being fairly lightweight – it’s only about 2500 lines of code (1130 if you exclude azio.c, the slightly modified gzio.c from zlib).

It also uses the table discovery mechanism that NDB uses: if you remove the FRM file for an ARCHIVE table, the ARCHIVE storage engine will extract the copy it keeps to replace it. You can also do consistent backups with ARCHIVE as it’s an append-only engine. The ARCHIVE engine was certainly the simplest example code of this and a few other storage engine API things.

I’d love to see someone compare storage space and performance of ARCHIVE against TokuDB and InnoDB (hint hint, the Internet should solve this for me).

The MySQL Cluster storage engine

This is one close to my heart. I’ve recently written on other storage engines: Where are they now: MySQL Storage Engines, The MERGE storage engine: not dead, just resting…. or forgotten and The MEMORY storage engine. Today, it’s the turn of MySQL Cluster.

Like InnoDB, MySQL Cluster started outside of MySQL. Those of you paying attention at home may notice a correlation between storage engines not written exclusively for MySQL and being at all successful.

NDB (for Network DataBase) started inside Ericsson, originally written in a language called PLEX, which was internal to Ericsson and used in the AXE telephone switches. Mikael Ronstrom’s PhD thesis covered NDB and even covered things that (at least at the time) were yet to be implemented (it’s been quite a few years since I last leafed through it). The project at Ericsson (IIRC) was shelved a couple of times, but eventually got spun out into an Ericsson Business Innovation company called Alzato.

Some remnants of PLEX can still be found in the NDB source code (if you look really hard that is). At some point the code was fed through a PLEX to C++ converter and development continued from there. Some of the really, really old parts of the source may seem weird either due to this or some hand optimization for SPARC processors in the 1990s.

In 2003, MySQL AB acquired Alzato and work on a storage engine plugin for MySQL to interface to the (C++ API only) NDB was underway. Seeing as the storage engine interface was so simple, easy and modular it would only take several years for the interface to NDB to become mature.

The biggest problem: NDB itself worked really well if your workload fit exactly what it was good at… if you deviated, horrific performance and/or crashes were not as uncommon as we’d have liked. This was a source of strain for many years with the developers and support team on one side and some of the less-than-careful sales team on the other. That being said, there have been some absolutely awesome sales people selling NDB into markets it truly fits, and this is why there’s barely a place in the world where placing a mobile phone call doesn’t go through MySQL Cluster at some point.

You should read Tomas Ulin’s post Celebrating 10 years @MySQL for a bit of an insight into how Alzato became part of MySQL AB (which later became part of Sun which became part of Oracle).

I joined the MySQL Cluster team at MySQL in December 2004, not too long after Alzato was acquired, but certainly when the NDB storage engine in MySQL 4.1 was in its very early stages – it was then by no means a general purpose database.

Over the years, MySQL Cluster gained both traction and features, making it useful for more applications. One of the biggest marketing successes of MySQL was the storage engine architecture and how you could just “plug in” different engines. The reality (of course) was far different and even though MySQL Cluster did just “plug in” to MySQL, it was certainly not a drop in replacement.

In MySQL 5.0, a bunch of neat new features were added:

  • Engine condition pushdown
    This enabled conditions on non-indexed columns to be evaluated on the data nodes rather than having every row pulled up to the SQL node to be evaluated.
  • Batched read interface
    So that queries like SELECT FOO FROM BAR WHERE A IN (1,2,3) were executed as a single network round trip rather than 3 round trips.
  • Query cache
    Although the query cache should die, hey, at least it worked with NDB now…. in a way.
  • Reduced IndexMemory usage
    Remember, NDB is an in-memory database, so saving a bunch of bytes for secondary indexes was a big thing.

The first release with things I really worked on was MySQL 5.1. My first talk (to a packed room) at the MySQL User Conference in 2006 was on new features in MySQL Cluster 5.1. I’m still quite proud of that talk, even though I know I am a much better speaker now than I was then (it would have been great to have had more guidance… but hey, learning from experience is good too).

We added a lot in 5.1:

  • Integration with replication
    This is where row-based replication was born. It was a real team effort: the NDB kernel part was (going from memory and bzr logs) written by Tomas, and Jonas seems to have a bunch of code there too. I worked a bunch on the NDB injector thread in mysqld, and Mats worked on the core row-based code (at the time the most C++-like code in the entire MySQL world). You could now have a cluster replicate to another cluster, with the giant bottleneck that is MySQL replication.
  • disk data
    You could store non-indexed columns on disk. I implemented the INFORMATION_SCHEMA.FILES table for this; I was young and naive enough to think that the InnoDB guys would also fill out this table and all would be happy with the world (I’m lucky I haven’t been holding my breath on this one).
  • Variable Sized columns
    A VARCHAR(255) would no longer always take up 255 bytes if you just stored a single character in it. The catch? Only for in-memory columns.
  • User defined partitioning
    Because NDB desperately needed more options, we let the user choose how they wanted to partition up their data (per table).
  • Autodiscovery of schema changes
    This was a giant workaround to the epic mess that is FRM files and data dictionary things inside the MySQL Server. It is because of all this code that when I went to rewrite the whole thing for Drizzle I took the approach of “just pass it down to the engines, the server must not attempt to know better”. FWIW, I’m still right: if the server tries to be clever you now have two places for bugs to be, not just one.
  • Distribution awareness
    i.e. better selection of which data node to talk to for a particular query, reducing latency.
  • Online add/drop index.
    How long did it take for other engines to get this? Let’s not think about that :)

After that, the really interesting stuff started to happen: the first major fork of MySQL, MySQL Cluster Carrier Grade Edition (CGE). Why? We had customers that simply couldn’t wait for MySQL 6.0 (after all, they’d still be waiting).

We had MySQL Cluster CGE 6.1, 6.2, 6.3 and now we’re into 7.0, 7.1 and 7.2. It is without doubt the longest serving and surviving MySQL fork. There were non-trivial changes inside the MySQL server too, which caused enough of a merge problem for the (small) Cluster team.

One big thing that you’re probably still all waiting for? Replication conflict detection and resolution in circular/multi-master replication setups. It was an NDB first and has been used in production for a decent amount of time.

I remember how a hack while on an airplane led to the CompressedBackup and CompressedLCP options (using zlib when writing out checkpoints/backups) – something that took more time than you’d think to go from prototype to production-ready code.

The last few things I worked on in MySQL Cluster before going to work full time on Drizzle were the Windows port, online add/drop node and NDBINFO.

I’ve left out so many cool MySQL Cluster things that were worked on over the years (e.g. online add/drop column, rewriting of the LCP code, micro GCPs, crash-safe DDL, the test suite). I really should mention the test suite: in lines of code it was over three times the size of MyISAM’s… and it was probably six years ago that I worked that out.

One thing to think about: when Innobase Oy was bought by Oracle and there was this effort to have a transactional storage engine inside MySQL AB rather than another company, I pointed out that I thought it would take less time to add the needed features to NDB and integrate it inside the MySQL server binary (and with the addition of online add node you could go from a standalone DB server to a full cluster with no downtime) than it would for any of the alternatives to get to a suitable level of maturity.

I wish I had put money on this… I did put money on the MySQL 5.1 GA release date (which I was happy to lose), but in the years since you can see that InnoDB still reigns supreme, with everything that came to replace it having fallen away for one reason or another. It’s still on track for MySQL Cluster to be the only real alternative (now also, funnily enough, owned by Oracle). I have to say it’s kind of a hollow victory though: it would have been nice to see Falcon and PBXT be serious players in today’s market.

A few points on talking about the internet

There are a few things you should keep in mind when talking about the internet:

  • Use of the word “cyber” is not cool.
  • Whenever you hear the word “cyber” substitute it with “Information Super-Highway”… yes, it sounds that dated.
  • Use of the word “cyber” is applicable only in discussions relating to Doctor Who.
  • Whenever you see “facebook.com/BLAH” just think “AOL Keyword”. If you don’t know what AOL was, I likely have 437 trial CDs you can have.
  • There is no differentiation between life and online-life – just about everything is internet connected now. This very much counts for speech vs online speech.

A reminder: Leah and I are running in the upcoming MS Fun Run raising money for Australians affected by Multiple Sclerosis (MS). If you don’t sponsor us you’re going to die poor and alone. Really, I have an arrangement with all known deities to make it happen. Sponsor us here: http://register.mswalk.org.au/2013-MS-Walk-and-Fun-Run-Melbourne/Meep

OMG we appear to have entered a run for MS…

We are team MEEP – as that is the noise Beaker the cockatiel, our team mascot, makes. Stewart is going for 10 km, Leah is planning for 5 km … and Beaker will enjoy having two tired people to sit on afterwards. :)

We are taking part in the 2013 MS Walk and Fun Run in order to raise funds for people affected by multiple sclerosis (MS). MS is the most common disease of the central nervous system and affects more than 23,000 Australians.

Did you know?

  • The average age of diagnosis of MS is just 30 years
  • MS affects three times as many women as men

MS Australia aims to minimize the impact of multiple sclerosis on all individuals affected by the disease, as well as their families, carers and the community, by offering a wide range of services, equipment and support. MS Australia’s goal is to assist everyone affected by MS to live life to their fullest potential and secure the care and support they need, until we ultimately find a cure.

You can sponsor us here: http://register.mswalk.org.au/2013-MS-Walk-and-Fun-Run-Melbourne/Meep

TokuDB

Big news at the Percona Live MySQL Conference and Expo this week: Tokutek open sourced TokuDB, making my previous post Where are they now: MySQL Storage Engines out of date in just a few days.

In this case, I really don’t mind. It’s rather exciting that they’ve gone ahead and done this – and it’s not just a code drop: https://github.com/Tokutek/ft-engine is where things are at, and recent commits were 2 and 18 hours ago, which means this is being maintained. This is certainly a good way to grow a developer community.

While being a MySQL engine is really interesting, the MongoDB integration is certainly something to watch – and may be something quite huge for Tokutek (after all, it’s the database engine parts of Mongo that are the most troublesome – along with the client library license).

A few notes on InnoDB in MySQL 5.7.1

I’ve started poking around the MySQL 5.7.1 source tree (although just from the tarball, as I don’t see a BZR tree yet). I thought I’d share a few thoughts:

  • InnoDB temporary tables. Not REDO logged. What does this mean? It’s a huge step in removing the dependency on the MEMORY and MyISAM engines for temporary tables used in query execution. With InnoDB temporary tables there is no reason for the MEMORY engine to continue to exist; there is absolutely no way in which it is better.
  • InnoDB temp tables aren’t insert buffered
    This probably doesn’t really matter as you’re not going to be doing REDO logging for them (plus they’re generally short-lived)… but it could be a future area for performance improvement.
  • The NO_REDO log mode appears to be implemented fairly neatly.
  • Improvements in InnoDB read-only mode. What does this mean? Maybe we can finally get rid of the oddity of compressed read-only MyISAM tables on read-only media (on the other hand, CDs and DVDs aren’t exactly a modern form of software distribution).
  • Some of the source code comments have improved… it’s getting easier to understand InnoDB. I’d still make the argument that if you need source code comments your code isn’t clear enough… but any step is an improvement. (That being said, InnoDB was always easier than the server.)
  • There is some pretty heavy refactoring of lock0lock.cc – I really need to sit down and poke at it a bunch.
  • The shared tablespace code (the InnoDB system tablespace) has been heavily refactored. This also introduces tablespaces for temporary tables – and it appears to be implemented in the correct way.

I need to look into things a bunch more, and it’ll be really useful to see a bzr tree to better understand some of the changes.

More to come later, but that’s my quick look.

The MEMORY storage engine

I recently wrote about Where are they now: MySQL Storage Engines and The MERGE storage engine: not dead, just resting…. or forgotten. Today, it’s the turn of the MEMORY storage engine – otherwise known as HEAP.

This is yet another piece of the MySQL server that sits largely unmaintained and unloved. The MySQL Manual even claims that it supports encryption… with the caveat of having to use the SQL functions for encryption/decryption rather than in the engine itself (so, basically, it supports encryption about as much as every other engine does).

The only “recent” innovation in the MEMORY engine was the dynamic row patch that ended up making its way into Percona Server (and isn’t enabled by default). This forced me to go and look at the code of the MEMORY engine again and I cannot possibly drink enough in my lifetime to erase the memory.

The MEMORY engine is used by just about everybody, as you probably have a SQL query somewhere that uses an in-memory temporary table. I can, however, feel the comments being added to this post right now by people who use gdb to set server variables and insist that not a single query in their systems uses MEMORY….. (IIRC there have been some patches around that would throw an error rather than create a temporary table)

We had an early version of the dynamic row format patch in Drizzle for a while… and if you turned it on, all sorts of things horrifically broke. It was a remarkably non-trivial amount of work to get that code to work properly, and this is largely a testament to the “design” of the MEMORY engine.

While it may be efficient or fast or something (likely on 1990s hardware and workloads), it misses the boat completely on the things that matter today: concurrent access, MVCC, BLOB/TEXT columns and transactions. Basically, it’s an engine that’s really only useful for a single connection in limited use cases… and even then, it’s likely a good way to ruin things. MyISAM is better, as at least under memory pressure things may be written out to disk sensibly… and if InnoDB had a “don’t log this table” mode it would beat the absolute pants off it.

It is, again, another part of the MySQL server that’s remarkably hard to pull out and replace with something different/better. Why? Well, I wrote about it before: Refactoring Internal temporary tables (another stab at it). If it was easy, we’d likely have Tokyo Cabinet (via BlitzDB) or similar (some bit of code maintained by other people) doing the same job in Drizzle rather than this large chunk of code that nobody really cares about.

The MERGE storage engine: not dead, just resting…. or forgotten.

Following on from my fun post on Where are they now: MySQL Storage Engines, I thought I’d cover the few storage engines that are really just interfaces to a collection of things. In this post, I’m talking about MERGE.

The MERGE engine was basically a multiplexer down to a number of MyISAM tables. They all had to have the same definition, there was no parallel query execution, and it saw fairly limited use. One of the main benefits was that you could actually put more rows in a table than your “files up to 2/4GB” file system allowed. With the advent of partitioning, this really should have instantly gone away and been replaced by it. It wasn’t.

It is another MySQL feature that likely exists due to customer demand at the time. It’s not a complete solution by any means; PARTITIONING is way more complete and universal… and much harder to get right inside the MySQL server – which is why MERGE exists. It was easier to write a storage engine that wrapped MyISAM than it was to have any form of partitioning in the server.

One advantage of MERGE tables is that you could parallelize myisamchk to repair your broken MyISAM tables after a crash. One step better than no crash safety is at least parallel recovery. The disadvantage being that you’re using MERGE and MyISAM tables.

There is also the great security problem of MRG_MYISAM (the other name for MERGE tables): if you create a MyISAM table t1 and a user is able to access it, they can create a MERGE table (say m1) that accesses t1; if you then revoke their access to t1, they’ll still be able to access t1 through m1.

MERGE still seems to exist in MySQL 5.6 without even a warning that it’ll go away… which I suspect it will. We long since got rid of it in Drizzle because, well, what you really want is a query rewrite engine that does views, partitioning, etc.

Can anyone think of a reason why you should still use MERGE tables in 2013? I can’t.