Which is bigger: MySQL or PostgreSQL?

From my previous posts, we have some numbers (excluding NDB) for the size of MySQL, so what about PostgreSQL? Here, I used PostgreSQL git trunk and classing things in the contrib/ directory as plugins. I put the number of lines of code in the src/backend/storage directory down as storage engines LoC but did not count it as non-kernel code.

Version Total LoC Plugin LoC Storage Engines LoC Remaining (kernel)
MySQL 5.5.30 858,441 2,706 171,009 684,726 (79% kernel)
MySQL 5.6.10 1,049,344 29,122 236,067 784,155 (74% kernel)
MariaDB 5.5 1,142,118 11,781 304,015 826,322 (72% kernel)
Drizzle trunk 334,810 31,150 130,727 172,933 (51% kernel)
PostgreSQL trunk 648,691 61,934 17,802 586,757 (90% kernel)

What we can see is that the PostgreSQL kernel size is actually smaller than any recent MySQL version (5.1 was slightly smaller). This is rather interesting as it is generally thought that PostgreSQL does more than MySQL. What’s more telling is that total code size, PostgreSQL is about half of MySQL 5.6 or MariaDB 5.5. Only Drizzle ends up being smaller, which makes sense as it “does less”.

Is MySQL bigger than Linux?

I’m going to take the numbers from my previous post, MySQL Modularity, Are We There Yet? for the “kernel” size of MySQL – that is, everything that isn’t a plugin or storage engine.

For Linux kernel, I’m just going to use the a-bit-old git tree I have on my laptop. I’ve decided that the following directories are for “plugins” drivers/ arch/ sound/ firmware/ crypto/ usr/ virt/ tools/ scripts/ fs/*/* and everything else is core kernel code.

Version Total LoC Total Plugin LoC Remaining (kernel)
MySQL 5.6.10 1,049,344 265,189 784,155 (74% kernel)
MariaDB 5.5 1,142,118 315,796 826,322 (72% kernel)
Linux 9,983,269 8,824,121 1,159,148 (11% kernel)

The scary thing is that it’s surprisingly close, MySQL/MariaDB core is roughly 68-71% the  size of the Linux kernel. This is probably an unfairly large number for Linux too as there’s much more of Linux that is pluggable and modular… so I actually suspect they’re closer to exactly the same size.

If we look at the net/ directory in linux, it’s a grand total of 493,000 lines of code, all of which is fairly modular and independent. You could, quite reasonably, claim that the core of Linux is in fact closer to half a million lines of code than a million, making MySQL significantly larger.

So how many engineers are looking after each code base? We know there are over a thousand Linux kernel developers contributing to each release (e.g. https://lwn.net/Articles/395961/ for data back in 2010, and https://lwn.net/Articles/537110/ for Feb 2013).

I’m now going to fudge some things to attempt to work out how many “developers” are working on linux core code rather than drivers and arch specific things. I work out there’s probably about 20-25% of linux developers who work on things that are not drivers, filesystems or arch code. This is around 250-300 developers for each kernel release.

So… how many people have ever committed code to MySQL? This is fairly easy to find out: I simply looked at the entire bzr history, grepped out every committer and then uniqued the list (this required more than just sort -u as people used different email addresses and names). How many people have ever committed code to MySQL (i.e. their code can be found in the MySQL 5.6 bzr tree)? 312.

How many committers to MySQL 5.6 are there? 161. This is pretty amazing, that’s about half of what the total is. However, this number is misleading. For example, my name is there and the last commit to the MySQL tree from me was in 2008. You also see names such as Monty Taylor and Kristian Nielsen – all three of us not having worked for MySQL/Sun/Oracle for a great number of years. At the very least, there’s been a lot of code integration into MySQL 5.6 from many existing sources that were not previously in MySQL trunk.

MySQL modularity, are we there yet?

MySQL is now over four times the size than it was with MySQL 3.23. This has not come in the shape of plugins.

Have we improved modularity over time? I decided to take LoC count for plugins and storage engines (in the case of Drizzle, memory, myisam and innobase are storage engines and everything else comes under plugin). I’ve excluded NDB from these numbers as it is rather massive and is pretty much still a separate thing.

Version Total LoC Plugin LoC Storage Engines LoC Remaining (kernel)
MySQL 3.23.58 371,987 0 (0%) 176,276 195,711 (52% kernel)
MySQL 5.1.68 721,331 228 237,124 483,979 (67% kernel)
MySQL 5.5.30 858,441 2,706 171,009 684,726 (79% kernel)
MySQL 5.6.10 1,049,344 29,122 236,067 784,155 (74% kernel)
MariaDB 5.5 1,142,118 11,781 304,015 826,322 (72% kernel)
Drizzle trunk 334,810 31,150 130,727 172,933 (51% kernel)

I’ve used the non-plugin and non-storage engine code size to be the database “kernel” – i.e. the core of the database server.

What I find really interesting here is that yes, the amount of code that is to some degree modular has increased. The amount of code that is a MySQL plugin is still very small compared to the server size

Drizzle is 20-25% of the size of a modern MySQL or MariaDB server and for many applications does largely or exactly the same thing.

Other MySQL branch code sizes

Continuing on from my previous posts, MySQL code size over releases and MariaDB code size I’ve decided to also look into some other code branches. I’ve used the same methodology as my previous few posts: sloccount for C and C++ code only.

There are also other branches around in pretty widespread use (if only within a single company). I grabbed the Google, Facebook and Twitter patches and examined them too, along with Percona Server 5.1 and 5.5.

Codebase LoC (C, C++) +/- from MySQL
Google v4 patch 5.0.37 970,110 +26,378 (from MySQL 5.0.37)
MySQL@Facebook 1,087,715 +15,768 (from MySQL 5.1.52)
Twitter 5.5.29.t10 1,192,718 +3,624
Percona Server 5.1 trunk 1,066,418 +14,878 (from MySQL 5.1.66)
Percona Server 5.5 trunk 1,208,577 +19,483 (from MySQL 5.5.29) +142,159 (from PS 5.1)
Drizzle trunk 334,810

The Google patch has always had a reputation of being large, and with an extra 26kLOC of code, it certainly is the biggest of any of the more current branches – and that’s actually a surprise to me that it adds this much code.

The Facebook and Percona Server 5.1 branches are amazingly similar in how much extra code they add, and they’re not carbon copies of each other. The Twitter patch quite notable for how little extra code it adds.

For giggles, I included Drizzle – which is (even with all the plugins) less than a third of the size of MySQL 5.1.

It’s clear that the Percona Server and Facebook patches introduce much less code than MariaDB does, which does go with the general wisdom of them being closer to Oracle MySQL than MariaDB is.

If we look at Percona Server, we see that with Percona Server 5.5 there is indeed a bunch more code than was in Percona Server 5.1, with roughly 5,000 more lines of code than we’d expect from a simple port from MySQL 5.1 to MySQL 5.5. This feels about right, we’ve added new things to Percona Server 5.5 that weren’t in Percona Server 5.1.

MariaDB code size

Continuing on from my previous post, MySQL code size over releases.

I wanted to look at the different branches/patch sets of MySQL out there and work out how far from upstream they deviated. I’m just going to compare against whatever upstream version the most easily accessible version is based on (be it 5.0.x, 5.1.x or whatever).

For MariaDB versions, I removed innodb_plugin and replaced it with xtradb for stats purposes as the MariaDB innodb_plugin is essentially the same as upstream and I don’t want to artificially inflate the diff size.

The first three major versions of MariaDB were all based on MySQL 5.1. I used sloccount and only counted C and C++ code.

So, let’s look at some of the MySQL patch sets/branches that are around. Firstly, let’s look at MariaDB:

Codebase LoC (C, C++) +/- from MySQL +/- from prev maj Version
MariaDB 5.1 1,210,168 +157,532 0
MariaDB 5.2 1,227,434 +174,798 +17,266 (since MariaDB 5.1)
MariaDB 5.3 1,264,995 +212,359 +37,561 (since MariaDB 5.2)
MariaDB 5.5 1,377,405 +187,658 (from MySQL 5.5) +112,410 (since MariaDB 5.3)

From my previous post on lines of code in MySQL versions, we learned that with MySQL 5.6 we saw a 354kLOC increase over MySQL 5.5. What is quite surprising is how close some of the MariaDB differences are to this. With MariaDB 5.5, we’re looking at a 187kLOC difference, which is roughly two thirds that of MySQL 5.6. What’s also interesting is that each incremental MariaDB release has not added nearly as much code as the MySQL 5.1 to 5.5 and 5.5 to 5.6 jumps did.

MariaDB LoC over major versions

The MariaDB code size has also been increasing, if we look at the graph above  you can really see the jump in code size over the past few releases.

If we look at the delta between MariaDB and MySQL, the first MariaDB release (MariaDB 5.1) was certainly a large jump. Each incremental MariaDB release (5.2 and 5.3) have been a smaller delta than the initial one. With MariaDB 5.5 we actually decrease the delta from MySQL, which is something that’s interesting to look at.

If we were going a straight port of MariaDB 5.3 to be based off MySQL 5.5, we’d expect the delta to be around 137kLOC (what MySQL 5.1 to 5.5 is) but it isn’t. The difference to MariaDB 5.5 from MariaDB 5.3 is only ~112kLOC, and the on the whole delta decreases.

But what makes up this big initial jump for MariaDB? Let’s look at some of the MariaDB 5.1 only modules and what’s left:

MariaDB 5.1 component LoC (MariaDB 5.1)
PBXT 45,107
FederatedX 3,076
IBM DB2i 13,486
Total 61,669
Other 95,863

So the MariaDB delta is not increase just because they included some existing modules, there’s more code in there, about as much as any major MySQL version bump.

Tomorrow we look at other MySQL branches, and we see that the MariaDB delta truly is significantly larger than any other MySQL branch.

MySQL code size over releases

As the start of a bit of a delve into the various MySQL branches and patch sets that have been around, let’s start looking at the history of MySQL itself. This is how big MySQL has been over all of the major releases since the beginning (where beginning=3.23). (edit: These numbers were all gathered using sloccount and only counting C++ and C source files.)

Codebase LoC (C, C++) +/- from previous MySQL
MySQL 3.23.58 371,987 0
MySQL 4.0.30 368,695 -3,292 (from MySQL 3.23)
MySQL 4.1.24 859,572 +490,877 (from MySQL 4.0)
+174,352 excluding NDB
MySQL 5.0.96 916,667 +57,095 (from MySQL 4.1)
MySQL 5.1.68 1,052,636 +135,969 (from MySQL 5.0)
MySQL 5.5.30 1,189,747 +137,111 (from MySQL 5.1)
MySQL 5.6.10 1,544,202 +354,455 (from MySQL 5.5)
increase in MySQL source code size over version

MySQL code size over major versions

Note the sharp increase in 5.6

LoC Delta for major MySQL release

We can see that MySQL has had some interesting code size changes over time, the big jump in 4.1 over 4.0 was mostly due to the introduction of MySQL Cluster, but even so, it was a big jump.

MySQL 5.6 is the largest MySQL code size increase in a MySQL version ever. The last time we saw anything like this was with the merging of MySQL Cluster in 4.1. At the very least, Oracle is paying people to write lines of code to extent that nobody has before.

unireg.h is finally gone

I got rid of unireg.cc way back in 2009 as I rewrote all the FRM related code inside Drizzle to instead use a nice protobuf based structure. If you’re wondering what was there, I just quote this part of pack_screens() from unireg.cc in MySQL 5.6:

start_row=4; end_row=22; cols=80; fields_on_screen=end_row+1-start_row;

We have gradually pulled things out of unireg.h over the years too. But, let’s go back to ask the question “What is UNIREG?”. To answer that, I’m going to quote from something that was current back when MySQL 3.22 was the latest and greatest:

In 1979, he developed an in-house database tool called UNIREG for managing databases. Since 1979, UNIREG has been rewritten in several different languages and extended to handle big databases.

No doubt the definition of big has changed for most people since then. If we take what the MySQL 3.23 manual said:

Unireg is our tty interface builder, but it uses a low level connection to our ISAM (which is used by MySQL) and because of this it is very quick. It has existed since 1979 (on Unix in C since ~1986).

This is where the FRM file comes from, and likely where unireg.cc and unireg.h in MySQL  originated from. Since we didn’t want to have the limitations of FRM files in Drizzle, and since all code that mentioned UNIREG was generally not modern, we’ve been gradually removing bits or refactoring it since.

So, what was in unireg.h to begin with? I peeked into the MySQl 5.6 unireg.h to remind myself:

  • defines for libc5
  • defines for the MySQL thing that isn’t gettext
  • the PLUGINDIR define
  • some macros for errors
  • a bunch of SPECIAL_ prefixed defines for a bitfield which is a truly odd thing.
  • store_record(), restore_record(), cmp_record(), empty_record() macros
  • defines for flags to openfrm(), openprt() and openfrd() (the latter two not being present in MySQL)
  • defines used in relation to INFORMATION_SCHEMA tables and triggers
  • the BIN_LOG_HEADER_SIZE define
  • the DEFAULT_KEY_CACHE_NAME define

Soo… really a whole bunch of stuff that should never have been put there in the first place as most of that has clearly come in after it was MySQL.

But I’m getting sidetracked, the main point was that I looked at what was remaining in unireg.h inside Drizzle and it was just one thing: Code to abort the server in certain odd situations (that we’d completely rewritten so had no relation to unireg). So, I renamed it.

Renaming a file is a pretty minor change, but that hides all the work leading up to that point so that now I can go and explain Drizzle internals while making one less reference to UNIREG.

Now to go and remove the last tiny bit of unireg_check…

Fun with Coverity found bugs (Episode 1)

Taking the inspiration of  great series of blog posts “Fun with Bugs” (and not http://funwithbugs.com/ which is about both caring for and eating bugs), and since I recently went and run Coverity against Drizzle, I thought I’d have a small series of posts on bugs that it has found (and I’ve fixed).

An idea that has been pervasive in the Drizzle project (and one that I rather subscribe to) is that there is two types of correct: correct and obviously correct. Being obviously correct is much, much better than merely being correct.

The first category of problems that Coverity found was kind of interesting, there was a warning that data_file_name and index_file_name in class ha_myisam weren’t initialized in the ha_myisam constructor nor in any function that it calls. It turns out that this was basically because the code wasn’t exactly optimal, and these variables were used kind of oddly. In fact, in writing this blog post I went back and found that there’s a bunch of extra dead code and these should just be removed, along with the code that “used” them.

The historical use for data_file_name and index_file_name were that (in MySQL) you could specify different paths for MyISAM data and index files, so that the FRM ended up in the server datadir, the data file ended up some other place and the index file was off behind the sofa. Since MyISAM is used only for temporary tables in Drizzle, this is entirely not needed.

Another place where a similar bug was found by Coverity was in the SQLExecutor class of the json_server plugin. The _err variable wasn’t initialized in the constructor. After some careful auditing, I think this was actually a false positive as it was set to something before being used, but it was pretty simple to prevent future bugs by initializing it.

Two instances of the same warning, one just found a bunch of code to delete (rather useful) and the other is rather minor but may help someone in the future.

Coming up next: total embarrassment bugs.

Coverity scan for Drizzle

Coverity is a static analysis tool, which although proprietary itself does offer a free scanning service for free and open source software (which is great by the way, I totally owe people who make that happen a frosty beverage).

Prompted by someone volunteering to get MariaDB into the Coverity Scan, I realized that I hadn’t actually followed through with this for Drizzle. So, I went and submitted Drizzle. As a quick overview, this is the number of problems of each severity both projects got back:

Severity MariaDB Drizzle
High 178 96
Medium 1020 495
Low 47 52

I don’t know what MySQL may be, but it’d be great to see this out in the open too.

Impact of MySQL slow query log

So, what impact does enabling the slow query log have on MySQL?

I decided to run some numbers. I’m using my laptop, as we all know the currently most-deployed database servers have mulitple cores, SSDs and many GB of RAM. For the curious: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz

The benchmark is going to be:
mysqlslap -u root test -S var/tmp/mysqld.1.sock -q 'select 1;' --number-of-queries=1000000 --concurrency=64 --create-schema=test

Which is pretty much “run a whole bunch of nothing, excluding all the overhead of storage engines, optimizer… and focus on logging”.

My first run was going to be with the slow query log on. I’ll start the server with mysql-test-run.pl as it’s just easy:
eatmydata ./mysql-test-run.pl --start-and-exit --mysqld=--slow-query-log --mysqld=--long-query-time=0

The results? It took 18 seconds.

How long without the slow query log (starting with mysql-test-run.pl again, but this time without any of the extra mysqld options)? 13 seconds.

How does this compare to a Drizzle baseline? On a freshly build Drizzle trunk, using the same mysqlslap binary I used above, connecting via UNIX socket: 8 seconds.

New Jenkins Bazaar plugin release

I’ve just uploaded version 1.20 of the Bazaar plugin for Jenkins. This release is based on feedback from users and our experiences at Percona.

  • Do a lightweight checkout instead of a heavyweight checkout (if “Checkout” is enabled)
  • Fix bug: lightweight checkout “update” would always fail as bzr update didn’t accept a repository argument. Switch to using bzr update followed by bzr switch. This should massively improve performance for those not doing a full branch.
  • Remove “Clean Branch” advanced option (replaced with “Clean Tree” option)
  • Add a “Clean Tree” advanced option. This will run “bzr clean-tree –quiet –ignored –unknown –detritus”, preserving the .bzr directory but doing the equivalent of wiping the workspace (starting with a fresh slate). This should massively improve performance for projects that do not have a clean build.
  • Clarify that Loggerhead is the repository browser used by Launchpad, and have a complete example of how to configure it.

Finding out What’s Next at BarCampMel 2012 with Drizzle, SQL, JavaScript and a web browser

Just for the pure insane fun of it, I accepted the challenge of “what can you do with the text format of the schedule?” for BarCampMel. I’m a database guy, so I wanted to load it into a database (which would be Drizzle), and I wanted it to be easy to keep it up to date (this is an unconference after all).

So… the text file itself isn’t in any standard format, so I’d have to parse it. I’m lazy and didn’t want to leave the comfort of the database. Luckily, inside Drizzle, we have a js plugin that lets you execute arbitrary JavaScript. Parsing solved. I needed to get the program and luckily we have the http_functions plugin that uses libcurl to allow us to perform HTTP GET requests. I also wanted it in a table so I could query it when not online, so I needed to load the data. Luckily, in Drizzle we have the built in EXECUTE functionality, so I could just use the JavaScript to parse the response from the HTTP GET request and construct SQL to load the data into a table to then query.

So, grab your Drizzle server with “plugin-add=js” and “plugin-add=http_functions” in the config file or as options to drizzled (prefixed with –) and….

This simple one liner pulls the current schedule and puts it into a table called ‘schedule’:

SELECT EXECUTE(JS("function sql_quote(s) {return s ? '\"'+ s.replace('\"', '\\\"') + '\"' : 'NULL'} function DrizzleDateString(d) { function pad(n) { return n<10 ? '0'+n : n } return d.getFullYear()+'-'+pad(d.getMonth()+1)+'-'+pad(d.getDate())+' '+pad(d.getHours())+':'+pad(d.getMinutes())+':'+pad(d.getSeconds()) } var sql = 'COMMIT;CREATE TABLE IF NOT EXISTS schedule (start_time datetime, stage varchar(1000), mr2 varchar(1000), mr1 varchar(1000), duration int); begin; delete from schedule;' ; var time= new Date;var input= arguments[0].split(\"\\n\"); var entry = new Array(); var stage, mr2, mr1; for(var i=0; i < input.length; i++) { var p= input[i].match('^(.*?) (.*)$'); if(p) {if(p[1]=='Time') { time=new Date(Date.parse(p[2]));} if(p[1]=='Duration') { sql+='INSERT INTO schedule (start_time,stage,mr2,mr1,duration) VALUES (\"' + DrizzleDateString(time) + '\", ' + sql_quote(stage) + ', ' + sql_quote(mr2) + ',' + sql_quote(mr1) + ',' + p[2] + '); '; time= new Date(time.getTime()+p[2]*60*1000); stage= mr2= mr1= ''; } if(p[1]=='stage') {stage=p[2]} if (p[1]=='mr2') {mr2=p[2]} if (p[1]=='mr1') {mr1=p[2]} }}; sql+='COMMIT;'; sql", (select http_get('https://dl.dropbox.com/s/01yh7ji7pswjwwk/live-schedule.txt?dl=1'))));

Which you can then find out “what’s on now and coming up” with this query:

select * from schedule where start_time > DATE_ADD(now(), INTERVAL 9 HOUR) ORDER BY start_time limit 2\G
But it’s totally not fun having to jump to the command line all the time, and you may want it in JSON format for consuming with some web thing…. so you can load the json_server plugin and browse to the port that it’s running on (default 8086) and type the SQL in there and get a JSON response, or just look at the pretty table there.

Jenkins Bazaar plugin 1.19

I recently released a new version of the Bazaar plugin for Jenkins. This release was inspired by a problem we noticed at Percona. It is:

  • run “bzr revert” after a pull, as if you have a directory that is removed and re-added while having unknown files in said directory (e.g. build artifacts), you would end up in a very bad place (this is a BZR bug, so we work-around it with a “bzr revert”).

The update has already appeared in the Jenkins update centre, so you should already be able to upgrade to it.

New Jenkins Bazaar plugin release! 1.18

From the desk of your new Bazaar plugin for Jenkins maintainer, I give you Version 1.18.

This release has two good bug fixes:

  • UI fix for checkout option (JENKINS-12261)
  • Auto-recover from corrupt BZR branches (e.g. bzr branch/checkout killed at inopportune moment) by cleaning the workspace and trying again (this is now default behaviour, best used with the Jenkins SCM retry count feature being > 1)

We’ve been running the same code as this release at Percona for about 2 months now (the second bugfix was one I wanted to test first before submitting upstream). This is the big fix that fixed all our problems with using bazaar with Jenkins in a large deployment.

The other news? I’m now maintainer, and this is my first release.

The page on the Jenkins wiki is here:

and updates should come through the standard Jenkins channels as all the auto-foo happens.

Hacking the Jenkins BZR plugin

For Drizzle and for all of the projects we work on at Percona we use the Bazaar revision control system (largely because it’s what we were using at MySQL and it’s what MySQL still uses). We also use Jenkins.

We have a lot of jobs in our Jenkins. A lot. We build upstream MySQL 5.1, 5.5 and 5.6, Percona Server 5.1, Percona Server 5.5, XtraBackup 1.6, 2.0 and 2.1. For each of these we also have the normal trunk builds as well as parameterised ones that allow a developer to test out a tree before they ask for it to be merged. We also have each of these products across seven operating systems and for each of those both x86 32bit and 64bit. If we weren’t already in the hundreds of jobs, we certainly are once you multiply out between release and debug and XtraBackup being across so many MySQL and Percona Server versions.

I honestly would not be surprised if we had the most jobs of any user of the Bazaar plugin to Jenkins, and we’re probably amongst the top few of all Jenkins installations.

So, in August last year we discovered a file descriptor leak in the Bazaar plugin. Basically, garbage collection doesn’t get kicked off when you run out of file descriptors. This prevented us from even starting back up Jenkins until I found and fixed the bug. Good times.

We later hit a bug that was triggered in the parallel loading of jobs during startup. We could get stuck in an infinite loop during Jenkins starting that would just eat CPU and get nowhere. Luckily Jenkins provides a workaround: specify “-Djenkins.model.Jenkins.parallelLoad=false” as an argument and it just does it single threaded. For us, this solves that problem.

We were also hitting another problem. If you kill bzr at just the wrong time, you can leave the repository in not an entirely happy state. An initial branch can be killed at a time where it’ll think it’s a repository rather than a checkout and there’s a bunch of other weirdness (including file system corruption if you happen to use bad VM software).

The way we were solving this was to sometimes go and “clean workspace” on the jobs that needed it (annoying with matrix builds). We’d switched to just doing “clean tree” for a bunch of builds. The problem with doing a clean tree was that “bzr branch” to check out the source code could take a very long time – especially for Percona Server which is a branch of MySQL and hence has hundreds of megabytes of history.

We couldn’t use bzr shared repositories as we kept hitting concurrency bugs when more than one jenkins job was trying to do a bzr operation at the same time (common when matrix builds kick off builds for release and debug for example).

So.. I fixed that in the Jenkins bazaar plugin too (which should be in an upcoming release) and we’ve been running it on our Jenkins instance for the past ~2 months.

Basically, if we fail to check out the Bazaar tree, we wipe it clean and try again (Jenkins has a “retry count” for source checkouts). This is a really awesome form of self healing. Even if the bazaar team fixed all the bugs, we’d still have to go and get that new version of bzr on all our build machines – including ancient systems such as CentOS 5. Not as much fun as bashing your head into a vice.

After all of that, I seem to now be the maintainer of the Bazaar plugin for Jenkins as Monty pointed out I was using it a lot more than him and kept finding and fixing bugs.

Soooo… say hello to the new Jenkins Bazaar plugin maintainer, me.

Yes, I maintain Java code now. Be afraid. Be very afraid.

ZFS: could have been the future of UNIX Filesystems

There was a point a few years ago where Sun could have had the next generation UNIX filesystem. It was in Solaris (and people were excited), there was a port to MacOS X (that was quite exciting for people) and there was a couple of ways to run it on linux (and people were excited). So… instead of the fractured landscape of ext3, HFS+ and (the various variations of) UFS we could have had one file system that was common between all of the commonly used UNIX-like variants. Think of being able to use a file system on a removable drive that isn’t FAT and being able to take it from machine to machine (well… Windows would be a problem, but it always is).

There was some really great work done in OpenSolaris with integration between the file manager and ZFS snapshots (a slider bar to browse the history of a directory, an idea I’ve championed for over a decade now, although the Sun implementation was likely completely independently developed). The integration with the package manager was also completely awesome, crash safe upgrades!

However, all this is pretty much moot. Solaris is used by fewer people than ever, it’s out of OS X and BTRFS is going to take the place that ZFS could have held in the Linux world. So, unfortunately, ZFS is essentially dead. This is a shame…. it could have been something huge.

There is a story….

I have a friend who is fond of telling a story from way back in November 2008 at the OpenSQL camp in Charlottesville, Virgina. This was relatively shortly after we had announced to the public that we’d started something called Drizzle (we did that at OSCON) and was even closer to the date I started working on Drizzle full time (which was November 1st). Compared to what it is now, the Drizzle code base was in its infancy. One of the things we hadn’t yet sorted out was the rewrite of the replication code.

So, I had my laptop plugged into a projector, and somebody suggested opening up some random source file… so I did. It was a bit of the replication code that we’d inherited from MySQL. Immediately we spotted a bug. In fact, between myself and Brian I think we worked out that none of the error handling in this code path ever even remotely worked.

Fast forward a bunch of years, and recently I had open part of the replication code in MySQL 5.5 and (again) instantly spotted a bug. Well.. the code is correct in 2 out of 3 situations…

It is rather impressive that the MySQL Replication team has managed to add the features they have in MySQL 5.6.

I’m also really happy with what we managed to do inside Drizzle for replication. Ripping out all the MySQL legacy code was a big step to take, and for a while it seemed like possibly the wrong one  – but ultimately, it was incredibly the right thing to do. I love going and looking at the Drizzle replication code. I simply love it.

Sessions at the Percona Live MySQL Conference that interest me

For the past many years, there’s been a conference in April, at the Santa Clara Convention Centre where the topic has been MySQL and the surrounding ecosystem. The first year I went, I gave a talk on the new features in MySQL Cluster 5.1 to a overflowing room of attendees. For me, it’s an event that’s mixed with speaking about something I’ve been working on and talking to other attendees about everything from how a particular part of the server works to where we can escape to for nearby good vegan food.

So, I thought I’d share some of the sessions that I’m really looking forward to. My selection is probably atypical, but may be interesting to others. I’m not going to list the keynotes, although they are often of a lot of value. I’m also going to attempt to avoid listing a few really awesome well known speakers simply because there are other really interesting sessions that also need exposure!

  • Starring Sakila: Building Data Warehouses and BI solutions using MySQL and Pentaho
    I need to base decisions off data, not simply a gut feeling (I’m not Stephen Colbert after all). I ran into a bunch of stumbling blocks when trying to work with Pentaho a couple of weeks ago, and I’m really hoping that this session shines some light on how to use it to better and more easily make arguments based on evidence to others in the company.
  • Testing MySQL Databases: The State Of The Art
    I’ve worked with Patrick for several years now, and he’s currently a valuable member of my team at Percona. For those who are interested in the state of the art of open source database testing, this is the session to be in.
  • Getting InnoDB Compression Ready for Facebook Scale
    This session is on at the same time as I’m speaking, so I probably won’t be able to attend (people keep coming to my sessions so I usually can’t sneak out). I’m really interested in how they’ve modified the compression code to help with their (large) workload.
  • Backing Up Facebook
    I hear that Facebook has a couple of database servers, a few dozen users and a few floppy disks full of data. This should be a fun story :)
  • Introducing XtraBackup Manager
    Being responsible for XtraBackup development at Percona, the XtraBackup topics really interest me. Lachlan has been working on a simple backup manager for XtraBackup to help create something that is a more complete backup solution than a tool which simply creates a backup.
  • Extending Xtrabackup – A Point-In-Time System
    Another good case of using XtraBackup as part of a comprehensive backup strategy. I have to be honest, I’m looking for ways in which we can improve XtraBackup to better fit the needs of people. It may be that there are a few small things we can do to make it easier for people do deploy and use.
  • Getting Started with Drizzle 7.1
    We’re about to do the 7.1 release of Drizzle! If you’re interested in having a SQL database that is designed to be used in large scale web applications and cloud environments, come along to this talk.
  • MySQL Idiosyncrasies That Bite
    I have to admit, I’m interested in Ronalds talk here to basically ensure we didn’t miss fixing anything in Drizzle. I do promise not to at any point yell out “Fixed in Drizzle” though.

Go here to register: http://www.percona.com/live/mysql-conference-2012/ (early bird pricing and discounted hotel rooms end March 12th, so you want to register sooner rather than later).

Drizzle Day 2012

Henrik has already posted it over on the Drizzle Blog, but I thought I’d give a shout out here too.

We’re holding a Drizzle Day right after the Percona Live MySQL Conference and Expo in April. So, since you’re all like me and don’t book your travel this far in advance, it’ll be easy to stay for the extra day and come and learn awesome things about Drizzle.

I’m also pretty glad that my employer, Percona is sponsoring the event.