No implicit commit (on the road to transactional DDL)

A long time ago, in a time that can only serve to make some feel old and others older, MySQL didn’t support transactions. Each statement was executed as it went, there was no ROLLBACK (or COMMIT or crash recovery etc). Then there were transactions. Other RDBMSs implement auto_commit functionality, but for MySQL users, we think of it as the magic compatibility mode that (mostly) makes applications written for MyISAM magically work on InnoDB (okay, and making “you should use transactions” a really easy consulting gig :)

I’m currently working on finishing up a patch that removes the implicit COMMIT from DDL operations in Drizzle. Instead, you get an error message saying that Transactional DDL is not currently supported. I see a future where we have one of two situations (possibly depending on the storage engine): DDL supported within normal transactions, or DDL-only transactions (which cannot mix with DML). The latter (DDL-only transactions) I see as the option for InnoDB/HailDB.

Is your Storage Engine buggy or the database server?

If your storage engine returns an error from rnd_init (or doStartTableScan as it’s named in Drizzle) and does not save this error and return it in any subsequent calls to rnd_next, your engine is buggy. Namely it is buggy in that a) an error may not be reported back to the user and b) everything may explode horribly when rnd_next is called after rnd_init returned an error.
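A minimal sketch of the defensive pattern described above, assuming a heavily simplified cursor. The class name, the error codes and the two-row "table" are illustrative only; the method names are borrowed from the post but this is not the real Drizzle or MySQL handler interface.

```cpp
#include <cerrno>

// Hypothetical, simplified engine cursor (not the real Drizzle Cursor API).
class DefensiveCursor {
public:
  static constexpr int kEndOfFile = -1;  // illustrative end-of-scan marker

  // Start a scan. The failure is simulated; a real engine would fail for
  // reasons of its own (out of memory, missing file, ...).
  int doStartTableScan(bool simulate_failure) {
    scan_error_ = simulate_failure ? ENOMEM : 0;
    position_ = 0;
    return scan_error_;
  }

  // Even if the caller ignored the error above, report it again here
  // rather than touching scan state that was never set up.
  int rnd_next(int *row_out) {
    if (scan_error_ != 0)
      return scan_error_;
    if (position_ >= kRows)
      return kEndOfFile;
    *row_out = position_++;
    return 0;
  }

private:
  static constexpr int kRows = 2;  // pretend the table holds two rows
  int scan_error_ = 0;             // error saved from doStartTableScan
  int position_ = 0;
};
```

With this shape, a server that forgets to check the result of the scan start still gets the error back on the first rnd_next call instead of exploding.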

Unless it is running on MariaDB 5.2 or (soon, when the patch hits the tree) Drizzle.

Monty (Widenius, not Taylor) wrote a patch for MariaDB based on my bug report that addressed that problem. It uses the compiler feature that warns when the result of a function isn’t checked, to make sure that every place that calls rnd_init checks for an error from the engine.
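As a rough illustration of that compiler feature: GCC and Clang provide a warn_unused_result function attribute. The MUST_CHECK macro and the function names below are made up for this sketch and are not taken from the MariaDB patch.

```cpp
// GCC and Clang understand this attribute; on other compilers the macro
// expands to nothing and the check is simply not performed.
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

// Hypothetical stand-in for a handler-style call that can fail.
MUST_CHECK int start_table_scan(bool simulate_failure);

int start_table_scan(bool simulate_failure) {
  return simulate_failure ? 1 : 0;
}

// A caller that checks the result compiles cleanly.
int scan_table(bool simulate_failure) {
  int err = start_table_scan(simulate_failure);
  if (err != 0)
    return err;  // propagate the engine's error to the user
  // ... rnd_next-style calls would follow here ...
  return 0;
}

// Writing a bare `start_table_scan(false);` statement instead would make
// GCC emit a -Wunused-result warning, which -Werror turns into a build error.
```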

Today I (finally) pulled that into Drizzle as well.

So… if your engine does the logical thing and goes “oh look, this method returns an error… I’ll return my error” it will exhibit bugs in MySQL but not MariaDB 5.2 or Drizzle (when patch hits).

Which is buggy, the server or the engine?

The MySQL bug number is 54166, filed in June 2010.

MySQL 5.5 is GA and 5.5.8 missing from launchpad…

While it’s great that MySQL 5.5 is GA with the 5.5.8 release (you can download it here), I’m rather disappointed that the bzr repositories on launchpad aren’t being kept up to date. At time of writing, it looked like this:

Yep – nothing for five weeks in the 5.5 repo – nothing since the 5.5.7 release :(

There haven’t been zero changes either – the changelog has a decent number of fixes.

Limiting functions to 32k stack in Drizzle (and scoped_ptr)

I wonder if this comes under “Code Style” or not…

Anyway, Monty and I finished getting Drizzle ready for adding “-Wframe-larger-than=32768” as a standard compiler flag. This means that no function within the Drizzle source tree can use greater than 32kb stack – it’s a compiler warning – and with -Werror, it means that it’s a build error.

GCC is not perfect at detecting stack usage, but it’s pretty good.

Why have we done this?

Well, there is a little bit of recursion in the server… and we can craft queries to blow a small stack (not so good). On MacOS X, the default thread stack size is only 512kb. That does not allow many frames if a 32kb stack frame is even remotely common.

I found some interesting places to throw a lot of things on the stack too – that would be rather far down on a callchain – leading to the possibility of blowing up in really strange ways.

We’d love to make it 16kb…. but that’s a fair bit more work, so something for the future.

We’ve used the Boost scoped_ptr to address a bunch of these situations as it provides pretty much minimal code change for the same effect (except that memory is dynamically allocated instead of as part of the stack frame).
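A small sketch of the kind of change involved. To keep it buildable without Boost it uses std::unique_ptr as a stand-in for the Boost scoped_ptr/scoped_array mentioned above; the function name and the 64kb buffer size are invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

const std::size_t kBufferSize = 64 * 1024;  // illustrative, above the 32kb limit

// Before: the 64kb buffer lives in this function's stack frame, so
// -Wframe-larger-than=32768 would flag it.
//
//   std::size_t copy_name(const char *name) {
//     char buffer[kBufferSize];
//     ...
//   }

// After: only a pointer lives on the stack; the buffer is heap-allocated
// and released automatically when `buffer` goes out of scope.
std::size_t copy_name(const char *name) {
  std::unique_ptr<char[]> buffer(new char[kBufferSize]);
  std::strncpy(buffer.get(), name, kBufferSize - 1);
  buffer[kBufferSize - 1] = '\0';  // strncpy does not always terminate
  return std::strlen(buffer.get());
}
```

The body of the function barely changes, which is the appeal: the allocation moves off the stack and the cleanup stays automatic.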

Cursor states

Following on from my post yesterday on the various states of a Storage Engine, I said I’d have a go with the Cursor object too. A Cursor is used by the Drizzle kernel to get and set data in a table. There can be more than one cursor open at once, and more than one per thread. If your engine cannot cope with this, it is its responsibility to figure it out and return the appropriate errors.

Let’s look at a really simple operation, inserting a couple of rows and then reading them back via a full table scan.

Now, this graph is slightly incomplete as there is no doEndTableScan() call. But you can see in which order things are meant to happen. In this case, “store_lock()” means that store_lock() has been called, so when coming back from doInsertRecord() we do not call store_lock() again, rather, we’re just in a state where it has already been executed.

For MySQL handler, think ::write_row() for doInsertRecord() and ::rnd_init() for doStartTableScan().

This diagram was again auto-generated from my test engine.

Storage Engine API state graph

Drizzle still has a number of quirks inherited from the MySQL Storage Engine API (e.g. BLOBs, row buffer, CREATE SELECT and lack of DDL transaction boundaries, key tuple format). One of the things we fixed a long time ago was to have proper methods for StorageEngines to be called for: startTransaction, startStatement, endStatement, commit and rollback.

If you’ve had to implement a transactional storage engine in MySQL you will be well aware of the pattern of “in every Storage Engine/handler call: if transaction doesn’t exist, begin.” We’ve tried to fix this in the Drizzle API for a number of reasons. I think having this obvious set of calls will make the API a lot easier to understand. I am also very interested in making things much easier to prove correct.

A while ago I spotted Bug 587772, which was the READ COMMITTED isolation level not working correctly with InnoDB. It turns out that the most basic example for READ COMMITTED failed. Hrrm… this is no good. It worked on MySQL, so this was certainly something that we broke. What was more worrying is that there wasn’t a test for this in the test suite (and at the time I couldn’t find one in the MySQL test suite either, so I think we inherited the missing test).

I recently started delving in, actually going to solve this. I noticed something worrying, endStatement wasn’t being called, which is where the innobase plugin would release the read view that it used for the statement. You’d think that it would grab a new one on startStatement, but because of the previous design of the API (remember “if txn isn’t started, start it!”) this also happened for getting the read view for the statement… so we instead got a REPEATABLE READ isolation level.

I wanted a test.

Previously, I’ve created a dummy storage engine (tableprototester) and used it to test the server code for reading the table protobuf message. I thought about doing a Storage Engine for this problem too, basically looking at the calls to the Storage Engine as transitions between states in a state machine.

A basic view of a transaction could be:

[Diagram: State transitions for a transaction. Transaction can be empty OR have one or more statements]

That is, a transaction starts and has zero or more statements before it commits or gets rolled back.

By coding up a data structure of allowable state transitions, a small function to assert() on invalid transitions and enough of the boilerplate to make the engine “work”, I was able to hit an assert() exactly where I’d expected it: at an invalid transition from START STATEMENT to COMMIT.
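The approach can be sketched roughly as follows. This is not the code from the test engine: the enum, the transition table and the TxnTracker class are assumptions made for illustration, and the table encodes only the basic view of a transaction described above (begin, zero or more statements, then commit or rollback).

```cpp
#include <cassert>
#include <set>
#include <utility>

enum class TxnState { Begin, StartStatement, EndStatement, Commit, Rollback };

typedef std::pair<TxnState, TxnState> Transition;

// The data structure of allowable transitions (illustrative, basic view only).
static const std::set<Transition> &allowed_transitions() {
  static const std::set<Transition> table = {
    {TxnState::Begin, TxnState::StartStatement},
    {TxnState::Begin, TxnState::Commit},    // empty transaction
    {TxnState::Begin, TxnState::Rollback},  // empty transaction
    {TxnState::StartStatement, TxnState::EndStatement},
    {TxnState::EndStatement, TxnState::StartStatement},
    {TxnState::EndStatement, TxnState::Commit},
    {TxnState::EndStatement, TxnState::Rollback},
  };
  return table;
}

bool is_allowed(TxnState from, TxnState to) {
  return allowed_transitions().count(Transition(from, to)) != 0;
}

// Tracks the current state and asserts on any transition not in the table.
class TxnTracker {
public:
  TxnTracker() : state_(TxnState::Begin) {}

  void transition(TxnState to) {
    assert(is_allowed(state_, to));  // fires on e.g. StartStatement -> Commit
    state_ = to;
  }

  TxnState state() const { return state_; }

private:
  TxnState state_;
};
```

An engine built this way would call transition() from each of its transaction and statement hooks, so a missing endStatement shows up as an assertion failure at the following commit.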

To fix the initial bug (READ COMMITTED not working), I filled in a few state transitions for the system as a whole that aren’t quite correct. From the diagram below, you can quite obviously see where the obvious bugs are (it helps that I’ve coloured them red):

There is absolutely no sense in going BEGIN -> END STATEMENT or immediately to COMMIT. These should be relatively easy to solve too, but are separate bugs.

I wish to expand this in the future to cover Cursor as well. It will also be useful to ensure that DDL can be wrapped in transactions. Not to mention the last few HTON flags that exist (and should likely go away).

To generate the diagrams, I just wrote a little utility to dump out the state transitions in dot, using it to generate the diagrams.
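Such a utility can be very small. The sketch below is a guess at the general shape rather than the actual tool: it takes a list of (from, to) state names and renders them in Graphviz dot syntax. The Edge type and to_dot function are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Edge;

// Render a list of state transitions as a Graphviz "digraph".
std::string to_dot(const std::vector<Edge> &edges) {
  std::ostringstream out;
  out << "digraph states {\n";
  for (const Edge &e : edges)
    out << "  \"" << e.first << "\" -> \"" << e.second << "\";\n";
  out << "}\n";
  return out.str();
}
```

Writing the returned string to a file and running it through Graphviz (for example `dot -Tpng`) produces the picture.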

HailDB being built by default in Drizzle

It just hit trunk – if you have HailDB installed when you build Drizzle, you will now get the HailDB plugin built. You can even run Drizzle with it (remove innobase plugin, load HailDB plugin). Previously, we had problems building both due to symbol conflicts between innobase and HailDB. We’ve fixed this thanks to the linker.

So, enjoy HailDB… well, test it and report bugs that I can fix :)

Second Drizzle Beta (and InnoDB update)

We just released the latest Drizzle tarball (2010-10-11 milestone). There are a whole bunch of bug fixes, but there are two things that are interesting from a storage engine point of view:

  • The Innobase plugin is now based on innodb_plugin 1.0.6
  • The embedded_innodb engine is now named HailDB and requires HailDB; it can no longer be built with embedded_innodb.

Those of you following Drizzle fairly closely have probably noticed that we’ve lagged behind in InnoDB versions. I’m actively working on fixing that – both for the innobase plugin and for the HailDB library.

If building the HailDB plugin (which is planned to replace the innobase plugin), you’ll need the latest HailDB release (which as of writing is 2.3.1). We’re making good additions to the HailDB API to enable the storage engine to have the same features as the Innobase plugin.

Drizzle7 Beta!

Just in case you missed it, I’m rather thrilled that our latest tarball of Drizzle is named Beta. Specifically, we’re calling it Drizzle7. Seven is a very nice number, and it seems rather appropriate.

This release is for a stand alone database server. A lot of the infrastructure for replication is there (with testing), but the big thing we want to hammer on and get perfect here is Drizzle7 as a stand alone database server.

Can I trust it? If you trust InnoDB to store your data, then yes, you can trust Drizzle (it uses InnoDB too)

Warnings are now actual problems

Yesterday, I reached a happy milestone in HailDB development. All compiler warnings left in the api/ directory (the public interface to the database engine) are now either probable/possible bugs (that we need to look at closely) or are warnings due to unfinished code (that we should finish).

There’s still a bunch of compiler warnings that we’ve inherited (HailDB compiles with lots of warnings enabled) that we have to get through, but a lot will wait until after we update the core to be based on InnoDB 1.1.

HOWTO screw up launching a free software project

Josh Berkus gave a great talk at linux.conf.au 2010 (the CFP for linux.conf.au 2011 is open until August 7th) entitled “How to destroy your community” (lwn coverage). It was a simple, patented, 10 step program, finely honed over time to have maximum effect. Each step is simple and we can all name a dozen companies that have done at least three of them.

Simon Phipps this past week at OSCON talked about Open Source Continuity in practice – specifically mentioning some open source software projects that were at Sun but have since been abandoned by Oracle and different strategies you can put in place to ensure your software survives, and check lists for software you use to see if it will survive.

So what can you do to not destroy your community, but ensure you never get one to begin with?

Similar to destroying your community, you can just make it hard: “#1 is to make the project depend as much as possible on difficult tools.”

#1 A Contributor License Agreement and Copyright Assignment.

If you happen to be in the unfortunate situation of being employed, this means you get to talk to lawyers. While your employer may well have an excellent Open Source Contribution Policy that lets you hack on GPL software on nights and weekends without a problem – if you’re handing over all the rights to another company – there gets to be lawyer time.

Your 1hr of contribution has now just ballooned. You’re going to use up resources of your employer (hey, lawyers are not cheap), it’s going to suck up your work time talking to them, and if you can get away from this in under several hours over a few weeks, you’re doing amazingly well – especially if you work for a large company.

If you are the kind of person with strong moral convictions, this is a non-starter. It is completely valid to not want to waste your employers’ time and money for a weekend project.

People scratching their own itch, however small, is how free software gets to be so awesome.

I think we got this almost right with OpenStack. If you compare the agreement to the Apache License, there’s so much common wording it ends up pretty much saying that you agree you are able to submit things to the project under the Apache license. This (of course) makes the entire thing pretty redundant, as if people are going to be dishonest about submitting things under the Apache license there’s no reason they’re not going to be dishonest and sign this too.

You could also never make it about people – just make it about your company.

#2 Make it all about the company, and never about the project

People are not going to show up, do free work for you to make your company big, huge and yourself rich.

People are self serving. They see software they want only a few patches away, they see software that serves their company only a few patches away. They see software that is an excellent starting point for something totally different.

I’m not sure why this is down at number three… it’s possibly the biggest one for danger signs that you’re going to destroy something that doesn’t even yet exist…

#3 Open Core

This pretty much automatically means that you’re not going to accept certain patches for reasons of increasing your own company’s short term profit. i.e. software is no longer judged on technical merits, but rather political ones.

There is enough politics in free software as it is, creating more is not a feature.

So when people ask me about how I think the OpenStack launch went, I really want people to know how amazing it can be to just not fuck it up to begin with. Initial damage is very, very hard to ever undo. The number of Open Source software projects originally coming out of a company that are long running, have a wide variety of contributors and survive the original company is much smaller than you think.

PostgreSQL has survived many companies coming and going around it, and is stronger than ever. MySQL only has a developer community around it almost in spite of the companies that have shepherded the project. With Drizzle I think we’ve been doing okay – I think we need to work on some things, but they’re more generic to teams of people working on software in general rather than anything to do with a company.

A tale of a bug…

So I sometimes get asked if we funnel bug reports or patches back to MySQL from Drizzle. MariaDB adds some interest here too, as it is a lot closer to (and indeed compatible with) MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB was gearing up for a release, and this is a relatively low priority bug (though it does have a working, correct and obvious patch). Even so, within 2 months Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.

PBMS in Drizzle

Some of you may have noticed that blob streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plug in.

For those not familiar with PBMS it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3) and provide a HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).

This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (now much easier that it’s in the main repository)

One last bit of evil….

You can store things for later!
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= malloc(1000); return snprintf(s, 100, \"%p\", a); }") as RESULT;
+-----------+
| RESULT    |
+-----------+
| 0x199c610 |
+-----------+
1 row in set (0 sec)
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x199c610; strcpy(a, \"Hello World!\"); strcpy(s,\"done\"); return strlen(s); }") as result;
+--------+
| result |
+--------+
| done   |
+--------+
1 row in set (0.01 sec)
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x199c610; strcpy(s, a); return strlen(s); }") as result;
+--------------+
| result       |
+--------------+
| Hello World! |
+--------------+
1 row in set (0.01 sec)
And then… I can disconnect, reconnect, or whatever (as for any of the above really) before cleaning up my memory:
drizzle> select libtcc("#include <string.h>\n#include <stdlib.h>\nint foo(char* s) { char *a= 0x19a9bc0; free(a); strcpy(s,\"done\"); return strlen(s); }") as result;
+--------+
| result |
+--------+
| done   |
+--------+
1 row in set (0 sec)

An MD5 stored procedure for Drizzle… in C

So, just in case that wasn’t evil enough for you… perhaps you have something you want to know the MD5 checksum of. So, you could just do this:

drizzle> select md5('Hello World!');
+----------------------------------+
| md5('Hello World!')              |
+----------------------------------+
| ed076287532e86365e841e92bfc50d8c |
+----------------------------------+
1 row in set (0 sec)

But that is soooo boring.

Since we have the SSL libs already loaded into Drizzle, and using my very evil libtcc plugin… we could just implement it in C. We can even use malloc!

drizzle> SELECT LIBTCC("#include <string.h>\n#include <stdlib.h>\n#include <openssl/md5.h>\nint foo(char* s) { char *a = malloc(100); MD5_CTX context; unsigned char digest[16]; MD5_Init(&context); strcpy(a,\"Hello World!\"); MD5_Update(&context, a, strlen(a)); MD5_Final(digest, &context); snprintf(s, 33, \"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x\", digest[0], digest[1], digest[2], digest[3],digest[4], digest[5], digest[6], digest[7],digest[8], digest[9], digest[10], digest[11],digest[12], digest[13], digest[14], digest[15]); free(a); return 32; }") AS RESULT;

+----------------------------------+
| RESULT                           |
+----------------------------------+
| ed076287532e86365e841e92bfc50d8c | 
+----------------------------------+
1 row in set (0.01 sec)

Currently the parameter is static in the C version due to me not having… well.. done a good job implementing the calling of C code.

Stored Procedures/Functions for Drizzle

Previously, in “Thoughts on Thoughts on Drizzle” I theorized that one of the major reasons why we did not see lots of people jumping at stored procedures in MySQL was that it wasn’t in their native language (for lack of a better term). We’ve seen External Language Stored Procedures for MySQL that let you write stored procedures in some other languages…. but I felt something was missing.

Firstly, I wanted a language I was really familiar with and comfortable writing complex things in.

Secondly, it should be compiled so that it runs as fast as possible.

Thirdly, it shouldn’t just be linking to a pre-compiled library (drizzle function plugins do that already)

So… the obvious choice was C.

I have a really, really, really early prototype:

drizzle> SELECT LIBTCC("int foo(char* s) { s[0]='4'; s[1]='2'; s[2]=0; return 2; }") AS RESULT;

+--------+
| RESULT |
+--------+
| 42     |
+--------+
1 row in set (0 sec)

or… a bit more sophisticated:

drizzle> SELECT LIBTCC("#include <string.h>\nint foo(char* s) { strcpy(s,\"Hello World!\");; return strlen(s); }") AS RESULT;

+--------------+
| RESULT       |
+--------------+
| Hello World! |
+--------------+
1 row in set (0 sec)

I’m using a function as a bit of a cheat… but the string is passed to libtcc (modified so it’s a shared library so I can load it into drizzle), where it is compiled into native object code (in my case x86-64) and then run.

With the right bits of foo… I could allow calling of all sorts of server functions…. such as those to execute SQL inside the current transaction context.

There are a number of reasons why this is Pure Evil(TM):

  • It executes inside the address space of your database server
    one null pointer dereference and your database server is all gone.
  • It’s arbitrary code injection by design
    Exactly how insane are you? Security–;
  • While great for me and my C hacking friends, possibly not for web app developers, who likely aren’t writing their apps in C every day.
  • See the first reason. Is that not reason enough? Memory protection is a good thing yo.

Anyway, you can see the code up on launchpad in the drizzle-libtcc-function branch. You’ll need to modify your tcc source so that the Makefile snippet for libtcc.o looks like this:

# libtcc generation and test
libtcc.o: $(NATIVE_FILES)
        $(CC) -fPIC -o $@ -c libtcc.c $(NATIVE_TARGET) $(CFLAGS)

libtcc.a: libtcc.o
        $(AR) rcs $@ $^

libtcc.so: libtcc.o
        $(CC) -shared -Wl,-soname,libtcc.so.1 -o $@ libtcc.o

Shocked and Stunned (that code exists and does work)

#define READ_ALL		1	/* openfrm: Read all parameters */
#define EXTRA_RECORD		8	/* Reservera plats f|r extra record */

and later on….

  if (prgflag & (READ_ALL+EXTRA_RECORD))
    records++;

Feel free to think about that for a second.

(I have an urge to add this to questions asked in a job interview…)
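One way to read it, sketched under the assumption that the two flags keep the values shown above: because 1 and 8 share no bits, adding them gives the same mask (9) as OR-ing them, so the test means "either flag is set". The constant and function names below are made up for this illustration.

```cpp
const unsigned READ_ALL_FLAG = 1;      // same value as READ_ALL above
const unsigned EXTRA_RECORD_FLAG = 8;  // same value as EXTRA_RECORD above

// The form used in the original code: mask built with +.
bool wants_extra_record_plus(unsigned prgflag) {
  return (prgflag & (READ_ALL_FLAG + EXTRA_RECORD_FLAG)) != 0;
}

// The form most people would expect: mask built with |.
bool wants_extra_record_or(unsigned prgflag) {
  return (prgflag & (READ_ALL_FLAG | EXTRA_RECORD_FLAG)) != 0;
}

// The two agree only because the flags are disjoint bits. With overlapping
// values the trick breaks: 1 + 3 is 4, while 1 | 3 is 3.
bool masks_agree(unsigned a, unsigned b) {
  return (a + b) == (a | b);
}
```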