Embedded InnoDB: querying the configuration

I am rather excited about being able to do awesome things such as this to get the current configuration of your server:

drizzle> SELECT NAME,VALUE 
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION
    ->  WHERE NAME IN ("data_file_path", "data_home_dir");
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| data_file_path | NULL  | 
| data_home_dir  | ./    | 
+----------------+-------+
2 rows in set (0 sec)

drizzle> SELECT NAME,VALUE 
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION 
    -> WHERE NAME = "io_capacity";
+-------------+-------+
| NAME        | VALUE |
+-------------+-------+
| io_capacity | 200   | 
+-------------+-------+
1 row in set (0 sec)

Coming soon: status in a table.

(This is for the upcoming embedded_innodb plugin, which uses the API provided by Embedded InnoDB to implement a Storage Engine for Drizzle.)

Thoughts on Thoughts on Drizzle :)

Mark has some good thoughts on Drizzle. I think they’re all valid… and I have some extra thoughts too:

“I have problems to solve today”. This is (of course) an active concern in my brain… If we don’t have something out that solves some set of problems with reasonable stability and reliability (and soon), then we are failing. I feel we’re getting there, and will have a solid foundation to build upon.

Drizzle replication, MySQL replication: “I can’t compare the two until Drizzle replication is running in production.” Completely agree. We need to only say replication is stable and reliable when it really is. Realistic test suites are needed. Very defensive programming of the replication system is needed (you want to know when something has gone wrong). We also need to have it constantly verifying that the right thing is going on. We want our problems to be user visible, not silent and invisible. Having high standards will hopefully pay off when people start running it in production….

3 byte int: “Does this mean that some of my tables will grow from 3GB to 4GB on disk?” I think we’re moving the responsibility down to the engines. The 3 byte int type says two things: use less storage space, limit the maximum value. Often you want the former, not the latter. There are many ways to more efficiently pack integers for storage when they are usually smaller than the maximum you want. The protobuf library does a good job of it.
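As a concrete sketch of that kind of packing (this is an illustration of protobuf-style varint encoding, not actual protobuf or Drizzle source):

```cpp
#include <cstdint>
#include <vector>

// Protobuf-style varint: 7 payload bits per byte, high bit set on
// every byte except the last. Values under 128 take a single byte.
std::vector<uint8_t> encode_varint(uint64_t value)
{
  std::vector<uint8_t> out;
  while (value >= 0x80)
  {
    out.push_back(static_cast<uint8_t>(value & 0x7F) | 0x80);
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
  return out;
}
```

So a column whose values are usually small costs one or two bytes on disk instead of a fixed three or four, without capping the maximum value.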

I think it is the job of storage engines to do better here. Once you’re in memory, 3-byte numbers are horrible to work with: copy out of a row buffer, convert into a 32-bit number and then do foo. Modern CPUs favor 32- or 64-bit alignment of data a *lot*, and 3-byte numbers do not align to 32 or 64 bits very well… making things much slower for the common case of using cached data.
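To make the copy-and-convert step concrete, reading a little-endian 3-byte integer out of a row buffer looks roughly like this (the same idea as MySQL’s uint3korr macro; this byte-by-byte version is just illustrative, not the server’s actual code):

```cpp
#include <cstdint>

// Read a little-endian 3-byte unsigned integer out of a row buffer.
// The three bytes must be copied and assembled into an aligned 32-bit
// integer before the CPU can do any arithmetic on the value.
uint32_t read_uint24(const unsigned char *buf)
{
  return static_cast<uint32_t>(buf[0])
       | (static_cast<uint32_t>(buf[1]) << 8)
       | (static_cast<uint32_t>(buf[2]) << 16);
}
```

Every single access to such a column pays this assembly cost, which is exactly the in-memory overhead described above.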

“I need stored procedures. They are required for high-performance OLTP as they minimize transaction duration for multi-statement transactions.” The reduction of network round trips is always crucial. I think a lot of round trips could go away if you could issue multiple statements at once (not via semicolon separating them, by protocol awesomeness).

There should be a way to send a set of statements that should be executed. There should also be a way to specify that if no error occurred, commit. This could then be (in the common case) a single round trip to the database. You then only have to make round-trips when what statement to issue next depends on the result of a previous one. The next step being to reduce these round trips… which can either be solved by executing something inside the database server (e.g. stored procedures) or something closer to the database server so that the round trips aren’t as large. This would be where Gearman enters.
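To sketch the idea (everything below is hypothetical; no such client API exists, and BatchedConnection is a made-up name), a batching interface might queue statements locally and ship them in one round trip, committing only if nothing errored:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch only: queue statements client-side, then ship
// them to the server in a single round trip with an implicit
// "commit if no statement errored" at the end.
class BatchedConnection
{
  std::vector<std::string> pending;

public:
  void queue(const std::string &statement)
  {
    pending.push_back(statement);
  }

  // Stand-in for the network send: a real client would serialize all
  // queued statements into one request and read back one response.
  std::size_t execute_and_commit()
  {
    std::size_t sent = pending.size();
    pending.clear();
    return sent; // statements shipped in the single round trip
  }
};
```

Two inserts and an update would then cost one round trip instead of three (plus one more for the COMMIT).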

I’m interested to see where these two approaches (issuing in batches and executing closer to the DB server) fall down… I know that latency may not be as good… but throughput should be a lot better.

I take heart with “I have yet to use them in MySQL” though. I have my own theories as to why this is… my biggest thought is that it’s because the many, many programmers writing SQL that Mark sees aren’t SQL Stored Procedure programmers. They spend their days in a couple of languages (maybe Perl, Python, PHP, Java, C, C++) and never programmed SQL:2003 Stored Procedures and it just doesn’t come as quickly (or as bug free) as writing code in the languages you use every day.

“Long Running insert, update and delete statements consume too many resources in InnoDB.” I wonder if this desire for MyISAM could be filled by PBXT or BlitzDB? The main reason that MyISAM is currently a temporary table only engine is that MyISAM and the server core were never that well separated.

My ultimate wish is that all engine authors take the approach that there is an API to their engine, and that the Storage Engine is merely glue between the database server and that API.

The BlitzDB engine has this, Innobase partially does (and my Embedded InnoDB work goes the whole way) and MySQL Cluster is likely the oldest example.

As a side note, the BlitzDB plugin should go into the main Drizzle tree fairly soon. One of the joys of having an optional plugin that doesn’t touch the core of the server is that we can do this without much worry at all.

“Does Drizzle build on Windows?” Well… no. Funnily enough though, it is increasingly easy to make a Windows port. All the platform specific things are increasingly just plugins. The build system is a sticking point… and no, we’re not going to switch to CMake. The C stands for something, and it’s something that even I may not print here… (I had never thought that being able to open up automake-generated Makefiles and look at them would be a feature).

This next Drizzle milestone release should be exciting though…

I look forward to having Drizzle widely deployed and relied upon… I think we’ll do well..

Drizzle Developer Day 2010

Hi one and all!

Interested in database systems? Interested because you use them? Because you manage them? Write SQL that goes to them? Or are you one of the people of questionable sanity like myself who develops them?

Well… do we have an offer for you.

Friday, April 16th. Right after the MySQL Conference and Expo at the Santa Clara Convention Center, you can come along to the Drizzle Developer Day.

You will want to add your name to this wiki page: http://drizzle.org/wiki/Drizzle_Developer_Day_2010_signup

Suggest topics over at:
http://drizzle.org/wiki/Drizzle_Developer_Day_2010

Hope to see you there!

Writing A Storage Engine for Drizzle, Part 2: CREATE TABLE

The DDL code paths in Drizzle are increasingly different from MySQL’s. For example, the embedded_innodb StorageEngine CREATE TABLE code path is completely different from what it would have to be for MySQL. There are a number of reasons for this, the primary one being that Drizzle uses a protobuf message to describe the table format instead of several data structures and an FRM file.

We are pretty close to having the table protobuf message format being final (there’s a few bits left to clean up, but expect them done Real Soon Now (TM)). You can see the definition (which is pretty simple to follow) in drizzled/message/table.proto. Also check out my series of blog posts on the table message (more posts coming, I promise!).

Drizzle allows either your StorageEngine or the Drizzle kernel to take care of storage of table metadata. You tell the Drizzle kernel that your engine will take care of metadata itself by specifying HTON_HAS_DATA_DICTIONARY to the StorageEngine constructor. If you don’t specify HTON_HAS_DATA_DICTIONARY, the Drizzle kernel stores the serialized Table protobuf message in a “table_name.dfe” file in a directory named after the database. If you have specified that you have a data dictionary, you’ll also have to implement some other methods in your StorageEngine. We’ll cover these in a later post.

If you have ever dealt with creating a table in MySQL, you may recognize this method:

virtual int create(const char *name, TABLE *form, HA_CREATE_INFO *info)=0;

This is not how we do things in Drizzle. We now have this function in StorageEngine that you have to implement:

int doCreateTable(Session* session, const char *path,
                  Table& table_obj,
                  drizzled::message::Table& table_message)

The existence of the Table parameter is largely historic and at some point will go away. In the Embedded InnoDB engine, we don’t use the Table parameter at all. Shortly we’ll also get rid of the path parameter, instead having the table schema in the Table message and helper functions to construct path names.

Methods named “doFoo” (such as doCreateTable()) mean that there is a method named foo() (such as createTable()) in the base class. The base method does some shared work (such as making sure the table_message is filled out and handling any errors) while the “real” work is done by your StorageEngine in the doCreateTable() method.
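This doFoo()/foo() split is the classic template method pattern. A simplified sketch (the real StorageEngine base class does rather more than this):

```cpp
// Simplified sketch of the doFoo()/foo() convention (not the actual
// Drizzle StorageEngine class).
class StorageEngine
{
public:
  virtual ~StorageEngine() {}

  // Base class method: shared setup and error handling live here...
  int createTable()
  {
    // ... e.g. make sure the table message is filled out ...
    int error = doCreateTable();
    // ... e.g. map and report any engine error ...
    return error;
  }

protected:
  // ... while each engine implements only the "real" work.
  virtual int doCreateTable() = 0;
};

class ExampleEngine : public StorageEngine
{
protected:
  int doCreateTable()
  {
    return 0; // engine-specific table creation would go here
  }
};
```

Callers always go through createTable(), so the shared bookkeeping can never be accidentally skipped by an engine.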

The Embedded InnoDB engine goes through the table message and constructs a data structure for the Embedded InnoDB library to create a table. The ARCHIVE storage engine is much simpler, and it pretty much just creates the header of the ARZ file, mostly ignoring the format of the table. The best bet is to look at the code from one of these engines, depending on what type of engine you’re working on. This code, along with the table message definition should be more than enough.

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Continuing the journey

A couple of months ago (December 1st, for those playing along at home) marked five years to the day since I started at MySQL AB (now Sun, now Oracle). A good part of me is really surprised it was that long, and other parts are surprised it wasn’t longer. Through MySQL and Sun, I met some pretty amazing people, worked with some really smart ones and formed really solid and awesome friendships. Of course, not everything was perfect (sometimes not even close), but we did have some fun.

Up until November 2008 (that’s 3 years and 11 months for those playing at home) I worked on MySQL Cluster. Still love the product and love how much better we’re making Drizzle so it’ll be the best SQL interface to NDB :)

The ideas behind Drizzle had been talked about for a while… and with my experience with internals of the MySQL server, I thought that some change and dramatic improvement was sorely needed.

Then, in 2008, Brian created a tree. I was soon sending in patches at nights, we announced to the whole world at OSCON and it captured a lot of attention.

Since November 2008 I’ve been working on Drizzle full time. It was absolutely awesome that I had the opportunity to spend all my days hacking on Drizzle – both directly with fantastic people and for fantastic people.

But… the Sun set… which was exciting and sad at the same time.

Never fear! There were plenty of places wanting Drizzle hackers (and MySQL hackers). For me, it came down to this: “real artists ship”. While there were other places where I would no doubt be happy and work on something really cool, the only way to figure out where I should really be was to ask: what is the best way to get Drizzle to a stable release that’s suitable for deployment? So, where am I now?

Rackspace.

Where I’ll again be spending all my time hacking Drizzle.

Bike Riding in the storm

Out on a pier down St Kilda, the weather looked… well… like it could be a bit annoying on the way back:

but then… just a bit down the way…. it hit:

It was “a bit wet”. Big blocks of ice falling from the sky (that hurt).

Anyway, on the way back we found a storm water drain:

Yes, behind Michael is just all water (and I’m not talking about the Bay).

Still managed to get a 36.5km ride out of it, so not all bad.

Writing A Storage Engine for Drizzle, Part 1: Plugin basics

So, you’ve decided to write a Storage Engine for Drizzle. This is excellent news! The API is continually being improved and if you’ve worked on a Storage Engine for MySQL, you’ll notice quite a few differences in some areas.

The first step is to create a skeleton StorageEngine plugin.

You can see my skeleton embedded_innodb StorageEngine plugin in its merge request.

The important steps are:

1. Create the plugin directory

e.g. mkdir plugin/embedded_innodb

2. Create the plugin.ini file describing the plugin

Create the plugin.ini file in the plugin directory (so it’s plugin/plugin_name/plugin.ini).
An example plugin.ini for embedded_innodb is:

[plugin]
title=InnoDB Storage Engine using the Embedded InnoDB library
description=Work in progress engine using libinnodb instead of including it in tree.
sources=embedded_innodb_engine.cc
headers=embedded_innodb_engine.h

This gives us a title and description, along with telling the build system what sources to compile and what headers to make sure to include in any source distribution.

3. Add plugin dependencies

Your plugin may require extra libraries on the system. For example, the embedded_innodb plugin uses the Embedded InnoDB library (libinnodb).

Other examples include the MD5 function requiring either openssl or gnutls, the gearman related plugins requiring gearman libraries, the UUID() function requiring libuuid and BlitzDB requiring Tokyo Cabinet libraries.

For embedded_innodb, pandora-build has a macro for finding libinnodb on the system. We want to run this configure check, so we create a plugin.ac file in the plugin directory (i.e. plugin/plugin_name/plugin.ac) and add the check to it.

For embedded_innodb, the plugin.ac file just contains this one line:

PANDORA_HAVE_LIBINNODB

We also want to add two things to plugin.ini; one to tell the build system only to build our plugin if libinnodb was found and the other to link our plugin with libinnodb. For embedded_innodb, it’s these two lines:

build_conditional="x${ac_cv_libinnodb}" = "xyes"
ldflags=${LTLIBINNODB}

Not too hard at all! This should look relatively familiar to those who have seen autoconf and automake in the past.

Some plugins (such as the md5 function) have a bit more custom auto-foo in plugin.ini and plugin.ac (as one of two libraries can be used). You can do pretty much anything with the plugin system, but you’re a lot more likely to keep it simple like we have here.

4. Add skeleton source code for your StorageEngine

While this will change a little bit over time (and is a little long to just paste into here), you can see what I did for embedded_innodb in the skeleton-embedded-innodb-engine tree.

5. Build!

You will need to re-run ./config/autorun.sh so the build system picks up your new plugin. When you run ./configure --help afterwards, you should see options for building with/without your new plugin.

6. Add a test

You will probably want to add a test to see that your plugin loads successfully. When your plugin is built, the test suite automatically picks up any tests you have in the plugin/plugin_name/tests directory. This is in the same format as general MySQL and Drizzle tests: tests go in a t/ directory, expected results in a r/ directory.

Since we are loading a plugin, we will also need some server options to make sure that plugin is loaded. These are stored in the rather inappropriately named test-master.opt file (that’s the test name with “-master.opt” appended to the end instead of “.test“). For the embedded_innodb plugin_load test, we have a plugin/embedded_innodb/tests/t/plugin_load-master.opt file with the following content:

--plugin_add=embedded_innodb

You can have pretty much anything in the plugin_load.test file… if you’re fancy, you’ll have a SELECT query on data_dictionary.plugins to check that the plugin really is there. Be sure to also add an r/plugin_load.result file. (My preferred method is to create an empty result file, run the test suite, and examine the rejected output before renaming the .reject file to .result.)
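For example (the column names here are a hypothetical sketch from memory; check data_dictionary.plugins in your tree before relying on them), plugin/embedded_innodb/tests/t/plugin_load.test might contain:

```sql
SELECT PLUGIN_NAME
FROM DATA_DICTIONARY.PLUGINS
WHERE PLUGIN_NAME = "embedded_innodb";
```

The matching result file then simply records the one row you expect back.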

Once you’ve added your test, you can run it either by typing “make test” (which runs the whole test suite), or by going into the main tests/ directory and running ./test-run.pl --suite=plugin_name (which runs just the tests for your plugin).

7. Check the code in, feel good about self

and you’re done. Well… the start of a Storage Engine plugin is done :)

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Playing with multiple exposure

So, I discovered that my D200 has a built-in “multiple exposure” option. While you can do exactly the same thing in GIMP (or Photoshop, I guess) a whole lot more easily (for one, you get to see what’s going on), we had been discussing Holga earlier in the night… so I felt it kind of appropriate to not really see what I was doing.

Leah playing guitar hero, me sitting across the room only slightly distracting her with a camera.

Guitar Hero

Maybe I will end up getting a Holga one of these days… being restricted can be fun.

anti-anti-feature: Windows license stickers

Anti-Anti-Feature: an antifeature (something you didn’t want in the first place) that doesn’t actually do what it’s meant to do.

My laptop came with a Windows Vista license. An anti-feature in itself – I didn’t want it, have never used it (I run Ubuntu and love freedom).

However, if you try and read the license key off this sticker, it’s increasingly difficult to do so. It’s being worn away. Why? Because it’s on the bottom of the laptop and I’m using it on my lap (so friction rubs it away).

Luckily I don’t run Windows Vista, so I won’t need to re-install it any time soon.

on presenting

Dilbert.com

This is totally not confined to at-work presentations.

The number of sessions I have sat through that could have taken 5 minutes instead of 20, 30, 40 or even 60 is amazing. Remember: I have not flown halfway around the globe to watch you read. I have come to hear a story, to see how conclusions were formed, and to interact.

Often, the tools are deficient. PowerPoint encourages bad habits (you can use PowerPoint for excellent slide decks too, but ignore the temptations of boring templates, bad effects and dot lists). The dot point list is more often than not your enemy. I (and anybody else in the audience who has learnt to read) can read your dot points faster than you can speak them. While I’m reading, I’m not listening to you. If you announced a cure for all forms of cancer just after putting up a slide filled with dot points… 90% of people would miss it.

Now, dot points are an excellent way to remind you what the heck you’re meant to be talking about (and in what order). Use presenter notes! They are really useful.

If your laptop/presentation software doesn’t support a “presenter” mode that shows your notes to you but not to the whole room, simply write them down, print them out, or anything like that. One simple practice run-through will let you do this seamlessly.

The last couple of presentations I did were completely assembled using 280slides.com, an excellent web app for doing presentations. It will import and export ODF (and other formats), so you’re not tied to an (unfortunately) non-open-source web app. That being said, it ran fine in my browser and, unlike OpenOffice.org, did not make me want to stab people repeatedly every time I used it.

So, Stewart’s quick tips:

  • Tell a story. How did you get to your conclusions?
  • Don’t just read. Use visuals to accompany the talk. Visuals aren’t the talk.
  • Practice. Just once or twice through will make things a lot smoother.

Equipment:

  • Make sure your equipment works beforehand. Nobody wants to see you fiddle around with your Windows/OSX laptop only to find out you didn’t bring the dongle or can’t operate the Displays control panel. (Interestingly enough, I see Linux “just work” more than Windows or OSX these days).
  • If there is a microphone, use it. I don’t want to struggle to hear you.
  • If you are constantly using a laser pointer you either have too much on your slides or the slide does not highlight the important information. (laser pointers are useful when people ask questions though)

One blog I love on the subject is Presentation Zen. I’ll also recommend the book, but you can get so much just from the web site.

Some excellent recent presentations:

  • Simplicity Through Optimization, by Paul McKenney
    Paul is able to explain RCU clearly and concisely through visuals. You are left with no doubt that this really works. The visuals are not everything; they assist in the telling of the RCU story.
  • Teach every child about food, by Jamie Oliver
    I watched this online. Note how not everything was smooth the whole way, and note how it was still effective. Passion is an awesome tool. Check out the simple graph showing leading causes of death: simple and effective.
  • Bill Gates on energy: Innovating to Zero!
    Historically, Bill Gates has not been the most engaging speaker. We can all forget the horrible PowerPoint slides with four hundred dot points about some release of something that nobody cared about. This is different. Clear, concise, engaging and simple visuals to make the point.

First roll through the Nikon F80

A little while ago I bit the bullet and bought a Nikon film body – an F80. I may as well have a film body that’s a bit automatic and takes the same lens mount as my digital.

So, I got it and thought “hrrm… I better run a roll of film through it to make sure it works”. Off to the fridge I went to find the cheapest, shittiest roll of film possible… I found “Walgreens” brand film. Manufactured by one of many, bought for cheap, and run through the F80.

Some shots turned out pretty good. I have the full set (most of the roll) up on flickr. A few choice ones are:

Which, due to some nice accident of lighting, turned out pretty good. IIRC this was pretty late at night and I was editing photos as Michael came over (bringing much needed beer).

Slides and beer, do you need anything else? I just like this because it’s a snapshot of what I was working on (well, kinda, I was mostly just manipulating digital images).

Leah and I went bushwalking… so had to snap a shot of her. I do like the Nikon 50mm as a portrait lens. The film… well… it was cheap, but not too bad actually.

A shallow depth of field can be a lot of fun. Although not entirely sure how I feel about the bokeh….

Which has some odd colours. Nice, but odd.

I like my “new” body. It’ll be fun.

Anti-anti-features: region coding

DVD anti-features are rather well documented. The purpose of “region coding” was to make sure that everybody who ever visited a foreign country and picked up some DVDs while there would get home to find out that they wouldn’t work.

Luckily, those of us who pay good money for DVDs have free software solutions that let us use our paid-for product, rather than forcing us to download “pirated” copies just so we can view what we paid for.

The region coding in DVDs was designed with the idea that DVD players would always be expensive. You could “change” which region your DVD player was in a set number of times, after which you could no longer change it.

DVD players can now be bought for $30 (or less) – about what you might pay for a DVD movie. So with economies of scale driving prices down, even if CSS weren’t completely broken, you could brute-force the region coding by just buying 6 DVD players ($180) – less than many of us paid for our first, second or third DVD player.

The same thing will happen with BluRay. You can now get BluRay players for a couple of hundred dollars. Buying one for each of the regions (A, B and C) will cost you less than original BluRay players did.

So the antifeature of limiting who can watch a DVD/BluRay release is easily broken as player costs come down.

Anti-anti-features: copyright notices

Mako has often talked very well about anti-features: the “features” in software that nobody wants, and that often cost money to build when the easier task would have been to not include the feature at all. Examples include the non-skip parts of DVDs and BluRay Discs (see here for more).

I’d like to coin a new term… anti-anti-features. These are antifeatures (i.e. features you didn’t want in the first place) that don’t actually function properly themselves.

The other day, I sat down with a friend to watch a movie. We had hired a BluRay of a recently released movie, popped it in the player and attempted to hit “Pause”. Why pause? Well… movies often auto-play, and we wanted to fetch a beer and a snack and otherwise prepare for the great movie-watching experience.

It turns out you cannot pause the copyright notice. So if you’re trying to be good and understand your obligations under the license in which you have received this disc, you cannot actually finish reading them!

Try it – put in a DVD or BluRay and try to read the copyright notice. I bet you that for a large number of discs you cannot do so in the time allowed.

This just goes to show how utterly useless these “no skip” zones are. You will see exactly the same notice hundreds of times (one for each disc you view, each time you view it) – one would think that after the first, second, third or even 10th time you’d understand it.

Amazingly, DVD playback software that lets you skip the “no skip” zones (e.g. every DVD player on Linux) also lets you pause on the copyright notice and read it.

H.264 Anti-features

No, You Can’t Do That With H.264 is an excellent write-up of how H.264 (“MPEG-4”) is fraught with problems that you just would not have if using Free (as in Freedom) formats such as Ogg Vorbis and Theora.

It is amazing that Final Cut “Pro” cannot actually be used to create H.264 content for commercial (i.e. “Pro”) use!

It’s the same for MPEG-2.

Oh, and if you use it to decode video that was encoded by somebody without the proper license… well, then you’re also screwed. How the heck you’re meant to work that one out I have no idea.

NDB$INFO with SQL hits beta

Bernhard blogged over at http://ocklin.blogspot.com/2010/02/mysql-cluster-711-is-there.html that MySQL Cluster 7.1.1 Beta has been released. The big feature (from my point of view) is the SQL interface on top of NDB$INFO. This means there is now full infrastructure from the NDB data nodes right out to SQL in the MySQL Server for adding monitoring to any bit of the internals of the data nodes.

You have already lost

When the following code introduces a valgrind warning… you are in a world of pain and loss:

=== modified file 'drizzled/field/blob.h'
--- drizzled/field/blob.h	2009-12-21 08:16:13 +0000
+++ drizzled/field/blob.h	2010-01-18 01:36:48 +0000
@@ -32,6 +32,7 @@
  */
 class Field_blob :public Field_str {
 protected:
+  uint32_t assassass;
   uint32_t packlength;
   String value;				// For temporaries
 public: