Data Storage miniconf for linux.conf.au 2011

Today I’ll be running the Data Storage Miniconf at linux.conf.au 2011. See the Tuesday Schedule on the LCA Website for the up-to-date schedule for today (the one in the badges is probably out of date).

We’ve got some great talks today, so be sure to catch them. There’s also plenty of opportunity and time for discussion.

Monday linux.conf.au 2011 plan

It’s currently my plan to really try and make it to the following sessions:

The middle of the day will probably become “Stewart goes and panics over talks” kinda time.

Should be an awesome day.

Data Storage miniconf Lightning Talk CFP

Going to linux.conf.au ?

Use storage, have tales?

Admin a storage system, have stories?

Hack on a storage system, have software to promote?

We want your Lightning Talk!

Databases, file systems, cloud storage, network storage, my-insane-mythtv-storage all welcome!

Send me email if you’d like to present (stewart at flamingspork dot com).

Tuesday, from 4:15pm at linux.conf.au

No implicit commit (on the road to transactional DDL)

A long time ago, in a time that can only serve to make some feel old and others older, MySQL didn’t support transactions. Each statement was executed as it went; there was no ROLLBACK (or COMMIT, or crash recovery, etc.). Then there were transactions. Other RDBMSs implement auto_commit functionality too, but for MySQL users, we think of it as the magic compatibility mode that (mostly) makes applications written for MyISAM magically work on InnoDB (okay, and making “you should use transactions” a really easy consulting gig :)

I’m currently working on finishing up a patch that removes the implicit COMMIT from DDL operations in Drizzle. Instead, you get an error message saying that transactional DDL is not currently supported. I see a future where we have one of two situations (possibly depending on the storage engine): either DDL is supported within normal transactions, or there are DDL-only transactions (which cannot be mixed with DML). The latter (DDL-only transactions) I see as the option for InnoDB/HailDB.

Is your Storage Engine buggy or the database server?

If your storage engine returns an error from rnd_init (or doStartTableScan, as it’s named in Drizzle) and does not save this error and return it in any subsequent calls to rnd_next, your engine is buggy. Namely, it is buggy in that a) an error may not be reported back to the user, and b) everything may explode horribly when rnd_next is called after rnd_init has returned an error.

Unless it is running on MariaDB 5.2 or (soon, when the patch hits the tree) Drizzle.
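The engine-side workaround for a) and b) is tiny: save the error from starting the scan and keep handing it back. A rough sketch of the pattern (ExampleCursor and the *_internal() helpers are made up, not Drizzle’s actual cursor code):

```cpp
// Sketch only: the class and the *_internal() helpers are hypothetical,
// but the save-the-error pattern is the one described above.
class ExampleCursor
{
  int scan_error;   // error saved from starting the table scan

  // placeholders for whatever the engine really does
  int open_scan_internal() { return 0; }
  int read_next_row_internal(unsigned char *) { return 0; }

public:
  ExampleCursor() : scan_error(0) {}

  int doStartTableScan(bool)
  {
    scan_error= open_scan_internal();
    return scan_error;                  // report the failure here...
  }

  int rnd_next(unsigned char *buf)
  {
    if (scan_error != 0)
      return scan_error;                // ...and keep reporting it here,
                                        // instead of exploding horribly
    return read_next_row_internal(buf);
  }
};
```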

Monty (Widenius, not Taylor) wrote a patch for MariaDB, based on my bug report, that addresses that problem. It uses the compiler feature that warns when the result of a function isn’t checked to make sure that every place that calls rnd_init checks for an error from the engine.
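As far as I can tell from the description, the compiler feature in question is GCC’s warn_unused_result function attribute. A minimal illustration (rnd_init_example is a made-up stand-in, not the real handler method):

```cpp
// Ignoring the return value of a function marked warn_unused_result is a
// warning, and with -Werror it becomes a build error.
int rnd_init_example(bool scan) __attribute__((warn_unused_result));

int rnd_init_example(bool scan)
{
  (void) scan;
  return 0;
}

void caller()
{
  rnd_init_example(true);               // warning: ignoring return value

  int error= rnd_init_example(true);    // fine: the result is checked
  if (error != 0)
  {
    /* report the error instead of blindly calling rnd_next() */
  }
}
```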

Today I (finally) pulled that into Drizzle as well.

So… if your engine does the logical thing and goes “oh look, this method returns an error… I’ll return my error”, it will exhibit bugs in MySQL, but not in MariaDB 5.2 or Drizzle (once the patch hits).

Which is buggy, the server or the engine?

The MySQL bug number is 54166, filed in June 2010.

Making B&W Prints

Hong Kong street

I’m getting better at making prints, and starting to understand how all the bits fit together properly. I’m finding myself disappointed that I’ve shot colour sometimes :)

The light-sealing of the darkroom (also known as laundry (also known as brewery)) is not exactly pretty… but it does work:

MySQL 5.5 is GA and 5.5.8 missing from launchpad…

While it’s great that MySQL 5.5 is GA with the 5.5.8 release (you can download it here), I’m rather disappointed that the bzr repositories on Launchpad aren’t being kept up to date. At the time of writing, it looked like this:

Yep – nothing for five weeks in the 5.5 repo – nothing since the 5.5.7 release :(

It’s not that there have been no changes, either – the changelog has a decent number of fixes.

Persistent index statistics for InnoDB

In browsing the BZR tree for lp:mysql-server, I noticed some rather exciting code had been merged into the Innobase code.

You may be aware that InnoDB will do some index dives when opening a table to get some statistics about the indexes that can help the optimiser make good query plans.

The problem is that this means many disk seeks: on server restart, you have to spend a whole bunch of time seeking around the disk reading index pages.

Not any more.

There is now code merged in to store the calculated statistics in a table inside InnoDB so that these index dives don’t have to happen on startup.
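I haven’t dug through the algorithm properly yet, but the overall shape of the change is easy to sketch (the function names below are hypothetical stand-ins, not what the Innobase code actually calls them):

```cpp
// Hypothetical sketch of the idea only; load_saved_stats(),
// sample_index_by_diving() and save_stats() are stand-ins for the real code.
struct IndexStats
{
  unsigned long long distinct_keys;
  unsigned long long leaf_pages;
};

bool load_saved_stats(const char *index_name, IndexStats *out);    // read the stats table
IndexStats sample_index_by_diving(const char *index_name);         // the expensive seeks
void save_stats(const char *index_name, const IndexStats &stats);  // write the stats table

IndexStats stats_for_open_table(const char *index_name)
{
  IndexStats stats;

  /* If statistics were persisted earlier, use them: no index dives just
     because the server was restarted. */
  if (load_saved_stats(index_name, &stats))
    return stats;

  /* Otherwise fall back to the old behaviour: dive into the index, then
     store the result so the next open (or restart) is cheap. */
  stats= sample_index_by_diving(index_name);
  save_stats(index_name, stats);
  return stats;
}
```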

Originally, this looked like it was going to make it into InnoDB+. The good news is that it’s now in a public source tree. I look forward to when it hits a stable release.

(hopefully somebody can beat me to it and write a nice description of the algorithms involved… the code is pretty easy to follow, so it shouldn’t be hard)

Replication log inside InnoDB

The MySQL replication system has always had the replication log (“binlog”) as a separate set of files on disk. Originally, this really didn’t matter as, well, MyISAM wasn’t transactional or crash safe so the binlog didn’t need to be either. If you crashed on a busy write workload, your replication was just going to be hosed anyway.

So then came a time where everybody used InnoDB. Transactional, crash-safe and all the goodies. Then, a bit later, came storing the master replication log position in the InnoDB log, and XA between InnoDB and the binlog. So a rather long time after MySQL first had replication, you could pull the power cord on the master with a decent amount of certainty that things would be okay when you turned it on again.

I am, of course, totally ignoring the slave state and if it’s safe to do that on slaves.

Using XA to keep the binlog and InnoDB consistent does have a cost. That cost is fsync()s: two-phase commit means you have to do a lot more of them.

As you may be aware, at a (much) earlier point in Drizzle we completely ripped out the replication code. Why? A lot of it was very much still geared to support statement based replication – something we certainly didn’t want to support. We also did not really want to keep the legacy binlog format. We wanted it to be very, very pluggable.

So the initial implementation is a transaction log file. Basically, we write out the replication messages to a file. A slave reads this and applies the operations. Pretty simple and foolproof to implement.

But it’s pluggable.

What if we stored the transaction log inside InnoDB? Not only that, what if we wrote it as part of the transaction that was doing the changes? That way, no XA is needed – everything is consistent with a COMMIT. This would greatly reduce the number of fsync()s needed to be consistent.
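In rough pseudo-API terms, the difference looks something like this (every type and function below is a hypothetical stand-in, just to show where the replication record ends up and what each approach has to make durable):

```cpp
// Every type and function here is a hypothetical stand-in, not the real
// Drizzle or InnoDB API; the point is what each approach must make durable.
struct Transaction;
struct ReplicationMessage;

void xa_prepare(Transaction *trx);        // fsync() of the InnoDB log
void append_to_log_file(const ReplicationMessage &msg);
void fsync_log_file();                    // fsync() of the transaction log file
void xa_commit(Transaction *trx);         // another fsync() of the InnoDB log
void insert_into_log_table(Transaction *trx, const ReplicationMessage &msg);
void commit(Transaction *trx);

// File-based transaction log: two separately durable things, so two-phase
// commit (XA) and its extra fsync()s are needed to keep them consistent.
void commit_with_file_log(Transaction *trx, const ReplicationMessage &msg)
{
  xa_prepare(trx);
  append_to_log_file(msg);
  fsync_log_file();
  xa_commit(trx);
}

// Transaction log stored inside InnoDB: the replication message is written
// as part of the same transaction, so a single COMMIT covers both.
void commit_with_innodb_log(Transaction *trx, const ReplicationMessage &msg)
{
  insert_into_log_table(trx, msg);
  commit(trx);
}
```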

Now… the first thing people will say is “arrggh! You’re writing the data *four* times now”. First the transaction data goes into the log, then the replication message into the log, and then both of these are written back to the data file. It turns out that this is still much cheaper than doing the additional fsync()s.

In one of our tests, the file-based transaction log managed ~300tps and the transaction log in InnoDB ~1200tps.

I think that’s an acceptable trade-off.

We’ve just merged the first bit of this work into Drizzle.

Props go to Joe Daly, Brian and me for making it work.

The camera never lies

Of course it does! We have The GIMP and Photoshop! Well…

Back in the day, when everybody shot film, things were a bit more difficult. For a lot of operations it was pretty easy: select the right film and the right exposure. For more control you could vary how you developed it, and beyond that you could do a million things in the darkroom when printing. However, if you wanted to do something like combine 2 images or take out part of an image or smooth a skin tone, you were in for a lot more fun.

Retouching was done by changing the negative. Want to remove that pimple from a portrait? Go get some paint and paint over it. This was tricky: a 35mm negative is very small, which made the work fiddly.

This is why publications such as Playboy shot on larger format film. From what I’ve read, either 120 (“medium format” to you and me – bigger than 35mm, but still not huge) or 4×5 (inches – much bigger) or even 8×10. While we can all wish that we too could get hold of some 8×10 Kodachrome to play with (and presumably a lab to process it for us) – those days are long gone.

With a negative of 8×10 inches, you have a lot more to play with and it’s much easier. For one thing, a contact print is as big as most enlargements people do from 35mm!

With humans essentially painting on negatives, it became relatively easy to spot when things had been manipulated (meaning there were experts who did it). However, with the increased sophistication of digital tools, creating quite realistic manipulations (even to the expert eye) wasn’t that hard.

Recently, Canon (among others) has tried to bring technology to digital cameras that would enable you to check that an image has not been manipulated after it came out of the camera.

This technology is, of course, flawed.

From the guy who enabled blind people to read eBooks comes the breaking of this system (Boing Boing and Network World).

“Pics or it didn’t happen” simply isn’t true.

A more complete look at Storage Engine API

Okay… so I’ve blogged many times before about the Storage Engine API in Drizzle. This API is somewhat inherited from MySQL, and we have very much attempted to make it a much cleaner interface. Our goals in making changes include: making it much easier to write and maintain a storage engine, making the upper-layer code obviously correct and clear in what it’s doing, and being able to more easily introduce optimisations.

I’ve recently added a Storage Engine that is only used in testing: storage_engine_api_tester. I’ve blogged before on it producing call graphs (really state transition graphs) for both Storage Engine and Cursor.

I’ve been expanding the test. My test engine is now a wrapper around a real engine instead of just a fake one. This lets us run real queries (and test cases) while testing what’s going on. At some point in the near future I plan to make it so that it will be able to log what calls go on to the engine and produce a graph just of those.
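The wrapping itself is nothing clever: each method notes that it was called (so the state transitions can be checked and graphed) and then hands off to the real engine’s cursor. Roughly like this (made-up class names, not the actual storage_engine_api_tester source):

```cpp
#include <cstdio>

// Made-up sketch of the wrapping idea, not the real storage_engine_api_tester code.
class InnerCursor
{
public:
  virtual ~InnerCursor() {}
  virtual int doStartTableScan(bool scan)= 0;
  virtual int rnd_next(unsigned char *buf)= 0;
};

class LoggingCursor
{
  InnerCursor *real;   // the real engine's cursor that we wrap

public:
  explicit LoggingCursor(InnerCursor *real_cursor) : real(real_cursor) {}

  int doStartTableScan(bool scan)
  {
    fprintf(stderr, "CURSOR: doStartTableScan()\n");  // record the transition
    return real->doStartTableScan(scan);              // then do the real work
  }

  int rnd_next(unsigned char *buf)
  {
    fprintf(stderr, "CURSOR: rnd_next()\n");
    return real->rnd_next(buf);
  }
};
```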

I added a lot more to the Storage Engine part of the wrapper. Below you can see the current graph:

I’ve coded what I consider to be bugs as red and what I consider suspect as blue.

Also for the Cursor (colours mean the same):

As you can see, there are currently some wacky possibilities. I’m investigating exactly what’s going on here – whether I’m somehow missing some calls that I should be wrapping (I don’t think so) or whether we really are doing some dumb-ass things in the upper layer.

Also, please do not be under any impression that any of this means that we’re going to have a stable API. We’re not. To stabilise on this would just be insane – way too much of it still doesn’t make much sense.

Making my own B&W Prints

I managed to light-seal the laundry (not pretty… but it worked) and started playing with one of the enlargers I bought recently. I had a bit of an inkling, from some reading I did ages ago, about what I had to do to make prints.

I didn’t really have any developer meant for prints… so I just grabbed some Rodinal and dived right in. Basically, I started with the lens wide open and around 0.5 to 1 second of exposure.

Because I was just experimenting, I skipped a stop bath (did a rinse though) and then straight into some fixer.

Here are the results of my experimentation (photos of the drying prints, taken with my phone):

bench (print)

Leah

Contrast these with the scans of the negatives:

dedicated bench

by the water

Limiting functions to 32k stack in Drizzle (and scoped_ptr)

I wonder if this comes under “Code Style” or not…

Anyway, Monty and I finished getting Drizzle ready for adding “-Wframe-larger-than=32768” as a standard compiler flag. This means that no function within the Drizzle source tree can use more than 32kb of stack – it’s a compiler warning, and with -Werror, that means it’s a build error.

GCC is not perfect at detecting stack usage, but it’s pretty good.
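For example, a function like this one (a deliberately silly illustration) now fails the build:

```cpp
#include <cstring>

// With -Wframe-larger-than=32768 (and -Werror) this is a build error:
// the buffer alone is twice the 32kb stack budget for a single frame.
void blows_the_stack_budget()
{
  char buffer[64 * 1024];               // 64kb straight onto the stack
  memset(buffer, 'x', sizeof(buffer));
}
```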

Why have we done this?

Well, there is a little bit of recursion in the server… and we can craft queries to blow a small stack (not so good). On Mac OS X, the default thread stack size is only 512kb. That’s only 16 frames of headroom if 32kb stack frames are even remotely common.

I also found some interesting places that throw a lot of things onto the stack – rather far down a call chain – leading to the possibility of blowing up in really strange ways.

We’d love to make it 16kb… but that’s a fair bit more work, so something for the future.

We’ve used the Boost scoped_ptr to address a bunch of these situations, as it provides pretty much the minimal code change for the same effect (except that the memory is dynamically allocated instead of being part of the stack frame).
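The typical change ends up looking something like this (a simplified illustration rather than a real Drizzle diff; for array buffers the matching boost::scoped_array does the same job):

```cpp
#include <boost/scoped_ptr.hpp>

// A made-up example of a structure that is far too big for the stack frame.
struct SortBuffer
{
  unsigned char bytes[40 * 1024];   // 40kb
};

// Before: the whole buffer lives in the stack frame.
void sort_rows_before()
{
  SortBuffer buffer;
  buffer.bytes[0]= 0;   // ... use buffer.bytes ...
}

// After: same scope and lifetime, but heap allocated; boost::scoped_ptr
// frees it automatically on return, so the code change is minimal.
void sort_rows_after()
{
  boost::scoped_ptr<SortBuffer> buffer(new SortBuffer);
  buffer->bytes[0]= 0;  // ... use buffer->bytes ...
}
```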

Drizzle gets InnoDB 1.0.9

My branch that updates the innobase plugin in Drizzle to be based on innodb_plugin 1.0.9 has been merged. For the next milestone, we’ll probably have 1.0.11 as well.

How’s the progress on getting 1.1 and 1.2 in? Pretty good, actually. We’ll have it for either this milestone or the next one.

And merging newer InnoDB into HailDB? It’s going well too – expect more news “soon”.

Cursor states

Following on from my post yesterday on the various states of a Storage Engine, I said I’d have a go with the Cursor object too. A Cursor is used by the Drizzle kernel to get and set data in a table. There can be more than one cursor open at once, and more than one per thread. If your engine cannot cope with this, it is its responsibility to figure it out and return the appropriate errors.

Let’s look at a really simple operation, inserting a couple of rows and then reading them back via a full table scan.

Now, this graph is slightly incomplete, as there is no doEndTableScan() call. But you can see in which order things are meant to happen. In this case, “store_lock()” means that store_lock() has been called, so when coming back from doInsertRecord() we do not call store_lock() again; rather, we’re just in a state where it has already been executed.

For the MySQL handler, think ::write_row() for doInsertRecord() and ::rnd_init() for doStartTableScan().

This diagram was again auto-generated from my test engine.
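Written out as calls rather than a diagram, that insert-two-rows-then-scan sequence looks roughly like this (ExampleCursor is hypothetical, error handling and most arguments are trimmed, but the method names are the Cursor ones discussed above):

```cpp
// Sketch of the call order only; ExampleCursor is not a real engine.
struct ExampleCursor
{
  void store_lock();
  int  doInsertRecord(unsigned char *row);
  int  doStartTableScan(bool scan);
  int  rnd_next(unsigned char *buf);
  int  doEndTableScan();
};

void insert_then_scan(ExampleCursor &cursor,
                      unsigned char *row1, unsigned char *row2,
                      unsigned char *read_buffer)
{
  cursor.store_lock();             // called once; still "in effect" for the second insert
  cursor.doInsertRecord(row1);
  cursor.doInsertRecord(row2);

  cursor.doStartTableScan(true);   // ::rnd_init() in MySQL handler terms
  while (cursor.rnd_next(read_buffer) == 0)
  {
    /* each row comes back here */
  }
  cursor.doEndTableScan();         // the call missing from the auto-generated graph
}
```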