Drizzle progress… (testing can be good)

We’ve been working on fixing up the remaining test cases so that they run with Drizzle. We’ve found: bugs in Drizzle, bugs in MySQL (one that seems to have been there for at least 10 years), bugs in the tests, tests that no longer apply and occationally, something like this:


/* Please god, will someone rewrite this to be readable :( */
if (to->pack_length() == from->pack_length() &&
!(to->flags & UNSIGNED_FLAG && !(from->flags & UNSIGNED_FLAG)) &&
to->real_type() != DRIZZLE_TYPE_ENUM &&
(to->real_type() != DRIZZLE_TYPE_NEWDECIMAL || (to->field_length == from->field_length && (((Field_num*)to)->dec == ((Field_num*)from)->dec))) &&
from->charset() == to->charset() &&
to->table->s->db_low_byte_first == from->table->s->db_low_byte_first &&
(!(to->table->in_use->variables.sql_mode & (MODE_NO_ZERO_DATE | MODE_INVALID_DATES)) || (to->type() != DRIZZLE_TYPE_DATE && to->type() != DRIZZLE_TYPE_DATETIME)) &&
(from->real_type() != DRIZZLE_TYPE_VARCHAR || ((Field_varstring*)from)->length_bytes == ((Field_varstring*)to)->length_bytes))
{ // Identical fields
/* This may happen if one does 'UPDATE ... SET x=x' */
if (to->ptr != from->ptr)
memcpy(to->ptr,from->ptr,to->pack_length());
return 0;
}

and no, I haven’t really changed the formatting.

Speaker: MySQL Conference & Expo 2009 – O’Reilly Conferences, April 20 – 23, 2009, Santa Clara, CA

Yes, I’m speaking at  the upcoming MySQL Conference & Expo 2009 – on April 20 – 23 (and yes, it’s in Santa Clara again).

I have three sessions:

MySQL Cluster Tutorial: this time with 6.4 feature goodness. Very hands-on, very interactive.

MySQL Cluster on Windows:  (insert witty text about hating operating system freedom here)

Memory Management in MySQL and Drizzle: not magic setting of buffer variables, but memory allocation and management inside the server, a bunch of malloc() discussion and hopefully some interesting numbers.

VirtualBox 2.1.0 (and OpenSolaris 2008.11)

Upgraded VirtualBox and booted up my OpenSolaris VM. VirtualBox 2.1.0 finally fixes the bug where if 127.0.0.1 was in resolv.conf on the host – no DNS for you in the guest (unless in the guest you were running a DNS server).

Haven’t tried it yet… but OpenGL Accelleration makes at least a checkbox appearance in VirtualBox 2.1…. so that could be rather awesome.

Going a lot better with OpenSolaris 2008.11 than previous releases.. It looks like it might be quite easy to get to the stage of building Drizzle on it.

Just figured out how to change to Dvorak! Yay, I can type again! (Go to Input Methods preference panel and add US/DVORAK as a language, move it to the top, and enable the input method application and do it that way).

Currently installing sunstudioexpress. Why not gcc? I’m pretty sure the version in OpenSolaris is still ancient (so won’t build drizzle) and Sun Studio does produce different warnings (which indicate real bugs in a bunch of cases).

Things I wish were packaged: latest protobufs, latest bzr, gcc 4.x

fixing drizzle on linux sparc

Since I got fed up with Solaris the other day, the T1000 is running Debian. This means that “I’ll care about Drizzle on Linux Sparc”.

OMG were things broken in the most “trivial” ways.

A good quick intro to the issues is Memory alignment on SPARC, or a 300x speedup!

It all comes down to memory alignment.

So I pulled the MySQL 6.0 bzr tree onto the box to try it too… I haven’t seen so many compiler warnings in ages (okay… since I last built MySQL.. drizzle is warning-clean and it makes it hard to remember a time before that). I think it works purely by accident.

So I’m gradually getting all of Drizzle working on Linux Sparc (a few things fixed already).

It’d be great if the T1k had faster disk though (make -j30 is fun… but IO isn’t on a single 160GB 7200rpm disk)… anybody wanting to donate an SSD?

Debian about 1234533 times easier to install than Solaris

After what many hours trying to netboot the T1000 to install Solaris Express, I wondered “how hard is it for debian?”.

Easy. get the sparc64 boot.img, put it on TFTP server, add “filename “boot.img”;” or similar to dhcp, boot the T1000 from the service console something like this “bootmode bootscript=”boot net:dhcp”\n restart -c” and install away!

As for Solaris?

Well… dhcpd.conf:

option space SUNW;
option SUNW.root-mount-options code 1 = text;
option SUNW.root-server-ip-address code 2 = ip-address;
option SUNW.root-server-hostname code 3 = text;
option SUNW.root-path-name code 4 = text;
option SUNW.swap-server-ip-address code 5 = ip-address;
option SUNW.swap-file-path code 6 = text;
option SUNW.boot-file-path code 7 = text;
option SUNW.posix-timezone-string code 8 = text;
option SUNW.boot-read-size code 9 = unsigned integer 16;
option SUNW.install-server-ip-address code 10 = ip-address;
option SUNW.install-server-hostname code 11 = text;
option SUNW.install-path code 12 = text;
option SUNW.sysid-config-file-server code 13 = text;
option SUNW.JumpStart-server code 14 = text;
option SUNW.terminal-name code 15 = text;
option SUNW.SbootURI code 16 = text;

host hurricane {
hardware ethernet 0:14:4f:1e:28:e;
fixed-address 192.168.1.19;
option host-name “hurricane”;
filename “sparc64-etch-boot.img”;
#       filename “sol-nv-b103-sparc”;
#       option SUNW.install-server-ip-address 192.168.1.1;
#       option SUNW.install-server-hostname “saturn”;
#       option SUNW.install-path “/mnt/sol-nv-b103-sparc/”;
#       option SUNW.root-server-ip-address 192.168.1.1;
#       option SUNW.root-server-hostname “saturn”;
#       option SUNW.root-path-name “/mnt/sol-nv-b103-sparc/Solaris_11/Tools/Boot”;

}

(obviously changing the comments around) and having the Solaris Express DVD mounted and NFS exported…. it *still* doesn’t work. It goes “unable to mount filesystem” with no further hints (even when tcpdumping the network).

Documentation for doing the simple thing of using $dhcp_server and $nfs_server to network boot a Solaris install on a Sparc box is *COMPLETELY* missing.

Now, I’m a smart guy (and if you don’t believe that, at least believe I’m not stupid). If I can’t get it to boot the installer, what chance do others have?

I’ll try OpenSolaris out when it’s on SPARC (and please oh please oh please just have an easy way to net boot the installer using a linux host). Please take the debian way (just a single file on tftp).

So now it’s goodbye Solaris (I’m not going to have something I can’t  re-install, upgrade or security patch) and it’s hello Debian (and sanity).

Yes, this does mean I’ll care about Drizzle on Linux Sparc.

OpenSolaris 2008.11 first impressions

Using the wonderful bittorrent, I got the CD image in next to no time (in contrast to the Solaris Express image I’m currently downloading via HTTP that’s taking forever).

Boot time in VirtualBox (off the ISO image) was rather quick, usual questions on keyboard layout and desired language (it’d be neater if these were GUI questions… but anyway). The GNOME desktop loaded up, popup window informed me that it had connected to the network. Awesome.

Package manager: opend quickly, using the opensolaris.org repository and it does seem to have a lot of packages… even MySQL 5.0.67 (and 4.0.24). Not 5.1 though, but it is early days (and it was just released as GA the other day).

At least one unusual thing was SUNWgrub and SUNWgrubS (where the S is for source). I assume this is some packaging oddity as I don’t ese other packages like this.

SUNWii wins the odd package name award.

The Time Slider seems like possibly the most awesome thing ever. It periodically takes ZFS snapshots of your disk and presents you with a time slider in nautilus so you can just view your data how it was in the past (at previous snapshots).

I can’t see how to change the keyboard layout to DVORAK (at least while booted off the CD image).

The getting started guide also shows how to get a development environment going… this is quite promising. Will do proper install shortly and do a step by step “building drizzle on OpenSolaris 2008.11” post.

Stewart learns SQL oddities…

What would you expect the following to fail with?

CREATE TABLE t1 (a int, b int);
insert into t1 values (100,100);
CREATE TEMPORARY TABLE t2 (a int, b int, primary key (a));
BEGIN;
INSERT INTO t2 values(100,100);
CREATE TEMPORARY TABLE IF NOT EXISTS t2 (primary key (a)) select * from t1;

If you answered ER_DUP_ENTRY then you are correct.

From the manual:

Note

If you use IF NOT EXISTS in a CREATE TABLE ... SELECT statement, any rows selected by the SELECT part are inserted regardless of whether the table already exists.

Does anybody else find this behaviour “interesting”?

What constitutes an identifier for a table?

Well… there’s:

  • database
  • table name

(both of these are quite obvious).

But then you have:

  • temporary tables

Well… two types of temporary tables:

  • those created in the course of query execution, typically in /tmp/
  • those created during ALTER TABLE, typically in the database directory

You may have seen these “#sql-foo.frm” etc files around.

but you can also CREATE TABLE `#sql-foo` (you know, because that’s a good string to use in your app). In fact, you can (and shouldn’t) create a table with the exact same name as the temporary #sql table and use it in your app.

So really the primary key for a table is: string database name, string table name, bool is_tmp. Oh, and the /tmp/ temporary tables just to screw with your braiiiinn.

In drizzle, this is what the storage engine API is ending up looking like. It’s the responsibility of the engine to encode the temporary table names so that they don’t clash with the non-temporary table names. This could be done by putting them in a different set of files, marking them with som eflag, or just passing the is_tmp flag to build_table_filename (or make_table_key) and being done with it.

I think this ends up being the least-hacky approach and it does mean we can remove the bass ackwards if(table_name[0]==’#’) checks from some engines.

Temporary tables in any engine….

Well… nearly any engine.

I have a plan forming in my head to add some hooks to engines to help with creating temporary tables (the ones created while executing a query, not ones created during ALTER TABLE).

Currently, if you ALTER TABLE and we require a temporary table, it’s still database.table but we generate a table name that’s small, unique and begins with “#sql”.

I’ve changed some of the handler interface to accept two strings (database name, table name) instead of one “path” that may (or may not) end in “.FRM” and may (or may not) begin with “./” and may (or may not) use the “/” separator between database and table name (hint: on win32, it’s sometimes “”).

So ha_delete_table is now: ha_delete_table(db, table_name). Sanity!

(this has the downside of being a incompatible change that doesn’t break the build as there seems to be no way in C++ to say “derived classes cannot implement a function of this name”).

This is annoying for temporary tables however.

They don’t reside in a database… they’re off in la-la land (otherwise known as opt_drizzle_tmpdir).

Now… in MySQL 5.1 the ability for multiple temporary directories was added, and the MySQL server will cycle through them. The clusterfuck part of this was that a mutex was added… so every time the code goes to get the name of a (or for 99.99999999% of cases, the) temporary directory, it has to grab a mutex. With modern systems being able to have *many* simultaneous IO operations (e.g. create files) this is just dumb. It’s gone.

If you want to use multiple spindles for temporary tables it’s called RAID people (or buy an SSD, mkfs.ext2 it and just wipe it on reboot. simple.)

Anyway… back to temporary tables:

they’re in /tmp or something. So an API that’s foo(const char *db, const char *table_name) doesn’t work as well.

Also, for a bunch of engines, it’s good to know that you’re using a temporary table. You probably want to store these somewhere that never needs fsync() or anything like that. If the server goes away, these tables are *gone*. So optimise for that.

Heck, it may even be good to store temporary tables in your distributed engine (if that engine is memory based) as it’s often faster to access remote memory than local disk (although with SSD this is a whole different ball game… in fact, I’m not sure if it’s even still a ball game.. it’s possibly frisbee)

But currently there’s hardcoded a mi_create (MyISAM create) call in the server and in the 6.0-maria tree, an #ifdef around if it’s mi_create or maria_create.

So not any engine yet… but one can dream. Yes, I dare to dream.

(although why I was dreaming of a small board with a PowerPC 603, 8MB RAM and a mini DVI port the other night is quite beyond me)

I don’t read code comments

They are wrong.

Misleading at best.

Reworking parts of Drizzle (which came directly from MySQL) it can get painfully obvious. Things like “afaiu” and “???” appear in more than one place (that is if the comment isn’t just obviously wrong).

A comment merely states what one person thought the code did at some point in the past. It has no relation to what the code actually does now.

Hold me, I’m scared….

From ha_myisam.cc:

/*
TODO: switch from protocol to push_warning here. The main reason we didn’t
it yet is parallel repair. Due to following trace:
mi_check_print_msg/push_warning/sql_alloc/my_pthread_getspecific_ptr.

Also we likely need to lock mutex here (in both cases with protocol and
push_warning).
*/
protocol->prepare_for_resend();
protocol->store(name, length, system_charset_info);
protocol->store(param->op_name, system_charset_info);
protocol->store(msg_type, system_charset_info);
protocol->store(msgbuf, msg_length, system_charset_info);
if (protocol->write())
sql_print_error(“Failed on my_net_write, writing to stderr instead: %s\n”,
msgbuf);
return;

Hopefully this will serve as a good TODO list to go and fix at some point.

The Drizzle Snowman – PlanetMySQL fail

If you went “wtf” at The Drizzle Snowman – WIN! – Jay Pipes on PlanetMySQL suddenly ending at “create table”, you should click through and see the unicode character for a snowman.

In other fun, we’ve also created tables with the name of cloud symbol, umbrella symbol and umbrella with rain drops symbol. All of these seem rather appropriate for Drizzle.

What VERSION in INFORMATION_SCHEMA.TABLES means (hint: not what you think)

It’s the FRM file format version number.

It’s not the version of the table as one might expect (i.e. after CREATE it’s 1. Then, if you ALTER, it’s 2. Alter again 3 etc).

In Drizzle, we now return 0.

In future, I plan that Drizzle will allow the engine to say what version it is (where 0 is “dunno”).

This’ll be a good step towards being able to cope with multiple versions of a table in use at once (and making sense of this to the user).

drop table fail (on the road to removing the FRM)

So… in removing the FRM file in Drizzle, I found a bit of a nugget on how drop table works (currently in the MySQL server and now “did” in Drizzle).

If you DROP TABLE t1; this is what happens

  • open the .frm file
  • read first 10bytes (oh, and if you get EIO there, in a SELECT * FROM INFORMATION_SCHEMA.TABLES you’ll get “Error” instead of “Base Table”)
  • if (header[0] != (unsigned char) 254 || header[1] != 1 ||
    (header[2] != FRM_VER && header[2] != FRM_VER+1 &&
    (header[2] < FRM_VER+3 || header[2] > FRM_VER+4)))
    return true;
    Which means that you probably (well, should have) set your enum legacy_db_type to DB_TYPE_UNKNOWN in the caller of bool mysql_frm_type(Session *, char *path, enum legacy_db_type *dbt) otherwise you end up in some form of pain.
  • Else, *dbt= (enum legacy_db_type) (uint) *(header + 3);
    return true;                   // Is probably a .frm table

I do like the “probably”.

Oh, and on a “storage engine api” front, some places seem to expect handler::delete_table(const char* name) to return ENOENT on table not existing. In reality however:

  • int ha_heap::delete_table(const char *name)
    {
    -  int error= heap_delete_table(name);
    -  return error == ENOENT ? 0 : error;
    +  return heap_delete_table(name);
    }
  • InnoDB (note the behaviour of returning DB_TABLE_NOT_FOUND… which isn’t ENOENT)
    err = DB_TABLE_NOT_FOUND;
    ut_print_timestamp(stderr);

    fputs(“  InnoDB: Error: table “, stderr);
    ut_print_name(stderr, trx, TRUE, name);
    fputs(” does not exist in the InnoDB internaln”
    “InnoDB: data dictionary though MySQL is”
    ” trying to drop it.n”
    “InnoDB: Have you copied the .frm file”
    ” of the table to then”
    “InnoDB: MySQL database directory”
    ” from another database?n”
    “InnoDB: You can look for further help fromn”
    “InnoDB: http://dev.mysql.com/doc/refman/5.1/en/”
    “innodb-troubleshooting.htmln”,
    stderr);

  • and MyISAM would generate the error message itself, but that’s fixed with:
    -  if (my_delete_with_symlink(from, MYF(MY_WME)))
    +  if (my_delete_with_symlink(from, 0))
    return(my_errno);

and just to add to the fun, elsewhere in the code, a access(2) call on the frm file name is used to determine if the table exists or not.

The road to removing the FRM has all sorts of these weird-ass things along it. Kinda nice to be able to replace this with something better (and, hopefully – good).

But let me sum up with sql_table.cc:

“This code is wrong and will be removed, please do not copy.”

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

*cough*

(and then wipe coffee off the computer)

of course the real aim should be to scale with one instance on the machine as scaling with multiple instances on the one machine isn’t scaling at all – it’s scale out, but with more problems (now when one machine goes down, so do 1110202434 database instances).

Technology predictions

In 2 years (ish):

  • the majority of consumer bought machines (which will be laptops) will have SSD and not rotational media
  • At the same time, servers with larger storage requirements will use disk as we once used tape.
  • At least one Linux distributoin will be shipping with btrfs as default
  • OpenSolaris will be looking interesting and not annoying to try out (a lot more “just work” and easy to get going).
  • Unless Sun puts ZFS under a GPL compatible license so it can make it into the Linux kernel, it will become nothing more than a Solaris oddity as other file systems will have caught up (and possibly surpassed).
  • There will be somebody developing a a MySQL compatible release based off Drizzle
  • Somebody will have ported Drizzle back to Microsoft Windows… possibly Microsoft.
  • X will still be used for graphics on Linux, although yet another project will start up to “replace X with something modern”, get a lot of press and then fail.

In 5 years:

  • Apple will single handedly control 1/3rd the mobile phone market
  • The other 2/3rds will be divided between Blackberry (small), Windows Mobile and Android.
  • Linux desktop market share will be much higher than Apple’s

That’s all for now…