OpenSolaris 2008.11 first impressions

Using the wonderful bittorrent, I got the CD image in next to no time (in contrast to the Solaris Express image I’m currently downloading via HTTP that’s taking forever).

Boot time in VirtualBox (off the ISO image) was rather quick, usual questions on keyboard layout and desired language (it’d be neater if these were GUI questions… but anyway). The GNOME desktop loaded up, popup window informed me that it had connected to the network. Awesome.

Package manager: opend quickly, using the opensolaris.org repository and it does seem to have a lot of packages… even MySQL 5.0.67 (and 4.0.24). Not 5.1 though, but it is early days (and it was just released as GA the other day).

At least one unusual thing was SUNWgrub and SUNWgrubS (where the S is for source). I assume this is some packaging oddity as I don’t ese other packages like this.

SUNWii wins the odd package name award.

The Time Slider seems like possibly the most awesome thing ever. It periodically takes ZFS snapshots of your disk and presents you with a time slider in nautilus so you can just view your data how it was in the past (at previous snapshots).

I can’t see how to change the keyboard layout to DVORAK (at least while booted off the CD image).

The getting started guide also shows how to get a development environment going… this is quite promising. Will do proper install shortly and do a step by step “building drizzle on OpenSolaris 2008.11” post.

Soap nuts

The other day I ordered some soap nuts from Green And Nutty – and they arrived (rather quickly) yesterday (or today… I forget.. a package on the doorstep anyway). Kristy (owner) was telling me about the soap nuts ages ago… and I finally got around to giving them a go (otherwise known as “stewart put two and two together when needing to buy washing powder”).

I don’t have an allergy problem to normal soaps and aren’t on some exotic “everything must be natural” trip (although it does have a certain appeal).

One bonus thing (for me) is that I *know* that what I’m using for washing clothes is vegan.

Anyway, first wash with them today and no problems to report. Seemed to produce clean clothes (although weren’t that dirty to begin with). Will report further in future washes.

There’s a 2-for-1 offer for December which I took (1kg for $17.99)… which should be enough to last a year (for the average household… and I’m smaller than that).

Stewart learns SQL oddities…

What would you expect the following to fail with?

CREATE TABLE t1 (a int, b int);
insert into t1 values (100,100);
CREATE TEMPORARY TABLE t2 (a int, b int, primary key (a));
BEGIN;
INSERT INTO t2 values(100,100);
CREATE TEMPORARY TABLE IF NOT EXISTS t2 (primary key (a)) select * from t1;

If you answered ER_DUP_ENTRY then you are correct.

From the manual:

Note

If you use IF NOT EXISTS in a CREATE TABLE ... SELECT statement, any rows selected by the SELECT part are inserted regardless of whether the table already exists.

Does anybody else find this behaviour “interesting”?

The passing of a faithful bottle opener….

Sadly, after many years of service (over 5), my Microsoft VisualStudio.net bottle opener is no longer functional enough to keep on my keyring. See image below for the arrow pointing to where too much metal has chipped off that it no longer can open all beer.

It is a sad day.

Luckily, I have another one I can quickly add to my keyring to facilitate beer opening when needed.

What constitutes an identifier for a table?

Well… there’s:

  • database
  • table name

(both of these are quite obvious).

But then you have:

  • temporary tables

Well… two types of temporary tables:

  • those created in the course of query execution, typically in /tmp/
  • those created during ALTER TABLE, typically in the database directory

You may have seen these “#sql-foo.frm” etc files around.

but you can also CREATE TABLE `#sql-foo` (you know, because that’s a good string to use in your app). In fact, you can (and shouldn’t) create a table with the exact same name as the temporary #sql table and use it in your app.

So really the primary key for a table is: string database name, string table name, bool is_tmp. Oh, and the /tmp/ temporary tables just to screw with your braiiiinn.

In drizzle, this is what the storage engine API is ending up looking like. It’s the responsibility of the engine to encode the temporary table names so that they don’t clash with the non-temporary table names. This could be done by putting them in a different set of files, marking them with som eflag, or just passing the is_tmp flag to build_table_filename (or make_table_key) and being done with it.

I think this ends up being the least-hacky approach and it does mean we can remove the bass ackwards if(table_name[0]==’#’) checks from some engines.

Temporary tables in any engine….

Well… nearly any engine.

I have a plan forming in my head to add some hooks to engines to help with creating temporary tables (the ones created while executing a query, not ones created during ALTER TABLE).

Currently, if you ALTER TABLE and we require a temporary table, it’s still database.table but we generate a table name that’s small, unique and begins with “#sql”.

I’ve changed some of the handler interface to accept two strings (database name, table name) instead of one “path” that may (or may not) end in “.FRM” and may (or may not) begin with “./” and may (or may not) use the “/” separator between database and table name (hint: on win32, it’s sometimes “”).

So ha_delete_table is now: ha_delete_table(db, table_name). Sanity!

(this has the downside of being a incompatible change that doesn’t break the build as there seems to be no way in C++ to say “derived classes cannot implement a function of this name”).

This is annoying for temporary tables however.

They don’t reside in a database… they’re off in la-la land (otherwise known as opt_drizzle_tmpdir).

Now… in MySQL 5.1 the ability for multiple temporary directories was added, and the MySQL server will cycle through them. The clusterfuck part of this was that a mutex was added… so every time the code goes to get the name of a (or for 99.99999999% of cases, the) temporary directory, it has to grab a mutex. With modern systems being able to have *many* simultaneous IO operations (e.g. create files) this is just dumb. It’s gone.

If you want to use multiple spindles for temporary tables it’s called RAID people (or buy an SSD, mkfs.ext2 it and just wipe it on reboot. simple.)

Anyway… back to temporary tables:

they’re in /tmp or something. So an API that’s foo(const char *db, const char *table_name) doesn’t work as well.

Also, for a bunch of engines, it’s good to know that you’re using a temporary table. You probably want to store these somewhere that never needs fsync() or anything like that. If the server goes away, these tables are *gone*. So optimise for that.

Heck, it may even be good to store temporary tables in your distributed engine (if that engine is memory based) as it’s often faster to access remote memory than local disk (although with SSD this is a whole different ball game… in fact, I’m not sure if it’s even still a ball game.. it’s possibly frisbee)

But currently there’s hardcoded a mi_create (MyISAM create) call in the server and in the 6.0-maria tree, an #ifdef around if it’s mi_create or maria_create.

So not any engine yet… but one can dream. Yes, I dare to dream.

(although why I was dreaming of a small board with a PowerPC 603, 8MB RAM and a mini DVI port the other night is quite beyond me)

I don’t read code comments

They are wrong.

Misleading at best.

Reworking parts of Drizzle (which came directly from MySQL) it can get painfully obvious. Things like “afaiu” and “???” appear in more than one place (that is if the comment isn’t just obviously wrong).

A comment merely states what one person thought the code did at some point in the past. It has no relation to what the code actually does now.

Hold me, I’m scared….

From ha_myisam.cc:

/*
TODO: switch from protocol to push_warning here. The main reason we didn’t
it yet is parallel repair. Due to following trace:
mi_check_print_msg/push_warning/sql_alloc/my_pthread_getspecific_ptr.

Also we likely need to lock mutex here (in both cases with protocol and
push_warning).
*/
protocol->prepare_for_resend();
protocol->store(name, length, system_charset_info);
protocol->store(param->op_name, system_charset_info);
protocol->store(msg_type, system_charset_info);
protocol->store(msgbuf, msg_length, system_charset_info);
if (protocol->write())
sql_print_error(“Failed on my_net_write, writing to stderr instead: %s\n”,
msgbuf);
return;

Hopefully this will serve as a good TODO list to go and fix at some point.

The Drizzle Snowman – PlanetMySQL fail

If you went “wtf” at The Drizzle Snowman – WIN! – Jay Pipes on PlanetMySQL suddenly ending at “create table”, you should click through and see the unicode character for a snowman.

In other fun, we’ve also created tables with the name of cloud symbol, umbrella symbol and umbrella with rain drops symbol. All of these seem rather appropriate for Drizzle.

What VERSION in INFORMATION_SCHEMA.TABLES means (hint: not what you think)

It’s the FRM file format version number.

It’s not the version of the table as one might expect (i.e. after CREATE it’s 1. Then, if you ALTER, it’s 2. Alter again 3 etc).

In Drizzle, we now return 0.

In future, I plan that Drizzle will allow the engine to say what version it is (where 0 is “dunno”).

This’ll be a good step towards being able to cope with multiple versions of a table in use at once (and making sense of this to the user).

drop table fail (on the road to removing the FRM)

So… in removing the FRM file in Drizzle, I found a bit of a nugget on how drop table works (currently in the MySQL server and now “did” in Drizzle).

If you DROP TABLE t1; this is what happens

  • open the .frm file
  • read first 10bytes (oh, and if you get EIO there, in a SELECT * FROM INFORMATION_SCHEMA.TABLES you’ll get “Error” instead of “Base Table”)
  • if (header[0] != (unsigned char) 254 || header[1] != 1 ||
    (header[2] != FRM_VER && header[2] != FRM_VER+1 &&
    (header[2] < FRM_VER+3 || header[2] > FRM_VER+4)))
    return true;
    Which means that you probably (well, should have) set your enum legacy_db_type to DB_TYPE_UNKNOWN in the caller of bool mysql_frm_type(Session *, char *path, enum legacy_db_type *dbt) otherwise you end up in some form of pain.
  • Else, *dbt= (enum legacy_db_type) (uint) *(header + 3);
    return true;                   // Is probably a .frm table

I do like the “probably”.

Oh, and on a “storage engine api” front, some places seem to expect handler::delete_table(const char* name) to return ENOENT on table not existing. In reality however:

  • int ha_heap::delete_table(const char *name)
    {
    -  int error= heap_delete_table(name);
    -  return error == ENOENT ? 0 : error;
    +  return heap_delete_table(name);
    }
  • InnoDB (note the behaviour of returning DB_TABLE_NOT_FOUND… which isn’t ENOENT)
    err = DB_TABLE_NOT_FOUND;
    ut_print_timestamp(stderr);

    fputs(“  InnoDB: Error: table “, stderr);
    ut_print_name(stderr, trx, TRUE, name);
    fputs(” does not exist in the InnoDB internaln”
    “InnoDB: data dictionary though MySQL is”
    ” trying to drop it.n”
    “InnoDB: Have you copied the .frm file”
    ” of the table to then”
    “InnoDB: MySQL database directory”
    ” from another database?n”
    “InnoDB: You can look for further help fromn”
    “InnoDB: http://dev.mysql.com/doc/refman/5.1/en/”
    “innodb-troubleshooting.htmln”,
    stderr);

  • and MyISAM would generate the error message itself, but that’s fixed with:
    -  if (my_delete_with_symlink(from, MYF(MY_WME)))
    +  if (my_delete_with_symlink(from, 0))
    return(my_errno);

and just to add to the fun, elsewhere in the code, a access(2) call on the frm file name is used to determine if the table exists or not.

The road to removing the FRM has all sorts of these weird-ass things along it. Kinda nice to be able to replace this with something better (and, hopefully – good).

But let me sum up with sql_table.cc:

“This code is wrong and will be removed, please do not copy.”

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

*cough*

(and then wipe coffee off the computer)

of course the real aim should be to scale with one instance on the machine as scaling with multiple instances on the one machine isn’t scaling at all – it’s scale out, but with more problems (now when one machine goes down, so do 1110202434 database instances).

phrase from nearest book

from elliot: phrase from nearest book

  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

My result:

“A lot of UN staff call themselves human rights experts but need a shoulder to cry on each time there’s a killing”

– Emergency Sex (and other desperate measures)
Kenneth Cain, Heidi Postlewait and Andrew Thomson

Technology predictions

In 2 years (ish):

  • the majority of consumer bought machines (which will be laptops) will have SSD and not rotational media
  • At the same time, servers with larger storage requirements will use disk as we once used tape.
  • At least one Linux distributoin will be shipping with btrfs as default
  • OpenSolaris will be looking interesting and not annoying to try out (a lot more “just work” and easy to get going).
  • Unless Sun puts ZFS under a GPL compatible license so it can make it into the Linux kernel, it will become nothing more than a Solaris oddity as other file systems will have caught up (and possibly surpassed).
  • There will be somebody developing a a MySQL compatible release based off Drizzle
  • Somebody will have ported Drizzle back to Microsoft Windows… possibly Microsoft.
  • X will still be used for graphics on Linux, although yet another project will start up to “replace X with something modern”, get a lot of press and then fail.

In 5 years:

  • Apple will single handedly control 1/3rd the mobile phone market
  • The other 2/3rds will be divided between Blackberry (small), Windows Mobile and Android.
  • Linux desktop market share will be much higher than Apple’s

That’s all for now…

libmallocfail

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occationally fail.

Aim was to use this to test bits of MySQL/Drizzle although since their libtool based stuf, the binary in tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.

I have found a bug in libc though :)

Goodbye FRM (or at least the steps to it)

Since before MySQL was MySQL, there has been the .FRM file. Of course, what it really wanted to be was “.form” -  a file that stored how to display a form on your (green) CRT. Jan blogged earlier in the year on this still being there, even in MySQL 5.1 (albeit not in any useful form).

So why do we want it to die?

Well… it’s not exactly very useful anymore.

There are a few things it’s used for….

If database/table.frm exists, the table exists (or, on Windows, you may also get databasetable.frm). This is tested in a few bits in the code by a call to access(2).

Most engines have their own data dictionaries (Innodb, PBXT, NDB, Falcon). Keeping these in sync with the FRMs can be problematic at best. This is especially true with distributed engines such as NDB.

The current solution is that on the SQL node that is creating the table, we create the FRM file, gzip it, and store it in the cluster. Then, other nodes, if they go “err… no local frm” first call ha_create_table_from_engine() which NDB will go and see if the table exists in the cluster. If so, it copies the FRM from the cluster to local disk and then the SQL server continues on its way with the standard way of opening a table (through the FRM). If you do DDL through the NDB-API (and not via SQL) then well… you get to keep both pieces.

As for if you crash during a table rename (with any engine with its own data dictionary.. e.g. InnoDB)… you again get to keep both pieces. (There is a bit of discussion on this over here)

Having FRM files also doesn’t especially lead to having multiple versions of table metadata co-existing in the server.

The fun part of reading a frm is open_binary_frm in table.cc. It reads in the frm into a TABLE_SHARE. If we only had some other way of filling out a TABLE_SHARE… one from the engine itself…

But what about any metadata that the engine data dictionary doesn’t have? For example, many server types may map to 1 engine type. An example of this is the GIS types in MySQL. For most engines, these just map to BLOBs. The engine itself has no knowledge about that, but we should fill out the table definition correctly…. so for this type of thing the engine may need to store some additional metadata. This is pretty easy for transactional engines: put it in a table! (although you then have your own problem about keeping this synchronised with any DDL). For engines that don’t have their own data dictionary, we can just provide a set of routines to store/read a frm type file (based on protobufs no doubt).

There also seems to be some entanglement with LOCK_open. Ahhh LOCK_open, the lock that nobody can possibly understand.

The tricky thing will be not rewriting every little bit from scratch all at once but rather go for the incremental bits….

Perhaps you’re not all stupid…. although…

As we (the rest of the world) heave a huge sigh of relief at the result of the US Presidential election it’s morning here and I’m happy that Obama is the president elect.

However, it looks like Proposition 8 will get up in California. EPIC FAIL. I hope the remaining ballots tip the result in the other direction, I really do.

I don’t see how the state (where state includes country and state… i.e. governments) has any right to legislate anything to do with human relationships.

What is marriage? Beyond the personal committment between consenting adults, it sets up a few things in case one party is incapacitated or dies. So why don’t we have a set of forms for this information?

In case I’m unable to make my own medical decisions, a list of people who (in order) I want to make them for me. If I’m not married, surely I still have a right to say who gets to make medical decisions for me.

On the “who gets my stuff when I die” front, a legal will pretty much covers that.

Some places seem to have tax benefits if you’re married… which to me is much like the government interfering in something that is none of its business. Surely being with a partner is its own reward, not something you do for tax incentives.

So you’re left with things like: shared property if relationship ends, right to adopt, access to fertility treatment etc

Shared property: How is this different from “two friends buy a house together, share it. they then fall out/decide to sell and there is dispute as one put more $$ into it”? It’s not, it’s just on a larger scale.

All I can say on who gets to reproduce is that it is surely a child is best raised by people who love them and can provide them with a safe, nurturing and positive environment.

Solution? Abolish Marriage.

Well… at least any legal definition thereof and everybody can live happily ever after.

Except that’s not what the issue is. It’s a debate between those who believe in personal freedom and right to live as you choose (on the proviso that it does not cause harm to others) and those who do not.

I have exactly zero time for those who think abortion should be illegal. When no child on this earth dies of hunger or easily preventative disease, I will then hear your argument. Odds are I will disagree with you, but until that day, you are just heartless.

But for the moment, we have until January 20th to see what more Bush/Cheney can do to make Americans constantly apologise for their country.

There is a great deal of hope for Obama, so I say this: don’t fuck up.

(Oh, and please don’t follow our new guys and try and erect a great internet firewall… because the club of countries that do that: Iran, Myanmar, North Korea, and Syria is one you really want to be part of).

I remain cautiously optimistic.