The Drizzle Snowman – PlanetMySQL fail

If you went “wtf” at The Drizzle Snowman – WIN! – Jay Pipes on PlanetMySQL suddenly ending at “create table”, you should click through and see the unicode character for a snowman.

In other fun, we’ve also created tables with the name of cloud symbol, umbrella symbol and umbrella with rain drops symbol. All of these seem rather appropriate for Drizzle.

What VERSION in INFORMATION_SCHEMA.TABLES means (hint: not what you think)

It’s the FRM file format version number.

It’s not the version of the table as one might expect (i.e. after CREATE it’s 1. Then, if you ALTER, it’s 2. Alter again 3 etc).

In Drizzle, we now return 0.

In future, I plan that Drizzle will allow the engine to say what version it is (where 0 is “dunno”).

This’ll be a good step towards being able to cope with multiple versions of a table in use at once (and making sense of this to the user).

drop table fail (on the road to removing the FRM)

So… in removing the FRM file in Drizzle, I found a bit of a nugget on how drop table works (currently in the MySQL server and now “did” in Drizzle).

If you DROP TABLE t1; this is what happens

  • open the .frm file
  • read first 10bytes (oh, and if you get EIO there, in a SELECT * FROM INFORMATION_SCHEMA.TABLES you’ll get “Error” instead of “Base Table”)
  • if (header[0] != (unsigned char) 254 || header[1] != 1 ||
    (header[2] != FRM_VER && header[2] != FRM_VER+1 &&
    (header[2] < FRM_VER+3 || header[2] > FRM_VER+4)))
    return true;
    Which means that you probably (well, should have) set your enum legacy_db_type to DB_TYPE_UNKNOWN in the caller of bool mysql_frm_type(Session *, char *path, enum legacy_db_type *dbt) otherwise you end up in some form of pain.
  • Else, *dbt= (enum legacy_db_type) (uint) *(header + 3);
    return true;                   // Is probably a .frm table

I do like the “probably”.

Oh, and on a “storage engine api” front, some places seem to expect handler::delete_table(const char* name) to return ENOENT on table not existing. In reality however:

  • int ha_heap::delete_table(const char *name)
    {
    -  int error= heap_delete_table(name);
    -  return error == ENOENT ? 0 : error;
    +  return heap_delete_table(name);
    }
  • InnoDB (note the behaviour of returning DB_TABLE_NOT_FOUND… which isn’t ENOENT)
    err = DB_TABLE_NOT_FOUND;
    ut_print_timestamp(stderr);

    fputs(“  InnoDB: Error: table “, stderr);
    ut_print_name(stderr, trx, TRUE, name);
    fputs(” does not exist in the InnoDB internaln”
    “InnoDB: data dictionary though MySQL is”
    ” trying to drop it.n”
    “InnoDB: Have you copied the .frm file”
    ” of the table to then”
    “InnoDB: MySQL database directory”
    ” from another database?n”
    “InnoDB: You can look for further help fromn”
    “InnoDB: http://dev.mysql.com/doc/refman/5.1/en/”
    “innodb-troubleshooting.htmln”,
    stderr);

  • and MyISAM would generate the error message itself, but that’s fixed with:
    -  if (my_delete_with_symlink(from, MYF(MY_WME)))
    +  if (my_delete_with_symlink(from, 0))
    return(my_errno);

and just to add to the fun, elsewhere in the code, a access(2) call on the frm file name is used to determine if the table exists or not.

The road to removing the FRM has all sorts of these weird-ass things along it. Kinda nice to be able to replace this with something better (and, hopefully – good).

But let me sum up with sql_table.cc:

“This code is wrong and will be removed, please do not copy.”

libmallocfail

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occationally fail.

Aim was to use this to test bits of MySQL/Drizzle although since their libtool based stuf, the binary in tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.

I have found a bug in libc though :)

Goodbye FRM (or at least the steps to it)

Since before MySQL was MySQL, there has been the .FRM file. Of course, what it really wanted to be was “.form” -  a file that stored how to display a form on your (green) CRT. Jan blogged earlier in the year on this still being there, even in MySQL 5.1 (albeit not in any useful form).

So why do we want it to die?

Well… it’s not exactly very useful anymore.

There are a few things it’s used for….

If database/table.frm exists, the table exists (or, on Windows, you may also get databasetable.frm). This is tested in a few bits in the code by a call to access(2).

Most engines have their own data dictionaries (Innodb, PBXT, NDB, Falcon). Keeping these in sync with the FRMs can be problematic at best. This is especially true with distributed engines such as NDB.

The current solution is that on the SQL node that is creating the table, we create the FRM file, gzip it, and store it in the cluster. Then, other nodes, if they go “err… no local frm” first call ha_create_table_from_engine() which NDB will go and see if the table exists in the cluster. If so, it copies the FRM from the cluster to local disk and then the SQL server continues on its way with the standard way of opening a table (through the FRM). If you do DDL through the NDB-API (and not via SQL) then well… you get to keep both pieces.

As for if you crash during a table rename (with any engine with its own data dictionary.. e.g. InnoDB)… you again get to keep both pieces. (There is a bit of discussion on this over here)

Having FRM files also doesn’t especially lead to having multiple versions of table metadata co-existing in the server.

The fun part of reading a frm is open_binary_frm in table.cc. It reads in the frm into a TABLE_SHARE. If we only had some other way of filling out a TABLE_SHARE… one from the engine itself…

But what about any metadata that the engine data dictionary doesn’t have? For example, many server types may map to 1 engine type. An example of this is the GIS types in MySQL. For most engines, these just map to BLOBs. The engine itself has no knowledge about that, but we should fill out the table definition correctly…. so for this type of thing the engine may need to store some additional metadata. This is pretty easy for transactional engines: put it in a table! (although you then have your own problem about keeping this synchronised with any DDL). For engines that don’t have their own data dictionary, we can just provide a set of routines to store/read a frm type file (based on protobufs no doubt).

There also seems to be some entanglement with LOCK_open. Ahhh LOCK_open, the lock that nobody can possibly understand.

The tricky thing will be not rewriting every little bit from scratch all at once but rather go for the incremental bits….