Storing Passwords (securely) in MySQL

Frank talks about Storing Passwords in MySQL. He does, however, miss something that’s really, really important. I’m talking about the salting of passwords.

If I want to find out what the MD5 hashes 5d41402abc4b2a76b9719d911017c592 or 015f28b9df1bdd36427dd976fb73b29d mean, the first thing I’m going to try is a dictionary attack (especially if I’ve seen a table with only user and password columns). Guess what? A list of words and their MD5 sums can be used to very quickly find out what these hashes represent.

I’ll probably have this dictionary in a MySQL database with an index as well. Try it yourself – your system dictionary almost certainly contains the words “hello” and “fire”. In fact, do this:

mysql> create table words (word varchar(100));
Query OK, 0 rows affected (0.13 sec)

mysql> load data local infile '/usr/share/dict/words' into table words;
Query OK, 98326 rows affected (0.85 sec)
Records: 98326  Deleted: 0  Skipped: 0  Warnings: 0

mysql> alter table words add column md5hash char(32);
Query OK, 98326 rows affected (0.39 sec)
Records: 98326  Duplicates: 0  Warnings: 0

mysql> update words set md5hash=md5(word);
Query OK, 98326 rows affected (3.19 sec)
Rows matched: 98326  Changed: 98326  Warnings: 0

mysql> alter table words add index md5_idx (md5hash);
Query OK, 98326 rows affected (2.86 sec)
Records: 98326  Duplicates: 0  Warnings: 0

mysql> select * from words where md5hash='5d41402abc4b2a76b9719d911017c592';
+-------+----------------------------------+
| word  | md5hash                          |
+-------+----------------------------------+
| hello | 5d41402abc4b2a76b9719d911017c592 |
+-------+----------------------------------+
1 row in set (0.11 sec)

mysql> select * from words where md5hash='015f28b9df1bdd36427dd976fb73b29d';
+------+----------------------------------+
| word | md5hash                          |
+------+----------------------------------+
| fire | 015f28b9df1bdd36427dd976fb73b29d |
+------+----------------------------------+
1 row in set (0.00 sec)
“Uh oh,” I hear you go.

Yes, this is not a good way to “secure” passwords. Oddly enough, people have known about this for a long time and there’s a really easy solution. It’s called salting.

Salting means prepending a random string (the salt) to the password before you hash and store it (and doing the same when you check it).

So, let’s look at how our new password table may look:

mysql> select * from passwords;
+------+--------+----------------------------------+
| user | salt   | md5pass                          |
+------+--------+----------------------------------+
| u1   | ntuk24 | ce6ac665c753714cb3df2aa525943a12 |
| u2   | drc,3  | 7f573abbb9e086ccc4a85d8b66731ac8 |
+------+--------+----------------------------------+
2 rows in set (0.00 sec)
As you can see, the MD5s are different from before. If we look these up in our dictionary, we won’t find a match.

mysql> select * from words where md5hash='ce6ac665c753714cb3df2aa525943a12';
Empty set (0.01 sec)

Instead, we’d have to get the salt, compute the MD5 of the salt concatenated with each dictionary word, and see if the MD5 matches. Guess what? No index for that! And with all the possible values for the salt, we’ve substantially increased the problem space needed to construct a precomputed dictionary (I won’t go into the maths here).

mysql> create view v as select word, md5(CONCAT('ntuk24',word)) as salted from words;
Query OK, 0 rows affected (0.05 sec)

mysql> select * from v where salted='ce6ac665c753714cb3df2aa525943a12';
+-------+----------------------------------+
| word  | salted                           |
+-------+----------------------------------+
| hello | ce6ac665c753714cb3df2aa525943a12 |
+-------+----------------------------------+
1 row in set (2.04 sec)

mysql> create or replace view v as select word, md5(CONCAT('drc,3',word)) as salted from words;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from v where salted='7f573abbb9e086ccc4a85d8b66731ac8';
+------+----------------------------------+
| word | salted                           |
+------+----------------------------------+
| fire | 7f573abbb9e086ccc4a85d8b66731ac8 |
+------+----------------------------------+
1 row in set (2.12 sec)

So we’ve gone from essentially instantaneous retrieval to taking about 2 seconds per lookup. Even if I assume that one of your users is going to be stupid enough to have a dictionary password, it’s going to take me 2 seconds to check each user – as the salt is different for each user! So it could take me hours just to find that user. Think about how many users are in your user table – with 1000 users, that’s over half an hour. For larger systems, it’s going to be hours.
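For the curious, the application-side logic is simple too. Here’s a minimal sketch in C++ using OpenSSL’s MD5() – purely illustrative, the function names and table layout are just assumptions to match the example above, and in new code you’d pick a proper password-hashing scheme rather than raw MD5:

// Sketch only: compute and check a salted MD5 like the passwords table above.
// Assumes OpenSSL is available (link with -lcrypto).
#include <openssl/md5.h>
#include <cstdio>
#include <string>

// md5(salt + password) as a lowercase hex string, i.e. what goes in md5pass.
std::string salted_md5(const std::string &salt, const std::string &password)
{
  std::string input= salt + password;            // prepend the salt
  unsigned char digest[MD5_DIGEST_LENGTH];
  MD5(reinterpret_cast<const unsigned char*>(input.data()), input.size(), digest);

  char hex[2*MD5_DIGEST_LENGTH + 1];
  for (int i= 0; i < MD5_DIGEST_LENGTH; i++)
    snprintf(hex + 2*i, 3, "%02x", digest[i]);
  return std::string(hex, 2*MD5_DIGEST_LENGTH);
}

// On login: fetch the user's salt and stored hash, then compare.
bool check_password(const std::string &salt, const std::string &stored,
                    const std::string &attempt)
{
  return salted_md5(salt, attempt) == stored;
}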

Welcome to Beijing (day 1)

I’ve just come back from lunch. I’ve managed to eat Chinese food, in China, with chopsticks, and not totally embarrass myself. I ate some new food, new vegetables and a seemingly different type of seaweed from any I have eaten before. It tasted good though. I even think Kit would have liked some of it (once she got over the fact that it looked different and some of it was green).
I arrived safely after a flight that was fine (except for getting up rather early to get to Sydney so I could then take a sanely-timed flight). Beijing seems to be a bit like the Firefly world, except with fewer flying cars. You’ve got heaps of stuff in English and Chinese. It could be really interesting to live here and experience things.

There’s a national English language newspaper which is fairly up to date on world events – the fact that our dear Mr Howard is going to go to the election seems to be news here! It’s not packed with local news, which would be interesting to read (although I think I’ll have to learn to read first).

The hotel is a short walk from the office (down the street, across the road). Oh, the roads are at least 7 lanes – they’re big!

Hotel is pretty nice, probably about half the price of what I’d expect to pay back home. Breakfast was good – some totally delicious watermelon. Honestly thinking of just having watermelon for breakfast tomorrow :)

It’s rather obvious, though, that the hotel is aimed at western visitors. At breakfast you could only really tell you’re in China by looking out the front window at all the Chinese writing, or by noticing that the waiters and waitresses all a) spoke Chinese to each other and b) were Chinese. About five languages before my first coffee – what a way to start the day!

At some point I’m going to have to have some Chinese tea – it seems like a really obvious must-do. Although maybe I should give in at some point and buy coffee from Starbucks as well…

Heroes in Tyrol

23rd Mostra – “Heroes in Tyrol”, by Niki List (Austria-Sweden-Germany)

I managed to see most of this film a few years ago. Anybody know where or how I can get a DVD of it (with English subtitles)? I know somebody in the wider community has to know where (hence why I’ll put this entry in the MySQL category – I know somebody there has to know something about this film).

Besides – it has drinking songs, and MySQLers will get the connection.

MySQL Bug Deskbar plugin

Over at my junkcode section, I have mysqlbug.py, which is a plugin for the GNOME Deskbar panel applet.

If you’ve used Quicksilver on Mac OS X, then you know the kind of app that Deskbar Applet is.

This one lets you type “bug 1234” and be given the action of “open mysql bug 1234”. If you type “edit bug 1234” it gives you the option of editing that bug number.

We’ll see if this proves useful.

Many thanks to kamstrup (one of the Deskbar developers) on #deskbar on gimpnet for helping me out with the plugin.

I totally heart Deskbar. It’s awesome.

Arjen’s MySQL Community Journal – HyperThreading? Not on a MySQL server…


I blame the Linux Process Scheduler. At least it’s better than the earlier 2.6 days where things would get shunted a lot from one “cpu” to the other “cpu” for no real reason.

Newer kernel versions are probably better… but don’t even think of HT and pre-2.6 – that would be funny.

A Followup on: a bug on failure failure

Ramblings » Blog Archive » a bug on failure failure

I blogged about MySQL Bug 17928 a little while ago. Well, I’ve submitted a patch that fixes the problem. I wrote a rather detailed explanation in the changeset comments (and I encourage everybody who commits code anywhere to do the same). You can see the patch over on the commits list (or here if you don’t want to sift through the archives).

In theory it shouldn’t be hard to enable multiple simultaneous backups for Cluster. How useful this would be is very debatable – arguably of little use at all (it’s a REDO log backup). Of course, testing for this (as we test Node Failure) would start to get horrific. Any good arguments one way or the other are welcome.

Matt on Ruby resource management (and why you can do it elsewhere)

ERROR: The requested URL could not be retrieved (yeah, the site didn’t work when I clicked on it from RSS).

Matt uses this bit of Ruby code to demonstrate that you can never forget to close the file handle:

File.open('something.txt') do |fd|
  # Manipulate the file through 'fd'
end
# File handle is now closed

Which seems pretty cool. However, a good C++ programmer can also achieve the same (or better) things!

For example, in NDB (well, in the portability library we use inside NDB) we have a class called Guard. The constructor for Guard calls pthread_mutex_lock() on a mutex; the destructor unlocks it. So, when you need some simple mutual exclusion, you can have bits of code like this:

{
  Guard g(m_config_mutex);

  if (m_config.foo < 10)
    return -1;

  // other_stuff
}

Guess what? Those nice little error-checking bits that just return -1 look a lot cleaner, and you can never forget to unlock the mutex! You’ll see code like this in the management server (ndb_mgmd – source in storage/ndb/src/mgmsrvr/ mostly in MgmtSrvr.cpp).

In fact, you can do this with files as well (multiple ones) and have them all closed on exit from the block. It’s all a matter of having (or writing) some good classes. I’m no Ruby expert – but I think you’d have to have more indentation to do that with two files?
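Something along these lines – a from-memory sketch, not the actual NDB Guard code (and FileGuard is entirely made up for illustration):

#include <pthread.h>
#include <cstdio>

// RAII mutex guard: lock in the constructor, unlock in the destructor.
class Guard {
public:
  Guard(pthread_mutex_t &m) : m_mutex(m) { pthread_mutex_lock(&m_mutex); }
  ~Guard() { pthread_mutex_unlock(&m_mutex); }
private:
  pthread_mutex_t &m_mutex;
};

// The same trick for files: closed automatically when the object goes away.
class FileGuard {
public:
  FileGuard(const char *name, const char *mode) : m_f(fopen(name, mode)) {}
  ~FileGuard() { if (m_f) fclose(m_f); }
  FILE *get() const { return m_f; }
private:
  FILE *m_f;
};

// Two files, no extra indentation: both are closed on any exit from the block.
int copy_first_line()
{
  FileGuard in("something.txt", "r");
  FileGuard out("copy.txt", "w");
  if (!in.get() || !out.get())
    return -1;                       // early return, still no leak
  char line[256];
  if (fgets(line, sizeof(line), in.get()))
    fputs(line, out.get());
  return 0;
}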

a bug on failure failure

I’ve been working on BUG#17928, which is all about “testBackup fails in error handling testcases”. It appeared after we merged some work into the 5.1 tree (the 5.0 tree is okay) that changes the way online backups are done in NDB, to better support recovery in the event of various types of failure at various points in the process.

Anyway, not all systems are affected by this bug… I’m at least reproducing some of the failures on my laptop and have spent the past while in the depths of the BACKUP and NDBFS blocks trying to work out what’s going on and why we’re hitting this assert.

NDBFS is an interesting block, as it handles the file system interaction for NDB – so we’re doing things that could take an arbitrary amount of time. We don’t like waiting for those sorts of things in the cluster, so we go on and do other work.

For backup, we buffer up writes and send FSAPPENDREQ (File System Append Request) to NDBFS only when we have a reasonable amount of data to send. For example, for very small rows being put into the log file, there’s no use making a write() call for each row – batch them up into big chunks!
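The underlying idea is plain old write batching. A toy illustration of the concept (nothing like the real BACKUP/NDBFS code):

#include <cstring>
#include <unistd.h>

// Toy write batcher: append small records into a buffer and only issue a
// write() once enough data has accumulated. Purely illustrative; assumes
// each record is smaller than the buffer and ignores write() errors.
class BatchedWriter {
public:
  BatchedWriter(int fd, size_t threshold) : m_fd(fd), m_threshold(threshold), m_used(0) {}
  ~BatchedWriter() { flush(); }

  void append(const void *data, size_t len)
  {
    if (m_used + len > sizeof(m_buf))
      flush();                          // make room before copying in
    memcpy(m_buf + m_used, data, len);
    m_used+= len;
    if (m_used >= m_threshold)
      flush();                          // enough data: push it to the OS
  }

  void flush()
  {
    if (m_used)
      (void) write(m_fd, m_buf, m_used); // one syscall for many small rows
    m_used= 0;
  }

private:
  int m_fd;
  size_t m_threshold;
  size_t m_used;
  char m_buf[64 * 1024];
};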

Back to the bug; I’ve got a theory[1], half a patch and some work to do tomorrow.

[1] excluding gratuitous Buffy quote

Call for Comments on MySQL Online Backup API – Jay Pipes


It’s been interesting watching the ideas develop for online, consistent Backup for MySQL.

I should expand that… consistent across storage engines. Other RDBMS vendors get it easy – they just have one storage engine to back up. We have several – and people want them done consistently (especially when you have multiple transactional ones).

For Cluster (NDB) there are added complications, because you then want a “cross storage engine consistent backup”, and there’s synchronisation needed down inside the cluster to make that happen. Greg and I have been discussing possible ways we could make this work. Not sure if it’ll be in the first version though :)

This is really interesting because AFAIK this is the first time that anybody has tried to solve this particular problem.

So go check it out and see what you think of our ideas.

the special level of hell

“reserved for child molesters and people who talk at the theatre”

(also Malcolm Reynolds if he takes sexual advantage of “Mrs. Reynolds”[1])

I’d like to add to that people who don’t merge their patches.

Special hell.

[1] If you don’t get the Firefly reference you haven’t watched Firefly enough. Go do that. Now. No, you can’t do work first. Firefly, now!

It’s also good to note that this is mostly tongue in cheek. Although: MERGE YOUR FRIGGEN PATCHES BEFORE PUSHING. Every time you don’t, a cute bunny and a baby die.

DaveM on Ingo’s SMP lock validator

DaveM talks about Ingo’s new SMP lock validator for the Linux kernel.

A note reminding me to go take a look and see what can be ripped out and placed into various bits of MySQL and NDB. Ideally, of course, it could be turned into an LD_PRELOAD library for pthread mutexes.

Anybody who wants to look deeper into it before I wake up again is welcome to do so (and tell me what they find).

ha_file

In what I laughingly call “spare time” I started hacking on ha_file.cc, otherwise known as the FILE storage engine. My idea is relatively simple: I want to be able to store and access my photos from MySQL. I also want the storage to be relatively efficient, with the raw image files on disk and not tied up too much in any different format (my file system is pretty good at storing multi-megabyte files, thank you very much) – it also means nothing fancy is required to re-use space when I delete things. I should also be able to (efficiently) serve the images directly out of a web server (satisfying the efficiency itch). You could also use something like DMF to migrate old rows off to tape.
So, I started some hacking and some designing, and now have a working design and a nearly-working basic read/write implementation. I’ll share the code when it does, in fact, actually work (by “work” I mean reads and writes basic rows).
I’ve decided to go for the approach of storing columns in extended attributes. Why columns? ’cause then you can access them either from the command line or programmatically through another interface. It also adds an extra layer of evil. With XFS and sufficiently large inodes, these should all fit in the inode anyway. ext3 also has some nice optimisations that should help with performance too.

For blob data, I plan to just store that in the file. In my table for photos example, you could then just run a image browser (e.g. gthumb) on the data directory for the table and see your images. It also means that recovery programs (see my jpeg_recover.c) will work as well.

Knowing the primary key of the row (which I plan to use as the file name for the row) then allows us to generate URLs that could be directly served by a lightweight http server, avoiding all that database code when you’re just serving up an image to a client.
Symbolic links can be used to implement indexes.

We can write new rows to a temp directory, sync them, then move them into place. Zero time crash recovery. Index consistency can be handled at runtime with a small extra check.
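A rough sketch of that write path (Linux-specific, and purely hypothetical – the function and attribute names here are made up, not from the actual ha_file code):

#include <sys/xattr.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Hypothetical write path for one row: the blob (image) goes in the file
// itself, another column goes in an extended attribute, and the file is only
// renamed into place after fsync() so a crash never exposes a half-written row.
int write_row_file(const char *tmp_path, const char *final_path,
                   const void *blob, size_t blob_len,
                   const char *title /* an imaginary non-blob column */)
{
  int fd= open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;

  write(fd, blob, blob_len);                              // the image itself
  fsetxattr(fd, "user.title", title, strlen(title), 0);   // a column as an xattr
  fsync(fd);                                              // data + xattrs on disk
  close(fd);

  return rename(tmp_path, final_path);                    // atomic move into place
}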

At some point I should write down how I plan to do isolation levels too, but that’s for another day.

I at least hope that the resulting code may be a useful example for people wanting to implement a storage engine.

A simple implementation should be fairly fast too (with a slightly tuned file system).

I heart valgrind (or: an early patch integrating the MySQL MEM_ROOT stuff with valgrind)

Everybody knows that valgrind is great.

Well, I was observing a problem in some MySQL code: it looked like we were writing over some memory that we weren’t meant to be (as the structure hadn’t been initialised yet). But, seeing as this was memory that had been allocated off a MEM_ROOT (one of our memory allocators), valgrind wasn’t going to spit out anything.

This is because this bit of memory had already been allocated and subsequently “freed”, but then reallocated. The “free”ing overwrites the memory with garbage (which is what the MEM_ROOT code does) so that you should see crashes (and a pattern) when you do something bad.

The traditional way to troubleshoot this is to modify your memory allocator so that it just calls malloc() and free() all the time (and valgrind will trap them). We have some code in there to do that too. However, this requires changing some ifdefs and is probably not as efficient.

Valgrind has some macros you can insert into your memory allocator code that tell valgrind that you have “freed” this memory (VALGRIND_MAKE_NOACCESS) or have allocated it (VALGRIND_MAKE_WRITABLE) or have written valid data to it (VALGRIND_MAKE_READABLE). These are semi-documented in the valgrind/valgrind.h file.

These are designed to only add a few CPU instructions to your code, so it should be possible to always have them in your build (you can stop them from doing anything by building with -DNVALGRIND, IIRC).

(I haven’t done any benchmarks on the code to see if there is any impact though).

Valgrind also has a great (largely undocumented) feature of being able to integrate with memory pools. Since our MEM_ROOT is largely just this, we can get some added benefits here too (one of which should be better valgrind warnings when we do some bad stuff).
It lets you associate memory with a memory pool, and then just say “this pool has been freed”. This saves you having to keep track of each pointer in the code to pass to “free”. It can also give you valgrind warnings when you try to allocate memory from something that hasn’t been initialised as a memory pool.
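To give an idea of what that looks like, here’s a toy arena allocator annotated with the mempool client requests – a sketch only, nothing to do with the actual MEM_ROOT code; the macro names are the older ones used in the patch below (newer valgrind spells them VALGRIND_MAKE_MEM_NOACCESS and so on):

#include <valgrind/valgrind.h>
#include <valgrind/memcheck.h>
#include <stdlib.h>

/* Toy arena allocator with valgrind mempool annotations (sketch only). */
struct Arena {
  char  *base;
  size_t used, size;
};

void arena_init(Arena *a, size_t size)
{
  a->base= (char*) malloc(size);
  a->used= 0;
  a->size= size;
  VALGRIND_CREATE_MEMPOOL(a, 0, 0);        /* 'a' is the pool handle */
  VALGRIND_MAKE_NOACCESS(a->base, size);   /* nothing handed out yet */
}

void *arena_alloc(Arena *a, size_t n)
{
  if (a->used + n > a->size)
    return NULL;
  void *p= a->base + a->used;
  a->used+= n;
  VALGRIND_MEMPOOL_ALLOC(a, p, n);         /* this chunk is now live */
  return p;
}

void arena_reset(Arena *a)
{
  a->used= 0;
  VALGRIND_DESTROY_MEMPOOL(a);             /* everything "freed" at once */
  VALGRIND_CREATE_MEMPOOL(a, 0, 0);
  VALGRIND_MAKE_NOACCESS(a->base, a->size);
}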

The most interesting part of writing the patch was finding some false positive warnings. Namely, a trick used in a couple of places (I see two) in the code is to create a temporary memory root on the stack, allocate a larger block of memory and then “swap” the memory roots to be based in this block of memory. I had to write a swap_root function to implement this, as valgrind doesn’t export a “swap memory pool” function. It would be a useful addition – maybe I’ll go and suggest it to the developers.

Anyway, I got over that hurdle and now have this patch, which seems to work pretty well. I still get a couple of (possible) false positives. We’ll see if this finds any neat bugs. Also, a good exercise would be to see how many extra instructions are really generated and whether this has any effect on performance at all.

===== include/my_sys.h 1.196 vs edited =====
--- 1.196/include/my_sys.h 2006-05-22 20:04:34 +10:00
+++ edited/include/my_sys.h 2006-05-26 16:22:11 +10:00
@@ -804,6 +804,7 @@
extern void set_prealloc_root(MEM_ROOT *root, char *ptr);
extern void reset_root_defaults(MEM_ROOT *mem_root, uint block_size,
uint prealloc_size);
+extern void swap_root(MEM_ROOT* new_root, MEM_ROOT* old);
extern char *strdup_root(MEM_ROOT *root,const char *str);
extern char *strmake_root(MEM_ROOT *root,const char *str,uint len);
extern char *memdup_root(MEM_ROOT *root,const char *str,uint len);
===== mysys/my_alloc.c 1.33 vs edited =====
--- 1.33/mysys/my_alloc.c 2005-11-24 07:44:54 +11:00
+++ edited/mysys/my_alloc.c 2006-05-26 19:21:12 +10:00
@@ -22,6 +22,8 @@
#undef EXTRA_DEBUG
#define EXTRA_DEBUG
+#include "valgrind/valgrind.h"
+#include "valgrind/memcheck.h"

/*
Initialize memory root
@@ -66,9 +68,12 @@
mem_root->free->size= pre_alloc_size+ALIGN_SIZE(sizeof(USED_MEM));
mem_root->free->left= pre_alloc_size;
mem_root->free->next= 0;
+ VALGRIND_MAKE_NOACCESS(mem_root->free+ALIGN_SIZE(sizeof(USED_MEM)),
+ pre_alloc_size);
}
}
#endif
+ VALGRIND_CREATE_MEMPOOL(mem_root,0,0);
DBUG_VOID_RETURN;
}

@@ -217,6 +222,9 @@
mem_root->first_block_usage= 0;
}
DBUG_PRINT("exit",("ptr: 0x%lx", (ulong) point));
+// fprintf(stderr,"root: %lx point: %lx size:%lx\n",mem_root,point,Size);
+ VALGRIND_MEMPOOL_ALLOC(mem_root,point,Size);
+ VALGRIND_MAKE_WRITABLE(point,Size);
DBUG_RETURN(point);
#endif
}
@@ -286,7 +294,8 @@
for (next= root->free; next; next= *(last= &next->next))
{
next->left= next->size - ALIGN_SIZE(sizeof(USED_MEM));
- TRASH_MEM(next);
+ VALGRIND_MAKE_NOACCESS(next+ALIGN_SIZE(sizeof(USED_MEM)),next->left);
+// TRASH_MEM(next);
}

/* Combine the free and the used list */
@@ -296,7 +305,8 @@
for (; next; next= next->next)
{
next->left= next->size - ALIGN_SIZE(sizeof(USED_MEM));
- TRASH_MEM(next);
+ VALGRIND_MAKE_NOACCESS(next+ALIGN_SIZE(sizeof(USED_MEM)),next->left);
+// TRASH_MEM(next);
}

/* Now everything is set; Indicate that nothing is used anymore */
@@ -357,12 +367,55 @@
{
root->free=root->pre_alloc;
root->free->left=root->pre_alloc->size-ALIGN_SIZE(sizeof(USED_MEM));
- TRASH_MEM(root->pre_alloc);
+ //TRASH_MEM(root->pre_alloc);
root->free->next=0;
}
root->block_num= 4;
root->first_block_usage= 0;
+ VALGRIND_DESTROY_MEMPOOL(root);
+ VALGRIND_CREATE_MEMPOOL(root,0,0);
+ VALGRIND_MAKE_READABLE(root,sizeof(MEM_ROOT));
+ if(root->pre_alloc)
+ {
+ VALGRIND_MAKE_READABLE(root->pre_alloc, ALIGN_SIZE(sizeof(USED_MEM)));
+ VALGRIND_MEMPOOL_ALLOC(root,root->pre_alloc,root->pre_alloc->size);
+ VALGRIND_MAKE_READABLE(root->pre_alloc, ALIGN_SIZE(sizeof(USED_MEM)));
+ }
DBUG_VOID_RETURN;
+}
+
+void swap_root(MEM_ROOT* new_root, MEM_ROOT* old)
+{
+ memcpy((char*) new_root, (char*) old, sizeof(MEM_ROOT));
+ VALGRIND_DESTROY_MEMPOOL(old);
+ VALGRIND_CREATE_MEMPOOL(new_root,0,0);
+
+ reg1 USED_MEM *next;
+
+ VALGRIND_MEMPOOL_ALLOC(new_root,new_root,sizeof(MEM_ROOT));
+ VALGRIND_MAKE_READABLE(new_root,sizeof(MEM_ROOT));
+
+ /* iterate through (partially) free blocks */
+ next= new_root->free;
+ do
+ {
+ if(!next)
+ break;
+ VALGRIND_MEMPOOL_ALLOC(new_root,next,next->size-next->left);
+ VALGRIND_MAKE_READABLE(next,next->size-next->left);
+ next= next->next;
+ } while(1);
+
+ /* now go through the used blocks and mark them free */
+ next= new_root->used;
+ do
+ {
+ if(!next)
+ break;
+ VALGRIND_MEMPOOL_ALLOC(new_root,next,next->size-next->left);
+ VALGRIND_MAKE_READABLE(next,next->size-next->left);
+ next= next->next;
+ } while(1);
}

/*
===== sql/table.cc 1.215 vs edited =====
--- 1.215/sql/table.cc 2006-05-23 05:54:55 +10:00
+++ edited/sql/table.cc 2006-05-26 18:12:21 +10:00
@@ -150,7 +150,8 @@

#endif

- memcpy((char*) &share->mem_root, (char*) &mem_root, sizeof(mem_root));
+// memcpy((char*) &share->mem_root, (char*) &mem_root, sizeof(mem_root));
+ swap_root(&share->mem_root,&mem_root);
pthread_mutex_init(&share->mutex, MY_MUTEX_INIT_FAST);
pthread_cond_init(&share->cond, NULL);
}
@@ -252,7 +253,7 @@
hash_free(&share->name_hash);

/* We must copy mem_root from share because share is allocated through it */
- memcpy((char*) &mem_root, (char*) &share->mem_root, sizeof(mem_root));
+ swap_root(&mem_root,&share->mem_root);//memcpy((char*) &mem_root, (char*) &share->mem_root, sizeof(mem_root));
free_root(&mem_root, MYF(0)); // Free's share
DBUG_VOID_RETURN;
}
===== storage/ndb/src/kernel/blocks/dbdict/Dbdict.cpp 1.87 vs edited =====
--- 1.87/storage/ndb/src/kernel/blocks/dbdict/Dbdict.cpp 2006-04-25 22:02:07 +10:00
+++ edited/storage/ndb/src/kernel/blocks/dbdict/Dbdict.cpp 2006-05-26 12:15:43 +10:00
@@ -3148,9 +3148,23 @@

CreateTableRecordPtr createTabPtr;
ndbrequire(c_opCreateTable.find(createTabPtr, callbackData));

- //@todo check error
- ndbrequire(createTabPtr.p->m_errorCode == 0);
+
+ if(createTabPtr.p->m_errorCode != 0)
+ {
+ char buf[255];
+ TableRecordPtr tabPtr;
+ c_tableRecordPool.getPtr(tabPtr, createTabPtr.p->m_tablePtrI);
+
+ BaseString::snprintf(buf, sizeof(buf),
+ "Unable to restart, fail while creating table %d"
+ " error: %d. Most likely change of configuration",
+ tabPtr.p->tableId,
+ createTabPtr.p->m_errorCode);
+ progError(__LINE__,
+ NDBD_EXIT_INVALID_CONFIG,
+ buf);
+ ndbrequire(createTabPtr.p->m_errorCode == 0);
+ }

Callback callback;
callback.m_callbackData = callbackData;

Upgrade to OpenOffice.org 2.0.2 and stop murderous urges

It’s no great secret that I think the stability of OpenOffice.org 2 Impress as shipped in Ubuntu Breezy leaves a lot to be desired. By ‘a lot’ I mean copy and paste is unreliable and the Slide Sorter just stopped working for me without crashing (in at least one document).

However, I took the plunge and did something I usually don’t like doing – installing non-official debs.

deb http://people.ubuntu.com/~doko/ubuntu/ breezy-updates/
deb-src http://people.ubuntu.com/~doko/ubuntu/ breezy-updates/
I am now a much happier camper.

Saving is still amazingly slow, but the lack of crashes has made my week.

doko is my hero for the week. A Tip Of The Hat for him.

Kristian on “How to blog for a planet”

How to blog for a planet – MySQL-dump

I have to say I disagree with the whole teaser/article body thing. I really don’t like having RSS feeds that don’t contain the full article. It means I can’t read them offline. I often like to catch up on RSS while offline. I also don’t particularly feel the need to have to make yet another click to view the content of an article.

Yes, it’s a little more bandwidth. But really, it’s cheap. Especially with mod_gzip and whatever else optimised foo we can do.

Maybe planet aggregators could get more clever in summarising entries? Or not. How many people actually read a planet from the web site anyway?

An Apple article on MySQL on Mac OS X

MySQL on Mac OS X: An Ideal Development Combination

They got one bit a bit unclear. They say “In fact, the development team at MySQL AB uses the Mac platform for developing the MySQL server software itself.” This is misleading at best, if not downright wrong.

Yes, some people do use Mac OS X. But some also use Microsoft Windows, some FreeBSD, and a lot use Linux (various flavours – mine’s Ubuntu). The way their sentence reads is that we only use the Mac platform. This is wrong. They even quote Brian later on as saying that “A significant number of the developers inside MySQL AB use MacOS X as one of their development platforms.” So they can’t be ignorant of the fact that the Mac is just another platform.
They then go on to, again, mislead at best: “the MySQL database was originally an open-source project, but is now owned by a commercial enterprise”. WHAT??? Oh, if you read the next sentence (and disregard this one) you find out you can get both commercial and free, open source licenses.

Apart from that, it reads like marketing. Good for us though – more exposure of MySQL to OS X people is a good thing.

Oh, and I am pointing this out to Apple too. I’m not some asshole who just whines on his blog :)

UPDATE: Brian mentions in the comments of this entry that Apple is taking the feedback seriously and has contacted him about it. So, I’m quite impressed. In fact, kudos to Apple. Anybody who actually takes notice of comments submitted on their web site is doing pretty well. I also have the feeling that this entry perhaps came across a little strong… so go back and interpret it as “hey, maybe people could read the article and get the wrong idea”.

How auto_increment is implemented in NDB

I was writing this in an email to a co-worker today; it could possibly interest people in the outside world as well. It’s a good idea to look at the source at the same time as reading this :)

In ha_ndbcluster::write_row(byte*),

if (table_share->primary_key != MAX_KEY)
{
  /*
   * Increase any auto_incremented primary key
   */
  if (has_auto_increment)
  {
    THD *thd= table->in_use;

    m_skip_auto_increment= FALSE;
    update_auto_increment();
    /* Ensure that handler is always called for auto_increment values */
    thd->next_insert_id= 0;
    m_skip_auto_increment= !auto_increment_column_changed;
  }
}

We set next_insert_id to 0 so that in handler::update_auto_increment() we end up calling the handler and never doing it just inside the server.

The handler function that we end up in is: ha_ndbcluster::get_auto_increment().

From here we end up inside NDB to do the actual work (not in the table handler).

Looking inside storage/ndb/src/ndbapi/Ndb.cpp at the method:

Ndb::getAutoIncrementValue(NdbDictionary::Table*,Uint32)

which really just calls Ndb::getTupleIdFromNdb(Uint32,Uint32),
which either returns a cached value, or goes off and makes a call to NDB to get either one auto_increment value or the full cacheSize we’ve requested (which is worked out in ha_ndbcluster::get_auto_increment()). This increment is done in the interestingly named Ndb::opTupleIdOnNdb(Uint32 aTableId, Uint64 opValue, Uint32 op) (with op=0).

This increments an entry in the SYSTAB_0 table inside the sys database in NDB. The row with SYSKEY_0 equal to the table id keeps the auto_increment value. You can watch this by using a tool such as ndb_select_all on that table (grepping for the table id, which you can find with ndb_show_tables) while inserting rows into a table with an auto_increment column.