CREATE, INSERT, SELECT, DROP benchmark

Inspired by PeterZ’s Opening Tables scalability post, I decided to try a little benchmark. This benchmark involved the following:

  • Create 50,000 tables
  • CREATE TABLE t{$i} (i int primary key)
  • Insert one row into each table
  • select * from each table
  • drop each table
  • I wanted to test file system impact on this benchmark. So, I created a new LVM volume, 10GB in size. I extracted a ‘make bin-dist’ of a recent MySQL 5.1 tree, did a “mysql-test-run.pl –start-and-exit” and ran my script, timing real time with time.

    For a default ext3 file system creating MyISAM tables, the test took 15min 8sec.

    For a default xfs file sytem creating MyISAM tables, the test took 7min 20sec.

    For an XFS file system with a 100MB Version 2 log creating MyISAM tables, the test took 7min 32sec – which is within repeatability of the default XFS file system. So log size and version made no real difference.

    For a default reiserfs (v3) file system creating MyISAM tables, the test took 9m 44sec.

    For a ext3 file system with the dir_index option enabled creating MyISAM tables, the test took 14min 21sec.

    For an approximate measure of the CREATE performance…. ext3 and reiserfs averaged about 100 tables/second (although after the 20,000 mark, reiserfs seemed to speed up a little). XFS  averaged about 333 tables/second. I credit this to the check for if the files exist being performed by a b-tree lookup in XFS once the directory reached a certain size.

    Interestingly, DROPPING the tables was amazingly fast on ext3 – about 2500/sec. XFS about 1000/sec. So ext3 can destroy easier than it can create while XFS keeps up to speed with itself.

    What about InnoDB tables? Well…

    ext3(default): 21m 11s

    xfs(default): 12m 48s

    ext3(dir_index): 21m 11s

    Interestingly the create rate for XFS was around 500 tables/second – half that of MyISAM tables.

    These are interesting results for those who use a lot of temporary tables or do lots of create/drop tables as part of daily life.

    All tests performed on a Western Digital 250GB 7200rpm drive in a 2.8Ghz 800Mhz FSB P4 with  2GB memory running Ubuntu 6.10 with HT enabled.

    At the end of the test, the ibdata1 file had grown to a little over 800MB – still enough to fit in memory. If we increased this to maybe 200,000 tables (presumably about a 3.2GB file) that wouldn’t fit in cache, then the extents of XFS would probably make it perform better when doing INSERT and SELECT queries as opposed to the list of blocks that ext3 uses. This is because the Linux kernel caches the mapping of in memory block to disk block lookup making the efficiency of this in the file system irrelevant for data sets less than memory size.

    So go tell your friends: XFS is still the coolest kid on the block.

    8.6GB of email

    If you tar my Maildir, it comes out at about 8.6GB currently. That’s about all my mail since October 2001. Notable exceptions are most of the Spam I’ve received and any messages from LKML.
    Doing a first time sync with offlineimap takes an amount of time that is truly scary. Over 8 hours. When connected directly to the IMAP server with 100base.

    With only 2.7GB synced so far (on one of my machines), there’s been about 210,000 messages transfered. There are about 500,000 messages to go. So a total time of around 24hrs. eep.
    Thunderbird did an offline download of the 700MB of my INBOX very quickly (i.e. at a speed that was encouraging and made me not stop it before it completed). Evolution seemed to be really slow, and want to put header and body in separate files (hrrm… that’s going to be a lot of files). However, I am not ready to switch mail clients to Thunderbird (a variety of reasons).

    I tried switching to dovecot over the weekend – didn’t turn out so great. The sync speed was even slower, and syncing from remote dovecot to local dovecot was horrifically slow (probably would have completed by the end of the week). key word, probably.

    I don’t consider my Maildir to be big. I don’t get many large attachments.

    The fact that Evolution uses about 250MB of memory when I start it up, and 300MB now after sending a few mails is sort of disturbing. Although this does seem to be down from previous versions – so horray Evolution team for that.

    I wonder why such a key piece of infrastructure seems to be so neglected.

    Big Day Out Promoters on crack…

    Listening to JJJ’s Hack from Monday 16th October and the interview with BDO people in regards to scalping, balloting tickets and stuff and woah, talk about agressive.

    Although Tool being the reason that things sold out quickly…. quite likely.

    I cannot wait until those few days in January that will be Tool filled.

    Rusty on LCA talks and other stuff…

    As email is *sooo* non-“Web 2.0”, i reply in blog form….
    Rusty’s Bleeding Edge Page talks about a “Writing an x86 hypervisor: all the cool kids are doing it!” session that sounds really cool (better not be on at the same time as my talk… :)

    I don’t (currently) intend to be one of the cool kids though.

    He also mentions a session entitled “First-timer’s Introduction to LCA”. A couple of possible suggestions (or thoughts, and stuff I’ve seen):

    • be careful if you intend to bitch endlessly about a piece of software – it’s quite likely you’re talking to the person who wrote it (or a chunk of it)
    • sometimes it can be really good to just listen and ask a few good questions to understand. there are a lot of really smart people about
    • you will (at some point) ask a really dumb question (that you’ll only realise is dumb a few months later). Don’t panic – we all do it.
    • Don’t be scared – nobody bites too hard.
    • when staying in the halls, odds are the coffee isn’t that good – be prepared to bring your own or go out every morning.
    • do not be afraid to go up and start talking to people – it’s a great way to meet interesting characters and cool hackers.
    • wash
    • use deodorant
    • encourage others to do the above 2
    • read the summary of a session, not just the title. sometimes you can be misled by the title (for example, not everybody thinks of the same thing when “hacking BLAH” is the title of a session)
    • especially if talking, bring backups, backup (without erasing old backups) and backup. Also, be sure restore works.
    • While a lot of people do enjoy downing a few (or more than a few) Ales, it’s not compulsary. There are people attending LCA who don’t drink (and who may/may not join others at the pub even though they don’t drink alcohol). It’s also okay to not drink too much – in fact, it’s often recommended.
    • Don’t be afraid to ask people who they are, what they do etc. Even if you then immediately recognise the name, it’s good to put a face to the name.
    • You will never see everything you want to.
    • do join the IRC channels – great way of meeting people and organising groups to go do things (like get food, go to pub etc).
    • do talk to people around the dorms – great way of meeting people
    • expect to want a day of rest afterwards
    • there are some “in” jokes – but don’t be afraid to ask what they’re about, strange traditions are part of the LCA experience

    I wonder what should/could be written about going all fanboy/fangirl over favourite hackers? and taking/asking to get taken photos?

    The last thing Rusty talks about is the “Hacking in groups” tutorial. I really liked his and Robert Love’s tutorial in Canberra (Kernel Hacking – where you wrote a PCI driver for the excellent Love Rusty 3000. A device with real specifications, coffee cup stain and all). I’ve had a bit of a mixed feeling about it from Rusty since then, but I reckon it was seriously one of the best tutorials I have ever attended. I also took the hands-on approach as great inspiration for various MySQL Cluster Tutorials I’ve given since (and people have commented on how the hands-on part is great).

    I guess the thing about the kernel hacking tute was that not everybody in the room was at the same skill level (which is something you totally run the risk of with hands-on). Also, if you hadn’t done the prep material, you were probably going to be in trouble.

    But anyway, the idea of having 20 talented coders with 5 people in the tute for each of them and working on some project could be interesting – although rather ambitious. I worry that people without a good enough skillset would rock up and not get much out of it. Although those with adequate skill would do well.

    Picking a project that could be doable in a handful of hours (or a day) is tricky – as it’d probably be an extension to some existing project, which requires learning of it. Or, starting something from scratch can be equally as hard (to end up anywhere useful).

    Some ideas for projects could include:

    • linux file system driver (perhaps read only) for a simple file system (mkfs provided)
    • MySQL table handler for some simple format (indexes get trickier… but maybe simple bitmapped index… or just an in memory table handler)
    • fsck for some file format/file system format

    These have the benefit of being able to run existing good test suites against the software and see how well people did. They’d probably also help people land jobs :)

    Another interesting one would be implementing a library for journaling writes to a file. i.e. instead of write to temp, sync, rename – do journaling.  This would let people easily write apps that did safe updates to large files. You could then use this to implement other things (like a really simple crash-safe storage engine, FUSE file system or something).

    I’m just not sure how much “cool tricks” could really happpen in that time (instead of just getting the job done). 20 coders talking about their neat tricks would probably make a good book though…

    Saturn comes back around…

    For certain evil purposes last week, I assembled the old Saturn with a hard disk I found when cleaning a little while ago (I have that kind of tech stuff – you clean up and find 40GB disks – I’m pretty sure I have an 8.4 bumming around somewhere too).

    Saturn comes back around

    I ended up being able to do the evil I needed to, but I could tell that the room was a bit warmer due to the extra box being alive. I was also lazy and couldn’t be bothered going downstairs for the D200, so this was shot with my old and trusty Coolpix 4500.

    I used the box to be able to get remote access to a customers’ test setup to do some diagnosis on a bug (that’s notoriously hard to reproduce). I think I have a fair idea of what it is now though (timing related – not fun).

    Remember kids, threads are evil.

    Also, an interesting thing to note is that there is, in fact, a limit to not the number of fds you can pass to the select(2) system call, but to the actual number (on my Ubuntu box here, passing a fd of, say 2000 is probably going to lead to trouble). This has nothing to do with the previously mentioned bug, but an interesting point.

    and the morning annoyance award goes to….

    goes to VMware. Honestly, why every time i go and upgrade a kernel or  version of the free (as in beer) VM it asks me about serial numbers.

    They also get a “annoyance award” for not listing Victoria as a state that could be in Australia on their web site. They do list other Australian states though (e.g. Westeren Australia and the Australian Capital Territory) yet not one of the most populous.
    Or it should really go to Solaris. What a pain in the arse to get to the point of being able to compile $random_free_software_project. Look at Ubuntu/Debian: install system, apt-get build-dep $project, grab source, build. No fucking around with PATH or some strange application to do security updates (which I don’t know how on earth I figured out – I know that somebody else I work with hasn’t been able to easily find it). Why oh why is it so hard? Can’t there be an easy way? Please, somebody enlighten me!

    WRT54GL client mode OpenWRT fun!

    the wireless USB dongle I had running on my MythTV box had drivers that weren’t always reliable. I have recently totally decided that if I haven’t had time to debug them and fix the problems by now, I won’t in the near future.

    Today a courier arrived with two of the Linksys WRT54GL for me. yay! My aim is to put OpenWRT on them and use them in client mode (one for me, one for mum) to get around unstable wireless drivers.

    I just set mine up and it works! MythTV box now much more reliably on the network!

    Although, I did hit one snag – the MAC address on the sticker on my unit was NOT the actual MAC address of the router. Really annoying when setting up MAC filtering. Grr….

    (i really should set up better wireless security here)

    dosbox

    I showed kit dosbox. She’s now playing alleycat (sorry, ALLEYCAT.EXE) on it and we’ve all forgotten that we were actually hungry.

    Of course, I did have to play a bit of Hugo’s House of Horrors – sorry, HHH.EXE.

    Oh old DOS games, how awesome you are.

    Storing Passwords (securly) in MySQL

    Frank talks about Storing Passwords in MySQL. He does, however, miss something that’s really, really important. I’m talking about the salting of passwords.

    If I want to find out what  5d41402abc4b2a76b9719d911017c592 or 015f28b9df1bdd36427dd976fb73b29d MD5s mean, the first thing I’m going to try is a dictionary attack (especially if i’ve seen a table with only user and password columns). Guess what? A list of words and their MD5SUMS can be used to very quickly find what these hashes represent.

    I’ll probably have this dictionary in a MySQL database with an index as well. Try it yourself – you’ll probably find a dictionary with the words “hello” and “fire” in it to help. In fact, do this:

    mysql> create table words (word varchar(100));
    Query OK, 0 rows affected (0.13 sec)
    mysql> load data local infile ‘/usr/share/dict/words’ into table words;
    Query OK, 98326 rows affected (0.85 sec)
    Records: 98326  Deleted: 0  Skipped: 0  Warnings: 0

    mysql> alter table words add column md5hash char(32);
    Query OK, 98326 rows affected (0.39 sec)
    Records: 98326  Duplicates: 0  Warnings: 0

    mysql> update words set md5hash=md5(word);
    Query OK, 98326 rows affected (3.19 sec)
    Rows matched: 98326  Changed: 98326  Warnings: 0
    mysql> alter table words add index md5_idx (md5hash);
    Query OK, 98326 rows affected (2.86 sec)
    Records: 98326  Duplicates: 0  Warnings: 0
    mysql> select * from words where md5hash=’5d41402abc4b2a76b9719d911017c592′;
    +——-+———————————-+
    | word  | md5hash                          |
    +——-+———————————-+
    | hello | 5d41402abc4b2a76b9719d911017c592 |
    +——-+———————————-+
    1 row in set (0.11 sec)
    mysql> select * from words where md5hash=’015f28b9df1bdd36427dd976fb73b29d’;
    +——+———————————-+
    | word | md5hash                          |
    +——+———————————-+
    | fire | 015f28b9df1bdd36427dd976fb73b29d |
    +——+———————————-+
    1 row in set (0.00 sec)
    $EXCLAMATION I hear you go.

    Yes, this is not a good way to “secure” passwords. Oddly enough, people have known about this for a long time and there’s a real easy  solution. It’s called salting.

    Salting is prepending a random string to the start of the password when you store it (and when you check it).

    So, let’s look at how our new password table may look:

    mysql> select * from passwords;
    +——+——–+———————————-+
    | user | salt   | md5pass                          |
    +——+——–+———————————-+
    | u1   | ntuk24 | ce6ac665c753714cb3df2aa525943a12 |
    | u2   | drc,3  | 7f573abbb9e086ccc4a85d8b66731ac8 |
    +——+——–+———————————-+
    2 rows in set (0.00 sec)
    As you can see, the MD5s are different than before. If we search these up in our dictionary, we won’t find a match.

    mysql> select * from words where md5hash=’ce6ac665c753714cb3df2aa525943a12′;
    Empty set (0.01 sec)

    instead, we’d have to get the salt and do an md5 of the salt and the dictionary word and see if the md5 matches. Guess what, no index for that! and with all the possible values for salt, we’ve substantially increased the problem space to construct a dictionary (i won’t go into the maths here).

    mysql> create view v as select word, md5(CONCAT(‘ntuk24′,word)) as salted from words;
    Query OK, 0 rows affected (0.05 sec)

    mysql> select * from v where salted=’ce6ac665c753714cb3df2aa525943a12’;
    +——-+———————————-+
    | word  | salted                           |
    +——-+———————————-+
    | hello | ce6ac665c753714cb3df2aa525943a12 |
    +——-+———————————-+
    1 row in set (2.04 sec)

    mysql> create or replace view v as select word, md5(CONCAT(‘drc,3′,word)) as salted from words;
    Query OK, 0 rows affected (0.00 sec)

    mysql> select * from v where salted=’7f573abbb9e086ccc4a85d8b66731ac8’; +——+———————————-+
    | word | salted                           |
    +——+———————————-+
    | fire | 7f573abbb9e086ccc4a85d8b66731ac8 |
    +——+———————————-+
    1 row in set (2.12 sec)

    So we’ve gone from essentially instantaneous retreival, to now taking about 2 seconds. Even if I assume that one of your users is going to be stupid enough to have a dictionary password, It’s going to take me 2 seconds to check each user – as the salt is different for each user! So it could take me hours just to find that user. Think about how many users are in your user table – with 1000 users, it’s over 1/2hr. For larger systems, it’s going to be hours.

    Tapioca VoIP happiness

    Tapioca – TapiWiki

    and specifically, the Landell front end as it supports the use of a http proxy.

    I’ve been able to call Kit and chat while I’ve been on the road this time. Means we get to avoid nasty GSM roaming charges (or any charges) and even though there’s some lag (like a second or so) and the voice quality isn’t brilliant – using Landell/Tapioca and Google Talk on her end means we get to stay in touch without feeling guilty about massive phone bills.

    I totally heart free software.

    Welcome to Beijing (day 1)

    I’ve just come back from lunch. I’ve managed to eat Chinese food, in China, with chopsticks and not totally embarass myself. Ate some new food, new vegetables and a seemingly different type of seaweed than I have eaten before. It tasted good though. I even think Kit would have liked some of it (once she got over the fact that it looked different and some things were green things).
    I arrived safely after a flight that was fine (except for getting up rather early to get to Sydney to then take a sane timed flight). Beijing seems to be a bit like the firefly world, except with less flying cars. You’ve got heaps of stuff in English and Chinese. It could be really interesting to live here and experience things.

    There’s a national English language newspaper which is fairly up to date on world events – the fact that our dear Mr Howard is going to go to the election seems to be news here! It’s not packed with local news, which would be interesting to read (although I think I’ll have to learn to read first).

    The hotel is a short walk from the office (down the street, across the road). Oh, the roads are at least 7 lanes – they’re big!

    Hotel is pretty nice, probably about half the price of what I’d expect to pay back home. Breakfast was good – some totally delicious watermelon. Honestly thinking of just having watermelon for breakfast tomorrow :)

    Although it’s rather obvious that the hotel is aimed at western visitors. At breakfast you could only really tell you’re in China by: looking out the front window at all the Chinese writing or looking at the waiters and waitresses and noticing they all a) spoke Chinese to each other and b) were Chinese. About 5 languages before my first coffee – what a way to start the day!

    At some point I’m going to have to have some Chinese tea – it seems like a real obvious must-do. Although maybe I should give in at some point and buy coffee from starbucks as well….

    Uninviting a Vampire

    Brian “Krow” Aker’s Idle Thoughts – Know your rights

    Brian muses on if you can uninvite a vampire and thinks this wasn’t addressed in Buffy or Angel.

    Angel was uninvited in Season 2 (in the episode Passion).

    I’m pretty sure Dracula was uninvited too, but I think I have to rewatch the episode (shock!)

    Spike was uninvited and couldn’t enter Buffy’s house in Season 5.

    So at least in the Buffyverse, you can uninvite a vampire.

    Beware Digital Rights Traps

    I’ve added the “Beware Digital Rights Traps” buttons to both my blog page and the main page of flamingspork.com.

    iownmydvds.org

    iownmymusic.org

    I’ve also taken a photo I took ages ago and used it as a header image thingy. doesn’t look to bad on the front page… not 100% happy with the wordpress theme integration atm.

    Heroes in Tyrol

    23rd Mostra – “Heroes in Tyrol”, by Niki List (Austria-Sweeden-Germany)

    I managed to see most of this film a few years ago. Anybody know where or how I can get a DVD of it? (with English subtitles). I know somebody in the wider community has to know where (hence why i’ll put this entry in the MySQL category – i know somebody there has to know something about this film).

    Besides – it has drinking songs, and MySQLers will get the connection.