stringstream is completely useless (and why C++ should have a snprintf)

  1. It’s easy to screw up thread safety.
    If you’re trying to format something for output (e.g. leading zeros, only 1 decimal place or whatever… you know, format specifiers in printf) you are setting a property on the stream, not on what you’re converting. So if you have a thread running that sets a format, adds something to the stream, and then unsets the format, you cannot have another thread able to come in and do something to that stream. Look out for thread unsafe cout code.
  2. You cannot use streams for any text that may need to be translated.
    gettext is what everybody uses. You cannot get a page into the manual before it tells you that translators may want to change the order of what you’re printing. This goes directly against stringstream.
  3. You need another reason? Number 2 rules it out for so much handling it’s not funny.

Does linux fallocate() zero-fill?

In an email disscussion for pre-allocating binlogs for MySQL (something we’ll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point of that in some situations you don’t want to be doing zero-fill as getting up and running quickly is the most important thing.

So what does Linux do? Does it zero-fill, or behave sensibly and pre-allocate quickly?

Let’s look at hte kernel:

Inside the fallocate implementation (fs/open.c):

if (inode->i_op->fallocate)
ret = inode->i_op->fallocate(inode, mode, offset, len);
else
ret = -EOPNOTSUPP;

and for ext4:
/*
* currently supporting (pre)allocate mode for extent-based
* files _only_
*/
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
return -EOPNOTSUPP;

XFS has always done fast pre-allocate, so it’s not a problem there and the only other filesystems to currently support fallocate are btrfs and ocfs2 – which we don’t even have to start worrying too much about yet :)

But this is just kernel behaviour – i *think* libc ends up wrapping it
with a ENOTSUPP from kernel being “let me zero-fill” (which might be
useful to check). Anybody want to check libc for me?

This was all on slightly post 2.6.30-rc3 (from git: 8c9ed899b44c19e81859fbb0e9d659fe2f8630fc)

c++ stl bitset only useful for known-at-compile-time number of bits

Found in the libstdc++ docs:

Extremely weird solutions. If you have access to the compiler and linker at runtime, you can do something insane, like figuring out just how many bits you need, then writing a temporary source code file. That file contains an instantiation of bitset for the required number of bits, inside some wrapper functions with unchanging signatures. Have your program then call the compiler on that file using Position Independent Code, then open the newly-created object file and load those wrapper functions. You’ll have an instantiation of bitset for the exact N that you need at the time. Don’t forget to delete the temporary files. (Yes, this can be, and has been, done.)

Oh yeah – feel the love.

Brought to you by the stl-is-often-worse-for-you-than-meth dept.

Drizzle low-hanging-fruit

We have an ongoing Drizzle milestone called low-hanging-fruit. The idea is that when there’s something that  could be done, but we don’t quite have the time to do it immediately, we’ll add a low-hanging-fruit blueprint so that people looking to get a start on the codebase and contributing code to Drizzle have a place to go to find things to do.

Some of my personal favourites are:

Also relatively low hanging fruit can be writing some plugins. Some simple plugin types include:

  • Authentication
    Got somewhere that you could authenticate against for connecting to a DB? Write a plugin for it! Current auth plugins are auth_http and auth_pam.

    • Perhaps you want to authenticate against a central DB? checking in memcached first?
    • Perhaps a htaccess style method
  • Functions
    Apply some function to a column. These are pretty simple to write (see md5, compress examples). Perhaps interfaces to encryption/decryption? a hashing function?

    • ROT13
    • 3DES
    • AES
      Bonus points if you get any of these to use the T2000 crypto accellerator stuff
    • ID3 tag decoding
    • file type detection (well.. BLOB)

So there’s a fair bit you can do to get started. Best of all, you can chat with the Drizzle developers next week at the MySQL Conference and Expo and Drizzle Developer Day.

Improving the Storage Engine “API”

I increasingly enclose the API part of “Storage Engine API” in quotes as it does score a rather large number on the API Design Rusty levels (Coined by Rusty Russell). I give it a 15 (out of 18. lower is better) in this case “The obvious use is wrong”.

The ideas is that your handler gets called to write a row (the amazingly named handler::write_row()). It’s passed a buffer which is the row to be stored. An engine that uses the MySQL row format (lets say, ARCHIVE) will simply pack the row and write it out.

Unless there is a TIMESTAMP field with auto set on insert. Up until now (and still now in MySQL) the engine had to check if this was the case and make sure the timestamp field was updated.

To remove this particular bonghit is actually a really small patch, which Jay recently got merged:

~drizzle-developers/drizzle/development : revision 873.1.16

Hopefully somebody does this soon for MySQL as well.

Fun with the 387

Filed  GCC bug 39228:

#include <stdio.h>
#include <math.h>
int main()
{
        double a= 10.0;
        double b= 1e+308;
        printf("%d %d %dn", isinf(a*b), __builtin_isinf(a*b), __isinf(a*b));
        return 0;
}

mtaylor@drizzle-dev:~$ gcc -o test test.c
mtaylor@drizzle-dev:~$ ./test
0 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c -std=c99
mtaylor@drizzle-dev:~$ ./test
1 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c   -mfpmath=sse -march=pentium4
mtaylor@drizzle-dev:~$ ./test
1 1 1
mtaylor@drizzle-dev:~$ g++ -o test test.c
mtaylor@drizzle-dev:~$ ./test
1 0 1

Originally I found the simple isinf() case to be different on x86 than x86-64, ppc32 and sparc (32 and 64).

After more research, I found that x86-64 uses the sse instructions to do it (and using sse is the only way for __builtin_isinf() to produce correct results). For the g++ built version, it calls __isinf() instead of inlining (and as can be seen, the __isinf() version is always correct).

Specifically, it’s because the optimised 387 code is doing the math in double extended precision inside the FPU. 10.0*1e308 fits in 80bits but not in 64bit. Any code that forces it to be stored and loaded gets the correct result too. e.g.

mtaylor@drizzle-dev:~$ cat test-simple.c

#include <stdio.h>
#include <math.h>
int main()
{
        double a= 10.0;
        double b= 1e+308;
    volatile    double c= a*b;
        printf("%dn", isinf(c));
        return 0;
}

mtaylor@drizzle-dev:~$ gcc -o test-simple test-simple.c
mtaylor@drizzle-dev:~$ ./test-simple
1

With this code you can easily see the load and store:

 8048407:       dc 0d 18 85 04 08       fmull  0x8048518 804840d:       dd 5d f0                fstpl  -0x10(%ebp)
 8048410:       dd 45 f0                fldl   -0x10(%ebp)
 8048413:       d9 e5                   fxam

While if you remove volatile, the load and store doesn’t happen (at least on -O3, on -O0 it hasn’t been optimised away):

 8048407:       dc 0d 18 85 04 08       fmull  0x8048518
 804840d:       c7 44 24 04 10 85 04    movl   $0x8048510,0x4(%esp)
 8048414:       08
 8048415:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 804841c:       d9 e5                   fxam

This is also a regression from 4.2.4 as it just calls isinf() and doesn’t expand the 387 code inline. My guess is the 387 optimisation was added in 4.3.

Recommended fix: store and load in the 387 version so to operate on same precision as elsewhere.

Now I just have to make a patch I like that makes Drizzle behave because of this (showed up as a failure in the SQL func_math test) and then submit to MySQL as well… as this may happen there if “correctly” built.

libmallocfail

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occationally fail.

Aim was to use this to test bits of MySQL/Drizzle although since their libtool based stuf, the binary in tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.

I have found a bug in libc though :)