For a feature that has been worked on for so long, and with such potential, I find the resistance to publishing a source tree curious (my comments on the topic have been moderated away, but others have asked too). I could go and grep through the commits list searching for things (hint: look for mysql-6.0-perf) and then start to reconstruct a tree, but I have more important things to do (yes, Brian, like FRM patches :)
Instead of re-inventing the wheel in Drizzle for a performance-schema-like interface, it’d be great to go with existing work. Being able to evaluate the code as it’s coming along is important.
I also have concerns about the code itself:
- Mutex instrumentation:
  - How expensive is this in the common case of not instrumenting?
  - Is this yet another wrapper around pthread_mutex_t?
  - Could this be done in another, more generic way?
  - How can engine devs use this? Do you have to be completely integrated with the MySQL server way of doing things (and give up being able to be a separate piece of software) if you’re going to use this? If there’s a null header, what license is it under?
  - Can we use some of the code in (for example) ndbd and then pass back mutex data from remote systems to the SQL server in a usable manner? If we do this via NDB$INFO, could this show up in the performance schema?
  - Is the code clean or littered with ugly #ifdef?
  - Could this be done without having a special mutex type?
- Memory instrumentation:
  - Is it there at all?
  - How are they doing it?
  - MEM_ROOT based? What about new/delete, malloc/free and various buffers, and how does this all tie to sessions, etc.?
  - In Drizzle I’ve been seriously looking at talloc to help with instrumentation of memory usage.
- IO instrumentation:
  - mmap?
- How are the performance schema tables being generated (constructing table shares, running CREATE TABLE manually by the user, CREATE TABLE in a helper thread, similar to I_S, or some black magic)?
  - For NDB$INFO, we generate SQL CREATE TABLE statements and run them to generate an FRM file (I architected and wrote the base kernel code, Martin wrote the MySQL code), as other approaches were considered too hairy and likely to produce bugs.
  - For Drizzle, I’m completely removing the FRM files in favour of a discovery-based interface that’ll let the engines be in charge of metadata. It’s all in protobufs, so it’s a standard and easy-to-read data format. So I’m one of the few people on the planet who know about the related data structures. I would like our proto-based code to work for performance schema as well.
- Is the instrumentation always-in, or using DTrace-style no-op funkiness?
  - Is it better to hook into DTrace (or similar) on platforms that have it instead of providing custom code?
- Could performance be gained by using LD_PRELOAD or similar linker foo to only enable some instrumentation but allow others to be selectable on server startup?
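
To make the mutex questions a bit more concrete, here is a rough sketch of what "yet another wrapper around pthread_mutex_t" with a runtime switch could look like. This is purely my guess at the general shape, not the performance schema code (which I haven't seen): `instr_mutex`, `instr_enabled` and the `instr_mutex_*` functions are all made-up names. With instrumentation off, the extra cost is a single flag check; with it on, a trylock is used to detect whether the acquisition was contended.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical instrumented mutex: a plain pthread mutex plus counters.
 * None of these names come from the performance schema work. */
typedef struct {
    pthread_mutex_t mutex;
    uint64_t lock_count;       /* acquisitions seen while instrumenting */
    uint64_t contended_count;  /* acquisitions that had to wait */
} instr_mutex;

/* Assumed global switch. A real server would want an atomic or a
 * per-instrument flag; a plain int keeps the sketch short. */
int instr_enabled = 0;

int instr_mutex_init(instr_mutex *m)
{
    m->lock_count = 0;
    m->contended_count = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

int instr_mutex_lock(instr_mutex *m)
{
    int rc;
    int contended = 0;

    /* Common case: not instrumenting, so pay only for this one branch. */
    if (!instr_enabled)
        return pthread_mutex_lock(&m->mutex);

    /* Try first; EBUSY means somebody else holds it and we will wait. */
    rc = pthread_mutex_trylock(&m->mutex);
    if (rc == EBUSY) {
        contended = 1;
        rc = pthread_mutex_lock(&m->mutex);
    }

    if (rc == 0) {
        /* We hold the mutex here, so the counters need no extra locking. */
        m->lock_count++;
        m->contended_count += (uint64_t)contended;
    }
    return rc;
}

int instr_mutex_unlock(instr_mutex *m)
{
    return pthread_mutex_unlock(&m->mutex);
}

int instr_mutex_destroy(instr_mutex *m)
{
    return pthread_mutex_destroy(&m->mutex);
}
```

Even this toy version shows why I'm asking: every caller has to switch to the special mutex type, and anything outside the server (an engine, ndbd) needs the header to be able to use it.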
So I’m stuck waiting for some code to look at to answer these questions (random commits on the commits list don’t really do it like the finished product; I really hate commit lists).
So I can’t currently comment on the performance schema work much at all, nor on whether it’s useful to Drizzle. Hopefully I will soon.