In a previous post, I covered porting MySQL 5.6 to POWER and subsequently, some new record performance numbers with MySQL 5.6.17 on POWER8.
Well, those following at home will be aware that not only is the next sentence sponsored by IBM Legal, but that MySQL 5.7 alleviates a bunch of the mutex contention that we saw with MySQL 5.6. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.
In looking at MySQL performance on POWER, it’s inevitable that I should look at MySQL 5.7 and what’s coming up in the next stable release of MySQL.
Surprisingly, a bunch of the core code in InnoDB and MySQL dealing with mutexes has changed in MySQL 5.7 when compared to MySQL 5.6. Enough that I actually had to post a few bug reports about the changes that apply to any CPU architecture:
- Bug 72805: mutex_delay() creating excess memory traffic, GCC mem barrier needed
  - This is now more generic mutex code, so it’s even more important to get it right. There’s a bunch of tricks that have been learned in other places (e.g. Linux kernel) in getting these things right. We need to get them right in MySQL too.
  - One of these tricks is in ensuring that the compiler doesn’t compile down spinloops to nothing.
- Bug 72806: mutex_delay() missing x86 pause instruction optimization
  - This is actually a regression over 5.6.
  - On x86, there is an instruction (PAUSE) that tells the CPU that you’re in a spin loop and that it should yield resources in the CPU core to other threads (or thread, as HT CPUs only have 2 threads per core).
  - We have a different way of doing things on POWER, and I’ve got a patch for that too.
  - What’s interesting is reading the Intel CPU manual about the PAUSE instruction: even if you went and benchmarked it, whether it is a NO-OP or not depends on the CPU.
  - I suspect that with this bug fixed, performance on Hyper Threaded Intel systems will improve.
- Bug 72807: Set thread priority in my_pthread_fastmutex_lock
  - This is the POWER equivalent of the x86 PAUSE instruction.
  - I’ve found this patch to have a quite decent positive impact on sysbench point select performance.
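To make the three bugs above a bit more concrete, here is a minimal sketch of a polite spin loop in C. This is not the MySQL code or any of the attached patches: `spin_until_set` and the `spin_hint_*` macros are hypothetical names made up for illustration, and it assumes GCC or Clang inline assembly. The empty `asm` with a `"memory"` clobber is the usual compiler-barrier trick, `pause` is the x86 hint, and `or 1,1,1` / `or 2,2,2` are the POWER instructions that lower the SMT thread priority and restore it to medium.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-architecture spin hints (illustrative names only). */
#if defined(__x86_64__) || defined(__i386__)
/* x86: PAUSE tells the core we are spinning; nothing to set up or undo. */
#define spin_hint_begin() ((void)0)
#define spin_hint()       __asm__ __volatile__("pause" ::: "memory")
#define spin_hint_end()   ((void)0)
#elif defined(__powerpc__) || defined(__powerpc64__)
/* POWER: drop SMT thread priority while spinning, restore medium after. */
#define spin_hint_begin() __asm__ __volatile__("or 1,1,1" ::: "memory")
#define spin_hint()       __asm__ __volatile__("" ::: "memory")
#define spin_hint_end()   __asm__ __volatile__("or 2,2,2" ::: "memory")
#else
/* Elsewhere: a compiler-only barrier. It emits no instruction, but the
   volatile asm cannot be removed and "memory" forces memory re-reads. */
#define spin_hint_begin() ((void)0)
#define spin_hint()       __asm__ __volatile__("" ::: "memory")
#define spin_hint_end()   ((void)0)
#endif

/* Spin until *flag becomes non-zero or max_spins iterations have passed.
   Returns the number of iterations actually spent spinning. */
static uint32_t spin_until_set(atomic_int *flag, uint32_t max_spins)
{
    uint32_t spins = 0;

    spin_hint_begin();
    while (spins < max_spins &&
           atomic_load_explicit(flag, memory_order_acquire) == 0) {
        spin_hint();   /* keeps the loop body from being optimized away */
        spins++;
    }
    spin_hint_end();

    return spins;
}
```

A caller would typically try the lock, call something like `spin_until_set(&lock_is_free, 1000)`, and fall back to a blocking wait if the flag is still clear once the spin budget is used up.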
There were also the bugs I mentioned in my MySQL 5.6 on POWER blog post. Notably, I had to port Yasufumi’s memory barrier patch from 5.6 to 5.7. My port is incomplete (I can still crash mysqld without too much trying) but I’ve deemed it currently “good enough for benchmarking” and it’s attached to bug 47213 (I hope to spend some time fixing it up soon too). I don’t think I’m missing anything that’s going to have a major performance impact – so while not suitable for production use, it’s good enough to poke some benchmarks at.
So… I’m close to the point where I’ll share my patch for MySQL 5.7, but I really want to solve the last couple of issues before doing so. The majority of patches are attached to bug reports and get 99% of the way there.
Amazingly enough, MySQL 5.7 works fairly well on POWER “out of the box”, and with sysbench point selects, I could quite easily get 320kQPS on a 24-core POWER8 in SMT8 mode without changing a single line of code or doing anything special. This alone is an impressive result when compared to the previous record on both POWER and other CPU architectures, which was set with a MySQL 5.6 that had been optimized for POWER (while out-of-the-box MySQL 5.7 has not).
For my benchmarks, I’m doing the same procedure, workload and basic my.cnf settings that Dimitri has used and written about, so I won’t repeat that here.
With my preliminary patch for MySQL 5.7.4-m14 to have it work well on POWER, on the same system I was using for my MySQL 5.6 benchmarks, I could easily match and indeed exceed the previous published maximum sysbench point select results (I got ~630kQPS). Consider this number a bit preliminary as my patch isn’t completely solid, but it does mean that we’re in the right ballpark for MySQL 5.7 performance, which is great news!
So, you might just say “Mission Accomplished” and be done with it. Well… there was one issue: with the maximum numbers I was getting there was still 30-40% idle CPU on the POWER8 machine.
Now… you could just use that idle 30-40% of total CPU to do other things (solving Sudoku in SQL for example) but that’s no fun.