Sometimes you don’t want to do anything. This is understandably human, and probably a sign you should either relax or get up and do something.
For processors, you sometimes do actually want to do absolutely nothing. Often this will be while waiting for a lock. You want to do nothing until the lock is free, but you want to be quick about it, you want to start work once that lock is free as soon as possible.
On CPU cores with more than one thread (e.g. hyperthreading on Intel, SMT on POWER) you likely want to let the other threads have all of the resources of the core if you’re sitting there waiting for something.
So, what do you do? On x86 there’s been the PAUSE instruction for a while and on POWER there’s been the SMT priority instructions.
The x86 PAUSE instruction delays execution of the next instruction for some amount of time while on POWER each executing thread in a core has a priority and this is how chip resources are handed out (you can set different priorities using special no-op instructions as well as setting the Relative Priority Register to map how these coarse grained priorities are interpreted by the chip).
So, when you’re writing spinlock code (or similar, such as the implementation of mutexes in InnoDB) you want to check if the lock is free, and if not, spin for a bit, but at a lower priority than the code running in the other thread that’s doing actual work. The idea being that when you do finally acquire the lock, you bump your priority back up and go do actual work.
Usually, you don’t continually check the lock, you do a bit of nothing in between checking. This is so that when the lock is contended, you don’t just jam every thread in the system up with trying to read a single bit of memory.
So you need a trick to do nothing that the complier isn’t going to optimize away.
Current (well, MySQL 5.7.5, but it’s current in MariaDB 10.0.17+ too, and other MySQL versions) code in InnoDB to “do nothing” looks something like this:
ulint ut_delay(ulint delay)
{
ulint i, j;
UT_LOW_PRIORITY_CPU();
j = 0;
for (i = 0; i < delay * 50; i++) {
j += i;
UT_RELAX_CPU();
}
if (ut_always_false) {
ut_always_false = (ibool) j;
}
UT_RESUME_PRIORITY_CPU();
return(j);
}
On x86, UT_RELAX_CPU() ends up being the PAUSE instruction.
On POWER, the UT_LOW_PRIORITY_CPU() and UT_RESUME_PRIORITY_CPU() tunes the SMT thread priority (and on x86 they’re defined as nothing).
If you want an idea of when this was all written, this comment may be a hint:
/*!< in: delay in microseconds on 100 MHz Pentium */
But, if you’re not on x86 you don’t have the PAUSE instruction, instead, you end up getting this code:
# elif defined(HAVE_ATOMIC_BUILTINS)
# define UT_RELAX_CPU() do { \
volatile lint volatile_var; \
os_compare_and_swap_lint(&volatile_var, 0, 1); \
} while (0)
Which you may think “yep, that does nothing and is not optimized away by the compiler”. Except you’d be wrong! What it actually does is generates a lot of memory traffic. You’re now sitting in a tight loop doing atomic operations, which have to be synchronized between cores (and sockets) since there’s no real way that the hardware is going to be able to work out that this is only a local variable that is never accessed from anywhere.
Additionally, the ut_always_false and j variable there is also attempts to trick the complier into not optimizing the loop away, and since ut_always_false is a global, you’re generating traffic to a single global variable too.
Instead, what’s needed is a compiler barrier. This simple bit of nothing tells the compiler “pretend memory has changed, so you can’t optimize around this point”.
__asm__ __volatile__ ("":::"memory")
So we can eliminate all sorts of useless non-work and instead do what we want: do nothing (a for loop for X iterations that isn’t optimized away by the compiler) and don’t have side effects.
In MySQL bug 74832 I detailed this with the appropriately produced POWER assembler. Unfortunately, this patch (submitted under the OCA) has sat since November 2014 (so, over 9 months) with no action. I’m a bit disappointed by that to be honest.
Anyway, the real moral of this story is: don’t implement your own locking primitives. You’re either going to get it wrong or you’ll be wrong in a few years when everything changes under you.
See also:
Like this:
Like Loading...