In this post we’re going to look at three different ways to read the various sensors in the Raptor Blackbird system. The Blackbird is a single-socket uATX board for the POWER9 processor. One advantage of the system is its completely open source firmware, so you can (like I have) build your own. Everything here was done on my Blackbird with my most recent firmware build (the BMC is running the 2.00 release from Raptor).
Sensors over IPMI
One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the Blackbird) or over the network.
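To make the in-band versus over-the-network distinction concrete, here’s a minimal Python sketch that builds the ipmitool command line for each case. The lanplus interface and the -H, -U and -E options are standard ipmitool options; the BMC hostname and user name are hypothetical placeholders, and ipmitool_sensor_cmd is just an illustrative helper.

```python
import shlex


def ipmitool_sensor_cmd(bmc_host=None, user=None):
    """Build the argv for `ipmitool sensor`, in-band or over the network.

    bmc_host=None: talk to the local BMC in-band (needs root on the host,
    hence `sudo ipmitool sensor`). Otherwise use the IPMI v2.0 "lanplus"
    interface to reach the BMC over the network. -E tells ipmitool to take
    the password from the IPMI_PASSWORD environment variable, so it does
    not appear on the command line.
    """
    if bmc_host is None:
        return ["ipmitool", "sensor"]
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host]
    if user is not None:
        cmd += ["-U", user]
    return cmd + ["-E", "sensor"]


if __name__ == "__main__":
    # Hypothetical BMC hostname and user name, for illustration only.
    print(shlex.join(["sudo"] + ipmitool_sensor_cmd()))
    print(shlex.join(ipmitool_sensor_cmd("blackbird9-bmc", "admin")))
```

Keeping the password in IPMI_PASSWORD rather than passing it with -P is a deliberate choice: command-line arguments are visible to other users via the process list.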
stewart@blackbird9$ sudo ipmitool sensor |head
occ | na | discrete | na | na | na | na | na | na | na
occ0 | 0x0 | discrete | 0x0200| na | na | na | na | na | na
occ1 | 0x0 | discrete | 0x0100| na | na | na | na | na | na
p0_core0_temp | na | | na | na | na | na | na | na | na
p0_core1_temp | na | | na | na | na | na | na | na | na
p0_core2_temp | na | | na | na | na | na | na | na | na
p0_core3_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core4_temp | na | | na | na | na | na | na | na | na
p0_core5_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core6_temp | na | | na | na | na | na | na | na | na
It’s kind of annoying to read like that, so standard Unix tools to the rescue!
stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
occ | na
occ0 | 0x0
occ1 | 0x0
p0_core0_temp | na
p0_core1_temp | na
p0_core2_temp | na
p0_core3_temp | 38.000
p0_core4_temp | na
p0_core5_temp | 38.000
p0_core6_temp | na
p0_core7_temp | 38.000
p0_core8_temp | na
p0_core9_temp | na
p0_core10_temp | na
p0_core11_temp | 37.000
p0_core12_temp | na
p0_core13_temp | na
p0_core14_temp | na
p0_core15_temp | 37.000
p0_core16_temp | na
p0_core17_temp | 37.000
p0_core18_temp | na
p0_core19_temp | 39.000
p0_core20_temp | na
p0_core21_temp | 39.000
p0_core22_temp | na
p0_core23_temp | na
p0_vdd_temp | 40.000
dimm0_temp | 35.000
dimm1_temp | na
dimm2_temp | na
dimm3_temp | na
dimm4_temp | 38.000
dimm5_temp | na
dimm6_temp | na
dimm7_temp | na
dimm8_temp | na
dimm9_temp | na
dimm10_temp | na
dimm11_temp | na
dimm12_temp | na
dimm13_temp | na
dimm14_temp | na
dimm15_temp | na
fan0 | 1200.000
fan1 | 1100.000
fan2 | 1000.000
p0_power | 33.000
p0_vdd_power | 5.000
p0_vdn_power | 9.000
cpu_1_ambient | 30.600
pcie | 27.000
ambient | 26.000
You can see that I have three fans, two DIMMs (although why it lists 16 possible DIMMs for a board with two DIMM slots is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.
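The same filtering and counting can be done in a few lines of Python. This is a minimal sketch, assuming the pipe-separated ipmitool sensor format shown above. The names parse_readings and count_populated are hypothetical helpers, and the sensor-name patterns are based only on the names in this particular listing.

```python
import re
from collections import Counter

# Sensor-name patterns, based only on the names in the listing above.
KINDS = {
    "cores": re.compile(r"p\d+_core\d+_temp"),
    "dimms": re.compile(r"dimm\d+_temp"),
    "fans": re.compile(r"fan\d+"),
}


def parse_readings(text):
    """Return {name: value} for sensors that have a numeric reading.

    Only the first two '|'-separated fields are used, so this accepts both
    full `ipmitool sensor` lines and the `cut -d '|' -f 1,2` version.
    """
    readings = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue
        try:
            readings[fields[0]] = float(fields[1])
        except ValueError:
            pass  # "na" (no reading) or a discrete value such as "0x0"
    return readings


def count_populated(readings):
    """Count sensors with a reading, per kind (CPU cores, DIMMs, fans)."""
    counts = Counter()
    for name in readings:
        for kind, pattern in KINDS.items():
            if pattern.fullmatch(name):
                counts[kind] += 1
    return counts


if __name__ == "__main__":
    sample = """\
occ0 | 0x0 | discrete | 0x0200| na | na | na | na | na | na
p0_core3_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core4_temp | na
dimm0_temp | 35.000
dimm1_temp | na
fan0 | 1200.000
"""
    print(count_populated(parse_readings(sample)))
```

Fed the full listing above (for example via subprocess.run(["sudo", "ipmitool", "sensor"], capture_output=True, text=True).stdout), count_populated should report 8 cores, 2 DIMMs and 3 fans.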
The code path for reading these sensors is interesting. It’s all from the BMC: the OCC inside the P9 reads things, the BMC then reads those values, and then passes them back to the P9. On the P9 itself, each sensor read is a call all the way into firmware and back! In fact, we can see it in perf:
What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks them in sysfs:
[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map
[sudo] password for stewart:
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector
So we can easily look up exactly where this is:
[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map
0000000000018e20 t .__try_lock.isra.0
0000000000018e68 t .add_lock_request
So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack inside firmware isn’t so easy to get at, but we can find the bt_add_ipmi_msg call there, which is probably how everything starts:
[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map
0000000000051614 t .bt_add_ipmi_msg_head
0000000000051688 t .bt_add_ipmi_msg
00000000000516fc t .bt_poll
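The grep lookups above boil down to finding the nearest symbol at or below an address. Here is a minimal Python sketch of that. It assumes, as the 0x300xxxxx perf addresses and the '18e..' grep suggest, that skiboot runs at base 0x30000000 and that symbol_map offsets are relative to it. The helper names load_symbols and lookup and the example address are hypothetical, and the sample map is just a few of the entries shown above.

```python
import bisect

# Assumption: skiboot runs at 0x30000000 and symbol_map offsets are relative
# to that base (suggested by the 0x300xxxxx addresses in perf and by grepping
# for '18e..' to find them).
SKIBOOT_BASE = 0x30000000


def load_symbols(text):
    """Parse nm-style "<hex address> <type> <name>" lines into sorted lists."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. a "[sudo] password for ..." prompt
        try:
            syms.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue
    syms.sort()
    return [addr for addr, _ in syms], [name for _, name in syms]


def lookup(addrs, names, runtime_addr, base=SKIBOOT_BASE):
    """Return (symbol, offset) for the closest symbol at or below an address."""
    offset = runtime_addr - base
    i = bisect.bisect_right(addrs, offset) - 1
    if i < 0:
        return None
    return names[i], offset - addrs[i]


if __name__ == "__main__":
    # On the machine itself: text = open("/sys/firmware/opal/symbol_map").read()
    # (needs root). Here, a few entries copied from the output above.
    text = """\
0000000000018e20 t .__try_lock.isra.0
0000000000018e68 t .add_lock_request
0000000000051614 t .bt_add_ipmi_msg_head
0000000000051688 t .bt_add_ipmi_msg
00000000000516fc t .bt_poll
"""
    addrs, names = load_symbols(text)
    sym, off = lookup(addrs, names, 0x30018E40)  # hypothetical perf address
    print(f"{sym}+{off:#x}")  # prints .__try_lock.isra.0+0x20
```

Using bisect on a sorted address list keeps each lookup at O(log n), which matters if you want to resolve every sample in a perf dump rather than one address at a time.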
OCCTOOL
This is the most not-what-you’re-meant-to-use method of getting at the sensors! It’s using a debug tool for the OCC firmware. There’s a variety of tools in the OCC source repository, and one of them (occtoolp9) can do several things, one of which is getting sensor data out of the OCC.
The odd thing you’ll see is “via opal-prd”, and this is because it makes raw calls to the opal-prd binary to talk to the OCC firmware, running things like “opal-prd --expert-mode htmgt-passthru”. Yeah, this isn’t an in-production thing :)
Amazingly (and interestingly), this doesn’t go through host firmware the way an IPMI call does. There’s a full OCC/Host firmware interface spec to read. But it’s an insanely inefficient way to monitor sensors: a long bash script shelling out to a whole bunch of other processes. Think ~14.4 billion cycles versus ~367 million cycles for the ipmitool option above, roughly 39 times as many.
But there are some interesting sensors at the end of the list:
Sensor Details: (found 86 sensors, details only for Status of 0x00)
GUID    Name   Sample  Min  Max    U    Stat  Accum       UpdFreq     ScaleFactr  Loc     Type
....
0x014A  MRDM0  688     3    15015  GBs  0x00  0x0144AE6C  0x00001901  0x000080FB  0x0008  0x0200
0x014E  MRDM4  480     3    14739  GBs  0x00  0x01190930  0x00001901  0x000080FB  0x0008  0x0200
0x0156  MWRM0  560     4    16605  GBs  0x00  0x014C61FD  0x00001901  0x000080FB  0x0008  0x0200
0x015A  MWRM4  360     4    16597  GBs  0x00  0x014AE231  0x00001901  0x000080FB  0x0008  0x0200
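To compare these rows across runs, it helps to pull them out programmatically. This is a minimal sketch, assuming the occtoolp9 rows are whitespace-separated with the twelve columns of the header above, and that names in the raw output may carry trailing dot padding. The helpers parse_occ_sensor_line and memory_rows are hypothetical. Selecting MRDM*/MWRM* rows says nothing about what those sensors actually measure, and no ScaleFactr is applied; values are kept as printed.

```python
# Column names from the occtoolp9 "Sensor Details" header above.
COLUMNS = ["GUID", "Name", "Sample", "Min", "Max", "U", "Stat",
           "Accum", "UpdFreq", "ScaleFactr", "Loc", "Type"]


def parse_occ_sensor_line(line):
    """Parse one whitespace-separated sensor row into a dict, or return None."""
    fields = line.split()
    if len(fields) != len(COLUMNS) or not fields[0].startswith("0x"):
        return None  # header, "...." elision, or some other line
    row = dict(zip(COLUMNS, fields))
    # Strip any dot padding after the name (ASCII dots or "…" characters).
    row["Name"] = row["Name"].rstrip(".\u2026")
    for col in ("Sample", "Min", "Max"):
        try:
            row[col] = int(row[col])
        except ValueError:
            pass  # leave non-decimal values as strings
    return row


def memory_rows(text, prefixes=("MRDM", "MWRM")):
    """Return the parsed rows whose sensor name starts with one of prefixes."""
    rows = []
    for line in text.splitlines():
        row = parse_occ_sensor_line(line)
        if row is not None and row["Name"].startswith(prefixes):
            rows.append(row)
    return rows


if __name__ == "__main__":
    text = """\
0x014A MRDM0.......... 688 3 15015 GBs 0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
0x0156 MWRM0.......... 560 4 16605 GBs 0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
"""
    for r in memory_rows(text):
        print(r["Name"], r["Sample"], r["Min"], r["Max"], r["U"])
```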
Is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again: