In this post we’re going to look at three different ways to read the various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is its completely open source firmware, so you can (like I have) build your own firmware. The machine used throughout is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).
Sensors over IPMI
One way to get the sensors is over IPMI. This can be done either in-band (that is, from the OS running on the Blackbird) or out-of-band over the network to the BMC.
stewart@blackbird9$ sudo ipmitool sensor | head
occ | na | discrete | na | na | na | na | na | na | na
occ0 | 0x0 | discrete | 0x0200| na | na | na | na | na | na
occ1 | 0x0 | discrete | 0x0100| na | na | na | na | na | na
p0_core0_temp | na | | na | na | na | na | na | na | na
p0_core1_temp | na | | na | na | na | na | na | na | na
p0_core2_temp | na | | na | na | na | na | na | na | na
p0_core3_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core4_temp | na | | na | na | na | na | na | na | na
p0_core5_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core6_temp | na | | na | na | na | na | na | na | na
It’s kind of annoying to read there, so standard unix tools to the rescue!
stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
occ | na
occ0 | 0x0
occ1 | 0x0
p0_core0_temp | na
p0_core1_temp | na
p0_core2_temp | na
p0_core3_temp | 38.000
p0_core4_temp | na
p0_core5_temp | 38.000
p0_core6_temp | na
p0_core7_temp | 38.000
p0_core8_temp | na
p0_core9_temp | na
p0_core10_temp | na
p0_core11_temp | 37.000
p0_core12_temp | na
p0_core13_temp | na
p0_core14_temp | na
p0_core15_temp | 37.000
p0_core16_temp | na
p0_core17_temp | 37.000
p0_core18_temp | na
p0_core19_temp | 39.000
p0_core20_temp | na
p0_core21_temp | 39.000
p0_core22_temp | na
p0_core23_temp | na
p0_vdd_temp | 40.000
dimm0_temp | 35.000
dimm1_temp | na
dimm2_temp | na
dimm3_temp | na
dimm4_temp | 38.000
dimm5_temp | na
dimm6_temp | na
dimm7_temp | na
dimm8_temp | na
dimm9_temp | na
dimm10_temp | na
dimm11_temp | na
dimm12_temp | na
dimm13_temp | na
dimm14_temp | na
dimm15_temp | na
fan0 | 1200.000
fan1 | 1100.000
fan2 | 1000.000
p0_power | 33.000
p0_vdd_power | 5.000
p0_vdn_power | 9.000
cpu_1_ambient | 30.600
pcie | 27.000
ambient | 26.000
You can see that I have three fans, two DIMMs (although why it lists 16 possible DIMMs for a board with two DIMM slots is a good question!), and eight CPU cores. More on why the CPU cores are laid out the way they are in a future post.
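If you’d rather drop the “na” rows entirely instead of eyeballing them, the pipe-separated output is easy to filter in a script. Here’s a minimal Python sketch, assuming the column order shown above (sensor name, reading, then units, status and thresholds); parse_ipmitool_sensors is a hypothetical helper, not part of ipmitool. In practice you’d feed it the captured output of ipmitool sensor (for example via subprocess) rather than the embedded sample rows.

```python
def parse_ipmitool_sensors(text):
    """Hypothetical helper: return {sensor_name: (reading, units)} for rows
    that have a numeric reading.

    Assumes 'ipmitool sensor' style rows: fields separated by '|', with the
    sensor name first, the reading second and (optionally) the units third.
    Rows whose reading is 'na' or a discrete hex value such as 0x0 are skipped.
    """
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue  # blank or non-sensor line
        name, raw = fields[0], fields[1]
        units = fields[2] if len(fields) > 2 else ""  # absent in 'cut -f 1,2' output
        try:
            value = float(raw)  # 'na' and '0x0' both raise ValueError here
        except ValueError:
            continue
        readings[name] = (value, units)
    return readings


# A few rows copied from the 'ipmitool sensor | head' output above.
SAMPLE = """\
occ | na | discrete | na | na | na | na | na | na | na
occ0 | 0x0 | discrete | 0x0200| na | na | na | na | na | na
p0_core0_temp | na | | na | na | na | na | na | na | na
p0_core3_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
p0_core5_temp | 38.000 | degrees C | ok | na | -40.000 | na | 78.000 | 90.000 | na
"""

if __name__ == "__main__":
    for name, (value, units) in parse_ipmitool_sensors(SAMPLE).items():
        print(f"{name}: {value:g} {units}")
```

Because the units column is optional in the parser, the same function also accepts the two-column output of the cut pipeline.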
The code path for reading these sensors is interesting: every reading comes via the BMC. The OCC inside the P9 reads the sensors, the BMC then reads those values from the OCC, and then passes them back to the P9. On the P9 itself, each sensor read is a call all the way down into firmware and back! In fact, we can see it in perf:
$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
What are the 0x300xxxxx addresses? They’re in the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks them in sysfs:
[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map
[sudo] password for stewart:
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector
So we can easily look up exactly where this is:
[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map
0000000000018e20 t .__try_lock.isra.0
0000000000018e68 t .add_lock_request
So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack inside firmware isn’t so easy to see, but we can find the bt_add_ipmi_msg call in there, which is probably where everything starts:
[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map
0000000000051614 t .bt_add_ipmi_msg_head
0000000000051688 t .bt_add_ipmi_msg
00000000000516fc t .bt_poll
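Doing that lookup by hand with grep gets old quickly. Here’s a minimal Python sketch of the same idea: find the nearest symbol at or below an address, which is what the '18e.. ' grep was approximating. It assumes skiboot is loaded at 0x30000000 (which is what the 0x300xxxxx addresses in perf, versus the zero-based offsets in symbol_map, suggest) and that symbol_map keeps the nm-style “address type name” format shown above. The helper names and the example addresses are hypothetical.

```python
import bisect

# Assumption: skiboot is loaded at 0x30000000, so a perf address such as
# 0x30018e40 corresponds to offset 0x18e40 in /sys/firmware/opal/symbol_map.
OPAL_BASE = 0x30000000


def load_symbols(text):
    """Hypothetical helper: parse nm-style 'ADDRESS TYPE NAME' lines into a
    sorted list of (offset, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. a sudo password prompt or a blank line
        try:
            offset = int(parts[0], 16)
        except ValueError:
            continue
        symbols.append((offset, parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, address, base=OPAL_BASE):
    """Hypothetical helper: return (name, offset_into_symbol) for the nearest
    symbol at or below address, or None if it is before the first symbol."""
    offset = address - base
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1  # last symbol starting <= offset
    if i < 0:
        return None
    start, name = symbols[i]
    return name, offset - start


# Lines copied from the symbol_map output above.
SYMBOL_MAP = """\
0000000000000000 T _start
0000000000018e20 t .__try_lock.isra.0
0000000000018e68 t .add_lock_request
0000000000051614 t .bt_add_ipmi_msg_head
0000000000051688 t .bt_add_ipmi_msg
00000000000516fc t .bt_poll
"""

if __name__ == "__main__":
    symbols = load_symbols(SYMBOL_MAP)
    # Hypothetical example addresses; substitute the 0x300xxxxx ones from perf report.
    for addr in (0x30018e40, 0x300516a0):
        name, off = lookup(symbols, addr)
        print(f"{addr:#x} -> {name}+{off:#x}")
```

On the Blackbird itself you’d load the real /sys/firmware/opal/symbol_map instead of the sample, which needs root to read, as the sudo above shows.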
OCCTOOL
This is the most not-what-you’re-meant-to-use method of getting at sensors! It uses a debug tool for the OCC firmware. There are a variety of tools in the OCC source repository, and one of them (occtoolp9) can be used for many things, one of which is getting sensor data out of the OCC.
$ sudo ./occtoolp9 -SL
Sensor Type: 0xFFFF
Sensor Location: 0xFFFF
(only displaying non-zero sensors)
Sending 0x53 command to OCC0 (via opal-prd)...
MFG Sub Cmd: 0x05 (List Sensors)
Num Sensors: 50
[ 1] GUID: 0x0000 / AMEintdur....... Sample: 20 (0x0014)
[ 2] GUID: 0x0001 / AMESSdur0....... Sample: 7 (0x0007)
[ 3] GUID: 0x0002 / AMESSdur1....... Sample: 3 (0x0003)
[ 4] GUID: 0x0003 / AMESSdur2....... Sample: 23 (0x0017)
The odd thing you’ll see is “via opal-prd”. That’s because it makes raw calls to the opal-prd binary to talk to the OCC firmware, running things like “opal-prd --expert-mode htmgt-passthru”. Yeah, this isn’t an in-production thing :)
Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call does. There’s a full OCC/Host firmware interface spec to read. But it’s an insanely inefficient way to monitor sensors: a long bash script shelling out to a whole bunch of other processes. Think ~14.4 billion cycles versus ~367 million cycles for the ipmitool option above, roughly 39 times as many.
But there are some interesting sensors at the end of the list:
Sensor Details: (found 86 sensors, details only for Status of 0x00)
GUID Name Sample Min Max U Stat Accum UpdFreq ScaleFactr Loc Type
....
0x014A MRDM0........... 688 3 15015 GBs 0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
0x014E MRDM4........... 480 3 14739 GBs 0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
0x0156 MWRM0........... 560 4 16605 GBs 0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
0x015A MWRM4........... 360 4 16597 GBs 0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200
Is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:
0x014A MRDM0........... 15165 3 17994 GBs 0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
0x014E MRDM4........... 17145 3 18016 GBs 0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
0x0156 MWRM0........... 8063 4 24280 GBs 0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
0x015A MWRM4........... 1138 4 24215 GBs 0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200
It looks like it! Are these exposed elsewhere? That’s something I should look at in a future blog post.
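To compare the idle and STREAM snapshots without squinting at wide rows, a tiny parser helps. This is a minimal Python sketch, assuming each detail row is whitespace-separated with the GUID (0x....) first, the dot-padded sensor name second and the Sample column third, as in the output above; parse_occ_samples is a hypothetical helper. It deliberately ignores the ScaleFactr column, so the numbers are the raw Sample values, not calibrated bandwidth figures.

```python
def parse_occ_samples(text):
    """Hypothetical helper: map sensor name -> Sample value from occtoolp9
    'Sensor Details' rows.

    Assumes whitespace-separated rows with the GUID first, the dot-padded
    name second and the Sample column third. Other lines are ignored.
    """
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].startswith("0x"):
            continue  # header, '....' separator or non-sensor line
        name = parts[1].rstrip(".")  # drop the dot padding after the name
        samples[name] = int(parts[2])
    return samples


# Rows copied from the two snapshots above (idle, then STREAM running in a loop).
IDLE = """\
0x014A MRDM0........... 688 3 15015 GBs 0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
0x014E MRDM4........... 480 3 14739 GBs 0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
0x0156 MWRM0........... 560 4 16605 GBs 0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
0x015A MWRM4........... 360 4 16597 GBs 0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200
"""
STREAM = """\
0x014A MRDM0........... 15165 3 17994 GBs 0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
0x014E MRDM4........... 17145 3 18016 GBs 0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
0x0156 MWRM0........... 8063 4 24280 GBs 0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
0x015A MWRM4........... 1138 4 24215 GBs 0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200
"""

if __name__ == "__main__":
    idle, busy = parse_occ_samples(IDLE), parse_occ_samples(STREAM)
    for name in sorted(idle):
        print(f"{name}: {idle[name]} -> {busy[name]}")
```

Every one of the four samples goes up under STREAM, which is what you’d expect if they track memory reads (MRDM) and writes (MWRM).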
lm-sensors
$ rpm -qf /usr/bin/sensors
lm_sensors-3.5.0-6.fc31.ppc64le
Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are exposed over the same standard interface we’ve been using since ISA was a thing.
[stewart@blackbird9 ~]$ sensors
ibmpowernv-isa-0000
Adapter: ISA adapter
Chip 0 Vdd Remote Sense: +1.02 V (lowest = +0.72 V, highest = +1.02 V)
Chip 0 Vdn Remote Sense: +0.67 V (lowest = +0.67 V, highest = +0.67 V)
Chip 0 Vdd: +1.02 V (lowest = +0.73 V, highest = +1.02 V)
Chip 0 Vdn: +0.68 V (lowest = +0.68 V, highest = +0.68 V)
Chip 0 Core 0: +47.0°C (lowest = +25.0°C, highest = +71.0°C)
Chip 0 Core 4: +47.0°C (lowest = +26.0°C, highest = +66.0°C)
Chip 0 Core 8: +48.0°C (lowest = +27.0°C, highest = +67.0°C)
Chip 0 Core 12: +48.0°C (lowest = +26.0°C, highest = +67.0°C)
Chip 0 Core 16: +47.0°C (lowest = +25.0°C, highest = +67.0°C)
Chip 0 Core 20: +47.0°C (lowest = +26.0°C, highest = +69.0°C)
Chip 0 Core 24: +48.0°C (lowest = +27.0°C, highest = +67.0°C)
Chip 0 Core 28: +51.0°C (lowest = +27.0°C, highest = +64.0°C)
Chip 0 DIMM 0 : +40.0°C (lowest = +34.0°C, highest = +44.0°C)
Chip 0 DIMM 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 2 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 3 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 4 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 5 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 6 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 7 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 8 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 9 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 10 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 11 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 12 : +43.0°C (lowest = +36.0°C, highest = +47.0°C)
Chip 0 DIMM 13 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 14 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 DIMM 15 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
Chip 0 Nest: +48.0°C (lowest = +27.0°C, highest = +64.0°C)
Chip 0 VRM VDD: +47.0°C (lowest = +39.0°C, highest = +66.0°C)
Chip 0 : 44.00 W (lowest = 31.00 W, highest = 132.00 W)
Chip 0 Vdd: 15.00 W (lowest = 4.00 W, highest = 104.00 W)
Chip 0 Vdn: 10.00 W (lowest = 8.00 W, highest = 12.00 W)
Chip 0 : 227.11 kJ
Chip 0 Vdd: 44.80 kJ
Chip 0 Vdn: 58.80 kJ
Chip 0 Vdd: +21.50 A (lowest = +6.50 A, highest = +104.75 A)
Chip 0 Vdn: +14.88 A (lowest = +12.63 A, highest = +18.88 A)
The best thing? It’s really quick! The hwmon interface is fast and efficient.
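Since lm-sensors is just reading the kernel’s hwmon files in sysfs, you can also read them directly without any extra tooling. Here’s a minimal Python sketch under a few assumptions: the standard hwmon sysfs layout (a name file plus &lt;type&gt;&lt;N&gt;_input and optional &lt;type&gt;&lt;N&gt;_label files in each /sys/class/hwmon/hwmonN directory), the hwmon ABI units (millidegrees Celsius, millivolts, milliamps, microwatts, microjoules), and that the driver’s name file reads “ibmpowernv” (inferred from the ibmpowernv-isa-0000 chip name above). read_hwmon and read_text are hypothetical helpers; some older kernels put the name file under device/ instead, which this sketch doesn’t handle.

```python
import os
import re

# hwmon sysfs ABI: raw integer units -> (divisor, display unit).
SCALE = {
    "temp": (1000, "°C"),        # millidegrees Celsius
    "in": (1000, "V"),           # millivolts
    "curr": (1000, "A"),         # milliamps
    "power": (1_000_000, "W"),   # microwatts
    "energy": (1_000_000, "J"),  # microjoules
}
INPUT_RE = re.compile(r"^(temp|in|curr|power|energy)(\d+)_input$")


def read_text(path):
    """Hypothetical helper: return a sysfs attribute's contents, stripped."""
    with open(path) as f:
        return f.read().strip()


def read_hwmon(root="/sys/class/hwmon", driver="ibmpowernv"):
    """Hypothetical helper: return [(label, value, unit)] for every *_input
    attribute of the hwmon device(s) whose name file matches `driver`."""
    try:
        devices = sorted(os.listdir(root))
    except FileNotFoundError:
        return []  # no hwmon class directory on this system
    results = []
    for dev in devices:
        dev_path = os.path.join(root, dev)
        try:
            if read_text(os.path.join(dev_path, "name")) != driver:
                continue
        except OSError:
            continue  # no readable name attribute
        for fname in sorted(os.listdir(dev_path)):
            match = INPUT_RE.match(fname)
            if not match:
                continue
            kind, index = match.groups()
            try:
                label = read_text(os.path.join(dev_path, f"{kind}{index}_label"))
            except OSError:
                label = f"{kind}{index}"  # no label file: fall back to the attribute name
            try:
                raw = int(read_text(os.path.join(dev_path, fname)))
            except (OSError, ValueError):
                continue  # attribute present but not readable right now
            divisor, unit = SCALE[kind]
            results.append((label, raw / divisor, unit))
    return results


if __name__ == "__main__":
    for label, value, unit in read_hwmon():
        print(f"{label}: {value:.2f} {unit}")
```

The root parameter exists mainly so the function can be pointed at a fake directory tree for testing on a machine without a POWER9 in it.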