fix(collector): avoid double-counting shelf power #4116

Merged
cgrinds merged 3 commits into NetApp:main from rmilkowski:fix-shelf-power-double-counting-17f7478
Feb 6, 2026

Conversation

Contributor

@rmilkowski rmilkowski commented Feb 5, 2026

Harvest is reporting 699 W for a disk shelf, which is much more than the shelf actually draws.

Here are the exact ONTAP CLI commands and the resulting output showing how Harvest is double‑counting shelf power:

Commands to run

1) Authoritative shelf power draw

storage shelf show -fields shelf,psu-power-drawn,psu-power-rating

  xxx1::*> storage shelf show -fields shelf,psu-power-drawn,psu-power-rating
  shelf psu-power-rating psu-power-drawn
  ----- ---------------- ---------------
  1.1   1600,1600        171,189
  2.0   -                -
  2 entries were displayed.

2) Per‑shelf voltage/current rails (what Harvest is effectively summing)

storage shelf show -instance

  xxx1::*> storage shelf show -instance

             Shelf Name: 1.1
               Stack ID: 1
               Shelf ID: 1
              Shelf UID: 00:a0:98:00:00:57:6a:a8
          Serial Number: xxx
            Module Type: NSM100
                  Model: NS224NSM100
           Shelf Vendor: NETAPP
             Disk Count: 4
        Connection Type: NVMe
            Shelf State: Online
                 Status: Normal


  Modules:                                          Module is
                                   Monitor   Is     Reporting FW Update             Latest   Swap Operational Module
   ID Part No.     ES Serial No.   is Active Master Element   Progress      FW Rev. FW Rev. Count Status      Location
  --- ------------ --------------- --------- ------ --------- ------------- ------- ------- ----- ----------- --------------
    A 111-04256+D1 xxxxxxxxxxxx    true      true   true      not-available 0403    -           0 normal      rear of the shelf at the top, on module A
    B 111-04256+D1 xxxxxxxxxxxx    true      false  false     not-available 0403    -           0 normal      rear of the shelf at the bottom, on module B

  Paths:
                                                                                                               Speed
  Controller         Initiator Initiator Side Switch Port Target Side Switch Port    Target Port        TPGN    Gb/s I/O KB/s    IOPS
  ------------------ --------- -------------------------- -------------------------- ------------------ ------ ----- -------- -------
  xxx1-01     0x        -                          0x                         01000004abff0000   80         0        0       0
  xxx1-02     0x        -                          0x                         01000004abff0000   80         0        0       0

  Power Supply Units:
                                                       Crest  Power   Reset   PSU              Operational
   ID Type Part#        Serial#         Power Rating   Factor Drawn   Capable Enabled Firmware Status           PSU Location
  --- ---- ------------ --------------- -------------- ------ ------- ------- ------- -------- ---------------- ------------------------
    1 00   TDPS-1600GB A PST0D9230607417 1600          -      171     false   true    03.00.00 normal           rear of the shelf at the top left
    2 00   TDPS-1600GB A PST0D9230607420 1600          -      189     false   true    03.00.00 normal           rear of the shelf at the bottom left

  Voltage Sensors:
      Voltage Operational
   ID     (V) Status                 Sensor Location
  --- ------- ---------------------- ------------------------
    1   12.22 normal                 rear of the shelf on the top left power supply
    2  243.24 normal                 rear of the shelf on the top left power supply
    3   12.22 normal                 rear of the shelf on the lower left power supply
    4  243.24 normal                 rear of the shelf on the lower left power supply

  Current Sensors:
      Current Operational
   ID    (mA) Status                 Sensor Location
  --- ------- ---------------------- ------------------------
    1   12650 normal                 rear of the shelf on the top left power supply
    2     700 normal                 rear of the shelf on the top left power supply
    3   14020 normal                 rear of the shelf on the lower left power supply
    4     820 normal                 rear of the shelf on the lower left power supply

  Fans:
       Speed Operational
   ID  (RPM) Status                  Fan Location
  --- ------ ----------------------- ----------------------
    1   7480 normal                  inlet FAN of fan module 1 in the top shelf module (A)
    2   7050 normal                  inlet FAN of fan module 2 in the top shelf module (A)
    3   7160 normal                  inlet FAN of fan module 3 in the top shelf module (A)
    4   7240 normal                  inlet FAN of fan module 4 in the top shelf module (A)
    5   6930 normal                  inlet FAN of fan module 5 in the top shelf module (A)
    6   7090 normal                  outlet FAN of fan module 1 in the top shelf module (A)
    7   6710 normal                  outlet FAN of fan module 2 in the top shelf module (A)
    8   6940 normal                  outlet FAN of fan module 3 in the top shelf module (A)
    9   7070 normal                  outlet FAN of fan module 4 in the top shelf module (A)
   10   6910 normal                  outlet FAN of fan module 5 in the top shelf module (A)
   11   7040 normal                  inlet FAN of fan module 1 in the bottom shelf module (B)
   12   6730 normal                  inlet FAN of fan module 2 in the bottom shelf module (B)
   13   6680 normal                  inlet FAN of fan module 3 in the bottom shelf module (B)
   14   7000 normal                  inlet FAN of fan module 4 in the bottom shelf module (B)
   15   7070 normal                  inlet FAN of fan module 5 in the bottom shelf module (B)
   16   7340 normal                  outlet FAN of fan module 1 in the bottom shelf module (B)
   17   6840 normal                  outlet FAN of fan module 2 in the bottom shelf module (B)
   18   7290 normal                  outlet FAN of fan module 3 in the bottom shelf module (B)
   19   6840 normal                  outlet FAN of fan module 4 in the bottom shelf module (B)
   20   6550 normal                  outlet FAN of fan module 5 in the bottom shelf module (B)

  Temperature:
                   -- Thresholds °C --
      Temp Is       Low  Low High High Operational
   ID   °C Ambient Crit Warn Crit Warn Status             Sensor Location
  --- ---- ------- ---- ---- ---- ---- ------------------ --------------------------------
    1   23 true       0    5   47   42 normal             ambient temp sensor on ODP board
    2   30 false      0    5   60   55 normal             temp sensor on midplane left
    3   24 false      0    5   60   55 normal             diode pair on midplane left
    4   23 false      0    5   60   55 normal             temp sensor on midplane right
    5   29 false      0    5   60   55 normal             diode pair on midplane right
    6   25 false      0    5   65   60 normal             top module temp sensor near midplane
    7   53 false      0    5   94   89 normal             CPU package on top module
    8   54 false      0    5   85   80 normal             Ethernet port 1 on top module
    9   59 false      0    5   87   82 normal             Ethernet port 2 on top module
   10   44 false      0    5   85   80 normal             PCIe switch 1 on top module
   11   50 false      0    5   84   79 normal             PCIe switch 2 on top module
   12   25 false      0    5   65   60 normal             bottom module temp sensor near midplane
   13   55 false      0    5   94   89 normal             CPU package on bottom module
   14   54 false      0    5   85   80 normal             Ethernet port 1 on bottom module
   15   59 false      0    5   87   82 normal             Ethernet port 2 on bottom module
   16   48 false      0    5   85   80 normal             PCIe switch 1 on bottom module
   17   53 false      0    5   84   79 normal             PCIe switch 2 on bottom module

  DIMM:
   ID Mod Type Size Speed   Status       Location
  --- --- ---- ---- ------- ------------ -------------------------------
    1  A  DIMM  8GB 2933Mhz normal       DIMM slot 1 in the top shelf module (A)
    2  A  DIMM  8GB 2933Mhz normal       DIMM slot 2 in the top shelf module (A)
    3  A  DIMM  8GB 2933Mhz normal       DIMM slot 3 in the top shelf module (A)
    4  A  DIMM  8GB 2933Mhz normal       DIMM slot 4 in the top shelf module (A)
    5  B  DIMM  8GB 2933Mhz normal       DIMM slot 1 in the bottom shelf module (B)
    6  B  DIMM  8GB 2933Mhz normal       DIMM slot 2 in the bottom shelf module (B)
    7  B  DIMM  8GB 2933Mhz normal       DIMM slot 3 in the bottom shelf module (B)
    8  B  DIMM  8GB 2933Mhz normal       DIMM slot 4 in the bottom shelf module (B)

  Boot Devices:
   ID Mod Type         Size Status
  --- --- ---------- ------ ------------
    1  A  SATA SSD    111GB normal
    2  B  SATA SSD    111GB normal

  Coin Battery:
          Voltage
   ID Mod    (mV) Status
  --- --- ------- ------------
    1  A     2991 normal
    2  B     2912 normal

  SAS Ports:
                                        -- Port Speeds Gb/s -- Power  Port
  Phy # IOM Port Type WWPN              Operational Negotiated Status Status
  ----- --- --------- ----------------- ----------- ---------- ------ -----------
      -  -  -         -                           -          - -      -

  FC Ports:
                Port
   ID Port Type Status
  --- --------- -----------
    - -         -

  PCIe Ports:
                       -- Speed (Gb/s) --
  Mod  ID Type     Bay Negotiated Maximum Status
  --- --- -------- --- ---------- ------- ------------------
   A    0 ethernet   -        8.0     8.0 ok
   A    1 ethernet   -        8.0     8.0 ok
   A    2 roce       -        8.0     8.0 ok
   A    3 roce       -        8.0     8.0 ok
   A    4 cpu        -        8.0     8.0 ok
   A    5 cpu        -        8.0     8.0 ok
   A    6 disk       0        8.0     8.0 ok
   A    7 disk       1        8.0     8.0 ok
   A    8 disk       2          -     8.0 no-drive
   A    9 disk       3          -     8.0 no-drive
   A   10 disk       4          -     8.0 no-drive
   A   11 disk       5          -     8.0 no-drive
   A   12 disk       6          -     8.0 no-drive
   A   13 disk       7          -     8.0 no-drive
   A   14 disk       8          -     8.0 no-drive
   A   15 disk       9          -     8.0 no-drive
   A   16 disk      10          -     8.0 no-drive
   A   17 disk      11          -     8.0 no-drive
   A   18 disk      12          -     8.0 no-drive
   A   19 disk      13          -     8.0 no-drive
   A   20 disk      14          -     8.0 no-drive
   A   21 disk      15          -     8.0 no-drive
   A   22 disk      16          -     8.0 no-drive
   A   23 disk      17          -     8.0 no-drive
   A   24 disk      18          -     8.0 no-drive
   A   25 disk      19          -     8.0 no-drive
   A   26 disk      20          -     8.0 no-drive
   A   27 disk      21          -     8.0 no-drive
   A   28 disk      22        8.0     8.0 ok
   A   29 disk      23        8.0     8.0 ok
   B    0 ethernet   -        8.0     8.0 ok
   B    1 ethernet   -        8.0     8.0 ok
   B    2 roce       -        8.0     8.0 ok
   B    3 roce       -        8.0     8.0 ok
   B    4 cpu        -        8.0     8.0 ok
   B    5 cpu        -        8.0     8.0 ok
   B    6 disk       0        8.0     8.0 ok
   B    7 disk       1        8.0     8.0 ok
   B    8 disk       2          -     8.0 no-drive
   B    9 disk       3          -     8.0 no-drive
   B   10 disk       4          -     8.0 no-drive
   B   11 disk       5          -     8.0 no-drive
   B   12 disk       6          -     8.0 no-drive
   B   13 disk       7          -     8.0 no-drive
   B   14 disk       8          -     8.0 no-drive
   B   15 disk       9          -     8.0 no-drive
   B   16 disk      10          -     8.0 no-drive
   B   17 disk      11          -     8.0 no-drive
   B   18 disk      12          -     8.0 no-drive
   B   19 disk      13          -     8.0 no-drive
   B   20 disk      14          -     8.0 no-drive
   B   21 disk      15          -     8.0 no-drive
   B   22 disk      16          -     8.0 no-drive
   B   23 disk      17          -     8.0 no-drive
   B   24 disk      18          -     8.0 no-drive
   B   25 disk      19          -     8.0 no-drive
   B   26 disk      20          -     8.0 no-drive
   B   27 disk      21          -     8.0 no-drive
   B   28 disk      22        8.0     8.0 ok
   B   29 disk      23        8.0     8.0 ok

  Bays:

      Has               Operational
   ID Disk  Bay Type    Status
  --- ----- ----------- -----------
    0 true  single-disk normal
    1 true  single-disk normal
    2 false single-disk unknown
    3 false single-disk unknown
    4 false single-disk unknown
    5 false single-disk unknown
    6 false single-disk unknown
    7 false single-disk unknown
    8 false single-disk unknown
    9 false single-disk unknown
   10 false single-disk unknown
   11 false single-disk unknown
   12 false single-disk unknown
   13 false single-disk unknown
   14 false single-disk unknown
   15 false single-disk unknown
   16 false single-disk unknown
   17 false single-disk unknown
   18 false single-disk unknown
   19 false single-disk unknown
   20 false single-disk unknown
   21 false single-disk unknown
   22 true  single-disk normal
   23 true  single-disk normal


             Shelf Name: 2.0
               Stack ID: 2
               Shelf ID: 0
              Shelf UID: 00:a0:98:00:00:57:6a:11
          Serial Number: SHJHU2341000702
            Module Type: NSM8E
                  Model: NS224NSM8E
           Shelf Vendor: NETAPP
             Disk Count: 24
        Connection Type: NVMe
            Shelf State: Online
                 Status: Normal


  Modules:                                          Module is
                                   Monitor   Is     Reporting FW Update             Latest   Swap Operational Module
   ID Part No.     ES Serial No.   is Active Master Element   Progress      FW Rev. FW Rev. Count Status      Location
  --- ------------ --------------- --------- ------ --------- ------------- ------- ------- ----- ----------- --------------
    A 111-05329+B3 792335000130    true      false  true      not-available 0140    -           0 normal      rear of the shelf at the top, on shelf module (A)
    B 111-05329+B3 792335000044    true      true   false     not-available 0140    -           0 normal      rear of the shelf at the bottom, on shelf module (B)

  Paths:
                                                                                                               Speed
  Controller         Initiator Initiator Side Switch Port Target Side Switch Port    Target Port        TPGN    Gb/s I/O KB/s    IOPS
  ------------------ --------- -------------------------- -------------------------- ------------------ ------ ----- -------- -------
  xxx1-01     0s        -                          0s                         ef000000abef0000   40         0        0       0
  xxx1-02     0s        -                          0s                         ef000000abef0000   40         0        0       0

  Power Supply Units:
                                                       Crest  Power   Reset   PSU              Operational
   ID Type Part#        Serial#         Power Rating   Factor Drawn   Capable Enabled Firmware Status           PSU Location
  --- ---- ------------ --------------- -------------- ------ ------- ------- ------- -------- ---------------- ------------------------
    - -    -            -               -              -      -       -       -       -        -                -

  Voltage Sensors:
      Voltage Operational
   ID     (V) Status                 Sensor Location
  --- ------- ---------------------- ------------------------
    -       - -                      -

  Current Sensors:
      Current Operational
   ID    (mA) Status                 Sensor Location
  --- ------- ---------------------- ------------------------
    -       - -                      -

  Fans:
       Speed Operational
   ID  (RPM) Status                  Fan Location
  --- ------ ----------------------- ----------------------
    -      - -                       -

  Temperature:
                   -- Thresholds °C --
      Temp Is       Low  Low High High Operational
   ID   °C Ambient Crit Warn Crit Warn Status             Sensor Location
  --- ---- ------- ---- ---- ---- ---- ------------------ --------------------------------
    -    - -          -    -    -    - -                  -

  DIMM:
   ID Mod Type Size Speed   Status       Location
  --- --- ---- ---- ------- ------------ -------------------------------
    -  -  -       -       - -            -

  Boot Devices:
   ID Mod Type         Size Status
  --- --- ---------- ------ ------------
    -  -  -               - -

  Coin Battery:
          Voltage
   ID Mod    (mV) Status
  --- --- ------- ------------
    -  -        - -

  SAS Ports:
                                        -- Port Speeds Gb/s -- Power  Port
  Phy # IOM Port Type WWPN              Operational Negotiated Status Status
  ----- --- --------- ----------------- ----------- ---------- ------ -----------
      -  -  -         -                           -          - -      -

  FC Ports:
                Port
   ID Port Type Status
  --- --------- -----------
    - -         -

  PCIe Ports:
                       -- Speed (Gb/s) --
  Mod  ID Type     Bay Negotiated Maximum Status
  --- --- -------- --- ---------- ------- ------------------
   A    0 disk       0        8.0     8.0 ok
   A    1 disk       1        8.0     8.0 ok
   A    2 disk       2        8.0     8.0 ok
   A    3 disk       3        8.0     8.0 ok
   A    4 disk       4        8.0     8.0 ok
   A    5 disk       5        8.0     8.0 ok
   A    6 disk       6        8.0     8.0 ok
   A    7 disk       7        8.0     8.0 ok
   A    8 disk       8        8.0     8.0 ok
   A    9 disk       9        8.0     8.0 ok
   A   10 disk      10        8.0     8.0 ok
   A   11 disk      11        8.0     8.0 ok
   A   12 disk      12        8.0     8.0 ok
   A   13 disk      13        8.0     8.0 ok
   A   14 disk      14        8.0     8.0 ok
   A   15 disk      15        8.0     8.0 ok
   A   16 disk      16        8.0     8.0 ok
   A   17 disk      17        8.0     8.0 ok
   A   18 disk      18        8.0     8.0 ok
   A   19 disk      19        8.0     8.0 ok
   A   20 disk      20        8.0     8.0 ok
   A   21 disk      21        8.0     8.0 ok
   A   22 disk      22        8.0     8.0 ok
   A   23 disk      23        8.0     8.0 ok
   B    0 disk       0        8.0     8.0 ok
   B    1 disk       1        8.0     8.0 ok
   B    2 disk       2        8.0     8.0 ok
   B    3 disk       3        8.0     8.0 ok
   B    4 disk       4        8.0     8.0 ok
   B    5 disk       5        8.0     8.0 ok
   B    6 disk       6        8.0     8.0 ok
   B    7 disk       7        8.0     8.0 ok
   B    8 disk       8        8.0     8.0 ok
   B    9 disk       9        8.0     8.0 ok
   B   10 disk      10        8.0     8.0 ok
   B   11 disk      11        8.0     8.0 ok
   B   12 disk      12        8.0     8.0 ok
   B   13 disk      13        8.0     8.0 ok
   B   14 disk      14        8.0     8.0 ok
   B   15 disk      15        8.0     8.0 ok
   B   16 disk      16        8.0     8.0 ok
   B   17 disk      17        8.0     8.0 ok
   B   18 disk      18        8.0     8.0 ok
   B   19 disk      19        8.0     8.0 ok
   B   20 disk      20        8.0     8.0 ok
   B   21 disk      21        8.0     8.0 ok
   B   22 disk      22        8.0     8.0 ok
   B   23 disk      23        8.0     8.0 ok

  Bays:

      Has               Operational
   ID Disk  Bay Type    Status
  --- ----- ----------- -----------
    0 true  single-disk normal
    1 true  single-disk normal
    2 true  single-disk normal
    3 true  single-disk normal
    4 true  single-disk normal
    5 true  single-disk normal
    6 true  single-disk normal
    7 true  single-disk normal
    8 true  single-disk normal
    9 true  single-disk normal
   10 true  single-disk normal
   11 true  single-disk normal
   12 true  single-disk normal
   13 true  single-disk normal
   14 true  single-disk normal
   15 true  single-disk normal
   16 true  single-disk normal
   17 true  single-disk normal
   18 true  single-disk normal
   19 true  single-disk normal
   20 true  single-disk normal
   21 true  single-disk normal
   22 true  single-disk normal
   23 true  single-disk normal

  2 entries were displayed.

  xxx1::*>

What the output shows (shelf 1.1)

From storage shelf show -fields …:

  • psu-power-drawn = 171,189 → total = 360 W
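
As a quick check of that total, the comma-separated psu-power-drawn field can be summed per shelf. This is a minimal sketch and not Harvest code: the helper name total_psu_power is hypothetical, and treating "-" as "no reading" is an assumption based on how shelf 2.0 appears in the output above.

```python
from typing import Optional


def total_psu_power(field: str) -> Optional[int]:
    """Sum a comma-separated psu-power-drawn value such as "171,189".

    Returns None when the shelf reports no PSU data ("-"); that handling
    is an assumption made for this sketch.
    """
    field = field.strip()
    if field in ("", "-"):
        return None
    return sum(int(part) for part in field.split(","))


if __name__ == "__main__":
    print(total_psu_power("171,189"))  # shelf 1.1
    print(total_psu_power("-"))        # shelf 2.0
```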

From storage shelf show -instance (Voltage/Current sensors):

  • PSU1
    • 12.22 V @ 12.650 A → 12.22 × 12.65 = 154.6 W
    • 243.24 V @ 0.700 A → 243.24 × 0.7 = 170.3 W
    • If you sum both rails → 324.9 W
  • PSU2
    • 12.22 V @ 14.020 A → 12.22 × 14.02 = 171.3 W
    • 243.24 V @ 0.820 A → 243.24 × 0.82 = 199.5 W
    • Sum → 370.8 W

Total (summing both rails):
324.85 W + 370.78 W = 695.63 W ≈ 695.6 W → this is close to Harvest’s ~699 W.

So Harvest is effectively adding both input and output rails. That doubles the true draw, because input (243 V, ~0.7–0.82 A) already represents the PSU’s power draw and matches the psu-power-drawn values (~171/189 W). The 12 V rail is the output to the shelf and should not be added on top of input.
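
The arithmetic can be reproduced with a short script. This is only an illustration of the comparison, not the collector's logic: the sensor values are copied from the shelf 1.1 tables, and the 100 V cut-off used to pick the input rail is an arbitrary assumption made for this example (the PR itself classifies rails from sensor labels).

```python
# (volts, milliamps) pairs for shelf 1.1, copied from the sensor tables above.
PSU1 = [(12.22, 12650), (243.24, 700)]
PSU2 = [(12.22, 14020), (243.24, 820)]


def watts(volts: float, milliamps: float) -> float:
    """Power of one voltage/current sensor pair, in watts."""
    return volts * milliamps / 1000.0


def is_input_rail(volts: float) -> bool:
    # Assumption for this example only: the ~243 V reading is the AC input
    # and the ~12 V reading is the DC output. 100 V is an arbitrary cut-off.
    return volts > 100.0


# Summing every pair counts each PSU twice (input plus output).
all_rails = sum(watts(v, ma) for v, ma in PSU1 + PSU2)
# Summing only the input pairs counts each PSU once.
input_only = sum(watts(v, ma) for v, ma in PSU1 + PSU2 if is_input_rail(v))

print(f"all rails:  {all_rails:.1f} W")
print(f"input only: {input_only:.1f} W")
```

Summing all four pairs gives about 695.6 W, while the input pairs alone give about 369.7 W, which is much closer to the 360 W reported by psu-power-drawn.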

Collaborator

cgrinds commented Feb 5, 2026

Thanks for the great problem description and PR, @rmilkowski. We'll take a look and get back to you.

@cgrinds cgrinds requested a review from Copilot February 5, 2026 15:19
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where Harvest double-counts disk shelf power by summing both input and output power rails, resulting in approximately double the actual power consumption (e.g., reporting ~699W instead of ~360W for shelf 1.1).

Changes:

  • Introduces rail classification logic to distinguish between input and output power rails based on sensor labels
  • Updates shelf power calculation to prioritize input rails, fall back to output rails, and only sum all pairs when rail type cannot be determined
  • Adds comprehensive test coverage for the new rail classification functions
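
The selection order described in the list above can be sketched roughly as follows. This is not the code from pkg/power/rail.go: the names Rail, classify_rail, and shelf_power are hypothetical, the "input"/"output" keywords are placeholder matching rules, and ignoring unclassified pairs whenever a classified rail exists is an assumption about how mixed cases are handled.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Rail(Enum):
    INPUT = "input"
    OUTPUT = "output"
    UNKNOWN = "unknown"


def classify_rail(label: str) -> Rail:
    """Guess the rail type from a sensor label.

    The keywords are placeholders chosen for this sketch; the labels ONTAP
    reports and the real matching rules may differ.
    """
    text = label.lower()
    if "input" in text:
        return Rail.INPUT
    if "output" in text:
        return Rail.OUTPUT
    return Rail.UNKNOWN


# One sensor pair: (label, volts, milliamps).
Pair = Tuple[str, float, float]


def shelf_power(pairs: Iterable[Pair]) -> float:
    """Watts for a shelf: input rails first, then output rails, else all pairs."""
    by_rail: Dict[Rail, List[float]] = {rail: [] for rail in Rail}
    for label, volts, milliamps in pairs:
        by_rail[classify_rail(label)].append(volts * milliamps / 1000.0)
    if by_rail[Rail.INPUT]:
        return sum(by_rail[Rail.INPUT])
    if by_rail[Rail.OUTPUT]:
        return sum(by_rail[Rail.OUTPUT])
    # Nothing could be classified, so fall back to summing every pair.
    return sum(by_rail[Rail.UNKNOWN])


if __name__ == "__main__":
    pairs = [("PSU1 input", 243.24, 700), ("PSU1 output", 12.22, 12650)]
    print(f"{shelf_power(pairs):.1f} W")
```

With one labeled input pair and one labeled output pair, only the input pair contributes, so a PSU is counted once rather than twice.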

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File                                          Description
--------------------------------------------  -----------
pkg/power/rail.go                             Implements rail classification logic to identify input/output power rails from sensor labels
pkg/power/rail_test.go                        Provides test coverage for rail classification and resolution functions
cmd/collectors/zapiperf/plugins/disk/disk.go  Updates ZAPI collector to classify rails and use input-only pairs for power calculation
cmd/collectors/restperf/plugins/disk/disk.go  Updates REST collector to classify rails and use input-only pairs for power calculation
docs/resources/power-algorithm.md             Documents the new rail-based power calculation behavior

Collaborator

cgrinds commented Feb 5, 2026

Hi @rmilkowski, I addressed the lint issues in rmilkowski#1. Once you get a chance to review and approve those changes, we can merge this. Nice job! 🎊

rahulguptajss previously approved these changes Feb 5, 2026
refactor: disk lint errors
@cgrinds cgrinds merged commit c443b20 into NetApp:main Feb 6, 2026
1 check passed