
@siddhantsangwan (Contributor) commented Jan 19, 2026

What changes were proposed in this pull request?

Please see the JIRA (HDDS-14446) for the problem description.

Changes proposed:

  1. Datanode volume storage reports now also include raw filesystem stats, i.e. the actual capacity and available space of the volume's filesystem: Filesystem Capacity and Filesystem Available. These differ from the capacity and available space that are currently reported. For example, the currently reported capacity is:

capacity = fsCapacity - du.reserved

This is what the new proto message looks like:

message StorageReportProto {
  required string storageUuid = 1;
  required string storageLocation = 2;
  optional uint64 capacity = 3 [default = 0];
  optional uint64 scmUsed = 4 [default = 0];
  optional uint64 remaining = 5 [default = 0];
  optional StorageTypeProto storageType = 6 [default = DISK];
  optional bool failed = 7 [default = false];
  optional uint64 committed = 8 [default = 0];
  optional uint64 freeSpaceToSpare = 9 [default = 0];
  optional uint64 reserved = 10;
  /*
   Raw filesystem stats (as reported by the local filesystem). These represent the real device
   capacity/available, independent of Ozone's reserved-space adjustment.
   */
  optional uint64 fsCapacity = 11 [default = 0];
  optional uint64 fsAvailable = 12 [default = 0];
}

Note that fsUsed is not reported in the heartbeat, because it can be computed wherever it is needed as fsCapacity - fsAvailable.
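The two derivations above are plain arithmetic on the report fields. A minimal sketch, assuming all values are byte counts taken from a single StorageReportProto; the class and method names are hypothetical and not Ozone APIs:

```java
/**
 * Hypothetical helper (not part of Ozone) showing how derived values relate
 * to the raw fields of a storage report. All values are in bytes.
 */
final class StorageReportMath {
  private StorageReportMath() {
  }

  /** Ozone capacity: raw filesystem capacity minus du.reserved. */
  static long ozoneCapacity(long fsCapacity, long reserved) {
    return fsCapacity - reserved;
  }

  /** Filesystem used is not sent in the heartbeat; derive it where needed. */
  static long fsUsed(long fsCapacity, long fsAvailable) {
    return fsCapacity - fsAvailable;
  }
}
```

With the usageinfo numbers from this PR, ozoneCapacity(1081100128256L, 108110008L) returns 1080992018248 and fsUsed(1081100128256L, 1012362399744L) returns 68737728512, matching the reported Ozone Capacity and Filesystem Used.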

  2. These stats are now exposed in several places: Datanode JMX metrics, SCM JMX metrics, the usageinfo command, and the hover popup on the Recon Datanodes page.
  3. Instead of the ambiguous term "capacity", we now explicitly say Ozone Capacity to mean Filesystem Capacity - du.reserved, and likewise distinguish Ozone Used and Ozone Available from their Filesystem counterparts.

What's remaining (to be done in subsequent pull requests):

  1. Recon main overview page.
  2. SCM Web UI main page.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14446

How was this patch tested?

Updated existing unit tests and added new ones.

The Recon overview page is still the same as before:

[Screenshot: Recon overview page]

usageinfo command shows the new names and the new fs stats:

bash-5.1$ ozone admin datanode usageinfo -c=5 -m
Usage Information (3 Datanodes)

UUID                    : 5953e64b-1acc-4cf3-9aaf-aedb271a02fb 
IP Address              : 172.18.0.9 
Hostname                : ozone-datanode-2.ozone_default 
Filesystem Capacity     : 1081100128256 B (1006.85 GB) 
Filesystem Used         : 68737728512 B (64.02 GB) 
Filesystem Used %       : 6.36% (Filesystem Used/Filesystem Capacity) 
Filesystem Available    : 1012362399744 B (942.84 GB) 
Filesystem Available %  : 93.64% (Filesystem Available/Filesystem Capacity) 
Ozone Capacity          : 1080992018248 B (1006.75 GB) 
Ozone Used              : 5812224 B (5.54 MB) 
Ozone Used %            : 0.00% (Ozone Used/Ozone Capacity) 
Ozone Available         : 1012362399744 B (942.84 GB) 
Ozone Available %       : 93.65% (Ozone Available/Ozone capacity) 
Pipeline(s)             : 2 
Container(s)            : 0 
Container Pre-allocated : 0 B (0 B) 
Remaining Allocatable   : 1012362399744 B (942.84 GB) 
Free Space To Spare     : 104857600 B (100 MB) 
Reserved                : 108110008 B (103.10 MB) 

UUID                    : 75b9772e-61bc-4b79-b93c-807584ee604c 
IP Address              : 172.18.0.8 
Hostname                : ozone-datanode-1.ozone_default 
Filesystem Capacity     : 1081100128256 B (1006.85 GB) 
Filesystem Used         : 68737728512 B (64.02 GB) 
Filesystem Used %       : 6.36% (Filesystem Used/Filesystem Capacity) 
Filesystem Available    : 1012362399744 B (942.84 GB) 
Filesystem Available %  : 93.64% (Filesystem Available/Filesystem Capacity) 
Ozone Capacity          : 1080992018248 B (1006.75 GB) 
Ozone Used              : 5812224 B (5.54 MB) 
Ozone Used %            : 0.00% (Ozone Used/Ozone Capacity) 
Ozone Available         : 1012362399744 B (942.84 GB) 
Ozone Available %       : 93.65% (Ozone Available/Ozone capacity) 
Pipeline(s)             : 2 
Container(s)            : 0 
Container Pre-allocated : 0 B (0 B) 
Remaining Allocatable   : 1012362399744 B (942.84 GB) 
Free Space To Spare     : 104857600 B (100 MB) 
Reserved                : 108110008 B (103.10 MB) 

UUID                    : e9587d3d-200e-4186-b983-9fde5a4c9a40 
IP Address              : 172.18.0.2 
Hostname                : ozone-datanode-3.ozone_default 
Filesystem Capacity     : 1081100128256 B (1006.85 GB) 
Filesystem Used         : 68737728512 B (64.02 GB) 
Filesystem Used %       : 6.36% (Filesystem Used/Filesystem Capacity) 
Filesystem Available    : 1012362399744 B (942.84 GB) 
Filesystem Available %  : 93.64% (Filesystem Available/Filesystem Capacity) 
Ozone Capacity          : 1080992018248 B (1006.75 GB) 
Ozone Used              : 5812224 B (5.54 MB) 
Ozone Used %            : 0.00% (Ozone Used/Ozone Capacity) 
Ozone Available         : 1012362399744 B (942.84 GB) 
Ozone Available %       : 93.65% (Ozone Available/Ozone capacity) 
Pipeline(s)             : 2 
Container(s)            : 0 
Container Pre-allocated : 0 B (0 B) 
Remaining Allocatable   : 1012362399744 B (942.84 GB) 
Free Space To Spare     : 104857600 B (100 MB) 
Reserved                : 108110008 B (103.10 MB) 
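Each percentage line in the output above divides a value by its matching capacity. A minimal sketch of that computation, assuming two decimal places as printed above; the helper name is hypothetical and this is not the actual usageinfo code:

```java
import java.util.Locale;

/** Hypothetical sketch of the percentage lines printed by usageinfo. */
final class UsagePercent {
  private UsagePercent() {
  }

  /** Formats part/whole as a percentage with two decimals, e.g. "6.36%". */
  static String percent(long part, long whole) {
    if (whole <= 0) {
      // Assumption: report 0.00% rather than dividing by zero.
      return "0.00%";
    }
    double pct = 100.0 * part / whole;
    // Locale.ROOT keeps '.' as the decimal separator regardless of JVM locale.
    return String.format(Locale.ROOT, "%.2f%%", pct);
  }
}
```

For example, percent(68737728512L, 1081100128256L) gives "6.36%" (Filesystem Used %) and percent(1012362399744L, 1080992018248L) gives "93.65%" (Ozone Available %). The two Available % lines differ only because Ozone Capacity excludes the reserved space.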

SCM JMX metrics:

{
    "name" : "Hadoop:service=StorageContainerManager,name=SCMNodeMetrics",
    "modelerType" : "SCMNodeMetrics",
    "tag.Context" : "ozone",
    "tag.Hostname" : "d5f1173fd92a",
    "InMaintenanceStaleNodes" : 0,
    "InMaintenanceHealthyReadonlyNodes" : 0,
    "InMaintenanceHealthyNodes" : 0,
    "InMaintenanceDeadNodes" : 0,
    "InServiceStaleNodes" : 0,
    "InServiceHealthyReadonlyNodes" : 0,
    "InServiceHealthyNodes" : 3,
    "InServiceDeadNodes" : 0,
    "DecommissioningStaleNodes" : 0,
    "DecommissioningHealthyReadonlyNodes" : 0,
    "DecommissioningHealthyNodes" : 0,
    "DecommissioningDeadNodes" : 0,
    "DecommissionedStaleNodes" : 0,
    "DecommissionedHealthyReadonlyNodes" : 0,
    "DecommissionedHealthyNodes" : 0,
    "DecommissionedDeadNodes" : 0,
    "EnteringMaintenanceStaleNodes" : 0,
    "EnteringMaintenanceHealthyReadonlyNodes" : 0,
    "EnteringMaintenanceHealthyNodes" : 0,
    "EnteringMaintenanceDeadNodes" : 0,
    "AllNodes" : 3,
    "NonWritableNodes" : 0,
    "TotalOzoneCapacity" : 3242976054744,
    "DecommissionedDiskRemaining" : 0,
    "DecommissionedSSDRemaining" : 0,
    "MaintenanceSSDRemaining" : 0,
    "SSDCapacity" : 0,
    "DecommissionedDiskCapacity" : 0,
    "SSDUsed" : 0,
    "MaintenanceDiskCapacity" : 0,
    "TotalFilesystemCapacity" : 3243300384768,
    "TotalOzoneUsed" : 17436672,
    "DiskCapacity" : 3242976054744,
    "DecommissionedDiskUsed" : 0,
    "TotalFilesystemUsed" : 206214365184,
    "TotalFilesystemAvailable" : 3037086019584,
    "DecommissionedSSDUsed" : 0,
    "MaintenanceDiskUsed" : 0,
    "SSDRemaining" : 0,
    "DecommissionedSSDCapacity" : 0,
    "DiskRemaining" : 3037086019584,
    "DiskUsed" : 17436672,
    "MaintenanceDiskRemaining" : 0,
    "MaintenanceSSDCapacity" : 0,
    "MaintenanceSSDUsed" : 0,
    "NumHBProcessed" : 33192,
    "NumHBProcessingFailed" : 0,
    "NumNodeCommandQueueReportProcessed" : 33192,
    "NumNodeCommandQueueReportProcessingFailed" : 0,
    "NumNodeReportProcessed" : 2763,
    "NumNodeReportProcessingFailed" : 0
  }
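The cluster-wide values above line up with the per-node ones: TotalOzoneCapacity (3242976054744) is exactly three times the per-node Ozone Capacity (1080992018248), and TotalFilesystemUsed equals TotalFilesystemCapacity - TotalFilesystemAvailable. A minimal sketch of such an aggregation, assuming the totals are plain sums over the reporting nodes; the class and field names are hypothetical:

```java
import java.util.List;

/** Hypothetical sketch: cluster totals as sums of per-node stats (bytes). */
final class ClusterTotals {
  /** Per-node values as carried in a storage report (illustrative type). */
  static final class NodeStats {
    final long fsCapacity;
    final long fsAvailable;
    final long ozoneCapacity;

    NodeStats(long fsCapacity, long fsAvailable, long ozoneCapacity) {
      this.fsCapacity = fsCapacity;
      this.fsAvailable = fsAvailable;
      this.ozoneCapacity = ozoneCapacity;
    }
  }

  final long totalFilesystemCapacity;
  final long totalFilesystemAvailable;
  final long totalOzoneCapacity;

  ClusterTotals(List<NodeStats> nodes) {
    long cap = 0;
    long avail = 0;
    long ozoneCap = 0;
    for (NodeStats n : nodes) {
      cap += n.fsCapacity;
      avail += n.fsAvailable;
      ozoneCap += n.ozoneCapacity;
    }
    this.totalFilesystemCapacity = cap;
    this.totalFilesystemAvailable = avail;
    this.totalOzoneCapacity = ozoneCap;
  }

  /** Derived the same way as per node: capacity minus available. */
  long totalFilesystemUsed() {
    return totalFilesystemCapacity - totalFilesystemAvailable;
  }
}
```

Fed three nodes with the per-node usageinfo values, this yields TotalOzoneCapacity 3242976054744 and TotalFilesystemCapacity 3243300384768, the same totals SCM reports. The available and used totals in the JSON differ slightly, presumably because the metrics were captured at a different moment than the usageinfo output.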

Datanode Web UI main page (ozone-datanode-1):

[Screenshot: Datanode Web UI main page]

Note that Total Capacity shown above is the same as Filesystem Capacity. I kept Total Capacity only because it already existed; it can be removed if needed.

Datanode JMX metrics (ozone-datanode-1):

{
    "name" : "Hadoop:service=HddsDatanode,name=VolumeInfoMetrics-/data/hdds",
    "modelerType" : "VolumeInfoMetrics-/data/hdds",
    "tag.Context" : "ozone",
    "tag.StorageType" : "DISK",
    "tag.DatanodeUuid" : "75b9772e-61bc-4b79-b93c-807584ee604c",
    "tag.VolumeType" : "DATA_VOLUME",
    "tag.StorageDirectory" : "/data/hdds/hdds",
    "tag.VolumeState" : "NORMAL",
    "tag.Hostname" : "9cb7bc189a70",
    "AvailableSpaceInsufficient" : 0,
    "DbCompactLatencyNumOps" : 7,
    "DbCompactLatencyAvgTime" : 0.0,
    "NumScans" : 16,
    "NumScansSkipped" : 0,
    "ReservedCrossesLimit" : 1,
    "Committed" : 0,
    "Containers" : 0,
    "LayoutVersion" : 1,
    "OzoneCapacity" : 1080992018248,
    "OzoneAvailable" : 1012361478144,
    "OzoneUsed" : 5812224,
    "Reserved" : 108110008,
    "TotalCapacity" : 1081100128256,
    "FilesystemCapacity" : 1081100128256,
    "FilesystemAvailable" : 1012361478144,
    "FilesystemUsed" : 68738650112
  }

This PR is a draft while CI runs in my fork.

Comment on lines 114 to 127
/**
 * @return raw filesystem capacity (cached) for the configured volume path.
 */
public long getFsCapacity() {
  return source.getCapacity();
}

/**
 * @return raw filesystem available space (cached) for the configured volume path.
 */
public long getFsAvailable() {
  return source.getAvailable();
}

Contributor

Thanks @siddhantsangwan for the patch.

Instead of adding these new methods in VolumeUsage, I suggest getting the same values via realUsage() in a single call, and passing that to getCurrentUsage().

HDDS-14446.patch

Contributor Author

Thanks for the patch, I updated the PR.

@adoroszlai adoroszlai changed the title HDDS-14446. Make Datanode disk space related commands and metrics clear HDDS-14446. Clarify Datanode disk space related commands and metrics Jan 20, 2026
Comment on lines 1806 to 1832
/**
 * Simple class for grouping disk space related values.
 */
public static final class FsUsageTotals {
  private final long capacity;
  private final long available;
  private final long used;

  private FsUsageTotals(long capacity, long available, long used) {
    this.capacity = capacity;
    this.available = available;
    this.used = used;
  }

  public long getCapacity() {
    return capacity;
  }

  public long getAvailable() {
    return available;
  }

  public long getUsed() {
    return used;
  }
}

Contributor

Sorry, I missed this earlier: can we use SpaceUsageSource.Fixed instead?

/**
 * A static source of space usage. Can be a point in time snapshot of a
 * real volume usage, or can be used for testing.
 */
final class Fixed implements SpaceUsageSource {
  private final long capacity;
  private final long available;
  private final long used;

  public Fixed(long capacity, long available, long used) {
    this.capacity = capacity;
    this.available = Math.max(Math.min(available, capacity - used), 0);
    this.used = used;
  }

  @Override
  public long getCapacity() {
    return capacity;
  }

  @Override
  public long getAvailable() {
    return available;
  }

  @Override
  public long getUsedSpace() {
    return used;
  }
}
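The Fixed constructor quoted above clamps available into the range [0, capacity - used]. A self-contained sketch that reproduces only that clamp, so its behavior can be checked in isolation; the class name is hypothetical and it copies the quoted rule rather than importing Ozone's class:

```java
/** Hypothetical copy of the clamping rule in the quoted Fixed constructor. */
final class FixedClamp {
  private FixedClamp() {
  }

  /** available is capped at capacity - used and never goes below zero. */
  static long clampAvailable(long capacity, long available, long used) {
    return Math.max(Math.min(available, capacity - used), 0);
  }
}
```

For instance, clampAvailable(100, 80, 50) returns 50, and clampAvailable(100, 80, 120) returns 0. With the Ozone values from the usageinfo output (capacity 1080992018248, used 5812224, available 1012362399744) the clamp does not bind, which is consistent with Ozone Available equaling Filesystem Available there.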

Contributor Author

Makes sense - updated.

Contributor

Thanks. Can you please check the acceptance test failure? It happened for both:
