
Conversation


@devabhishekpal devabhishekpal commented Jan 5, 2026

What changes were proposed in this pull request?

Create the Cluster Capacity page UI

Please describe your PR in detail:

  • This PR adds a Cluster Capacity page to the Recon UI.
  • The intention of this page is to give the end user a clear picture of the space-usage breakdown.
  • It also aims to alleviate some of the pain points around stuck deletions by showing more detail about pending deletions and the stages in which they are pending.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13183

How was this patch tested?

Patch was tested manually.

(Screenshots of the new Cluster Capacity page are attached to the PR.)


@devmadhuu devmadhuu left a comment


Thanks @devabhishekpal for the patch.

  1. I think we decided to make the datanode dropdown a combo of textbox and dropdown, so the user can also type to search within a long list if needed.
  2. We need an export option at least for the datanodes card.
  3. By default, datanodes in the dropdown should be populated in descending order of pending deletion size.
  4. We should gray out failed datanodes in the dropdown, as their pending deletion data is no longer useful to the user. This should be possible with CSS.
  5. How do we plan to handle the case where the API response contains duplicate nodes with the same name? Please check the API behavior when DNs with the same name join the cluster: the API response contains datanode names as well as UUIDs, and only the UUID is treated as unique.
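Point 5 above boils down to keying datanodes on UUID rather than hostname. A minimal sketch of that idea, assuming a hypothetical `DatanodeEntry` record (not the actual Recon response type):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical minimal model; the real Recon response type differs.
record DatanodeEntry(UUID uuid, String hostname, long pendingDeletionBytes) {}

public class DatanodeDedup {
  // Key entries by UUID so two datanodes that happen to share a hostname
  // stay distinct, while a re-reported UUID overwrites its stale entry.
  public static Map<UUID, DatanodeEntry> dedupeByUuid(List<DatanodeEntry> reported) {
    Map<UUID, DatanodeEntry> byUuid = new LinkedHashMap<>();
    for (DatanodeEntry e : reported) {
      byUuid.put(e.uuid(), e); // last report for a given UUID wins
    }
    return byUuid;
  }
}
```

With this, two DNs that join with the same hostname remain two entries, which is the behavior the comment asks to verify.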


@priyeshkaratha priyeshkaratha left a comment


Thanks @devabhishekpal for working on the patch.
I have two comments regarding the current UI implementation for showing DN pendingDeletion and the related refresh strategy. Please have a look at them.


@priyeshkaratha priyeshkaratha left a comment


Thanks @devabhishekpal for updating the patch. Overall changes LGTM

@devabhishekpal

@adoroszlai I have added the commons-csv library from Apache Commons.
It would be great if you could take a look as well, in case we have an existing library that serves the purpose.


@adoroszlai adoroszlai left a comment


Thanks @devabhishekpal for the heads-up.

I have added the commons-csv library from Apache Commons. It would be great if you could take a look as well, in case we have an existing library that serves the purpose.

The dependency check failure has instructions on what needs to be done when adding a new dependency.


@devmadhuu devmadhuu left a comment


Thanks @devabhishekpal for the patch. Kindly add unit and integration tests for the code, including the new API endpoint, as well as UI dev tests. And a small nit.


yandrey321 commented Jan 21, 2026

[Usability issue] It would be better to use a grid (plus pagination or infinite scrolling, and filtering/sorting/searching capabilities) with info for all datanodes. In the case of hundreds of datanodes, it would be hard to explore and pinpoint the datanodes that cause problems.


devabhishekpal commented Jan 21, 2026

[Usability issue] It would be better to use a grid (plus pagination or infinite scrolling, and filtering/sorting/searching capabilities) with info for all datanodes. In the case of hundreds of datanodes, it would be hard to explore and pinpoint the datanodes that cause problems.

Thanks @yandrey321, you are correct that pagination would help. However, since this list highlights datanodes that might be stuck in the deletion state, we decided not to show all DNs; instead we show the top 10 datanodes sorted by pending deletion size, i.e. the first DN has the largest pending deletion size, and so on.
This assumes the user is concerned with where deletion is stuck in order to debug further. There is also an option to download the list of all DNs and their pending deletion sizes as a CSV, so they can process that raw data as they see fit.

@devabhishekpal

@yandrey321 Thanks for the inputs. I reflected on this and added a tooltip to better convey this information to the end user, i.e. that the list only contains the top 15 DNs and that the full data can be downloaded as a CSV file.

@priyeshkaratha

Thanks @devabhishekpal for improving the patch. The changes look fine to me. Please check the CI failures.


@priyeshkaratha priyeshkaratha left a comment


Thanks @devabhishekpal for fixing the CI issues. LGTM

```diff
   }

-  public DataNodeMetricsServiceResponse getCollectedMetrics() {
+  public DataNodeMetricsServiceResponse getCollectedMetrics(Integer limit) {
```


@devmadhuu I was thinking about whether we should implement sorting and fetching the top X results in the backend API, or move this logic to the frontend. Since we rely on JMX, we have to fetch all the data from JMX before we can perform any sorting. Because of this, applying sorting or limits in the backend does not provide any performance benefit; it only reduces the response size.

Given this, would it make more sense to move the sorting logic (based on pending deletion data) to the frontend?



But if we are not sure of the worst-case number of DNs, this might cause issues with sorting and slicing a very large dataset in the browser. If it is in the hundreds or thousands it may still perform well, but if this data is expanded to include other properties and there are thousands of nodes, I think the browser might not handle it properly.

Perhaps we can use something like infinite scroll instead of a limit, but for sorting I'd prefer that it is done in the backend itself.
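A backend-side sort-and-limit matching the `getCollectedMetrics(Integer limit)` signature from the diff could look like this sketch (the `DatanodeEntry` record and its field names are hypothetical, not the actual Recon types):

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal model of one datanode's collected metrics.
record DatanodeEntry(String hostname, long pendingDeletionBytes) {}

public class TopPendingDeletes {
  // Sort descending by pending deletion size, then truncate to `limit`.
  // A null limit returns everything, mirroring the optional parameter.
  public static List<DatanodeEntry> topN(List<DatanodeEntry> all, Integer limit) {
    List<DatanodeEntry> sorted = all.stream()
        .sorted(Comparator.comparingLong(DatanodeEntry::pendingDeletionBytes).reversed())
        .toList();
    return (limit == null || limit >= sorted.size()) ? sorted : sorted.subList(0, limit);
  }
}
```

As noted above, this still materializes every JMX result before sorting, so it only shrinks the response size rather than the server-side work.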



Ok



@priyeshkaratha yes, it may have performance issues if the number of DNs is high and some DNs respond slowly, and 10 minutes is quite a high timeout here, because we wait for all futures to complete; in the worst case it can take the full 10 minutes for all futures to complete and return the pending-blocks data.

Since we display only the top 10 or top 20 datanodes sorted by size (descending), collectMetrics could instead be implemented so that, rather than waiting for all datanodes' futures to complete, each result is pushed into a queue as it arrives. Another thread (any client program that needs this data, such as the API endpoint here) could then start picking results from that queue and use a priority queue to keep only the top 10 or 20. That way the memory footprint stays small: since the UI displays only the top 10 or 20, the backend can keep just that many entries and discard the rest. The UI should then have dynamic scroll logic that sends a request to the backend if the user goes beyond the top 10 or 20.

But we can add this logic later, and for now assume that even on large clusters with 1000+ datanodes, the JMX responses from all datanodes come back quickly, or at worst within the 10-minute timeout.
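The bounded-priority-queue consumer described above could be sketched roughly as follows; the `DnResult` record and `topN` helper are hypothetical names, and a real implementation would read from a concurrent queue fed by the completing futures rather than a plain iterator:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical minimal model of one datanode's JMX result.
record DnResult(String hostname, long pendingDeletionBytes) {}

public class StreamingTopN {
  // Consume results as they arrive (e.g. from a queue fed by completed
  // futures) while keeping only the current top n in a bounded min-heap,
  // so memory stays O(n) regardless of cluster size.
  public static List<DnResult> topN(Iterator<DnResult> results, int n) {
    PriorityQueue<DnResult> heap =
        new PriorityQueue<>(Comparator.comparingLong(DnResult::pendingDeletionBytes));
    while (results.hasNext()) {
      heap.offer(results.next());
      if (heap.size() > n) {
        heap.poll(); // discard the smallest; only the top n matter
      }
    }
    // Return the survivors in descending order for display.
    List<DnResult> top = new ArrayList<>(heap);
    top.sort(Comparator.comparingLong(DnResult::pendingDeletionBytes).reversed());
    return top;
  }
}
```

Unlike a full sort, this never holds more than n entries, which is the memory-footprint point made in the comment.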


@devmadhuu devmadhuu left a comment


Thanks @devabhishekpal for updating the patch. The changes largely LGTM, +1. I left a comment on an approach we can take in the future, but I still insist on doing some testing on a real cluster with 1000+ DNs where some DNs could be running slow. IMO you can write an integration test that introduces some delay in the JMX response (by adding a sleep), observe how your API endpoint responds, and from that predict the UI behavior for the user. Basically, it should not be a long wait where the UI simply gets stuck for up to 10 minutes.

@devmadhuu devmadhuu self-requested a review January 23, 2026 07:20

@ArafatKhan2198 ArafatKhan2198 left a comment


LGTM +1

@devabhishekpal

Thanks for the reviews and inputs @priyeshkaratha @devmadhuu @yandrey321 @adoroszlai @ArafatKhan2198 .

Merging this PR

@devabhishekpal devabhishekpal merged commit 7a791ad into apache:master Jan 23, 2026
44 checks passed
