HIVE-29322: Avoid TopNKeyOperator When ReduceSink TopNkey Filtering Provides Better Pruning for ORDER BY LIMIT Queries #6202

Indhumathi27 · 2025-11-18T17:53:29Z

What changes were proposed in this pull request?

This PR updates TopNKeyProcessor to skip creating a TopNKeyOperator when ReduceSinkDesc.topN has already been set by LIMIT pushdown in the ORDER BY ... LIMIT case. This prevents the TopNKey rewrite from overriding the pushdown and ensures that the ReduceSink top-N key filtering is used.

Why are the changes needed?

Currently, when a query includes ORDER BY + LIMIT, LIMIT pushdown is generated during planning but is effectively overridden by the subsequent TopNKey rewrite. As a result, the TopNKey operator receives the full input rather than a reduced data set, which hurts performance (e.g., 16M rows forwarded to the reducer instead of a few). When global ordering uses a single reducer, LIMIT pushdown is sufficient and far more efficient. This fix prevents unnecessary TopNKey creation so that the pushdown can reduce shuffle volume and significantly improve execution time.
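To illustrate why a top-N filter in front of the shuffle can forward so few rows, here is a minimal sketch of bounded-heap top-N key filtering. It is not Hive's implementation: the class `TopNFilterSketch` and its methods are hypothetical names used only for this illustration, keys are plain `long` values, and the ordering is assumed to be ascending.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical illustration only; this is NOT Hive's ReduceSink code.
public class TopNFilterSketch {
    private final int limit;
    // Max-heap holding the N smallest keys seen so far (ascending ORDER BY assumed).
    private final PriorityQueue<Long> largestFirst =
        new PriorityQueue<>(Collections.reverseOrder());

    public TopNFilterSketch(int limit) {
        this.limit = limit;
    }

    /** Returns true if a row with this key may still belong to the top N and must be forwarded. */
    public boolean offer(long key) {
        if (limit <= 0) {
            return false; // LIMIT 0: nothing needs to be forwarded
        }
        if (largestFirst.size() < limit) {
            largestFirst.add(key);
            return true;
        }
        if (key < largestFirst.peek()) {
            // The new key displaces the current largest of the retained keys.
            largestFirst.poll();
            largestFirst.add(key);
            return true;
        }
        return false; // The key cannot be among the N smallest: drop the row before the shuffle.
    }

    /** Counts how many rows would pass the filter for the given key stream. */
    public static long countForwarded(int limit, long[] keys) {
        TopNFilterSketch filter = new TopNFilterSketch(limit);
        long forwarded = 0;
        for (long key : keys) {
            if (filter.offer(key)) {
                forwarded++;
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        long[] keys = new Random(42).longs(1_000_000).toArray();
        System.out.println("forwarded " + countForwarded(100, keys) + " of " + keys.length);
    }
}
```

The number of forwarded rows depends on the input order: already-sorted ascending input forwards exactly N rows, while strictly descending input forwards every row, so the benefit shown in the test report should be read as data-dependent.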

Test reports:

For the query: select * from table order by h limit 100;
Total number of rows: 67764224

with fix:

without fix:

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual testing and existing test cases

Indhumathi27 · 2025-12-01T14:51:50Z

@kasakrisz @deniskuzZ Could you please help review this PR? Thanks.

Indhumathi27 · 2025-12-05T05:25:46Z

@ayushtkn @zabetak @kasakrisz @deniskuzZ Could you please help review the PR? Thanks.

zabetak · 2025-12-08T14:31:06Z

I left some questions under the JIRA ticket to better understand the issue that we are trying to solve here.

Indhumathi27 · 2025-12-26T10:05:59Z

@zabetak @okumin I have added the analysis to the JIRA ticket. Please check.

zabetak

I went over the plan changes and they look reasonable based on the goal of this PR. In terms of changes in the optimizer I left a few comments that could make the code a bit easier to follow.

zabetak · 2025-12-31T14:50:27Z

ql/src/test/results/clientpositive/llap/lateral_view.q.out

-                      sort order: ++
-                      Statistics: Num rows: 1 Data size: 178 Basic stats: COMPLETE Column stats: COMPLETE
-                      value expressions: _col1 (type: string)
+                    TopN Hash Memory Usage: 0.1


When the plan contains a Limit operator, all subsequent TopN optimizations in the same mapper/reducer seem redundant. This is out of scope for the current PR, but it may be worth logging a separate JIRA ticket as a potential improvement.

ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java

ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java

zabetak · 2025-12-31T15:28:26Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java

+    // Skip the current optimization when a simple global ORDER BY...LIMIT is present
+    // (topN > -1 and hasOnlyOrderByLimit()).
+    // This plan structure is handled more efficiently by the specialized 'TopN In Reducer' optimization.
+    if (reduceSinkDesc.getTopN() > -1 && reduceSinkDesc.hasOnlyOrderByLimit()) {


It seems that we don't want to introduce a TopNKeyOperator if the path between TS and RS does not contain a JOIN or GBY operator. We can add such checks at some point here or, perhaps better, modify the respective RuleRegExp so that it only triggers when there is a GBY or JOIN in the path.

Indhumathi27 · 2026-01-05

@zabetak I have handled this in TopNKeyProcessor by disabling TopNKey for cases without JOIN/GROUP BY. Could you take a look at the changes? Thanks.
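As a rough illustration of the kind of check discussed in this thread, the sketch below walks upward from a ReduceSink and reports whether any JOIN or GROUP BY operator sits on the path. It does not use Hive's operator classes: `Op`, `BLOCKING`, and `isPlainOrderByLimitPath` are hypothetical stand-ins, and the operator short names (TS, SEL, GBY, JOIN, MAPJOIN, RS) are assumed for the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; the PR itself works on Hive's real operator tree.
public class OrderByLimitPathSketch {

    /** Minimal stand-in for an operator: a short name plus links to its parents. */
    public static final class Op {
        final String name;
        final List<Op> parents;

        public Op(String name, Op... parents) {
            this.name = name;
            this.parents = List.of(parents);
        }
    }

    // Operators whose presence means TopNKey may still be worthwhile (assumed set).
    private static final Set<String> BLOCKING = Set.of("GBY", "JOIN", "MAPJOIN");

    /** True when no GBY/JOIN-like operator is reachable by walking parents upward from rs. */
    public static boolean isPlainOrderByLimitPath(Op rs) {
        for (Op parent : rs.parents) {
            if (BLOCKING.contains(parent.name) || !isPlainOrderByLimitPath(parent)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Op plain = new Op("RS", new Op("SEL", new Op("TS")));
        Op grouped = new Op("RS", new Op("GBY", new Op("SEL", new Op("TS"))));
        System.out.println("TS-SEL-RS plain path: " + isPlainOrderByLimitPath(plain));
        System.out.println("TS-SEL-GBY-RS plain path: " + isPlainOrderByLimitPath(grouped));
    }
}
```

In this sketch, a plain path is the case where TopNKey would be skipped so that LIMIT pushdown takes precedence.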

sonarqubecloud · 2026-01-04T17:41:33Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zabetak · 2026-01-05T08:46:39Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java

+   * This is used to disable TopNKey for pure ORDER BY LIMIT queries where
+   * LIMIT pushdown must take precedence.
+   */
+  public static boolean isOrderByLimitPath(ReduceSinkOperator rs) {


I have a feeling that the traversal logic here could be skipped by properly setting up the RuleRegExp in org.apache.hadoop.hive.ql.parse.TezCompiler#runTopNKeyOptimization.

Something like:

new RuleRegExp("Top n key optimization", "(GBY%|JOIN%).*RS%")

The expression above is not tested but I have the impression that with some tuning we can get the desired matching scope.
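To get a feel for what the suggested expression would and would not accept, the sketch below runs it through plain java.util.regex against a few operator-path strings. This only exercises the regular expression itself: the path strings are assumed examples of names joined with `%`, `TopNKeyRuleSketch` and `matchesRule` are hypothetical names, and Hive's RuleRegExp applies its own matching logic over the operator stack, so results in the real optimizer may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; it does not call Hive's RuleRegExp.
public class TopNKeyRuleSketch {
    // The untested expression proposed in the review comment.
    private static final Pattern RULE = Pattern.compile("(GBY%|JOIN%).*RS%");

    /** True when the pattern occurs anywhere in the '%'-terminated operator path. */
    public static boolean matchesRule(String operatorPath) {
        return RULE.matcher(operatorPath).find();
    }

    public static void main(String[] args) {
        String[] paths = {"TS%SEL%RS%", "TS%GBY%RS%", "TS%FIL%JOIN%SEL%RS%"};
        for (String path : paths) {
            System.out.println(path + " -> " + matchesRule(path));
        }
    }
}
```

Note that with a substring search, a name such as `MAPJOIN%` also satisfies the `JOIN%` alternative, which may or may not be the intended scope and is part of the tuning mentioned above.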

asf-ci-hive added tests pending tests unstable and removed tests pending labels Nov 18, 2025

Indhumathi27 force-pushed the disable_topn branch from 3a7b050 to cbb41cf Compare November 27, 2025 14:15

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Nov 27, 2025

Indhumathi27 changed the title ~~Draft: HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning~~ HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning Nov 27, 2025

Indhumathi27 force-pushed the disable_topn branch from cbb41cf to 75f9861 Compare December 1, 2025 14:26

asf-ci-hive added tests pending and removed tests unstable labels Dec 1, 2025

asf-ci-hive added tests unstable and removed tests pending labels Dec 1, 2025

Indhumathi27 force-pushed the disable_topn branch 4 times, most recently from db493f2 to a003d03 Compare December 1, 2025 18:13

Indhumathi27 changed the title ~~HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning~~ HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning for ORDER BY LIMIT Queries Dec 1, 2025

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Dec 1, 2025

Indhumathi27 force-pushed the disable_topn branch from a003d03 to bc90bb4 Compare December 3, 2025 07:57

asf-ci-hive added tests pending tests passed and removed tests passed tests pending labels Dec 3, 2025

asf-ci-hive added the tests passed label Dec 8, 2025

zabetak reviewed Dec 31, 2025

Indhumathi27 force-pushed the disable_topn branch from 4e3f88b to 5b12581 Compare January 3, 2026 13:46

asf-ci-hive added tests pending tests unstable and removed tests passed tests pending labels Jan 3, 2026

Indhumathi27 force-pushed the disable_topn branch from 5b12581 to da77521 Compare January 3, 2026 18:59

asf-ci-hive added tests pending and removed tests unstable labels Jan 3, 2026

Indhumathi27 changed the title ~~HIVE-29322: Avoid TopNKeyOperator When Map-Side LIMIT Pushdown Provides Better Pruning for ORDER BY LIMIT Queries~~ HIVE-29322: Avoid TopNKeyOperator When ReduceSink TopNkey Filtering Provides Better Pruning for ORDER BY LIMIT Queries Jan 3, 2026

asf-ci-hive added tests unstable and removed tests pending labels Jan 3, 2026

Indhumathi27 force-pushed the disable_topn branch from da77521 to 8e75362 Compare January 4, 2026 16:23

asf-ci-hive added tests pending and removed tests unstable labels Jan 4, 2026

asf-ci-hive added tests passed and removed tests pending labels Jan 4, 2026

zabetak reviewed Jan 5, 2026

HIVE-29322: Avoid TopNKeyOperator When ReduceSink TopNkey Filtering P…

bb94fff

…rovides Better Pruning

Indhumathi27 force-pushed the disable_topn branch from 8e75362 to 9e856b0 Compare January 5, 2026 16:49

asf-ci-hive added tests pending and removed tests passed labels Jan 5, 2026

fix comments

939f281

Indhumathi27 force-pushed the disable_topn branch from 9e856b0 to 939f281 Compare January 5, 2026 16:50

asf-ci-hive added tests failed and removed tests pending labels Jan 5, 2026
