For broadcast alias, insert bcast domain at the position corresponding to bcast op#5944
Conversation
!test --diff
Review updated until commit 3fbef38

Description
Relevant files

- Enhancement
- Tests
PR Reviewer Guide

Here are some key observations to aid the review process:

- 🧪 PR contains tests
- 🔒 No security concerns identified
- ⚡ Recommended focus areas for review

Logic Correctness

The change uses `insert(out_allocation.begin() + i, out_logical[i])` to place broadcast dimensions at their corresponding logical positions rather than appending them. This ensures the allocation domain stays close to the logical domain, as intended.
!test --diff

!test --diff

The code diff is from the revised test and is expected.

Distributed test failure in …
Greptile Overview

Greptile Summary

This PR optimizes broadcast aliasing by inserting broadcast dimensions at their logical positions in the allocation domain rather than appending them to the end. This keeps the allocation domain closer to the logical domain structure.

The implementation correctly handles multiple broadcast dimensions by inserting them sequentially at their logical positions, which naturally accounts for array shifts from previous insertions.

Confidence Score: 4/5
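To make the "naturally accounts for array shifts" point concrete, here is a standalone toy model of the insertion logic, with plain strings standing in for `IterDomain*` and a made-up broadcast check in place of `bcast->isBroadcastDim(i)`; this is a sketch of the idea, not the PR's actual code:

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main() {
  // Start from the non-broadcast dims in their (possibly permuted)
  // input-derived order, then walk the output's logical positions left to
  // right, inserting each broadcast dimension at logical index i.
  std::vector<std::string> out_logical = {"i0", "b1", "i2", "b3"};
  std::vector<std::string> out_allocation = {"i2", "i0"};  // non-broadcasts, permuted

  for (std::size_t i = 0; i < out_logical.size(); ++i) {
    if (out_logical[i][0] == 'b') {  // toy stand-in for bcast->isBroadcastDim(i)
      out_allocation.insert(out_allocation.begin() + i, out_logical[i]);
    }
  }

  // Each broadcast lands at its logical index because earlier insertions
  // have already shifted the tail by exactly the right amount.
  for (const auto& d : out_allocation) {
    std::cout << d << ' ';  // prints: i2 b1 i0 b3
  }
  std::cout << '\n';
}
```

Because the loop scans logical positions in increasing order, each `insert` at `begin() + i` already sees the shifts caused by any broadcasts inserted at smaller indices, so no index correction is needed.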
Important Files Changed
Flowchart

```mermaid
flowchart TD
    A[Input Tensor<br/>Logical: i0, i1<br/>Allocation: i1, i0] --> B[mapInLayoutToOutRoot]
    B --> C[Mapped Layout<br/>Allocation: i1_out, i0_out]
    C --> D{Iterate through<br/>output logical dims}
    D -->|i=0: not broadcast| E[Skip]
    D -->|i=1: IS broadcast| F[Insert b at position 1<br/>Allocation: i1_out, b, i0_out]
    D -->|i=2: not broadcast| G[Skip]
    F --> H[Final Output<br/>Logical: i0, b, i1<br/>Allocation: i1, b, i0]
    E --> D
    G --> H
    style F fill:#90EE90
    style H fill:#87CEEB
```
Last reviewed commit: 3fbef38
wujingyue left a comment
Thanks for trying to provide the context! I still don't follow why `reduction + broadcast` is bad. Is that something alias analysis could fix instead of dodging?
Let me add more details from a test case. The fusion is a simple normalization, and T2's allocation domain is set by broadcast aliasing. The fusion is segmented due to the large reduction size, and the following segment is then picked up by the transpose scheduler because T2's allocation domain can't be mapped to T0's. We want to use the pointwise scheduler for this segmented fusion, so I disabled broadcast aliasing for the `reduction + broadcast` pattern.
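For concreteness, a minimal hypothetical sketch of the `reduction + broadcast` + pointwise structure described above, in nvFuser's C++ test style; `makeContigTensor`, `sum`, `broadcast`, and `sub` are existing nvFuser helpers, but the shapes and the exact ops are illustrative assumptions, not the PR's actual test:

```cpp
// Hypothetical sketch only -- not the actual test case from this PR.
auto fusion = std::make_unique<Fusion>();
FusionGuard fg(fusion.get());

TensorView* tv0 = makeContigTensor(2);            // T0: [i0, i1]
fusion->addInput(tv0);

TensorView* tv1 = sum(tv0, {1});                  // large reduction over i1
TensorView* tv2 = broadcast(tv1, {false, true});  // T2: [i0, b1]
TensorView* tv3 = sub(tv0, tv2);                  // pointwise epilogue
fusion->addOutput(tv3);

// Broadcast aliasing would give T2 an allocation domain derived from its
// producer's layout; if that layout can't be mapped to T0's, the segment
// containing tv3 falls to the transpose scheduler instead of pointwise.
```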
In other words, the following segment can't be picked up by pointwise because T2's allocation domain can't be mapped to T0's. What stops T2 and T0 from being mapped? IIUC, two better approaches are:
Approach-1 sounds good to me. I revised the code to set the allocation domain to logical.
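As a rough sketch, approach-1 amounts to something like the following for the broadcast output (assuming nvFuser's `TensorView::setAllocationDomain` and `getLogicalDomain`; `out` is a hypothetical name for the broadcast op's output):

```cpp
// Sketch of approach-1 (hypothetical names): make the broadcast output's
// allocation domain identical to its logical domain, fully contiguous.
out->setAllocationDomain(out->getLogicalDomain(), /*new_contiguity=*/true);
```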
!test |
csrc/alias_analysis.cpp (Outdated)
```cpp
std::vector<IterDomain*> out_allocation(out_logical);
std::vector<std::optional<bool>> out_contiguity;
for (const auto i : arange(out_logical.size())) {
  if (bcast->isBroadcastDim(i)) {
    out_allocation.push_back(out_logical[i]);
    out_contiguity.push_back(std::nullopt);
  } else {
    out_contiguity.push_back(true);
```
Hardcoding contiguity to `true` for all non-broadcast dimensions ignores the input tensor's actual contiguity. If the input has a non-contiguous layout (e.g., from a previous transpose), this will create a mismatch.

Consider whether this should be `out_contiguity.push_back(out_layout->contiguity(i))` for non-broadcast dims to preserve input contiguity, or verify that `aliasIfCompliant` will reject incompatible cases.
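To illustrate the concern with a standalone model (plain `std::optional<bool>` flags instead of the real `IterDomain` machinery; `in_contiguity` and `is_broadcast` are made-up inputs):

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

int main() {
  // Propagate the mapped input layout's contiguity instead of hardcoding
  // true; broadcast dims get nullopt (contiguity is meaningless for them).
  std::vector<std::optional<bool>> in_contiguity = {true, false};  // e.g. non-contiguous input
  std::vector<bool> is_broadcast = {false, true, false};

  std::vector<std::optional<bool>> out_contiguity;
  std::size_t j = 0;  // index into the input's (non-broadcast) contiguity
  for (std::size_t i = 0; i < is_broadcast.size(); ++i) {
    if (is_broadcast[i]) {
      out_contiguity.push_back(std::nullopt);
    } else {
      out_contiguity.push_back(in_contiguity.at(j++));  // preserve, don't hardcode
    }
  }

  for (const auto& c : out_contiguity) {
    std::cout << (c.has_value() ? (*c ? "t" : "f") : "n") << ' ';
  }
  std::cout << '\n';  // prints: t n f
}
```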
!test

!test

This reverts commit 0377924.

!test
Thanks for trying this!

That's probably incorrect. You still want to start with …

OK, I added a new test where …
!test

!test
```diff
 if (bcast->isBroadcastDim(i)) {
-  out_allocation.push_back(out_logical[i]);
   out_contiguity.push_back(std::nullopt);
+  out_allocation.insert(out_allocation.begin() + i, out_logical[i]);
```
Cool! Almost there...

Can you avoid insertion into the middle of a vector? Below is a small trick that doesn't complicate the code much at all.
```
auto j = std::ssize(out_allocation);
out_allocation.resize(out_logical.size());
out_contiguity...
for (auto i : arange(std::ssize(out_allocation)) | std::views::reverse) {
  if is broadcast {
    out_allocation[i] = ...;
    out_contiguity...
  } else {
    j--;
    out_allocation[i] = out_allocation[j];
    out_contiguity...
  }
}
```
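For reference, a self-contained, runnable instantiation of this reverse-fill trick, with plain strings standing in for `IterDomain*` and a toy broadcast check in place of `bcast->isBroadcastDim(i)`; a sketch under those assumptions, not the PR's final code:

```cpp
#include <cstdint>
#include <iostream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

int main() {
  // out_allocation initially holds only the non-broadcast dims; we grow it
  // once and fill from the back, so nothing is ever inserted mid-vector.
  std::vector<std::string> out_logical = {"i0", "b1", "i2", "b3"};
  std::vector<std::string> out_allocation = {"i2", "i0"};  // non-broadcasts only
  std::vector<std::optional<bool>> out_contiguity(out_logical.size());

  auto is_broadcast = [](const std::string& d) { return d.front() == 'b'; };

  int64_t j = std::ssize(out_allocation);  // one past the last filled slot
  out_allocation.resize(out_logical.size());
  for (int64_t i = std::ssize(out_allocation) - 1; i >= 0; --i) {
    if (is_broadcast(out_logical[i])) {
      out_allocation[i] = out_logical[i];
      out_contiguity[i] = std::nullopt;
    } else {
      --j;
      out_allocation[i] = out_allocation[j];  // shift existing dim rightward
      out_contiguity[i] = true;  // or the mapped input contiguity
    }
  }

  for (const auto& d : out_allocation) {
    std::cout << d << ' ';  // prints: i2 b1 i0 b3, same as the insert version
  }
  std::cout << '\n';
}
```

Walking indices in reverse means each non-broadcast element is moved at most once and never overwritten before it is read, giving the same result as repeated `insert` calls in O(n) instead of O(n²).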
This PR disables broadcast aliasing for the `reduction + broadcast` pattern to prevent unnecessary allocation-domain changes on broadcast tensors.

When segmenting normalization kernels, this change leads to a `reduction + pointwise` decomposition. Before this PR, `reduction + transpose` was used due to the unnecessary allocation domain set on the broadcast tensor.

This is an optimization following PR #5884 and issue #5883.