[FEA] Simulate CuTe DSL distributed examples on a single GPU #2897

@whatdhack

Which component requires the feature?

CuTe DSL

Feature Request

Is your feature request related to a problem? Please describe.
It would be very useful to run and debug examples such as those in https://github.com/NVIDIA/cutlass/tree/main/examples/python/CuTeDSL/distributed on a single GPU.

Describe the solution you'd like
Run on a single GPU, for example as follows:
torchrun --nnodes 1 --nproc-per-node 2 --no-python python all_reduce_one_shot_lamport.py
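
To make the request concrete, here is a minimal sketch of the rank-to-device mapping that single-GPU simulation would need, assuming each worker reads the `LOCAL_RANK` variable that torchrun exports. The helpers `device_index_for_rank` and `local_rank_from_env` are hypothetical names used for illustration only; they are not part of CuTe DSL or PyTorch. Whether the communication layer used by the distributed examples accepts several ranks on one device is not confirmed here and is the substance of this request.

```python
import os


def device_index_for_rank(local_rank: int, visible_gpus: int) -> int:
    """Map a local rank to a GPU index, wrapping when ranks outnumber GPUs.

    Hypothetical helper: with one visible GPU, every rank maps to device 0.
    """
    if visible_gpus < 1:
        raise ValueError("at least one visible GPU is required")
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    return local_rank % visible_gpus


def local_rank_from_env(default: int = 0) -> int:
    """Read the LOCAL_RANK value that torchrun sets for each worker."""
    return int(os.environ.get("LOCAL_RANK", default))


if __name__ == "__main__":
    # Assumption: a single visible GPU, as in the scenario described above.
    rank = local_rank_from_env()
    print(f"local rank {rank} -> cuda:{device_index_for_rank(rank, 1)}")
```

With the command above, both workers would resolve to `cuda:0`; the open question is whether the examples' collectives can then run with two ranks sharing that device.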

Describe alternatives you've considered
Using multiple GPUs.

Additional context
N/A
