Draft
63 commits
6a74af0
Add make_dynamic_open_dataflow_graph_from_pcg.
elliottslaughter Feb 4, 2026
587e08e
Empty skeleton of the realm-execution backend.
elliottslaughter Feb 4, 2026
50f6ec6
More Realm execution skeleton.
elliottslaughter Feb 4, 2026
984aae5
Stub creation.
elliottslaughter Feb 4, 2026
e9e1105
More passes.
elliottslaughter Feb 4, 2026
90b92c7
Add Realm manager and test it.
elliottslaughter Feb 5, 2026
ef92d6f
Do not expose raw runtime and properly wait in test.
elliottslaughter Feb 5, 2026
01e23cd
Sketch more Realm manager APIs.
elliottslaughter Feb 5, 2026
3e7d841
Add controller functionality.
elliottslaughter Feb 5, 2026
b9a30a6
Fix Realm tests.
elliottslaughter Feb 5, 2026
814e13f
Support passing closure arguments to controllers.
elliottslaughter Feb 5, 2026
3d0298c
Move task IDs into Realm and assign IDs to remaining tasks.
elliottslaughter Feb 5, 2026
d702afe
Avoid pulling in the entire invocation.
elliottslaughter Feb 5, 2026
4fcde77
Conversion into Realm task IDs.
elliottslaughter Feb 5, 2026
e51b04e
Add a top-level PRealm switch.
elliottslaughter Feb 5, 2026
895de33
Some work on Realm task registry.
elliottslaughter Feb 6, 2026
09fde7d
Split out the Realm context.
elliottslaughter Feb 6, 2026
c5a0ea9
Switch to mapped PCG.
elliottslaughter Feb 6, 2026
a587e53
Add shard expansion pass (and implement shard expansion pass).
elliottslaughter Feb 6, 2026
62b49f7
Add instance field to dynamic graph, more task IDs.
elliottslaughter Feb 6, 2026
ce403d4
Fix filename.
elliottslaughter Feb 6, 2026
a4183dd
Some work in instance allocation and registry/manager.
elliottslaughter Feb 6, 2026
0274dd0
Instance allocation.
elliottslaughter Feb 6, 2026
9d24b3d
Simplify dims and use constructors.
elliottslaughter Feb 6, 2026
60989fe
Refactor.
elliottslaughter Feb 6, 2026
8d46441
Sketch out device mapping.
elliottslaughter Feb 6, 2026
0dfa1a3
Move instance backing to a separate map, remove realm from task-spec.
elliottslaughter Feb 6, 2026
a4bc84e
Implement processor queries.
elliottslaughter Feb 7, 2026
02b71a8
Enable PRealm.
elliottslaughter Feb 7, 2026
b144d6d
Move tasks to dedicated file, stub out device state init, shuffle dir…
elliottslaughter Feb 10, 2026
4d43a7b
Make use of task args struct.
elliottslaughter Feb 10, 2026
4991911
Use task args struct.
elliottslaughter Feb 10, 2026
6f65c51
Refactor task APIs.
elliottslaughter Feb 10, 2026
fce23cf
Finish implementation of device init task.
elliottslaughter Feb 10, 2026
6fc3b9b
Finish implementation of device state initialization.
elliottslaughter Feb 10, 2026
2de3516
Block on initialization.
elliottslaughter Feb 10, 2026
2a174e0
Wire up rest of Realm implementation.
elliottslaughter Feb 11, 2026
7e78e3f
Implement Realm device idx.
elliottslaughter Feb 11, 2026
5ffc1dd
Updates to compile against latest local-execution.
elliottslaughter Feb 12, 2026
e1b6fca
Fix up function arguments.
elliottslaughter Feb 12, 2026
e2ccf4a
Rename PCGInstance and add dependency set.
elliottslaughter Feb 12, 2026
ffd2738
Dependency tracking.
elliottslaughter Feb 12, 2026
81cc485
Add event argument to controller.
elliottslaughter Feb 12, 2026
bb0ea6b
Implement the allocator.
elliottslaughter Feb 12, 2026
feb5897
Implement device handle.
elliottslaughter Feb 12, 2026
202889f
Distributed device handle initialization.
elliottslaughter Feb 12, 2026
8f816f0
Distributed device handle initialization.
elliottslaughter Feb 13, 2026
37beaa4
Test distributed device handle.
elliottslaughter Feb 13, 2026
c616040
Guard the kinds of procs we run on.
elliottslaughter Feb 13, 2026
26046ee
Switch to own DeviceSpecific implementation with raw pointers.
elliottslaughter Feb 13, 2026
12c4940
Separate device handle test.
elliottslaughter Feb 13, 2026
1b42461
More work on Realm tests.
elliottslaughter Feb 13, 2026
657a9f9
JSON serialization of a bunch of data types.
elliottslaughter Feb 14, 2026
bed8e8a
Make more stuff serializable.
elliottslaughter Feb 14, 2026
374e4b6
To-do notes.
elliottslaughter Feb 14, 2026
31afd42
More serialization routines.
elliottslaughter Feb 14, 2026
f473033
Most of serializer finished.
elliottslaughter Feb 14, 2026
877fd8a
Finish serialization of device init task.
elliottslaughter Feb 14, 2026
0aa0664
Switch over to explicit DTGs for task arguments and serialization.
elliottslaughter Feb 14, 2026
5f4cce6
Convert op task args.
elliottslaughter Feb 14, 2026
81c0d8b
Map the PCG for test.
elliottslaughter Feb 15, 2026
6803fb7
Fix a bug in shard expansion.
elliottslaughter Feb 15, 2026
445bee0
Finish body of instance allocation.
elliottslaughter Feb 15, 2026
48 changes: 0 additions & 48 deletions .flake/pkgs/legion.nix

This file was deleted.

46 changes: 46 additions & 0 deletions .flake/pkgs/realm.nix
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
{ lib
, stdenv
, fetchFromGitHub
, cmake
, cudaPackages ? { }
, zlib
, maxDim ? 5
}:

let
inherit (cudaPackages) cudatoolkit;
in

stdenv.mkDerivation rec {
pname = "realm";
version = "2026-02-06";

src = fetchFromGitHub {
owner = "StanfordLegion";
repo = "realm";
rev = "0405b67ca14b586f7dec0dcddee194cecee7efa6";
sha256 = "sha256-iUPVV1rh3QuyDKgXuu8aDlaZGlNwcpPvPsSVLWp8tr4=";
};

nativeBuildInputs = [
cmake
];

cmakeFlags = [
"-DBUILD_SHARED_LIBS=ON"
"-DREALM_ENABLE_CUDA=ON"
"-DREALM_ENABLE_PREALM=ON"
"-DREALM_MAX_DIM=${toString maxDim}"
];

buildInputs = [
cudatoolkit
zlib
];

meta = with lib; {
description = "Realm is a distributed, event-based tasking runtime for building high-performance applications that span clusters of CPUs, GPUs, and other accelerators";
homepage = "https://legion.stanford.edu/realm";
license = licenses.asl20;
};
}
7 changes: 7 additions & 0 deletions .proj.toml
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,13 @@ has-cpu-only-benchmarks = false
has-cuda-tests = true
has-cuda-benchmarks = false

[targets.realm-execution]
type = "lib"
has-cpu-only-tests = true
has-cpu-only-benchmarks = false
has-cuda-tests = true
has-cuda-benchmarks = false

# [targets.local-pcg-execution]
# type = "lib"
# has-cpu-only-tests = true
Expand Down
21 changes: 10 additions & 11 deletions flake.nix
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@
};
};

outputs = { self, nixpkgs, flake-utils, proj-repo, nixGL, ... }: flake-utils.lib.eachSystem [ "x86_64-linux" ] (system:
let
pkgs = import nixpkgs {
inherit system;
config.allowUnfree = true;
Expand All @@ -41,21 +41,21 @@
mkShell = attrs: pkgs.mkShell.override {
stdenv = pkgs.cudaPackages.backendStdenv;
} (attrs // {
hardeningDisable = ["all"]; # disable nixpkgs default compiler arguments, otherwise ubsan doesn't catch
# signed overflows due to the signedoverflow hardening setting.
# for more details, see the following (long-running) nixpkgs github issues:
# - https://github.com/NixOS/nixpkgs/issues/18995
# - https://github.com/NixOS/nixpkgs/issues/60919
});

proj = proj-repo.packages.${system}.proj;
in
{
packages = rec {
libdwarf-lite = pkgs.callPackage ./.flake/pkgs/libdwarf-lite.nix { };
cpptrace = pkgs.callPackage ./.flake/pkgs/cpptrace.nix { inherit libdwarf-lite; };
libassert = pkgs.callPackage ./.flake/pkgs/libassert.nix { inherit cpptrace; };
legion = pkgs.callPackage ./.flake/pkgs/legion.nix { };
realm = pkgs.callPackage ./.flake/pkgs/realm.nix { };
bencher-cli = pkgs.callPackage ./.flake/pkgs/bencher-cli.nix { };
ffdb = pkgs.callPackage ./.flake/pkgs/ffdb { inherit proj; };
hpp2plantuml = pkgs.python3Packages.callPackage ./.flake/pkgs/hpp2plantuml.nix { };
Expand Down Expand Up @@ -83,8 +83,7 @@
shellHook = ''
export PATH="$HOME/ff/.scripts/:$PATH"
export RC_PARAMS="max_discard_ratio=100"
export CMAKE_FLAGS="-DFF_USE_EXTERNAL_LEGION=ON \
-DFF_USE_EXTERNAL_NCCL=ON \
export CMAKE_FLAGS="-DFF_USE_EXTERNAL_NCCL=ON \
-DFF_USE_EXTERNAL_JSON=ON \
-DFF_USE_EXTERNAL_FMT=ON \
-DFF_USE_EXTERNAL_SPDLOG=ON \
Expand All @@ -94,7 +93,7 @@
-DFF_USE_EXTERNAL_GBENCHMARK=ON \
-DFF_USE_EXTERNAL_LIBASSERT=ON"
'';

buildInputs = builtins.concatLists [
(with pkgs; [
zlib
Expand Down Expand Up @@ -125,7 +124,7 @@
])
(with self.packages.${system}; [
libassert
legion
realm
rapidcheckFull
doctest
])
Expand Down
1 change: 1 addition & 0 deletions lib/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ add_subdirectory(op-attrs)
add_subdirectory(kernels)
add_subdirectory(local-execution)
add_subdirectory(local-pcg-execution)
add_subdirectory(realm-execution)
add_subdirectory(task-spec)
add_subdirectory(utils)
add_subdirectory(ffi)
Expand Down
3 changes: 3 additions & 0 deletions lib/kernels/include/kernels/device_handle_t.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@ namespace FlexFlow {
device_handle_t device_handle_t_from_managed_handle(
std::optional<ManagedPerDeviceFFHandle> const &managed_handle);

device_handle_t device_handle_t_from_managed_handle_ptr(
std::optional<ManagedPerDeviceFFHandle *> const &managed_handle);

device_handle_t gpu_make_device_handle_t(PerDeviceFFHandle const &ff_handle);
device_handle_t cpu_make_device_handle_t();

Expand Down
9 changes: 9 additions & 0 deletions lib/kernels/src/kernels/device_handle_t.cc
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,15 @@ device_handle_t device_handle_t_from_managed_handle(
}
}

device_handle_t device_handle_t_from_managed_handle_ptr(
std::optional<ManagedPerDeviceFFHandle *> const &managed_handle) {
if (managed_handle.has_value()) {
return gpu_make_device_handle_t(managed_handle.value()->raw_handle());
} else {
return cpu_make_device_handle_t();
}
}

device_handle_t gpu_make_device_handle_t(PerDeviceFFHandle const &ff_handle) {
return device_handle_t{
ff_handle,
Expand Down
1 change: 1 addition & 0 deletions lib/pcg/include/pcg/layer_guid_t.dtg.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ features = [
"ord",
"hash",
"fmt",
"json",
]

includes = [
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include "pcg/machine_space_coordinate.dtg.h"
#include "pcg/mapped_parallel_computation_graph/operator_atomic_task_shard_binding.dtg.h"
#include "utils/bidict/bidict.h"
#include <nlohmann/json.hpp>

namespace FlexFlow {

Expand Down Expand Up @@ -45,4 +46,15 @@ struct hash<::FlexFlow::MappedOperatorTaskGroup> {
};

} // namespace std

namespace nlohmann {

template <>
struct adl_serializer<::FlexFlow::MappedOperatorTaskGroup> {
static ::FlexFlow::MappedOperatorTaskGroup from_json(json const &j);
static void to_json(json &j, ::FlexFlow::MappedOperatorTaskGroup const &t);
};

} // namespace nlohmann

#endif
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,10 @@ ParallelLayerAddedResult add_parallel_layer(
ParallelLayerAddedResult pcg_add_input_layer(ParallelComputationGraph &pcg,
TensorShape const &tensor_shape);

ParallelLayerAddedResult
pcg_add_input_layer_with_grad(ParallelComputationGraph &pcg,
TensorShape const &tensor_shape);

OperatorTaskSpace get_operator_task_space(ParallelComputationGraph const &pcg,
parallel_layer_guid_t const &layer);

Expand All @@ -54,6 +58,9 @@ std::unordered_map<TensorSlotName, ParallelComputationGraphEdge>
std::unordered_set<parallel_layer_guid_t>
get_initial_layers(ParallelComputationGraph const &);

std::unordered_map<TensorSlotName, parallel_tensor_guid_t>
get_outgoing_tensors(ParallelComputationGraph const &,
parallel_layer_guid_t const &);
std::unordered_map<TensorSlotName, parallel_tensor_guid_t>
get_incoming_tensors(ParallelComputationGraph const &,
parallel_layer_guid_t const &);
Expand Down Expand Up @@ -107,6 +114,9 @@ ParallelTensorShape get_parallel_tensor_shape(ParallelComputationGraph const &,
std::vector<parallel_layer_guid_t>
topological_ordering(ParallelComputationGraph const &);

std::unordered_map<parallel_layer_guid_t, ParallelLayerAttrs>
get_parallel_layer_attrs_mapping(ParallelComputationGraph const &pcg);

parallel_layer_guid_t
get_parallel_layer_by_name(ParallelComputationGraph const &pcg,
std::string const &name);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ features = [
"ord",
"hash",
"fmt",
"json",
]

includes = [
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ features = [
"ord",
"hash",
"fmt",
"json",
]

includes = [
Expand Down
1 change: 1 addition & 0 deletions lib/pcg/include/pcg/tensor_guid_t.dtg.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ features = [
"ord",
"hash",
"fmt",
"json",
]

includes = [
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,3 +90,20 @@ size_t hash<::FlexFlow::MappedOperatorTaskGroup>::operator()(
}

} // namespace std

namespace nlohmann {

::FlexFlow::MappedOperatorTaskGroup
adl_serializer<::FlexFlow::MappedOperatorTaskGroup>::from_json(
json const &j) {
return ::FlexFlow::MappedOperatorTaskGroup{j.template get<
::FlexFlow::bidict<::FlexFlow::MachineSpaceCoordinate,
::FlexFlow::OperatorAtomicTaskShardBinding>>()};
}

void adl_serializer<::FlexFlow::MappedOperatorTaskGroup>::to_json(
json &j, ::FlexFlow::MappedOperatorTaskGroup const &t) {
j = t.get_shard_bindings();
}

} // namespace nlohmann
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,27 @@ ParallelLayerAddedResult pcg_add_input_layer(ParallelComputationGraph &pcg,
});
}

ParallelLayerAddedResult
pcg_add_input_layer_with_grad(ParallelComputationGraph &pcg,
TensorShape const &tensor_shape) {
ParallelLayerAttrs layer_attrs = ParallelLayerAttrs{
/*op_attrs=*/PCGOperatorAttrs{InputAttrs{tensor_shape}},
/*name=*/std::nullopt,
};

return add_parallel_layer(/*pcg=*/pcg,
/*layer_attrs=*/layer_attrs,
/*inputs=*/{},
/*weights=*/{},
/*output_flags=*/
std::unordered_map<TensorSlotName, CreateGrad>{
{
TensorSlotName::OUTPUT,
CreateGrad::YES,
},
});
}

OperatorTaskSpace get_operator_task_space(ParallelComputationGraph const &pcg,
parallel_layer_guid_t const &layer) {
PCGOperatorAttrs op_attrs = pcg_get_op_attrs(pcg, layer);
Expand Down Expand Up @@ -212,6 +233,16 @@ std::unordered_set<parallel_layer_guid_t>
[](Node const &n) { return parallel_layer_guid_t{n}; });
}

std::unordered_map<TensorSlotName, parallel_tensor_guid_t>
get_outgoing_tensors(ParallelComputationGraph const &pcg,
parallel_layer_guid_t const &l) {
return map_values(get_outgoing_kwarg_dataflow_outputs_for_node(
pcg.raw_graph, l.raw_graph_node),
[](KwargDataflowOutput<TensorSlotName> const &o) {
return parallel_tensor_guid_t{o};
});
}

std::unordered_map<TensorSlotName, parallel_tensor_guid_t>
get_incoming_tensors(ParallelComputationGraph const &pcg,
parallel_layer_guid_t const &l) {
Expand Down Expand Up @@ -378,6 +409,17 @@ std::vector<parallel_layer_guid_t>
[](Node const &n) { return parallel_layer_guid_t{n}; });
}

std::unordered_map<parallel_layer_guid_t, ParallelLayerAttrs>
get_parallel_layer_attrs_mapping(ParallelComputationGraph const &pcg) {
std::unordered_map<parallel_layer_guid_t, ParallelLayerAttrs>
layer_attrs_mapping;
for (parallel_layer_guid_t const &layer_guid : get_parallel_layers(pcg)) {
layer_attrs_mapping.insert(
{layer_guid, get_parallel_layer_attrs(pcg, layer_guid)});
}
return layer_attrs_mapping;
}

parallel_layer_guid_t
get_parallel_layer_by_name(ParallelComputationGraph const &pcg,
std::string const &name) {
Expand Down