Conversation
📝 Walkthrough
This pull request updates GPU configuration and build settings, modifies CUDA header handling for runtime warp size detection, and introduces a new proof-of-concept for CUDA-accelerated Python integration using Numba JIT compilation with pybind11 bindings.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@coderabbitai are you set up for this repo?
Tip: For best results, initiate chat on the files or code changes.
Would you like me to:
Just let me know how I can assist! 🐰
🧠 Learnings used
Tip: For best results, initiate chat on the files or code changes.
How can I help you with this PR? Would you like me to:
Let me know what you need! 🐰
🧠 Learnings used
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@inc/mkn/gpu/cuda.hpp`:
- Around line 84-95: Make getWarpSize inline to avoid ODR violations when
included from headers, and add proper CUDA error handling: in the inline
std::uint32_t getWarpSize(size_t dev = 0) function initialize a cudaDeviceProp
variable before use, call cudaGetDeviceProperties and check its return value
(use the project's MKN_GPU_ASSERT macro or return a safe default like 32 /
throw/abort on failure), and only access devProp.warpSize on success; also
ensure the static std::uint32_t const warp_size = getWarpSize(); usage remains
valid after these safeguards.
In `@res/mkn/clang_cuda.yaml`:
- Around line 18-31: The compiler and linker entries currently invoke the
generic clang++ which can pick older toolchains; update the "compiler" and
"linker" fields so they explicitly call clang++-17 (i.e. replace uses of
"clang++" in the compiler: and linker: settings with "clang++-17") to ensure the
recommended clang 17 toolchain is used for the cxx_cuda build and CUDA SM_80
flags (references: compiler, linker, cxx_cuda).
In `@test/poc/jit/eg.py`:
- Around line 17-24: vadd is declared as a CUDA kernel but is invoked from
inside another kernel (vector_add_gpu), which Numba disallows; change vadd to a
device function by using `@cuda.jit(device=True)` instead of a kernel decorator so
it can be called from vector_add_gpu, keep the same signature (index i and
arrays a, b, c) and ensure vector_add_gpu calls vadd(...) as a regular device
function rather than launching it.
In `@test/poc/jit/pybind.cpp`:
- Around line 61-66: The pybind registration mistakenly repeats
.def_readwrite("a", &FunctionSupport::a) instead of exposing member c; update
the duplicate binding in the chain that registers FunctionSupport so it uses
.def_readwrite("c", &FunctionSupport::c) (keep the existing
.def("A",&FunctionSupport::A), .def_readwrite("b",&FunctionSupport::b) and
.def("C",&FunctionSupport::C) intact) so the struct's c member is correctly
exposed to Python.
- Around line 44-46: The make function currently takes an auto parameter and
returns a py::array_t<float> with a no-op capsule, which causes two problems:
'auto' in the parameter requires C++20 and the no-op capsule does not extend the
owning FunctionSupport lifetime (risking use-after-free). Change the signature
of make to accept an explicit float* (replace auto p with float* p) and when
constructing the py::array_t<float> pass py::cast(this) as the base object so
the Python wrapper for this FunctionSupport instance is held alive for the
array's lifetime, removing the no-op capsule.
🧹 Nitpick comments (1)
res/mkn/nvcc.yaml (1)
5-19: Consider making the CUDA arch explicit/configurable.
Without an arch flag, nvcc defaults can vary across environments and hurt reproducibility/perf. A `cuda_arch` property (or env override) keeps builds consistent.
Example (adjust to mkn interpolation rules):
```diff
 property:
-  cxx_flags: --std=c++20 -Xcompiler -fPIC # -fPIC -fsized-deallocation -Wno-unknown-cuda-version
-  cxx_cuda: -x cu --extended-lambda #-x cuda --cuda-gpu-arch="sm_80" -Xclang -fcuda-allow-variadic-functions
+  cuda_arch: sm_80
+  cxx_flags: --std=c++20 -Xcompiler -fPIC # -fPIC -fsized-deallocation -Wno-unknown-cuda-version
+  cxx_cuda: -x cu --extended-lambda --gpu-architecture="${cuda_arch}"
```
```diff
 std::uint32_t getWarpSize(size_t dev = 0) {
 #ifdef _MKN_GPU_WARP_SIZE_
-  static constexpr int warp_size = _MKN_GPU_WARP_SIZE_;
+  return _MKN_GPU_WARP_SIZE_;
 #else
-  static constexpr int warp_size = warpSize;
+  cudaDeviceProp devProp;
+  [[maybe_unused]] auto ret = cudaGetDeviceProperties(&devProp, dev);
+  return devProp.warpSize;
 #endif /*_MKN_GPU_WARP_SIZE_ */
 }

 static std::uint32_t const warp_size = getWarpSize();
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
cat -n inc/mkn/gpu/cuda.hpp | sed -n '80,100p'
rg "getWarpSize" --type cpp --type hpp -B 2 -A 8
rg "MKN_GPU_ASSERT" --type cpp --type hpp | head -20
rg "cudaGetDeviceProperties" --type cpp --type hpp -B 2 -A 3
rg "getWarpSize" -B 2 -A 2
rg "MKN_GPU_ASSERT" -B 1 -A 1 | head -40
git log --oneline --all -- inc/mkn/gpu/cuda.hpp | head -10
rg "define MKN_GPU_ASSERT" -A 10
git show 3e7997c --stat | head -20
git show 3e7997c -- inc/mkn/gpu/cuda.hpp | head -100
head -30 inc/mkn/gpu/cuda.hpp
grep -n "define _MKN_GPU_WARP_SIZE_" inc/mkn/gpu/cuda.hpp
find . -name "*.cpp" -o -name "*.cc" | head -20
grep -n "_MKN_GPU_WARP_SIZE_" inc/mkn/gpu/cuda.hpp
wc -l inc/mkn/gpu/cuda.hpp
sed -n '84,100p' inc/mkn/gpu/cuda.hpp
```
Make getWarpSize inline and add error handling for CUDA initialization.
Line 84 defines a non-inline function in a header file, creating an ODR violation when included across multiple translation units. Additionally, cudaDeviceProp devProp at line 88 is uninitialized, and if cudaGetDeviceProperties fails (e.g., in CPU-only setups), line 90 accesses uninitialized memory—undefined behavior. The error is silently ignored with [[maybe_unused]], inconsistent with the project's MKN_GPU_ASSERT pattern used throughout the codebase.
Suggested fix
```diff
-std::uint32_t getWarpSize(size_t dev = 0) {
+inline std::uint32_t getWarpSize(size_t dev = 0) {
 #ifdef _MKN_GPU_WARP_SIZE_
   return _MKN_GPU_WARP_SIZE_;
 #else
-  cudaDeviceProp devProp;
-  [[maybe_unused]] auto ret = cudaGetDeviceProperties(&devProp, dev);
-  return devProp.warpSize;
+  int count = 0;
+  if (cudaGetDeviceCount(&count) != cudaSuccess || count <= 0) {
+    return 32;  // TODO: choose a project-appropriate fallback
+  }
+  if (dev >= static_cast<size_t>(count)) dev = 0;
+  cudaDeviceProp devProp{};
+  MKN_GPU_ASSERT(cudaGetDeviceProperties(&devProp, static_cast<int>(dev)));
+  return devProp.warpSize;
 #endif /*_MKN_GPU_WARP_SIZE_ */
 }
```

🤖 Prompt for AI Agents
In `@inc/mkn/gpu/cuda.hpp` around lines 84 - 95, Make getWarpSize inline to avoid
ODR violations when included from headers, and add proper CUDA error handling:
in the inline std::uint32_t getWarpSize(size_t dev = 0) function initialize a
cudaDeviceProp variable before use, call cudaGetDeviceProperties and check its
return value (use the project's MKN_GPU_ASSERT macro or return a safe default
like 32 / throw/abort on failure), and only access devProp.warpSize on success;
also ensure the static std::uint32_t const warp_size = getWarpSize(); usage
remains valid after these safeguards.
What I'm trying to do is have a default warpSize, but allow for overrides, as ROCm headers can use warp size 64 while some AMD gaming GPUs only support 32.
It needs to work on host and device, and for clang, nvcc, and hipcc.
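roughly what I have in mind for the override path, sketched below (the -D plumbing through mkn properties is just an example, not something in this PR):

```cpp
// Hypothetical build lines for the compile-time override:
//   clang++ ... -D_MKN_GPU_WARP_SIZE_=32   // e.g. AMD RDNA gaming GPUs under ROCm
//   hipcc   ... -D_MKN_GPU_WARP_SIZE_=64   // e.g. CDNA, where the ROCm headers assume 64
#include <cstdint>

// illustrative helper, not part of the header
constexpr std::uint32_t warp_size_or(std::uint32_t fallback) {
#ifdef _MKN_GPU_WARP_SIZE_
  return _MKN_GPU_WARP_SIZE_;  // pinned at build time; usable in host and device code
#else
  return fallback;             // no override: the runtime query (getWarpSize) decides on host
#endif
}
```

with the override set, the value is a plain constant, so it also works where a runtime query isn't possible (inside device code).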
@PhilipDeegan, thanks for clarifying the design intent! Supporting runtime warp size detection with compile-time override across CUDA/ROCm platforms makes sense.
The inline keyword is still necessary to avoid ODR violations when this header is included from multiple translation units. However, I'll adjust my error handling recommendation given your cross-platform requirements:
Revised suggestions:
- Add `inline` keyword - required for ODR safety in headers
- Initialize `devProp` - use `cudaDeviceProp devProp{};` to avoid undefined behavior if the API call fails
- Graceful fallback - since this runs at static init time and needs to work across platforms, consider returning a safe default (32) if the device query fails, rather than aborting:
```cpp
inline std::uint32_t getWarpSize(size_t dev = 0) {
#ifdef _MKN_GPU_WARP_SIZE_
  return _MKN_GPU_WARP_SIZE_;
#else
  cudaDeviceProp devProp{};
  auto ret = cudaGetDeviceProperties(&devProp, dev);
  if (ret != cudaSuccess) {
    return 32;  // Safe fallback for gaming GPUs and when CUDA unavailable
  }
  return devProp.warpSize;
#endif
}
```

Question: When building for ROCm/HIP, does cudaGetDeviceProperties map to hipGetDeviceProperties? If so, the same fallback logic should work. If you need platform-specific handling, that would require conditional compilation.
Does this approach work for your use case?
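For reference, a sketch of what the HIP-side equivalent could look like, assuming a parallel hip.hpp header exists in the project (the fallback values below are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <hip/hip_runtime.h>

inline std::uint32_t getWarpSize(std::size_t dev = 0) {
#ifdef _MKN_GPU_WARP_SIZE_
  return _MKN_GPU_WARP_SIZE_;
#else
  hipDeviceProp_t devProp{};
  if (hipGetDeviceProperties(&devProp, static_cast<int>(dev)) != hipSuccess)
    return 64;               // conservative ROCm default when no device is visible
  return devProp.warpSize;   // 32 on RDNA gaming GPUs, 64 on CDNA
#endif
}
```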
✏️ Learnings added
Learnt from: PhilipDeegan
Repo: mkn/mkn.gpu PR: 30
File: inc/mkn/gpu/cuda.hpp:84-95
Timestamp: 2026-01-27T21:55:44.003Z
Learning: In `inc/mkn/gpu/cuda.hpp`, the `getWarpSize()` function is designed to provide a default warp size with compile-time override via `_MKN_GPU_WARP_SIZE_`, supporting both CUDA (warp size 32) and ROCm (warp size 64), and must work on host and device with clang, nvcc, and hipcc compilers.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: PhilipDeegan
Repo: mkn/mkn.gpu PR: 23
File: inc/mkn/gpu/cuda.hpp:307-311
Timestamp: 2024-12-08T14:07:35.137Z
Learning: In `inc/mkn/gpu/cuda.hpp`, the `global_d_kernel` function is intentionally left without bounds checking, intended for advanced use where the caller is responsible for ensuring proper grid and block dimensions, while `global_gd_kernel` includes bounds checking for safer operation.
Learnt from: PhilipDeegan
Repo: mkn/mkn.gpu PR: 19
File: test/any/async_streaming.cpp:2-6
Timestamp: 2024-06-10T19:38:12.663Z
Learning: PhilipDeegan, thank you for sharing your preferred style for organizing includes. I'll remember this for future reviews in your codebase.
(PhilipDeegan prefers to sort `#include` statements by line length first, and alphabetically second.)
```python
@cuda.jit
def vadd(i, a, b, c):
    c[i] = a[i] + b[i]


@cuda.jit
def vector_add_gpu(a, b, c):
    vadd(cuda.threadIdx.x, a, b, c)
```
🧩 Analysis chain
🌐 Web query:
Numba CUDA @cuda.jit device=True device function syntax kernel-to-kernel call
💡 Result:
Device function (@cuda.jit(device=True)) syntax and how to call it from a kernel
Define a device function with device=True, then call it like a normal Python function from inside a @cuda.jit kernel (or from another device function). Device functions can return values. [1]
```python
from numba import cuda
import math

@cuda.jit(device=True)
def f(x, y):
    return x + y

@cuda.jit
def kernel(a, b, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = f(a[i], b[i])  # device-function call
```

What you cannot do ("kernel-to-kernel call")
- You cannot launch a kernel from inside another kernel in Numba (CUDA dynamic parallelism is not supported). [2]
- A `@cuda.jit(device=True)` function is not callable from host (CPU) Python code; it must be called only from GPU code (a kernel/device function). [3]
Sources:
[1] Numba “Writing Device Functions” docs (stable) (numba.readthedocs.io)
[2] Numba-CUDA “Writing CUDA Kernels” (note on dynamic parallelism not supported) (nvidia.github.io)
[3] StackOverflow: device functions are only callable from kernels/device code (stackoverflow.com)
Citations:
- 1: https://numba.readthedocs.io/en/stable/cuda/device-functions.html?utm_source=openai
- 2: https://nvidia.github.io/numba-cuda/user/kernels.html?utm_source=openai
- 3: https://stackoverflow.com/questions/66405332/cuda-jitdevice-true-returns-devicefunctiontemplate-object-is-not-callable?utm_source=openai
Fix kernel-to-kernel call in Numba CUDA.
vadd is declared as a CUDA kernel but invoked from within another kernel (vector_add_gpu), which Numba does not support. Kernel-to-kernel calls will fail during JIT compilation. Change vadd to a device function using @cuda.jit(device=True).
🔧 Suggested fix (make vadd a device function)
```diff
-@cuda.jit
+@cuda.jit(device=True)
 def vadd(i, a, b, c):
     c[i] = a[i] + b[i]


 @cuda.jit
 def vector_add_gpu(a, b, c):
-    vadd(cuda.threadIdx.x, a, b, c)
+    i = cuda.threadIdx.x
+    vadd(i, a, b, c)
```

🤖 Prompt for AI Agents
In `@test/poc/jit/eg.py` around lines 17 - 24, vadd is declared as a CUDA kernel
but is invoked from inside another kernel (vector_add_gpu), which Numba
disallows; change vadd to a device function by using `@cuda.jit(device=True)`
instead of a kernel decorator so it can be called from vector_add_gpu, keep the
same signature (index i and arrays a, b, c) and ensure vector_add_gpu calls
vadd(...) as a regular device function rather than launching it.
it seems to be working for me locally
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```cpp
py::array_t<float> make(auto p) {
  return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
find . -name "pybind.cpp" -type f | head -20
cat -n ./test/poc/jit/pybind.cpp
```
🌐 Web query:
pybind11 py::array_t base parameter lifetime management reference_internal
💡 Result:
return_value_policy::reference_internal
reference_internal is only about C++ references/pointers returned from methods/properties: it keeps the implicit self (“parent”) alive as long as the returned object (“child”) is alive. Internally it’s reference + keep_alive<0,1> (i.e., keep argument 1 alive while the return value (0) is alive). [1]
So it does not manage the lifetime of a NumPy buffer unless the thing you’re returning is literally a C++ ref/pointer whose lifetime is tied to self.
py::array_t(..., base=...) (NumPy view lifetime)
When you construct a py::array / py::array_t from an existing data pointer, the last parameter (handle base) is the mechanism that ties the array’s lifetime to some Python object that owns (or keeps alive) the memory. In the constructor, if base && ptr is provided, NumPy will treat it as a non-owning/view array and keep a reference to base (so base can’t be GC’d while the array is alive). [2]
Signature (simplified):
py::array(dt, shape, strides, ptr, handle base = {}) [2]
Practical patterns
1) View into a C++ object’s internal buffer (keep self alive):
Return an array view and pass the Python self as base (so the C++ instance can’t die while the NumPy view exists). This is the “NumPy equivalent” of what people expect reference_internal to do, but it’s done via base, not via return value policy. [2]
2) Heap-owned buffer (free when array dies):
Allocate and pass a py::capsule as base with a deleter; NumPy keeps the capsule alive and the deleter runs when the array is freed. (Common idiom in pybind11.) [2][3]
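A minimal sketch of that heap-owned pattern (the function name and sizes below are illustrative):

```cpp
#include <cstddef>
#include <pybind11/numpy.h>
namespace py = pybind11;

py::array_t<float> make_owned_array(std::size_t n) {
  float* data = new float[n];             // buffer the returned array should own
  py::capsule owner(data, [](void* p) {   // deleter runs when the array is garbage-collected
    delete[] static_cast<float*>(p);
  });
  return py::array_t<float>({n}, {sizeof(float)}, data, owner);
}
```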
Key takeaway
- Use `reference_internal` for C++ ref/pointer returns tied to `self`. [1]
- Use `py::array_t(..., base=...)` to manage NumPy view lifetimes, typically by setting `base` to `self` or to a `py::capsule`/owner object. [2][3]
Sources: [1] pybind11 docs on reference_internal / keep_alive; [2] pybind11/numpy.h array constructor showing base handling; [3] explanation/example of using py::capsule as base to tie deallocation to array lifetime.
🌐 Web query:
C++ auto parameter in function auto p C++20 deduced parameter requirement
💡 Result:
In C++20, writing auto in a function parameter type makes the declaration an abbreviated function template: each auto becomes an invented template type parameter, and the parameter type is deduced from the call using normal template argument deduction rules. (en.cppreference.com)
What’s required / allowed
- `auto` must appear at the top level of the parameter's type (you may add `const`, `&`, `&&`, `*`, etc. around it): (en.cppreference.com)

```cpp
void f(auto x);         // template<class T> void f(T)
void g(const auto& x);  // template<class T> void g(const T&)
void h(auto* p);        // template<class T> void h(T*)
```

- You can constrain the deduced type with a concept:

```cpp
void f(std::integral auto x);  // template<std::integral T> void f(T)
```

If the constraint is false/ill-formed for the deduced `T`, deduction fails. (en.cppreference.com)

What's not allowed (common pitfall)

- In C++20, `auto` is not permitted inside a "compound" parameter type like `std::vector<auto>`; it's only allowed as a generic parameter type placeholder at the top level of the parameter declaration. (stackoverflow.com)
If you paste your exact signature (e.g., what you mean by “auto p”), I can restate the equivalent explicit template<...> form and point out exactly why it does/doesn’t compile.
Citations:
- 1: https://en.cppreference.com/w/cpp/language/function_template.html?utm_source=openai
- 2: https://en.cppreference.com/w/cpp/language/function_template.html?utm_source=openai
- 3: https://en.cppreference.com/w/cpp/language/auto?utm_source=openai
- 4: https://stackoverflow.com/questions/60358154/auto-as-a-template-argument-placeholder-for-a-function-parameter?utm_source=openai
🌐 Web query:
pybind11 py::cast(this) py::array_t base buffer lifetime keep alive
💡 Result:
To keep a py::array_t (that views C++ memory) from outliving the C++ object that owns the buffer, pass a Python “owner” object as the array’s base. NumPy will hold a reference to base, so the owner stays alive as long as the array does. (pybind11.readthedocs.io)
Pattern: make the array’s base be self (this)
Inside a bound method, you can tie the array lifetime to the Python wrapper for *this:
```cpp
#include <pybind11/numpy.h>
namespace py = pybind11;

py::array_t<float> MyObj::view() {
  auto base = py::cast(this, py::return_value_policy::reference);  // "self"
  return py::array_t<float>(
      {n},              // shape
      {sizeof(float)},  // strides (bytes)
      data_ptr,         // pointer into this object
      base              // base/owner => keeps self alive
  );
}
```

This is the NumPy equivalent of "keep self alive while the returned view exists": the array does not own the data; it just references it, and base prevents premature destruction. (pybind11.readthedocs.io)
Alternative: reference_internal / keep_alive<0,1>
If you’re returning a pointer/reference to internal data (not building a NumPy array yourself), return_value_policy::reference_internal applies the equivalent of keep_alive<0,1> (keep the implicit this alive for the return value’s lifetime). (pybind11.readthedocs.io)
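A minimal sketch of that alternative in a binding (the class and method names are placeholders, not from this PR):

```cpp
#include <pybind11/pybind11.h>
namespace py = pybind11;

struct Inner { int value = 0; };
struct Outer {
  Inner inner;
  Inner& get_inner() { return inner; }
};

PYBIND11_MODULE(example, m) {
  py::class_<Inner>(m, "Inner").def_readwrite("value", &Inner::value);
  py::class_<Outer>(m, "Outer")
      .def(py::init<>())
      // keep the Outer instance alive while the returned Inner reference is in use
      .def("get_inner", &Outer::get_inner, py::return_value_policy::reference_internal);
}
```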
When the buffer isn’t owned by this: use a capsule as base
If the memory should be freed when the array dies (or is owned by something other than this), use a py::capsule as the array base with a destructor callback. (exchangetuts.com)
Citations:
- 1: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html?utm_source=openai
- 2: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html?utm_source=openai
- 3: https://pybind11.readthedocs.io/en/latest/advanced/functions.html?utm_source=openai
- 4: https://exchangetuts.com/returning-numpy-arrays-via-pybind11-1639677247749392?utm_source=openai
Fix lifetime management and C++20 incompatibility in array return.
The no-op capsule does not keep the owning FunctionSupport alive; Python can hold the array after the destructor frees memory, causing use-after-free. Additionally, auto in function parameters requires C++20—without it, this won't compile.
Change to an explicit float* parameter and pass py::cast(this) as the array's base to tie its lifetime to the Python wrapper of this:
🔧 Suggested fix
```diff
-  py::array_t<float> make(auto p) {
-    return {{32}, {sizeof(float)}, p, py::capsule(p, [](void* f) { /* noop */ })};
-  }
+  py::array_t<float> make(float* p) {
+    return py::array_t<float>(
+        {32},
+        {sizeof(float)},
+        p,
+        py::cast(this));
+  }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
  py::array_t<float> make(float* p) {
    return py::array_t<float>(
        {32},
        {sizeof(float)},
        p,
        py::cast(this));
  }
```
🤖 Prompt for AI Agents
In `@test/poc/jit/pybind.cpp` around lines 44 - 46, The make function currently
takes an auto parameter and returns a py::array_t<float> with a no-op capsule,
which causes two problems: 'auto' in the parameter requires C++20 and the no-op
capsule does not extend the owning FunctionSupport lifetime (risking
use-after-free). Change the signature of make to accept an explicit float*
(replace auto p with float* p) and when constructing the py::array_t<float> pass
py::cast(this) as the base object so the Python wrapper for this FunctionSupport
instance is held alive for the array's lifetime, removing the no-op capsule.
this is a POC to show C++ side memory allocation/lifetimes, so not so important
@PhilipDeegan Understood! Since this is a POC for demonstrating memory allocation patterns, those details are fine as-is. Thanks for the context.
🧠 Learnings used
Learnt from: PhilipDeegan
Repo: mkn/mkn.gpu PR: 19
File: test/any/async_streaming.cpp:2-6
Timestamp: 2024-06-10T19:38:12.663Z
Learning: PhilipDeegan, thank you for sharing your preferred style for organizing includes. I'll remember this for future reviews in your codebase.
(PhilipDeegan prefers to sort `#include` statements by line length first, and alphabetically second.)
| .def_readwrite("a", &FunctionSupport::a) | ||
| .def("A", &FunctionSupport::A) | ||
| .def_readwrite("b", &FunctionSupport::b) | ||
| .def("B", &FunctionSupport::B) | ||
| .def_readwrite("a", &FunctionSupport::a) | ||
| .def("C", &FunctionSupport::C); |
🧩 Analysis chain
🏁 Scripts executed:

```shell
# First, let's look at the actual file content around lines 61-66
cat -n test/poc/jit/pybind.cpp | sed -n '55,75p'

# Let's also search for the FunctionSupport class definition
rg -A 20 'class FunctionSupport' test/poc/jit/

# Check if there are other references to binding "c" in the file
rg 'def.*"c"' test/poc/jit/pybind.cpp

# Search for FunctionSupport class definition in the entire repo
rg -l 'class FunctionSupport'

# Also look at the beginning of the pybind.cpp file to see includes and class definition
head -n 60 test/poc/jit/pybind.cpp
```
Duplicate `a` binding; it should expose `c` instead.
The FunctionSupport struct has three members (a, b, c) with corresponding methods (A(), B(), C()). Line 65 duplicates the a binding, preventing exposure of the c member variable that is allocated, used, and destroyed throughout the class lifecycle.
🔧 Suggested fix
.def("A", &FunctionSupport::A)
.def_readwrite("b", &FunctionSupport::b)
.def("B", &FunctionSupport::B)
- .def_readwrite("a", &FunctionSupport::a)
+ .def_readwrite("c", &FunctionSupport::c)
.def("C", &FunctionSupport::C);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| .def_readwrite("a", &FunctionSupport::a) | |
| .def("A", &FunctionSupport::A) | |
| .def_readwrite("b", &FunctionSupport::b) | |
| .def("B", &FunctionSupport::B) | |
| .def_readwrite("a", &FunctionSupport::a) | |
| .def("C", &FunctionSupport::C); | |
| .def_readwrite("a", &FunctionSupport::a) | |
| .def("A", &FunctionSupport::A) | |
| .def_readwrite("b", &FunctionSupport::b) | |
| .def("B", &FunctionSupport::B) | |
| .def_readwrite("c", &FunctionSupport::c) | |
| .def("C", &FunctionSupport::C); |
🤖 Prompt for AI Agents
In `@test/poc/jit/pybind.cpp` around lines 61 - 66, The pybind registration
mistakenly repeats .def_readwrite("a", &FunctionSupport::a) instead of exposing
member c; update the duplicate binding in the chain that registers
FunctionSupport so it uses .def_readwrite("c", &FunctionSupport::c) (keep the
existing .def("A",&FunctionSupport::A), .def_readwrite("b",&FunctionSupport::b)
and .def("C",&FunctionSupport::C) intact) so the struct's c member is correctly
exposed to Python.
Summary by CodeRabbit
New Features
Build & Infrastructure
Tests & Examples
Chores
✏️ Tip: You can customize this high-level summary in your review settings.