Description
MLServer currently provides out-of-the-box support for several frameworks like Scikit-Learn, XGBoost, and LightGBM. However, it lacks a dedicated, native runtime for ONNX.
As ONNX is the industry standard for model interoperability, adding a first-class mlserver-onnx runtime would eliminate the need for users to write custom Python wrappers for every deployment. This would streamline the path from training (in PyTorch, TensorFlow, etc.) to production serving via MLServer.
Proposed Requirements
- Dedicated Runtime: A new `mlserver-onnx` package that implements the `MLModel` interface.
- Metadata Auto-discovery: The runtime should automatically parse the `.onnx` file to infer input/output names, shapes, and types, reducing manual configuration in `model-settings.json`.
- Execution Providers: Support for hardware acceleration (e.g., `CUDAExecutionProvider`, `OpenVINOExecutionProvider`) through the `parameters` field.
- Standardized Data Handling: Optimized mapping between MLServer's `InferenceRequest` and ONNX Runtime's tensor format.
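The data-handling point could be sketched roughly as below. This is a minimal illustration only, assuming a V2-inference-protocol-style payload (flat `data` plus `shape`); the helper name `request_to_ort_inputs` and the explicit dtype table are hypothetical, and a real runtime would likely reuse MLServer's existing codecs instead:

```python
import numpy as np

# Hypothetical mapping from V2 protocol datatypes to numpy dtypes.
V2_TO_NUMPY = {
    "FP32": np.float32, "FP64": np.float64,
    "INT32": np.int32, "INT64": np.int64,
    "UINT8": np.uint8, "BOOL": np.bool_,
}

def request_to_ort_inputs(inputs):
    """Convert V2-style request inputs into the {name: ndarray} feed
    dict that an ONNX Runtime session's run() call expects."""
    feed = {}
    for tensor in inputs:
        dtype = V2_TO_NUMPY[tensor["datatype"]]
        arr = np.array(tensor["data"], dtype=dtype).reshape(tensor["shape"])
        feed[tensor["name"]] = arr
    return feed

# Example request fragment: flat data plus an explicit shape,
# as in the V2 inference protocol.
req_inputs = [
    {"name": "input", "datatype": "FP32",
     "shape": [1, 4], "data": [0.1, 0.2, 0.3, 0.4]},
]
feed = request_to_ort_inputs(req_inputs)
```

The reverse direction (ONNX Runtime outputs back into an `InferenceResponse`) would follow the same dtype table in the opposite direction.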
Proposed Configuration Example
The user experience should be as simple as defining the implementation in `model-settings.json`:

```json
{
  "name": "resnet50-onnx",
  "implementation": "mlserver_onnx.ONNXModel",
  "parameters": {
    "uri": "./model.onnx",
    "extra": {
      "execution_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"]
    }
  }
}
```
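At load time, the runtime could translate that `extra` block into the providers passed to ONNX Runtime. A small sketch of that selection logic, with `resolve_providers` as a hypothetical helper (in real code, `available` would come from `onnxruntime.get_available_providers()`):

```python
def resolve_providers(model_settings, available):
    # Read the proposed 'execution_providers' list from the
    # 'extra' parameters of model-settings.json.
    requested = (model_settings.get("parameters", {})
                               .get("extra", {})
                               .get("execution_providers", []))
    # Keep only providers the host actually supports, in the requested
    # order, and always fall back to CPU so the model can still load.
    chosen = [p for p in requested if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# Settings mirroring the JSON example above.
settings = {
    "name": "resnet50-onnx",
    "implementation": "mlserver_onnx.ONNXModel",
    "parameters": {
        "uri": "./model.onnx",
        "extra": {
            "execution_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"]
        }
    },
}

# CPU-only host: CUDA is silently dropped.
print(resolve_providers(settings, ["CPUExecutionProvider"]))
# -> ['CPUExecutionProvider']

# GPU host: requested order is preserved.
print(resolve_providers(settings, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
# -> ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

The resulting list would then be handed to `onnxruntime.InferenceSession(uri, providers=...)` in the runtime's `load()` method.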