
[Feature Request] Native ONNX Runtime support (mlserver-onnx) #2360

@Snomaan6846

Description


MLServer currently provides out-of-the-box support for several frameworks like Scikit-Learn, XGBoost, and LightGBM. However, it lacks a dedicated, native runtime for ONNX.

As ONNX is the industry standard for model interoperability, adding a first-class mlserver-onnx runtime would eliminate the need for users to write custom Python wrappers for every deployment. This would streamline the path from training (in PyTorch, TensorFlow, etc.) to production serving via MLServer.

Proposed Requirements

  • Dedicated Runtime: A new mlserver-onnx package that implements the MLModel interface.
  • Metadata Auto-discovery: The runtime should automatically parse the .onnx file to infer input/output names, shapes, and types, reducing manual configuration in model-settings.json.
  • Execution Providers: Support for hardware acceleration (e.g., CUDAExecutionProvider, OpenVINOExecutionProvider) through the parameters field.
  • Standardized Data Handling: Optimized mapping between MLServer's InferenceRequest and ONNX Runtime's tensor format.

Proposed Configuration Example

The user experience should be as simple as defining the implementation in model-settings.json:

{
  "name": "resnet50-onnx",
  "implementation": "mlserver_onnx.ONNXModel",
  "parameters": {
    "uri": "./model.onnx",
    "extra": {
      "execution_providers": ["CUDAExecutionProvider", "CPUExecutionProvider"]
    }
  }
}
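With that configuration, clients would reach the model through MLServer's standard V2 (Open Inference Protocol) endpoints. The snippet below builds an illustrative request body; the input name and shape are placeholders for a ResNet-50-style model, and with metadata auto-discovery they would instead be read from the .onnx graph:

```python
import json

# Illustrative V2 inference request for the "resnet50-onnx" model above.
# Tensor name and shape are placeholders, not values taken from a real model.
payload = {
    "inputs": [
        {
            "name": "input",
            "shape": [1, 3, 224, 224],
            "datatype": "FP32",
            # Flat row-major data, as required by the V2 protocol.
            "data": [0.0] * (1 * 3 * 224 * 224),
        }
    ]
}

# This body would be POSTed to: /v2/models/resnet50-onnx/infer
body = json.dumps(payload)
```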
