nvme_driver: split the mesh channel to the queue handler between commands and requests (#2708) by gurasinghMS · Pull Request #2723 · microsoft/openvmm

gurasinghMS · 2026-01-31T01:04:48Z

Clean cherry pick of PR #2708

Currently, in QueueHandler::poll_fn, we register a waker with the receiving channel only when there is space available in the submission queue (SQ). This channel multiplexes two kinds of traffic: requests that write to the submission queue, and control‑plane requests for the handler.
When the submission queue is full, poll_fn does not register a waker with recv. Instead, it relies entirely on incoming completions and interrupts to wake the task. As a result, if completions are slow, not only is command submission affected, but control‑plane operations such as Save, Inspect, and NextAen are also delayed—even though they are unrelated to SQ writes.
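A toy model of the stall described above, for illustration only. `Msg`, `SingleChannelHandler`, and `stalled_save_demo` are hypothetical names, and a `VecDeque` stands in for the mesh channel; none of this is the driver's actual code. It only shows that when the inbox is consulted solely while the SQ has room, a queued `Save` waits behind a completion it does not depend on.

```rust
use std::collections::VecDeque;

/// Hypothetical message type: one channel carries both kinds of traffic.
#[derive(Debug, PartialEq)]
enum Msg {
    /// Data-plane command that needs a submission queue slot.
    Command(u16),
    /// Control-plane request that never touches the SQ.
    Save,
}

/// Toy stand-in for the handler before this change (not the real type).
struct SingleChannelHandler {
    inbox: VecDeque<Msg>,
    in_flight: usize,
    sq_capacity: usize,
}

impl SingleChannelHandler {
    /// One pass of the loop: the inbox is consulted only when the SQ has room.
    fn poll_once(&mut self) -> Option<Msg> {
        if self.in_flight >= self.sq_capacity {
            // Full SQ: nothing is received, not even control-plane traffic.
            return None;
        }
        let msg = self.inbox.pop_front()?;
        if let Msg::Command(_) = msg {
            self.in_flight += 1;
        }
        Some(msg)
    }

    /// Models a completion arriving and freeing one SQ slot.
    fn complete_one(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}

/// Returns what the handler saw while the SQ was full and after a completion.
fn stalled_save_demo() -> (Option<Msg>, Option<Msg>) {
    let mut h = SingleChannelHandler {
        inbox: VecDeque::from([Msg::Command(1), Msg::Save]),
        in_flight: 0,
        sq_capacity: 1,
    };
    // The first command takes the only SQ slot.
    let first = h.poll_once();
    assert_eq!(first, Some(Msg::Command(1)));
    // Save is queued, but the full SQ keeps the handler from looking at it.
    let while_full = h.poll_once();
    h.complete_one();
    let after_completion = h.poll_once();
    (while_full, after_completion)
}
```

Calling `stalled_save_demo()` returns `None` for the pass made while the SQ is full, and `Save` only shows up after `complete_one()` frees a slot.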
With this change, the run loop now listens on three separate channels:

1. A command channel for requests that write to the submission queue
2. A control‑plane channel for operations like Save and Inspect
3. A completion channel

This separation ensures that control‑plane operations can continue to make progress even when the submission queue is full and completions are slow. In particular, Save() should no longer be blocked by SQ backpressure.
Note: I am still working on a VMM test to validate behavior when the submission queue is full.
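The three-channel arrangement can be sketched roughly as follows. This is a minimal model under stated assumptions: `std::sync::mpsc` stands in for the mesh channels, the `Cmd`, `Req`, and `Event` shapes are invented for the example, and the order in which the sources are checked (commands first when there is room, then control-plane requests, then completions) is one reading of the change rather than a confirmed detail.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Data-plane traffic (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Cmd {
    Io(u16),
}

/// Control-plane traffic (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Req {
    Save,
    Inspect,
}

#[derive(Debug, PartialEq)]
enum Event {
    Command(Cmd),
    Request(Req),
    Completion(u16),
}

/// Toy handler with one receiver per kind of traffic.
struct Handler {
    cmds: Receiver<Cmd>,
    reqs: Receiver<Req>,
    completions: Receiver<u16>,
    in_flight: usize,
    sq_capacity: usize,
}

impl Handler {
    /// One non-blocking pass over the three sources.
    fn next_event(&mut self) -> Option<Event> {
        // Commands are only taken while the SQ has a free slot.
        if self.in_flight < self.sq_capacity {
            if let Ok(cmd) = self.cmds.try_recv() {
                self.in_flight += 1;
                return Some(Event::Command(cmd));
            }
        }
        // Control-plane requests are always checked, regardless of SQ state.
        if let Ok(req) = self.reqs.try_recv() {
            return Some(Event::Request(req));
        }
        if let Ok(cid) = self.completions.try_recv() {
            self.in_flight = self.in_flight.saturating_sub(1);
            return Some(Event::Completion(cid));
        }
        None
    }
}

/// Drives the handler with a one-slot SQ and records each pass.
fn split_channel_demo() -> Vec<Option<Event>> {
    let (cmd_tx, cmds) = channel();
    let (req_tx, reqs) = channel();
    let (cpl_tx, completions) = channel();
    let mut h = Handler { cmds, reqs, completions, in_flight: 0, sq_capacity: 1 };

    cmd_tx.send(Cmd::Io(1)).unwrap();
    cmd_tx.send(Cmd::Io(2)).unwrap();
    req_tx.send(Req::Save).unwrap();
    req_tx.send(Req::Inspect).unwrap();

    let mut seen = Vec::new();
    for _ in 0..4 {
        seen.push(h.next_event());
    }
    // A completion frees the slot, so the second command can go through.
    cpl_tx.send(1u16).unwrap();
    for _ in 0..2 {
        seen.push(h.next_event());
    }
    seen
}
```

In this model `Save` and `Inspect` are served while `Io(2)` is still waiting for an SQ slot, which is the property the change is after.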

…ands and requests (microsoft#2708)

(cherry picked from commit 86a1cba)

Copilot

Pull request overview

This PR refactors the NVMe queue pair’s internal messaging so that control‑plane operations (like Save and Inspect) are no longer backpressured by a full submission queue or slow completions, improving responsiveness under load.

Changes:

Splits the previous single mesh channel between QueuePair and QueueHandler into two channels: one for control‑plane requests (Req) and one for data‑plane commands (Cmd), updating QueueHandlerLoop and Issuer accordingly.
Refactors QueueHandler::run to poll the command channel only when there is SQ and pending‑command capacity, while always polling the control‑plane channel and giving it priority over completions.
Updates inspection wiring (#[inspect(flatten, with = ...)]) and save/restore paths to use the new control‑plane channel (send_req / recv_req) without changing external APIs.
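To make the waker side of the second point concrete, here is a small sketch at the `Poll`/`Context` level. `Chan`, `Queues`, `NoopWake`, and `full_sq_demo` are hypothetical; `Chan` is a hand-rolled queue that only records the waker it was given, not a mesh receiver, and the real `QueueHandler::run` is likely structured differently. The point is only that a gated channel registers no waker while the SQ is full, whereas the control-plane channel always does.

```rust
use std::collections::VecDeque;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Minimal pollable queue standing in for a channel receiver (hypothetical).
struct Chan<T> {
    items: VecDeque<T>,
    /// Waker a real sender would call when a new item arrives.
    waker: Option<Waker>,
}

impl<T> Chan<T> {
    fn new() -> Self {
        Chan { items: VecDeque::new(), waker: None }
    }

    /// Returns an item if one is queued; otherwise registers the caller's waker.
    fn poll_recv(&mut self, cx: &mut Context<'_>) -> Poll<T> {
        match self.items.pop_front() {
            Some(item) => Poll::Ready(item),
            None => {
                self.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

#[derive(Debug, PartialEq)]
enum Event {
    Command(u16),
    Request(&'static str),
    Completion(u16),
}

struct Queues {
    cmds: Chan<u16>,
    reqs: Chan<&'static str>,
    completions: Chan<u16>,
    sq_has_room: bool,
}

impl Queues {
    fn poll_event(&mut self, cx: &mut Context<'_>) -> Poll<Event> {
        // Look for NVMe commands, but only when they could be submitted.
        if self.sq_has_room {
            if let Poll::Ready(c) = self.cmds.poll_recv(cx) {
                return Poll::Ready(Event::Command(c));
            }
        }
        // Control-plane requests are polled unconditionally, ahead of completions.
        if let Poll::Ready(r) = self.reqs.poll_recv(cx) {
            return Poll::Ready(Event::Request(r));
        }
        if let Poll::Ready(c) = self.completions.poll_recv(cx) {
            return Poll::Ready(Event::Completion(c));
        }
        Poll::Pending
    }
}

/// Waker that does nothing; enough to build a `Context` for the demo.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls with a full SQ and reports which channels registered a waker.
fn full_sq_demo() -> (bool, bool, Poll<Event>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut q = Queues {
        cmds: Chan::new(),
        reqs: Chan::new(),
        completions: Chan::new(),
        sq_has_room: false,
    };
    q.cmds.items.push_back(7); // a command is waiting, but the SQ is full

    let first = q.poll_event(&mut cx);
    assert!(first.is_pending());
    let req_waker_registered = q.reqs.waker.is_some();
    let cmd_waker_registered = q.cmds.waker.is_some();

    // A control-plane request arrives while the SQ is still full.
    q.reqs.items.push_back("save");
    let second = q.poll_event(&mut cx);
    (req_waker_registered, cmd_waker_registered, second)
}
```

With the SQ full, the first poll leaves a waker on the request channel and none on the command channel, and the second poll returns the request even though the queued command is still waiting.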

Copilot · 2026-01-31T01:07:26Z

vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs

            let event = if !self.drain_after_restore {
                // Normal processing of the requests and completions.
                poll_fn(|cx| {
+                    // Look for NVME commands


The comment uses the acronym "NVME", while the rest of this module and the NVMe spec consistently use "NVMe"; please fix the casing here for consistency (e.g., "NVMe commands").

Suggested change

- // Look for NVME commands
+ // Look for NVMe commands

mattkur

Has been tested in production and in local vmm_tests (that change is still iterating in main).

Copilot AI review requested due to automatic review settings January 31, 2026 01:04

gurasinghMS requested review from a team as code owners January 31, 2026 01:04

github-actions bot added the release_1.7.2511 Targets the release/1.7.2511 branch. label Jan 31, 2026

Copilot started reviewing on behalf of gurasinghMS January 31, 2026 01:05

Copilot AI reviewed Jan 31, 2026

mattkur approved these changes Feb 2, 2026

mattkur merged commit 88b4160 into microsoft:release/1.7.2511 Feb 2, 2026
58 checks passed

benhillis mentioned this pull request Feb 3, 2026

nvme_driver: split the mesh channel to the queue handler between commands and requests #2708

Merged

mattkur merged 1 commit into microsoft:release/1.7.2511 from gurasinghMS:cherrypick/release/1.7.2511/pr-2708
