
nvme_driver: split the mesh channel to the queue handler between commands and requests (#2708) #2723

Merged
mattkur merged 1 commit into microsoft:release/1.7.2511 from gurasinghMS:cherrypick/release/1.7.2511/pr-2708 on Feb 2, 2026

Conversation

@gurasinghMS

Clean cherry pick of PR #2708

Currently, in QueueHandler::poll_fn, we register a waker with the receiving channel only when there is space available in the submission queue (SQ). This channel multiplexes two kinds of traffic: requests that write to the submission queue, and control‑plane requests for the handler.
When the submission queue is full, poll_fn does not register a waker with recv. Instead, it relies entirely on incoming completions and interrupts to wake the task. As a result, if completions are slow, not only is command submission affected, but control‑plane operations such as Save, Inspect, and NextAen are also delayed—even though they are unrelated to SQ writes.
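The stall described above can be modeled in a few lines. This is only a simplified, synchronous sketch: `OldHandler`, `Msg`, `poll_once`, and `complete` are hypothetical names invented for illustration, and a `VecDeque` stands in for the mesh channel and its waker registration.

```rust
use std::collections::VecDeque;

// Hypothetical message type: one channel carries both kinds of traffic.
#[derive(Debug, PartialEq)]
enum Msg {
    Command(u16), // would write a submission queue entry
    Save,         // control-plane, unrelated to the SQ
}

// Simplified model of the pre-change loop (all names are illustrative).
struct OldHandler {
    recv: VecDeque<Msg>,
    sq_free: usize,
}

impl OldHandler {
    // Returns the next message, or None when the loop would go back to sleep.
    fn poll_once(&mut self) -> Option<Msg> {
        if self.sq_free == 0 {
            // SQ full: the shared channel is not polled at all, so even a
            // queued Save stays put until a completion frees a slot.
            return None;
        }
        let msg = self.recv.pop_front()?;
        if matches!(msg, Msg::Command(_)) {
            self.sq_free -= 1;
        }
        Some(msg)
    }

    // A completion frees one SQ slot.
    fn complete(&mut self) {
        self.sq_free += 1;
    }
}
```

In this model a `Save` queued while `sq_free` is 0 is only returned after `complete()` runs, which mirrors the dependency on slow completions.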
With this change, the run loop now listens on three separate channels:

  1. A command channel for requests that write to the submission queue
  2. A control‑plane channel for operations like Save and Inspect
  3. A completion channel

This separation ensures that control‑plane operations can continue to make progress even when the submission queue is full and completions are slow. In particular, Save() should no longer be blocked by SQ backpressure.
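A minimal sketch of the split, under the same simplifying assumptions: queues are plain `VecDeque`s rather than mesh channels, and `SplitHandler`, `Event`, and `poll_once` are hypothetical names. The order in which the three sources are checked here is illustrative and may not match the actual run loop.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Event {
    Request(&'static str), // control-plane, e.g. "save"
    Completion(u16),       // id of a command that finished
    Command(u16),          // id of a command to submit
}

// Illustrative model of a loop with three separate sources.
struct SplitHandler {
    reqs: VecDeque<&'static str>,
    completions: VecDeque<u16>,
    cmds: VecDeque<u16>,
    sq_free: usize,
}

impl SplitHandler {
    fn poll_once(&mut self) -> Option<Event> {
        // Control-plane requests are always polled, regardless of SQ state.
        if let Some(req) = self.reqs.pop_front() {
            return Some(Event::Request(req));
        }
        // Each completion frees one SQ slot.
        if let Some(cid) = self.completions.pop_front() {
            self.sq_free += 1;
            return Some(Event::Completion(cid));
        }
        // Commands are only taken when the SQ has room.
        if self.sq_free > 0 {
            if let Some(cid) = self.cmds.pop_front() {
                self.sq_free -= 1;
                return Some(Event::Command(cid));
            }
        }
        None
    }
}
```

With `sq_free` at 0, a queued `"save"` request is still returned immediately, while a queued command waits for a completion.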
Note: I am still working on a VMM test to validate behavior when the submission queue is full.

…ands and requests (microsoft#2708)

(cherry picked from commit 86a1cba)
Copilot AI review requested due to automatic review settings January 31, 2026 01:04
@gurasinghMS gurasinghMS requested review from a team as code owners January 31, 2026 01:04
@github-actions github-actions bot added the release_1.7.2511 Targets the release/1.7.2511 branch. label Jan 31, 2026

Copilot AI left a comment

Pull request overview

This PR refactors the NVMe queue pair’s internal messaging so that control‑plane operations (like Save and Inspect) are no longer backpressured by a full submission queue or slow completions, improving responsiveness under load.

Changes:

  • Splits the previous single mesh channel between QueuePair and QueueHandler into two channels: one for control‑plane requests (Req) and one for data‑plane commands (Cmd), updating QueueHandlerLoop and Issuer accordingly.
  • Refactors QueueHandler::run to poll the command channel only when there is SQ and pending‑command capacity, while always polling the control‑plane channel and giving it priority over completions.
  • Updates inspection wiring (#[inspect(flatten, with = ...)]) and save/restore paths to use the new control‑plane channel (send_req / recv_req) without changing external APIs.
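As a rough illustration of the first bullet, the two-channel wiring might look like the following. This sketch swaps in `std::sync::mpsc` for the mesh channels, and the struct layouts, `new_pair`, `send_cmd`, `recv_cmd`, and `take_requests` are assumptions made for the example rather than the driver's actual definitions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Control-plane requests (illustrative subset).
#[derive(Debug, PartialEq)]
enum Req {
    Save,
    Inspect,
}

// Data-plane command (illustrative: only a command id).
#[derive(Debug, PartialEq)]
struct Cmd {
    cid: u16,
}

// Data-plane side: can only send commands.
struct Issuer {
    send_cmd: Sender<Cmd>,
}

// Owner side: keeps the control-plane sender next to the issuer.
struct QueuePair {
    issuer: Issuer,
    send_req: Sender<Req>,
}

// Handler side: one receiver per kind of traffic.
struct HandlerLoop {
    recv_req: Receiver<Req>,
    recv_cmd: Receiver<Cmd>,
}

fn new_pair() -> (QueuePair, HandlerLoop) {
    let (send_req, recv_req) = channel();
    let (send_cmd, recv_cmd) = channel();
    (
        QueuePair { issuer: Issuer { send_cmd }, send_req },
        HandlerLoop { recv_req, recv_cmd },
    )
}

impl HandlerLoop {
    // Drain pending control-plane requests without touching commands.
    fn take_requests(&self) -> Vec<Req> {
        self.recv_req.try_iter().collect()
    }
}
```

Because the receivers are separate, the handler can service `Save` and `Inspect` while leaving queued commands unread until the SQ has capacity.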

let event = if !self.drain_after_restore {
    // Normal processing of the requests and completions.
    poll_fn(|cx| {
        // Look for NVME commands

Copilot AI Jan 31, 2026

Comment uses the acronym "NVME" while the rest of this module and NVMe spec consistently use "NVMe"; please fix the casing here for consistency (e.g., "NVMe commands").

Suggested change
- // Look for NVME commands
+ // Look for NVMe commands

@mattkur mattkur left a comment

Has been tested in production and in local vmm_tests (that change is still iterating in main).

@mattkur mattkur merged commit 88b4160 into microsoft:release/1.7.2511 Feb 2, 2026
58 checks passed