fix: prevent 2x response duplication when cursor-agent emits partial …#43
fix: prevent 2x response duplication when cursor-agent emits partial …#43Nomadcxx merged 1 commit intoNomadcxx:mainfrom
Conversation
…delta events cursor-agent --stream-partial-output emits two kinds of assistant text events: partial deltas (with timestamp_ms) containing only new text fragments, and a final accumulated event (without timestamp_ms) containing the complete text. DeltaTracker expects accumulated text and diffs against previous state, but partial deltas broke its diffing logic — causing the final accumulated event to re-emit the entire response. Discriminate between partial and accumulated events using the timestamp_ms field: partial events are emitted directly as deltas, and the final accumulated event is skipped when partials were seen. Backward-compatible with accumulated-only format (existing test fixtures).
There was a problem hiding this comment.
Pull request overview
This PR updates the streaming converters to correctly handle cursor-agent’s --stream-partial-output format, where assistant output may arrive as timestamped partial delta events followed by a final accumulated event, preventing the final accumulated event from being re-emitted and duplicating content.
Changes:
- Add “partial vs accumulated” discrimination using
timestamp_msfor assistant text and thinking paths. - Skip the final accumulated assistant/thinking event when partials were previously observed to prevent 2x duplication.
- Add unit tests covering partial-then-final scenarios plus accumulated-only (legacy) behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/streaming/openai-sse.ts |
Treat timestamped assistant events as deltas and skip accumulated follow-up after partials to prevent duplication in SSE output. |
src/streaming/ai-sdk-parts.ts |
Apply the same partial/accumulated handling to AI SDK stream parts output. |
tests/unit/streaming/openai-sse.test.ts |
Add regression tests ensuring SSE output doesn’t duplicate with partial+final patterns and still works in accumulated-only mode. |
tests/unit/streaming/ai-sdk-parts.test.ts |
Add equivalent regression tests for AI SDK stream parts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (isThinking(event)) { | ||
| const isPartial = typeof (event as any).timestamp_ms === "number"; | ||
| if (isPartial) { | ||
| this.sawThinkingPartials = true; | ||
| const text = extractThinking(event); | ||
| return text ? [{ type: "text-delta", textDelta: text }] : []; | ||
| } | ||
| if (this.sawThinkingPartials) { | ||
| return []; | ||
| } | ||
| const delta = this.tracker.nextThinking(extractThinking(event)); | ||
| return delta ? [{ type: "text-delta", textDelta: delta }] : []; | ||
| } |
There was a problem hiding this comment.
Same concern as openai-sse: isThinking(event) includes type:"thinking" events that typically have timestamp_ms. Treating all timestamp_ms thinking events as partials bypasses DeltaTracker and may duplicate output if those events are accumulated updates. Consider distinguishing assistant-message partials from real thinking events (or otherwise encode the actual semantics) and add a test case for type:"thinking" events that include timestamp_ms to prevent regressions.
| if (isAssistantText(event)) { | ||
| const isPartial = typeof (event as any).timestamp_ms === "number"; | ||
| if (isPartial) { | ||
| this.sawAssistantPartials = true; | ||
| const text = extractText(event); | ||
| return text ? [{ type: "text-delta", textDelta: text }] : []; | ||
| } |
There was a problem hiding this comment.
typeof (event as any).timestamp_ms === "number" introduces any-casting and makes it easy to drift from the declared stream-json types. Prefer widening StreamJsonAssistantEvent to allow timestamp_ms?: number (or using a typed predicate) so partial vs accumulated detection is type-safe and callers/tests don’t need as any.
| if (isThinking(event)) { | ||
| const isPartial = typeof (event as any).timestamp_ms === "number"; | ||
| if (isPartial) { | ||
| this.sawThinkingPartials = true; | ||
| const text = extractThinking(event); | ||
| return text ? [this.chunkWith({ reasoning_content: text })] : []; | ||
| } | ||
| if (this.sawThinkingPartials) { | ||
| return []; | ||
| } | ||
| const delta = this.tracker.nextThinking(extractThinking(event)); | ||
| return delta ? [this.chunkWith({ reasoning_content: delta })] : []; |
There was a problem hiding this comment.
isThinking(event) includes real type: "thinking" events which (per StreamJsonThinkingEvent) commonly carry timestamp_ms. With the current heuristic, those will bypass DeltaTracker and emit extractThinking(event) verbatim, which will re-emit the full accumulated text on each update if the thinking text field is accumulated (as assumed by the existing thinking-delta tests). Consider narrowing the timestamp_ms/partial handling to assistant message events only, or otherwise confirming/encoding whether type:"thinking" events are delta vs accumulated, and add a unit test that includes timestamp_ms on type:"thinking" events to lock in the intended behavior.
| if (isAssistantText(event)) { | ||
| const isPartial = typeof (event as any).timestamp_ms === "number"; | ||
| if (isPartial) { | ||
| this.sawAssistantPartials = true; | ||
| const text = extractText(event); | ||
| return text ? [this.chunkWith({ content: text })] : []; | ||
| } |
There was a problem hiding this comment.
Using typeof (event as any).timestamp_ms === "number" weakens type-safety and forces tests/callers to cast to any for assistant events that include timestamp_ms. Prefer updating StreamJsonAssistantEvent to include timestamp_ms?: number (or introducing a small type guard like hasTimestampMs(event): event is StreamJsonAssistantEvent & { timestamp_ms: number }) so this logic can stay fully typed without any casts.
|
Thank mate, this fixes the doubled output issue when streaming long responses. Basically cursor-agent sends the text in chunks, then sends the whole thing again at the end. We were showing both, so users saw everything twice. Your fix detects the chunks vs the final message and skips the duplicate. Works great, merging now. |
…delta events
cursor-agent --stream-partial-output emits two kinds of assistant text events: partial deltas (with timestamp_ms) containing only new text fragments, and a final accumulated event (without timestamp_ms) containing the complete text.
DeltaTracker expects accumulated text and diffs against previous state, but partial deltas broke its diffing logic — causing the final accumulated event to re-emit the entire response.
Discriminate between partial and accumulated events using the timestamp_ms field: partial events are emitted directly as deltas, and the final accumulated event is skipped when partials were seen. Backward-compatible with accumulated-only format (existing test fixtures).