
Fix proxy hang on kubectl attach EOF #3584

Open
rcdailey wants to merge 1 commit into stacklok:main from rcdailey:fix-spdy-eof-hang



@rcdailey

@rcdailey rcdailey commented Feb 4, 2026

Summary

Fixes #3583 - Proxy does not exit on EOF from kubectl attach SPDY connection.

PR #3183 added exit-on-failure logic for retry exhaustion, but EOF errors from k8s.io/client-go SPDY code bypassed this logic, leaving the proxy in a zombie state where HTTP health checks pass but MCP communication fails.

Changes

  1. Fresh SPDY executor per retry - The executor was created once and reused across retries. When the SPDY connection fails with EOF, the executor enters a corrupted state. Now each retry creates a fresh executor.

  2. Pipe closure on goroutine exit - Added deferred close of stdout/stdin pipes when the attach goroutine exits, ensuring the transport layer sees EOF and can trigger re-attachment or exit.

Validation

Deployed to production cluster and verified:

  • Proxy recovered from zombie state (104 consecutive failures → healthy)
  • MCP tool calls successful after fix
  • No unnecessary restarts during idle periods
  • Clean attachment in logs

Related

@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Feb 4, 2026
@JAORMX
Collaborator

JAORMX commented Feb 4, 2026

The change LGTM. There are some linting issues which must be addressed first. Thanks a lot for your contribution!

@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.52%. Comparing base (9c75769) to head (6219b10).
⚠️ Report is 1 commit behind head on main.

Files with missing lines              Patch %   Lines
pkg/container/kubernetes/client.go    0.00%     12 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3584   +/-   ##
=======================================
  Coverage   65.52%   65.52%           
=======================================
  Files         404      404           
  Lines       39781    39787    +6     
=======================================
+ Hits        26068    26072    +4     
- Misses      11697    11700    +3     
+ Partials     2016     2015    -1     


The proxy could hang indefinitely when kubectl attach terminated
its SPDY connection. This occurred because:

- SPDY executor retained corrupted state across retry attempts
- Pipe cleanup wasn't signaling EOF to the transport layer

Create fresh SPDY executor per retry to avoid state corruption.
Close pipes when attach goroutine exits to propagate EOF signal,
allowing the transport layer to detect the failure and trigger
re-attachment or exit.

Fixes: stacklok#3583
Signed-off-by: Robert Dailey <[email protected]>
@rcdailey
Author

rcdailey commented Feb 4, 2026

Please hold off on merging; I found an issue. Sorry for the inconvenience. I'm keeping this running in my cluster and fixing issues as I observe them. I just pushed a change, but let me sit on this for another day.

Also the lint failure appears to be a network timeout outside of my control: https://github.com/stacklok/toolhive/actions/runs/21656377704/job/62442458301?pr=3584#step:6:42

@rcdailey rcdailey marked this pull request as draft February 4, 2026 14:32
@JAORMX
Collaborator

JAORMX commented Feb 4, 2026

@rcdailey thanks for the diligence! Let me know when it's ready

@rcdailey rcdailey marked this pull request as ready for review February 4, 2026 23:52
@rcdailey
Author

rcdailey commented Feb 4, 2026

@JAORMX thanks for your patience while I made sure my proposed changes were stable by running them in my homelab Kubernetes cluster. These services have been running for about 9 hours.

So far, no restarts (I had over 100 before) and I've been actively using searxng and context7 throughout the day. Everything appears to be running well so far. I've moved the PR out of draft and I believe it's ready for merge, pending any feedback you have.


Labels

size/XS Extra small PR: < 100 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proxy does not exit on EOF from kubectl attach SPDY connection

2 participants