An AI-powered pipeline failure analysis agent that automatically detects, analyzes, and creates issues for failed Kubernetes/OpenShift pods using LLM tool calling via MCP (Model Context Protocol) servers.
The MCP GitOps project provides an intelligent agent that integrates with Tekton pipelines to automatically:
- Receive webhook notifications when pipelines fail
- Retrieve pod logs from OpenShift/Kubernetes
- Analyze failures using an LLM (via LiteLLM)
- Create detailed issues in Gitea with error summaries and potential solutions
The agent uses the Model Context Protocol (MCP) to communicate with external tool servers, enabling dynamic tool discovery and execution.
```
┌─────────────────┐    POST /report-failure    ┌─────────────────┐
│     Tekton      │ ──────────────────────────▶│    Pipeline     │
│    Pipeline     │                            │  Failure Agent  │
└─────────────────┘                            └────────┬────────┘
                                                        │
                  ┌─────────────────────────────────────┼────────────────────────┐
                  │                                     │                        │
                  ▼                                     ▼                        ▼
          ┌────────────────┐                    ┌────────────────┐       ┌────────────────┐
          │    LiteLLM     │                    │ MCP OpenShift  │       │   MCP Gitea    │
          │   (LLM API)    │                    │     Server     │       │     Server     │
          └────────────────┘                    └────────────────┘       └────────────────┘
                                                        │                        │
                                                        ▼                        ▼
                                                    Pod Logs              Issue Creation
```
- **HTTP Server** (`main.py`) - FastAPI server with a `/report-failure` endpoint accepting POST requests
- **MCP Client** (`mcp_client.py`) - Connects to external MCP servers to discover and call tools (a minimal sketch follows this list)
  - Supports SSE transport (for the OpenShift MCP server)
  - Supports streamable-http transport (for the Gitea MCP server)
- **Agent Loop** - Iterative LiteLLM completion loop that:
  - Gets available tools from MCP servers at startup
  - Forwards tool calls from the model to the appropriate MCP server
  - Returns results back to the model until completion
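Internally, tool discovery and invocation might look roughly like this minimal sketch using the official `mcp` Python SDK; the helper names here are illustrative, not the actual `mcp_client.py` API:

```python
# Hypothetical sketch of the MCP client using the official `mcp` Python SDK.
# The function names (list_tools_sse, call_tool_http) are illustrative only.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client
from mcp.client.streamable_http import streamablehttp_client


async def list_tools_sse(url: str) -> list:
    """Connect to an SSE-transport MCP server and list its tools."""
    async with sse_client(url) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return result.tools


async def call_tool_http(url: str, name: str, arguments: dict):
    """Call one tool on a streamable-http MCP server and return the result."""
    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(name, arguments=arguments)


if __name__ == "__main__":
    # Example: discover tools on the OpenShift MCP server (placeholder URL).
    tools = asyncio.run(list_tools_sse("http://mcp-openshift-server/sse"))
    print([tool.name for tool in tools])
```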
```
mcp-gitops/
├── agent/                  # Main Python agent application
│   ├── main.py             # FastAPI server and core agent logic
│   ├── mcp_client.py       # MCP server client implementation
│   ├── requirements.txt    # Python dependencies
│   ├── Containerfile       # Container build definition
│   ├── pytest.ini          # Pytest configuration
│   └── tests/              # Test suite
│       ├── conftest.py
│       ├── test_api.py
│       ├── test_main.py
│       └── test_mcp_client.py
│
└── helm/                   # Helm charts for Kubernetes deployment
    ├── agent/              # AI agent Helm chart
    ├── librechat/          # LibreChat Helm chart
    ├── mcp-gitea/          # Gitea MCP server Helm chart
    └── mcp-openshift/      # OpenShift MCP server Helm chart
```
- Python 3.11+
- OpenShift/Kubernetes cluster with Tekton pipelines
- Gitea instance for issue tracking
- LiteLLM-compatible AI model endpoint
- MCP servers for OpenShift and Gitea
```bash
# Navigate to the agent directory
cd agent

# Install Python dependencies
pip install -r requirements.txt

# Set required environment variables (see Configuration section)
export LITELLM_URL="http://your-litellm-endpoint"
export LITELLM_API_KEY="your-api-key"
export MCP_OPENSHIFT_URL="http://mcp-openshift-server/sse"
export MCP_GITEA_URL="http://mcp-gitea-server/mcp"
export GITEA_OWNER="your-org"
export GITEA_REPO="your-repo"

# Run the server
python main.py
```

```bash
cd agent

# Build with Podman
podman build -t pipeline-agent -f Containerfile .

# Or with Docker
docker build -t pipeline-agent -f Containerfile .

# Run the container
podman run -p 8000:8000 \
  -e LITELLM_URL="http://your-litellm-endpoint" \
  -e LITELLM_API_KEY="your-api-key" \
  -e MCP_OPENSHIFT_URL="http://mcp-openshift-server/sse" \
  -e MCP_GITEA_URL="http://mcp-gitea-server/mcp" \
  -e GITEA_OWNER="your-org" \
  -e GITEA_REPO="your-repo" \
  pipeline-agent
```

| Variable | Description | Default |
|---|---|---|
| `LITELLM_URL` | Base URL for the LiteLLM API endpoint | (required) |
| `LITELLM_API_KEY` | API key for LiteLLM authentication | (required) |
| `LITELLM_MODEL` | Model identifier to use for analysis | |
| `MCP_OPENSHIFT_URL` | URL for the OpenShift MCP server | (required) |
| `MCP_OPENSHIFT_TRANSPORT` | Transport type for the OpenShift MCP server | `sse` |
| `MCP_GITEA_URL` | URL for the Gitea MCP server | (required) |
| `MCP_GITEA_TRANSPORT` | Transport type for the Gitea MCP server | `streamable-http` |
| `GITEA_OWNER` | Gitea repository owner for issue creation | |
| `GITEA_REPO` | Gitea repository name for issue creation | |
| `PORT` | Server listening port | `8000` |
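As a rough illustration, configuration along these lines could be read at startup with plain `os.environ`; this is an assumption, not a copy of the actual `main.py`:

```python
# Hypothetical sketch of configuration loading via os.environ;
# the real main.py may validate these settings differently.
import os


def load_config() -> dict:
    """Read required and optional settings, failing fast on missing ones."""
    required = ["LITELLM_URL", "LITELLM_API_KEY", "MCP_OPENSHIFT_URL", "MCP_GITEA_URL"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")

    return {
        **{name: os.environ[name] for name in required},
        "MCP_OPENSHIFT_TRANSPORT": os.environ.get("MCP_OPENSHIFT_TRANSPORT", "sse"),
        "MCP_GITEA_TRANSPORT": os.environ.get("MCP_GITEA_TRANSPORT", "streamable-http"),
        "GITEA_OWNER": os.environ.get("GITEA_OWNER", ""),
        "GITEA_REPO": os.environ.get("GITEA_REPO", ""),
        "PORT": int(os.environ.get("PORT", "8000")),
    }
```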
`POST /report-failure` - Trigger the agent to analyze a failed pod and create an issue.

Request Body:
```json
{
  "namespace": "pipelines",
  "pod_name": "build-xyz-abc123",
  "container_name": "step-buildah"  // optional
}
```

Response:
```json
{
  "status": "success",
  "result": "Issue created: https://gitea.example.com/org/repo/issues/42"
}
```
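For a quick manual test outside Tekton, the endpoint can be called directly; a small sketch using `httpx`, with a placeholder agent URL:

```python
# Manually trigger the agent for a failed pod; the URL is a placeholder.
import httpx

payload = {
    "namespace": "pipelines",
    "pod_name": "build-xyz-abc123",
    "container_name": "step-buildah",  # optional
}
response = httpx.post(
    "http://localhost:8000/report-failure",
    json=payload,
    timeout=300.0,  # the agent loop may take a while to complete
)
response.raise_for_status()
print(response.json()["result"])  # e.g. the created issue URL
```

The payload mirrors what a Tekton task would send in the integration below.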
Configure your Tekton pipeline to call the agent on failure:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
spec:
  finally:
    - name: report-failure
      when:
        - input: $(tasks.status)
          operator: in
          values: ["Failed"]
      taskRef:
        name: report-failure-task
---
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: report-failure-task
spec:
  steps:
    - name: report
      image: curlimages/curl
      script: |
        failed_pod=$(oc get pods --field-selector="status.phase=Failed" \
          --sort-by="status.startTime" | tail -n 1 | awk '{print $1}')
        curl -X POST http://agent.agent-namespace.svc:8000/report-failure \
          -H "Content-Type: application/json" \
          -d "{\"namespace\":\"$(context.taskRun.namespace)\",\"pod_name\":\"${failed_pod}\"}"
```

The agent operates in an iterative loop (a condensed sketch follows the list):
- **Receive Failure Report** - The pipeline sends pod details to `/report-failure`
- **Build Prompt** - The agent creates a prompt with pod context and examples
- **LLM Analysis** - The model analyzes the situation and requests tools
- **Tool Execution** - The agent executes requested tools via MCP servers:
  - `pods_log` - Retrieve pod logs from OpenShift
  - `create_issue` - Create an issue in Gitea
- **Iterate** - The process continues until the model completes or the maximum number of iterations is reached
- **Return Result** - The agent returns the final result (usually the issue URL)
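A condensed, hypothetical sketch of this loop using `litellm` (the `call_mcp_tool` stub stands in for the real MCP dispatch in `mcp_client.py`):

```python
# Hypothetical condensed version of the agent loop; call_mcp_tool stands in
# for the real MCP client dispatch and is not the actual main.py API.
import json

import litellm


def call_mcp_tool(name: str, arguments: dict) -> str:
    """Stand-in for forwarding a tool call to the owning MCP server."""
    raise NotImplementedError


def run_agent(prompt: str, tools: list, model: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_iterations):
        response = litellm.completion(model=model, messages=messages, tools=tools)
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            # No tool calls requested: the model is done, return its answer.
            return message.content

        for call in message.tool_calls:
            # Execute each requested tool and feed the result back to the model.
            result = call_mcp_tool(call.function.name, json.loads(call.function.arguments))
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": str(result)}
            )
    return "Max iterations reached"
```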
Run the test suite:
```bash
cd agent

# Install test dependencies
pip install -r requirements.txt

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=. --cov-report=term-missing
```

Tests are also run during the container build process.
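For illustration, a test in this suite might exercise the endpoint with FastAPI's `TestClient`; this is a hypothetical sketch (the `app` and `main.run_agent` names are assumed, not taken from the real `tests/test_api.py`):

```python
# Hypothetical example test; the real tests/test_api.py may differ.
from unittest.mock import AsyncMock, patch

from fastapi.testclient import TestClient

from main import app  # assumes the FastAPI instance in main.py is named `app`

client = TestClient(app)


def test_report_failure_returns_success():
    # Patch the agent loop so the test needs no LLM or MCP servers.
    with patch("main.run_agent", new=AsyncMock(return_value="Issue created")):
        response = client.post(
            "/report-failure",
            json={"namespace": "pipelines", "pod_name": "build-xyz-abc123"},
        )
    assert response.status_code == 200
    assert response.json()["status"] == "success"
```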
- `fastapi>=0.104.0` - Web framework
- `uvicorn>=0.24.0` - ASGI server
- `litellm>=1.0.0` - LLM integration
- `httpx>=0.25.0` - Async HTTP client
- `pydantic>=2.0.0` - Data validation
- `pytest>=7.4.0` - Testing framework
- `pytest-asyncio>=0.21.0` - Async test support