
blueberry-k3s GitOps Repository

This repository is the single source of truth for the blueberry-k3s K3S cluster running on Raspberry Pi 4.

Architecture Philosophy: Minimal, deterministic, reproducible GitOps for edge/SBC environments. See AGENTS.md for detailed guardrails and constraints.


Cluster Information

  • Cluster Name: blueberry-k3s
  • Hardware: Raspberry Pi 4 (8GB RAM)
  • Architecture: aarch64
  • Kubernetes: K3S (single server node; may later add up to two agent nodes)
  • Storage: USB-attached
  • GitOps: FluxCD v2.4.0

Repository Structure

.
├── clusters/
│   └── blueberry-k3s/          # Cluster-specific entrypoint
│       ├── flux-system/        # Flux controllers and GitRepository source
│       ├── infrastructure.yaml # Infrastructure Kustomization
│       ├── apps.yaml           # Apps Kustomization (depends on infrastructure)
│       └── kustomization.yaml  # Root composition
├── infrastructure/
│   └── monitoring/             # Prometheus + Grafana observability stack
├── apps/
│   └── matrix/                 # Matrix homeserver stack (Synapse + Element + PostgreSQL)
├── .github/
│   └── workflows/              # CI validation (lint, kubeconform, policy checks)
└── AGENTS.md                   # Repository guardrails and architectural philosophy

Prerequisites

Before bootstrapping Flux, ensure:

  1. K3S installed on blueberry-k3s

    • K3S configured with your desired settings
    • kubectl access to the cluster (K3S writes a kubeconfig to /etc/rancher/k3s/k3s.yaml)
  2. Flux CLI installed (v2.4.0)

    curl -s https://fluxcd.io/install.sh | sudo bash
  3. Git repository access

    • SSH key or personal access token configured
    • Write access to this repository
  4. No port conflicts

    • Cockpit runs on port 9090
    • Prometheus configured to use port 9091
    • Grafana uses port 3000
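A quick way to confirm none of these ports are already bound is a small probe sketch (run it on the blueberry-k3s host; uses bash's `/dev/tcp`, and also checks the Matrix NodePorts used later in this README):

```shell
# Probe each port this stack expects; "IN USE" means something is already
# listening there (e.g. Cockpit on 9090 is expected to be in use).
for port in 9090 9091 3000 30067 30080; do
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    echo "port ${port}: IN USE"
  else
    echo "port ${port}: free"
  fi
done
```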

Bootstrap Instructions

First-Time Setup

  1. Fork or clone this repository

  2. Update the GitRepository URL in clusters/blueberry-k3s/flux-system/gotk-sync.yaml:

    spec:
      url: ssh://git@github.com/YOUR_ORG/YOUR_REPO
  3. Bootstrap Flux:

    flux bootstrap git \
      --url=ssh://git@github.com/YOUR_ORG/YOUR_REPO \
      --branch=main \
      --path=clusters/blueberry-k3s \
      --private-key-file=/path/to/ssh/key

    Or, if using GitHub directly:

    flux bootstrap github \
      --owner=YOUR_ORG \
      --repository=YOUR_REPO \
      --branch=main \
      --path=clusters/blueberry-k3s \
      --personal
  4. Verify reconciliation:

    flux get kustomizations
    flux get helmreleases -A
  5. Verify deployment:

    # Verify all monitoring pods are running
    kubectl get pods -n monitoring
    
    # Expected pods:
    # - kube-prometheus-stack-operator-*
    # - prometheus-kube-prometheus-stack-prometheus-0
    # - grafana-*
    # - blackbox-exporter-*
    # - speedtest-exporter-*
    # - node-exporter-* (one per node)
  6. Validate internet monitoring:

    # Check Prometheus targets (all should be "UP")
    kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
    # Visit http://localhost:9091/targets
    # Look for: blackbox-http (3 targets), speedtest (1 target), node (1 target)
  7. Access Grafana:

    kubectl port-forward -n monitoring svc/grafana 3000:80
  8. Access Prometheus (optional):

    kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
  9. Access Element Web (Matrix):

    • URL: http://<node-ip>:30080
    • First create a user account (registration is open by default)
    • See Matrix Configuration for pre-deployment setup

Matrix Configuration

Pre-deployment: Set Secrets

⚠️ You MUST replace placeholder values before deploying the Matrix stack.

The apps/matrix/postgres.yaml Secret and apps/matrix/synapse.yaml ConfigMap both contain CHANGE_ME_BEFORE_DEPLOY placeholders.

Option A: Manual (non-production, local testing)

  1. Generate strong random values:

    # Generate secret values (run once, save output)
    python3 -c "import secrets; print(secrets.token_hex(32))"  # POSTGRES_PASSWORD
    python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_MACAROON_SECRET_KEY
    python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_REGISTRATION_SHARED_SECRET
    python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_FORM_SECRET
  2. Edit apps/matrix/postgres.yaml: replace all CHANGE_ME_BEFORE_DEPLOY in the Secret.

  3. Edit apps/matrix/synapse.yaml: replace the password: CHANGE_ME_BEFORE_DEPLOY in the homeserver.yaml ConfigMap key (must match POSTGRES_PASSWORD from step above). Also replace the three CHANGE_ME_BEFORE_DEPLOY values for macaroon_secret_key, registration_shared_secret, and form_secret.
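The edits in steps 2–3 can be scripted with `sed`. This is an illustrative sketch against a minimal stand-in Secret; in the real repo, point the `sed` at `apps/matrix/postgres.yaml` and repeat with a distinct value for each placeholder:

```shell
# Generate a value and swap it in for the placeholder. The stand-in file
# below only mimics the shape of the real Secret.
PG_PASS=$(python3 -c "import secrets; print(secrets.token_hex(32))")

cat > /tmp/postgres-secret.yaml <<'EOF'
stringData:
  POSTGRES_PASSWORD: CHANGE_ME_BEFORE_DEPLOY
EOF

sed -i "s/CHANGE_ME_BEFORE_DEPLOY/${PG_PASS}/" /tmp/postgres-secret.yaml
grep -q CHANGE_ME_BEFORE_DEPLOY /tmp/postgres-secret.yaml \
  && echo "placeholder still present" \
  || echo "placeholder replaced"
```

Remember that the same generated password must also go into the password: field in apps/matrix/synapse.yaml.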

Option B: SOPS + age (recommended for production)

See the Flux SOPS guide for encrypting secrets in Git using age keys.

Pre-deployment: Update Server Name and URLs

Edit apps/matrix/synapse.yaml (ConfigMap synapse-config) and set:

server_name: "your.domain.com"        # Your Matrix server name (cannot change later!)
public_baseurl: "http://<NODE_IP>:30067"   # Reachable URL for clients

Edit apps/matrix/element.yaml (ConfigMap element-config) and set:

"base_url": "http://<NODE_IP>:30067"       # Must match Synapse public_baseurl
"server_name": "your.domain.com"           # Must match server_name in homeserver.yaml

Replace <NODE_IP> with the actual IP address of the K3S node (e.g., 192.168.1.100).
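If kubectl access is already configured, the node IP can be looked up rather than copied by hand (a sketch; it keeps the placeholder when no cluster is reachable):

```shell
# Read the first node's InternalIP from the cluster; fall back to the
# placeholder if kubectl or the cluster is unavailable.
if command -v kubectl >/dev/null && kubectl get nodes >/dev/null 2>&1; then
  NODE_IP=$(kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
else
  NODE_IP="<NODE_IP>"   # fill in manually
fi
echo "Using node IP: ${NODE_IP}"
```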

Accessing Element

After deployment:

  • Element Web: http://<node-ip>:30080
  • Synapse API: http://<node-ip>:30067
  • Matrix health check: http://<node-ip>:30067/health

Creating the First Admin User

# Register an admin user via Synapse admin API
kubectl exec -it -n matrix deploy/synapse -- register_new_matrix_user \
  -c /conf/homeserver.yaml \
  -u admin \
  -p <admin-password> \
  -a \
  http://localhost:8008
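To sanity-check the new account, you can attempt a password login against the client API. This is a hedged sketch: it assumes a port-forward to Synapse is active on localhost:8008 (as in the verification steps), and `<admin-password>` must be substituted with the real value:

```shell
# Skip gracefully if Synapse isn't reachable on the forwarded port.
if curl -sf http://localhost:8008/health >/dev/null 2>&1; then
  curl -s -X POST http://localhost:8008/_matrix/client/v3/login \
    -H 'Content-Type: application/json' \
    -d '{"type":"m.login.password","identifier":{"type":"m.id.user","user":"admin"},"password":"<admin-password>"}'
  # A JSON response containing "access_token" means the login worked.
else
  echo "synapse not reachable on localhost:8008 (is the port-forward running?)"
fi
```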

Disabling Open Registration

After creating your initial accounts, disable open registration by editing apps/matrix/synapse.yaml ConfigMap and setting:

enable_registration: false

Verifying Deployment

# Check all Matrix pods are running
kubectl get pods -n matrix

# Expected pods:
# - matrix-postgres-*    (PostgreSQL)
# - synapse-*            (Synapse homeserver)
# - element-web-*        (Element web client)

# Check Synapse health
kubectl port-forward -n matrix svc/synapse 8008:8008
curl http://localhost:8008/health
# Expected: "OK"

# Check Matrix client discovery
curl http://localhost:8008/_matrix/client/versions

Known Limitations

  • No TLS: Served over HTTP. For production, add an ingress with TLS (e.g. certificates issued by cert-manager).
  • No TURN/VoIP: TURN server is not configured (required for voice/video calls across NAT).
  • Single replica: No high availability - acceptable for edge/SBC deployment.
  • Open federation: Federation with matrix.org is enabled. Restrict in homeserver.yaml if not desired (federation_domain_whitelist).
  • No SSO/OIDC: Basic password authentication only.

Deployed Components

Applications

Matrix Stack (apps/matrix/)

Self-hosted Matrix communications platform:

PostgreSQL (postgres:16.6-alpine):

  • Dedicated database for Synapse
  • 5Gi persistent volume (default StorageClass / K3S local-path)
  • ClusterIP only (not exposed externally)
  • Resource usage: 100m CPU / 256Mi RAM (requests)

Synapse (ghcr.io/element-hq/synapse:v1.147.1):

  • Matrix homeserver by Element
  • 5Gi persistent volume for signing key and media store
  • NodePort 30067 → container port 8008 (client/federation API)
  • Resource usage: 200m CPU / 256Mi RAM (requests), up to 500m CPU / 1Gi RAM

Element Web (ghcr.io/element-hq/element-web:v1.12.10):

  • Official Matrix web client
  • Stateless (no PVC) - serves static files via nginx
  • NodePort 30080 → container port 80
  • Resource usage: 50m CPU / 64Mi RAM (requests)

Total Matrix resource usage (approximate):

  • CPU: ~350m requests / ~700m limits
  • RAM: ~576Mi requests / ~1.2Gi limits
  • Storage: 10Gi (2 × 5Gi PVCs)

Infrastructure

Monitoring (infrastructure/monitoring/)

Prometheus (kube-prometheus-stack v67.4.0):

  • Prometheus Operator + Prometheus server
  • Port: 9091 (to avoid Cockpit conflict on 9090)
  • Retention: 30 days / 10GB (increased for internet monitoring historical data)
  • Scrape interval: 60s (tuned for edge IO constraints)
  • Resource limits: 1 CPU / 1.5GB RAM
  • Disabled: Alertmanager, built-in node-exporter, kube-state-metrics (can enable later)
  • No persistence (emptyDir) - can be added later if needed

Grafana (chart v8.5.2 / Grafana image 11.4.0):

  • Pre-configured Prometheus datasource
  • Default dashboards: Kubernetes cluster overview, pod monitoring, internet connection, node metrics
  • Resource limits: 500m CPU / 512MB RAM
  • No persistence (emptyDir)
  • Default credentials: admin / admin ⚠️ Change after first login

Internet Monitoring Exporters:

Internet monitoring tracks connectivity quality (bandwidth, latency, uptime) to detect ISP issues or network degradation.

Blackbox Exporter (prom/blackbox-exporter:v0.25.0):

  • HTTP/ICMP probing for uptime and latency monitoring
  • Default targets: google.com, github.com, cloudflare.com (customizable via ConfigMap)
  • Scrape interval: 30s
  • Resource usage: 50m CPU / 64Mi RAM (limits: 200m / 128Mi)

Speedtest Exporter (miguelndecarvalho/speedtest-exporter:v0.5.1):

  • Bandwidth testing via Speedtest.net
  • Scrape interval: 60m
  • ⚠️ Bandwidth consumption: ~500MB/day (not suitable for metered connections)
  • To reduce bandwidth, increase scrape interval in prometheus-helmrelease.yaml
  • Resource usage: 100m CPU / 128Mi RAM (limits: 500m / 256Mi, spikes during test)

Node Exporter (prom/node-exporter:v1.8.2):

  • System metrics (CPU, memory, disk, network)
  • Deployed as DaemonSet (runs on all nodes)
  • Security note: Requires privileged: true and hostNetwork: true for full system access (standard node-exporter requirement)
  • Deployed separately from kube-prometheus-stack's built-in node-exporter for explicit configuration control and version independence
  • Important: Do not enable nodeExporter.enabled: true in Prometheus HelmRelease - it will conflict with this deployment
  • Scrape interval: 15s
  • Resource usage: 100m CPU / 128Mi RAM (limits: 250m / 256Mi)

Grafana Dashboards:

  • Internet connection - Bandwidth graphs, latency gauges, uptime timeline (in "Internet Monitoring" folder)
  • Node Exporter Full (gnetId 1860) - System metrics visualization
  • Note: Speedtest metrics appear after first 60-minute scrape cycle

Resource Usage (approximate):

  • Total CPU: ~1.6 cores (requests) / ~2.5 cores (limits)
  • Total RAM: ~1.3GB (requests) / ~2.9GB (limits)
  • Network: ~500MB/day (speedtest-exporter only)
  • Storage growth: ~500MB/week with all exporters enabled
  • Acceptable for 8GB Raspberry Pi 4 with headroom for workloads

Operations

Check Flux Status

# Overall health
flux check

# Reconciliation status
flux get sources git
flux get kustomizations

# HelmRelease status
flux get helmreleases -A

Force Reconciliation

# Reconcile infrastructure
flux reconcile kustomization infrastructure --with-source

# Reconcile specific HelmRelease
flux reconcile helmrelease -n monitoring kube-prometheus-stack
flux reconcile helmrelease -n monitoring grafana

View Logs

# Flux controller logs
flux logs --level=error --all-namespaces

# Prometheus operator logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator

# Grafana logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana

Troubleshooting

HelmRelease stuck or failing:

kubectl describe helmrelease -n monitoring kube-prometheus-stack
kubectl describe helmrelease -n monitoring grafana

Prometheus not scraping:

# Check Prometheus targets
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
# Visit http://localhost:9091/targets

Grafana datasource issues:

  • Verify Prometheus service name: kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9091
  • Check Grafana datasource config in Grafana UI

Speedtest Exporter failures:

Common causes:

  • DNS resolution failure (check /etc/resolv.conf in pod)
  • Speedtest.net outage or rate limiting
  • Network connectivity issues
  • First scrape can take up to 60 minutes (the speedtest scrape interval) - dashboard gauges remain empty until the first test completes

Diagnostics:

kubectl logs -n monitoring -l app=speedtest-exporter
kubectl exec -it -n monitoring deploy/speedtest-exporter -- ping -c3 www.speedtest.net

Reducing bandwidth usage: If 500MB/day is too high, edit infrastructure/monitoring/prometheus-helmrelease.yaml:

scrape_interval: 120m  # Reduces to ~250MB/day
# or
scrape_interval: 180m  # Reduces to ~170MB/day

Node Exporter not showing metrics:

  • Verify privileged security context is allowed
  • Check hostPath mounts are accessible
  • Ensure no port conflict with built-in node-exporter (should be disabled)

Customization

Customizing Internet Monitoring Targets

To add or change HTTP probe targets:

  1. Edit infrastructure/monitoring/prometheus-targets-configmap.yaml:

    data:
      blackbox-targets.yaml: |
        - targets:
            - http://www.google.com/
            - https://github.com/
            - https://www.cloudflare.com/
            - https://your-isp-homepage.com/  # Add custom target
            - http://192.168.1.1/              # Monitor local gateway
  2. Commit and push:

    git add infrastructure/monitoring/prometheus-targets-configmap.yaml
    git commit -m "chore(monitoring): add custom probe targets"
    git push
  3. Prometheus auto-reloads configuration within 30 seconds


Upgrade Strategy

All upgrades must be done via Git commits (PRs recommended).

Upgrading Matrix Stack (Raw Manifests)

Synapse:

  1. Check new version at https://github.com/element-hq/synapse/releases
  2. Review changelog for breaking changes (especially database migrations)
  3. Update image tag in apps/matrix/synapse.yaml:
    image: ghcr.io/element-hq/synapse:vX.Y.Z
  4. Commit and push; Flux reconciles automatically

Element Web:

  1. Check new version at https://github.com/element-hq/element-web/releases
  2. Update image tag in apps/matrix/element.yaml
  3. Commit and push

PostgreSQL:

  • PostgreSQL major version upgrades require a dump/restore - plan carefully
  • Minor version upgrades (e.g., 16.6 → 16.7) are safe to apply directly
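The major-version dump/restore can be outlined as two small helpers. The names here are assumptions (deployment `matrix-postgres`, user and database `synapse`) chosen to match apps/matrix/postgres.yaml; verify them against your manifests and take a backup first:

```shell
# Helper functions for the dump/restore flow; adjust deployment, user,
# and database names to match apps/matrix/postgres.yaml.
dump_db() {
  kubectl exec -n matrix deploy/matrix-postgres -- \
    pg_dump -U synapse synapse > synapse-backup.sql
}

restore_db() {
  kubectl exec -i -n matrix deploy/matrix-postgres -- \
    psql -U synapse synapse < synapse-backup.sql
}

# Run dump_db BEFORE the upgrade. Then bump the image tag, recreate the
# PVC, commit, and let Flux roll out the new major version; then restore_db.
```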

Rollback: Revert the Git commit - Flux will re-pull the previous image tag.

Upgrading Helm Charts

  1. Update chart version in infrastructure/monitoring/*-helmrelease.yaml
  2. Review upstream changelog
  3. Test reconciliation: flux reconcile helmrelease -n monitoring <name>
  4. Monitor for errors: flux logs

Upgrading Internet Monitoring Exporters (Raw Manifests)

Exporters are deployed as raw Kubernetes manifests (not Helm).

To upgrade an exporter:

  1. Check for a new version in the upstream repository

  2. Review CHANGELOG for breaking changes:

    • ConfigMap structure changes (blackbox-exporter)
    • Metrics format changes (all exporters)
    • New resource requirements
    • Security updates
  3. Update image tag and digest in infrastructure/monitoring/exporters/<exporter>.yaml:

    # Example: Upgrading blackbox-exporter
    image: prom/blackbox-exporter:v0.26.0@sha256:NEW_DIGEST_HERE
  4. Get ARM64 digest (for Prometheus official images):

    docker manifest inspect prom/blackbox-exporter:v0.26.0 | \
      jq -r '.manifests[] | select(.platform.architecture == "arm64") | .digest'
  5. Update ConfigMap if needed (blackbox-exporter only):

    # If blackbox.yml config format changed
    vim infrastructure/monitoring/exporters/blackbox-exporter.yaml
  6. Commit and push:

    git add infrastructure/monitoring/exporters/
    git commit -m "chore(monitoring): upgrade blackbox-exporter to v0.26.0"
    git push
  7. Verify deployment:

    kubectl get pods -n monitoring -w
    kubectl logs -n monitoring -l app=blackbox-exporter
    
    # Check metrics endpoint
    kubectl port-forward -n monitoring svc/blackbox-exporter 9115:9115
    curl http://localhost:9115/metrics

Rollback: Revert Git commit if issues arise:

git revert HEAD
git push

Note: Prometheus data is stored in emptyDir (ephemeral). Rolling back exporter versions does not affect historical data, but data will be lost if Prometheus pod is deleted.

Upgrading Flux

# Check current version
flux version

# Upgrade Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Upgrade controllers
flux install --export > clusters/blueberry-k3s/flux-system/gotk-components.yaml
git add clusters/blueberry-k3s/flux-system/gotk-components.yaml
git commit -m "chore: upgrade Flux to vX.Y.Z"
git push

Rollback

To rollback to a previous state:

# Find known-good commit
git log --oneline

# Revert to commit
git revert <commit-sha>
git push

# Or hard reset (use with caution)
git reset --hard <commit-sha>
git push --force

Flux will reconcile to the reverted state automatically.

Note: CRD changes and stateful components may not rollback cleanly. Always test upgrades in a non-production environment first.


Adding New Components

  1. Create component directory under infrastructure/ or apps/
  2. Add manifests or HelmRelease
  3. Update parent kustomization.yaml to reference new component
  4. Commit and push
  5. Verify reconciliation: flux get kustomizations

Example:

mkdir -p infrastructure/ingress
# Add manifests...
echo "  - ingress" >> infrastructure/kustomization.yaml
git add infrastructure/
git commit -m "feat: add ingress-nginx"
git push

CI/CD Validation

Pull requests are automatically validated with:

  • yamllint: YAML syntax and formatting
  • kustomize build: Ensure manifests build successfully
  • kubeconform: Kubernetes schema validation
  • Policy checks: No :latest tags, explicit namespaces

See .github/workflows/validate.yaml for details.
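A rough local approximation of these checks can be run before pushing (a sketch: it only reports which tools are installed, and the policy grep mirrors the no-`:latest` rule; run it from the repo root):

```shell
# Report which validators are available locally.
for tool in yamllint kustomize kubeconform; do
  command -v "$tool" >/dev/null && echo "$tool: available" || echo "$tool: not installed"
done

# With the tools installed, build the cluster overlay and validate schemas:
#   kustomize build clusters/blueberry-k3s | kubeconform -strict

# Policy check: no floating :latest image tags anywhere in the repo.
if grep -rn --include='*.yaml' 'image:.*:latest' . ; then
  echo "ERROR: :latest tag found" >&2
else
  echo "policy check passed: no :latest tags"
fi
```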


Resource Constraints and Edge Considerations

This cluster runs on a Raspberry Pi 4 with limited resources:

  • RAM: 8GB total (K3S + system overhead ~1-2GB)
  • CPU: 4 cores (ARM Cortex-A72)
  • Storage: USB-attached (limited IO bandwidth, avoid write-heavy workloads)

Design Principles:

  • Conservative resource requests/limits
  • Conservative Prometheus scrape intervals (tuned for edge IO constraints)
  • No persistent storage by default (can be added later)
  • Disabled non-essential exporters and controllers
  • Single-replica deployments (no HA)

See AGENTS.md for full architectural constraints.


Security Notes

⚠️ Default Grafana credentials are admin / admin. Change these immediately after first login.

For production use, consider:

  • Implementing SOPS encryption for secrets (see Flux SOPS guide)
  • Setting up proper ingress with TLS
  • Configuring authentication for Prometheus/Grafana
  • Enabling RBAC policies

Contributing

See AGENTS.md for contribution guidelines and architectural guardrails.

Key principles:

  • Keep changes minimal and justified
  • Pin all versions (charts, images)
  • Test in CI before merging
  • Document resource impact
  • Ensure reproducibility

License

See LICENSE
