This repository is the single source of truth for the blueberry-k3s K3S cluster running on Raspberry Pi 4.
Architecture Philosophy: Minimal, deterministic, reproducible GitOps for edge/SBC environments. See AGENTS.md for detailed guardrails and constraints.
- Cluster Name: `blueberry-k3s`
- Hardware: Raspberry Pi 4 (8GB RAM)
- Architecture: aarch64
- Kubernetes: K3S (single server node, may scale to +2 agent nodes)
- Storage: USB-attached
- GitOps: FluxCD v2.4.0
```
.
├── clusters/
│   └── blueberry-k3s/           # Cluster-specific entrypoint
│       ├── flux-system/         # Flux controllers and GitRepository source
│       ├── infrastructure.yaml  # Infrastructure Kustomization
│       ├── apps.yaml            # Apps Kustomization (depends on infrastructure)
│       └── kustomization.yaml   # Root composition
├── infrastructure/
│   └── monitoring/              # Prometheus + Grafana observability stack
├── apps/
│   └── matrix/                  # Matrix homeserver stack (Synapse + Element + PostgreSQL)
├── .github/
│   └── workflows/               # CI validation (lint, kubeconform, policy checks)
└── AGENTS.md                    # Repository guardrails and architectural philosophy
```
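The `apps.yaml` Kustomization noted in the layout above depends on `infrastructure`. A minimal sketch of what such a file plausibly contains (the field values here are assumptions based on the layout, not copied from the repository):

```yaml
# Illustrative sketch of clusters/blueberry-k3s/apps.yaml; intervals and
# names are assumptions, not the actual file contents.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure   # apps reconcile only after infrastructure is ready
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

The `dependsOn` field is what enforces the ordering mentioned above: Flux will not apply `apps/` until the `infrastructure` Kustomization reports Ready.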
Before bootstrapping Flux, ensure:

- K3S installed on `blueberry-k3s`
  - K3S configured with your desired settings
  - `kubectl` access to the cluster
- Flux CLI installed (v2.4.0):

  ```sh
  curl -s https://fluxcd.io/install.sh | sudo bash
  ```

- Git repository access
  - SSH key or personal access token configured
  - Write access to this repository
- No port conflicts
  - Cockpit runs on port 9090
  - Prometheus is configured to use port 9091
  - Grafana uses port 3000
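The port-conflict check above can be scripted. A small pre-flight sketch (run on the node; uses bash's `/dev/tcp`, so it assumes a bash shell) that prints a warning for each expected port that already has a listener, and nothing when all are free:

```shell
# Warn about listeners on the ports this stack expects to use:
# 9090 (Cockpit), 9091 (Prometheus), 3000 (Grafana).
for port in 9090 9091 3000; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "warning: port $port already in use"
  fi
done
```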
- Fork or clone this repository.

- Update the GitRepository URL in `clusters/blueberry-k3s/flux-system/gotk-sync.yaml`:

  ```yaml
  spec:
    url: ssh://git@github.com/YOUR_ORG/YOUR_REPO
  ```

- Bootstrap Flux:

  ```sh
  flux bootstrap git \
    --url=ssh://git@github.com/YOUR_ORG/YOUR_REPO \
    --branch=main \
    --path=clusters/blueberry-k3s \
    --private-key-file=/path/to/ssh/key
  ```

  Or, if using GitHub directly:

  ```sh
  flux bootstrap github \
    --owner=YOUR_ORG \
    --repository=YOUR_REPO \
    --branch=main \
    --path=clusters/blueberry-k3s \
    --personal
  ```

- Verify reconciliation:

  ```sh
  flux get kustomizations
  flux get helmreleases -A
  ```
- Verify deployment:

  ```sh
  # Check Flux reconciliation
  flux get kustomizations
  flux get helmreleases -A

  # Verify all monitoring pods are running
  kubectl get pods -n monitoring
  # Expected pods:
  # - kube-prometheus-stack-operator-*
  # - prometheus-kube-prometheus-stack-prometheus-0
  # - grafana-*
  # - blackbox-exporter-*
  # - speedtest-exporter-*
  # - node-exporter-* (one per node)
  ```

- Validate internet monitoring:

  ```sh
  # Check Prometheus targets (all should be "UP")
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
  # Visit http://localhost:9091/targets
  # Look for: blackbox-http (3 targets), speedtest (1 target), node (1 target)
  ```
- Access Grafana:

  ```sh
  kubectl port-forward -n monitoring svc/grafana 3000:80
  ```

  - URL: http://localhost:3000
  - Default credentials: `admin` / `admin` (change immediately)

- Access Prometheus (optional):

  ```sh
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
  ```

- Access Element Web (Matrix):
  - URL: http://<node-ip>:30080
  - First create a user account (registration is open by default)
  - See Matrix Configuration for pre-deployment setup
The apps/matrix/postgres.yaml Secret and apps/matrix/synapse.yaml ConfigMap both contain CHANGE_ME_BEFORE_DEPLOY placeholders.
Option A: Manual (non-production, local testing)

- Generate strong random values:

  ```sh
  # Generate secret values (run once, save the output)
  python3 -c "import secrets; print(secrets.token_hex(32))"  # POSTGRES_PASSWORD
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_MACAROON_SECRET_KEY
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_REGISTRATION_SHARED_SECRET
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_FORM_SECRET
  ```

- Edit `apps/matrix/postgres.yaml`: replace all `CHANGE_ME_BEFORE_DEPLOY` values in the Secret.
- Edit `apps/matrix/synapse.yaml`: replace the `password: CHANGE_ME_BEFORE_DEPLOY` in the `homeserver.yaml` ConfigMap key (must match `POSTGRES_PASSWORD` from the step above). Also replace the three `CHANGE_ME_BEFORE_DEPLOY` values for `macaroon_secret_key`, `registration_shared_secret`, and `form_secret`.
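A convenience sketch that emits all four values in one pass. The `NAME=` prefixes are labels for copy/paste only, not variables the manifests read:

```shell
# Generate all four secret values, labeled for easy pasting into the manifests
for name in POSTGRES_PASSWORD SYNAPSE_MACAROON_SECRET_KEY \
            SYNAPSE_REGISTRATION_SHARED_SECRET SYNAPSE_FORM_SECRET; do
  echo "$name=$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
done
```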
Option B: SOPS + age (recommended for production)
See the Flux SOPS guide for encrypting secrets in Git using age keys.
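As a starting point, a repo-root `.sops.yaml` might look like the sketch below. The path regex and age recipient are placeholders, not values from this repository:

```yaml
# Hypothetical .sops.yaml sketch: encrypt only Secret data fields in the
# Matrix manifests with an age recipient. Replace the age key with your own.
creation_rules:
  - path_regex: apps/matrix/(postgres|synapse)\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyplaceholder
```

With this in place, `sops --encrypt --in-place <file>` encrypts the matched fields before committing, and the Flux kustomize-controller decrypts them in-cluster using an age key stored as a Secret.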
Edit `apps/matrix/synapse.yaml` (ConfigMap `synapse-config`) and set:

```yaml
server_name: "your.domain.com"            # Your Matrix server name (cannot change later!)
public_baseurl: "http://<NODE_IP>:30067"  # Reachable URL for clients
```

Edit `apps/matrix/element.yaml` (ConfigMap `element-config`) and set:

```
"base_url": "http://<NODE_IP>:30067"  # Must match Synapse public_baseurl
"server_name": "your.domain.com"      # Must match server_name in homeserver.yaml
```

Replace `<NODE_IP>` with the actual IP address of the K3S node (e.g., 192.168.1.100).
After deployment:

- Element Web: http://<node-ip>:30080
- Synapse API: http://<node-ip>:30067
- Matrix health check: http://<node-ip>:30067/health
```sh
# Register an admin user via the Synapse admin API
kubectl exec -it -n matrix deploy/synapse -- register_new_matrix_user \
  -c /conf/homeserver.yaml \
  -u admin \
  -p <admin-password> \
  -a \
  http://localhost:8008
```

After creating your initial accounts, disable open registration by editing the `apps/matrix/synapse.yaml` ConfigMap and setting:

```yaml
enable_registration: false
```

```sh
# Check all Matrix pods are running
kubectl get pods -n matrix
# Expected pods:
# - matrix-postgres-* (PostgreSQL)
# - synapse-* (Synapse homeserver)
# - element-web-* (Element web client)

# Check Synapse health
kubectl port-forward -n matrix svc/synapse 8008:8008
curl http://localhost:8008/health
# Expected: "OK"

# Check Matrix client discovery
curl http://localhost:8008/_matrix/client/versions
```

- No TLS: served over HTTP. Add an ingress with TLS (via cert-manager) for production.
- No TURN/VoIP: a TURN server is not configured (required for voice/video calls across NAT).
- Single replica: no high availability; acceptable for edge/SBC deployment.
- Open federation: federation with matrix.org is enabled. Restrict in `homeserver.yaml` if not desired (`federation_domain_whitelist`).
- No SSO/OIDC: basic password authentication only.
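The no-TLS limitation above could be addressed with an ingress. A hedged sketch, assuming ingress-nginx and a cert-manager ClusterIssuer named `letsencrypt`; the host, issuer name, and `element-web` Service name are placeholders/assumptions, not values from this repository:

```yaml
# Hypothetical TLS ingress for Element Web; adjust names to your setup.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: element-web
  namespace: matrix
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt  # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts: [element.your.domain.com]
      secretName: element-web-tls   # cert-manager populates this Secret
  rules:
    - host: element.your.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: element-web   # assumed Service name
                port:
                  number: 80
```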
Self-hosted Matrix communications platform:
PostgreSQL (postgres:16.6-alpine):
- Dedicated database for Synapse
- 5Gi persistent volume (default StorageClass / K3S local-path)
- ClusterIP only (not exposed externally)
- Resource usage: 100m CPU / 256Mi RAM (requests)
Synapse (ghcr.io/element-hq/synapse:v1.147.1):
- Matrix homeserver by Element
- 5Gi persistent volume for signing key and media store
- NodePort 30067 → container port 8008 (client/federation API)
- Resource usage: 200m CPU / 256Mi RAM (requests), up to 500m CPU / 1Gi RAM
Element Web (ghcr.io/element-hq/element-web:v1.12.10):
- Official Matrix web client
- Stateless (no PVC) - serves static files via nginx
- NodePort 30080 → container port 80
- Resource usage: 50m CPU / 64Mi RAM (requests)
Total Matrix resource usage (approximate):
- CPU: ~350m requests / ~700m limits
- RAM: ~576Mi requests / ~1.2Gi limits
- Storage: 10Gi (2 × 5Gi PVCs)
Prometheus (kube-prometheus-stack v67.4.0):
- Prometheus Operator + Prometheus server
- Port: 9091 (to avoid Cockpit conflict on 9090)
- Retention: 30 days / 10GB (increased for internet monitoring historical data)
- Scrape interval: 60s (tuned for edge IO constraints)
- Resource limits: 1 CPU / 1.5GB RAM
- Disabled: Alertmanager, built-in node-exporter, kube-state-metrics (can enable later)
- No persistence (emptyDir) - can be added later if needed
Grafana (chart v8.5.2 / image 11.4.0):
- Pre-configured Prometheus datasource
- Default dashboards: Kubernetes cluster overview, pod monitoring, internet connection, node metrics
- Resource limits: 500m CPU / 512MB RAM
- No persistence (emptyDir)
- Default credentials: `admin` / `admin` (⚠️ change after first login)
Internet Monitoring Exporters:
Internet monitoring tracks connectivity quality (bandwidth, latency, uptime) to detect ISP issues or network degradation.
Blackbox Exporter (prom/blackbox-exporter:v0.25.0):
- HTTP/ICMP probing for uptime and latency monitoring
- Default targets: google.com, github.com, cloudflare.com (customizable via ConfigMap)
- Scrape interval: 30s
- Resource usage: 50m CPU / 64Mi RAM (limits: 200m / 128Mi)
Speedtest Exporter (miguelndecarvalho/speedtest-exporter:v0.5.1):
- Bandwidth testing via Speedtest.net
- Scrape interval: 60m
- ⚠️ Bandwidth consumption: ~500MB/day (not suitable for metered connections)
- To reduce bandwidth, increase the scrape interval in `prometheus-helmrelease.yaml`
- Resource usage: 100m CPU / 128Mi RAM (limits: 500m / 256Mi, spikes during test)
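A hedged sketch of how the speedtest scrape job might look inside the `prometheus-helmrelease.yaml` values, assuming it is defined via `additionalScrapeConfigs`; the service name and port 9798 are assumptions, not copied from the actual file:

```yaml
# Plausible kube-prometheus-stack values fragment; names/port are assumed.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: speedtest
        scrape_interval: 60m   # raise to 120m/180m to cut bandwidth
        scrape_timeout: 90s    # a speedtest run exceeds the default 10s timeout
        static_configs:
          - targets: ['speedtest-exporter.monitoring.svc:9798']
```

The long `scrape_timeout` matters: each "scrape" triggers a full bandwidth test, which is why the interval, not the exporter, controls daily data usage.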
Node Exporter (prom/node-exporter:v1.8.2):
- System metrics (CPU, memory, disk, network)
- Deployed as DaemonSet (runs on all nodes)
- Security note: requires `privileged: true` and `hostNetwork: true` for full system access (standard node-exporter requirement)
- Deployed separately from kube-prometheus-stack's built-in node-exporter for explicit configuration control and version independence
- Important: do not enable `nodeExporter.enabled: true` in the Prometheus HelmRelease; it will conflict with this deployment
- Scrape interval: 15s
- Resource usage: 100m CPU / 128Mi RAM (limits: 250m / 256Mi)
Grafana Dashboards:
- Internet connection - Bandwidth graphs, latency gauges, uptime timeline (in "Internet Monitoring" folder)
- Node Exporter Full (gnetId 1860) - System metrics visualization
- Note: Speedtest metrics appear after first 60-minute scrape cycle
Resource Usage (approximate):
- Total CPU: ~1.6 cores (requests) / ~2.5 cores (limits)
- Total RAM: ~1.3GB (requests) / ~2.9GB (limits)
- Network: ~500MB/day (speedtest-exporter only)
- Storage growth: ~500MB/week with all exporters enabled
- Acceptable for 8GB Raspberry Pi 4 with headroom for workloads
```sh
# Overall health
flux check

# Reconciliation status
flux get sources git
flux get kustomizations

# HelmRelease status
flux get helmreleases -A
```

```sh
# Reconcile infrastructure
flux reconcile kustomization infrastructure --with-source

# Reconcile a specific HelmRelease
flux reconcile helmrelease -n monitoring kube-prometheus-stack
flux reconcile helmrelease -n monitoring grafana
```

```sh
# Flux controller logs
flux logs --level=error --all-namespaces

# Prometheus operator logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator

# Grafana logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana
```

HelmRelease stuck or failing:

```sh
kubectl describe helmrelease -n monitoring kube-prometheus-stack
kubectl describe helmrelease -n monitoring grafana
```

Prometheus not scraping:

```sh
# Check Prometheus targets
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
# Visit http://localhost:9091/targets
```

Grafana datasource issues:
- Verify the Prometheus service address: `kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9091`
- Check the datasource config in the Grafana UI
Speedtest Exporter failures:

Common causes:
- DNS resolution failure (check `/etc/resolv.conf` in the pod)
- Speedtest.net outage or rate limiting
- Network connectivity issues
- First scrape takes 60 minutes; dashboard gauges remain empty until the first test completes

Diagnostics:

```sh
kubectl logs -n monitoring -l app=speedtest-exporter
kubectl exec -it -n monitoring deploy/speedtest-exporter -- ping -c3 www.speedtest.net
```

Reducing bandwidth usage:

If ~500MB/day is too high, edit `infrastructure/monitoring/prometheus-helmrelease.yaml`:

```yaml
scrape_interval: 120m  # Reduces to ~250MB/day
# or
scrape_interval: 180m  # Reduces to ~170MB/day
```

Node Exporter not showing metrics:
- Verify the privileged security context is allowed
- Check that hostPath mounts are accessible
- Ensure no port conflict with the built-in node-exporter (it should be disabled)
To add or change HTTP probe targets:

1. Edit `infrastructure/monitoring/prometheus-targets-configmap.yaml`:

   ```yaml
   data:
     blackbox-targets.yaml: |
       - targets:
           - http://www.google.com/
           - https://github.com/
           - https://www.cloudflare.com/
           - https://your-isp-homepage.com/  # Add custom target
           - http://192.168.1.1/             # Monitor local gateway
   ```

2. Commit and push:

   ```sh
   git add infrastructure/monitoring/prometheus-targets-configmap.yaml
   git commit -m "chore(monitoring): add custom probe targets"
   git push
   ```

3. Prometheus auto-reloads the configuration within 30 seconds.
All upgrades must be done via Git commits (PRs recommended).
Synapse:
- Check for new versions at https://github.com/element-hq/synapse/releases
- Review the changelog for breaking changes (especially database migrations)
- Update the image tag in `apps/matrix/synapse.yaml`: `image: ghcr.io/element-hq/synapse:vX.Y.Z`
- Commit and push; Flux reconciles automatically

Element Web:
- Check for new versions at https://github.com/element-hq/element-web/releases
- Update the image tag in `apps/matrix/element.yaml`
- Commit and push

PostgreSQL:
- Major version upgrades require a dump/restore; plan carefully
- Minor version upgrades (e.g., 16.6 → 16.7) are safe to apply directly

Rollback: revert the Git commit; Flux will re-pull the previous image tag.
- Update the chart version in `infrastructure/monitoring/*-helmrelease.yaml`
- Review the upstream changelog
- Test reconciliation: `flux reconcile helmrelease -n monitoring <name>`
- Monitor for errors: `flux logs`
Exporters are deployed as raw Kubernetes manifests (not Helm). To upgrade an exporter:

1. Check for the new version in the upstream repository.

2. Review the CHANGELOG for breaking changes:
   - ConfigMap structure changes (blackbox-exporter)
   - Metrics format changes (all exporters)
   - New resource requirements
   - Security updates

3. Update the image tag and digest in `infrastructure/monitoring/exporters/<exporter>.yaml`:

   ```yaml
   # Example: upgrading blackbox-exporter
   image: prom/blackbox-exporter:v0.26.0@sha256:NEW_DIGEST_HERE
   ```

4. Get the ARM64 digest (for Prometheus official images):

   ```sh
   docker manifest inspect prom/blackbox-exporter:v0.26.0 | \
     jq -r '.manifests[] | select(.platform.architecture == "arm64") | .digest'
   ```

5. Update the ConfigMap if needed (blackbox-exporter only):

   ```sh
   # If the blackbox.yml config format changed
   vim infrastructure/monitoring/exporters/blackbox-exporter.yaml
   ```

6. Commit and push:

   ```sh
   git add infrastructure/monitoring/exporters/
   git commit -m "chore(monitoring): upgrade blackbox-exporter to v0.26.0"
   git push
   ```

7. Verify deployment:

   ```sh
   kubectl get pods -n monitoring -w
   kubectl logs -n monitoring -l app=blackbox-exporter

   # Check the metrics endpoint
   kubectl port-forward -n monitoring svc/blackbox-exporter 9115:9115
   curl http://localhost:9115/metrics
   ```

Rollback: revert the Git commit if issues arise:

```sh
git revert HEAD
git push
```

Note: Prometheus data is stored in emptyDir (ephemeral). Rolling back exporter versions does not affect historical data, but data will be lost if the Prometheus pod is deleted.
```sh
# Check current version
flux version

# Upgrade Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Upgrade controllers
flux install --export > clusters/blueberry-k3s/flux-system/gotk-components.yaml
git add clusters/blueberry-k3s/flux-system/gotk-components.yaml
git commit -m "chore: upgrade Flux to vX.Y.Z"
git push
```

To roll back to a previous state:

```sh
# Find a known-good commit
git log --oneline

# Revert to the commit
git revert <commit-sha>
git push

# Or hard reset (use with caution)
git reset --hard <commit-sha>
git push --force
```

Flux will reconcile to the reverted state automatically.

Note: CRD changes and stateful components may not roll back cleanly. Always test upgrades in a non-production environment first.
- Create a component directory under `infrastructure/` or `apps/`
- Add manifests or a HelmRelease
- Update the parent `kustomization.yaml` to reference the new component
- Commit and push
- Verify reconciliation: `flux get kustomizations`

Example:

```sh
mkdir -p infrastructure/ingress
# Add manifests...
echo "  - ingress" >> infrastructure/kustomization.yaml
git add infrastructure/
git commit -m "feat: add ingress-nginx"
git push
```

Pull requests are automatically validated with:
- yamllint: YAML syntax and formatting
- kustomize build: ensures manifests build successfully
- kubeconform: Kubernetes schema validation
- Policy checks: no `:latest` tags, explicit namespaces

See `.github/workflows/validate.yaml` for details.
This cluster runs on a Raspberry Pi 4 with limited resources:
- RAM: 8GB total (K3S + system overhead ~1-2GB)
- CPU: 4 cores (ARM Cortex-A72)
- Storage: USB-attached (limited IO bandwidth, avoid write-heavy workloads)
Design Principles:
- Conservative resource requests/limits
- Conservative Prometheus scrape intervals (tuned for limited IO)
- No persistent storage by default (can be added later)
- Disabled non-essential exporters and controllers
- Single-replica deployments (no HA)
See AGENTS.md for full architectural constraints.
Grafana ships with default credentials `admin` / `admin`. Change these immediately after first login.
For production use, consider:
- Implementing SOPS encryption for secrets (see Flux SOPS guide)
- Setting up proper ingress with TLS
- Configuring authentication for Prometheus/Grafana
- Enabling RBAC policies
See AGENTS.md for contribution guidelines and architectural guardrails.
Key principles:
- Keep changes minimal and justified
- Pin all versions (charts, images)
- Test in CI before merging
- Document resource impact
- Ensure reproducibility
See LICENSE