This repository is the single source of truth for the blueberry-k3s K3S cluster running on Raspberry Pi 4.
Architecture Philosophy: Minimal, deterministic, reproducible GitOps for edge/SBC environments. See AGENTS.md for detailed guardrails and constraints.
- Cluster Name: `blueberry-k3s`
- Hardware: Raspberry Pi 4 (8GB RAM)
- Architecture: aarch64
- Kubernetes: K3S (single server node, may scale to +2 agent nodes)
- Storage: USB-attached
- GitOps: FluxCD v2.4.0
```
.
├── clusters/
│   └── blueberry-k3s/           # Cluster-specific entrypoint
│       ├── flux-system/         # Flux controllers and GitRepository source
│       ├── infrastructure.yaml  # Infrastructure Kustomization
│       ├── apps.yaml            # Apps Kustomization (depends on infrastructure)
│       └── kustomization.yaml   # Root composition
├── infrastructure/
│   └── monitoring/              # Prometheus + Grafana observability stack
├── apps/
│   └── matrix/                  # Matrix homeserver stack (Synapse + Element + PostgreSQL)
├── .github/
│   └── workflows/               # CI validation (lint, kubeconform, policy checks)
└── AGENTS.md                    # Repository guardrails and architectural philosophy
```
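The `apps.yaml` Kustomization noted in the layout above depends on `infrastructure`. A minimal sketch of what such a file plausibly contains (the field values here are assumptions based on the layout, not copied from the repository):

```yaml
# Illustrative sketch of clusters/blueberry-k3s/apps.yaml; intervals and
# names are assumptions, not the actual file contents.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure   # apps reconcile only after infrastructure is ready
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

The `dependsOn` field is what enforces the ordering mentioned above: Flux will not apply `apps/` until the `infrastructure` Kustomization reports Ready.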
Before bootstrapping Flux, ensure:

- K3S installed on `blueberry-k3s`
  - K3S configured with your desired settings
  - `kubectl` access to the cluster
- Flux CLI installed (v2.4.0):

  ```sh
  curl -s https://fluxcd.io/install.sh | sudo bash
  ```

- Git repository access
  - SSH key or personal access token configured
  - Write access to this repository
- No port conflicts
  - Cockpit runs on port 9090
  - Prometheus is configured to use port 9091
  - Grafana uses port 3000
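The port-conflict check above can be scripted. A small pre-flight sketch (run on the node; uses bash's `/dev/tcp`, so it assumes a bash shell) that prints a warning for each expected port that already has a listener, and nothing when all are free:

```shell
# Warn about listeners on the ports this stack expects to use:
# 9090 (Cockpit), 9091 (Prometheus), 3000 (Grafana).
for port in 9090 9091 3000; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "warning: port $port already in use"
  fi
done
```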
- Fork or clone this repository.

- Update the GitRepository URL in `clusters/blueberry-k3s/flux-system/gotk-sync.yaml`:

  ```yaml
  spec:
    url: ssh://git@github.com/YOUR_ORG/YOUR_REPO
  ```

- Bootstrap Flux:

  ```sh
  flux bootstrap git \
    --url=ssh://git@github.com/YOUR_ORG/YOUR_REPO \
    --branch=main \
    --path=clusters/blueberry-k3s \
    --private-key-file=/path/to/ssh/key
  ```

  Or, if using GitHub directly:

  ```sh
  flux bootstrap github \
    --owner=YOUR_ORG \
    --repository=YOUR_REPO \
    --branch=main \
    --path=clusters/blueberry-k3s \
    --personal
  ```

- Verify reconciliation:

  ```sh
  flux get kustomizations
  flux get helmreleases -A
  ```
- Verify deployment:

  ```sh
  # Check Flux reconciliation
  flux get kustomizations
  flux get helmreleases -A

  # Verify all monitoring pods are running
  kubectl get pods -n monitoring
  # Expected pods:
  # - kube-prometheus-stack-operator-*
  # - prometheus-kube-prometheus-stack-prometheus-0
  # - grafana-*
  # - blackbox-exporter-*
  # - speedtest-exporter-*
  # - node-exporter-* (one per node)
  ```

- Validate internet monitoring:

  ```sh
  # Check Prometheus targets (all should be "UP")
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
  # Visit http://localhost:9091/targets
  # Look for: blackbox-http (3 targets), speedtest (1 target), node (1 target)
  ```
- Access Grafana:

  ```sh
  kubectl port-forward -n monitoring svc/grafana 3000:80
  ```

  - URL: http://localhost:3000
  - Default credentials: `admin` / `admin` (change immediately)

- Access Prometheus (optional):

  ```sh
  kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
  ```

- Access Element Web (Matrix):
  - URL: http://<node-ip>:30080
  - First create a user account (registration is open by default)
  - See Matrix Configuration for pre-deployment setup
The apps/matrix/postgres.yaml Secret and apps/matrix/synapse.yaml ConfigMap both contain CHANGE_ME_BEFORE_DEPLOY placeholders.
Option A: Manual (non-production, local testing)

- Generate strong random values:

  ```sh
  # Generate secret values (run once, save the output)
  python3 -c "import secrets; print(secrets.token_hex(32))"  # POSTGRES_PASSWORD
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_MACAROON_SECRET_KEY
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_REGISTRATION_SHARED_SECRET
  python3 -c "import secrets; print(secrets.token_hex(32))"  # SYNAPSE_FORM_SECRET
  ```

- Edit `apps/matrix/postgres.yaml`: replace all `CHANGE_ME_BEFORE_DEPLOY` values in the Secret.
- Edit `apps/matrix/synapse.yaml`: replace the `password: CHANGE_ME_BEFORE_DEPLOY` in the `homeserver.yaml` ConfigMap key (must match `POSTGRES_PASSWORD` from the step above). Also replace the three `CHANGE_ME_BEFORE_DEPLOY` values for `macaroon_secret_key`, `registration_shared_secret`, and `form_secret`.
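A convenience sketch that emits all four values in one pass. The `NAME=` prefixes are labels for copy/paste only, not variables the manifests read:

```shell
# Generate all four secret values, labeled for easy pasting into the manifests
for name in POSTGRES_PASSWORD SYNAPSE_MACAROON_SECRET_KEY \
            SYNAPSE_REGISTRATION_SHARED_SECRET SYNAPSE_FORM_SECRET; do
  echo "$name=$(python3 -c 'import secrets; print(secrets.token_hex(32))')"
done
```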
Option B: SOPS + age (recommended for production)
See the Flux SOPS guide for encrypting secrets in Git using age keys.
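As a starting point, a repo-root `.sops.yaml` might look like the sketch below. The path regex and age recipient are placeholders, not values from this repository:

```yaml
# Hypothetical .sops.yaml sketch: encrypt only Secret data fields in the
# Matrix manifests with an age recipient. Replace the age key with your own.
creation_rules:
  - path_regex: apps/matrix/(postgres|synapse)\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyplaceholder
```

With this in place, `sops --encrypt --in-place <file>` encrypts the matched fields before committing, and the Flux kustomize-controller decrypts them in-cluster using an age key stored as a Secret.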
Edit `apps/matrix/synapse.yaml` (ConfigMap `synapse-config`) and set:

```yaml
server_name: "your.domain.com"            # Your Matrix server name (cannot change later!)
public_baseurl: "http://<NODE_IP>:30067"  # Reachable URL for clients
```

Edit `apps/matrix/element.yaml` (ConfigMap `element-config`) and set:

```
"base_url": "http://<NODE_IP>:30067"  # Must match Synapse public_baseurl
"server_name": "your.domain.com"      # Must match server_name in homeserver.yaml
```

Replace `<NODE_IP>` with the actual IP address of the K3S node (e.g., 192.168.1.100).
After deployment:

- Element Web: http://<node-ip>:30080
- Synapse API: http://<node-ip>:30067
- Matrix health check: http://<node-ip>:30067/health
```sh
# Register an admin user via the Synapse admin API
kubectl exec -it -n matrix deploy/synapse -- register_new_matrix_user \
  -c /conf/homeserver.yaml \
  -u admin \
  -p <admin-password> \
  -a \
  http://localhost:8008
```

After creating your initial accounts, disable open registration by editing the `apps/matrix/synapse.yaml` ConfigMap and setting:

```yaml
enable_registration: false
```

```sh
# Check all Matrix pods are running
kubectl get pods -n matrix
# Expected pods:
# - matrix-postgres-* (PostgreSQL)
# - synapse-* (Synapse homeserver)
# - element-web-* (Element web client)

# Check Synapse health
kubectl port-forward -n matrix svc/synapse 8008:8008
curl http://localhost:8008/health
# Expected: "OK"

# Check Matrix client discovery
curl http://localhost:8008/_matrix/client/versions
```

- No TLS: served over HTTP. Add an ingress with TLS (via cert-manager) for production.
- No TURN/VoIP: a TURN server is not configured (required for voice/video calls across NAT).
- Single replica: no high availability; acceptable for edge/SBC deployment.
- Open federation: federation with matrix.org is enabled. Restrict in `homeserver.yaml` if not desired (`federation_domain_whitelist`).
- No SSO/OIDC: basic password authentication only.
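The no-TLS limitation above could be addressed with an ingress. A hedged sketch, assuming ingress-nginx and a cert-manager ClusterIssuer named `letsencrypt`; the host, issuer name, and `element-web` Service name are placeholders/assumptions, not values from this repository:

```yaml
# Hypothetical TLS ingress for Element Web; adjust names to your setup.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: element-web
  namespace: matrix
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt  # assumed issuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts: [element.your.domain.com]
      secretName: element-web-tls   # cert-manager populates this Secret
  rules:
    - host: element.your.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: element-web   # assumed Service name
                port:
                  number: 80
```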
Self-hosted Matrix communications platform:
PostgreSQL (postgres:16.6-alpine):
- Dedicated database for Synapse
- 5Gi persistent volume (default StorageClass / K3S local-path)
- ClusterIP only (not exposed externally)
- Resource usage: 100m CPU / 256Mi RAM (requests)
Synapse (ghcr.io/element-hq/synapse:v1.147.1):
- Matrix homeserver by Element
- 5Gi persistent volume for signing key and media store
- NodePort 30067 → container port 8008 (client/federation API)
- Resource usage: 200m CPU / 256Mi RAM (requests), up to 500m CPU / 1Gi RAM
Element Web (ghcr.io/element-hq/element-web:v1.12.10):
- Official Matrix web client
- Stateless (no PVC) - serves static files via nginx
- NodePort 30080 → container port 80
- Resource usage: 50m CPU / 64Mi RAM (requests)
Total Matrix resource usage (approximate):
- CPU: ~350m requests / ~700m limits
- RAM: ~576Mi requests / ~1.2Gi limits
- Storage: 10Gi (2 × 5Gi PVCs)
Prometheus (kube-prometheus-stack v67.4.0):
- Prometheus Operator + Prometheus server
- Port: 9091 (to avoid Cockpit conflict on 9090)
- Retention: 30 days / 10GB (increased for internet monitoring historical data)
- Scrape interval: 60s (tuned for edge IO constraints)
- Resource limits: 1 CPU / 1.5GB RAM
- Disabled: Alertmanager, built-in node-exporter, kube-state-metrics (can enable later)
- No persistence (emptyDir) - can be added later if needed
Grafana (chart v8.5.2 / image 11.4.0):
- Pre-configured Prometheus datasource
- Default dashboards: Kubernetes cluster overview, pod monitoring, internet connection, node metrics
- Resource limits: 500m CPU / 512MB RAM
- No persistence (emptyDir)
- Default credentials: `admin` / `admin` (⚠️ change after first login)
Internet Monitoring Exporters:
Internet monitoring tracks connectivity quality (bandwidth, latency, uptime) to detect ISP issues or network degradation.
Blackbox Exporter (prom/blackbox-exporter:v0.25.0):
- HTTP/ICMP probing for uptime and latency monitoring
- Default targets: google.com, github.com, cloudflare.com (customizable via ConfigMap)
- Scrape interval: 30s
- Resource usage: 50m CPU / 64Mi RAM (limits: 200m / 128Mi)
Speedtest Exporter (miguelndecarvalho/speedtest-exporter:v0.5.1):
- Bandwidth testing via Speedtest.net
- Scrape interval: 60m
- ⚠️ Bandwidth consumption: ~500MB/day (not suitable for metered connections)
- To reduce bandwidth, increase the scrape interval in `prometheus-helmrelease.yaml`
- Resource usage: 100m CPU / 128Mi RAM (limits: 500m / 256Mi, spikes during test)
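A hedged sketch of how the speedtest scrape job might look inside the `prometheus-helmrelease.yaml` values, assuming it is defined via `additionalScrapeConfigs`; the service name and port 9798 are assumptions, not copied from the actual file:

```yaml
# Plausible kube-prometheus-stack values fragment; names/port are assumed.
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: speedtest
        scrape_interval: 60m   # raise to 120m/180m to cut bandwidth
        scrape_timeout: 90s    # a speedtest run exceeds the default 10s timeout
        static_configs:
          - targets: ['speedtest-exporter.monitoring.svc:9798']
```

The long `scrape_timeout` matters: each "scrape" triggers a full bandwidth test, which is why the interval, not the exporter, controls daily data usage.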
Node Exporter (prom/node-exporter:v1.8.2):
- System metrics (CPU, memory, disk, network)
- Deployed as DaemonSet (runs on all nodes)
- Security note: requires `privileged: true` and `hostNetwork: true` for full system access (standard node-exporter requirement)
- Deployed separately from kube-prometheus-stack's built-in node-exporter for explicit configuration control and version independence
- Important: do not enable `nodeExporter.enabled: true` in the Prometheus HelmRelease; it will conflict with this deployment
- Scrape interval: 15s
- Resource usage: 100m CPU / 128Mi RAM (limits: 250m / 256Mi)
Grafana Dashboards:
- Internet connection - Bandwidth graphs, latency gauges, uptime timeline (in "Internet Monitoring" folder)
- Node Exporter Full (gnetId 1860) - System metrics visualization
- Note: Speedtest metrics appear after first 60-minute scrape cycle
Resource Usage (approximate):
- Total CPU: ~1.6 cores (requests) / ~2.5 cores (limits)
- Total RAM: ~1.3GB (requests) / ~2.9GB (limits)
- Network: ~500MB/day (speedtest-exporter only)
- Storage growth: ~500MB/week with all exporters enabled
- Acceptable for 8GB Raspberry Pi 4 with headroom for workloads
```sh
# Overall health
flux check

# Reconciliation status
flux get sources git
flux get kustomizations

# HelmRelease status
flux get helmreleases -A
```

```sh
# Reconcile infrastructure
flux reconcile kustomization infrastructure --with-source

# Reconcile a specific HelmRelease
flux reconcile helmrelease -n monitoring kube-prometheus-stack
flux reconcile helmrelease -n monitoring grafana
```

```sh
# Flux controller logs
flux logs --level=error --all-namespaces

# Prometheus operator logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator

# Grafana logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana
```

HelmRelease stuck or failing:

```sh
kubectl describe helmrelease -n monitoring kube-prometheus-stack
kubectl describe helmrelease -n monitoring grafana
```

Prometheus not scraping:

```sh
# Check Prometheus targets
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
# Visit http://localhost:9091/targets
```

Grafana datasource issues:
- Verify the Prometheus service address: `kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9091`
- Check the datasource config in the Grafana UI
Speedtest Exporter failures:

Common causes:
- DNS resolution failure (check `/etc/resolv.conf` in the pod)
- Speedtest.net outage or rate limiting
- Network connectivity issues
- First scrape takes 60 minutes; dashboard gauges remain empty until the first test completes

Diagnostics:

```sh
kubectl logs -n monitoring -l app=speedtest-exporter
kubectl exec -it -n monitoring deploy/speedtest-exporter -- ping -c3 www.speedtest.net
```

Reducing bandwidth usage:

If ~500MB/day is too high, edit `infrastructure/monitoring/prometheus-helmrelease.yaml`:

```yaml
scrape_interval: 120m  # Reduces to ~250MB/day
# or
scrape_interval: 180m  # Reduces to ~170MB/day
```

Node Exporter not showing metrics:
- Verify the privileged security context is allowed
- Check that hostPath mounts are accessible
- Ensure no port conflict with the built-in node-exporter (it should be disabled)
To add or change HTTP probe targets:

1. Edit `infrastructure/monitoring/prometheus-targets-configmap.yaml`:

   ```yaml
   data:
     blackbox-targets.yaml: |
       - targets:
           - http://www.google.com/
           - https://github.com/
           - https://www.cloudflare.com/
           - https://your-isp-homepage.com/  # Add custom target
           - http://192.168.1.1/             # Monitor local gateway
   ```

2. Commit and push:

   ```sh
   git add infrastructure/monitoring/prometheus-targets-configmap.yaml
   git commit -m "chore(monitoring): add custom probe targets"
   git push
   ```

3. Prometheus auto-reloads the configuration within 30 seconds.
All upgrades must be done via Git commits (PRs recommended).
Synapse:
- Check for new versions at https://github.com/element-hq/synapse/releases
- Review the changelog for breaking changes (especially database migrations)
- Update the image tag in `apps/matrix/synapse.yaml`: `image: ghcr.io/element-hq/synapse:vX.Y.Z`
- Commit and push; Flux reconciles automatically

Element Web:
- Check for new versions at https://github.com/element-hq/element-web/releases
- Update the image tag in `apps/matrix/element.yaml`
- Commit and push

PostgreSQL:
- Major version upgrades require a dump/restore; plan carefully
- Minor version upgrades (e.g., 16.6 → 16.7) are safe to apply directly

Rollback: revert the Git commit; Flux will re-pull the previous image tag.
- Update the chart version in `infrastructure/monitoring/*-helmrelease.yaml`
- Review the upstream changelog
- Test reconciliation: `flux reconcile helmrelease -n monitoring <name>`
- Monitor for errors: `flux logs`
Exporters are deployed as raw Kubernetes manifests (not Helm). To upgrade an exporter:

1. Check for the new version in the upstream repository.

2. Review the CHANGELOG for breaking changes:
   - ConfigMap structure changes (blackbox-exporter)
   - Metrics format changes (all exporters)
   - New resource requirements
   - Security updates

3. Update the image tag and digest in `infrastructure/monitoring/exporters/<exporter>.yaml`:

   ```yaml
   # Example: upgrading blackbox-exporter
   image: prom/blackbox-exporter:v0.26.0@sha256:NEW_DIGEST_HERE
   ```

4. Get the ARM64 digest (for Prometheus official images):

   ```sh
   docker manifest inspect prom/blackbox-exporter:v0.26.0 | \
     jq -r '.manifests[] | select(.platform.architecture == "arm64") | .digest'
   ```

5. Update the ConfigMap if needed (blackbox-exporter only):

   ```sh
   # If the blackbox.yml config format changed
   vim infrastructure/monitoring/exporters/blackbox-exporter.yaml
   ```

6. Commit and push:

   ```sh
   git add infrastructure/monitoring/exporters/
   git commit -m "chore(monitoring): upgrade blackbox-exporter to v0.26.0"
   git push
   ```

7. Verify deployment:

   ```sh
   kubectl get pods -n monitoring -w
   kubectl logs -n monitoring -l app=blackbox-exporter

   # Check the metrics endpoint
   kubectl port-forward -n monitoring svc/blackbox-exporter 9115:9115
   curl http://localhost:9115/metrics
   ```

Rollback: revert the Git commit if issues arise:

```sh
git revert HEAD
git push
```

Note: Prometheus data is stored in emptyDir (ephemeral). Rolling back exporter versions does not affect historical data, but data will be lost if the Prometheus pod is deleted.
```sh
# Check current version
flux version

# Upgrade Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Upgrade controllers
flux install --export > clusters/blueberry-k3s/flux-system/gotk-components.yaml
git add clusters/blueberry-k3s/flux-system/gotk-components.yaml
git commit -m "chore: upgrade Flux to vX.Y.Z"
git push
```

To roll back to a previous state:

```sh
# Find a known-good commit
git log --oneline

# Revert to the commit
git revert <commit-sha>
git push

# Or hard reset (use with caution)
git reset --hard <commit-sha>
git push --force
```

Flux will reconcile to the reverted state automatically.

Note: CRD changes and stateful components may not roll back cleanly. Always test upgrades in a non-production environment first.
- Create a component directory under `infrastructure/` or `apps/`
- Add manifests or a HelmRelease
- Update the parent `kustomization.yaml` to reference the new component
- Commit and push
- Verify reconciliation: `flux get kustomizations`

Example:

```sh
mkdir -p infrastructure/ingress
# Add manifests...
echo "  - ingress" >> infrastructure/kustomization.yaml
git add infrastructure/
git commit -m "feat: add ingress-nginx"
git push
```

Pull requests are automatically validated with:
- yamllint: YAML syntax and formatting
- kustomize build: ensures manifests build successfully
- kubeconform: Kubernetes schema validation
- Policy checks: no `:latest` tags, explicit namespaces

See `.github/workflows/validate.yaml` for details.
This cluster runs on a Raspberry Pi 4 with limited resources:
- RAM: 8GB total (K3S + system overhead ~1-2GB)
- CPU: 4 cores (ARM Cortex-A72)
- Storage: USB-attached (limited IO bandwidth, avoid write-heavy workloads)
Design Principles:
- Conservative resource requests/limits
- Conservative Prometheus scrape intervals (tuned for limited IO)
- No persistent storage by default (can be added later)
- Disabled non-essential exporters and controllers
- Single-replica deployments (no HA)
See AGENTS.md for full architectural constraints.
Grafana ships with default credentials `admin` / `admin`. Change these immediately after first login.
For production use, consider:
- Implementing SOPS encryption for secrets (see Flux SOPS guide)
- Setting up proper ingress with TLS
- Configuring authentication for Prometheus/Grafana
- Enabling RBAC policies
See AGENTS.md for contribution guidelines and architectural guardrails.
Key principles:
- Keep changes minimal and justified
- Pin all versions (charts, images)
- Test in CI before merging
- Document resource impact
- Ensure reproducibility
See LICENSE