# Rook-Ceph Setup Guide

## Overview
This document describes the rook-ceph configuration that has been set up for this cluster. The setup uses custom disk partitioning on worker nodes to create dedicated storage for Ceph OSDs.
## Disk Configuration

### Worker Node Disk Layout
Each worker node (worker0, worker1, worker2) has been configured with the following partition layout on /dev/nvme0n1:
| Partition | Size | Mount Point | Purpose |
|---|---|---|---|
| nvme0n1p1 | 2.2 GB | /boot/efi | EFI boot partition |
| nvme0n1p2 | 1.0 MB | - | Talos META partition |
| nvme0n1p3 | 110 MB | /var | Talos STATE partition |
| nvme0n1p4 | 150 GB | /var/lib/containerd | Container runtime storage (EPHEMERAL) |
| nvme0n1p5 | 100 GB | /var/lib/rook | Dedicated Ceph storage (NEW) |
The new partition (`nvme0n1p5`) is a dedicated 100 GB partition for rook-ceph storage, separate from the EPHEMERAL partition.
## Important Notes

### ⚠️ Worker Node Reinstallation Required
The disk partitioning changes require reinstallation of Talos on the worker nodes. The existing nodes cannot be repartitioned in-place.
See WORKER_UPGRADE_PROCEDURE.md for safe, rolling upgrade instructions that won't disrupt your cluster.
## Configuration Files
The following files have been created/modified:

- Talos Configuration:
  - `/talos/patches/worker/machine-disks.yaml` - Custom disk partitioning patch for worker nodes
  - `/talos/talconfig.yaml` - Updated to include worker patches
- Rook-Ceph Operator:
  - `/kubernetes/apps/storage/rook-ceph/operator/ks.yaml`
  - `/kubernetes/apps/storage/rook-ceph/operator/app/namespace.yaml`
  - `/kubernetes/apps/storage/rook-ceph/operator/app/oci-repository.yaml`
  - `/kubernetes/apps/storage/rook-ceph/operator/app/helm-release.yaml`
  - `/kubernetes/apps/storage/rook-ceph/operator/app/kustomization.yaml`
- Rook-Ceph Cluster:
  - `/kubernetes/apps/storage/rook-ceph/cluster/ks.yaml`
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/ceph-cluster.yaml` - Uses `/dev/nvme0n1p5` on each worker
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/ceph-block-pool.yaml` - 3-way replication
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/storage-class.yaml` - RBD storage class
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/ceph-filesystem.yaml` - CephFS configuration
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/storage-class-filesystem.yaml` - CephFS storage class
  - `/kubernetes/apps/storage/rook-ceph/cluster/app/kustomization.yaml`
- Storage Namespace:
  - `/kubernetes/apps/storage/kustomization.yaml` - Registers rook-ceph with Flux (sketched below)
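For orientation, a minimal sketch of what the storage kustomization that registers rook-ceph with Flux might look like; the exact resource paths are assumptions and may differ from the file in this repository:

```yaml
# Hypothetical sketch of /kubernetes/apps/storage/kustomization.yaml;
# the actual resource entries may differ.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./rook-ceph/operator/ks.yaml
  - ./rook-ceph/cluster/ks.yaml
```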
## Ceph Configuration

### Cluster Specifications
- Ceph Version: v19.2.0 (Squid)
- Monitors: 3 replicas (1 per worker node)
- Managers: 2 replicas
- OSDs: 1 per worker node (using `/dev/nvme0n1p5`)
- Data Replication: 3 replicas (min 2 for safety)
- Failure Domain: host-level
- Compression: Aggressive mode enabled
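These specifications map onto the `CephCluster` and `CephBlockPool` resources roughly as shown below. This is a hedged excerpt based on the list above, not the exact contents of `ceph-cluster.yaml` or `ceph-block-pool.yaml` in this repository; the pool name in particular is an assumption.

```yaml
# Hedged excerpt illustrating how the specs above appear in the Rook CRDs;
# values are taken from this document, names are assumptions.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.0   # Squid
  mon:
    count: 3                           # one per worker node
  mgr:
    count: 2
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: worker0
        devices:
          - name: /dev/nvme0n1p5       # dedicated Ceph partition
      # worker1 and worker2 follow the same pattern
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-blockpool                 # assumed pool name
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true       # disallow unsafe single-replica pools
  parameters:
    compression_mode: aggressive
```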
### Storage Classes
Two storage classes are provided:

- `ceph-block` - RBD (block) storage (see the sketch after this list)
  - Provisioner: `rook-ceph.rbd.csi.ceph.com`
  - Format: ext4
  - Supports volume expansion
  - Reclaim policy: Delete
- `ceph-filesystem` - CephFS (shared filesystem) storage
  - Provisioner: `rook-ceph.cephfs.csi.ceph.com`
  - MDS: 1 active + 1 standby
  - Supports volume expansion
  - Reclaim policy: Delete
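As a rough illustration, a Rook RBD storage class for `ceph-block` typically looks like the following. The pool name and CSI secret references here are the Rook defaults and are assumptions; the repository's `storage-class.yaml` may differ:

```yaml
# Hedged sketch of a ceph-block StorageClass; pool and secret names are
# the common Rook defaults, not necessarily this repository's values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: ceph-blockpool                 # assumed pool name
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
allowVolumeExpansion: true
reclaimPolicy: Delete
```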
### Resource Allocation
- MON: 500m CPU, 1-2Gi memory
- OSD: 1000m CPU, 2-4Gi memory
- MGR: 500m CPU, 512Mi-1Gi memory
- MDS: 500m CPU, 1-2Gi memory
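In the `CephCluster` spec these allocations would be expressed roughly as below (MDS resources are set on the `CephFilesystem` resource instead). The split between requests and limits is inferred from the ranges above, not copied from the repository files:

```yaml
# Hedged excerpt of spec.resources in ceph-cluster.yaml; the request/limit
# split is inferred from the ranges listed above.
spec:
  resources:
    mon:
      requests: { cpu: 500m, memory: 1Gi }
      limits: { memory: 2Gi }
    osd:
      requests: { cpu: 1000m, memory: 2Gi }
      limits: { memory: 4Gi }
    mgr:
      requests: { cpu: 500m, memory: 512Mi }
      limits: { memory: 1Gi }
```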
## Deployment Steps

### 1. Apply Talos Changes
Follow the Worker Upgrade Procedure to safely upgrade each worker node with the new disk layout.
### 2. Deploy Rook-Ceph
After all workers are upgraded, commit and push the configuration:
```bash
git add -A
git commit -m "feat: add rook-ceph with dedicated storage partitions"
git push

# Flux will automatically deploy rook-ceph from git
# Monitor the deployment:
flux get ks -A
flux get hr -A

# Check rook-ceph operator
kubectl -n rook-ceph get pods

# Check Ceph cluster status (after cluster is deployed)
kubectl -n rook-ceph get cephcluster
```
### 3. Verify Ceph Health
```bash
# Get a shell in the rook-ceph toolbox
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash

# Inside the toolbox:
ceph status
ceph osd status
ceph df
```
### 4. Use Ceph Storage
Create a PVC using one of the storage classes:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block
  resources:
    requests:
      storage: 10Gi
```
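For shared storage, the same pattern works against the CephFS class. The claim below is illustrative (the name is arbitrary) and uses `ReadWriteMany`, which is the main reason to pick `ceph-filesystem` over `ceph-block`:

```yaml
# Example PVC against the CephFS storage class; ReadWriteMany lets the
# volume be mounted by multiple pods at once.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-shared-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 10Gi
```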
## Monitoring
The rook-ceph operator includes monitoring support. Metrics are exposed for Prometheus scraping.
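The metrics come from the Ceph mgr's Prometheus module. In Rook this is usually switched on via the `monitoring` stanza of the `CephCluster` spec, sketched below under the assumption that the Prometheus Operator CRDs are installed; the repository may enable it elsewhere (e.g. via Helm values):

```yaml
# Generic Rook example: asks the operator to create a ServiceMonitor for
# the Ceph mgr exporter (requires the Prometheus Operator CRDs).
spec:
  monitoring:
    enabled: true
```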
### Dashboard

The Ceph dashboard is enabled (HTTP, not HTTPS). To access it:
```bash
# Port-forward to the dashboard
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 8443:8443

# Get dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode
```
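The dashboard behaviour is driven by the `dashboard` stanza of the `CephCluster` spec; a hedged sketch is below. Note that with SSL disabled Rook normally exposes the dashboard service on port 7000 unless a port is set explicitly, so check the `rook-ceph-mgr-dashboard` service if the 8443 port-forward above does not connect.

```yaml
# Hedged excerpt of the dashboard settings in ceph-cluster.yaml;
# with ssl: false the service defaults to port 7000 unless overridden.
spec:
  dashboard:
    enabled: true
    ssl: false
```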
## Troubleshooting

### OSDs not starting
```bash
# Check OSD pod logs
kubectl -n rook-ceph logs -l app=rook-ceph-osd

# Verify the partition exists and is accessible
talosctl --nodes 10.0.50.10 get discoveredvolumes | grep nvme0n1p5
```
### Ceph cluster stuck in HEALTH_WARN
```bash
# Get detailed status
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph -s

# Common issues:
# - Not enough OSDs (need at least 3)
# - Clock skew between nodes
# - Network issues between nodes
```
### Re-creating the cluster
If you need to completely wipe Ceph data:
```bash
# Delete the Ceph cluster (but not the operator)
kubectl -n rook-ceph delete cephcluster rook-ceph

# On each worker node, wipe the Ceph partition:
talosctl --nodes 10.0.50.10 reset --graceful=false --reboot \
  --system-labels-to-wipe STATE --system-labels-to-wipe EPHEMERAL

# Then re-apply the Ceph cluster
flux reconcile kustomization rook-ceph-cluster
```