
Rook-Ceph Setup Guide

Overview

This document describes the rook-ceph configuration that has been set up for this cluster. The setup uses custom disk partitioning on worker nodes to create dedicated storage for Ceph OSDs.

Disk Configuration

Worker Node Disk Layout

Each worker node (worker0, worker1, worker2) has been configured with the following partition layout on /dev/nvme0n1:

Partition   Size     Mount Point           Purpose
nvme0n1p1   2.2 GB   /boot/efi             EFI boot partition
nvme0n1p2   1.0 MB   -                     Talos META partition
nvme0n1p3   110 MB   /var                  Talos STATE partition
nvme0n1p4   150 GB   /var/lib/containerd   Container runtime storage (EPHEMERAL)
nvme0n1p5   100 GB   /var/lib/rook         Dedicated Ceph storage (NEW)

The new nvme0n1p5 partition provides 100 GB of dedicated storage for rook-ceph, separate from the EPHEMERAL partition.

Important Notes

⚠️ Worker Node Reinstallation Required

The disk partitioning changes require reinstallation of Talos on the worker nodes. The existing nodes cannot be repartitioned in-place.

See WORKER_UPGRADE_PROCEDURE.md for safe, rolling upgrade instructions that won't disrupt your cluster.

Configuration Files

The following files have been created/modified:

  1. Talos Configuration:
     • /talos/patches/worker/machine-disks.yaml - Custom disk partitioning patch for worker nodes
     • /talos/talconfig.yaml - Updated to include worker patches

  2. Rook-Ceph Operator:
     • /kubernetes/apps/storage/rook-ceph/operator/ks.yaml
     • /kubernetes/apps/storage/rook-ceph/operator/app/namespace.yaml
     • /kubernetes/apps/storage/rook-ceph/operator/app/oci-repository.yaml
     • /kubernetes/apps/storage/rook-ceph/operator/app/helm-release.yaml
     • /kubernetes/apps/storage/rook-ceph/operator/app/kustomization.yaml

  3. Rook-Ceph Cluster:
     • /kubernetes/apps/storage/rook-ceph/cluster/ks.yaml
     • /kubernetes/apps/storage/rook-ceph/cluster/app/ceph-cluster.yaml - Uses /dev/nvme0n1p5 on each worker
     • /kubernetes/apps/storage/rook-ceph/cluster/app/ceph-block-pool.yaml - 3-way replication
     • /kubernetes/apps/storage/rook-ceph/cluster/app/storage-class.yaml - RBD storage class
     • /kubernetes/apps/storage/rook-ceph/cluster/app/ceph-filesystem.yaml - CephFS configuration
     • /kubernetes/apps/storage/rook-ceph/cluster/app/storage-class-filesystem.yaml - CephFS storage class
     • /kubernetes/apps/storage/rook-ceph/cluster/app/kustomization.yaml

  4. Storage Namespace:
     • /kubernetes/apps/storage/kustomization.yaml - Registers rook-ceph with Flux
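
For reference, the operator's ks.yaml follows the usual Flux Kustomization pattern. The sketch below is illustrative only: the interval, sourceRef name, and other values are assumptions, and the file in the repository is authoritative.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rook-ceph-operator
  namespace: flux-system
spec:
  interval: 30m                 # assumed reconcile interval
  path: ./kubernetes/apps/storage/rook-ceph/operator/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system           # assumed name of the cluster's Git source
  wait: true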

Ceph Configuration

Cluster Specifications

  • Ceph Version: v19.2.0 (Squid)
  • Monitors: 3 replicas (1 per worker node)
  • Managers: 2 replicas
  • OSDs: 1 per worker node (using /dev/nvme0n1p5)
  • Data Replication: 3 replicas (min 2 for safety)
  • Failure Domain: host-level
  • Compression: Aggressive mode enabled
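
These specifications map onto the CephCluster resource in ceph-cluster.yaml roughly as follows. This is a condensed sketch rather than the full manifest: the image registry and field selection are assumptions based on the layout above, and the repository file is authoritative.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.0   # Squid
  dataDirHostPath: /var/lib/rook       # mount point of the dedicated partition
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: worker0
        devices:
          - name: /dev/nvme0n1p5
      - name: worker1
        devices:
          - name: /dev/nvme0n1p5
      - name: worker2
        devices:
          - name: /dev/nvme0n1p5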

Storage Classes

Two storage classes are provided:

  1. ceph-block - RBD (Block) storage
     • Provisioner: rook-ceph.rbd.csi.ceph.com
     • Format: ext4
     • Supports volume expansion
     • Reclaim policy: Delete

  2. ceph-filesystem - CephFS (Shared filesystem) storage
     • Provisioner: rook-ceph.cephfs.csi.ceph.com
     • MDS: 1 active + 1 standby
     • Supports volume expansion
     • Reclaim policy: Delete
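
As a rough sketch, the block pool and its storage class fit together as shown below. The pool name, clusterID, and CSI secret names follow the standard Rook example manifests and are assumptions here; the files under cluster/app/ are authoritative.

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-blockpool              # assumed pool name
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true    # refuse unsafe single-replica pools; min_size defaults to 2 for size 3
  parameters:
    compression_mode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph              # namespace of the Rook operator/cluster
  pool: ceph-blockpool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true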

Resource Allocation

  • MON: 500m CPU, 1-2Gi memory
  • OSD: 1000m CPU, 2-4Gi memory
  • MGR: 500m CPU, 512Mi-1Gi memory
  • MDS: 500m CPU, 1-2Gi memory
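
In the CephCluster spec these figures translate into resources entries along the lines below, reading "1-2Gi" as a 1Gi request and 2Gi limit. That interpretation, and the omission of CPU limits, are assumptions; MDS resources are set on the CephFilesystem resource (spec.metadataServer.resources) rather than on the CephCluster.

spec:
  resources:
    mon:
      requests: { cpu: 500m, memory: 1Gi }
      limits:   { memory: 2Gi }
    osd:
      requests: { cpu: 1000m, memory: 2Gi }
      limits:   { memory: 4Gi }
    mgr:
      requests: { cpu: 500m, memory: 512Mi }
      limits:   { memory: 1Gi }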

Deployment Steps

1. Apply Talos Changes

Follow the Worker Upgrade Procedure to safely upgrade each worker node with the new disk layout.

2. Deploy Rook-Ceph

After all workers are upgraded, commit and push the configuration:

git add -A
git commit -m "feat: add rook-ceph with dedicated storage partitions"
git push

# Flux will automatically deploy rook-ceph from git
# Monitor the deployment:
flux get ks -A
flux get hr -A

# Check rook-ceph operator
kubectl -n rook-ceph get pods

# Check Ceph cluster status (after cluster is deployed)
kubectl -n rook-ceph get cephcluster

3. Verify Ceph Health

# Get a shell in the rook-ceph toolbox
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- bash

# Inside the toolbox:
ceph status
ceph osd status
ceph df

4. Use Ceph Storage

Create a PVC using one of the storage classes:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block
  resources:
    requests:
      storage: 10Gi
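
For shared storage, a claim against the CephFS class looks the same apart from the access mode and class name; the claim name here is hypothetical.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc-shared          # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteMany              # CephFS supports shared read-write access from multiple pods
  storageClassName: ceph-filesystem
  resources:
    requests:
      storage: 10Gi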

Monitoring

The rook-ceph operator includes monitoring support. Metrics are exposed for Prometheus scraping.
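
In the CephCluster manifest this is typically switched on with the fragment below; when the Prometheus Operator CRDs are installed, Rook then creates ServiceMonitor objects for the mgr metrics endpoint. Treat this as a sketch of the relevant field rather than a copy of the repository's file.

spec:
  monitoring:
    enabled: true    # requires the Prometheus Operator (ServiceMonitor) CRDs in the cluster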

Dashboard

The Ceph dashboard is enabled with SSL disabled, so it is served over plain HTTP. To access it:

# Port-forward to the dashboard (with SSL disabled, Rook exposes the dashboard service on port 7000 by default)
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 7000:7000

# Get dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode
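
This behaviour corresponds to a CephCluster fragment along the following lines (a sketch; the repository file and the Helm values actually in use are authoritative):

spec:
  dashboard:
    enabled: true
    ssl: false       # plain HTTP; Rook then exposes the dashboard service on port 7000 by default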

Troubleshooting

OSDs not starting

# Check OSD pod logs
kubectl -n rook-ceph logs -l app=rook-ceph-osd

# Verify the partition exists and is accessible
talosctl --nodes 10.0.50.10 get discoveredvolumes | grep nvme0n1p5

Ceph cluster stuck in HEALTH_WARN

# Get detailed status
kubectl -n rook-ceph exec -it deployment/rook-ceph-tools -- ceph -s

# Common issues:
# - Not enough OSDs (need at least 3)
# - Clock skew between nodes
# - Network issues between nodes

Re-creating the cluster

If you need to completely wipe Ceph data:

# Delete the Ceph cluster (but not the operator)
kubectl -n rook-ceph delete cephcluster rook-ceph

# On each worker node, reset Talos, wiping the STATE and EPHEMERAL partitions:
talosctl --nodes 10.0.50.10 reset --graceful=false --reboot \
  --system-labels-to-wipe STATE --system-labels-to-wipe EPHEMERAL

# Then re-apply the Ceph cluster
flux reconcile kustomization rook-ceph-cluster

References