K8s CSI Volume Lifecycle Management: Probe Design and an Automated Operations System
2026/6/6 5:36:17


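Introduction

The Kubernetes Container Storage Interface (CSI) is a key component of the cloud-native ecosystem: it decouples storage systems from the container orchestration platform. A CSI driver usually consists of several components, including the Controller Plugin and the Node Plugin, and each has its own health-check requirements. Well-designed readiness probes and liveness probes are essential to keeping a CSI driver running reliably.

This article looks at how to design and configure these probes when deploying a CSI volume lifecycle management service on K8s, and how to build a complete automated operations system around them.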

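1. Probe Design for CSI Sidecars

1.1 What Makes CSI Component Probes Different

Probes for CSI components differ noticeably from probes for ordinary applications, because they have to account for the state of the storage system behind the driver: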

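| CSI component | Key check | Readiness condition | Liveness condition |
|---|---|---|---|
| csi-provisioner | gRPC endpoint reachable | Controller service registered | gRPC service healthy |
| csi-attacher | VolumeAttachment processing | No backlog of attach requests | No panic |
| csi-node-driver | Storage device reachable | Node storage backend reachable | No segfault |
| csi-snapshotter | Snapshot API reachable | Snapshot Controller healthy | gRPC connection healthy |
| csi-resizer | Expansion capability | VolumeExpansion supported | Sufficient resources |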

flowchart TD
    A[CSI component] --> B{Liveness probe check}
    B -->|Fail| C[Restart container]
    B -->|Pass| D{Readiness probe check}
    D -->|Fail| E[Remove from Service endpoints]
    D -->|Pass| F[Serving normally]
    F --> G[Periodic checks]
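
The decision flow in the flowchart can be sketched as a small shell function. This is an illustration only: `probe_action` is a hypothetical helper, not a kubelet API, and it is a simplification. In a real cluster the kubelet evaluates liveness and readiness probes independently and only acts after `failureThreshold` consecutive failures.

```shell
# Hypothetical helper mirroring the flowchart: maps the latest liveness and
# readiness results ("ok" or "fail") to the action Kubernetes would take.
probe_action() {
  local liveness="$1" readiness="$2"
  if [ "$liveness" != "ok" ]; then
    # A failed liveness probe leads to a container restart.
    echo "restart-container"
  elif [ "$readiness" != "ok" ]; then
    # Alive but not ready: the pod is taken out of Service endpoints.
    echo "remove-from-endpoints"
  else
    echo "serve"
  fi
}

# Example: the container is alive but its storage backend is not reachable yet.
probe_action ok fail
```

The ordering matters in this sketch: liveness is checked first because a restart supersedes any readiness decision.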

1.2 Probe Configuration in Detail

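apiVersion: apps/v1
kind: Deployment
metadata:
  name: csi-provisioner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-provisioner
  template:
    metadata:
      labels:
        app: csi-provisioner
    spec:
      containers:
        - name: csi-provisioner
          image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0
          args:
            - --csi-address=/var/lib/csi/csi.sock
            - --feature-gates=Topology=true
            - --timeout=300s
            - --worker-threads=10
            # Upstream csi-provisioner serves its health endpoint through
            # --http-endpoint; /healthz/leader-election needs --leader-election.
            - --leader-election
            - --http-endpoint=:9808
          volumeMounts:
            - name: socket-dir
              mountPath: /var/lib/csi
          livenessProbe:
            httpGet:
              path: /healthz/leader-election
              port: 9808
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            # Needs a shell in the image. The upstream sidecar images are
            # distroless, so this assumes a custom build that ships /bin/sh.
            exec:
              command:
                - /bin/sh
                - -c
                - |
                  # Check that the gRPC socket exists
                  if [ -S /var/lib/csi/csi.sock ]; then
                    echo "gRPC socket ready"
                    exit 0
                  fi
                  echo "gRPC socket not ready"
                  exit 1
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 3
          startupProbe:
            httpGet:
              path: /healthz/leader-election
              port: 9808
            failureThreshold: 30
            periodSeconds: 10
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/csi.example.com
            type: DirectoryOrCreate
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-node-driver
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-node-driver
  template:
    metadata:
      labels:
        app: csi-node-driver
    spec:
      containers:
        - name: node-driver-registrar
          image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.8.0
          args:
            - --csi-address=/csi/csi.sock
            - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.example.com/csi.sock
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
          livenessProbe:
            # The registrar's built-in probe mode checks kubelet registration
            exec:
              command:
                - /csi-node-driver-registrar
                - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.example.com/csi.sock
                - --mode=kubelet-registration-probe
            periodSeconds: 20
          readinessProbe:
            # Same shell caveat as the provisioner readiness probe
            exec:
              command:
                - /bin/sh
                - -c
                - |
                  # Check that the registration socket exists (-S, not -f)
                  if [ -S /registration/csi.example.com-reg.sock ]; then
                    echo "Driver registered"
                    exit 0
                  fi
                  exit 1
            initialDelaySeconds: 5
        - name: csi-driver
          image: example.com/csi-driver:v1.0.0
          args:
            - --nodeid=$(NODE_ID)
            - --endpoint=unix:///csi/csi.sock
            - --health-port=9809
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9809
            initialDelaySeconds: 15
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - |
                  # Check that the storage backend is reachable
                  if mount | grep -q "csi-example" || lsblk | grep -q "csi-device"; then
                    echo "Storage backend reachable"
                    exit 0
                  fi
                  # Fall back to the driver's own connectivity check
                  # (check-storage is specific to this example driver)
                  if /csi-driver check-storage; then
                    exit 0
                  fi
                  exit 1
      volumes:
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins/csi.example.com
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: Directory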
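
The timing fields in the manifest decide how long a container may stay unhealthy before the kubelet acts. As a rough sketch, the window is about `failureThreshold × periodSeconds`; this deliberately ignores `initialDelaySeconds` and `timeoutSeconds`, so treat the numbers as approximations. The helper name `probe_window` is made up for this illustration.

```shell
# Hypothetical helper: approximate number of seconds of consecutive probe
# failures tolerated before the kubelet acts.
#   $1 = periodSeconds, $2 = failureThreshold
probe_window() {
  local period="$1" threshold="$2"
  echo $(( period * threshold ))
}

# csi-provisioner startupProbe: periodSeconds 10, failureThreshold 30
echo "startup budget: $(probe_window 10 30)s"

# csi-provisioner livenessProbe: periodSeconds 30, failureThreshold 3
echo "liveness detection window: $(probe_window 30 3)s"
```

With the values used for csi-provisioner, the startup probe gives the sidecar roughly 300 seconds to come up, and a hung container is restarted after roughly 90 seconds of failed liveness checks.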

2. Automated Operations System Design

2.1 Automatic Recovery from Probe Failures

apiVersion: v1
kind: ConfigMap
metadata:
  name: csi-auto-recovery
  namespace: kube-system
data:
  recovery.sh: |
    #!/bin/bash
    set -e

    NAMESPACE="kube-system"
    LOG_FILE="/var/log/csi-recovery.log"

    log() {
      echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
    }

    # Check csi-provisioner status
    log "Checking CSI provisioner status"
    for pod in $(kubectl get pods -n "$NAMESPACE" -l app=csi-provisioner -o jsonpath='{.items[*].metadata.name}'); do
      ready=$(kubectl get pod "$pod" -n "$NAMESPACE" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
      if [ "$ready" != "True" ]; then
        log "Restarting unready provisioner: $pod"
        kubectl delete pod "$pod" -n "$NAMESPACE" --force --grace-period=0
      fi
    done

    # Check csi-node-driver status on every node
    log "Checking CSI node driver status"
    for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
      pod=$(kubectl get pods -n "$NAMESPACE" -l app=csi-node-driver --field-selector "spec.nodeName=$node" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
      if [ -z "$pod" ]; then
        log "Missing node driver on node: $node"
        continue
      fi

      ready=$(kubectl get pod "$pod" -n "$NAMESPACE" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null || true)
      if [ "$ready" != "True" ]; then
        log "Restarting unready node driver: $pod on node $node"
        # Inspect the plugin directory in case the storage path is the problem.
        # No -it here: a CronJob has no TTY, so the listing ends up in the
        # debug pod's logs rather than in this script's output.
        log "Checking storage paths on node $node"
        kubectl debug "node/$node" --image=busybox -- chroot /host ls -la /var/lib/kubelet/plugins/csi.example.com/ || true
        kubectl delete pod "$pod" -n "$NAMESPACE" --force --grace-period=0
      fi
    done

    # Check for a VolumeAttachment backlog (column 5 is ATTACHED: true/false)
    log "Checking VolumeAttachment backlog"
    pending_va=$(kubectl get volumeattachments --no-headers | awk '$5 != "true"' | wc -l)
    if [ "$pending_va" -gt 10 ]; then
      log "High VolumeAttachment backlog: $pending_va"
      # Trigger an alert or automated handling here
    fi

    log "Recovery check completed"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: csi-auto-recovery
  namespace: kube-system
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: csi-operator
          containers:
            - name: recovery
              image: bitnami/kubectl:latest
              command: ["/bin/bash", "/scripts/recovery.sh"]
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
                - name: log-volume
                  mountPath: /var/log
          volumes:
            - name: scripts
              configMap:
                name: csi-auto-recovery
                defaultMode: 0755
            - name: log-volume
              emptyDir: {}
          restartPolicy: OnFailure
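
The VolumeAttachment backlog check depends on how `kubectl get volumeattachments --no-headers` prints its columns. The sketch below assumes the default column order NAME, ATTACHER, PV, NODE, ATTACHED, AGE and counts rows whose ATTACHED column is not `true`. The function name `count_pending_va` and the sample rows are invented for illustration, which lets the parsing logic be tried out without a cluster.

```shell
# Hypothetical helper: reads `kubectl get volumeattachments --no-headers`
# style output on stdin and prints how many rows are not yet attached.
count_pending_va() {
  # Column 5 is ATTACHED; "n + 0" forces a numeric 0 when nothing matches.
  awk '$5 != "true" { n++ } END { print n + 0 }'
}

# Invented sample rows in the assumed column layout.
sample='csi-aaa  csi.example.com  pvc-1  node-a  true   5m
csi-bbb  csi.example.com  pvc-2  node-b  false  2m
csi-ccc  csi.example.com  pvc-3  node-b  false  1m'

printf '%s\n' "$sample" | count_pending_va
```

Against a live cluster the same function would be fed from `kubectl get volumeattachments --no-headers | count_pending_va`, and the result compared with the backlog threshold.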

2.2 Operator-Based Operations

The Operator pattern can manage the full lifecycle of the CSI components:

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	storagev1 "k8s.io/api/storage/v1"
)

type CSIReconciler struct {
	client.Client
	KubeClient *kubernetes.Clientset
}

func (r *CSIReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// Fetch the Pod; a Pod that no longer exists needs no further action
	var pod corev1.Pod
	if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
		if errors.IsNotFound(err) {
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// Check CSI driver health
	if pod.Labels["app"] == "csi-node-driver" {
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1
