Requirements

  • Environment: Kubernetes 1.12.6
  • Use a StorageClass and PVC to dynamically provision PVs
  • The StorageClass is backed by Ceph RBD
  • Use kubectl edit to expand an existing PVC, with the filesystem inside the container growing automatically

1. Deploying the Kubernetes 1.12.6 environment

Since 1.12.6 is quite old, its kubeadm is far less user-friendly than in later releases: the usage differs from newer versions, and it has some rough edges.

  1. Docker is supported up to 18.06: yum install -y docker-ce-18.06.0.ce-3.el7 docker-ce-cli-18.06.0.ce-3.el7

  2. If you can reach k8s.gcr.io directly, prefer deploying with kubeadm 1.12.6 itself: kubeadm init --kubernetes-version=1.12.6

  3. Using kubeadm 1.12.0 with a config file (image repository changed to a mirror reachable from China) runs into the following problems (a consolidated config sketch follows this list):

    1. invalid configuration: kinds [InitConfiguration MasterConfiguration JoinConfiguration NodeConfiguration] are mutually exclusive. kubeadm 1.12 dumps too much configuration (both the master settings and the join settings for worker nodes), which kubeadm 1.12.0 cannot parse correctly. Fixes:
      1. Since we are deploying a master node, remove the JoinConfiguration section
      2. Or upgrade kubeadm to 1.12.6
    2. The pause image needs separate configuration, otherwise you will hit:
    [init] this might take a minute or longer if the control plane images have to be pulled
    
    Unfortunately, an error has occurred:
    timed out waiting for the condition
    
    This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
    

    Checking the kubelet log shows everything normal, with no other log entries. Fixes:

    1. Retag the pause image directly: docker tag registry.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
    2. Or modify the config file used by kubeadm init:
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        pod-infra-container-image: registry.aliyuncs.com/google_containers/pause:3.1
    
  4. kubelet log: node "master" not found

    1. First, check /etc/hosts and make sure static hostname resolution is configured
    2. Then change advertiseAddress in the kubeadm init config file:
    apiEndpoint:
      advertiseAddress: 192.168.1.147  # change to an IP actually usable by the VM
    
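Putting these fixes together, here is a consolidated sketch of a kubeadm config (v1alpha3 is the config API version kubeadm 1.12 emits; the IP and mirror repository are the examples used above, so adjust them to your environment):

    apiVersion: kubeadm.k8s.io/v1alpha3
    kind: InitConfiguration
    apiEndpoint:
      advertiseAddress: 192.168.1.147
    nodeRegistration:
      kubeletExtraArgs:
        # make the kubelet pull pause from the mirror instead of k8s.gcr.io
        pod-infra-container-image: registry.aliyuncs.com/google_containers/pause:3.1
    ---
    apiVersion: kubeadm.k8s.io/v1alpha3
    kind: ClusterConfiguration
    kubernetesVersion: v1.12.6
    # control-plane images (apiserver, controller-manager, etc.) are pulled from here
    imageRepository: registry.aliyuncs.com/google_containers

Deploy with kubeadm init --config <file>.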

2. Testing dynamic PV provisioning via StorageClass -> PVC

After repeated testing, only the out-of-tree provisioner deployed in RBAC mode works!

Reference: https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd. This repository still has some problems:

  1. non-RBAC mode fails to deploy, and produces no error logs
  2. In RBAC mode, the permission definitions are broken
  • So only the installation procedure I verified myself is described below
  1. First create the Role, ClusterRole, ServiceAccount, RoleBinding, and ClusterRoleBinding to grant the needed permissions

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: rbd-provisioner
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get"]
    - apiGroups: [""]
      resources: ["endpoints"]
      verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    rules:
    - apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["get", "list", "watch", "create", "delete"]
    - apiGroups: [""]
      resources: ["persistentvolumeclaims"]
      verbs: ["get", "list", "watch", "update"]
    - apiGroups: ["storage.k8s.io"]
      resources: ["storageclasses"]
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["create", "update", "patch"]
    - apiGroups: [""]
      resources: ["services"]
      resourceNames: ["kube-dns","coredns"]
      verbs: ["list", "get"]
    - apiGroups: [""]
      resources: ["endpoints"]
      verbs: ["get", "list", "watch", "create", "update", "patch"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["create","get","list","watch"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: rbd-provisioner
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: rbd-provisioner
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: rbd-provisioner
    subjects:
    - kind: ServiceAccount
      name: rbd-provisioner
      namespace: default
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    subjects:
    - kind: ServiceAccount
      name: rbd-provisioner
      namespace: default
    roleRef:
      kind: ClusterRole
      name: rbd-provisioner
      apiGroup: rbac.authorization.k8s.io
    
  2. Create the out-of-tree provisioner (Deployment)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: rbd-provisioner
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rbd-provisioner
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: rbd-provisioner
        spec:
          containers:
          - name: rbd-provisioner
            image: "quay.io/external_storage/rbd-provisioner:latest"
            env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
          serviceAccount: rbd-provisioner
    
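Before moving on, it is worth checking that the provisioner Pod is actually running and watching for claims (the label comes from the Deployment above):

    kubectl get pods -l app=rbd-provisioner
    # then tail the log of the Pod listed there (the name suffix will differ):
    kubectl logs -f <rbd-provisioner-pod-name>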
  • The overall flow for dynamically creating a PV via a StorageClass:
    • Create the Secrets for connecting to Ceph

      apiVersion: v1
      kind: Secret
      metadata:
        name: ceph-admin-secret
        namespace: kube-system
      type: "kubernetes.io/rbd"
      data:
        # ceph auth get-key client.admin | base64
        key: QVFBTGVhSllrdzRRQkJBQXA0d3BRTlJZaDFDOFUwUTRUcDE0OXc9PQ==
      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: ceph-secret
        namespace: kube-system
      type: "kubernetes.io/rbd"
      data:
        # ceph auth add client.kube mon 'allow r' osd 'allow rwx pool=kube'
        # ceph auth get-key client.kube | base64
        key: QVFEc2paMWNqU0UyRWhBQUlPR254dVhWRWtKd2I1cnFhUlI4VVE9PQ==
      
    • Create the StorageClass

# must be enabled for resize; volumes can only be grown, never shrunk
allowVolumeExpansion: true
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 10.20.9.22:6789
  pool: kube
  adminId: admin
  adminSecretNamespace: kube-system
  adminSecretName: ceph-admin-secret
  userId: kube
  userSecretNamespace: kube-system
  userSecretName: ceph-secret
  imageFormat: "2"
  • Create the PVC object
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd
  resources:
    requests:
      storage: 1Gi
  • Create a Pod that mounts the PVC
kind: Pod
apiVersion: v1
metadata:
  name: test-pod2
spec:
  containers:
  - name: test-pod
    image: busybox
    command:
    - "/bin/sh"
    args:
    - "-c"
    #- "touch /mnt/SUCCESS && exit 0 || exit 1"
    - "touch /mnt/SUCCESS && sleep 3600"
    volumeMounts:
    - name: pvc
      mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
  - name: pvc
    persistentVolumeClaim:
      claimName: claim1

At this point the PVC and Pod are created correctly, and a PV is generated automatically. Next, verify the resize feature.
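A quick sanity check, using the resource names from the examples above:

    kubectl get pvc claim1   # should report Bound
    kubectl get pv           # an auto-generated pvc-<uid> volume should be listed
    rbd ls kube              # on a Ceph node: shows the backing image in the kube pool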

Start testing resize

During resize testing, it turns out the rbd command-line tool is still required.

  • Flow: create the PVC, create the Pod, then edit the PVC

  • Observed result: VolumeResizeFailed

Conditions:
  Type       Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----       ------  -----------------                 ------------------                ------  -------
  Resizing   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 25 Dec 2019 20:13:14 +0800           
Events:
  Type     Reason                 Age                   From                                                                               Message
  ----     ------                 ----                  ----                                                                               -------
  Normal   ExternalProvisioning   21m                   persistentvolume-controller                                                        waiting for a volume to be created, either by external provisioner "ceph.com/rbd" or manually created by system administrator
  Normal   Provisioning           21m                   ceph.com/rbd_rbd-provisioner-98b88f5d6-vdl99_bc97b613-26da-11ea-a08c-9e09d9def392  External provisioner is provisioning volume for claim "default/claim2"
  Normal   ProvisioningSucceeded  21m                   ceph.com/rbd_rbd-provisioner-98b88f5d6-vdl99_bc97b613-26da-11ea-a08c-9e09d9def392  Successfully provisioned volume pvc-2d89a03c-ca23-401e-bdfa-3daea35b228f
  Warning  VolumeResizeFailed     9m22s (x18 over 20m)  volume_expand
  • The rbd commands are missing, so the controller-manager image still has to be replaced: directly swap the image in /etc/kubernetes/manifests/kube-controller-manager.yaml, and the cluster restarts it automatically (the kubelet watches the manifests directory)
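One way to get such an image is a multi-stage Dockerfile along these lines; the base image, binary path, and tag here are illustrative assumptions, not verified specifics:

    # Stage 1: borrow the kube-controller-manager binary from the official image
    FROM k8s.gcr.io/kube-controller-manager:v1.12.6 AS upstream

    # Stage 2: a Debian base so ceph-common (which ships /usr/bin/rbd) can be installed
    FROM debian:stretch-slim
    RUN apt-get update && apt-get install -y ceph-common && rm -rf /var/lib/apt/lists/*
    # assumed location of the binary inside the official image
    COPY --from=upstream /usr/local/bin/kube-controller-manager /usr/local/bin/kube-controller-manager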

Then repeat the flow: create the PVC, create the Pod, and edit the PVC.
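A minimal sketch of that flow, reusing claim1 from the earlier examples (the 2Gi target is arbitrary):

    # interactive: bump spec.resources.requests.storage from 1Gi to 2Gi
    kubectl edit pvc claim1
    # or non-interactively:
    kubectl patch pvc claim1 -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
    # watch the conditions and events until the resize completes
    kubectl describe pvc claim1

Note that on 1.12 online filesystem expansion (the ExpandInUsePersistentVolumes feature gate) is still alpha, so if the filesystem inside the container does not grow on its own, recreating the Pod triggers the filesystem resize on the node.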