
Ceph) Removing a k8s rook-ceph OSD

MightyTedKim 2022. 8. 23. 10:00

Our development k8s cluster runs as VMs on a Xen server.

A snapshot bug caused the SR (storage repository) to fill up.

Since I needed to free up disk space, I removed the OSDs one at a time and attached a different HDD.

(Even if an OSD goes down, the data isn't lost, since replicated pools keep copies on the other OSDs.)
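That said, before pulling an OSD it's worth confirming the cluster is healthy and the pools actually have replicas to fall back on. A quick sanity check with standard Ceph commands (not part of the original steps):

# confirm cluster health and pool replication before removing an OSD
$ ceph status
$ ceph osd pool ls detail    # check the replicated "size" of each pool
$ ceph osd tree              # see which host each OSD lives on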

 

I'm writing this down so I don't forget it later.

 

Summary

  1. Situation
  2. Fix
  3. References

 

Details

Situation

The disk that Ceph uses on the k8s02 server filled up, so writes were failing.

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID  HOST      USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  k8s04  6210M  1017G      1     10.8k      3       60   exists,up
 1  k8s05  7175M  1016G      0     3276       1        0   exists,up
 2  k8s06  6365M  1017G      0        0       3       15   exists,up
 3  k8s02     0      0       0        0       0        0   autoout,exists
 4  k8s03  8032M  1016G      0     10.3k      2        0   exists,up
 5  k8s02  6454M  1017G      0        0       4        0   exists,up
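ceph osd status only shows up/down state and rough usage; for per-OSD utilization, ceph osd df gives a fuller picture (not shown in the original session):

# per-OSD SIZE / RAW USE / %USE / PGS, handy for spotting which disk is filling up
$ ceph osd df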

Looking at the osd-prepare pod, it errors out because of a read-only file system.

$ k describe pod rook-ceph-osd-prepare-k8s03-gdxzm -n rook-ceph
Events:
  Type     Reason       Age                   From               Message
  ----     ------       ----                  ----               -------
  Normal   Scheduled    12m                   default-scheduler  Successfully assigned rook-ceph/rook-ceph-osd-prepare-gpu03-gdxzm to gpu03
  Warning  FailedMount  12m (x7 over 12m)     kubelet            MountVolume.SetUp failed for volume "ceph-conf-emptydir" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dm
  Warning  FailedMount  12m (x7 over 12m)     kubelet            MountVolume.SetUp failed for volume "rook-binaries" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dede1c
  Warning  FailedMount  12m (x7 over 12m)     kubelet            MountVolume.SetUp failed for volume "kube-api-access-7mnq9" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52stem
  Warning  Failed       2m33s (x48 over 12m)  kubelet            error making pod data directories: mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dede1c3: read-only file system
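To confirm on the node itself that the root filesystem really was remounted read-only, a generic check (not from the original post) looks like this:

# on the affected node: 'ro' among the mount options confirms the read-only state
$ mount | grep ' / '
# the kernel log usually records the remount as well
$ dmesg | grep -i 'read-only'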

 

The rook-ceph deployment for osd-3 is also not ready:

$ k get deploy -n rook-ceph | grep osd-3
rook-ceph-osd-3                    0/1     1            0           205d

Fix

In the rook-ceph namespace, delete the deployment for the dead OSD.

# $ kubectl delete deployment -n rook-ceph rook-ceph-osd-ID
$ kubectl delete deployment -n rook-ceph rook-ceph-osd-3

# osd-3이 사라짐
$ k get deploy -n rook-ceph | grep osd-
rook-ceph-osd-0                    1/1     1            1           205d
rook-ceph-osd-1                    1/1     1            1           205d
rook-ceph-osd-2                    1/1     1            1           205d
rook-ceph-osd-4                    1/1     1            1           205d
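One caveat from the Rook documentation (not something I hit here): the operator can recreate the OSD deployment while you are still cleaning up, so it can be worth pausing it first. Assuming the default operator deployment name rook-ceph-operator:

# optionally pause the Rook operator so it does not act while the OSD is removed
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
# ...after the OSD is fully removed...
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1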

Then exec into the rook-ceph toolbox pod and run the commands below as well.

ceph osd crush remove osd.3
ceph auth del osd.3
ceph osd rm 3
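On Ceph Luminous and later, ceph osd purge is documented to combine these three steps (crush remove, auth del, osd rm) into one command; noting it here as an alternative rather than what was actually run:

# alternative on Luminous+: purge = crush remove + auth del + osd rm
$ ceph osd purge 3 --yes-i-really-mean-it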

I checked the state with ceph osd status after each command.

# before starting
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status

ID  HOST      USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  k8s04  6219M  1017G      2     25.5k      4       89   exists,up
 1  k8s05  6982M  1017G      0      818       1        0   exists,up
 2  k8s06  6159M  1017G      0        0       6      424   exists,up
 3  k8s02     0      0       0        0       0        0   autoout,exists
 4  k8s03  8032M  1016G      0     2457      16     2486   exists,up
 5  k8s02  6327M  1017G      0        0      15     1504   exists,up

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph auth del osd.3
updated

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID  HOST      USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  k8s04  6210M  1017G      1     10.8k      3       60   exists,up
 1  k8s05  7175M  1016G      0     3276       1        0   exists,up
 2  k8s06  6365M  1017G      0        0       3       15   exists,up
 3  k8s02     0      0       0        0       0        0   autoout,exists
 4  k8s03  8032M  1016G      0     10.3k      2        0   exists,up
 5  k8s02  6454M  1017G      0        0       4        0   exists,up

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd rm 3
removed osd.3

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID  HOST      USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  k8s04  5992M  1018G      2     25.2k      7      147   exists,up
 1  k8s05  7291M  1016G      0     2457       1        0   exists,up
 2  k8s06  6524M  1017G      0        0       3       58   exists,up
 4  k8s03  7699M  1016G      1        4       3      660   exists,up
 5  k8s02  6547M  1017G      0        0       4       42   exists,up

[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph status
  cluster:
    id:     9cc2dec6-5cbf-49c3-abdf-1eaa15ec54e2
    health: HEALTH_WARN
            690 daemons have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 8w)
    mgr: a(active, since 2w)
    mds: 1/1 daemons up, 1 hot standby
    osd: 5 osds: 5 up (since 30m), 5 in (since 30m); 23 remapped pgs
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 177 pgs
    objects: 8.08k objects, 9.9 GiB
    usage:   33 GiB used, 5.0 TiB / 5.0 TiB avail
    pgs:     1924/24231 objects misplaced (7.940%)
             153 active+clean
             22  active+remapped+backfill_wait
             2   active+remapped+backfilling

  io:
    client:   2.5 KiB/s rd, 17 KiB/s wr, 3 op/s rd, 2 op/s wr
    recovery: 24 MiB/s, 2 keys/s, 0 objects/s
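The 23 remapped PGs backfill on their own. To follow recovery and, once the crash reports have been reviewed, clear the "recently crashed" HEALTH_WARN, the standard commands below should work on recent Ceph releases (not part of the original session):

# follow recovery/backfill progress live
$ ceph -w
# review and archive old crash reports to clear the HEALTH_WARN
$ ceph crash ls
$ ceph crash archive-all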

References

https://documentation.suse.com/ses/7/html/ses-all/admin-caasp-cephosd.html

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.2.3/html/red_hat_ceph_administration_guide/removing-osds-manual

 
