We run our development Kubernetes cluster on VMs hosted on a Xen server.
Due to a snapshot bug, the SR (storage repository) filled up completely.
Since I needed to free up disk space, I removed the OSDs one at a time and assigned a different HDD to each.
(Even if an OSD goes down, the data isn't lost thanks to replication.)
I'm writing this up so I don't forget it later.
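For reference, you can check how full each SR is from the XenServer host before touching anything in Ceph. A minimal sketch, assuming the standard xe CLI (not something I ran as part of the steps below):
# on the XenServer host: list each SR with its total size and current utilisation (in bytes)
$ xe sr-list params=uuid,name-label,physical-size,physical-utilisation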
Summary
- Situation
- Action
- Notes
Details
Situation
Ceph on the k8s02 server filled up, so writes stopped working.
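To confirm how full the cluster actually is, ceph df and ceph health detail from the toolbox pod are handy in addition to osd status. A quick sketch (commands only, no output from this incident):
$ ceph df              # raw and per-pool usage
$ ceph health detail   # lists nearfull/full OSD warnings, if any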
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 k8s04 6210M 1017G 1 10.8k 3 60 exists,up
1 k8s05 7175M 1016G 0 3276 1 0 exists,up
2 k8s06 6365M 1017G 0 0 3 15 exists,up
3 k8s02 0 0 0 0 0 0 autoout,exists
4 k8s03 8032M 1016G 0 10.3k 2 0 exists,up
5 k8s02 6454M 1017G 0 0 4 0 exists,up
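Before removing anything, it also helps to see where the dead OSD sits in the CRUSH tree. A small sketch (osd.3 on k8s02 is the broken one here):
$ ceph osd tree    # the broken OSD shows up as down under its host
$ ceph osd find 3  # prints the host and CRUSH location of osd.3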
Looking at the osd-prepare pod, it errors out because of a read-only file system.
$ k describe pod rook-ceph-osd-prepare-k8s03-gdxzm -n rook-ceph
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned rook-ceph/rook-ceph-osd-prepare-gpu03-gdxzm to gpu03
Warning FailedMount 12m (x7 over 12m) kubelet MountVolume.SetUp failed for volume "ceph-conf-emptydir" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dm
Warning FailedMount 12m (x7 over 12m) kubelet MountVolume.SetUp failed for volume "rook-binaries" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dede1c
Warning FailedMount 12m (x7 over 12m) kubelet MountVolume.SetUp failed for volume "kube-api-access-7mnq9" : mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52stem
Warning Failed 2m33s (x48 over 12m) kubelet error making pod data directories: mkdir /var/lib/kubelet/pods/bd502051-3ccc-4972-92aa-e52c6dede1c3: read-only file system
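The read-only file system can be double-checked directly on the node. A minimal sketch, assuming SSH access to the host (the mount point is illustrative):
# on the affected node
$ findmnt -no OPTIONS --target /var/lib/kubelet   # 'ro' shows up in the options once the mount went read-only
$ dmesg | grep -i 'read-only'                     # the kernel logs when it remounts a device read-only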
On the Ceph side, the osd-3 deployment is also not ready (0/1):
$ k get deploy -n rook-ceph | grep osd-3
rook-ceph-osd-3 0/1 1 0 205d
Action
Delete the OSD deployment in the rook-ceph namespace.
# $ kubectl delete deployment -n rook-ceph rook-ceph-osd-ID
$ kubectl delete deployment -n rook-ceph rook-ceph-osd-3
# osd-3 is gone from the list
$ k get deploy -n rook-ceph | grep osd-
rook-ceph-osd-0 1/1 1 1 205d
rook-ceph-osd-1 1/1 1 1 205d
rook-ceph-osd-2 1/1 1 1 205d
rook-ceph-osd-4 1/1 1 1 205d
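One thing to watch out for: the Rook operator can recreate the OSD deployment on its next reconcile. If that happens, scaling the operator down while cleaning up might help. This is my own assumption (I didn't need it here), and it assumes the default deployment name rook-ceph-operator:
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
# ...remove the OSD...
$ kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1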
Then, from the rook-ceph toolbox pod, run the following commands as well:
ceph osd crush remove osd.3
ceph auth del osd.3
ceph osd rm 3
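If the toolbox shell isn't open yet, you can exec into it as below. On Luminous and later, ceph osd purge also collapses the three commands above into one; a sketch, assuming the default rook-ceph-tools deployment name:
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# one-liner equivalent of crush remove + auth del + osd rm
$ ceph osd purge 3 --yes-i-really-mean-it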
I checked the state with ceph osd status after each command.
# before starting
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 k8s04 6219M 1017G 2 25.5k 4 89 exists,up
1 k8s05 6982M 1017G 0 818 1 0 exists,up
2 k8s06 6159M 1017G 0 0 6 424 exists,up
3 k8s02 0 0 0 0 0 0 autoout,exists
4 k8s03 8032M 1016G 0 2457 16 2486 exists,up
5 k8s02 6327M 1017G 0 0 15 1504 exists,up
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph auth del osd.3
updated
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 k8s04 6210M 1017G 1 10.8k 3 60 exists,up
1 k8s05 7175M 1016G 0 3276 1 0 exists,up
2 k8s06 6365M 1017G 0 0 3 15 exists,up
3 k8s02 0 0 0 0 0 0 autoout,exists
4 k8s03 8032M 1016G 0 10.3k 2 0 exists,up
5 k8s02 6454M 1017G 0 0 4 0 exists,up
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd rm 3
removed osd.3
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 k8s04 5992M 1018G 2 25.2k 7 147 exists,up
1 k8s05 7291M 1016G 0 2457 1 0 exists,up
2 k8s06 6524M 1017G 0 0 3 58 exists,up
4 k8s03 7699M 1016G 1 4 3 660 exists,up
5 k8s02 6547M 1017G 0 0 4 42 exists,up
[rook@rook-ceph-tools-74bb778c5-mpzhl /]$ ceph status
cluster:
id: 9cc2dec6-5cbf-49c3-abdf-1eaa15ec54e2
health: HEALTH_WARN
690 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 8w)
mgr: a(active, since 2w)
mds: 1/1 daemons up, 1 hot standby
osd: 5 osds: 5 up (since 30m), 5 in (since 30m); 23 remapped pgs
rgw: 6 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 177 pgs
objects: 8.08k objects, 9.9 GiB
usage: 33 GiB used, 5.0 TiB / 5.0 TiB avail
pgs: 1924/24231 objects misplaced (7.940%)
153 active+clean
22 active+remapped+backfill_wait
2 active+remapped+backfilling
io:
client: 2.5 KiB/s rd, 17 KiB/s wr, 3 op/s rd, 2 op/s wr
recovery: 24 MiB/s, 2 keys/s, 0 objects/s
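The 23 remapped PGs get backfilled onto the remaining OSDs, so from here it's just a matter of waiting. A small sketch for keeping an eye on the progress:
$ ceph -s        # the backfill_wait / backfilling counts shrink over time
$ ceph pg stat   # one-line summary of PG states and recovery throughput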
References
https://documentation.suse.com/ses/7/html/ses-all/admin-caasp-cephosd.html
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.2.3/html/red_hat_ceph_administration_guide/removing-osds-manual