'데이터 엔지니어'로 성장하기

정리하는 걸 좋아하고, 남이 읽으면 더 좋아함

Storage/mariadb

Kubernetes) helm mariadb innodb recovery 하기

MightyTedKim 2022. 9. 21. 15:16
728x90
반응형

 

주말 쉬고 오니까 mariadb가 내려가있더라고요
crashloopbackoff 가 돌고 있길래 급하게 pvc를 백업하고 살펴봤습니다.

$ k get pod -n mariadb
NAME READY STATUS RESTARTS AGE
mariadb-primary-0 0/1 CrashLoopBackOff 562 (2m2s ago) 117d

요약

결론 부터 말하면, innodb_force_recovery=5로 해결했습니다.
여전히 tablespace error가 나오길래, sqldump 후 .sql 을 local에 저장하고
새로운 mariadb를 만들고 import 시켰습니다.

예상 독자는 아래와 같아요

  1. helm mariadb를 돌리다 장애나신 분
  2. mariadb heuristic recovery가 안되는 분

 

설명

1. mariadb crashloopbackoff 로그 보기

처음에 로그를 보니 딱히 내용이 없어요

$k logs pod/mariadb-primary-0 -n mariadb
mariadb 23:08:15.93
mariadb 23:08:15.93 Welcome to the Bitnami mariadb container
mariadb 23:08:15.93 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-mariadb
mariadb 23:08:15.93 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-mariadb/issues
mariadb 23:08:15.93
mariadb 23:08:15.93 INFO  ==> ** Starting MariaDB setup **
mariadb 23:08:15.95 INFO  ==> Validating settings in MYSQL_*/MARIADB_* env vars
mariadb 23:08:15.95 INFO  ==> Initializing mariadb database
mariadb 23:08:15.96 WARN  ==> The mariadb configuration file '/opt/bitnami/mariadb/conf/my.cnf' is not writable. Configurations based on environment variables will not be applied for this file.
mariadb 23:08:15.96 INFO  ==> Using persisted data
mariadb 23:08:15.98 INFO  ==> Running mysql_upgrade
mariadb 23:08:15.98 INFO  ==> Starting mariadb in background

describe를 봐도 socket이 없다는 에러만 나오더라고요

Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Normal   Pulled     22m (x551 over 5d2h)     kubelet  Container image "harbor.hgk/library/bitnami/mariadb:v10.5.13-20211206" already present on machine
  Warning  Unhealthy  7m40s (x9567 over 5d2h)  kubelet  Readiness probe failed: mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local server through socket '/opt/bitnami/mariadb/tmp/mysql.sock' (2)'
Check that mariadbd is running and that the socket: '/opt/bitnami/mariadb/tmp/mysql.sock' exists!
  Warning  BackOff  2m48s (x6804 over 2d2h)  kubelet  Back-off restarting failed container

그래서 values.yaml에 debug: true를 해서 로그를 자세하게 봤어요

$ vi values.yaml
image:
debug: true #false

$ helm upgrade --install mariadb harbor.hgk/mariadb -n mariadb -f values.yaml --set auth.replicationPassword=mysql --debug --insecure-skip-tls-verify

이제 로그가 더 자세히 보여요

[ERROR] Found 151 prepared transactions!
It means that server was not shown properly last time and critical recovery information (last binlog or tc.log file) was manueleted after a crash.
You have to start server with --tc-heuristic-recover switch to commit rollback pending transactions.

공부할 때는 pvc 날리거나 ibdata, ib_logfile0 삭제해서 해결했는데, 운영 데이터 복구가 목적이라서 고민에 빠졌어요

[상세로그]

더보기

$ k logs mariadb-primary-0 -n mariadb -f
mariadb 03:33:15.70
mariadb 03:33:15.70 Welcome to the Bitnami mariadb container
mariadb 03:33:15.70 Subscribe to project updates by watching https://github.com/bitnami/bitnaker-mariadb
mariadb 03:33:15.70 Submit issues and feature requests at https://github.com/bitnami/bitnami--mariadb/issues
mariadb 03:33:15.70
mariadb 03:33:15.70 INFO ==> ** Starting MariaDB setup **
mariadb 03:33:15.72 INFO ==> Validating settings in MYSQL_*/MARIADB_* env vars
mariadb 03:33:15.72 INFO ==> Initializing mariadb database
mariadb 03:33:15.73 DEBUG ==> Ensuring expected directories/files exist
mariadb 03:33:15.73 WARN ==> The mariadb configuration file '/opt/bitnami/mariadb/conf/my.cnnot writable. Configurations based on environment variables will not be applied for this file
mariadb 03:33:15.74 INFO ==> Using persisted data
mariadb 03:33:15.75 INFO ==> Running mysql_upgrade
mariadb 03:33:15.76 INFO ==> Starting mariadb in background
2022-09-19 3:33:15 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (server 10.6.5-MariaDB-log) staas process 52 ...
2022-09-19 3:33:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-09-19 3:33:15 0 [Note] InnoDB: Number of pools: 1
2022-09-19 3:33:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-19 3:33:15 0 [Note] mysqld: O_TMPFILE is not supported on /opt/bitnami/mariadb/tmp (ing future attempts)
2022-09-19 3:33:15 0 [Note] InnoDB: Using Linux native AIO
2022-09-19 3:33:15 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk 134217728
2022-09-19 3:33:15 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-19 3:33:16 0 [Note] InnoDB: Transaction 90848603 was in the XA prepared state.
(생략)
2022-09-19 3:33:16 0 [Note] InnoDB: Transaction 90848602 was in the XA prepared state.
2022-09-19 3:33:16 0 [Note] InnoDB: 151 transaction(s) which must be rolled back or cleaned total 0 row operations to undo
2022-09-19 3:33:16 0 [Note] InnoDB: Trx id counter is 90848718
2022-09-19 3:33:16 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-19 3:33:16 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-19 3:33:16 0 [Note] InnoDB: Starting in background the rollback of recovered transac
2022-09-19 3:33:16 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writinfile full; Please wait ...
2022-09-19 3:33:16 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2022-09-19 3:33:16 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-19 3:33:16 0 [Note] InnoDB: 10.6.5 started; log sequence number 33296898334; transacd 90848719
2022-09-19 3:33:16 0 [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_bufol
2022-09-19 3:33:16 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-19 3:33:16 0 [Note] Recovering after a crash using mysql-bin
2022-09-19 3:33:16 0 [Note] Starting table crash recovery...
2022-09-19 3:33:16 0 [Note] InnoDB: Starting recovery for XA transactions...
2022-09-19 3:33:16 0 [Note] InnoDB: Transaction 90848485 in prepared state after recovery
2022-09-19 3:33:16 0 [Note] InnoDB: Transaction contains changes to 1 rows
(생략)
2022-09-19 3:33:16 0 [Note] InnoDB: Transaction contains changes to 1 rows
2022-09-19 3:33:16 0 [Note] InnoDB: 151 transactions in prepared state after recovery
2022-09-19 3:33:16 0 [Note] Found 151 prepared transaction(s) in InnoDB
2022-09-19 3:33:16 0 [Note] Crash table recovery finished.
2022-09-19 3:33:16 0 [Note] InnoDB: Buffer pool(s) load completed at 220919 3:33:16
2022-09-19 3:33:17 0 [Note] InnoDB: Starting recovery for XA transactions...
2022-09-19 3:33:17 0 [Note] InnoDB: Transaction 90848485 in prepared state after recovery
2022-09-19 3:33:17 0 [Note] InnoDB: Transaction contains changes to 1 rows
(생략)
2022-09-19 3:33:17 0 [Note] InnoDB: Transaction contains changes to 1 rows
2022-09-19 3:33:17 0 [Note] InnoDB: 151 transactions in prepared state after recovery
2022-09-19 3:33:17 0 [Note] Found 151 prepared transaction(s) in InnoDB
2022-09-19 3:33:17 0 [ERROR] Found 151 prepared transactions! It means that server was not shown properly last time and critical recovery information (last binlog or tc.log file) was manueleted after a crash. You have to start server with --tc-heuristic-recover switch to commit rollback pending transactions.
2022-09-19 3:33:17 0 [ERROR] Aborting

로그를 더 자세히 보고 싶었서 values.yaml의 configuration에 값을 추가했어요

configuration: |-
[mysqld]
general_log_file=/opt/bitnami/mariadb/logs/mysql.log
general_log=1
skip-name-resolve
explicit_defaults_for_timestamp


/opt/bitnami/mariadb/logs/mysql.log에서 상세 로그를 볼 수 있어서 만족스럽네요.
원인은 찾았고, pod안에 들어가서 데이터를 보기로 했어요

2. mariadb 내부 데이터 보기(diagnosticMode: True)

로그를 더 자세히 보고 싶었서 values.yaml의 diagnosticMode에 값을 변경했어요

diagnosticMode:
enabled: true #false
command:
- sleep
args:
- infinity

diagnosticMode를 True로 변경하면, process 1 이 sleep inifinity로 변경되요.

$ k exec -it mariadb-primary-0 -n mariadb -- bash
I have no name!@mariadb-primary-0:/$ ps -ef
UID PID PPID C STIME TTY TIME CMD
1001 1 0 0 02:36 ? 00:00:00 sleep infinity
1001 7 0 0 04:57 pts/0 00:00:00 bash
1001 13 7 0 04:57 pts/0 00:00:00 ps -ef

pod 에 들어가서 데이터가 어디에 있는지 확인했어요

$ k exec -it mariadb-primary-0 -n mariadb -- bash
I have no name!@mariadb-primary-0:/bitnami/mariadb/data$ ls -al -h
drwxrwsr-x 11 1001 root 4.0K Sep 21 02:35 .
drwxrwsrwx 4 root root 4.0K Sep 20 09:03 ..
-rw-rw---- 1 1001 root 1.6M Sep 21 02:35 aria_log.00000001
-rw-rw---- 1 1001 root 52 Sep 21 02:35 aria_log_control
-rw-rw---- 1 1001 root 18K Sep 21 02:35 ib_buffer_pool
-rw-rw---- 1 1001 root 96M Sep 21 02:35 ib_logfile0
-rw-rw---- 1 1001 root 140M Sep 21 02:35 ibdata1
-rw-rw---- 1 1001 root 0 Sep 20 09:02 multi-master.info
drwxrws--- 2 1001 root 4.0K Sep 20 09:02 my_database
drwxrws--- 2 1001 root 4.0K Sep 20 09:12 mysql
-rw-rw---- 1 1001 root 133 Sep 20 23:15 mysql-bin.index
-rw-rw---- 1 1001 root 30 Sep 21 02:35 mysql-bin.state
-rw-rw-r-- 1 1001 root 15 Sep 20 09:02 mysql_upgrade_info
drwxrws--- 2 1001 root 4.0K Sep 20 09:02 performance_schema
drwxrws--- 2 1001 root 12K Sep 20 09:02 sys
drwxrws--- 2 1001 root 4.0K Sep 20 09:02 test
(생략)

일단 백업해줍니다. snapshot을 떠놨지만, 백업은 아무리해도 부족하니까요
$ kubectl cp mariadb/mariadb-primary-0://bitnami/mariadb/data .backup_20220921/

3. 복구 시도하기

3-1. innodb_force_recovery 시도하기

이게 `diagnosticMode: false`로 실행했을 때의 명령이에요

/opt/bitnami/mariadb/sbin/mysqld \
--defaults-file=/opt/bitnami/mariadb/conf/my.cnf \
--basedir=/opt/bitnami/mariadb \
--datadir=/bitnami/mariadb/data \
--socket=/opt/bitnami/mariadb/tmp/mysql.sock \
--pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid \
--server-id=544 \
--binlog-format=ROW \
--log-bin=mysql-bin \
--sync-binlog=1\
--innodb_flush_log_at_trx_commit=1

innodb_force_recovery는 6단계까지 있고,
4단계 이상 부터는 데이터가 완전히 틀어져버린다고 하는데 저는 백업해뒀으니까 지릅니다.
자세한 내용은 아래 링크 참고
http://intomysql.blogspot.com/2010/12/innodb_3017.html

# values.yaml의 configuration [mysqld] 부분에 추가해도 동일하게 동작해요.
# 하지만 계속 helm 재시작해야해서 귀찮으니까. 저는 diagnosticMode: true로 설정하고 아래 명령어를 때렸어요
/opt/bitnami/mariadb/sbin/mysqld \
--defaults-file=/opt/bitnami/mariadb/conf/my.cnf \
--basedir=/opt/bitnami/mariadb \
--datadir=/bitnami/mariadb/data \
--socket=/opt/bitnami/mariadb/tmp/mysql.sock \
--pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid \
--server-id=544 \
--binlog-format=ROW \
--log-bin=mysql-bin \
--sync-binlog=1\
--innodb_flush_log_at_trx_commit=1 \
--innodb_force_recovery=5

다행히 5단계에서 데이터가 살아났어요.

5 ( SRV_FORCE_NO_UNDO_LOG_SCAN )

복구 모드 3 ( SRV_FORCE_NO_TRX_UNDO )의 처리와 비슷하며, 이 모드에서는 Undo 로그를 완전히 무시하고 완전히 종료되지 않은 트랜잭션(COMMIT/ROLLBACK되지 않은)을 모두 COMMIT된 것으로 처리해 버린다. 또한 이 모드에서는 Undo 로그를 전혀 참조하지 않는 형태로 강제 복구 시킨다.

아래는 실행했던 명령어고
`--innodb_force_recovery=5`

$ /opt/bitnami/mariadb/sbin/mysqld
--defaults-file=/opt/bitnami/mariadb/conf/my.cnf 
--basedir=/opt/bitnami/mariadb 
--datadir=/bitnami/mariadb/data
--socket=/opt/bitnami/mariadb/tmp/mysql.sock
--pid-file=/opt/bitnami/mariadb/tmp/mysqld.pid 
--server-id=544 
--binlog-format=ROW 
--log-bin=mysql-bin --sync-binlog=1
--innodb_flush_log_at_trx_commit=1
--innodb_force_recovery=5

이거는 로그에요
`2022-09-21 2:16:51 0 [Note] InnoDB: !!! innodb_force_recovery is set to 5 !!!`

2022-09-21 2:16:51 0 [Note] /opt/bitnami/mariadb/sbin/mysqld (server 10.6.5-MariaDB-log) starting as process 98 ...
2022-09-21 2:16:51 0 [Note] InnoDB: !!! innodb_force_recovery is set to 5 !!!
2022-09-21 2:16:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2022-09-21 2:16:51 0 [Note] InnoDB: Number of pools: 1 2022-09-21 2:16:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-21 2:16:51 0 [Note] mysqld: O_TMPFILE is not supported on /opt/bitnami/mariadb/tmp (disabling future attempts)
2022-09-21 2:16:51 0 [Note] InnoDB: Using Linux native AIO
2022-09-21 2:16:51 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-09-21 2:16:51 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-21 2:16:51 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-21 2:16:51 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-09-21 2:16:51 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-21 2:16:51 0 [Note] InnoDB: 10.6.5 started; log sequence number 33296901938; transaction id 0
2022-09-21 2:16:51 0 [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
2022-09-21 2:16:51 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-21 2:16:51 0 [Note] InnoDB: Buffer pool(s) load completed at 220921 2:16:51
2022-09-21 2:16:51 0 [Note] Recovering after a crash using mysql-bin
2022-09-21 2:16:51 0 [Note] Starting table crash recovery... 2022-09-21 2:16:51 0 [Note] Crash table recovery finished.
2022-09-21 2:16:52 1 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`gtid_slave_pos` in the cache. Attempting to load the tablespace with space id 4
2022-09-21 2:16:52 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-09-21 2:16:52 0 [Warning] 'proxies_priv' entry '@% root@mariadb-primary-0' ignored in --skip-name-resolve mode.
2022-09-21 2:16:52 0 [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.Version: '10.6.5-MariaDB-log' socket: '/opt/bitnami/mariadb/tmp/mysql.sock' port: 3306 Source distribution

 

3-2. sqldump하기

data도 조회가 되는데, tablespace 찾지 못한다는 에러가 나와요.
이 때는 table을 새로 만들고 data를 이동시켜줬던 기억이나서
sqldump로 테이블을 export하기로 했어요

급하게 sqldump를 해줍니다.

mysqldump -u root -p --all-databases > alldb.sql

Dump가 용령이 좀 커서, kubectl cp로 local에 복사해줬어요

3-3. import 하기

새로운 ns에 mariadb를 만들고, kubectl cp로 pvc에 파일 옮기고 alldb.sql을 import 해줬어요

mysql -u root -p < alldb.sql

https://stackoverflow.com/questions/9497869/export-and-import-all-mysql-databases-at-one-time

dbeaver를 통해서 root로 접근하니 데이터가 정상 조회되고,
tablespace 관련 에러도 나오지 않아요!

3-3. FLUSH PRIVILEGES 하기

뿌듯하게 밥먹으러 가려했는데, 끝난 줄 알았는데 권한 관련 작업이 남았더라고요

기존에 있던 user는 있는데 권한을 초기회해줘야 한다고 하네요.

FLUSH PRIVILEGES;
GRANT USAGE ON *.* TO `manager-***`@`%`;
GRANT ALL PRIVILEGES ON `db_***`.* TO `manager-***`@`%`;

다시 접근하니 해당 user로 접근이 가능합니다.
이제 진짜 끝인거 같은데... 또 문제 생기면 추가해야겠어요
쫄깃한 이틀이엇네요.

4. 복구 실패 사례(메모)

검색하다보니 ibdata1을 삭제해서 해결했다는 분들이 있더라고요.
대부분의 글들은 ibdata1 절대 삭제하지 말라고 하기 때문에 삭제하지 않는 것을 추천드려요
buffer, tablespace 같은 값의 정보들이 들어있어서 연결고리가 끊긴데요(저도 이 부분은 잘..)

삭제하고 실행해봣는데 schema 구조는 볼 수 있어요.
DDL이라도 건져야겠다는 분들은 추천드려요.
실제 데이터를 조회하려고 하면, "Table doesn't exist in engine" 라는 에러를 볼 수 있을 거에요

마무리

진짜 쫄깃했네요. k8s에 DB는 올리지 않기로 했는데..
올려도 백업이나 관리는 하지 않기로 했는데..
문제가 생기니 다들 저만 보고 있어서 힘들었어요.

이제는 안정적인 운영을 위해서, 책임질 수 없는 업무는 받지 않도록 해야겠어요.
급하데서 어쩔 수 없이 mariadb 하나 만들어드렸더니,
다음 프로젝트는 DB도 은근슬쩍 저희 k8s에 올리려 하더라고요..

어쩃든 데이터가 되살아나서 뿌듯합니다 :)

참고

https://stackoverflow.com/questions/38759870/xampp-mysql-table-doesnt-exist-in-engine-1932
https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=noorol&logNo=220981973355
https://indienote.tistory.com/170

728x90
반응형