trino) mysql insert 에러 발생, 방화벽 실수_could not create connection
K8S에 Trino를 올리고, 오픈 전 테스트 진행을 하고 있었어요.
요약
- trino이용해서 mysql insert하는데 `could not create connection`나옴.
- 알고보니, k8s worker 중 1개 방화벽 설정을실수함
문제
최근에, Spark Thrift Server를 이용해 진행하던 data pipeline 작업을 Trino로 이전하고 있습니다.
(hive metatstore로 s3의 parquet를 읽고, mysql에 insert하는 작업)
빠른 테스트를 위해서 trino-worker를 3개로 설정했었고, 문제 없다는 것도 확인했죠.
그런데 trino-worker 9개로 늘리고 퇴근을 했더니 Airflow에서 에러를 마구 뱉습니다.
Select는 되는데, Insert할 때만 에러를 뱉더라고요.
<에러 로그 전체>
Exception in thread "main" java.sql.SQLException: Query failed (#20200814_082036_00092_pd6az): Could not create connection to database server. Attempted reconnect 3 times. Giving up.
at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1754)
at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:225)
at io.prestosql.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:185)
at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
at io.prestosql.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at io.prestosql.jdbc.PrestoResultSet$AsyncIterator.lambda$new$0(PrestoResultSet.java:131)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1800)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: io.prestosql.spi.PrestoException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
at io.prestosql.plugin.jdbc.JdbcPageSink.(JdbcPageSink.java:57)
at io.prestosql.plugin.jdbc.JdbcPageSinkProvider.createPageSink(JdbcPageSinkProvider.java:47)
at io.prestosql.split.PageSinkManager.createPageSink(PageSinkManager.java:61)
at io.prestosql.operator.TableWriterOperator$TableWriterOperatorFactory.createPageSink(TableWriterOperator.java:120)
at io.prestosql.operator.TableWriterOperator$TableWriterOperatorFactory.createOperator(TableWriterOperator.java:111)
at io.prestosql.operator.DriverFactory.createDriver(DriverFactory.java:114)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunnerFactory.createDriver(SqlTaskExecution.java:942)
at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1070)
at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.prestosql.$gen.Presto_340____20200814_080750_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at co m.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at co m.mysql.jdbc.Util.getInstance(Util.java:408)
at co m.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
at co m.mysql.jdbc.SQLError.createSQLException(SQLError.java:898)
at co m.mysql.jdbc.SQLError.createSQLException(SQLError.java:887)
at co m.mysql.jdbc.SQLError.createSQLException(SQLError.java:861)
at co m.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2095)
at co m.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2020)
at com.mysql.jdbc.ConnectionImpl.(ConnectionImpl.java:768)
at com.mysql.jdbc.JDBC4Connection.(JDBC4Connection.java:47)
at jdk.internal.reflect.GeneratedConstructorAccessor135.newInstance(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at co m.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at co m.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:385)
at co m.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:323)
at io.prestosql.plugin.jdbc.DriverConnectionFactory.openConnection(DriverConnectionFactory.java:73)
at io.prestosql.plugin.jdbc.jmx.StatisticsAwareConnectionFactory.lambda$openConnection$0(StatisticsAwareConnectionFactory.java:42)
at io.prestosql.plugin.jdbc.jmx.JdbcApiStats.wrap(JdbcApiStats.java:35)
at io.prestosql.plugin.jdbc.jmx.StatisticsAwareConnectionFactory.openConnection(StatisticsAwareConnectionFactory.java:42)
at io.prestosql.plugin.jdbc.BaseJdbcClient.getConnection(BaseJdbcClient.java:683)
at io.prestosql.plugin.jdbc.ForwardingJdbcClient.getConnection(ForwardingJdbcClient.java:184)
at io.prestosql.plugin.jdbc.jmx.StatisticsAwareJdbcClient.lambda$getConnection$24(StatisticsAwareJdbcClient.java:231)
at io.prestosql.plugin.jdbc.jmx.JdbcApiStats.wrap(JdbcApiStats.java:35)
at io.prestosql.plugin.jdbc.jmx.StatisticsAwareJdbcClient.getConnection(StatisticsAwareJdbcClient.java:231)
at io.prestosql.plugin.jdbc.CachingJdbcClient.getConnection(CachingJdbcClient.java:241)
at io.prestosql.plugin.jdbc.JdbcPageSink.(JdbcPageSink.java:54)
... 13 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at co m.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at co m.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:335)
at co m.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2187)
at co m.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2036)
... 33 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at co m.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:211)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:299)
... 35 more
Exception in thread "main" java.sql.SQLException: Query failed (#20200814_082036_00092_pd6az): Could not create connection to database server. Attempted reconnect 3 times. Giving up.
(생략)
Caused by: io.prestosql.spi.PrestoException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
(생략)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
원인파악
1. Mysql에 문제가 있나? -> ㄴㄴ
mysql에 직접 접속해서 insert를 해봤는데 특이사항이 없었어요.
2. worker 갯수가 많으면 부하가 큰 작업에 영향을 줄까? -> ㄴㄴ
coordinator가 trino-worker들에 던져주는거니까, 자체 관리되는거라고 판단했어요
3. 그러면 worker를 늘렸는데 왜 문제가 생기는걸까?
trino-worker를 하나씩 줄여가면서, 에러가 발생하지 않는 시점을 찾았어요.
9 -> 8 -> 7 -> 6 줄여가다가, 6개 일때는 에러가 나지 않더라고요.
다시 7개로 늘려줬는데 k8s-worker4에 trino-worker가 실행되는 것을 확인했어요.
해당 trino-worker pod의 logs를 보니까. 문제의 log가 나오는 걸 확인했습니다.
`Could not create connection to database server. Attempted reconnect 3 times. Giving up.`
nodeaffinity로 해당 k8s worker에 pod 이생기지 않게 설정했더니 문제가 발생하지 않았어요.
혹시? 하고 k8s-worker node에서 telnet을 날려보니 응답이 없더라고요ㅜ
해결
k8s worker node 1개에 방화벽을 새로 설정해줬습니다.
후기
이제 문제는 해결되었고, 회고를 해보려고해요
처음에 로그는 이 한 줄 뿐이 들어오지 않았어요
`Could not create connection to database server. Attempted reconnect 3 times. Giving up.`
이제 두 눈을 크게 뜨고 범위를 넓혀봅니다.
jdbc 관련 에러가 보이고 소켓이 눈에 들어오는군요
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at co m.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:211)
at com.mysql.jdbc.MysqlIO.(MysqlIO.java:299)
더 눈을 크게 떠봅니다.
소켓 에러를 보고, 다시 보니 패킷이 전달되지 않았다는 로그가 보이는군요.
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
처음부터 네트워크를 의심했어야되는데 참 멍청했네요
문제 해결한 시간보다 블로그 글을 쓰는데 더 오랜 시간이 걸렸는데,
다음에는 당황하지 않고 에러 로그를 침착하게 보자는 의미에서 썼습니다ㅜ